Hi list!
My problem is quite simple.
I need to access several S3 buckets, using different credentials:
```
val c1 = sc.textFile("s3n://[ACCESS_KEY_ID1:SECRET_ACCESS_KEY1]@bucket1/file.csv").count
val c2 = sc.textFile("s3n://[ACCESS_KEY_ID2:SECRET_ACCESS_KEY2]@bucket2/file.csv").count
val c3 =
```
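A workaround that is sometimes suggested, instead of embedding the keys in the URL, is to set the s3n credentials on the Hadoop configuration before each read. This is only a sketch; it assumes the configuration keys are picked up again when each action actually runs, and the bucket names are placeholders:
```
// Sketch: switch the s3n credentials in the shared Hadoop configuration
// before each read. The configuration is global to the SparkContext, so the
// reads must be triggered one at a time (here, count forces each one).
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "ACCESS_KEY_ID1")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "SECRET_ACCESS_KEY1")
val c1 = sc.textFile("s3n://bucket1/file.csv").count

sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "ACCESS_KEY_ID2")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "SECRET_ACCESS_KEY2")
val c2 = sc.textFile("s3n://bucket2/file.csv").count
```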
Using Spark 1.2.0, we are facing some weird behaviour when performing a self
join on a table with an ArrayType field (potential bug?).
I have set up a minimal non-working example here:
https://gist.github.com/pierre-borckmans/4853cd6d0b2f2388bf4f
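The gist is not reproduced here, but the kind of self join being described looks roughly like this (a sketch against the Spark 1.2 API; the case class, table and field names are invented):
```
// Sketch: a table with an ArrayType column (tags), self-joined on its id.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.createSchemaRDD          // implicit RDD -> SchemaRDD (Spark 1.x)

case class Rec(id: Int, tags: Seq[String]) // Seq[String] maps to ArrayType(StringType)
val rdd = sc.parallelize(Seq(Rec(1, Seq("a", "b")), Rec(2, Seq("c"))))
rdd.registerTempTable("recs")

val joined = sqlContext.sql(
  "SELECT a.id, a.tags, b.tags FROM recs a JOIN recs b ON a.id = b.id")
joined.collect().foreach(println)
```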
Same problem here...
Did you find a solution for this?
P.
I found this, which might be useful:
https://github.com/deanwampler/spark-workshop/blob/master/project/Build.scala
It seems that forking is needed.
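Concretely, the relevant bit seems to be enabling forking in the build; a sketch of what that looks like in build.sbt (not taken from the linked Build.scala):
```
// Run the application and the tests in a forked JVM, so that Scala
// reflection sees a plain classpath instead of sbt's layered classloaders.
fork in run := true
fork in Test := true
```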
Hi!
The RANK function has been available in Hive since version 0.11.
When trying to use it in Spark SQL, I'm getting the following exception (full
stacktrace below):
java.lang.ClassCastException:
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFRank$RankBuffer cannot be
cast to
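For context, the kind of HiveQL that exercises RANK looks roughly like this; the table and column names are invented and the exact query may differ:
```
// Sketch: a windowed RANK query of the kind that triggers the exception.
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
val ranked = hiveContext.sql("""
  SELECT dept, name, salary,
         RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS salary_rank
  FROM employees""")
ranked.collect().foreach(println)
```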
Ok, thanks Michael.
In general, what's the easiest way to figure out what's already implemented?
The exception I was getting was not really helpful here.
Also, is there a roadmap document somewhere?
Thanks!
P.
Is it planned for the near future?
To add a bit on this one: if you look at RDD.scala in the Spark code, you'll
see that both the parent and firstParent methods are protected[spark].
I guess, for good reasons that I must admit I don't completely understand,
you are not supposed to explore an RDD's lineage programmatically...
I had a
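That said, some of the lineage is reachable through the public API; a minimal sketch (the RDDs are invented for illustration):
```
// Sketch: inspecting lineage without the protected[spark] parent/firstParent.
val base    = sc.parallelize(1 to 100)
val derived = base.map(_ * 2).filter(_ % 3 == 0)

println(derived.toDebugString)                         // textual view of the lineage
derived.dependencies.foreach(dep => println(dep.rdd))  // direct parent RDDs
```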
Hi there!
1/ Is there a way to convert a SchemaRDD (for instance loaded from a parquet
file) back to an RDD of a given case class?
2/ Even better, is there a way to get the schema information from a
SchemaRDD? I am trying to figure out how to properly get the various fields
of the Rows of a
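To make the two questions concrete, a sketch written against the SchemaRDD API of that era (the Person case class, the file name and the field positions are invented):
```
// Sketch: (1) map the Rows of a SchemaRDD back into a case class,
//         (2) print the schema the SchemaRDD carries.
case class Person(name: String, age: Int)

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val schemaRDD  = sqlContext.parquetFile("people.parquet")

schemaRDD.printSchema()                                   // 2/ schema information
val people = schemaRDD.map(row =>                         // 1/ manual Row -> case class
  Person(row.getString(0), row.getInt(1)))
```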
Cool, thanks Michael!
Hi Michaël,
Thanks for this. We could indeed do that.
But I guess the question is more about the change of behaviour from 0.9.1 to
1.0.0.
We never had to care about that in previous versions.
Does that mean we have to manually remove existing files, or is there a way
to automatically overwrite
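For the record, the manual route is something like the following; a sketch of the generic Hadoop FileSystem call, not necessarily what was suggested (the output path and RDD are placeholders):
```
import org.apache.hadoop.fs.{FileSystem, Path}

// Sketch: delete the existing output directory before saving, since 1.0.0
// refuses to write over an existing path.
val output = "hdfs:///tmp/my-output"               // placeholder path
val fs = FileSystem.get(sc.hadoopConfiguration)
fs.delete(new Path(output), true)                  // recursive delete
rdd.saveAsTextFile(output)                         // rdd = whatever you are saving
```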
Hi all!
We've been using the sbt-pack sbt plugin
(https://github.com/xerial/sbt-pack) for building our standalone Spark
application for a while now. Until version 1.0.0, that worked nicely.
For those who don't know the sbt-pack plugin, it basically copies all the
dependency JARs from your
I was annoyed by this as well.
It appears that just permuting the order of dependency inclusion solves the
problem: first Spark, then your CDH Hadoop distro (see the sketch below).
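In build.sbt terms that means something like this (the versions are placeholders; use whatever matches your cluster):
```
// Sketch: list Spark before the CDH Hadoop client so its classes win.
libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-core"    % "1.0.0",
  "org.apache.hadoop" %  "hadoop-client" % "2.3.0-cdh5.1.0"
)
```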
HTH,
Pierre
Hi everyone!
Any recommendations, anyone?
Pierre
Hi Andy!
Yes, the Spark UI provides a lot of interesting information for debugging
purposes.
Here I’m trying to integrate simple progress monitoring in my app’s UI.
I’m typically running a few “jobs” (or rather actions), and I’d
information in a somewhat arbitrary format and will be deprecated soon. If
you find this feature useful, you can test it out by building the master
branch of Spark yourself, following the instructions in
https://github.com/apache/spark/pull/42.
On 05/22/2014 08:51 AM, Pierre B wrote:
Is there a simple way to monitor the overall progress of an action using
SparkListener or anything else?
I see that one can name an RDD... Could that be used to determine which
action triggered a stage, ... ?
Thanks
Pierre
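A sketch of the SparkListener route being asked about, reporting coarse stage and task progress (attributing stages to a particular named RDD or action is exactly the part left open):
```
import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted, SparkListenerTaskEnd}

// Sketch: a listener that logs progress as tasks and stages finish.
class ProgressListener extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit =
    println(s"task finished in stage ${taskEnd.stageId}")

  override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit =
    println(s"stage ${stageCompleted.stageInfo.stageId} completed " +
            s"(${stageCompleted.stageInfo.numTasks} tasks)")
}

sc.addSparkListener(new ProgressListener)
```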
Hi!
I understand the usual "Task not serializable" issue that arises when
accessing a field or a method that is out of the scope of a closure.
To fix it, I usually define a local copy of these fields/methods, which
avoids the need to serialize the whole class:
class MyClass(val myField: Any) {
def
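Spelled out, the pattern is something like this; the transformation and the method body are invented for the sake of the example:
```
import org.apache.spark.rdd.RDD

// Sketch: copy the field into a local val so the closure captures only that
// value instead of the whole (possibly non-serializable) enclosing class.
class MyClass(val myField: Any) {
  def tagAll(rdd: RDD[Int]): RDD[(Int, Any)] = {
    val localField = myField            // local copy, serialized on its own
    rdd.map(x => (x, localField))       // closure no longer references `this`
  }
}
```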