Access several s3 buckets, with credentials containing /

2015-06-05 Thread Pierre B
Hi list! My problem is quite simple. I need to access several S3 buckets, using different credentials:
```
val c1 = sc.textFile("s3n://[ACCESS_KEY_ID1:SECRET_ACCESS_KEY1]@bucket1/file.csv").count
val c2 = sc.textFile("s3n://[ACCESS_KEY_ID2:SECRET_ACCESS_KEY2]@bucket2/file.csv").count
val c3 =
```
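A common workaround here, since a secret key containing "/" breaks the s3n://key:secret@bucket URL form, is to set the credentials on the Hadoop configuration instead of embedding them in the URL. A minimal sketch (accessKeyId1 and friends are assumed variables holding the raw keys, slashes and all; each count is an action, so the first read completes before the credentials are swapped):
```
val conf = sc.hadoopConfiguration

// Credentials for the first bucket:
conf.set("fs.s3n.awsAccessKeyId", accessKeyId1)
conf.set("fs.s3n.awsSecretAccessKey", secretAccessKey1)
val c1 = sc.textFile("s3n://bucket1/file.csv").count

// Swap credentials before touching the second bucket:
conf.set("fs.s3n.awsAccessKeyId", accessKeyId2)
conf.set("fs.s3n.awsSecretAccessKey", secretAccessKey2)
val c2 = sc.textFile("s3n://bucket2/file.csv").count
```
One caveat: Hadoop caches FileSystem instances per scheme and authority, so depending on the Hadoop version the second read may still pick up the first credentials; disabling the cache for the scheme (the standard per-scheme fs.s3n.impl.disable.cache knob) is the usual escape hatch.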

[SQL] Self join with ArrayType columns problems

2015-01-26 Thread Pierre B
Using Spark 1.2.0, we are facing some weird behaviour when performing a self join on a table with an ArrayType field (potential bug?). I have set up a minimal non-working example here: https://gist.github.com/pierre-borckmans/4853cd6d0b2f2388bf4f
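For readers without the gist at hand, the shape of the repro is roughly the following (table and data are illustrative, not the actual gist contents):
```
// Spark 1.2-era API; sqlContext is an org.apache.spark.sql.SQLContext.
case class Rec(id: Int, tags: Seq[String])   // Seq maps to an ArrayType column

import sqlContext.createSchemaRDD            // implicit RDD -> SchemaRDD
val rdd = sc.parallelize(Seq(Rec(1, Seq("a", "b")), Rec(2, Seq("c"))))
rdd.registerTempTable("t")

// Self join on the same table: the reported issue is that the ArrayType
// columns of the two sides come back wrong.
sqlContext.sql("SELECT a.id, a.tags, b.tags FROM t a JOIN t b ON a.id = b.id")
  .collect()
  .foreach(println)
```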

Re: ScalaReflectionException when using saveAsParquetFile in sbt

2015-01-15 Thread Pierre B
Same problem here... Did you find a solution for this? P.

Re: MissingRequirementError with spark

2015-01-15 Thread Pierre B
I found this, which might be useful: https://github.com/deanwampler/spark-workshop/blob/master/project/Build.scala It seems that forking is needed.
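For reference, the relevant bit in a build like the one linked is sbt's fork setting; a minimal equivalent in a plain build.sbt would be:
```
// Run the application in a separate JVM instead of inside sbt's own JVM,
// which avoids classloader problems with Scala reflection:
fork in run := true
```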

[SQL] Is RANK function supposed to work in SparkSQL 1.1.0?

2014-10-21 Thread Pierre B
Hi! The RANK function has been available in Hive since version 0.11. When trying to use it in SparkSQL, I'm getting the following exception (full stacktrace below): java.lang.ClassCastException: org.apache.hadoop.hive.ql.udf.generic.GenericUDAFRank$RankBuffer cannot be cast to
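The sort of query that triggers this looks like the following (table and column names are hypothetical). Native window-function support only arrived in Spark 1.4, so on 1.1.0 this fails as described:
```
// hiveContext is an org.apache.spark.sql.hive.HiveContext
val ranked = hiveContext.sql("""
  SELECT name, dept, salary,
         RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS r
  FROM emp
""")
ranked.collect()
```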

Re: [SQL] Is RANK function supposed to work in SparkSQL 1.1.0?

2014-10-21 Thread Pierre B
Ok, thanks Michael. In general, what's the easy way to figure out what's already implemented? The exception I was getting was not really helpful here. Also, is there a roadmap document somewhere? Thanks! P.

Re: Spark SQL - custom aggregation function (UDAF)

2014-10-13 Thread Pierre B
Is it planned in the near future?

Re: Is there a way to look at RDD's lineage? Or debug a fault-tolerance error?

2014-10-09 Thread Pierre B
To add a bit on this one: if you look at RDD.scala in the Spark code, you'll see that both the parent and firstParent methods are protected[spark]. I guess that, for good reasons I must admit I don't completely understand, you are not supposed to explore an RDD's lineage programmatically... I had a
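What is public, and usually enough for debugging: toDebugString prints the whole lineage, and dependencies exposes the direct parents. A quick sketch:
```
val rdd = sc.parallelize(1 to 100).map(_ * 2).filter(_ > 10)

// Human-readable dump of the full lineage chain:
println(rdd.toDebugString)

// Direct parents are reachable through the public dependencies method,
// even though parent/firstParent themselves are protected[spark]:
rdd.dependencies.foreach(dep => println(dep.rdd))
```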

[Spark SQL]: Convert SchemaRDD back to RDD

2014-07-08 Thread Pierre B
Hi there! 1/ Is there a way to convert a SchemaRDD (for instance loaded from a parquet file) back to an RDD of a given case class? 2/ Even better, is there a way to get the schema information from a SchemaRDD? I am trying to figure out how to properly get the various fields of the Rows of a
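For the record, the usual answers in the SchemaRDD era: a SchemaRDD is an RDD[Row], so both are straightforward. The case class and column order below are assumptions for illustration:
```
case class Person(name: String, age: Int)

val schemaRDD = sqlContext.parquetFile("people.parquet")

// 1/ Back to an RDD of a case class, by mapping over the Rows:
val people = schemaRDD.map(row => Person(row.getString(0), row.getInt(1)))

// 2/ Schema information:
schemaRDD.printSchema()
```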

Re: [Spark SQL]: Convert SchemaRDD back to RDD

2014-07-08 Thread Pierre B
Cool, thanks Michael! Message sent from a mobile device - excuse typos and abbreviations. On 8 Jul 2014, at 22:17, Michael Armbrust [via Apache Spark User List] ml-node+s1001560n9084...@n3.nabble.com wrote: On Tue, Jul 8, 2014 at 12:43 PM, Pierre B [hidden email] wrote: 1/ Is there a way

Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-02 Thread Pierre B
Hi Michaël, Thanks for this. We could indeed do that. But I guess the question is more about the change of behaviour from 0.9.1 to 1.0.0. We never had to care about that in previous versions. Does that mean we have to manually remove existing files, or is there a way to automatically overwrite
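The change in 1.0.0 is that Spark now validates output specs the way Hadoop does, so an existing output directory is rejected. The usual workaround is to delete the path first; a sketch using the Hadoop FileSystem API (sc and rdd assumed in scope):
```
import org.apache.hadoop.fs.Path

def deleteIfExists(dir: String): Unit = {
  val path = new Path(dir)
  val fs = path.getFileSystem(sc.hadoopConfiguration)  // FS matching the path's scheme
  if (fs.exists(path)) fs.delete(path, true)           // true = recursive
}

deleteIfExists("hdfs:///tmp/output")
rdd.saveAsTextFile("hdfs:///tmp/output")
```
Later 1.0.x releases also added a spark.hadoop.validateOutputSpecs setting to disable the check and restore the old behaviour.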

Using sbt-pack with Spark 1.0.0

2014-06-01 Thread Pierre B
Hi all! We've been using the sbt-pack sbt plugin (https://github.com/xerial/sbt-pack) for building our standalone Spark application for a while now. Until version 1.0.0, that worked nicely. For those who don't know the sbt-pack plugin, it basically copies all the dependency JARs from your
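For context, the plugin is wired in roughly like this (the version number and main class are illustrative only):
```
// project/plugins.sbt
addSbtPlugin("org.xerial.sbt" % "sbt-pack" % "0.6.1")

// build.sbt
packSettings
packMain := Map("my-app" -> "com.example.Main")  // launch-script name -> main class
```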

Re: SparkContext startup time out

2014-05-30 Thread Pierre B
I was annoyed by this as well. It appears that just permuting the order of dependency inclusion solves this problem: first Spark, then your CDH Hadoop distro. HTH, Pierre
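In sbt terms, the fix amounts to listing Spark before the Hadoop client (the versions here are illustrative):
```
libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-core"    % "0.9.1",          // Spark first...
  "org.apache.hadoop" %  "hadoop-client" % "2.0.0-cdh4.6.0"  // ...then the CDH distro
)
```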

Re: Spark Summit 2014 (Hotel suggestions)

2014-05-27 Thread Pierre B
Hi everyone! Any recommendations, anyone? Pierre

Re: Use SparkListener to get overall progress of an action

2014-05-23 Thread Pierre B
On May 22, 2014, at 8:02 AM, Pierre B [hidden email] wrote: Hi Andy! Yes, the Spark UI provides a lot of interesting information for debugging purposes. Here I’m trying to integrate simple progress monitoring into my app’s UI. I’m typically running a few “jobs” (or rather actions), and I’d

Re: Use SparkListener to get overall progress of an action

2014-05-23 Thread Pierre B
information in a somewhat arbitrary format and will be deprecated soon. If you find this feature useful, you can test it out by building the master branch of Spark yourself, following the instructions in https://github.com/apache/spark/pull/42. On 05/22/2014 08:51 AM, Pierre B wrote

Use SparkListener to get overall progress of an action

2014-05-22 Thread Pierre B
Is there a simple way to monitor the overall progress of an action using SparkListener or anything else? I see that one can name an RDD... Could that be used to determine which action triggered a stage...? Thanks, Pierre
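A minimal listener sketch, counting task completions across all stages; tying those counts back to a specific action is exactly the open question in this thread:
```
import java.util.concurrent.atomic.AtomicInteger
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

class ProgressListener extends SparkListener {
  private val tasksDone = new AtomicInteger(0)

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    // Called on the driver after each task finishes, whatever the stage:
    println(s"tasks completed so far: ${tasksDone.incrementAndGet()}")
  }
}

sc.addSparkListener(new ProgressListener)
```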

Re: Use SparkListener to get overall progress of an action

2014-05-22 Thread Pierre B
, aℕdy ℙetrella about.me/noootsab On Thu, May 22, 2014 at 4:51 PM, Pierre B [hidden email] wrote: Is there a simple way to monitor the overall progress of an action using SparkListener or anything else? I see that one can name an RDD... Could that be used to determine which action

Nested method in a class: Task not serializable?

2014-05-16 Thread Pierre B
Hi! I understand the usual "Task not serializable" issue that arises when accessing a field or a method that is out of scope of a closure. To fix it, I usually define a local copy of these fields/methods, which avoids the need to serialize the whole class: class MyClass(val myField: Any) { def
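The message is cut off above; a reconstruction of the idiom being described (not the original code) looks like this:
```
import org.apache.spark.rdd.RDD

class MyClass(val myField: Any) {
  def transform(rdd: RDD[String]): RDD[String] = {
    val localField = myField      // local copy: the closure captures only
    rdd.map(s => s + localField)  // this value, not the whole MyClass
  }                               // instance (which may not be serializable)
}
```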