Using Spark, SparkR and Ranger, please help.

2016-01-20 Thread Julien Carme
Hello, I have been able to use Spark with Apache Ranger. I added the right configuration files to the Spark conf, I added the Ranger jars to the classpath, and it works: Spark complies with Ranger rules when I access Hive tables. However, with SparkR it does not work, which is rather surprising considering

Issues with partitionBy: FetchFailed

2014-09-21 Thread Julien Carme
Hello, I am facing an issue with partitionBy; it is not clear whether it is a problem with my code or with my Spark setup. I am using Spark 1.1, standalone, and my other Spark projects work fine. So I have to repartition a relatively large file (about 70 million lines). Here is a minimal version
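
The minimal version mentioned in the message is not included in this digest. As a rough sketch of the technique under discussion, and not the poster's actual code (the input and output paths, the key extraction, and the partition count are all hypothetical), keying and shuffling a large text file with partitionBy could look like:

    import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

    object PartitionBySketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("partitionBy-sketch"))
        // Read the large file and key every line, here by its first tab-separated field.
        val keyed = sc.textFile("hdfs:///path/to/input") // hypothetical path
          .map(line => (line.split('\t')(0), line))
        // partitionBy shuffles the pairs into a fixed number of hash partitions;
        // with ~70 million lines, too few partitions (or too little executor
        // memory) is a common cause of FetchFailed errors during the shuffle.
        val partitioned = keyed.partitionBy(new HashPartitioner(200))
        partitioned.values.saveAsTextFile("hdfs:///path/to/output") // hypothetical path
        sc.stop()
      }
    }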

Re: Saving RDD with array of strings

2014-09-21 Thread Julien Carme
Just use flatMap, it does exactly what you need: newLines.flatMap { lines => lines }.saveAsTextFile(...) 2014-09-21 11:26 GMT+02:00 Sarath Chandra sarathchandra.jos...@algofusiontech.com: Hi All, if my RDD has an array/sequence of strings, how can I save them as an HDFS file with each
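
A self-contained sketch of this suggestion (the output path is hypothetical):

    import org.apache.spark.{SparkConf, SparkContext}

    object FlattenAndSave {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("flatMap-save-sketch"))
        // An RDD whose elements are arrays of strings.
        val newLines = sc.parallelize(Seq(Array("a", "b"), Array("c", "d")))
        // flatMap unrolls each array so every string becomes its own element;
        // saveAsTextFile then writes one string per output line.
        newLines.flatMap(lines => lines)
          .saveAsTextFile("hdfs:///path/to/output") // hypothetical path
        sc.stop()
      }
    }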

Strange exception while accessing hdfs from spark.

2014-09-18 Thread Julien Carme
Hello, I have been using Spark for quite some time, and I now get this error (please see stderr output below) when accessing HDFS. It seems to come from Hadoop; however, I can access HDFS from the command line without any problem. The WARN on the first line seems to be key, because it never appeared

ReduceByKey performance optimisation

2014-09-13 Thread Julien Carme
Hello, I am facing performance issues with reduceByKey. I know that this topic has already been covered, but I did not really find answers to my question. I am using reduceByKey to remove entries with identical keys, using, as the reduce function, (a, b) => a. It seems to be a relatively

Re: ReduceByKey performance optimisation

2014-09-13 Thread Julien Carme
.keys.distinct() should be much better. On Sat, Sep 13, 2014 at 10:46 AM, Julien Carme julien.ca...@gmail.com wrote: Hello, I am facing performance issues with reduceByKey. I know that this topic has already been covered but I did not really find answers to my question. I am

Re: ReduceByKey performance optimisation

2014-09-13 Thread Julien Carme
too. On Sat, Sep 13, 2014 at 1:15 PM, Gary Malouf malouf.g...@gmail.com wrote: You need something like: val x: RDD[MyAwesomeObject]; x.map(obj => obj.fieldtobekey -> obj).reduceByKey { case (l, _) = l } Does that make sense? On Sat, Sep 13, 2014 at 7:28 AM, Julien Carme julien.ca
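
Putting the suggestions from this thread together, here is a self-contained sketch; the Record case class and its fields are made up to stand in for MyAwesomeObject:

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical record type standing in for MyAwesomeObject.
    case class Record(fieldtobekey: String, payload: String)

    object DedupByKey {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("dedup-sketch"))
        val records = sc.parallelize(Seq(
          Record("k1", "first"), Record("k1", "second"), Record("k2", "only")))

        // The pattern from the thread: key each object by the dedup field,
        // then keep the first value seen for each key.
        val deduped = records
          .map(obj => obj.fieldtobekey -> obj)
          .reduceByKey { case (l, _) => l }
          .values

        // If only the distinct keys are needed, this avoids carrying the
        // full objects through the shuffle.
        val distinctKeys = records.map(_.fieldtobekey).distinct()

        deduped.collect().foreach(println)
        distinctKeys.collect().foreach(println)
        sc.stop()
      }
    }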