Hello,
I have been able to use Spark with Apache Ranger. I added the right
configuration files to the Spark conf directory and the Ranger jars to the
classpath, and it works: Spark complies with Ranger rules when I access Hive tables.
However, with SparkR it does not work, which is rather surprising considering
Hello,
I am facing an issue with partitionBy; it is not clear whether it is a
problem with my code or with my Spark setup. I am using Spark 1.1,
standalone, and my other Spark projects work fine.
So I have to repartition a relatively large file (about 70 million lines).
Here is a minimal version
Just use flatMap; it does exactly what you need:
newLines.flatMap { lines => lines }.saveAsTextFile(...)
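The flattening semantics of that call can be sketched on plain Scala collections; the `newLines` sample data below is invented, and on a real RDD the call would end with saveAsTextFile rather than returning a local collection:

```scala
// Hypothetical data standing in for an RDD[Seq[String]].
val newLines: Seq[Seq[String]] = Seq(Seq("a", "b"), Seq("c"), Seq.empty)

// flatMap with the identity function flattens each inner sequence,
// so every string becomes its own element (its own output line).
val flattened: Seq[String] = newLines.flatMap(lines => lines)
// flattened == Seq("a", "b", "c")
```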
2014-09-21 11:26 GMT+02:00 Sarath Chandra
sarathchandra.jos...@algofusiontech.com:
Hi All,
If my RDD holds arrays/sequences of strings, how can I save them as an
HDFS file with each
Hello,
I have been using Spark for quite some time, and I now get this error
(please see the stderr output below) when accessing HDFS. It seems to come
from Hadoop; however, I can access HDFS from the command line without any
problem.
The WARN on the first seems to be key, because it never appeared
Hello,
I am facing performance issues with reduceByKey. I know that this topic
has already been covered, but I did not really find answers to my question.
I am using reduceByKey to remove entries with identical keys, using
(a, b) => a as the reduce function. It seems to be a relatively
, .keys.distinct() should be
much better.
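The two approaches can be sketched on plain Scala collections; the pair data below is invented, standing in for an RDD[(String, Int)]:

```scala
// Hypothetical pair data standing in for an RDD[(String, Int)].
val pairs = Seq(("a", 1), ("a", 2), ("b", 3))

// reduceByKey((a, b) => a) keeps one value per key; groupBy + head is a
// plain-Scala analog of that dedup-by-key step.
val onePerKey = pairs.groupBy(_._1).map { case (_, vs) => vs.head }.toSeq

// If only the keys matter, dropping the values before deduplicating
// is cheaper, since the values never need to be shuffled.
val distinctKeys = pairs.map(_._1).distinct
// distinctKeys == Seq("a", "b")
```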
On Sat, Sep 13, 2014 at 10:46 AM, Julien Carme julien.ca...@gmail.com
wrote:
Hello,
I am facing performance issues with reduceByKey. I know that this topic
has already been covered but I did not really find answers to my question.
I am
too.
On Sat, Sep 13, 2014 at 1:15 PM, Gary Malouf malouf.g...@gmail.com
wrote:
You need something like:
val x: RDD[MyAwesomeObject]
x.map(obj => obj.fieldtobekey -> obj).reduceByKey { case (l, _) => l }
Does that make sense?
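A self-contained sketch of that key-then-reduce pattern on plain Scala collections; MyAwesomeObject and fieldtobekey are the names from the thread, while the payload field and sample data are invented for illustration:

```scala
// Hypothetical shape for the objects being deduplicated.
case class MyAwesomeObject(fieldtobekey: String, payload: Int)

val x = Seq(
  MyAwesomeObject("k1", 1),
  MyAwesomeObject("k1", 2),
  MyAwesomeObject("k2", 3)
)

// Key each object by the dedup field, then keep the first object per key --
// a collections analog of
//   x.map(obj => obj.fieldtobekey -> obj).reduceByKey { case (l, _) => l }
val deduped = x
  .map(obj => obj.fieldtobekey -> obj)
  .groupBy(_._1)
  .map { case (_, grp) => grp.head._2 }
  .toSeq
// deduped holds one object per distinct fieldtobekey
```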
On Sat, Sep 13, 2014 at 7:28 AM, Julien Carme julien.ca