Thanks Matei. We have tested the fix and it's working perfectly.
Andrew, we set spark.shuffle.spill=false but the application goes out of
memory. I think that is expected, since with spilling disabled the shuffle
data has to fit entirely in memory.
Regards,
Ajay
Sorry for replying late. It was night here.
Lian/Matei,
Here is the code snippet -
sparkConf.set("spark.executor.memory", "10g")
sparkConf.set("spark.cores.max", "5")
val sc = new SparkContext(sparkConf)
val accId2LocRDD =
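The snippet is cut off in the archive. Purely as an illustration of the shape of
the job being discussed (the paths, delimiters, and the second RDD below are
hypothetical, not Ajay's actual code), such a program might continue roughly
like this:

import org.apache.spark.SparkContext._   // pair-RDD implicits for join()

// Hypothetical continuation, reusing the sc created above:
val accId2LocRDD = sc.textFile("hdfs:///data/account_locations")   // hypothetical path
  .map(_.split(","))
  .map(fields => (fields(0), fields(1)))     // (accountId, location)

val accId2NameRDD = sc.textFile("hdfs:///data/account_names")      // hypothetical path
  .map(_.split(","))
  .map(fields => (fields(0), fields(1)))     // (accountId, name)

// The count of joined records is the number that differed between runs.
val joinedCount = accId2LocRDD.join(accId2NameRDD).count()
println("joined records: " + joinedCount)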
Hey Ajay, thanks for reporting this. There was indeed a bug, specifically in
the way join tasks spill to disk (which happened when you had more concurrent
tasks competing for memory). I’ve posted a patch for it here:
https://github.com/apache/spark/pull/986. Feel free to try that if you’d like;
Hi Ajay,
Can you please try running the same code with spark.shuffle.spill=false and
see if the numbers turn out correctly? That parameter controls whether or
not the buggy code that Matei fixed in ExternalAppendOnlyMap is used.
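For reference, a minimal way to set that flag is on the SparkConf before the
SparkContext is created, as in Ajay's snippet (the app name below is made up):

import org.apache.spark.{SparkConf, SparkContext}

// Bypass the on-disk spilling path (ExternalAppendOnlyMap) that contains the bug.
// Note: with spilling disabled, shuffle data must fit in memory, so large joins can OOM.
val sparkConf = new SparkConf()
  .setAppName("join-consistency-check")      // hypothetical app name
  .set("spark.shuffle.spill", "false")
val sc = new SparkContext(sparkConf)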
FWIW I saw similar issues in 0.9.0 but no longer in 0.9.1 after I
Hi,
I am doing a join of two RDDs which gives different results (counting the
number of records) each time I run this code on the same input.
The input files are large enough to be divided into two splits. When the program
runs on two workers with a single core assigned to each, the output is consistent
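A hedged sketch of the kind of check being described, using small synthetic
data instead of the real input files (which are big enough to be split across
workers), given an existing SparkContext sc:

import org.apache.spark.SparkContext._   // pair-RDD implicits for join()

// Run the same join-based count several times on identical input and compare
// the results; with the spill bug present they could differ between runs.
val left  = sc.parallelize(1 to 10000).map(i => (i % 1000, i))
val right = sc.parallelize(1 to 10000).map(i => (i % 1000, i * 2))

val counts = (1 to 5).map(_ => left.join(right).count())
println("counts across runs: " + counts.mkString(", "))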
Hi Ajay, would you mind synthesising a minimal code snippet that can
reproduce this issue and pasting it here?
Maybe your two workers have different assembly jar files?
I just ran into a similar problem where my spark-shell was using a different
jar file than my workers - I got really confusing results.
If this isn’t the problem, it would be great if you can post the code for the
program.
Matei