Re: OOM in SizeEstimator while using combineByKey

2015-04-15 Thread Xianjin YE
What are your JVM heap size settings? The OOM in SizeEstimator is caused by a large number of entries in an IdentityHashMap. A quick guess is that the objects in your dataset are instances of a custom class that doesn't implement the hashCode and equals methods correctly.
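
A minimal Scala sketch of the distinction being suggested here, assuming the dataset values are instances of a plain class (Record and RecordCC are hypothetical names): a plain class needs hand-written equals and hashCode, whereas a case class gets structural ones from the compiler.

    // Plain class: equals and hashCode must be written by hand.
    class Record(val key: String, val value: Long) {
      override def equals(other: Any): Boolean = other match {
        case r: Record => key == r.key && value == r.value
        case _         => false
      }
      override def hashCode: Int = (key, value).##
    }

    // Case class: structurally equivalent equals/hashCode are generated.
    case class RecordCC(key: String, value: Long)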

OOM in SizeEstimator while using combineByKey

2015-04-15 Thread Aniket Bhatnagar
I am aggregating a dataset using the combineByKey method and, for a certain input size, the job fails with the following error. I have enabled heap dumps to better analyze the issue and will report back if I have any findings. Meanwhile, if you guys have any idea of what could possibly result in this
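
For reference, a sketch of how heap dumps on OOM can be enabled for the executors; spark.executor.extraJavaOptions accepts standard JVM flags, and the dump path here is only an example:

    import org.apache.spark.SparkConf

    // Ask the executor JVMs to write a heap dump when they hit an OOM.
    val conf = new SparkConf()
      .set("spark.executor.extraJavaOptions",
           "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/executor-dumps")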

Re: OOM in SizeEstimator while using combineByKey

2015-04-15 Thread Aniket Bhatnagar
I am setting spark.executor.memory to 1024m on a 3-node cluster, with each node having 4 cores and 7 GB RAM. The combiner functions take Scala case classes as input and generate a mutable.ListBuffer of Scala case classes. Therefore, I am guessing hashCode and equals should be taken care of. A sketch of this pattern follows below.
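
For illustration, a minimal sketch of the aggregation pattern described above; the Event case class and its fields are assumptions, not the actual job:

    import org.apache.spark.rdd.RDD
    import scala.collection.mutable.ListBuffer

    case class Event(id: Long, payload: String)  // assumed value type, for illustration

    // Collect all case-class values for a key into a mutable.ListBuffer.
    def groupEvents(data: RDD[(String, Event)]): RDD[(String, ListBuffer[Event])] =
      data.combineByKey(
        (e: Event) => ListBuffer(e),                            // createCombiner
        (buf: ListBuffer[Event], e: Event) => buf += e,         // mergeValue
        (a: ListBuffer[Event], b: ListBuffer[Event]) => a ++= b // mergeCombiners
      )

Since Event is a case class, its equals and hashCode are compiler-generated, which is the sense in which they should already be taken care of.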