It could also be that your hash function is expensive. What key class are you
using for the reduceByKey / groupByKey?
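For context: each update in reduceByKey / groupByKey goes through
AppendOnlyMap.changeValue, which hashes the key once per call and calls
equals on every collision, so a key class with an expensive hashCode gets hit
hard. A minimal Scala sketch of caching the hash (CompositeKey and its fields
are made up here purely for illustration):

    // Hypothetical key class. A plain case class recomputes its generated
    // hashCode on every call; caching it in a val makes repeated lookups
    // cheap when the fields are non-trivial.
    case class CompositeKey(userId: Long, page: String) {
      override val hashCode: Int = 31 * userId.## + page.##
    }

The case-class-generated equals is kept, and since the cached hash is derived
from the same fields, equal keys still hash identically.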
Matei
On May 12, 2015, at 10:08 AM, Night Wolf nightwolf...@gmail.com wrote:
I'm seeing a similar thing with a slightly different stack trace. Ideas?
This is the stack trace of the worker thread:
org.apache.spark.util.collection.AppendOnlyMap.changeValue(AppendOnlyMap.scala:150)
org.apache.spark.util.collection.SizeTrackingAppendOnlyMap.changeValue(SizeTrackingAppendOnlyMap.scala:32)
Looks like it is spending a lot of time doing hash probing. It could be one
(or more) of the following:
1. hash probing itself is inherently expensive compared with the rest of your
workload
2. murmur3 doesn't work well with this key distribution
3. quadratic probing (triangular sequence) with a power-of-2 hash table
performs poorly for this key distribution (see the sketch after this list)
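For readers following along, this is roughly the probe loop in question; a
simplified sketch in the spirit of AppendOnlyMap.changeValue (the name and
signature below are mine, not Spark's):

    // Triangular-sequence (quadratic) probing over a power-of-2 table:
    // successive steps of 1, 2, 3, ... give cumulative offsets of
    // 1, 3, 6, 10, ... (the triangular numbers), which with a power-of-2
    // capacity is guaranteed to visit every slot.
    def probe(keys: Array[AnyRef], key: AnyRef, hash: Int): Int = {
      val mask = keys.length - 1   // capacity must be a power of 2
      var pos = hash & mask
      var delta = 1
      while (keys(pos) != null && !keys(pos).equals(key)) {
        pos = (pos + delta) & mask
        delta += 1
      }
      pos                          // first empty slot, or the matching key
    }

A hash that clusters many keys into the same region makes these probe chains
long, and that time shows up under changeValue in the stack trace.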
Do you have any more specific profiling data that you can share? I'm
curious to know where AppendOnlyMap.changeValue is being called from.
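If attaching a full profiler is awkward, repeated jstack dumps of the
executor process already show the callers; alternatively, a crude in-process
sampler can be dropped into the executor JVM. A hedged sketch (the 100 ms
interval and the 12-frame limit are arbitrary choices):

    import scala.collection.JavaConverters._

    // Crude stack sampler: every 100 ms, print the frames of any thread
    // currently inside AppendOnlyMap.changeValue, to see its call sites.
    val sampler = new Thread(new Runnable {
      def run(): Unit = while (true) {
        for ((t, frames) <- Thread.getAllStackTraces.asScala
             if frames.exists(f => f.getClassName.endsWith("AppendOnlyMap") &&
                                   f.getMethodName == "changeValue"))
          frames.take(12).foreach(f => println(s"${t.getName}: $f"))
        Thread.sleep(100)
      }
    })
    sampler.setDaemon(true)
    sampler.start()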
On Fri, May 8, 2015 at 1:26 PM, Michal Haris michal.ha...@visualdna.com
wrote:
+dev
On 6 May 2015 10:45, Michal Haris michal.ha...@visualdna.com wrote: