I have a job that is running into intermittent [SparkDriver]
java.lang.OutOfMemoryError: Java heap space errors. Before I started
getting this error, I was getting errors saying the result size
exceeded spark.driver.maxResultSize.
This does not make any sense to me, as there are no actions in my job
that send data to the driver - just a pull of data from S3, a map and
reduceByKey, and then a conversion to a DataFrame and a saveAsTable
action that writes the results back to S3.
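For reference, the job is shaped roughly like the sketch below. Paths,
the key extraction, and the table name are placeholders, not the real
job:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("example")
      .enableHiveSupport()
      .getOrCreate()

    val counts = spark.sparkContext
      .textFile("s3://some-bucket/input/")    // pull of data from S3
      .map(line => (line.split(",")(0), 1L))  // map to (key, value)
      .reduceByKey(_ + _)                     // aggregate on executors

    import spark.implicits._
    counts.toDF("key", "count")
      .write
      .saveAsTable("some_table")  // table location is back on S3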

I've found a few references suggesting that reduceByKey and
spark.driver.maxResultSize are somehow connected, but I cannot fathom
how this setting could be related to a job like mine.
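If it matters, the only tuning I could think to try is raising the
limit itself, along the lines of the sketch below. The value is just
illustrative, and I realise the driver heap itself has to be set at
launch time (e.g. --driver-memory on spark-submit), not in the builder:

    // raise the cap on serialized results sent to the driver;
    // the default for spark.driver.maxResultSize is 1g
    val spark = SparkSession.builder()
      .config("spark.driver.maxResultSize", "2g")
      .getOrCreate()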

Would greatly appreciate any advice.

Thanks in advance,

Tom
