Do you have a large number of tasks? Driver memory problems like this can occur when a job has many tasks and a small driver, or when you use accumulators of list-like data structures.
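For illustration, even a job with no explicit collect() can trip spark.driver.maxResultSize, because every task ships a small serialized result/status payload back to the driver and the cap applies to their total. A rough back-of-the-envelope sketch (the per-task payload size below is an assumed illustrative figure, not a measured Spark constant; the 1g default for spark.driver.maxResultSize matches the Spark configuration docs):

```python
# Rough estimate of how many small per-task results add up at the driver.
# The per-task payload size is an assumed illustrative number, not a Spark constant.

DEFAULT_MAX_RESULT_SIZE = 1 << 30  # spark.driver.maxResultSize defaults to 1g


def total_result_bytes(num_tasks, bytes_per_task):
    """Total bytes of serialized task results the driver must hold."""
    return num_tasks * bytes_per_task


# Hypothetical job: 500,000 tasks, ~4 KiB of serialized result/status each.
total = total_result_bytes(500_000, 4 * 1024)
print(total > DEFAULT_MAX_RESULT_SIZE)  # many tiny results can exceed the 1g cap
```

This is why a high task count alone, with no collect-style action, can still produce maxResultSize errors at the driver.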
2015-12-11 11:17 GMT-08:00 Zhan Zhang <zzh...@hortonworks.com>:

> I think you are fetching too many results to the driver. Typically, it is
> not recommended to collect much data to the driver. But if you have to, you
> can increase the driver memory when submitting jobs.
>
> Thanks.
>
> Zhan Zhang
>
> On Dec 11, 2015, at 6:14 AM, Tom Seddon <mr.tom.sed...@gmail.com> wrote:
>
> I have a job that is running into intermittent errors with [SparkDriver]
> java.lang.OutOfMemoryError: Java heap space. Before I was getting this
> error, I was getting errors saying the result size exceeded
> spark.driver.maxResultSize.
> This does not make any sense to me, as there are no actions in my job that
> send data to the driver - just a pull of data from S3, a map and
> reduceByKey, and then a conversion to a DataFrame and a saveAsTable action
> that puts the results back on S3.
>
> I've found a few references to reduceByKey and spark.driver.maxResultSize
> having some importance, but cannot fathom how this setting could be related.
>
> Would greatly appreciate any advice.
>
> Thanks in advance,
>
> Tom
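Zhan's suggestion to increase driver memory at submit time would look something like the following spark-submit invocation (a sketch only: the class name, jar, and memory values are placeholders to tune for your cluster, not values from this thread):

```shell
# Illustrative spark-submit invocation (class name and jar are placeholders).
# --driver-memory raises the driver heap for the java.lang.OutOfMemoryError;
# spark.driver.maxResultSize raises the 1g default cap on the total
# serialized size of task results sent back to the driver.
spark-submit \
  --class com.example.MyJob \
  --driver-memory 8g \
  --conf spark.driver.maxResultSize=2g \
  my-job.jar
```

Setting spark.driver.maxResultSize=0 removes the cap entirely, but then a genuinely oversized result set will surface as a driver OOM instead of a clean error.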