I suggest taking a heap dump of the driver process using jmap. Then open that
dump in a tool like VisualVM to see which object(s) are taking up heap
space. It is easy to do. We did this and found out that in our case it was
the data structure that stores info about stages, jobs and tasks. There can
happening?
Andrew
From: Ashish Rangole
Date: Thursday, 27 August 2015 15:24
To: Andrew Rowson
Cc: user, ewan.le...@realitymine.com
Subject: Re: Driver running out of memory - caused by many tasks?
I suggest taking a heap dump of the driver process using jmap. Then open that dump
in a tool like
Are you using the Kryo serializer? If not, have a look at it; it can save a lot
of memory during shuffles.
https://spark.apache.org/docs/latest/tuning.html
I did a similar task and had various issues with the volume of data being
parsed in one go, but that helped a lot. It looks like the main
I should have mentioned: yes, I am using Kryo and have registered KeyClass and
ValueClass.
I guess it’s not clear to me what is actually taking up space on the driver
heap - I can’t see how it can be data with the code that I have.
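
One way to find out what is occupying the driver heap is the jmap heap dump suggested elsewhere in this thread. The following is only a minimal sketch: the function name dump_heap, the default output file name and the example PID are made up for illustration, and it assumes the JDK tools (jps, jmap) are on the PATH of the machine running the driver.

```shell
#!/bin/sh
# Sketch: write a heap dump of a running JVM (e.g. the Spark driver) to a file.
# dump_heap and the default file name are hypothetical, chosen for this example.

dump_heap() {
  pid="$1"
  out="${2:-driver-heap.hprof}"
  if [ -z "$pid" ]; then
    echo "usage: dump_heap <driver-pid> [output-file]" >&2
    return 2
  fi
  # live     : dump only reachable objects
  # format=b : binary .hprof format, which heap analysis tools can open
  jmap -dump:live,format=b,file="$out" "$pid"
}

# Example (the PID is a placeholder):
#   jps -lm                                  # list JVMs to find the driver's PID
#   dump_heap 12345 /tmp/driver-heap.hprof   # then open the file in VisualVM
```

The resulting .hprof file can be loaded into VisualVM to see which classes hold the most heap. Taking a dump pauses the target JVM while it is written, so expect a stall on a large driver heap.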
On 27/08/2015 12:09, Ewan Leith ewan.le...@realitymine.com