Re: Driver running out of memory - caused by many tasks?

2015-08-27 Thread Ashish Rangole
I suggest taking a heap dump of the driver process using jmap, then opening that dump in a tool like VisualVM to see which object(s) are taking up heap space. It is easy to do. We did this and found that in our case it was the data structure that stores info about stages, jobs and tasks. There can

Re: Driver running out of memory - caused by many tasks?

2015-08-27 Thread andrew.rowson
happening? Andrew
From: Ashish Rangole
Date: Thursday, 27 August 2015 15:24
To: Andrew Rowson
Cc: user, ewan.le...@realitymine.com
Subject: Re: Driver running out of memory - caused by many tasks?
I suggest taking a heap dump of driver process using jmap. Then open that dump in a tool like

RE: Driver running out of memory - caused by many tasks?

2015-08-27 Thread Ewan Leith
Are you using the Kryo serializer? If not, have a look at it; it can save a lot of memory during shuffles: https://spark.apache.org/docs/latest/tuning.html I did a similar task and had various issues with the volume of data being parsed in one go, but that helped a lot. It looks like the main

Re: Driver running out of memory - caused by many tasks?

2015-08-27 Thread andrew.rowson
I should have mentioned: yes, I am using Kryo and have registered KeyClass and ValueClass. I guess it’s not clear to me what is actually taking up space on the driver heap; I can’t see how it can be data, given the code that I have. On 27/08/2015 12:09, Ewan Leith ewan.le...@realitymine.com