Hi Xiangrui,

(2014/07/16 15:05), Xiangrui Meng wrote:
> I don't remember writing that, but thanks for bringing this issue up!
> There are two important settings to check: 1) driver memory (you can
> see it from the executor tab), 2) number of partitions (try to use a
> small number of partitions). I opened two PRs to fix the problem:

For the driver memory, I used 16GB/24GB, and it was enough for the execution (no full GC occurred). I checked this with jmap and the top command.

BTW, I noticed that the required driver memory was oddly proportional to the number of tasks/executors. When I used 8GB for the driver memory, I got an OOM during task serialization. This looks like a possible memory leak in task serialization that should be addressed in the future.

Each task is about 24MB in size, and the number of tasks/executors is 280.
Each task result was about 120MB or so.
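
(Back-of-the-envelope: if the driver buffers all serialized tasks at once, that is roughly 24MB x 280 = 6,720MB, about 6.6GB, which by itself would nearly exhaust an 8GB heap. This is just arithmetic from the numbers above and assumes all task binaries sit on the driver at the same time.)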

> 1) use broadcast in task closure: https://github.com/apache/spark/pull/1427

Does this PR reduce the required memory for the driver?

Is there a big difference between explicitly broadcasting the feature weights and implicitly serializing them into each task closure?
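
My current picture of the two cases is the following minimal sketch (the data, sizes, and names here are made up for illustration, not the actual MLlib code or the PR's change):

    import org.apache.spark.{SparkConf, SparkContext}

    object BroadcastVsClosure {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("broadcast-vs-closure"))

        val data = sc.parallelize(1 to 1000000, 280)   // 280 partitions, like the job above
        val weights = Array.fill(3000000)(0.1)         // ~24MB of doubles, stand-in for feature weights

        // (a) Implicit capture: `weights` is part of each task closure, so the
        //     driver serializes and ships it with every one of the 280 tasks.
        val viaClosure = data.map(x => x * weights(x % weights.length)).sum()

        // (b) Explicit broadcast: `weights` is serialized once and fetched once
        //     per executor; the task closures only carry the broadcast handle.
        val bc = sc.broadcast(weights)
        val viaBroadcast = data.map(x => x * bc.value(x % bc.value.length)).sum()

        println(s"closure=$viaClosure broadcast=$viaBroadcast")
        sc.stop()
      }
    }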

> 2) use treeAggregate to get the result:
> https://github.com/apache/spark/pull/1110

treeAggregate would certainly reduce the aggregation time and the required driver memory. I will test it.
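
For reference, the shape of the change as I understand it (a minimal sketch; the dimension and data are made up, and I am assuming treeAggregate is brought in via mllib's RDDFunctions implicit as in the PR):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.rdd.RDDFunctions._   // adds treeAggregate on RDDs (per the PR)

    object TreeAggregateSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("tree-aggregate-sketch"))

        val dim = 1000000                                // stand-in for a large gradient vector
        val data = sc.parallelize(1 to 280 * 1000, 280)  // 280 partitions, dummy records

        // Plain aggregate ships all 280 partial vectors to the driver and merges
        // them there; treeAggregate first merges partials on the executors in a
        // tree of the given depth, so the driver only merges a handful at the end.
        val sum = data.treeAggregate(new Array[Double](dim))(
          (acc, x) => { acc(x % dim) += 1.0; acc },                              // seqOp
          (a, b) => { var i = 0; while (i < dim) { a(i) += b(i); i += 1 }; a },  // combOp
          2)                                                                     // depth

        println(sum.sum)
        sc.stop()
      }
    }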

However, the problem I am facing now is an Akka connection issue during GC or under heavy load. So I think the problem will still be lurking even if treeAggregate reduces the memory consumption.
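
In case it is relevant, the knobs I am planning to look at for the Akka disconnects are the heartbeat/timeout and frame-size settings; a sketch (the property names are the Spark 1.x Akka settings, but the values are just guesses for my setup):

    import org.apache.spark.SparkConf

    // Hypothetical tuning for Akka disconnects during long GC pauses or heavy
    // load; the values are guesses and would need to be tuned per job.
    val conf = new SparkConf()
      .setAppName("akka-tuning-sketch")
      .set("spark.akka.frameSize", "128")            // MB; large task closures / results
      .set("spark.akka.timeout", "300")              // seconds
      .set("spark.akka.heartbeat.pauses", "600")     // tolerate long GC pauses (seconds)
      .set("spark.akka.heartbeat.interval", "1000")  // seconds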

Best,
Makoto
