Hi Xiangrui,
(2014/07/16 15:05), Xiangrui Meng wrote:
> I don't remember writing that, but thanks for bringing this issue up!
> There are two important settings to check: 1) driver memory (you can
> see it from the executor tab), 2) number of partitions (try to use a
> small number of partitions). I put up two PRs to fix the problem:
For the driver memory, I used 16GB/24GB and that was enough for the
execution (full GC did not happen). I checked it with the jmap and top
commands.
BTW, I noticed that the required driver memory was oddly proportional
to the number of tasks/executors. When I used 8GB for the driver
memory, I got an OOM during task serialization. This could be a memory
leak in the task serialization that should be addressed in the future.
Each task is about 24MB and the number of tasks/executors is 280.
The size of each task result was about 120MB or so.
> 1) use broadcast in task closure:
> https://github.com/apache/spark/pull/1427
Does this PR reduce the required driver memory?
Is there a big difference between explicitly broadcasting the feature
weights and implicitly serializing them into each task closure?
> 2) use treeAggregate to get the result:
> https://github.com/apache/spark/pull/1110
treeAggregate would certainly reduce the aggregation time and the
driver's required memory. I will test it.
However, the problem I am facing now is an Akka connection issue that
appears during GC or under heavy load. So I think the problem would
still lurk even if treeAggregate reduces the consumed memory.
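To illustrate the idea (this is a plain-Python sketch of multi-level aggregation, not Spark's implementation; the function name `tree_aggregate` and the `fan_in` parameter are mine), the point is that per-partition results are combined level by level with bounded fan-in, so the final driver-side merge sees only a handful of partial results instead of one ~120MB result per each of the 280 tasks:

```python
from functools import reduce

def tree_aggregate(partitions, seq_op, comb_op, zero, fan_in=2):
    """Aggregate each partition, then merge the partial results in
    levels of bounded fan-in, so the last (driver-side) merge only
    touches a few values rather than one per partition."""
    # Per-partition aggregation (would run on executors in Spark).
    level = [reduce(seq_op, part, zero) for part in partitions]
    # Intermediate merge levels (would also run on executors).
    while len(level) > fan_in:
        level = [reduce(comb_op, level[i:i + fan_in])
                 for i in range(0, len(level), fan_in)]
    # Final small merge (the only part the driver would do).
    return reduce(comb_op, level)

parts = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(tree_aggregate(parts, lambda a, b: a + b, lambda a, b: a + b, 0))  # 36
```

With a plain aggregate, the driver would merge all partition results itself; with the tree variant it merges at most `fan_in` of them, which is why the driver memory requirement drops.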
Best,
Makoto