Re: akka disassociated on GC

2014-07-23 Thread Xiangrui Meng
Yes, that's the plan. If you use broadcast, please also make sure that TorrentBroadcastFactory is used; it became the default broadcast factory very recently. -Xiangrui On Tue, Jul 22, 2014 at 10:47 PM, Makoto Yui yuin...@gmail.com wrote: Hi Xiangrui, By using your treeAggregate and broadcast

Re: akka disassociated on GC

2014-07-22 Thread Makoto Yui
Hi Xiangrui, With your treeAggregate and broadcast patch, the evaluation completed successfully. I hope these patches will be merged in the next major release (v1.1?). Without them, it would be hard to use MLlib for a large dataset. Thanks, Makoto (2014/07/16 15:05),

Re: akka disassociated on GC

2014-07-16 Thread Xiangrui Meng
Hi Makoto, I don't remember writing that, but thanks for bringing this issue up! There are two important settings to check: 1) driver memory (you can see it from the executor tab), 2) number of partitions (try to use a small number of partitions). I put up two PRs to fix the problem: 1) use broadcast
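To make the link between the two settings above concrete, here is a rough back-of-the-envelope sketch. It assumes each partition returns one dense vector of 8-byte doubles to the driver and ignores JVM object and serialization overhead. The feature count (2^24) comes from the first message of this thread; the partition counts and function names are hypothetical, chosen only for illustration.

```python
# Rough sizing sketch, not a measurement. Assumption: one dense vector of
# 8-byte doubles per partition is sent to the driver for each aggregation.
NUM_FEATURES = 2 ** 24      # 16.7M+ features, as reported in this thread
BYTES_PER_DOUBLE = 8
MIB = 1024 ** 2


def dense_vector_bytes(num_features: int) -> int:
    """Payload size of one dense vector of doubles (overhead ignored)."""
    return num_features * BYTES_PER_DOUBLE


def driver_inbound_bytes(num_partitions: int, num_features: int) -> int:
    """Total bytes arriving at the driver if every partition returns one vector."""
    return num_partitions * dense_vector_bytes(num_features)


# Hypothetical partition counts, for illustration only.
for partitions in (16, 128, 1024):
    total = driver_inbound_bytes(partitions, NUM_FEATURES)
    print(f"{partitions:5d} partitions -> {total / MIB:,.0f} MiB to the driver")
```

Under these assumptions the data sent to the driver grows linearly with the number of partitions, which is one way to read the advice to keep the partition count small and to check driver memory.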

Re: akka disassociated on GC

2014-07-16 Thread Makoto Yui
Hi Xiangrui, (2014/07/16 15:05), Xiangrui Meng wrote: I don't remember writing that, but thanks for bringing this issue up! There are two important settings to check: 1) driver memory (you can see it from the executor tab), 2) number of partitions (try to use a small number of partitions). I put

akka disassociated on GC

2014-07-15 Thread Makoto Yui
Hello, (2014/06/19 23:43), Xiangrui Meng wrote: The execution was slow for the larger KDD Cup 2012, Track 2 dataset (235M+ records with 16.7M+ (2^24) sparse features, about 33.6GB) due to the sequential aggregation of dense vectors on a single driver node. It took about 7.6m for aggregation
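The bottleneck described above, one node merging every partial vector in sequence, can be contrasted with a tree-shaped merge in a small pure-Python sketch. This is only a single-process illustration of the general idea and is not Spark's implementation. The names `flat_aggregate`, `tree_aggregate`, and `fan_in` are hypothetical. In a real cluster the intermediate merges would run on worker nodes, so the final node would handle only a few vectors.

```python
from functools import reduce
from typing import List, Sequence

Vector = List[float]


def add_vectors(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two dense vectors of equal length."""
    return [x + y for x, y in zip(a, b)]


def flat_aggregate(partials: Sequence[Vector]) -> Vector:
    """Sequential merge: a single node folds in every partial result."""
    return reduce(add_vectors, partials)


def tree_aggregate(partials: Sequence[Vector], fan_in: int = 2) -> Vector:
    """Merge partials in groups of `fan_in`, level by level, until one remains."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(partials)
    if not level:
        raise ValueError("need at least one partial result")
    while len(level) > 1:
        # Each group could be merged on a different worker in a real cluster.
        level = [
            reduce(add_vectors, level[i:i + fan_in])
            for i in range(0, len(level), fan_in)
        ]
    return level[0]


# Eight hypothetical per-partition results, each a tiny "dense vector".
partials = [[float(p), 1.0] for p in range(8)]
flat_result = flat_aggregate(partials)
tree_result = tree_aggregate(partials, fan_in=2)
print(flat_result == tree_result)
```

Both functions compute the same sum. The difference is that the flat version performs seven merges on one node, while the tree version with `fan_in=2` leaves only one merge for the last level.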