Yes, that's the plan. If you use broadcast, please also make sure
TorrentBroadcastFactory is used, which became the default broadcast
factory very recently. -Xiangrui
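For reference, the broadcast factory can be pinned explicitly rather than relying on the default. A minimal config sketch, assuming the Spark 1.x property name and class (verify against your Spark version):

```
# spark-defaults.conf
spark.broadcast.factory  org.apache.spark.broadcast.TorrentBroadcastFactory
```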
On Tue, Jul 22, 2014 at 10:47 PM, Makoto Yui yuin...@gmail.com wrote:
Hi Xiangrui,
By using your treeAggregate and broadcast patch, the evaluation has been
processed successfully.
I expect these patches to be merged in the next major release
(v1.1?). Without them, it would be hard to use MLlib on a large dataset.
Thanks,
Makoto
(2014/07/16 15:05), Xiangrui Meng wrote:
Hi Makoto,
I don't remember writing that, but thanks for bringing this issue up!
There are two important settings to check: 1) driver memory (you can
see it from the executor tab), 2) number of partitions (try to use
small number of partitions). I put two PRs to fix the problem:
1) use broadcast
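As a concrete sketch of those two settings (the values below are illustrative, not from this thread), driver memory is a config property in Spark 1.x, while the partition count is controlled in application code, e.g. via RDD.coalesce or RDD.repartition:

```
# spark-defaults.conf (illustrative value; tune for your dataset)
spark.driver.memory  8g
# partition count: set in application code, e.g. rdd.coalesce(32)
```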
Hello,
(2014/06/19 23:43), Xiangrui Meng wrote:
Execution was slow for the larger KDD Cup 2012, Track 2 dataset (235M+
records with 16.7M+ (2^24) sparse features, about 33.6GB) due to the sequential
aggregation of dense vectors on a single driver node.
It took about 7.6m for aggregation
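The bottleneck described above, where the driver sequentially folds in one dense partial result per partition, is what tree aggregation avoids by combining partials in rounds. A minimal pure-Python sketch of the idea (not Spark code; the function names and fan-out are illustrative):

```python
from functools import reduce

def flat_aggregate(partials, combine):
    # Driver-style aggregation: one node folds in every partition's
    # partial result, one after another (sequential in the driver).
    return reduce(combine, partials)

def tree_aggregate(partials, combine, fanout=2):
    # Tree-style aggregation: combine partials in rounds of `fanout`,
    # so no single node combines more than fanout results per round.
    while len(partials) > 1:
        partials = [
            reduce(combine, partials[i:i + fanout])
            for i in range(0, len(partials), fanout)
        ]
    return partials[0]

# Example: each "partition" contributes a partial gradient vector.
partials = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
add = lambda a, b: [x + y for x, y in zip(a, b)]

assert flat_aggregate(partials, add) == tree_aggregate(partials, add)
print(tree_aggregate(partials, add))  # [16.0, 20.0]
```

Both return the same result; the tree version just spreads the combine work across rounds instead of serializing it on one node.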