Re: Handling tree reduction algorithm with Spark in parallel

2014-10-03 Thread Boromir Widas
Thanks Matei, will check out the MLlib implementation. On Wed, Oct 1, 2014 at 2:24 PM, Andy Twigg andy.tw...@gmail.com wrote: Yes, that makes sense. It's similar to the all reduce pattern in vw. On Wednesday, 1 October 2014, Matei Zaharia matei.zaha...@gmail.com wrote: Some of the MLlib

Re: Handling tree reduction algorithm with Spark in parallel

2014-10-01 Thread Andy Twigg
Yes, that makes sense. It's similar to the all reduce pattern in vw. On Wednesday, 1 October 2014, Matei Zaharia matei.zaha...@gmail.com wrote: Some of the MLlib algorithms do tree reduction in 1.1: http://databricks.com/blog/2014/09/22/spark-1-1-mllib-performance-improvements.html. You can
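
For readers unfamiliar with the term, the idea behind tree reduction can be sketched in plain Python: partial results are combined in rounds, a few at a time, rather than all being merged in one place. This is only an illustration of the pattern, not MLlib's implementation; `tree_reduce`, `add_vec`, and the `fan_in` parameter are hypothetical names chosen for this sketch. Spark's RDD API offers `treeReduce` and `treeAggregate` for the distributed version.

```python
def add_vec(a, b):
    """Element-wise sum of two equal-length lists of floats."""
    return [x + y for x, y in zip(a, b)]

def tree_reduce(partials, combine, fan_in=2):
    """Combine partial results in rounds, `fan_in` at a time.

    Returns (result, rounds). With n partials the number of rounds grows
    roughly like log_fan_in(n), so no single step merges everything.
    """
    if not partials:
        raise ValueError("nothing to reduce")
    level = list(partials)
    rounds = 0
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), fan_in):
            group = level[i:i + fan_in]
            acc = group[0]
            for value in group[1:]:
                acc = combine(acc, value)
            next_level.append(acc)
        level = next_level
        rounds += 1
    return level[0], rounds

# Five per-partition partial sums, merged pairwise.
result, rounds = tree_reduce([[1.0], [2.0], [3.0], [4.0], [5.0]], add_vec)
```

The combine function is assumed to be associative, which is what allows the grouping into rounds to be chosen freely.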

Handling tree reduction algorithm with Spark in parallel

2014-09-30 Thread Boromir Widas
Hello Folks, I have been trying to implement a tree reduction algorithm recently in Spark but could not find suitable parallel operations. Assuming I have a general tree like the following - I have to do the following - 1) Do some computation at each leaf node to get an array of doubles. (This

Re: Handling tree reduction algorithm with Spark in parallel

2014-09-30 Thread Andy Twigg
Hi Boromir, Assuming the tree fits in memory, and what you want to do is parallelize the computation, the 'obvious' way is the following:
* broadcast the tree T to each worker (ok since it fits in memory)
* construct an RDD for the deepest level - each element in the RDD is (parent,data_at_node)
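
The level-by-level idea in this reply can be sketched locally in plain Python. This is not Spark code and not necessarily what the full message goes on to describe, since the snippet is cut off: the `parent` map (standing in for the broadcast tree T), the `leaf_data` arrays, and the element-wise sum used as the combine step are all assumptions made for illustration. In Spark, each pass over a level would presumably become a keyed reduce over (parent, data_at_node) pairs.

```python
from collections import defaultdict

def depth_of(node, parent):
    """Number of edges from `node` up to the root (the root's parent is None)."""
    depth = 0
    while parent[node] is not None:
        node = parent[node]
        depth += 1
    return depth

def combine(a, b):
    """Hypothetical combine step: element-wise sum of two arrays of doubles."""
    return [x + y for x, y in zip(a, b)]

def reduce_tree(parent, leaf_data):
    """Reduce leaf arrays up to the root, starting from the deepest level.

    parent:    dict mapping every node to its parent (None for the root).
    leaf_data: dict mapping each leaf to its array of doubles.
    """
    depth = {node: depth_of(node, parent) for node in parent}
    values = dict(leaf_data)
    for level in range(max(depth.values()), 0, -1):
        # Emit (parent, data_at_node) for every node at this level.
        grouped = defaultdict(list)
        for node, data in values.items():
            if depth[node] == level:
                grouped[parent[node]].append(data)
        # Combine the children of each parent into the parent's value.
        for p, children in grouped.items():
            acc = children[0]
            for data in children[1:]:
                acc = combine(acc, data)
            values[p] = acc
    root = next(node for node, p in parent.items() if p is None)
    return values[root]

# Small example: root has children a and b; a has leaf children c and d; b is a leaf.
parent = {"root": None, "a": "root", "b": "root", "c": "a", "d": "a"}
leaf_data = {"c": [1.0, 2.0], "d": [3.0, 4.0], "b": [10.0, 20.0]}
total = reduce_tree(parent, leaf_data)
```

Processing one level at a time means every node's children are fully reduced before the node itself is combined into its parent, and all nodes within a level are independent of each other, which is where the parallelism would come from.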

Re: Handling tree reduction algorithm with Spark in parallel

2014-09-30 Thread Debasish Das
If the tree is too big, build it on GraphX, but it will need thorough analysis so that the partitions are well balanced... On Tue, Sep 30, 2014 at 2:45 PM, Andy Twigg andy.tw...@gmail.com wrote: Hi Boromir, Assuming the tree fits in memory, and what you want to do is parallelize the