[ 
https://issues.apache.org/jira/browse/SPARK-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-2174.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 1.1.0

> Implement treeReduce and treeAggregate
> --------------------------------------
>
>                 Key: SPARK-2174
>                 URL: https://issues.apache.org/jira/browse/SPARK-2174
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib, Spark Core
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>             Fix For: 1.1.0
>
>
> In `reduce` and `aggregate`, the driver node spends linear time on the number 
> of partitions. It becomes a bottleneck when there are many partitions and the 
> data from each partition is big.
> SPARK-1485 tracks the progress of implementing AllReduce on Spark. I did 
> several implementations including butterfly, reduce + broadcast, and 
> treeReduce + broadcast. treeReduce + BT broadcast seems to be right way to go 
> for Spark. Using binary tree may introduce some overhead in communication, 
> because the driver still need to coordinate on data shuffling. In my 
> experiments, n -> sqrt(n) -> 1 gives the best performance in general. But it 
> certainly needs more testing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to