Qiping Li created SPARK-3920:
--------------------------------

             Summary: Add option to support aggregation using treeAggregate in decision tree
                 Key: SPARK-3920
                 URL: https://issues.apache.org/jira/browse/SPARK-3920
             Project: Spark
          Issue Type: Improvement
          Components: MLlib
            Reporter: Qiping Li
             Fix For: 1.2.0

In [SPARK-3366|https://issues.apache.org/jira/browse/SPARK-3366], we used distributed aggregation to aggregate node stats, which can save computation and communication time when the shuffle size is very large. However, experiments have shown that when the shuffle size is not large enough (e.g., for shallow trees), this causes a performance loss (greater than 20% in some cases). We should support both aggregation options so that users can choose the one that fits their needs.
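
To make the two options concrete, here is a rough sketch in plain Python (no Spark required) of the two aggregation shapes being compared. It is only an illustration, not the MLlib implementation: the record layout (node id plus a stats vector), the function names, and the `use_tree_aggregate` flag are all hypothetical. In Spark terms, the first function roughly corresponds to a shuffle keyed by node (as with `reduceByKey`), and the second to merging per-partition partial results level by level (as with `treeAggregate`).

```python
from collections import defaultdict
from functools import reduce


def merge(a, b):
    """Element-wise sum of two equal-length stats vectors."""
    return [x + y for x, y in zip(a, b)]


def aggregate_by_key(partitions):
    """Shuffle-style: group every record by node id, then merge per node."""
    grouped = defaultdict(list)
    for part in partitions:
        for node_id, stats in part:
            grouped[node_id].append(stats)
    return {node_id: reduce(merge, vecs) for node_id, vecs in grouped.items()}


def tree_aggregate(partitions, fanout=2):
    """Tree-style: fold each partition locally, then merge partial results
    in groups of `fanout` until a single result remains."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")

    def fold(part):
        acc = {}
        for node_id, stats in part:
            acc[node_id] = merge(acc[node_id], stats) if node_id in acc else list(stats)
        return acc

    def combine(a, b):
        out = dict(a)
        for node_id, stats in b.items():
            out[node_id] = merge(out[node_id], stats) if node_id in out else stats
        return out

    level = [fold(p) for p in partitions]
    while len(level) > 1:
        level = [reduce(combine, level[i:i + fanout])
                 for i in range(0, len(level), fanout)]
    return level[0] if level else {}


def aggregate_node_stats(partitions, use_tree_aggregate=False):
    """Hypothetical option switch: both paths return the same node stats."""
    if use_tree_aggregate:
        return tree_aggregate(partitions)
    return aggregate_by_key(partitions)
```

Both paths produce identical per-node totals; only the communication pattern differs, and that pattern is what the proposed option would let a user pick for shallow versus deep trees.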