[ https://issues.apache.org/jira/browse/SPARK-3158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163223#comment-14163223 ]
Apache Spark commented on SPARK-3158:
-------------------------------------

User 'chouqin' has created a pull request for this issue:
https://github.com/apache/spark/pull/2708

> Avoid 1 extra aggregation for DecisionTree training
> ---------------------------------------------------
>
>                 Key: SPARK-3158
>                 URL: https://issues.apache.org/jira/browse/SPARK-3158
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Joseph K. Bradley
>
> Improvement: computation
>
> Currently, the implementation does one unnecessary aggregation step. The
> aggregation step for level L (to choose splits) already gives enough
> information to set the predictions of any leaf nodes at level L+1. We can
> use that info and skip the aggregation step for the last level of the tree
> (which only has leaf nodes).
>
> This update could be done by:
> * allocating a root node before the loop in the main train() method
> * allocating nodes for level L+1 while choosing splits for level L
> * caching stats in these newly allocated nodes, so that we can calculate
>   predictions if we know they will be leaves
> * having DecisionTree.findBestSplits just return doneTraining
>
> This will also let us cache impurity and avoid re-calculating it in
> calculateGainForSplit.
>
> Some of the above notes were copied from discussion in
> [https://github.com/apache/spark/pull/2341]
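For reference, here is a minimal sketch of the proposed flow in Scala. It is not the actual MLlib implementation; the names (Node, NodeStats, aggregateStats, maxDepth) are hypothetical stand-ins. It shows the shape of the idea: allocate the root before the loop, allocate level-(L+1) children while choosing splits for level L, and cache their aggregation stats so that leaf predictions (and impurity) come from the cache instead of an extra aggregation pass.

// Hypothetical sketch; names do not correspond to the real MLlib internals.
case class NodeStats(counts: Array[Double]) {
  // Leaf prediction: majority class from the cached label counts.
  def prediction: Int = counts.indexOf(counts.max)
  // Gini impurity from the same cached counts, so nothing like
  // calculateGainForSplit needs to recompute it.
  def impurity: Double = {
    val total = counts.sum
    if (total == 0.0) 0.0
    else 1.0 - counts.map(c => (c / total) * (c / total)).sum
  }
}

class Node(var stats: Option[NodeStats] = None,
           var isLeaf: Boolean = false,
           var prediction: Int = -1,
           var left: Option[Node] = None,
           var right: Option[Node] = None)

object TrainSketch {
  val maxDepth = 5

  // Stand-in for the per-level aggregation over the training data; in
  // reality this is the distributed stats aggregation used to pick splits.
  def aggregateStats(node: Node): (NodeStats, NodeStats) =
    (NodeStats(Array(3.0, 1.0)), NodeStats(Array(0.0, 2.0)))

  def train(): Node = {
    // Allocate the root node before the loop, per the first bullet.
    val root = new Node()
    var frontier: Seq[Node] = Seq(root)
    var level = 0
    var doneTraining = false
    while (!doneTraining) {
      val nextFrontier = frontier.flatMap { node =>
        // One aggregation pass for level `level` both chooses this node's
        // split and yields the stats of its level-(L+1) children.
        val (leftStats, rightStats) = aggregateStats(node)
        val leftChild = new Node(stats = Some(leftStats))
        val rightChild = new Node(stats = Some(rightStats))
        node.left = Some(leftChild)
        node.right = Some(rightChild)
        Seq(leftChild, rightChild)
      }
      level += 1
      doneTraining = level >= maxDepth
      if (doneTraining) {
        // Last level: the children are leaves, and their predictions come
        // from the cached stats, so no further aggregation pass is needed.
        nextFrontier.foreach { leaf =>
          leaf.isLeaf = true
          leaf.prediction = leaf.stats.get.prediction
        }
      }
      frontier = nextFrontier
    }
    root
  }
}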