[ https://issues.apache.org/jira/browse/SPARK-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16192166#comment-16192166 ]
Apache Spark commented on SPARK-3162:
-------------------------------------

User 'smurching' has created a pull request for this issue:
https://github.com/apache/spark/pull/19433

> Train DecisionTree locally when possible
> ----------------------------------------
>
>                 Key: SPARK-3162
>                 URL: https://issues.apache.org/jira/browse/SPARK-3162
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Joseph K. Bradley
>            Priority: Critical
>
> Improvement: communication
> Currently, every level of a DecisionTree is trained in a distributed manner.
> However, at deeper levels of the tree, only a small subset of the training
> data may be matched with any given node. If a node's training data fits in
> one machine's memory, it may be more efficient to shuffle that data to a
> single machine and train the rest of the subtree rooted at that node locally.
> Note: Local training may become possible at different levels in different
> branches of the tree. There are multiple options for handling this case:
> (1) Train in a distributed fashion until all remaining nodes can be trained
> locally. This would entail training multiple levels at once (locally).
> (2) Train branches locally when possible, and interleave this with
> distributed training of the other branches.
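The interleaving strategy in option (2) can be sketched in a few lines. This is a minimal single-process simulation, not Spark's API and not the implementation in the linked pull request: "distributed" training is modeled as processing one tree level (frontier) at a time, and "fits in one machine's memory" is approximated by a hypothetical row-count threshold `max_local_rows`. The Gini-based classification tree and all names (`Node`, `split_node`, `train_local`, `train_hybrid`) are illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple

Row = Tuple[Tuple[float, ...], int]  # (features, label)


@dataclass
class Node:
    depth: int
    prediction: int = 0
    feature: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    mode: str = ""  # "distributed" or "local": how this node's split was chosen


def gini(labels: List[int]) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows: List[Row]) -> Optional[Tuple[int, float]]:
    """Exhaustive search for the (feature, threshold) with the lowest weighted Gini."""
    best, best_score = None, gini([y for _, y in rows])
    for f in range(len(rows[0][0])):
        for t in sorted({x[f] for x, _ in rows})[:-1]:  # both sides non-empty
            left = [y for x, y in rows if x[f] <= t]
            right = [y for x, y in rows if x[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def split_node(node: Node, rows: List[Row], max_depth: int):
    """Set the node's prediction; if a useful split exists, return (child, rows) pairs."""
    node.prediction = Counter(y for _, y in rows).most_common(1)[0][0]
    split = best_split(rows) if node.depth < max_depth else None
    if split is None:
        return []
    node.feature, node.threshold = f, t = split
    node.left, node.right = Node(node.depth + 1), Node(node.depth + 1)
    return [(node.left, [r for r in rows if r[0][f] <= t]),
            (node.right, [r for r in rows if r[0][f] > t])]


def train_local(node: Node, rows: List[Row], max_depth: int) -> None:
    """Finish the whole subtree on one machine (here: plain recursion)."""
    node.mode = "local"
    for child, child_rows in split_node(node, rows, max_depth):
        train_local(child, child_rows, max_depth)


def train_hybrid(rows: List[Row], max_depth: int, max_local_rows: int) -> Node:
    """Option (2): level-by-level training that switches each branch to local
    training as soon as its data has at most max_local_rows rows."""
    root = Node(depth=0)
    frontier = [(root, rows)]
    while frontier:
        next_frontier = []
        # In a distributed setting, one pass over the data would choose splits
        # for every node in this frontier; here we simply loop over them.
        for node, node_rows in frontier:
            if len(node_rows) <= max_local_rows:
                train_local(node, node_rows, max_depth)
            else:
                node.mode = "distributed"
                next_frontier.extend(split_node(node, node_rows, max_depth))
        frontier = next_frontier
    return root


def count_modes(node: Optional[Node], counts: Optional[Counter] = None) -> Counter:
    counts = Counter() if counts is None else counts
    if node is not None:
        counts[node.mode] += 1
        count_modes(node.left, counts)
        count_modes(node.right, counts)
    return counts


def predict(node: Node, x: Tuple[float, ...]) -> int:
    while node.left is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.prediction


data = [((float(i),), int(i >= 6)) for i in range(10)]
tree = train_hybrid(data, max_depth=5, max_local_rows=4)
print(dict(count_modes(tree)))  # → {'distributed': 2, 'local': 1}
```

Here the root (10 rows) and its left child (6 rows) are handled level-wise, while the right child (4 rows) is finished locally as soon as it qualifies, independently of its sibling. Option (1) would instead keep the whole frontier on the level-wise path until every node in it satisfies the threshold, then hand all of them to local training at once.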