[jira] [Comment Edited] (SPARK-3162) Train DecisionTree locally when possible

2017-10-04 Thread Siddharth Murching (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16192201#comment-16192201
 ] 

Siddharth Murching edited comment on SPARK-3162 at 10/4/17 11:35 PM:
-

Commenting here to note that I'd like to resume work on this issue; I've made a 
new PR^


was (Author: siddharth murching):
Commenting here to note that I'm resuming work on this issue; I've made a new 
PR^

> Train DecisionTree locally when possible
> 
>
> Key: SPARK-3162
> URL: https://issues.apache.org/jira/browse/SPARK-3162
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Critical
>
> Improvement: communication
> Currently, every level of a DecisionTree is trained in a distributed manner.  
> However, at deeper levels in the tree, it is possible that a small set of 
> training data will be matched with any given node.  If the node’s training 
> data can fit on one machine’s memory, it may be more efficient to shuffle the 
> data and do local training for the rest of the subtree rooted at that node.
> Note: It is possible that local training would become possible at different 
> levels in different branches of the tree.  There are multiple options for 
> handling this case:
> (1) Train in a distributed fashion until all remaining nodes can be trained 
> locally.  This would entail training multiple levels at once (locally).
> (2) Train branches locally when possible, and interleave this with 
> distributed training of the other branches.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-3162) Train DecisionTree locally when possible

2016-08-23 Thread Siddharth Murching (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434031#comment-15434031
 ] 

Siddharth Murching edited comment on SPARK-3162 at 8/24/16 1:37 AM:


Here's a design doc with proposed changes - any comments/feedback are much 
appreciated :)
Design doc link: 
[Link|https://docs.google.com/document/d/1baU5KeorrmLpC4EZoqLuG-E8sUJqmdELLbr8o6wdbVM/edit?usp=sharing]



was (Author: siddharth murching):
Here's a design doc with proposed changes - any comments/feedback are much 
appreciated :)
[Link|https://docs.google.com/document/d/1baU5KeorrmLpC4EZoqLuG-E8sUJqmdELLbr8o6wdbVM/edit?usp=sharing]


> Train DecisionTree locally when possible
> 
>
> Key: SPARK-3162
> URL: https://issues.apache.org/jira/browse/SPARK-3162
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Critical
>
> Improvement: communication
> Currently, every level of a DecisionTree is trained in a distributed manner.  
> However, at deeper levels in the tree, it is possible that a small set of 
> training data will be matched with any given node.  If the node’s training 
> data can fit on one machine’s memory, it may be more efficient to shuffle the 
> data and do local training for the rest of the subtree rooted at that node.
> Note: It is possible that local training would become possible at different 
> levels in different branches of the tree.  There are multiple options for 
> handling this case:
> (1) Train in a distributed fashion until all remaining nodes can be trained 
> locally.  This would entail training multiple levels at once (locally).
> (2) Train branches locally when possible, and interleave this with 
> distributed training of the other branches.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-3162) Train DecisionTree locally when possible

2016-08-23 Thread Siddharth Murching (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434031#comment-15434031
 ] 

Siddharth Murching edited comment on SPARK-3162 at 8/24/16 1:37 AM:


Here's a design doc with proposed changes - any comments/feedback are much 
appreciated :)
[Link|https://docs.google.com/document/d/1baU5KeorrmLpC4EZoqLuG-E8sUJqmdELLbr8o6wdbVM/edit?usp=sharing]



was (Author: siddharth murching):
Here's a design doc with proposed changes - any comments/feedback are much 
appreciated :)
[Link](https://docs.google.com/document/d/1baU5KeorrmLpC4EZoqLuG-E8sUJqmdELLbr8o6wdbVM/edit?usp=sharing)


> Train DecisionTree locally when possible
> 
>
> Key: SPARK-3162
> URL: https://issues.apache.org/jira/browse/SPARK-3162
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Critical
>
> Improvement: communication
> Currently, every level of a DecisionTree is trained in a distributed manner.  
> However, at deeper levels in the tree, it is possible that a small set of 
> training data will be matched with any given node.  If the node’s training 
> data can fit on one machine’s memory, it may be more efficient to shuffle the 
> data and do local training for the rest of the subtree rooted at that node.
> Note: It is possible that local training would become possible at different 
> levels in different branches of the tree.  There are multiple options for 
> handling this case:
> (1) Train in a distributed fashion until all remaining nodes can be trained 
> locally.  This would entail training multiple levels at once (locally).
> (2) Train branches locally when possible, and interleave this with 
> distributed training of the other branches.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-3162) Train DecisionTree locally when possible

2016-08-23 Thread Siddharth Murching (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434031#comment-15434031
 ] 

Siddharth Murching edited comment on SPARK-3162 at 8/24/16 1:37 AM:


Here's a design doc with proposed changes - any comments/feedback are much 
appreciated :)
https://docs.google.com/document/d/1baU5KeorrmLpC4EZoqLuG-E8sUJqmdELLbr8o6wdbVM/edit?usp=sharing



was (Author: siddharth murching):
Here's a design doc with proposed changes - any comments/feedback are much 
appreciated :)
Design doc link: 
[Link|https://docs.google.com/document/d/1baU5KeorrmLpC4EZoqLuG-E8sUJqmdELLbr8o6wdbVM/edit?usp=sharing]


> Train DecisionTree locally when possible
> 
>
> Key: SPARK-3162
> URL: https://issues.apache.org/jira/browse/SPARK-3162
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Critical
>
> Improvement: communication
> Currently, every level of a DecisionTree is trained in a distributed manner.  
> However, at deeper levels in the tree, it is possible that a small set of 
> training data will be matched with any given node.  If the node’s training 
> data can fit on one machine’s memory, it may be more efficient to shuffle the 
> data and do local training for the rest of the subtree rooted at that node.
> Note: It is possible that local training would become possible at different 
> levels in different branches of the tree.  There are multiple options for 
> handling this case:
> (1) Train in a distributed fashion until all remaining nodes can be trained 
> locally.  This would entail training multiple levels at once (locally).
> (2) Train branches locally when possible, and interleave this with 
> distributed training of the other branches.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org