[jira] [Commented] (SPARK-12326) Move GBT implementation from spark.mllib to spark.ml

2016-03-29 Thread Seth Hendrickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216999#comment-15216999
 ] 

Seth Hendrickson commented on SPARK-12326:
--

[~josephkb] When moving the helper classes to ML, I agree it would be good to 
make classes private where possible. However, I am not sure what you mean by 
"change the APIs." Also, could you give an example of what you had in mind for 
eliminating duplicate data stored in the final model? Thanks!

> Move GBT implementation from spark.mllib to spark.ml
> 
>
> Key: SPARK-12326
> URL: https://issues.apache.org/jira/browse/SPARK-12326
> Project: Spark
> Issue Type: Improvement
> Components: ML, MLlib
> Reporter: Seth Hendrickson
> Assignee: Seth Hendrickson
>
> Several improvements can be made to gradient boosted trees, but are not 
> possible without moving the GBT implementation to spark.ml (e.g. 
> rawPrediction column, feature importance). This Jira is for moving the 
> current GBT implementation to spark.ml, which will have roughly the following 
> steps:
> 1. Copy the implementation to spark.ml and change spark.ml classes to use 
> that implementation. Current tests will ensure that the implementations learn 
> exactly the same models. 
> 2. Move the decision tree helper classes over to spark.ml (e.g. Impurity, 
> InformationGainStats, ImpurityStats, DTStatsAggregator, etc.). Since 
> eventually all tree implementations will reside in spark.ml, the helper 
> classes should as well.
> 3. Remove the spark.mllib implementation, and make the spark.mllib APIs 
> wrappers around the spark.ml implementation. The spark.ml tests will again 
> ensure that we do not change any behavior.
> 4. Move the unit tests to spark.ml, and change the spark.mllib unit tests to 
> verify model equivalence.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12326) Move GBT implementation from spark.mllib to spark.ml

2015-12-16 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060887#comment-15060887
 ] 

Joseph K. Bradley commented on SPARK-12326:
---

The plan sounds good. The critical item is #1, of course, since that will let 
us improve GBTs in spark.ml.

For #2, I'd also recommend we take this opportunity to make some of those 
helper classes private when possible (especially if they are only needed during 
training) and maybe change the APIs (especially if we can eliminate duplicate 
data stored in the final model).

Can you please create one subtask for each of these 4 steps? Thanks!

> Move GBT implementation from spark.mllib to spark.ml
> 
>
> Key: SPARK-12326
> URL: https://issues.apache.org/jira/browse/SPARK-12326
> Project: Spark
> Issue Type: Improvement
> Components: ML, MLlib
> Reporter: Seth Hendrickson
>
> Several improvements can be made to gradient boosted trees, but are not 
> possible without moving the GBT implementation to spark.ml (e.g. 
> rawPrediction column, feature importance). This Jira is for moving the 
> current GBT implementation to spark.ml, which will have roughly the following 
> steps:
> 1. Copy the implementation to spark.ml and change spark.ml classes to use 
> that implementation. Current tests will ensure that the implementations learn 
> exactly the same models. 
> 2. Move the decision tree helper classes over to spark.ml (e.g. Impurity, 
> InformationGainStats, ImpurityStats, DTStatsAggregator, etc.). Since 
> eventually all tree implementations will reside in spark.ml, the helper 
> classes should as well.
> 3. Remove the spark.mllib implementation, and make the spark.mllib APIs 
> wrappers around the spark.ml implementation. The spark.ml tests will again 
> ensure that we do not change any behavior.
> 4. Move the unit tests to spark.ml, and change the spark.mllib unit tests to 
> verify model equivalence.
> Steps 2, 3, and 4 should be in separate Jiras. 






[jira] [Commented] (SPARK-12326) Move GBT implementation from spark.mllib to spark.ml

2015-12-14 Thread Seth Hendrickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15056715#comment-15056715
 ] 

Seth Hendrickson commented on SPARK-12326:
--

[~josephkb] Could you review the plan above? I couldn't find any other Jira for 
moving GBTs to ML, and it seems like it would be good to get this done so we 
can move on to some other improvements that are needed as well. Thanks!



