[ 
https://issues.apache.org/jira/browse/SPARK-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15054000#comment-15054000
 ] 

Joseph K. Bradley commented on SPARK-7131:
------------------------------------------

Yes, I'm sorry about how long this has taken, but I now have enough confidence 
in the API to proceed.  I've created a JIRA for doing this in the next release: 
[SPARK-12301], though I may not be able to look at this issue until January.  
Please post your thoughts there, and ping in early January if there is no 
activity.  Thank you!

> Move tree,forest implementation from spark.mllib to spark.ml
> ------------------------------------------------------------
>
>                 Key: SPARK-7131
>                 URL: https://issues.apache.org/jira/browse/SPARK-7131
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib
>    Affects Versions: 1.4.0
>            Reporter: Joseph K. Bradley
>            Assignee: Joseph K. Bradley
>             Fix For: 1.5.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> We want to change and improve the spark.ml API for trees and ensembles, but 
> we cannot change the old API in spark.mllib.  To support the changes we want 
> to make, we should move the implementation from spark.mllib to spark.ml.  We 
> will generalize and modify it, but will also ensure that we do not change the 
> behavior of the old API.
> There are several steps to this:
> 1. Copy the implementation over to spark.ml and change the spark.ml classes 
> to use that implementation, rather than calling the spark.mllib 
> implementation.  The current spark.ml tests will ensure that the 2 
> implementations learn exactly the same models.  Note: This should include 
> performance testing to make sure the updated code does not have any 
> regressions. --> *UPDATE*: I have run tests using spark-perf, and there were 
> no regressions.
> 2. Remove the spark.mllib implementation, and make the spark.mllib APIs 
> wrappers around the spark.ml implementation.  The spark.ml tests will again 
> ensure that we do not change any behavior.
> 3. Move the unit tests to spark.ml, and change the spark.mllib unit tests to 
> verify model equivalence.
> This JIRA is now for step 1 only.  Steps 2 and 3 will be in separate JIRAs.
> After these updates, we can more safely generalize and improve the spark.ml 
> implementation.
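The wrapper approach in steps 2 and 3 can be illustrated with a toy sketch. All names below (`NewImpl`, `OldApi`, `OldDecisionTree`, `StumpModel`) are hypothetical and are not Spark classes. The "tree" is a one-feature, depth-1 regression stump rather than Spark's real tree learner. The sketch shows the old API keeping its own input and model types while delegating all learning to the new implementation. It also includes an equivalence check of the kind step 3 describes.

```scala
// Hypothetical stand-in for the new home of the implementation (spark.ml side).
object NewImpl {
  final case class Instance(label: Double, feature: Double)

  // Depth-1 regression tree (decision stump) on a single feature.
  final case class StumpModel(threshold: Double, leftValue: Double, rightValue: Double) {
    def predict(x: Double): Double = if (x <= threshold) leftValue else rightValue
  }

  private def mean(xs: Seq[Double]): Double = xs.sum / xs.size
  private def sse(xs: Seq[Double]): Double = {
    val m = mean(xs)
    xs.map(v => (v - m) * (v - m)).sum
  }

  // Picks the split threshold that minimizes the total squared error of both sides.
  def train(data: Seq[Instance]): StumpModel = {
    require(data.nonEmpty, "training data must be non-empty")
    val sorted = data.sortBy(_.feature)
    // Candidate thresholds: every distinct feature value except the largest,
    // so both sides of each split are non-empty.
    val candidates = sorted.map(_.feature).distinct.init
    if (candidates.isEmpty) {
      val m = mean(sorted.map(_.label))
      StumpModel(sorted.head.feature, m, m)
    } else {
      val scored = candidates.map { t =>
        val (left, right) = sorted.partition(_.feature <= t)
        val cost = sse(left.map(_.label)) + sse(right.map(_.label))
        (cost, t, mean(left.map(_.label)), mean(right.map(_.label)))
      }
      val (_, t, lv, rv) = scored.minBy(_._1)
      StumpModel(t, lv, rv)
    }
  }
}

// Hypothetical stand-in for the old public API (spark.mllib side), which must not change.
object OldApi {
  final case class LabeledPoint(label: Double, features: Array[Double])

  // Old model type kept for compatibility; it only wraps the new model.
  final class OldStumpModel(underlying: NewImpl.StumpModel) {
    def predict(features: Array[Double]): Double = underlying.predict(features(0))
  }

  object OldDecisionTree {
    // Converts old-style input, delegates to the new implementation, wraps the result.
    def train(data: Seq[LabeledPoint]): OldStumpModel = {
      val converted = data.map(p => NewImpl.Instance(p.label, p.features(0)))
      new OldStumpModel(NewImpl.train(converted))
    }
  }
}

// Equivalence check: the old wrapper and the new implementation must agree.
val data = Seq(
  OldApi.LabeledPoint(1.0, Array(1.0)),
  OldApi.LabeledPoint(1.0, Array(2.0)),
  OldApi.LabeledPoint(5.0, Array(3.0)),
  OldApi.LabeledPoint(5.0, Array(4.0))
)
val oldModel = OldApi.OldDecisionTree.train(data)
val newModel = NewImpl.train(data.map(p => NewImpl.Instance(p.label, p.features(0))))
val allSame = data.forall(p => oldModel.predict(p.features) == newModel.predict(p.features(0)))
println(s"threshold=${newModel.threshold}, old and new agree: $allSame")
```

Keeping the old types as thin adapters means the old API's signatures and behavior stay fixed while the shared implementation is free to change. The equivalence check then catches any accidental behavioral drift.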



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
