[ https://issues.apache.org/jira/browse/SPARK-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15054000#comment-15054000 ]
Joseph K. Bradley commented on SPARK-7131:
------------------------------------------

Yes, I'm sorry about how long this has taken, but I have enough confidence in the API now to proceed. I've created a JIRA for doing this in the next release: [SPARK-12301], though I may not be able to look at this issue until January. Please post your thoughts there, and ping in early January if there is no activity. Thank you!

> Move tree,forest implementation from spark.mllib to spark.ml
> ------------------------------------------------------------
>
> Key: SPARK-7131
> URL: https://issues.apache.org/jira/browse/SPARK-7131
> Project: Spark
> Issue Type: Improvement
> Components: ML, MLlib
> Affects Versions: 1.4.0
> Reporter: Joseph K. Bradley
> Assignee: Joseph K. Bradley
> Fix For: 1.5.0
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> We want to change and improve the spark.ml API for trees and ensembles, but
> we cannot change the old API in spark.mllib. To support the changes we want
> to make, we should move the implementation from spark.mllib to spark.ml. We
> will generalize and modify it, but will also ensure that we do not change the
> behavior of the old API.
> There are several steps to this:
> 1. Copy the implementation over to spark.ml and change the spark.ml classes
> to use that implementation, rather than calling the spark.mllib
> implementation. The current spark.ml tests will ensure that the 2
> implementations learn exactly the same models. Note: This should include
> performance testing to make sure the updated code does not have any
> regressions. --> *UPDATE*: I have run tests using spark-perf, and there were
> no regressions.
> 2. Remove the spark.mllib implementation, and make the spark.mllib APIs
> wrappers around the spark.ml implementation. The spark.ml tests will again
> ensure that we do not change any behavior.
> 3. Move the unit tests to spark.ml, and change the spark.mllib unit tests to
> verify model equivalence.
> This JIRA is now for step 1 only. Steps 2 and 3 will be in separate JIRAs.
> After these updates, we can more safely generalize and improve the spark.ml
> implementation.
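
The plan quoted above (a new implementation, the old API kept as a thin wrapper, and tests that verify model equivalence) can be illustrated with a small library-free sketch. This is not Spark code: `Stump`, `new_train_stump`, and `LegacyStumpTrainer` are hypothetical names invented for this example, and the one-split "tree" only stands in for the real tree and forest learners.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Stump:
    """Hypothetical one-split regression tree, a stand-in for a real model."""
    threshold: float
    left_value: float
    right_value: float

    def predict(self, x: float) -> float:
        return self.left_value if x <= self.threshold else self.right_value


def new_train_stump(points: Sequence[Tuple[float, float]]) -> Stump:
    """'New package' implementation: choose the split with least squared error."""
    xs = sorted({x for x, _ in points})
    best_err = None
    best_model = None
    # Candidate thresholds are midpoints between adjacent distinct feature values.
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2.0
        left = [y for x, y in points if x <= t]
        right = [y for x, y in points if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        err = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best_err is None or err < best_err:
            best_err, best_model = err, Stump(t, left_mean, right_mean)
    if best_model is None:
        raise ValueError("need at least two distinct feature values")
    return best_model


class LegacyStumpTrainer:
    """'Old package' API: signature unchanged, body delegates to the new code."""

    @staticmethod
    def train(features: List[float], labels: List[float]) -> Stump:
        return new_train_stump(list(zip(features, labels)))


def models_equivalent(a: Stump, b: Stump) -> bool:
    """Equivalence check of the kind the old API's unit tests would perform."""
    return a == b
```

Because the legacy entry point only adapts its arguments and delegates, a test that trains through both paths on the same data and compares the resulting models guards the old behavior while the new implementation evolves.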