[ https://issues.apache.org/jira/browse/SPARK-15767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334721#comment-15334721 ]
Joseph K. Bradley commented on SPARK-15767:
-------------------------------------------

[~vectorijk] Notes from sync: Can you please write more about the possible APIs? I'd like to do a comparison of:
* the rpart API
* the MLlib DecisionTreeClassifier and DecisionTreeRegressor APIs

The comparison should list all parameters and their meaning. The idea is to figure out which of the following we can do:
* Best option: Mimic rpart exactly so that R users can switch to spark.rpart easily
* Worst option: Sort of mimic rpart, but not exactly, because of a difference in functionality, such as new parameters from MLlib or differences in behavior.
* Medium option: Avoid the rpart API, and instead offer APIs matching DecisionTreeClassifier and DecisionTreeRegressor in the Scala/Java/Python APIs

> Decision Tree Regression wrapper in SparkR
> ------------------------------------------
>
>                 Key: SPARK-15767
>                 URL: https://issues.apache.org/jira/browse/SPARK-15767
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML, SparkR
>            Reporter: Kai Jiang
>            Assignee: Kai Jiang
>
> Implement a wrapper in SparkR to support decision tree regression. R's naive
> Decision Tree Regression implementation is from package rpart with signature
> rpart(formula, dataframe, method="anova"). I propose we could implement an API
> like spark.decisionTreeRegression(dataframe, formula, ...). After having
> implemented decision tree classification, we could refactor these two into an
> API more like rpart()
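
As a starting point for the parameter comparison requested in the comment, here is a minimal sketch in plain Python (no Spark or R required). It is not part of the original thread. The default values are the documented defaults of rpart.control() and of the spark.ml DecisionTreeRegressor as best I recall them, so please verify them against the versions you use. The ROUGH_MATCHES mapping is purely my assumption about which parameters play a similar role; in particular, cp and minInfoGain are only loosely related, since cp is scaled relative to the root-node error while minInfoGain is an absolute threshold.

```python
# Defaults of rpart.control() in R's rpart package (verify against your version).
RPART_CONTROL = {
    "minsplit": 20,
    "minbucket": 7,        # rpart computes this as round(minsplit / 3)
    "cp": 0.01,
    "maxcompete": 4,
    "maxsurrogate": 5,
    "usesurrogate": 2,
    "xval": 10,
    "surrogatestyle": 0,
    "maxdepth": 30,
}

# Tree-related defaults of spark.ml's DecisionTreeRegressor (verify as well).
MLLIB_REGRESSOR = {
    "maxDepth": 5,
    "maxBins": 32,
    "minInstancesPerNode": 1,
    "minInfoGain": 0.0,
    "maxMemoryInMB": 256,
    "cacheNodeIds": False,
    "checkpointInterval": 10,
    "impurity": "variance",
}

# ASSUMPTION (not from the thread): parameters that play a roughly similar role.
ROUGH_MATCHES = {
    "maxdepth": "maxDepth",
    "minbucket": "minInstancesPerNode",
    "cp": "minInfoGain",   # loose analogy only, see the note above
}


def compare(rpart, mllib, matches):
    """Split parameters into rough matches and those unique to each API."""
    matched = [(r, m, rpart[r], mllib[m]) for r, m in matches.items()]
    rpart_only = sorted(set(rpart) - set(matches))
    mllib_only = sorted(set(mllib) - set(matches.values()))
    return matched, rpart_only, mllib_only


if __name__ == "__main__":
    matched, rpart_only, mllib_only = compare(
        RPART_CONTROL, MLLIB_REGRESSOR, ROUGH_MATCHES
    )
    for r_name, m_name, r_default, m_default in matched:
        print(f"{r_name} (default {r_default}) ~ {m_name} (default {m_default})")
    print("rpart only:", ", ".join(rpart_only))
    print("MLlib only:", ", ".join(mllib_only))
```

Under this (assumed) mapping, parameters such as the surrogate-split and cross-validation settings have no MLlib counterpart, and maxBins has no rpart counterpart, which is the kind of gap that would rule out mimicking rpart exactly.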