[ 
https://issues.apache.org/jira/browse/SPARK-15767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334721#comment-15334721
 ] 

Joseph K. Bradley commented on SPARK-15767:
-------------------------------------------

[~vectorijk] Notes from sync: Can you please write more about the possible 
APIs?  I'd like to do a comparison of:
* the rpart API
* the MLlib DecisionTreeClassifier and DecisionTreeRegressor APIs

The comparison should list all parameters and their meaning.  The idea is to 
figure out which of the following we can do:
* Best option: Mimic rpart exactly so that R users can switch to spark.rpart 
easily
* Worst option: Sort of mimic rpart, but not exactly because of a difference in 
functionality, such as new parameters from MLlib or differences in behavior.
* Medium option: Avoid rpart API, and instead offer APIs matching 
DecisionTreeClassifier and DecisionTreeRegressor in the Scala/Java/Python APIs
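
As a starting point for that comparison, here is a rough sketch in Python that puts the two parameter sets side by side. The names and meanings are taken from the rpart.control documentation and the spark.ml DecisionTreeRegressor params and should be checked against the versions in use. The ROUGH_MATCH table and the compare() helper are assumptions made for illustration only. In particular, cp and minInfoGain both gate splits on an improvement threshold but are not defined the same way, so that pairing is approximate.

```python
# Tuning parameters of rpart.control (R) with a short meaning for each.
RPART_CONTROL = {
    "minsplit": "min observations in a node before a split is attempted",
    "minbucket": "min observations in any terminal (leaf) node",
    "cp": "complexity parameter; splits improving fit by less than cp are skipped",
    "maxcompete": "number of competitor splits retained in the output",
    "maxsurrogate": "number of surrogate splits retained in the output",
    "usesurrogate": "how surrogates are used when a split variable is missing",
    "xval": "number of cross-validations",
    "surrogatestyle": "how the best surrogate is selected",
    "maxdepth": "max depth of any node, with the root at depth 0",
}

# Tree-specific params of spark.ml DecisionTreeRegressor / DecisionTreeClassifier.
MLLIB_TREE = {
    "maxDepth": "max depth of the tree",
    "maxBins": "max number of bins used to discretize continuous features",
    "minInstancesPerNode": "min instances each child must have after a split",
    "minInfoGain": "min information gain for a split to be considered",
    "maxMemoryInMB": "memory allocated to histogram aggregation",
    "cacheNodeIds": "whether to cache node IDs for each instance",
    "checkpointInterval": "how often to checkpoint the cached node IDs",
    "impurity": "criterion used for information gain calculation",
    "seed": "random seed",
}

# ASSUMPTION: a hand-written, approximate pairing (rpart name -> MLlib name).
ROUGH_MATCH = {
    "maxdepth": "maxDepth",
    "minbucket": "minInstancesPerNode",
    "cp": "minInfoGain",  # related idea, different definition
}


def compare(rpart, mllib, match):
    """Split the parameters into roughly shared, rpart-only, and MLlib-only."""
    shared = sorted((r, m) for r, m in match.items() if r in rpart and m in mllib)
    rpart_only = sorted(set(rpart) - set(match))
    mllib_only = sorted(set(mllib) - set(match.values()))
    return shared, rpart_only, mllib_only


shared, rpart_only, mllib_only = compare(RPART_CONTROL, MLLIB_TREE, ROUGH_MATCH)
print("roughly shared:", shared)
print("rpart only:    ", rpart_only)
print("MLlib only:    ", mllib_only)
```

The size of the rpart-only and MLlib-only lists gives a quick sense of how far an exact rpart mimic would have to stretch. Parameters with no counterpart on the other side are the ones that push toward the second or third option above.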

> Decision Tree Regression wrapper in SparkR
> ------------------------------------------
>
>                 Key: SPARK-15767
>                 URL: https://issues.apache.org/jira/browse/SPARK-15767
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML, SparkR
>            Reporter: Kai Jiang
>            Assignee: Kai Jiang
>
> Implement a wrapper in SparkR to support decision tree regression. R's native 
> decision tree regression implementation comes from the rpart package, with signature 
> rpart(formula, dataframe, method="anova"). I propose we implement an API 
> like spark.decisionTreeRegression(dataframe, formula, ...). After 
> implementing decision tree classification, we could refactor these two into an 
> API more like rpart().



