[ 
https://issues.apache.org/jira/browse/SPARK-14033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216756#comment-15216756
 ] 

Michael Zieliński commented on SPARK-14033:
-------------------------------------------

Re: ML vs MLlib, I also think about it in terms of RDDs versus DataFrames.

Re: Estimator/Model, I prefer the current version, which preserves immutability 
to a larger degree. That said, merging those concepts might make it easier 
for the next stage of a Pipeline to use outputs from the previous stage. 
Currently, if you have:

val a1 = new Estimator1()
val a2 = new Estimator2().setParamAbc(a1.getParamCde)

You can only get the members of Estimator1, not those of Estimator1Model. If 
they were the same class, this would be easier. For example, you might want to 
take the top K variables from a Random Forest model as input to Logistic 
Regression.
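
A rough sketch of that use case in plain Scala, with no Spark dependency. 
ForestEstimator, ForestModel, LogisticEstimator and setFeatureIndices are 
hypothetical stand-ins, not spark.ml classes, and the "training" step is only 
a placeholder. The point is that the importances exist only on the fitted 
model, so the second estimator cannot be configured until after fit():

```scala
// Hypothetical stand-ins for the Estimator/Model split; NOT the spark.ml API.
final case class ForestModel(featureImportances: Vector[Double])

final class ForestEstimator {
  // Placeholder "training": importance = normalized mean absolute value per column.
  // Assumes at least one column has a non-zero value.
  def fit(rows: Seq[Vector[Double]]): ForestModel = {
    require(rows.nonEmpty, "need at least one row")
    val numCols = rows.head.length
    val raw = Vector.tabulate(numCols) { j =>
      rows.map(r => math.abs(r(j))).sum / rows.length
    }
    val total = raw.sum
    ForestModel(raw.map(_ / total))
  }
}

// Immutable estimator: the setter returns a new instance instead of mutating.
final case class LogisticEstimator(featureIndices: Seq[Int] = Seq.empty) {
  def setFeatureIndices(idx: Seq[Int]): LogisticEstimator =
    copy(featureIndices = idx)
}

object TopKExample {
  // Indices of the k largest importances, most important first.
  def topK(importances: Vector[Double], k: Int): Seq[Int] =
    importances.zipWithIndex.sortBy { case (w, _) => -w }.take(k).map(_._2)

  def main(args: Array[String]): Unit = {
    val data = Seq(Vector(1.0, 4.0, 3.0), Vector(1.0, 6.0, 1.0))

    val forest = new ForestEstimator()
    // forest.featureImportances does not compile: the value lives on the model.
    val forestModel = forest.fit(data)

    // Only now can the second stage be configured from the first stage's output.
    val logistic = LogisticEstimator()
      .setFeatureIndices(topK(forestModel.featureImportances, 2))
    println(logistic.featureIndices)
  }
}
```

With the current split, this wiring has to happen between the two fit() calls, 
outside of the Pipeline definition itself.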



> Merging Estimator & Model
> -------------------------
>
>                 Key: SPARK-14033
>                 URL: https://issues.apache.org/jira/browse/SPARK-14033
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Joseph K. Bradley
>            Assignee: Joseph K. Bradley
>         Attachments: StyleMutabilityMergingEstimatorandModel.pdf
>
>
> This JIRA is for merging the spark.ml concepts of Estimator and Model.
> Goal: Have clearer semantics which match existing libraries (such as 
> scikit-learn).
> For details, please see the linked design doc.  Comment on this JIRA to give 
> feedback on the proposed design.  Once the proposal is discussed and this 
> work is confirmed as ready to proceed, this JIRA will serve as an umbrella 
> for the merge tasks.



