[ 
https://issues.apache.org/jira/browse/SPARK-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137758#comment-14137758
 ] 

Eustache commented on SPARK-3530:
---------------------------------

Great to see the design docs!

A few questions/remarks:

- Big +1 for Pipeline and Dataset as first-class abstractions. As a long-time 
sklearn user, I find Pipelines a very convenient way to think about many 
problems, e.g. implementing cascades of models, or integrating unsupervised 
steps for feature transformation into a supervised task.

- Isn't the "fit multiple models at once" part a bit of a premature 
optimization? How many users would benefit from it? IMHO it complicates the API 
for most users.

- I'm also wondering whether a meta class wouldn't be capable of handling 
multiple models. AFAICT fitting multiple models at once resembles a parameter 
grid search, doesn't it? I assume the latter would return evaluation metrics 
for each parameter set as well as the model itself, right?

- It seems to me that multi-task learning would be a good example for 
"multiple models at once", but it is maybe not typical of what most users would 
want. Also, I'm not 100% sure its implementation would necessarily benefit from 
such an API.
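
To make the meta-class idea a bit more concrete, here is a rough sketch in 
plain Python. It is only an illustration under my own assumptions: the names 
(GridSearch, MeanShrinkage, mse, results_) are hypothetical and are neither the 
interfaces proposed in the design doc nor sklearn's API. The wrapper fits one 
model per parameter set and keeps the metric and the fitted model for each.

```python
from itertools import product
from statistics import mean


class MeanShrinkage:
    """Toy estimator (hypothetical): predicts a shrunken mean of the targets."""

    def __init__(self, shrink=0.0):
        self.shrink = shrink

    def fit(self, X, y):
        self.value_ = mean(y) * (1.0 - self.shrink)
        return self

    def predict(self, X):
        return [self.value_ for _ in X]


def mse(y_true, y_pred):
    """Mean squared error; lower is better."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


class GridSearch:
    """Hypothetical meta-estimator: one fitted model per parameter set."""

    def __init__(self, estimator_cls, param_grid, metric):
        self.estimator_cls = estimator_cls
        self.param_grid = param_grid
        self.metric = metric

    def fit(self, X, y, X_val, y_val):
        names = sorted(self.param_grid)
        self.results_ = []
        # Enumerate the Cartesian product of all parameter values.
        for values in product(*(self.param_grid[n] for n in names)):
            params = dict(zip(names, values))
            model = self.estimator_cls(**params).fit(X, y)
            score = self.metric(y_val, model.predict(X_val))
            # Keep the metric and the fitted model for every parameter set.
            self.results_.append((params, score, model))
        best = min(self.results_, key=lambda r: r[1])
        self.best_params_, self.best_score_, self.best_model_ = best
        return self


# Usage: the plain estimator API stays single-model; only the wrapper
# knows about multiple parameter sets.
search = GridSearch(MeanShrinkage, {"shrink": [0.0, 0.5, 0.9]}, mse)
search.fit([[0], [1], [2], [3]], [2.0, 2.0, 2.0, 2.0], [[4], [5]], [1.0, 1.0])
for params, score, _model in search.results_:
    print(params, score)
```

The point of the sketch is that "multiple models" lives entirely in the 
wrapper, so users who fit a single model never see it in the API.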

> Pipeline and Parameters
> -----------------------
>
>                 Key: SPARK-3530
>                 URL: https://issues.apache.org/jira/browse/SPARK-3530
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, MLlib
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>            Priority: Critical
>
> This part of the design doc is for pipelines and parameters. I put the design 
> doc at
> https://docs.google.com/document/d/1rVwXRjWKfIb-7PI6b86ipytwbUH7irSNLF1_6dLmh8o/edit?usp=sharing
> I will copy the proposed interfaces to this JIRA later. Some sample code can 
> be viewed at: https://github.com/mengxr/spark-ml/
> Please help review the design and post your comments here. Thanks!


