[ https://issues.apache.org/jira/browse/SPARK-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137758#comment-14137758 ]
Eustache commented on SPARK-3530:
---------------------------------

Great to see the design docs! A few questions/remarks:

- Big +1 for Pipeline and Dataset as first-class abstractions. As a long-time sklearn user, I find Pipelines a very convenient way to think about many problems, e.g. implementing cascades of models, integrating unsupervised steps for feature transformation in a supervised task, etc.
- Isn't the "fit multiple models at once" part a bit of a premature optimization? How many users would benefit from it? IMHO it complicates the API for most users.
- I'm also wondering whether a meta class couldn't handle fitting multiple models. AFAICT fitting multiple models at once resembles a parameter grid search, doesn't it? I assume the latter would return evaluation metrics for each parameter set as well as the model itself, right?
- It seems to me that multi-task learning would be a good example for the "multiple models at once" case, but it is probably not a typical example of what most users would want. Also, I'm not 100% sure the implementation would necessarily profit from such an API.

> Pipeline and Parameters
> -----------------------
>
>                 Key: SPARK-3530
>                 URL: https://issues.apache.org/jira/browse/SPARK-3530
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, MLlib
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>            Priority: Critical
>
> This part of the design doc is for pipelines and parameters. I put the design
> doc at
> https://docs.google.com/document/d/1rVwXRjWKfIb-7PI6b86ipytwbUH7irSNLF1_6dLmh8o/edit?usp=sharing
> I will copy the proposed interfaces to this JIRA later. Some sample code can
> be viewed at: https://github.com/mengxr/spark-ml/
> Please help review the design and post your comments here. Thanks!
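
To make the meta-class question in the comment concrete, here is a minimal sketch in plain Python of a grid search implemented as a wrapper around a single-model `fit`. It is not the proposed Spark API and not sklearn's implementation. `ThresholdClassifier`, `GridSearch`, `accuracy`, and the `results_` attribute are hypothetical names invented for this illustration, and scoring on the training data is a simplification.

```python
from itertools import product


class ThresholdClassifier:
    """Toy estimator (hypothetical): predicts 1 when x >= threshold."""

    def __init__(self, threshold=0.0):
        self.threshold = threshold

    def fit(self, X, y):
        # Nothing to learn in this toy model; fit returns self, sklearn-style.
        return self

    def predict(self, X):
        return [1 if x >= self.threshold else 0 for x in X]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


class GridSearch:
    """Hypothetical meta-estimator: fits one model per parameter set."""

    def __init__(self, estimator_cls, param_grid, metric):
        self.estimator_cls = estimator_cls
        self.param_grid = param_grid
        self.metric = metric

    def fit(self, X, y):
        names = sorted(self.param_grid)
        self.results_ = []
        # Enumerate the Cartesian product of all parameter values.
        for values in product(*(self.param_grid[n] for n in names)):
            params = dict(zip(names, values))
            model = self.estimator_cls(**params).fit(X, y)
            # Simplification: scored on the training data. A real search
            # would score on held-out data or use cross-validation.
            score = self.metric(y, model.predict(X))
            self.results_.append((params, score, model))
        return self

    def best(self):
        # Returns the (params, score, model) triple with the highest score.
        return max(self.results_, key=lambda r: r[1])


X = [0.1, 0.4, 0.6, 0.9]
y = [0, 0, 1, 1]
grid = {"threshold": [0.0, 0.5, 1.0]}
search = GridSearch(ThresholdClassifier, grid, accuracy).fit(X, y)
best_params, best_score, best_model = search.best()
```

In this sketch the single-model `fit` signature stays untouched, and the wrapper returns the metric together with the fitted model for every parameter set, which is the behaviour the comment assumes a grid search would have. Whether a dedicated multi-model `fit` could share computation across parameter sets in ways such a wrapper cannot is left open here.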