[ https://issues.apache.org/jira/browse/SPARK-19071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15836957#comment-15836957 ]
Bryan Cutler commented on SPARK-19071:
--------------------------------------

Thanks for the comments [~josephkb] and [~mlnick]. I'll start by opening a JIRA for the naive approach in step 1 and submit the PR I have. There we can discuss how to handle the default parallelism and caching.

> Optimizations for ML Pipeline Tuning
> ------------------------------------
>
>                 Key: SPARK-19071
>                 URL: https://issues.apache.org/jira/browse/SPARK-19071
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Bryan Cutler
>
> This is a parent task to plan the addition of optimizations in ML tuning for
> parallel model evaluation and more efficient handling of pipelines. They will
> benefit CrossValidator and TrainValidationSplit when performing a parameter
> grid search. The proposal can be broken into 3 steps in order of simplicity:
> 1. Add ability to evaluate models in parallel.
> 2. Optimize param grid for pipelines, as described in SPARK-5844.
> 3. Add parallel model evaluation to the optimized pipelines from step 2.
> See the linked design document for details on the proposed implementation.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
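
For illustration only, step 1 of the proposal (evaluating the models of a parameter grid in parallel) might be sketched as below. This is not the code from the PR or the design document. It is plain Python with a thread pool and no Spark dependency, and `build_param_grid`, `fit_and_score`, `evaluate_grid`, and the `parallelism` argument are hypothetical names chosen for this sketch. The scoring function is a toy stand-in for fitting a model and evaluating it on held-out data.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def build_param_grid(grid):
    """Expand {"name": [values]} into one param dict per combination."""
    names = sorted(grid)
    return [dict(zip(names, combo)) for combo in product(*(grid[n] for n in names))]


def fit_and_score(params):
    """Hypothetical stand-in for fitting a model and scoring it on validation data.

    Returns a toy metric (higher is better) so the sketch runs without Spark.
    """
    return -abs(params["regParam"] - 0.1) - abs(params["maxIter"] - 20) / 100.0


def evaluate_grid(grid, parallelism=2):
    """Score every param combination, running up to `parallelism` fits at once."""
    param_maps = build_param_grid(grid)
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # pool.map preserves input order, so metrics[i] belongs to param_maps[i].
        metrics = list(pool.map(fit_and_score, param_maps))
    best_index = max(range(len(metrics)), key=metrics.__getitem__)
    return param_maps[best_index], metrics


best_params, metrics = evaluate_grid(
    {"regParam": [0.01, 0.1, 1.0], "maxIter": [10, 20]}, parallelism=2
)
```

With `parallelism=1` the sketch degenerates to the current sequential behaviour, which is one possible answer to the default-parallelism question raised in the comment. Caching of the input data is not modelled here.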