[ https://issues.apache.org/jira/browse/SPARK-19071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796709#comment-15796709 ]
Bryan Cutler commented on SPARK-19071:
--------------------------------------

I have a working version of parallel model evaluation and a POC for pipeline optimization. In initial testing, both seem to work well and show the expected speedups.

> Optimizations for ML Pipeline Tuning
> ------------------------------------
>
>                 Key: SPARK-19071
>                 URL: https://issues.apache.org/jira/browse/SPARK-19071
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Bryan Cutler
>
> This is a parent task to plan the addition of optimizations in ML tuning for
> parallel model evaluation and more efficient handling of pipelines. They will
> benefit CrossValidator and TrainValidationSplit when performing a parameter
> grid search. The proposal can be broken into 3 steps in order of simplicity:
> 1. Add the ability to evaluate models in parallel.
> 2. Optimize the param grid for pipelines, as described in SPARK-5844.
> 3. Add parallel model evaluation to the optimized pipelines from step 2.
> See the linked design document for details on the proposed implementation.
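
To make step 1 of the proposal concrete, here is a minimal sketch of evaluating the models of a parameter grid in parallel. It is plain Python and not the implementation from the linked design document. The names `build_param_grid`, `fit_and_score`, `evaluate_grid_parallel`, and the `parallelism` argument are hypothetical, and the "model" is a toy one-dimensional ridge fit standing in for a real estimator and evaluator.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def build_param_grid(grid):
    """Expand {'name': [values, ...]} into a list of param dicts (cartesian product)."""
    names = sorted(grid)
    return [dict(zip(names, combo)) for combo in product(*(grid[n] for n in names))]


def fit_and_score(params, train, validation):
    """Hypothetical stand-in for 'fit a model, then evaluate it'.

    Fits y = slope * x with an L2 penalty and returns the mean squared
    error on the validation set (lower is better).
    """
    reg = params["reg_param"]
    sxy = sum(x * y for x, y in train)
    sxx = sum(x * x for x, _ in train)
    slope = sxy / (sxx + reg)
    return sum((y - slope * x) ** 2 for x, y in validation) / len(validation)


def evaluate_grid_parallel(param_maps, train, validation, parallelism=2):
    """Score every param map concurrently and return (best_params, metrics).

    pool.map preserves input order, so metrics[i] belongs to param_maps[i].
    """
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        metrics = list(
            pool.map(lambda p: fit_and_score(p, train, validation), param_maps)
        )
    best_index = min(range(len(metrics)), key=metrics.__getitem__)
    return param_maps[best_index], metrics


# Toy data generated from y = 2x, so the unregularized fit is exact.
train = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
validation = [(4.0, 8.0), (5.0, 10.0)]

param_maps = build_param_grid({"reg_param": [0.0, 1.0, 10.0]})
best_params, metrics = evaluate_grid_parallel(param_maps, train, validation)
print(best_params, metrics)
```

Each grid point is independent of the others, which is what makes this step the simplest of the three. The sketch uses threads on the assumption that the caller mostly waits on work done elsewhere (for example, jobs submitted to a cluster); CPU-bound fitting in pure Python would not speed up this way.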