[ https://issues.apache.org/jira/browse/SPARK-19071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802260#comment-15802260 ]
Joseph K. Bradley commented on SPARK-19071:
-------------------------------------------

Thanks @Bryan for the thoughtful design doc! These are all useful-sounding optimizations, though I agree with your observation that we will need to worry about caching. @TimHunter actually did some testing of these sorts of optimizations and found that it's pretty easy to find sets of jobs which should be parallelized and sets which should be sequential. You're right that we could sidestep this issue by adding a Param for specifying the desired parallelism, though I'm unsure how to set a good default.

About #2 (Pipeline optimizations): I don't think we need to construct the tree explicitly. It can be created implicitly by the DFS done by recursion or a loop.

Also, I just added a few more thoughts to [SPARK-5844].

Btw, I unfortunately do not have much bandwidth to work on this right now, but I'll definitely try to keep up with the conversations.

Thanks!

> Optimizations for ML Pipeline Tuning
> ------------------------------------
>
>                 Key: SPARK-19071
>                 URL: https://issues.apache.org/jira/browse/SPARK-19071
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Bryan Cutler
>
> This is a parent task to plan the addition of optimizations in ML tuning for
> parallel model evaluation and more efficiency with pipelines. They will
> benefit CrossValidator and TrainValidationSplit when performing a parameter
> grid search. The proposal can be broken into 3 steps in order of simplicity:
> 1. Add ability to evaluate models in parallel.
> 2. Optimize param grid for pipelines, as described in SPARK-5844
> 3. Add parallel model evaluation to the optimized pipelines from step 2
> See the linked design document for details on the proposed implementation.
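
For illustration only, here is a minimal sketch of step 1 together with the "Param for specifying desired parallelism" idea from the comment above. It is plain Python, not Spark code and not the design doc's implementation. The names `build_param_grid`, `evaluate_grid`, `toy_fit_and_score` and the `parallelism` argument are all hypothetical, and the default of 1 (fully sequential) is an assumption, since the discussion leaves the right default open.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def build_param_grid(grid):
    """Expand {'name': [values]} into one param dict per combination."""
    names = sorted(grid)
    return [dict(zip(names, values))
            for values in product(*(grid[n] for n in names))]


def evaluate_grid(fit_and_score, param_maps, parallelism=1):
    """Score every param map, running at most `parallelism` evaluations at once.

    parallelism=1 gives sequential evaluation. That default is an assumption
    made for this sketch; the thread does not settle on one.
    """
    if parallelism < 1:
        raise ValueError("parallelism must be >= 1")
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # pool.map preserves input order, so scores[i] belongs to param_maps[i].
        scores = list(pool.map(fit_and_score, param_maps))
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return param_maps[best_index], scores


def toy_fit_and_score(params):
    # Hypothetical stand-in for "fit a model, then evaluate it on held-out data".
    # Higher is better; the peak is at regParam=0.1, maxIter=20.
    return -(params["regParam"] - 0.1) ** 2 - (params["maxIter"] - 20) ** 2 / 1000.0


if __name__ == "__main__":
    grid = build_param_grid({"regParam": [0.01, 0.1, 1.0], "maxIter": [10, 20]})
    best, scores = evaluate_grid(toy_fit_and_score, grid, parallelism=4)
    print(best)
```

Because results come back in input order, the caller sees the same output at any parallelism level, so the setting only changes how many fits run at once. This sketch does nothing about the caching concern raised in the comment; that would still need separate handling.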