[ https://issues.apache.org/jira/browse/SPARK-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiangrui Meng updated SPARK-1486:
---------------------------------

    Assignee:     (was: Burak Yavuz)

> Support multi-model training in MLlib
> -------------------------------------
>
>                 Key: SPARK-1486
>                 URL: https://issues.apache.org/jira/browse/SPARK-1486
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Xiangrui Meng
>            Priority: Critical
>
> It is rare in practice to train just one model with a given set of
> parameters. Usually, multiple models are trained with different sets of
> parameters, and the best one is selected based on its performance on the
> validation set. MLlib should provide native support for multi-model
> training/scoring. This requires decoupling concepts such as problem,
> formulation, algorithm, parameter set, and model, which MLlib currently
> lacks. MLI implements similar concepts, which we can borrow. There are
> different approaches for multi-model training (a rough sketch of approach
> 3 follows below, after the issue description):
> 0) Keep one copy of the data and train models one after another (or
> perhaps in parallel, depending on the scheduler).
> 1) Keep one copy of the data and train multiple models at the same time
> (similar to `runs` in KMeans).
> 2) Make multiple copies of the data (still stored in a distributed
> fashion) and use more cores to spread the work.
> 3) Collect the data, make the entire dataset available on every worker,
> and train one or more models on each worker.
> Users should be able to choose which execution mode they want to use.
> Note that 3) could cover many practical use cases when the training data
> is not huge, e.g., <1GB.
> This task will be divided into sub-tasks, and this JIRA was created to
> discuss the design and track the overall progress.
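As a rough illustration of approach 3 (not part of the issue itself), the
Scala sketch below collects a small training set, broadcasts it to every
worker, and trains one model per parameter setting in parallel, one task
per setting. The object name MultiModelSketch and the trainLocal helper (a
toy least-squares gradient-descent trainer) are hypothetical stand-ins for
a real single-machine learner; they are not MLlib APIs.

// Sketch of approach 3: broadcast a small dataset to every worker and
// train one model per parameter setting in parallel. trainLocal is a
// hypothetical toy trainer, not an MLlib API.
import org.apache.spark.{SparkConf, SparkContext}

object MultiModelSketch {

  // Toy single-machine trainer: linear least squares via batch gradient
  // descent over an in-memory dataset of (label, features) pairs.
  def trainLocal(data: Array[(Double, Array[Double])],
                 stepSize: Double,
                 numIters: Int): Array[Double] = {
    val dim = data.head._2.length
    val w = Array.fill(dim)(0.0)
    for (_ <- 1 to numIters) {
      val grad = Array.fill(dim)(0.0)
      data.foreach { case (label, features) =>
        val pred = w.zip(features).map { case (wi, xi) => wi * xi }.sum
        val err = pred - label
        for (j <- 0 until dim) grad(j) += err * features(j)
      }
      for (j <- 0 until dim) w(j) -= stepSize * grad(j) / data.length
    }
    w
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("multi-model-sketch"))

    // A small training set (<1GB in practice): collect it once, then
    // broadcast it so every worker holds the full dataset.
    val data = Array((1.0, Array(1.0, 0.0)),
                     (2.0, Array(0.0, 1.0)),
                     (3.0, Array(1.0, 1.0)))
    val bcData = sc.broadcast(data)

    // One task per parameter setting; each task trains a complete model
    // locally against the broadcast dataset.
    val stepSizes = Seq(0.001, 0.01, 0.1)
    val models = sc.parallelize(stepSizes, stepSizes.size)
      .map(step => (step, trainLocal(bcData.value, step, 100)))
      .collect()

    models.foreach { case (step, w) =>
      println(s"stepSize=$step -> weights=${w.mkString("[", ", ", "]")}")
    }
    sc.stop()
  }
}

Approach 1 would instead keep a single RDD and carry one weight vector per
parameter setting through each pass over the data, the way KMeans handles
`runs`; the trade-off is one scan amortized over all models versus the
extra per-task memory for the stacked model state.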