[jira] [Comment Edited] (SPARK-21086) CrossValidator, TrainValidationSplit should preserve all models after fitting

yuhao yang (JIRA) Tue, 13 Jun 2017 22:23:22 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-21086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16048647#comment-16048647
 ]


yuhao yang edited comment on SPARK-21086 at 6/14/17 5:22 AM:
-------------------------------------------------------------

Sounds good. For the default path when saving the different models, how about 
we use the flattened parameters as the file name? 
e.g. LogisticRegressionModel-maxIter-100-regParam-0.1
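
As a rough illustration of that naming scheme (a sketch only, not Spark code: the helper name model_file_name and the choice to sort parameters by name are assumptions made here so that the result is deterministic), the file name could be built like this:

```python
def model_file_name(model_name, params):
    """Join a model name and its flattened params into one file name.

    `params` maps parameter names to values. Sorting by name is an
    assumption of this sketch; it keeps the name stable regardless of
    the order in which the params were supplied.
    """
    parts = [model_name]
    for name in sorted(params):
        parts.append(name)
        parts.append(str(params[name]))
    return "-".join(parts)


example_name = model_file_name(
    "LogisticRegressionModel", {"regParam": 0.1, "maxIter": 100}
)
print(example_name)  # LogisticRegressionModel-maxIter-100-regParam-0.1
```

A real implementation would also need to decide how to handle parameter values that are not safe in file names (nested estimators, arrays, path separators); that is left out here.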

Also, I would not implement this with the ML Persistence Framework, simply 
because caching the models in memory would be expensive (and especially 
impractical for driver memory) and would impact existing usage of 
CrossValidator (slower runs or OOM). I would recommend adding an expert param 
and saving the models during training.
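
To make the "save during training" idea concrete, here is a minimal sketch in plain Python. It is not the CrossValidator implementation: fit_and_save_all, the toy fit function, and the use of pickle are all stand-ins chosen for illustration (with Spark models, each one would presumably be written with its own save method instead). The point is only that each model is written out as soon as it is fitted, so at most one model is held in memory at a time.

```python
import os
import pickle
import tempfile


def fit_and_save_all(fit, param_grid, out_dir):
    """Fit one model per param map and persist each one immediately.

    `fit` is a hypothetical callable taking a param dict and returning
    a picklable model. Only the saved paths are returned, never the
    models themselves.
    """
    os.makedirs(out_dir, exist_ok=True)
    saved_paths = []
    for index, params in enumerate(param_grid):
        model = fit(params)
        path = os.path.join(out_dir, "model-%d.pkl" % index)
        with open(path, "wb") as f:
            pickle.dump(model, f)
        saved_paths.append(path)
        del model  # drop the reference before fitting the next one
    return saved_paths


def toy_fit(params):
    # Stand-in for real training: the "model" just records its params.
    return {"params": params}


grid = [{"maxIter": 100, "regParam": 0.1}, {"maxIter": 100, "regParam": 0.01}]
paths = fit_and_save_all(toy_fit, grid, tempfile.mkdtemp())

# Any single model can be reloaded later without the others.
with open(paths[1], "rb") as f:
    reloaded = pickle.load(f)
```

The trade-off is extra I/O during fitting, which is one reason to keep the behaviour behind an expert param that is off by default.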


was (Author: yuhaoyan):
Sounds good. About the default path for saving different models, how about we 
use the flatten parameter as the file name. 
e.g. LogisticRegressionModel-maxIter-100-regParam-0.1

And I would not implement it with the ML Persistence Framework, simply because 
caching the models in memory would be expensive and would impact the existing 
usage of CrossValidator (Slower or OOM). I would recommend adding an expert 
param and save the models during training.

> CrossValidator, TrainValidationSplit should preserve all models after fitting
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-21086
>                 URL: https://issues.apache.org/jira/browse/SPARK-21086
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>    Affects Versions: 2.2.0
>            Reporter: Joseph K. Bradley
>
> I've heard multiple requests for having CrossValidatorModel and 
> TrainValidationSplitModel preserve the full list of fitted models.  This 
> sounds very valuable.
> One decision should be made before we do this: Should we save and load the 
> models in ML persistence?  That could blow up the size of a saved Pipeline if 
> the models are large.
> * I suggest *not* saving the models by default but allowing saving if 
> specified.  We could specify whether to save the model as an extra Param for 
> CrossValidatorModelWriter, but we would have to make sure to expose 
> CrossValidatorModelWriter as a public API and modify the return type of 
> CrossValidatorModel.write to be CrossValidatorModelWriter (but this will not 
> be a breaking change).



