GitHub user WeichenXu123 opened a pull request:

    https://github.com/apache/spark/pull/19208

    [SPARK-21087] [ML] CrossValidator, TrainValidationSplit should preserve all models after fitting: Scala

    ## What changes were proposed in this pull request?
    
    1. We add a parameter that controls whether the full model list is collected during CrossValidator/TrainValidationSplit training (off by default, to avoid this change causing OOM errors).
    
    - Add a method to CrossValidatorModel/TrainValidationSplitModel that lets the user get the model list.
    
    - CrossValidatorModelWriter adds an "option" that lets the user control whether to persist the model list to disk.
    
    - Note: when persisting the model list, use indices as the sub-model path
    
    2. We add a parameter indicating whether to persist models to disk during training (default = off).
    
    - This will use ML persistence to dump models to a directory so they are available later but do not consume memory.
    
    - Note: when persisting the model list, use indices as the sub-model path
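
The notes above say that sub-models are stored under index-based paths. The description does not spell out the exact directory layout, so the following is only a minimal sketch of one way such a layout could look. The `SubModelPaths` object, the `subModels` directory name, and the `fold<i>/<j>` nesting are all assumptions made for illustration and are not taken from the patch.

```scala
import java.nio.file.{Path, Paths}

// Hypothetical helper, not part of the patch: maps sub-model indices to
// directories under an assumed "subModels" root inside the model's save path.
object SubModelPaths {

  // CrossValidator fits one model per (fold, param map) pair, so two indices are needed.
  def crossValidatorPath(root: String, foldIndex: Int, paramIndex: Int): Path =
    Paths.get(root, "subModels", s"fold$foldIndex", paramIndex.toString)

  // TrainValidationSplit fits one model per param map, so a single index suffices.
  def trainValidationSplitPath(root: String, paramIndex: Int): Path =
    Paths.get(root, "subModels", paramIndex.toString)

  // Enumerates the paths of all sub-models for a cross-validation run,
  // ordered by fold and then by param map.
  def allCrossValidatorPaths(root: String, numFolds: Int, numParamMaps: Int): Seq[Path] =
    for {
      fold  <- 0 until numFolds
      param <- 0 until numParamMaps
    } yield crossValidatorPath(root, fold, param)
}
```

Using indices rather than parameter values keeps the paths short and filesystem-safe, and the same indices can be used to look up the matching entry in the estimator's param map array when the models are loaded back.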
    
    
    ## How was this patch tested?
    
    Test cases added.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/WeichenXu123/spark expose-model-list

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19208.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19208
    
----
commit 46d3ab3899c196311368b3383338b3d4e6d5aeaa
Author: WeichenXu <weichen...@databricks.com>
Date:   2017-09-11T13:28:53Z

    init pr

----

