yuhao yang created SPARK-21535:
----------------------------------

             Summary: Reduce memory requirement for CrossValidator and 
TrainValidationSplit 
                 Key: SPARK-21535
                 URL: https://issues.apache.org/jira/browse/SPARK-21535
             Project: Spark
          Issue Type: Improvement
          Components: ML
    Affects Versions: 2.2.0
            Reporter: yuhao yang


CrossValidator and TrainValidationSplit both use 
{code}models = est.fit(trainingDataset, epm) {code} to fit the models, where 
epm is Array[ParamMap].

Even though the training process is sequential, current implementation consumes 
extra driver memory for holding the trained models, which is not necessary and 
often leads to memory exception for both CrossValidator and 
TrainValidationSplit. My proposal is to changing the training implementation to 
train one model at a time, thus that used local model can be collected by GC, 
and avoid the unnecessary OOM exceptions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to