yuhao yang created SPARK-21535: ---------------------------------- Summary: Reduce memory requirement for CrossValidator and TrainValidationSplit Key: SPARK-21535 URL: https://issues.apache.org/jira/browse/SPARK-21535 Project: Spark Issue Type: Improvement Components: ML Affects Versions: 2.2.0 Reporter: yuhao yang
CrossValidator and TrainValidationSplit both use {code}models = est.fit(trainingDataset, epm) {code} to fit the models, where epm is Array[ParamMap]. Even though the training process is sequential, current implementation consumes extra driver memory for holding the trained models, which is not necessary and often leads to memory exception for both CrossValidator and TrainValidationSplit. My proposal is to changing the training implementation to train one model at a time, thus that used local model can be collected by GC, and avoid the unnecessary OOM exceptions. -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org