GitHub user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18733#discussion_r131268294
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
    @@ -112,16 +112,16 @@ class CrossValidator @Since("1.2.0") (@Since("1.4.0") override val uid: String)
           val validationDataset = sparkSession.createDataFrame(validation, schema).cache()
           // multi-model training
           logDebug(s"Train split $splitIndex with multiple sets of parameters.")
    -      val models = est.fit(trainingDataset, epm).asInstanceOf[Seq[Model[_]]]
    -      trainingDataset.unpersist()
           var i = 0
           while (i < numModels) {
    +        val model = est.fit(trainingDataset, epm(i)).asInstanceOf[Model[_]]
             // TODO: duplicate evaluator to take extra params from input
    -        val metric = eval.evaluate(models(i).transform(validationDataset, epm(i)))
    +        val metric = eval.evaluate(model.transform(validationDataset, epm(i)))
             logDebug(s"Got metric $metric for model trained with ${epm(i)}.")
             metrics(i) += metric
             i += 1
           }
    +      trainingDataset.unpersist()
    --- End diff --
    
    Ah, you're right. I was under the wrong impression that validationDataset is always in memory.
    
    Even though validationDataset is only about `1/kfold` of the input data and is used only in `transform`, not in `fit`, I still cannot prove that the new implementation is better in all circumstances.
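
    For a sense of scale, here is a minimal sketch of the per-split row counts, assuming the folds are roughly equal in size. `FoldSizes` and `splitSizes` are hypothetical names used only for illustration and are not part of Spark; the real splits are sampled, so actual counts will vary slightly.

```scala
// Hypothetical helper, not part of Spark: approximate row counts for one
// k-fold split, assuming equal-sized folds (an assumption for illustration).
object FoldSizes {
  /** Returns (trainingRows, validationRows) for one split of `numRows` rows into `k` folds. */
  def splitSizes(numRows: Long, k: Int): (Long, Long) = {
    require(k >= 2, "k-fold cross validation needs at least 2 folds")
    val validationRows = numRows / k            // one fold is held out for validation
    val trainingRows = numRows - validationRows // the remaining k - 1 folds are used for training
    (trainingRows, validationRows)
  }

  def main(args: Array[String]): Unit = {
    val (training, validation) = splitSizes(numRows = 1000000L, k = 3)
    println(s"training=$training validation=$validation")
  }
}
```

    Under that assumption the validation split is about 1/k of the input rows and the training split about (k-1)/k. With the change in the diff above, both cached datasets stay in memory for the whole loop, because `trainingDataset.unpersist()` now runs only after the last model has been evaluated.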
    
    I'll close the PR unless there's a better way to resolve the concern. Thanks.
