Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19208#discussion_r148925715
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
    @@ -117,6 +123,12 @@ class CrossValidator @Since("1.2.0") (@Since("1.4.0") override val uid: String)
         instr.logParams(numFolds, seed, parallelism)
         logTuningParams(instr)
     
    +    val collectSubModelsParam = $(collectSubModels)
    +
    +    var subModels: Option[Array[Array[Model[_]]]] = if (collectSubModelsParam) {
    --- End diff --
    
    I don't follow #1. If we keep all the models (i.e. set `collectSubModelsParam` to true), then the maximum memory cost will be `$(estimatorParamMaps).length * sizeof(model)` in either case, won't it? If we don't keep the models (i.e. set `collectSubModelsParam` to false), then you don't have to collect the future back at the end and there is no additional overhead.
    
    For #2, it's not that mutation impacts performance; it's that it makes the code harder to reason about for no gain (unless I've misunderstood something about #1).
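
To make the two points concrete, here is a minimal sketch of collecting sub-models from futures without a `var`. It is not the code in this PR: `FittedModel`, `fitAndScore`, and `runFold` are hypothetical stand-ins, and plain `scala.concurrent` futures on the global execution context are assumed in place of the PR's own thread pool.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object CollectSubModelsSketch {
  // Hypothetical stand-in for a fitted model; not Spark's Model[_].
  final case class FittedModel(paramIndex: Int)

  // Hypothetical stand-in for fitting one param map and evaluating it.
  def fitAndScore(paramIndex: Int): (FittedModel, Double) =
    (FittedModel(paramIndex), paramIndex * 0.1)

  /** Returns one metric per param map and, only when requested, the fitted models. */
  def runFold(numParamMaps: Int, collectSubModels: Boolean)
      : (Array[Double], Option[Array[FittedModel]]) = {
    val futures = (0 until numParamMaps).map { i =>
      Future {
        val (model, metric) = fitAndScore(i)
        // Keep a reference to the model only when the caller asked for it,
        // so with the flag off each model can be garbage collected once
        // its metric has been computed.
        (metric, if (collectSubModels) Some(model) else None)
      }
    }
    // Block until every future completes, preserving param-map order.
    val results = futures.map(Await.result(_, Duration.Inf))
    val metrics = results.map(_._1).toArray
    // With the flag on, at most numParamMaps models are held at once.
    val subModels =
      if (collectSubModels) Some(results.collect { case (_, Some(m)) => m }.toArray)
      else None
    (metrics, subModels)
  }
}
```

In this sketch the result is built from the futures' return values, so nothing is mutated from inside the futures, and with the flag off no model outlives the future that fitted it.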


