Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16774#discussion_r136868226
  
    --- Diff: 
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
    @@ -100,31 +113,53 @@ class CrossValidator @Since("1.2.0") (@Since("1.4.0") 
override val uid: String)
         val eval = $(evaluator)
         val epm = $(estimatorParamMaps)
         val numModels = epm.length
    -    val metrics = new Array[Double](epm.length)
    +
    +    // Create execution context based on $(parallelism)
    +    val executionContext = getExecutionContext
    --- End diff --
    
In the corresponding PR for the PySpark implementation, the number of threads is limited to the number of models to be trained 
(https://github.com/WeichenXu123/spark/blob/be2f3d0ec50db4730c9e3f9a813a4eb96889f5b6/python/pyspark/ml/tuning.py#L261).
 We might do the same here, for instance by overriding the `getParallelism` method. What do you think about this?
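A minimal sketch of the idea (hypothetical names, not the actual Spark API): cap the execution context's thread count at `min(parallelism, numModels)`, so no more threads are created than there are models to fit. The `fitInParallel` helper and its index-returning stand-in for model training are illustrative only.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

// Hypothetical sketch: never allocate more threads than models to train,
// mirroring the cap used in the PySpark PR referenced above.
def fitInParallel(parallelism: Int, numModels: Int): Seq[Int] = {
  val numThreads = math.min(parallelism, numModels)
  val pool = Executors.newFixedThreadPool(numThreads)
  implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
  try {
    // Stand-in for training one model per param map: each "fit" just
    // returns its index so the result is easy to check.
    val futures = (0 until numModels).map(i => Future { i })
    futures.map(f => Await.result(f, Duration.Inf))
  } finally {
    pool.shutdown()
  }
}
```

With `parallelism = 8` and only 3 models, this creates a 3-thread pool instead of 8, which avoids idle threads when the param grid is small.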

