Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16774#discussion_r136868226

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
    @@ -100,31 +113,53 @@ class CrossValidator @Since("1.2.0") (@Since("1.4.0") override val uid: String)
         val eval = $(evaluator)
         val epm = $(estimatorParamMaps)
         val numModels = epm.length
    -    val metrics = new Array[Double](epm.length)
    +
    +    // Create execution context based on $(parallelism)
    +    val executionContext = getExecutionContext
    --- End diff --

    In the corresponding PR for the PySpark implementation, the number of threads is limited to the number of models to be trained (https://github.com/WeichenXu123/spark/blob/be2f3d0ec50db4730c9e3f9a813a4eb96889f5b6/python/pyspark/ml/tuning.py#L261). We could do the same here, for instance by overriding the `getParallelism` method. What do you think about this?