Repository: spark
Updated Branches:
  refs/heads/master c8b7f97b8 -> d8741b2b0
[SPARK-21911][ML][FOLLOW-UP] Fix doc for parallel ML Tuning in PySpark

## What changes were proposed in this pull request?

Fix doc issue mentioned here:
https://github.com/apache/spark/pull/19122#issuecomment-340111834

## How was this patch tested?

N/A

Author: WeichenXu <weichen...@databricks.com>

Closes #19641 from WeichenXu123/fix_doc.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d8741b2b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d8741b2b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d8741b2b

Branch: refs/heads/master
Commit: d8741b2b0fe8b8da74f120859e969326fb170629
Parents: c8b7f97
Author: WeichenXu <weichen...@databricks.com>
Authored: Mon Nov 13 17:00:51 2017 -0800
Committer: Joseph K. Bradley <jos...@databricks.com>
Committed: Mon Nov 13 17:00:51 2017 -0800

----------------------------------------------------------------------
 docs/ml-tuning.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/d8741b2b/docs/ml-tuning.md
----------------------------------------------------------------------
diff --git a/docs/ml-tuning.md b/docs/ml-tuning.md
index 64dc46c..54d9cd2 100644
--- a/docs/ml-tuning.md
+++ b/docs/ml-tuning.md
@@ -55,7 +55,7 @@ for multiclass problems. The default metric used to choose the best `ParamMap` c
 method in each of these evaluators.
 To help construct the parameter grid, users can use the [`ParamGridBuilder`](api/scala/index.html#org.apache.spark.ml.tuning.ParamGridBuilder) utility.
-By default, sets of parameters from the parameter grid are evaluated in serial. Parameter evaluation can be done in parallel by setting `parallelism` with a value of 2 or more (a value of 1 will be serial) before running model selection with `CrossValidator` or `TrainValidationSplit` (NOTE: this is not yet supported in Python).
+By default, sets of parameters from the parameter grid are evaluated in serial. Parameter evaluation can be done in parallel by setting `parallelism` with a value of 2 or more (a value of 1 will be serial) before running model selection with `CrossValidator` or `TrainValidationSplit`. The value of `parallelism` should be chosen carefully to maximize parallelism without exceeding cluster resources, and larger values may not always lead to improved performance. Generally speaking, a value up to 10 should be sufficient for most clusters.

 # Cross-Validation

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org