GitHub user tengpeng opened a pull request: https://github.com/apache/spark/pull/19660
[SPARK-18755][WIP][ML] Add Randomized Grid Search to Spark ML

## What changes were proposed in this pull request?

Python's scikit-learn provides a randomized grid search (`RandomizedSearchCV`) that reduces the time spent on parameter tuning:

1. If the candidate parameter values are all discrete, it samples without replacement.
2. If at least one candidate parameter is continuous, it samples with replacement.

This patch mimics the behavior of case 1 only. Supporting case 2 would require significant changes to `ParamGridBuilder` and other cross-validation components, which needs more discussion. Thoughts?

Reference: http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html

## How was this patch tested?

Existing tests plus a new unit test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tengpeng/spark CV

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19660.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19660

----
commit 46086202f14185b18351f50c9c09f0641af4bb4f
Author: test <joseph.p...@quetica.com>
Date:   2017-11-05T22:40:51Z

    Initial commit for searchRatio

----
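
For discussion, here is a minimal sketch of the all-discrete case (case 1) in plain Python, not Spark code. It expands the full Cartesian grid and keeps a random subset of distinct combinations. The function name `sample_param_grid` and the `search_ratio` parameter are hypothetical, chosen only to echo the `searchRatio` name in the commit message; they are not the API this patch proposes, and the parameter names in the example grid are illustrative.

```python
import itertools
import math
import random


def sample_param_grid(param_grid, search_ratio, seed=None):
    """Return a random subset of the Cartesian product of a discrete grid.

    param_grid   -- dict mapping parameter name to a list of candidate values
    search_ratio -- fraction of the full grid to keep, in (0, 1] (hypothetical name)
    seed         -- optional seed for reproducible sampling
    """
    if not 0.0 < search_ratio <= 1.0:
        raise ValueError("search_ratio must be in (0, 1]")

    # Expand the full grid, one dict per parameter combination.
    names = sorted(param_grid)
    full_grid = [
        dict(zip(names, values))
        for values in itertools.product(*(param_grid[n] for n in names))
    ]
    if not full_grid:
        return []

    # Keep at least one combination, and never more than the grid holds.
    n_samples = min(len(full_grid), max(1, math.ceil(search_ratio * len(full_grid))))

    # random.Random.sample draws distinct positions from the list.
    rng = random.Random(seed)
    return rng.sample(full_grid, n_samples)


grid = {
    "regParam": [0.01, 0.1, 1.0],
    "maxIter": [10, 50],
    "elasticNetParam": [0.0, 0.5],
}
candidates = sample_param_grid(grid, search_ratio=0.25, seed=42)
```

With 12 combinations in the grid and a ratio of 0.25, `candidates` holds 3 distinct parameter maps, so cross validation would fit a quarter of the models that an exhaustive search requires. Expanding the full grid first is only practical when the grid fits in memory, which is one reason continuous parameters (case 2) need a different design.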