yuhao yang created SPARK-18755:
----------------------------------

             Summary: Add Randomized Grid Search to Spark ML
                 Key: SPARK-18755
                 URL: https://issues.apache.org/jira/browse/SPARK-18755
             Project: Spark
          Issue Type: Improvement
          Components: ML
            Reporter: yuhao yang


Randomized Grid Search implements a randomized search over parameters, where
each setting is sampled from a distribution over possible parameter values.
This has two main benefits over an exhaustive search:
1. A budget can be chosen independently of the number of parameters and
possible values.
2. Adding parameters that do not influence the performance does not decrease 
efficiency.
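A minimal sketch of the sampling described above, written in Python for illustration only. The search space, the sampler helpers (`log_uniform`, `uniform`, `choice`) and the function `random_search_settings` are hypothetical; the keys `regParam`, `elasticNetParam` and `maxIter` merely mirror Spark ML parameter names and are not a proposed API. It shows both benefits: the budget `n_iter` is set by the caller, and an extra parameter does not increase the number of settings.

```python
import math
import random

# Each hypothetical sampler takes a random.Random and returns one drawn value.
def log_uniform(low, high):
    """Log-uniform on [low, high]: suits scale-like params such as regularization."""
    return lambda rng: math.exp(rng.uniform(math.log(low), math.log(high)))

def uniform(low, high):
    """Uniform on [low, high]."""
    return lambda rng: rng.uniform(low, high)

def choice(values):
    """Uniform over a finite set of candidate values."""
    return lambda rng: rng.choice(values)

def random_search_settings(space, n_iter, seed=42):
    """Draw n_iter settings, sampling every parameter independently.

    The budget n_iter is fixed by the caller, so it does not grow with the
    number of parameters or candidate values; a parameter that has no effect
    on the model adds no extra fits."""
    rng = random.Random(seed)
    return [{name: sample(rng) for name, sample in space.items()}
            for _ in range(n_iter)]

# Hypothetical search space; the keys only mirror Spark ML param names.
space = {
    "regParam": log_uniform(1e-4, 1.0),
    "elasticNetParam": uniform(0.0, 1.0),
    "maxIter": choice([50, 100, 200]),
}
settings = random_search_settings(space, n_iter=20)
```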

Randomized grid search usually gives results similar to those of an exhaustive
search, while its run time is drastically lower.
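A back-of-envelope sketch of that trade-off, using hypothetical numbers (4 parameters with 5 candidate values each, 3 folds, a random-search budget of 60). The last function is the standard argument that if each independent uniform draw lands in the best fraction q of the search space with probability q, then n draws hit that region at least once with probability 1 - (1 - q)^n. This says nothing about how close that region is to the true optimum.

```python
import math

def grid_fits(values_per_param, num_folds):
    """Model fits for an exhaustive grid: product of value counts, times folds."""
    return math.prod(values_per_param) * num_folds

def random_fits(budget, num_folds):
    """Model fits for randomized search: fixed budget times folds."""
    return budget * num_folds

def prob_hit_top_fraction(budget, top_fraction):
    """P(at least one of `budget` independent uniform draws lands in the best
    `top_fraction` of the search space)."""
    return 1.0 - (1.0 - top_fraction) ** budget

exhaustive = grid_fits([5, 5, 5, 5], num_folds=3)  # 625 settings * 3 folds = 1875
randomized = random_fits(60, num_folds=3)           # 60 settings * 3 folds = 180
p = prob_hit_top_fraction(60, 0.05)                 # ~0.954
```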

For more background, please refer to:

sklearn: http://scikit-learn.org/stable/modules/grid_search.html
Kaggle blog: http://blog.kaggle.com/2015/07/16/scikit-learn-video-8-efficiently-searching-for-optimal-tuning-parameters/
Bergstra and Bengio, JMLR 2012: http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf
R-bloggers (H2O): https://www.r-bloggers.com/hyperparameter-optimization-in-h2o-grid-search-random-search-and-the-future/

As I see it, there are two ways to implement this in Spark:
1. Add a searchRatio parameter to ParamGridBuilder and perform the sampling
directly in build().
2. Add a trait RandomizedSearch and create new classes RandomizedCrossValidator
and RandomizedTrainValidationSplit.
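A minimal sketch of option 2, in Python for illustration: a hypothetical `RandomizedSearch` mixin stands in for the proposed Scala trait, and a hypothetical `RandomizedCrossValidator` samples settings at fit time. The `evaluate` callable is an assumption standing in for training plus k-fold evaluation; it is not Spark's Evaluator API.

```python
import itertools
import random

class RandomizedSearch:
    """Hypothetical mixin standing in for the proposed RandomizedSearch trait:
    it holds the sampling budget and seed and subsamples candidate settings."""

    def __init__(self, num_samples, seed=0):
        self.num_samples = num_samples
        self.seed = seed

    def sample_param_maps(self, param_maps):
        # Seeded, without replacement; never asks for more than the grid holds.
        rng = random.Random(self.seed)
        return rng.sample(param_maps, min(self.num_samples, len(param_maps)))


class RandomizedCrossValidator(RandomizedSearch):
    """Hypothetical validator that samples settings at fit time. `evaluate`
    stands in for training plus k-fold evaluation and returns a score
    (higher is better)."""

    def __init__(self, estimator_param_maps, num_samples, seed=0):
        super().__init__(num_samples, seed)
        self.estimator_param_maps = estimator_param_maps

    def candidate_param_maps(self):
        return self.sample_param_maps(self.estimator_param_maps)

    def fit(self, evaluate):
        # Only the sampled candidates are evaluated, not the full grid.
        return max(self.candidate_param_maps(), key=evaluate)


grid = [{"regParam": r, "maxIter": m}
        for r, m in itertools.product([0.001, 0.01, 0.1, 1.0], [50, 100, 200])]
cv = RandomizedCrossValidator(grid, num_samples=5, seed=1)
best = cv.fit(lambda pm: -abs(pm["regParam"] - 0.01))  # toy score for illustration
```

In this shape every validator that wants random search has to mix in the trait, which is the extra surface area option 1 avoids.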

I'd prefer option 1, as it is much simpler and more straightforward.
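A minimal sketch of option 1, in Python for illustration. `SampledParamGridBuilder`, `add_grid` and `search_ratio` are hypothetical names modelled on ParamGridBuilder and the proposed searchRatio; the sketch assumes build() enumerates the full grid and then keeps a seeded random fraction of it (rounded up, at least one setting).

```python
import itertools
import math
import random

class SampledParamGridBuilder:
    """Hypothetical stand-in for ParamGridBuilder with a searchRatio:
    build() enumerates the full grid, then keeps a random fraction of it."""

    def __init__(self, search_ratio=1.0, seed=0):
        if not 0.0 < search_ratio <= 1.0:
            raise ValueError("search_ratio must be in (0, 1]")
        self.search_ratio = search_ratio
        self.seed = seed
        self._grid = {}

    def add_grid(self, param, values):
        self._grid[param] = list(values)
        return self  # allow chaining, like ParamGridBuilder.addGrid

    def build(self):
        names = list(self._grid)
        # Full Cartesian product of all candidate values.
        full = [dict(zip(names, combo))
                for combo in itertools.product(*(self._grid[n] for n in names))]
        # Keep ceil(ratio * size) settings, sampled without replacement.
        k = max(1, math.ceil(len(full) * self.search_ratio))
        return random.Random(self.seed).sample(full, k)


builder = (SampledParamGridBuilder(search_ratio=0.25, seed=7)
           .add_grid("regParam", [0.001, 0.01, 0.1, 1.0])
           .add_grid("elasticNetParam", [0.0, 0.5, 1.0])
           .add_grid("maxIter", [50, 100]))
maps = builder.build()  # 24 combinations in the full grid, 6 kept
```

Because the sampling happens inside build(), the output is still a plain list of settings; in Spark it would be an Array[ParamMap] that CrossValidator.setEstimatorParamMaps or TrainValidationSplit.setEstimatorParamMaps accepts unchanged. The trade-off is that it samples from a finite grid rather than from continuous distributions.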




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
