[ 
https://issues.apache.org/jira/browse/SPARK-18755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16247910#comment-16247910
 ] 

Ganesh Sivalingam commented on SPARK-18755:
-------------------------------------------

[~yuhaoyan] No problem, I do have some things to add:

If you have a look at the scikit-learn code base, the {{RandomizedSearchCV}} 
and {{GridSearchCV}} classes are nearly identical; they differ only in how 
they turn the incoming parameter grid or distributions into candidates. 

{{GridSearchCV}} does the same thing as {{ParamGridBuilder.build()}} and 
{{RandomizedSearchCV}} does the equivalent of what 
{{RandomParamGridBuilder.build()}} (which I just submitted) does.

Once the parameter sets have been created, they both use {{BaseSearchCV}} for 
everything else, which does the same job as the current Spark 
{{CrossValidator}} class.
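
To make the split concrete, here is a minimal plain-Python sketch of the idea, 
not the scikit-learn or Spark code. The names {{build_grid}}, {{build_random}} 
and {{search}} are hypothetical, and the parameter names are only examples. 
Only candidate generation differs; the evaluation step is shared.

```python
import itertools
import random


def build_grid(param_values):
    """Exhaustive: every combination of the listed values."""
    names = list(param_values)
    combos = itertools.product(*(param_values[n] for n in names))
    return [dict(zip(names, combo)) for combo in combos]


def build_random(param_samplers, n_iter, seed=None):
    """Randomized: n_iter independent draws, one value per parameter."""
    rng = random.Random(seed)
    return [
        {name: sample(rng) for name, sample in param_samplers.items()}
        for _ in range(n_iter)
    ]


def search(param_maps, score):
    """Shared step: score every candidate and keep the best one."""
    return max(param_maps, key=score)


# Grid: 3 x 2 = 6 candidates, fixed by the listed values.
grid = build_grid({"regParam": [0.01, 0.1, 1.0], "maxIter": [10, 50]})

# Random: the budget (n_iter) is chosen independently of the search space.
rand = build_random(
    {
        "regParam": lambda rng: 10 ** rng.uniform(-2, 0),  # log-uniform draw
        "maxIter": lambda rng: rng.choice([10, 50]),
    },
    n_iter=4,
    seed=42,
)

# Toy score standing in for a cross-validated metric.
best = search(grid, lambda p: -abs(p["regParam"] - 0.1))
```

Both candidate lists go through the same {{search}} call, which is the role 
{{BaseSearchCV}} and {{CrossValidator}} play in the comparison above.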

I could create a {{RandomSearchCrossValidator}} class using the logic in 
{{RandomParamGridBuilder}} if you like. I am also available to help with 
benchmarking.

> Add Randomized Grid Search to Spark ML
> --------------------------------------
>
>                 Key: SPARK-18755
>                 URL: https://issues.apache.org/jira/browse/SPARK-18755
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: yuhao yang
>
> Randomized Grid Search implements a randomized search over parameters, where 
> each setting is sampled from a distribution over possible parameter values. 
> This has two main benefits over an exhaustive search:
> 1. A budget can be chosen independent of the number of parameters and 
> possible values.
> 2. Adding parameters that do not influence the performance does not decrease 
> efficiency.
> Randomized grid search usually gives results similar to an exhaustive search, 
> while its run time is drastically lower.
> For more background, please refer to:
> sklearn: http://scikit-learn.org/stable/modules/grid_search.html
> http://blog.kaggle.com/2015/07/16/scikit-learn-video-8-efficiently-searching-for-optimal-tuning-parameters/
> http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf
> https://www.r-bloggers.com/hyperparameter-optimization-in-h2o-grid-search-random-search-and-the-future/
> As I see it, there are two ways to implement this in Spark:
> 1. Add searchRatio to ParamGridBuilder and conduct sampling directly during 
> build. Only 1 new public function is required.
> 2. Add a trait RandomizedSearch and create new classes RandomizedCrossValidator 
> and RandomizedTrainValidationSplit, which can be complicated since we need to 
> deal with the models.
> I'd prefer option 1 as it's much simpler and more straightforward. We can 
> support randomized grid search with minimal changes.
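
Option 1 in the description above could look roughly like the plain-Python 
sketch below. The description does not define {{searchRatio}}, so this assumes 
it is the fraction of the full grid to keep, sampled without replacement. The 
function name {{build_sampled_grid}} and the {{seed}} argument are hypothetical 
and are not part of {{ParamGridBuilder}}.

```python
import itertools
import random


def build_sampled_grid(param_values, search_ratio, seed=None):
    """Build the full grid, then keep a random fraction of it.

    Assumption: search_ratio is the fraction of grid points to keep,
    with at least one point always retained.
    """
    if not 0.0 < search_ratio <= 1.0:
        raise ValueError("search_ratio must be in (0, 1]")
    names = list(param_values)
    full = [
        dict(zip(names, combo))
        for combo in itertools.product(*(param_values[n] for n in names))
    ]
    k = max(1, round(search_ratio * len(full)))
    # random.sample draws without replacement, so no candidate repeats.
    return random.Random(seed).sample(full, k)


values = {
    "regParam": [0.01, 0.1, 1.0],
    "maxIter": [10, 50],
    "elasticNetParam": [0.0, 1.0],
}
subset = build_sampled_grid(values, search_ratio=0.25, seed=7)
print(len(subset))  # 3 of the 12 combinations
```

One consequence of this reading is that the sampler can only pick values 
already listed in the grid, whereas drawing from distributions can also reach 
values in between.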



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
