[ https://issues.apache.org/jira/browse/SPARK-18755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16195145#comment-16195145 ]

Ilya Matiach commented on SPARK-18755:
--------------------------------------

This is a very interesting issue. I am thinking of implementing a
hyperparameter tuner in the "mmlspark" Spark package:

https://github.com/Azure/mmlspark

The scope might be too large to put this directly into Spark for now.
I'm interested not only in randomized search but in other methods as
well, e.g., Nelder-Mead and KDO. I would also like the API to work on multiple
different learners simultaneously (e.g., logistic regression and GBT Classifier
via one easy API). I'm not quite sure how to design an API that lets the
user supply multiple estimators in the most user-friendly way.
Are there any similar Spark packages out there that do distributed parameter
sweeping?
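To make the multi-learner idea concrete, here is a minimal plain-Python sketch of randomized search over several learners at once. It assumes a hypothetical search-space format (a dict from learner name to per-parameter samplers, where a sampler is either a list of values or a function drawing from a distribution) and a hypothetical helper `random_search_candidates`; neither is part of Spark or mmlspark. The names regParam, elasticNetParam, maxDepth and stepSize are used only as dictionary keys here, and wiring the candidates to actual LogisticRegression or GBTClassifier instances is left out.

```python
import random
from typing import Any, Dict, List, Tuple

# Hypothetical search-space format (not a Spark or mmlspark API):
# learner name -> {param name -> sampler}. A sampler is either a list of
# candidate values (drawn uniformly) or a callable that takes a
# random.Random and returns one value from some distribution.
SearchSpace = Dict[str, Dict[str, Any]]


def draw(sampler: Any, rng: random.Random) -> Any:
    """Draw one value: call the sampler if it is a function, else pick from the list."""
    if callable(sampler):
        return sampler(rng)
    return rng.choice(sampler)


def random_search_candidates(
    space: SearchSpace, n_iter: int, seed: int = 0
) -> List[Tuple[str, Dict[str, Any]]]:
    """Sample n_iter (learner, params) pairs.

    The budget n_iter is fixed up front, independent of how many learners
    or parameters the space contains.
    """
    rng = random.Random(seed)
    learners = sorted(space)  # stable order so a fixed seed is reproducible
    candidates = []
    for _ in range(n_iter):
        learner = rng.choice(learners)
        params = {name: draw(sampler, rng)
                  for name, sampler in sorted(space[learner].items())}
        candidates.append((learner, params))
    return candidates


# Example space; the keys mirror Spark ML param names but are plain strings here.
space: SearchSpace = {
    "logistic_regression": {
        "regParam": lambda r: 10 ** r.uniform(-4, 0),  # log-uniform in [1e-4, 1]
        "elasticNetParam": [0.0, 0.5, 1.0],
    },
    "gbt_classifier": {
        "maxDepth": [3, 5, 7],
        "stepSize": lambda r: r.uniform(0.01, 0.3),
    },
}

for learner, params in random_search_candidates(space, n_iter=5, seed=42):
    print(learner, params)
```

Because the number of draws is fixed up front, the budget does not grow with the number of learners or parameters, which mirrors benefit 1 in the issue description.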


> Add Randomized Grid Search to Spark ML
> --------------------------------------
>
>                 Key: SPARK-18755
>                 URL: https://issues.apache.org/jira/browse/SPARK-18755
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: yuhao yang
>
> Randomized Grid Search implements a randomized search over parameters, where 
> each setting is sampled from a distribution over possible parameter values. 
> This has two main benefits over an exhaustive search:
> 1. A budget can be chosen independent of the number of parameters and 
> possible values.
> 2. Adding parameters that do not influence the performance does not decrease 
> efficiency.
> Randomized grid search usually gives results similar to an exhaustive search,
> while its run time is drastically lower.
> For more background, please refer to:
> sklearn: http://scikit-learn.org/stable/modules/grid_search.html
> http://blog.kaggle.com/2015/07/16/scikit-learn-video-8-efficiently-searching-for-optimal-tuning-parameters/
> http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf
> https://www.r-bloggers.com/hyperparameter-optimization-in-h2o-grid-search-random-search-and-the-future/.
> There are two ways to implement this in Spark, as I see it:
> 1. Add searchRatio to ParamGridBuilder and conduct sampling directly during 
> build. Only 1 new public function is required.
> 2. Add trait RandomizedSearch and create new classes RandomizedCrossValidator 
> and RandomizedTrainValidationSplit, which can be complicated since we need to 
> deal with the models.
> I'd prefer option 1 as it's much simpler and more straightforward. We can
> support randomized grid search with a minimal change.
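Option 1 could look roughly like the following sketch, written in plain Python rather than against the real ParamGridBuilder. `build_grid` and `build_sampled_grid` are hypothetical helpers, and `search_ratio` stands in for the proposed searchRatio parameter; none of them exist in Spark today. The idea is that only the build step changes: the sampled list of parameter maps would be handed to CrossValidator or TrainValidationSplit exactly like a full grid.

```python
import itertools
import math
import random
from typing import Any, Dict, List, Sequence


def build_grid(grid: Dict[str, Sequence[Any]]) -> List[Dict[str, Any]]:
    """Exhaustive grid: one param map per element of the Cartesian product."""
    names = sorted(grid)
    return [dict(zip(names, values))
            for values in itertools.product(*(grid[n] for n in names))]


def build_sampled_grid(
    grid: Dict[str, Sequence[Any]], search_ratio: float, seed: int = 0
) -> List[Dict[str, Any]]:
    """Hypothetical searchRatio: keep a random fraction of the full grid.

    Samples ceil(search_ratio * grid size) param maps without replacement,
    keeping at least one whenever the grid is non-empty.
    """
    if not 0.0 < search_ratio <= 1.0:
        raise ValueError("search_ratio must be in (0, 1]")
    full = build_grid(grid)
    k = min(len(full), max(1, math.ceil(search_ratio * len(full))))
    return random.Random(seed).sample(full, k)


grid = {
    "regParam": [0.001, 0.01, 0.1, 1.0],
    "elasticNetParam": [0.0, 0.5, 1.0],
    "maxIter": [10, 50],
}
sampled = build_sampled_grid(grid, search_ratio=0.25, seed=1)
print(len(build_grid(grid)), len(sampled))  # 24 6
```

One trade-off of this design: it still enumerates the full Cartesian product before sampling, so the memory cost of the build step is unchanged, and it can only draw from the discrete grid values rather than from continuous distributions.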



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
