[jira] [Commented] (SPARK-34415) Use randomization as a possibly better technique than grid search in optimizing hyperparameters

2021-08-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17403608#comment-17403608
 ] 

Apache Spark commented on SPARK-34415:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/33819

> Use randomization as a possibly better technique than grid search in 
> optimizing hyperparameters
> ---
>
> Key: SPARK-34415
> URL: https://issues.apache.org/jira/browse/SPARK-34415
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, MLlib
>Affects Versions: 3.0.1
>Reporter: Phillip Henry
>Assignee: Phillip Henry
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.2.0
>
>
> Randomization can be a more effective techinique than a grid search in 
> finding optimal hyperparameters since min/max points can fall between the 
> grid lines and never be found. Randomisation is not so restricted although 
> the probability of finding minima/maxima is dependent on the number of 
> attempts.
> Alice Zheng has an accessible description on how this technique works at 
> [https://www.oreilly.com/library/view/evaluating-machine-learning/9781492048756/ch04.html]
> (Note that I have a PR for this work outstanding at 
> [https://github.com/apache/spark/pull/31535] )
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34415) Use randomization as a possibly better technique than grid search in optimizing hyperparameters

2021-08-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402436#comment-17402436
 ] 

Apache Spark commented on SPARK-34415:
--

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/33800

> Use randomization as a possibly better technique than grid search in 
> optimizing hyperparameters
> ---
>
> Key: SPARK-34415
> URL: https://issues.apache.org/jira/browse/SPARK-34415
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, MLlib
>Affects Versions: 3.0.1
>Reporter: Phillip Henry
>Assignee: Phillip Henry
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.2.0
>
>
> Randomization can be a more effective techinique than a grid search in 
> finding optimal hyperparameters since min/max points can fall between the 
> grid lines and never be found. Randomisation is not so restricted although 
> the probability of finding minima/maxima is dependent on the number of 
> attempts.
> Alice Zheng has an accessible description on how this technique works at 
> [https://www.oreilly.com/library/view/evaluating-machine-learning/9781492048756/ch04.html]
> (Note that I have a PR for this work outstanding at 
> [https://github.com/apache/spark/pull/31535] )
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34415) Use randomization as a possibly better technique than grid search in optimizing hyperparameters

2021-08-17 Thread Sean R. Owen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17400655#comment-17400655
 ] 

Sean R. Owen commented on SPARK-34415:
--

I agree; I think it was mostly as it makes it simple to extend and reuse the 
param grid builder rather than reimplement a fair bit more code that uses it. 
It isn't as useful as generation random samples each time. 

Hm, on a second look though, couldn't the new class override build() to 
generate a bunch of actually randomly-sampled combinations? that part is easy I 
think, but then the question is, how many combinations to return? that would 
need a new API somewhere.

You could argue this is a bit misleading as the caller may expect it to 
generate random samples not randomly generate the grid. Hm, I'm retroactively 
on the fence about it. Is it worth trying to redesign quickly for 3.2.0? maybe 
a small impl and API change can support what this might be expected to do. 
Leave it? revert?

> Use randomization as a possibly better technique than grid search in 
> optimizing hyperparameters
> ---
>
> Key: SPARK-34415
> URL: https://issues.apache.org/jira/browse/SPARK-34415
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, MLlib
>Affects Versions: 3.0.1
>Reporter: Phillip Henry
>Assignee: Phillip Henry
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.2.0
>
>
> Randomization can be a more effective techinique than a grid search in 
> finding optimal hyperparameters since min/max points can fall between the 
> grid lines and never be found. Randomisation is not so restricted although 
> the probability of finding minima/maxima is dependent on the number of 
> attempts.
> Alice Zheng has an accessible description on how this technique works at 
> [https://www.oreilly.com/library/view/evaluating-machine-learning/9781492048756/ch04.html]
> (Note that I have a PR for this work outstanding at 
> [https://github.com/apache/spark/pull/31535] )
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34415) Use randomization as a possibly better technique than grid search in optimizing hyperparameters

2021-08-17 Thread Xiangrui Meng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17400644#comment-17400644
 ] 

Xiangrui Meng commented on SPARK-34415:
---

[~phenry] [~srowen] The implementation doesn't do uniform sampling of the 
hyper-parameter search space. Instead, it samples per params and then construct 
the cartesian product of all combinations. I think this would significantly 
reduce the effectiveness of the random search. Was it already discussed?

> Use randomization as a possibly better technique than grid search in 
> optimizing hyperparameters
> ---
>
> Key: SPARK-34415
> URL: https://issues.apache.org/jira/browse/SPARK-34415
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, MLlib
>Affects Versions: 3.0.1
>Reporter: Phillip Henry
>Assignee: Phillip Henry
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.2.0
>
>
> Randomization can be a more effective techinique than a grid search in 
> finding optimal hyperparameters since min/max points can fall between the 
> grid lines and never be found. Randomisation is not so restricted although 
> the probability of finding minima/maxima is dependent on the number of 
> attempts.
> Alice Zheng has an accessible description on how this technique works at 
> [https://www.oreilly.com/library/view/evaluating-machine-learning/9781492048756/ch04.html]
> (Note that I have a PR for this work outstanding at 
> [https://github.com/apache/spark/pull/31535] )
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34415) Use randomization as a possibly better technique than grid search in optimizing hyperparameters

2021-02-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17282315#comment-17282315
 ] 

Apache Spark commented on SPARK-34415:
--

User 'PhillHenry' has created a pull request for this issue:
https://github.com/apache/spark/pull/31535

> Use randomization as a possibly better technique than grid search in 
> optimizing hyperparameters
> ---
>
> Key: SPARK-34415
> URL: https://issues.apache.org/jira/browse/SPARK-34415
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, MLlib
>Affects Versions: 3.0.1
>Reporter: Phillip Henry
>Priority: Minor
>  Labels: pull-request-available
>
> Randomization can be a more effective techinique than a grid search in 
> finding optimal hyperparameters since min/max points can fall between the 
> grid lines and never be found. Randomisation is not so restricted although 
> the probability of finding minima/maxima is dependent on the number of 
> attempts.
> Alice Zheng has an accessible description on how this technique works at 
> [https://www.oreilly.com/library/view/evaluating-machine-learning/9781492048756/ch04.html]
> (Note that I have a PR for this work outstanding at 
> [https://github.com/apache/spark/pull/31535] )
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34415) Use randomization as a possibly better technique than grid search in optimizing hyperparameters

2021-02-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17282313#comment-17282313
 ] 

Apache Spark commented on SPARK-34415:
--

User 'PhillHenry' has created a pull request for this issue:
https://github.com/apache/spark/pull/31535

> Use randomization as a possibly better technique than grid search in 
> optimizing hyperparameters
> ---
>
> Key: SPARK-34415
> URL: https://issues.apache.org/jira/browse/SPARK-34415
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, MLlib
>Affects Versions: 3.0.1
>Reporter: Phillip Henry
>Priority: Minor
>  Labels: pull-request-available
>
> Randomization can be a more effective techinique than a grid search in 
> finding optimal hyperparameters since min/max points can fall between the 
> grid lines and never be found. Randomisation is not so restricted although 
> the probability of finding minima/maxima is dependent on the number of 
> attempts.
> Alice Zheng has an accessible description on how this technique works at 
> [https://www.oreilly.com/library/view/evaluating-machine-learning/9781492048756/ch04.html]
> (Note that I have a PR for this work outstanding at 
> [https://github.com/apache/spark/pull/31535] )
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org