Re: Hyperparameter Optimization via Randomization

2021-02-09 Thread Phillip Henry
Hi, Sean.

I've added a comment in the new class suggesting a look at Hyperopt etc. if
the user is using Python.

Anyway, I've created a pull request:

https://github.com/apache/spark/pull/31535

and all tests, style checks, etc. pass. Wish me luck :)

And thanks for the support :)

Phillip



Re: Hyperparameter Optimization via Randomization

2021-02-08 Thread Sean Owen
It seems pretty reasonable to me. If it's a pull request, we can code
review it.
My only question is: would it be better to tell people to use hyperopt, and
how much better is this than implementing randomization on the grid?
But the API change isn't significant, so maybe it's just fine.

Re: Hyperparameter Optimization via Randomization

2021-02-08 Thread Phillip Henry
Hi, Sean.

I don't think sampling from a grid is a good idea, as the minimum/maximum
of the objective may lie between grid points. Unconstrained random sampling
avoids this problem. To that end, I have an implementation at:

https://github.com/apache/spark/compare/master...PhillHenry:master

It is unit-tested and does not change any existing code.
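
To illustrate the difference (a minimal sketch only - this is not the code
in the branch above, and the names are made up):

    import scala.util.Random

    // A grid fixes its candidate values up front; the optimum can fall
    // between them.
    val grid = Array(1e-3, 1e-2, 1e-1)

    // Unconstrained sampling can land anywhere in [min, max].
    def uniformSamples(min: Double, max: Double, n: Int, seed: Long = 42L): Array[Double] = {
      val rng = new Random(seed)
      Array.fill(n)(min + (max - min) * rng.nextDouble())
    }

    val candidates = uniformSamples(1e-3, 1e-1, n = 5)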

I totally get what you mean about Hyperopt, but this is a pure-JVM solution
that's fairly straightforward.

Is it worth contributing?

Thanks,

Phillip





Re: Hyperparameter Optimization via Randomization

2021-01-30 Thread Sean Owen
I was thinking ParamGridBuilder would have to change to accommodate a
continuous range of values, and that's not hard, though other code wouldn't
understand that type of value, like the existing simple grid builder.
It's all possible; I'm just wondering if simply randomly sampling the grid
is enough. That would be a simpler change - just a new method or argument.
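Something as simple as this might be enough (a sketch under the assumption
that we sample a fraction of an already-built grid; sampleGrid is not an
existing API):

    import scala.util.Random
    import org.apache.spark.ml.param.ParamMap

    // Hypothetical helper: keep a random fraction of the grid's combinations.
    def sampleGrid(grid: Array[ParamMap], fraction: Double, seed: Long = 0L): Array[ParamMap] = {
      val rng = new Random(seed)
      rng.shuffle(grid.toSeq).take((grid.length * fraction).ceil.toInt).toArray
    }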

Yes, part of it is that if you really want to search continuous spaces,
hyperopt is probably even better, so how much do you want to put into
PySpark - something really simple, sure.
It's not out of the question to do something more complex if it turns out
to also be pretty simple.

Re: Hyperparameter Optimization via Randomization

2021-01-30 Thread Phillip Henry
Hi, Sean.

Perhaps I don't understand. As I see it, ParamGridBuilder builds an
Array[ParamMap]. What I am proposing is a new class that also builds an
Array[ParamMap] via its build() method, so there would be no "change in the
APIs". This new class would, of course, have methods that define the search
space (log, linear, etc.) over which random values are chosen.
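A rough sketch of the kind of class I have in mind (the class name and
methods here are hypothetical, not a final API):

    import scala.util.Random
    import org.apache.spark.ml.param.{DoubleParam, ParamMap}

    class RandomSearchBuilder(n: Int, seed: Long = 0L) {
      private val rng = new Random(seed)
      private var draws = List.empty[ParamMap => ParamMap]

      // Sample uniformly on a linear scale.
      def addLinear(p: DoubleParam, min: Double, max: Double): this.type = {
        val draw: ParamMap => ParamMap =
          m => m.put(p, min + (max - min) * rng.nextDouble())
        draws ::= draw
        this
      }

      // Sample uniformly on a log10 scale, e.g. for regularization parameters.
      def addLog10(p: DoubleParam, min: Double, max: Double): this.type = {
        val (lo, hi) = (math.log10(min), math.log10(max))
        val draw: ParamMap => ParamMap =
          m => m.put(p, math.pow(10, lo + (hi - lo) * rng.nextDouble()))
        draws ::= draw
        this
      }

      // Like ParamGridBuilder.build(): one ParamMap per random draw.
      def build(): Array[ParamMap] =
        Array.fill(n)(draws.foldLeft(ParamMap.empty)((m, draw) => draw(m)))
    }

Usage would then mirror ParamGridBuilder, e.g. for a LogisticRegression lr:

    val maps = new RandomSearchBuilder(n = 20)
      .addLog10(lr.regParam, 1e-7, 1e-4)
      .addLinear(lr.elasticNetParam, 0.0, 1.0)
      .build()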

Now, if this is too trivial to warrant the work and people prefer Hyperopt,
then so be it. It might be useful for people not using Python, but they can
just roll their own, I guess.

Anyway, looking forward to hearing what you think.

Regards,

Phillip



Re: Hyperparameter Optimization via Randomization

2021-01-29 Thread Sean Owen
I think that's a bit orthogonal - right now you can't specify continuous
spaces. The straightforward thing is to allow random sampling from a big
grid. You can create a geometric series of values to try, of course -
0.001, 0.01, 0.1, etc.
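For instance, a quick Scala sketch of such a series:

    // Geometric series 0.001, 0.01, 0.1 - evenly spaced in log space.
    val candidates = (-3 to -1).map(e => math.pow(10, e))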
Yes, I get that if you're randomly choosing, you can randomly choose from a
continuous space of many kinds. I don't know if it helps a lot vs the
change in APIs (and continuous spaces don't make as much sense for grid
search).
Of course, it helps a lot if you're doing a smarter search over the space,
like what hyperopt does. For that, I mean, one can just use hyperopt +
Spark ML already if desired.

Re: Hyperparameter Optimization via Randomization

2021-01-29 Thread Phillip Henry
Thanks, Sean! I hope to offer a PR next week.

I'm not sure about a dependency on the grid search, though - but I'm happy
to hear your thoughts. I mean, you might want to explore logarithmic space
evenly. For example, something like "please search 1e-7 to 1e-4" could lead
to a random sample such as {3e-7, 2e-6, 9e-5}. These values are (roughly)
evenly spaced in logarithmic space but not in linear space. So, saying what
fraction of a grid search to sample wouldn't make sense (unless the grid
were warped, of course).
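
A sketch of what I mean (the helper is made up, just to illustrate):

    import scala.util.Random

    // Sample uniformly in log10 space over [min, max].
    val rng = new Random(0L)
    def logUniform(min: Double, max: Double): Double = {
      val (lo, hi) = (math.log10(min), math.log10(max))
      math.pow(10, lo + (hi - lo) * rng.nextDouble())
    }

    // Three draws over [1e-7, 1e-4] might give values like 3e-7, 2e-6, 9e-5.
    val sample = Array.fill(3)(logUniform(1e-7, 1e-4))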

Does that make sense? It might be better for me to just write the code, as
I don't think it would be very complicated.

Happy to hear your thoughts.

Phillip



Re: Hyperparameter Optimization via Randomization

2021-01-29 Thread Sean Owen
I don't know of anyone working on that. Yes, I think it could be useful. I
think it might be easiest to implement by simply having some parameter to
the grid search process that says what fraction of all possible
combinations you want to randomly test.

Hyperparameter Optimization via Randomization

2021-01-29 Thread Phillip Henry
Hi,

I have no work at the moment, so I was wondering if anybody would be
interested in me contributing code that generates an Array[ParamMap] of
random hyperparameters?

Apparently, this technique can find a hyperparameter configuration in the
top 5% of parameter space in fewer than 60 iterations, with 95% confidence [1].
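
For what it's worth, the arithmetic behind that claim: each independent
random draw lands in the top 5% with probability 0.05, so all n draws miss
with probability 0.95^n. Requiring 1 - 0.95^n >= 0.95 gives
n >= ln(0.05)/ln(0.95), i.e. 59 draws. A quick check in Scala:

    // Smallest n with a 95% chance that at least one draw is in the top 5%.
    val n = math.ceil(math.log(0.05) / math.log(0.95)).toInt // 59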

I notice that the Spark code base has only the brute-force ParamGridBuilder,
unless I am missing something.

Hyperparameter optimization is an area of interest to me, but I don't want
to re-invent the wheel. So, if this work is already underway or there are
libraries out there that do it, please let me know and I'll shut up :)

Regards,

Phillip

[1]
https://www.oreilly.com/library/view/evaluating-machine-learning/9781492048756/ch04.html