2011/12/6 Gael Varoquaux <gael.varoqu...@normalesup.org>:
> On Mon, Dec 05, 2011 at 01:41:53PM -0500, Alexandre Passos wrote:
>> On Mon, Dec 5, 2011 at 13:31, James Bergstra
>> <james.bergs...@gmail.com> wrote:
>> > I should probably not have scared ppl off speaking of a 250-job
>> > budget. My intuition would be that with 2-8 hyper-parameters, and 1-3
>> > "significant" hyper-parameters, randomly sampling around 10-30 points
>> > should be pretty reliable.
>
>> So perhaps the best implementation of this is to first generate a grid
>> (from the usual arguments to sklearn's GridSearch), randomly sort it,
>> and iterate over these points until the budget is exhausted?
>
> Does sound reasonable.
>
> When doing grid searches, I find that an important aspect is that some
> grid points take a fraction of the time of others. This is actually a big
> motivation for doing things in parallel: with enough CPUs (8), the time of
> a grid search can be fully limited by the time of computing the fit for
> the different folds on only one grid point.
>
> Thus the notion of budget is relevant, but the right budget is not
> exactly the number of fit points computed.
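For concreteness, here is a minimal sketch of the shuffle-and-truncate search Alexandre proposes above, written against present-day scikit-learn APIs (ParameterGrid and cross_val_score); the estimator, grid values, and budget are purely illustrative:

    import numpy as np
    from sklearn.base import clone
    from sklearn.datasets import load_iris
    from sklearn.model_selection import ParameterGrid, cross_val_score
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    # The usual GridSearch arguments (illustrative values).
    param_grid = {"C": [0.1, 1.0, 10.0, 100.0],
                  "gamma": [1e-3, 1e-2, 1e-1, 1.0]}

    # Enumerate the full grid, then randomly sort it.
    candidates = list(ParameterGrid(param_grid))
    rng = np.random.RandomState(0)
    rng.shuffle(candidates)

    budget = 10  # number of grid points we can afford to evaluate
    best_score, best_params = -np.inf, None
    for params in candidates[:budget]:
        estimator = clone(SVC()).set_params(**params)
        score = cross_val_score(estimator, X, y, cv=3).mean()
        if score > best_score:
            best_score, best_params = score, params

    print(best_params, best_score)

Present-day scikit-learn ships a sampled variant of this idea as RandomizedSearchCV.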
Gael's point about uneven fit times is very true, and I think it would be a great area of future work for James's next papers: train two Gaussian processes, one to estimate the expected cross-validation error and the other to estimate the expected runtime (CPU cost). Then build a decision function that selects the next points to explore from the estimated Pareto-optimal front of those two objectives (low cross-validation error, low CPU cost). Intuitively this would amount to using a proxy to the uncomputable yet universal Solomonoff prior as a regularizer, which sounds like a good thing to do (Epicurus, Occam, and Bayes would all agree to work that way if they had access to MacBook Pros :). See: http://www.scholarpedia.org/article/Algorithmic_probability

Five years ago, I think the state of the art for multi-objective optimization was evolutionary algorithms such as the Non-dominated Sorting Genetic Algorithm II (NSGA-II) and the Strength Pareto Evolutionary Algorithm 2 (SPEA-2); it might have improved since. There are interesting links here: http://en.wikipedia.org/wiki/Multi-objective_optimization . A nice feature of EAs is that they are embarrassingly parallelizable (hence cloud-ready :).

From a more practical standpoint, one could also define a scalar utility function that combines the two components (cross-validation error and CPU cost) into a single objective and select the next points by minimizing that (see the sketch below).

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
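A hedged sketch of both selection rules, assuming present-day scikit-learn (GaussianProcessRegressor postdates this thread) and purely illustrative toy observations in place of a real search history of (hyper-parameters, CV error, fit time) triples:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor

    rng = np.random.RandomState(0)

    # Hyper-parameter points already evaluated (toy: 2 hyper-parameters,
    # 15 completed fits); in practice these come from the search history.
    observed_points = rng.uniform(0, 1, size=(15, 2))
    observed_error = rng.uniform(0.1, 0.5, size=15)  # mean CV error (toy)
    observed_cost = rng.uniform(1.0, 60.0, size=15)  # fit time, seconds (toy)

    # One surrogate model per objective.
    gp_error = GaussianProcessRegressor().fit(observed_points, observed_error)
    gp_cost = GaussianProcessRegressor().fit(observed_points, observed_cost)

    # Score a fresh batch of random candidates with both surrogates.
    candidates = rng.uniform(0, 1, size=(200, 2))
    pred_error = gp_error.predict(candidates)
    pred_cost = gp_cost.predict(candidates)

    def pareto_front(error, cost):
        # Indices of candidates not dominated on (error, cost): no other
        # candidate is at least as good on both objectives and strictly
        # better on at least one.
        keep = []
        for i in range(len(error)):
            dominated = np.any(
                (error <= error[i]) & (cost <= cost[i])
                & ((error < error[i]) | (cost < cost[i])))
            if not dominated:
                keep.append(i)
        return np.asarray(keep)

    # Pareto rule: explore the estimated non-dominated candidates next.
    next_points = candidates[pareto_front(pred_error, pred_cost)]

    # Scalar rule: collapse both objectives with a trade-off weight
    # (illustrative value) and take the single minimizer.
    lam = 0.01
    best_next = candidates[np.argmin(pred_error + lam * pred_cost)]

The quadratic front computation is fine for a few hundred candidates; NSGA-II-style non-dominated sorting would be the scalable route.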