2011/12/6 Gael Varoquaux <gael.varoqu...@normalesup.org>:
> On Mon, Dec 05, 2011 at 01:41:53PM -0500, Alexandre Passos wrote:
>> On Mon, Dec 5, 2011 at 13:31, James Bergstra
>> <james.bergs...@gmail.com> wrote:
>> > I should probably not have scared ppl off speaking of a 250-job
>> > budget. My intuition would be that with 2-8 hyper-parameters, and 1-3
>> > "significant" hyper-parameters, randomly sampling around 10-30 points
>> > should be pretty reliable.
>
>> So perhaps the best implementation of this is to first generate a grid
>> (from the usual arguments to sklearn's GridSearch), randomly sort it,
>> and iterate over these points until the budget is exhausted?
>
> Does sound reasonable.
>
> When doing grid searches, I find that an important aspect is that some
> grid points take a fraction of the time of others. This is actually a big
> motivation for doing things in parallel: with enough CPUs (8), the time of
> a grid search can be fully limited by the time of computing the fit for
> the different folds on only one grid point.
>
> Thus the notion of budget is relevant, but the right budget is not
> exactly the number of fit points computed.
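For concreteness, here is a minimal sketch of the shuffle-and-truncate search Alexandre proposes above, written against present-day scikit-learn APIs (ParameterGrid and cross_val_score); the estimator, grid values, and budget are purely illustrative:

    import numpy as np
    from sklearn.base import clone
    from sklearn.datasets import load_iris
    from sklearn.model_selection import ParameterGrid, cross_val_score
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    # The usual GridSearch arguments (illustrative values).
    param_grid = {"C": [0.1, 1.0, 10.0, 100.0],
                  "gamma": [1e-3, 1e-2, 1e-1, 1.0]}

    # Enumerate the full grid, then randomly sort it.
    candidates = list(ParameterGrid(param_grid))
    rng = np.random.RandomState(0)
    rng.shuffle(candidates)

    budget = 10  # number of grid points we can afford to evaluate
    best_score, best_params = -np.inf, None
    for params in candidates[:budget]:
        estimator = clone(SVC()).set_params(**params)
        score = cross_val_score(estimator, X, y, cv=3).mean()
        if score > best_score:
            best_score, best_params = score, params

    print(best_params, best_score)

Present-day scikit-learn ships a sampled variant of this idea as RandomizedSearchCV.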
Gael's point about uneven fit times is very true, and I think it would be a great area of future work for James's next papers: train two Gaussian processes, one to estimate the expected cross-validation error and the other to estimate the expected runtime (CPU cost). Then build a decision function that selects the next points to explore from the estimated Pareto-optimal front of those two objectives (low cross-validation error, low CPU cost). Intuitively this would amount to using a proxy to the uncomputable yet universal Solomonoff prior as a regularizer, which sounds like a good thing to do (Epicurus, Occam, and Bayes would all agree to work that way if they had access to MacBook Pros :). See: http://www.scholarpedia.org/article/Algorithmic_probability

Five years ago, I think the state of the art for multi-objective optimization was evolutionary algorithms such as the Non-dominated Sorting Genetic Algorithm II (NSGA-II) and the Strength Pareto Evolutionary Algorithm 2 (SPEA-2); it might have improved since. There are interesting links here: http://en.wikipedia.org/wiki/Multi-objective_optimization . A nice feature of EAs is that they are embarrassingly parallelizable (hence cloud-ready :).

From a more practical standpoint, one could also define a scalar utility function that combines the two components (cross-validation error and CPU cost) into a single objective and select the next points by minimizing that (see the sketch below).

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
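A hedged sketch of both selection rules, assuming present-day scikit-learn (GaussianProcessRegressor postdates this thread) and purely illustrative toy observations in place of a real search history of (hyper-parameters, CV error, fit time) triples:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor

    rng = np.random.RandomState(0)

    # Hyper-parameter points already evaluated (toy: 2 hyper-parameters,
    # 15 completed fits); in practice these come from the search history.
    observed_points = rng.uniform(0, 1, size=(15, 2))
    observed_error = rng.uniform(0.1, 0.5, size=15)  # mean CV error (toy)
    observed_cost = rng.uniform(1.0, 60.0, size=15)  # fit time, seconds (toy)

    # One surrogate model per objective.
    gp_error = GaussianProcessRegressor().fit(observed_points, observed_error)
    gp_cost = GaussianProcessRegressor().fit(observed_points, observed_cost)

    # Score a fresh batch of random candidates with both surrogates.
    candidates = rng.uniform(0, 1, size=(200, 2))
    pred_error = gp_error.predict(candidates)
    pred_cost = gp_cost.predict(candidates)

    def pareto_front(error, cost):
        # Indices of candidates not dominated on (error, cost): no other
        # candidate is at least as good on both objectives and strictly
        # better on at least one.
        keep = []
        for i in range(len(error)):
            dominated = np.any(
                (error <= error[i]) & (cost <= cost[i])
                & ((error < error[i]) | (cost < cost[i])))
            if not dominated:
                keep.append(i)
        return np.asarray(keep)

    # Pareto rule: explore the estimated non-dominated candidates next.
    next_points = candidates[pareto_front(pred_error, pred_cost)]

    # Scalar rule: collapse both objectives with a trade-off weight
    # (illustrative value) and take the single minimizer.
    lam = 0.01
    best_next = candidates[np.argmin(pred_error + lam * pred_cost)]

The quadratic front computation is fine for a few hundred candidates; NSGA-II-style non-dominated sorting would be the scalable route.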