On Tue, Dec 6, 2011 at 4:09 AM, Olivier Grisel <olivier.gri...@ensta.org> wrote:
> 2011/12/6 Gael Varoquaux <gael.varoqu...@normalesup.org>:
>> On Mon, Dec 05, 2011 at 01:41:53PM -0500, Alexandre Passos wrote:
>>> On Mon, Dec 5, 2011 at 13:31, James Bergstra <james.bergs...@gmail.com> wrote:
>>> > I should probably not have scared people off by speaking of a 250-job
>>> > budget. My intuition would be that with 2-8 hyper-parameters, and 1-3
>>> > "significant" hyper-parameters, randomly sampling around 10-30 points
>>> > should be pretty reliable.
>>
>>> So perhaps the best implementation of this is to first generate a grid
>>> (from the usual arguments to sklearn's GridSearch), randomly sort it,
>>> and iterate over these points until the budget is exhausted?
>>
>> Sounds reasonable.
>>
>> When doing grid searches, I find that an important aspect is that some
>> grid points take a fraction of the time of others. This is actually a big
>> motivation for doing things in parallel: with enough CPUs (8), the time of
>> a grid search can be fully limited by the time of computing the fit for
>> the different folds on only one grid point.
>>
>> Thus the notion of budget is relevant, but the right budget is not
>> exactly the number of fit points computed.
>
> This is very true, and I think that would be a great area of future
> work for James's next papers: train 2 Gaussian processes, one to
> estimate the expected cross-validation error and the other to estimate
> the expected runtime (CPU cost).
>
> Then build a decision function that selects the next points to explore
> from the estimated Pareto optimal front of those two objectives (low
> cross-validation error, low CPU cost).
>
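A minimal sketch of Alexandre's "shuffle the grid, stop when the budget runs out" idea, in plain Python rather than through sklearn's GridSearch API. The names `shuffled_grid`, `budgeted_search` and `pareto_front` are hypothetical. Following Gael's point, the budget can be either a number of fits or a wall-clock limit in seconds. The Pareto helper only filters already-measured (error, runtime) pairs; it does not implement the two Gaussian-process models Olivier describes.

```python
import itertools
import random
import time


def shuffled_grid(param_grid, seed=None):
    """Expand {name: list_of_values} into every combination, in random order."""
    names = sorted(param_grid)
    points = [dict(zip(names, values))
              for values in itertools.product(*(param_grid[n] for n in names))]
    random.Random(seed).shuffle(points)  # reproducible order for a given seed
    return points


def budgeted_search(evaluate, param_grid, max_fits=None, max_seconds=None, seed=None):
    """Evaluate grid points in random order until a budget is exhausted.

    evaluate(params) is assumed to return a validation error (lower is better).
    max_fits caps the number of evaluated points; max_seconds caps wall-clock
    time, checked between points, so the last fit may overshoot the limit.
    Returns a list of (params, error, seconds) in evaluation order.
    """
    results = []
    start = time.perf_counter()
    for params in shuffled_grid(param_grid, seed):
        if max_fits is not None and len(results) >= max_fits:
            break
        if max_seconds is not None and time.perf_counter() - start >= max_seconds:
            break
        t0 = time.perf_counter()
        error = evaluate(params)
        results.append((params, error, time.perf_counter() - t0))
    return results


def pareto_front(results):
    """Keep entries not dominated in (error, seconds): no other entry is at
    least as good on both objectives and strictly better on one."""
    front = []
    for i, (params, err, secs) in enumerate(results):
        dominated = any(
            e2 <= err and s2 <= secs and (e2 < err or s2 < secs)
            for j, (_, e2, s2) in enumerate(results) if j != i
        )
        if not dominated:
            front.append((params, err, secs))
    return front
```

For example, with a hypothetical toy objective standing in for a cross-validated fit, `budgeted_search(lambda p: abs(p["C"] - 10), {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1]}, max_fits=5, seed=0)` evaluates 5 of the 12 grid points in a random but reproducible order.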
You got me, Olivier! I've definitely been thinking about this. Nothing to
report so far, though. I suspect there may be some subtleties about how to
go about it, but I haven't tried much.

- James

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general