On Thu, Nov 10, 2011 at 09:20:19AM -0500, Satrajit Ghosh wrote:
> is there a way to implement a coarse->fine grid search with a pipeline?
> essentially i want to link up two or more gridsearchcv modules where the
> range of parameters for a given search is set around the best parameters
> estimated by the previous gridsearchcv.
I don't really see how this would be done with a pipeline. I have recently used such a strategy in GraphLassoCV, where it was necessary due to points of the grid where the estimator does not converge. I think a new object would be necessary, something like 'RefinedGridSearchCV', that would either derive from GridSearchCV or share a common base class.

I think that it might be an interesting addition. I say 'might' because I have given such ideas a try on general problems, and they often do not work well: the score as a function of the parameters is often a nasty landscape. First, it has a lot of noise. Second, it may have multiple local maxima. Third, it often has plateaus, and simplex-like optimizers get stuck there. There is probably a fair amount of work needed to implement something like this that actually works on more than a handful of estimators. This is the kind of project that should live in a long-running experimental branch before we consider merging it.

As a side note, part of the code could be factored out of the GraphLassoCV code, even though I coded specificities of the GraphLasso problem in there, such as where the plateau is and what to do when estimators do not converge.

Note that the new object would not implement the same contract as GridSearchCV, since GridSearchCV is usable on parameters that are not defined on a compact space (integers, or categorical variables like a kernel name: {'rbf', 'linear'}).

> also when i use grid_search in a permutation test setting, i have to reset
> the number of jobs to 1, because i set the number of permutation jobs to
> -1. would this be something that could be useful to do automatically?

I am a bit wary of the kind of magic that may arise here. The reason why we cannot have nested joblib.Parallel calls is that a daemonic process cannot fork under Unix. Basically, we could, in joblib, detect the fact that we are in a daemonic process, and set n_jobs to 1 after a warning.
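A minimal sketch of what such a fallback could look like (effective_n_jobs is a hypothetical helper name here, not an actual joblib function):

```python
import warnings
from multiprocessing import current_process

def effective_n_jobs(n_jobs):
    # Hypothetical helper, not part of joblib: a daemonic worker
    # (e.g. one spawned by an outer Parallel call) cannot fork
    # children of its own, so fall back to sequential execution.
    if n_jobs != 1 and current_process().daemon:
        warnings.warn("running in a daemonic process; "
                      "falling back to n_jobs=1")
        return 1
    return n_jobs

# In the main (non-daemonic) process the requested value passes through.
print(effective_n_jobs(-1))  # -> -1
```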
AFAIK, this can be tested with the following code (current_process() is the public accessor; the private multiprocessing._current_process._daemonic attribute should not be relied upon):

    from multiprocessing import current_process
    assert not current_process().daemon

If you want to do a pull request in joblib implementing this, I would take it.

> does joblib have any load balancing capability?

No. This is quite challenging and, unless someone finds a simple way of implementing it, beyond the scope of joblib, I believe.

Gaël

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
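PS — to make the coarse->fine refinement loop from the first question concrete, here is a toy sketch in pure Python. The name refine_1d is made up for illustration, this is nothing like a real RefinedGridSearchCV, and it inherits all the caveats discussed above (noise, multiple local maxima, plateaus):

```python
def refine_1d(score, lo, hi, n_points=11, n_rounds=3, shrink=0.5):
    """Evaluate `score` on a grid over [lo, hi], then repeatedly
    re-centre a narrower grid on the best point found so far."""
    best = lo
    for _ in range(n_rounds):
        step = (hi - lo) / (n_points - 1)
        grid = [lo + i * step for i in range(n_points)]
        best = max(grid, key=score)          # coarse pick on this round
        width = (hi - lo) * shrink           # shrink the search window
        lo, hi = best - width / 2.0, best + width / 2.0
    return best

# Toy score with a single smooth maximum at x = 0.7; each round
# narrows the grid around the previous best point.
best = refine_1d(lambda x: -(x - 0.7) ** 2, 0.0, 2.0)
```

On a smooth unimodal score like this one the loop converges quickly; on the noisy, multi-modal landscapes mentioned above it can lock onto the wrong basin in the first round, which is exactly why this needs careful experimentation.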