On Thu, Nov 10, 2011 at 09:20:19AM -0500, Satrajit Ghosh wrote:
> is there a way to implement a coarse->fine grid search with a pipeline?
> essentially i want to link up two or more gridsearchcv modules where the
> range of parameters for a given search is set around the best parameters
> estimated by the previous gridsearchcv.

I don't really see how this would be done. I have recently used such a
strategy in GraphLassoCV, where it was necessary due to points of the
grid where the estimator does not converge. 

I think a new object would be necessary. Something like
'RefinedGridSearchCV', that would either derive from GridSearchCV, or
from a common base class.

I think that it might be an interesting addition. I say 'might' because I
have given such ideas a try on general problems, and they often do not
work well: the score as a function of the parameters is often a nasty
landscape. First, it has a lot of noise. Second, it may have multiple
local maxima. Third, it often has plateaus, and simplex-like optimizers
get stuck there. There is probably a fair amount of work to implement
something like this that actually works on more than a handful of
estimators. This is the kind of project that should live in a
long-running experimental branch before we consider merging it.
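To make the coarse->fine idea concrete, here is a minimal sketch of the
refinement loop, using a toy scoring function in place of a real
cross-validated estimator (the function `refined_search` and the toy
`score` are hypothetical names, not anything in scikit-learn; in practice
each grid evaluation would be a cross-validation fit):

```python
import numpy as np

def score(C):
    # Toy stand-in for a cross-validated score, peaked around C = 3.0.
    # Purely illustrative: real score landscapes are noisy and may have
    # several maxima, which is exactly why this strategy can fail.
    return -(np.log10(C) - np.log10(3.0)) ** 2

def refined_search(score, low, high, n_points=5, n_stages=2):
    """Coarse->fine search on a log-spaced 1-D grid.

    At each stage, evaluate `score` on a log-spaced grid, then re-center
    a narrower grid between the neighbors of the best point.
    """
    best = None
    for stage in range(n_stages):
        grid = np.logspace(np.log10(low), np.log10(high), n_points)
        scores = [score(C) for C in grid]
        i = int(np.argmax(scores))
        best = grid[i]
        # Shrink the search interval to the neighbors of the best point.
        low = grid[max(i - 1, 0)]
        high = grid[min(i + 1, n_points - 1)]
    return best

best_C = refined_search(score, 1e-3, 1e3)
```

Note that this only makes sense when a notion of "between two grid
points" exists, which is the contract issue mentioned below.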

As a side note, part of the code could be factored out from the
GraphLassoCV code, even though I hard-coded specificities of the
GraphLasso problem in there, such as where the plateau is, and what to do
when the estimator does not converge.

Note that the new object would not implement the same contract as
GridSearchCV, as GridSearchCV is usable on parameters that are not
defined on a compact space (integers, or categorical variables such as a
kernel name: {'rbf', 'linear'}).

> also when i use grid_search in a permutation test setting, i have to reset
> the number of jobs to 1, because i set the number of permutation jobs to
> -1. would this be something that could be useful to do automatically?

I am a bit wary of the kind of magic that may arise here. The reason why
we cannot have nested joblib.Parallel is that daemonic processes cannot
fork under Unix. Basically, we could, in joblib, detect that we are
running in a daemonic process, and set n_jobs to 1 after a warning.
AFAIK, this can be tested with the following code:

    from multiprocessing import current_process
    assert not current_process().daemon

If you want to do a pull request in joblib implementing this, I would
take it.
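The guard could look something like the sketch below (`effective_n_jobs`
is a hypothetical helper name, not an existing joblib function; this is
just the shape such a patch might take):

```python
import warnings
from multiprocessing import current_process

def effective_n_jobs(n_jobs):
    # Hypothetical guard joblib could apply internally: a daemonic
    # worker process cannot fork new children under Unix, so fall back
    # to sequential execution with a warning instead of crashing.
    if n_jobs != 1 and current_process().daemon:
        warnings.warn("Nested parallelism requested from a daemonic "
                      "process; falling back to n_jobs=1")
        return 1
    return n_jobs
```

Called from the main process, this leaves n_jobs untouched; called from
inside a multiprocessing worker, it degrades to sequential execution.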

> does joblib have any load balancing capability?

No. This is quite challenging, and, unless someone finds a simple way of
implementing it, beyond the scope of joblib, I believe.

Gaël

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
