Re: [Scikit-learn-general] Generalised warm start / parameter search

Joel Nothman Mon, 20 May 2013 05:47:41 -0700

On Mon, May 20, 2013 at 8:23 PM, Andreas Mueller
<[email protected]> wrote:

>  On 05/20/2013 05:20 AM, Joel Nothman wrote:
>
> I couldn't help but work on it, it seems.
>
> The Pipeline's refit is trivial given that all sub-estimators have a refit
> that will do nothing if certain parameters are not passed (and in case not
> all have set their noop_params, we can explicitly only refit from the first
> step where a parameter is changed).
>
>  By trivial, I mean just take the current fit(_transform) implementation
> and replace fit(_transform) with refit(_transform), passing each step its
> parameters.
>
>  Its _plan_refits is not as trivial. First, let us assume no steps are
> set as parameters <https://github.com/scikit-learn/scikit-learn/pull/1769>.
> Consider the parameters as a tree, like that attached, with each layer
> corresponding to a step and the parameters set there. (The tree attached
> corresponds to a grid, but it needn't be so balanced.) If you reorder the
> children of each node using the relevant _plan_refits (perhaps with
> memoization for the common grid case), and aggregate the costs
> appropriately, you should end up with a good plan.
>
> I will have to think about this after the NIPS deadline ;)
>
>
> Now, I don't know enough about regularisation paths. I am worried they
> don't fit naturally in this framework, because they need all candidate
> values for the relevant parameter to be specified at once; I was hoping
> someone would shout that at me when I proposed this. Could you please
> clarify?
>
> Basically you have a highest and lowest value of the regularization
> parameter, fit one model for the highest value, and can then efficiently
> produce models for all values in between.
> The thing here is that you can efficiently compute the models for all
> values of the parameter if you compute them together.
> This is basically the opposite of your "group by value" strategy.
> If you are not so much into linear models, another way of thinking about
> it is with trees / forests: if you compute a tree up to a certain depth,
> you basically
> get the tree with smaller depths for free - similar things are true for
> boosting.
>
> For these you basically need to know a maximum and / or minimum parameter
> setting and compute solutions for all parameter settings at once.
>
> I feel like this is the really interesting case, which we should try to
> solve.
>
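To make the "compute them together" point concrete, here is a toy sketch in
plain Python (not scikit-learn code; staged_predictions and the stage values
are made up for illustration). For an additive model such as boosting, a
single pass over the stages gives the prediction for every smaller number of
stages as a by-product:

```python
from itertools import accumulate


def staged_predictions(stage_outputs):
    """Yield (n_stages, prediction) for every prefix of an additive model.

    stage_outputs holds the per-stage contributions for one sample
    (hypothetical values). One pass covers n_stages = 1..len(stage_outputs).
    """
    for n_stages, total in enumerate(accumulate(stage_outputs), start=1):
        yield n_stages, total


# Assumed contributions of four boosting stages for a single sample.
contributions = [2.0, 0.5, -0.25, 0.125]
path = dict(staged_predictions(contributions))
```

Searching over the number of stages then costs one fit at the maximum value
rather than one fit per candidate.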

I agree. My approach doesn't necessarily exclude this working if one of:
* sorting parameters in descending order is sufficient;
* we extend the role of _plan_refits to being one of preparation, so the
estimator may set some state like a search range (which would need to be
copied in clone());
* we extend _plan_refits to allow it to return modified parameter
settings (though this may make implementing the Pipeline version harder); or
* we extend _plan_refits to allow it to return some additional information
to be passed to fit/refit (this will definitely make implementing the
Pipeline version harder).
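
The first option might look roughly like the sketch below. This is only an
illustration: plan_refits as a free function, the path_param argument and the
unit costs (1.0 for a cold fit, 0.1 for a warm-started refit) are all
assumptions, not an existing API.

```python
def plan_refits(param_settings, path_param="alpha"):
    """Order settings so each warm start follows a stronger regulariser.

    Returns (ordered_settings, costs). The cost values are placeholders.
    """
    def key(params):
        # Everything except the path parameter identifies a group;
        # within a group, larger path values come first.
        others = tuple(sorted((k, v) for k, v in params.items() if k != path_param))
        return others, -params[path_param]

    ordered = sorted(param_settings, key=key)
    costs = []
    previous_others = object()  # sentinel that matches no group
    for params in ordered:
        others = key(params)[0]
        # A cold fit starts each group; later members are warm-started.
        costs.append(1.0 if others != previous_others else 0.1)
        previous_others = others
    return ordered, costs


settings = [
    {"alpha": 0.1, "fit_intercept": True},
    {"alpha": 10.0, "fit_intercept": True},
    {"alpha": 1.0, "fit_intercept": False},
    {"alpha": 1.0, "fit_intercept": True},
]
ordered, costs = plan_refits(settings)
```

Whether descending order alone is enough depends on the estimator; a true
path algorithm may still want the whole range up front, which is what the
other options are for.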

> Basically your proposal addresses cases where one doesn't need to touch
> parts of the pipeline at all.
> It wouldn't help us get rid of any of the CV objects, though.
>

It also helps get rid of anything that may warm start from a previous
solution...

> Is there something interesting about StandardScaler, or have you thrown it
> in for fun? or for an example where transform is more expensive than fit?
>
>  Just for fun ;) Basically I thought that was one that you don't really
> need to refit at all (for a given fold) as you usually don't search over
> any parameters.
>

Not refitting at all is easy. Not transforming at all is left till later.

So, let's take something like your proposal, but instead of having lists of
values for each parameter (which assumes a grid), we have lists of
parameter settings. So we have a method on each estimator such as:

def iter_fits(self, param_iter, X, y=None):
    """Generate models for each of the given parameter settings
    """

A default implementation would be an expansion of:

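    # Let the estimator reorder the settings (and estimate costs) first.
    param_iter, costs = self._plan_refits(param_iter)
    for params in param_iter:
        # refit() returns the fitted model, usually self.
        yield params, self.refit(X, y, params)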

(It similarly needs a fit_transform variant.)

Note the generator yields the parameters (or it could just be the index
into the parameters) as well as the model, so that the settings may be
reordered; the second element would generally be self. Because the
generator yields the model itself, the caller has full access to it and
its prediction functions.
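
A self-contained toy version of the above, to show the calling pattern. The
ToyEstimator class, its shrunken-mean "model" and the trivial _plan_refits
are assumptions for illustration only; nothing here is existing
scikit-learn API.

```python
class ToyEstimator:
    """Stand-in estimator; every method here is hypothetical."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def _plan_refits(self, param_iter):
        # Trivial plan: keep the given order, charge one unit per setting.
        settings = list(param_iter)
        return settings, [1.0] * len(settings)

    def refit(self, X, y, params):
        # Apply the new parameters, then recompute the toy model:
        # the sum of y divided by (n_samples + alpha).
        for name, value in params.items():
            setattr(self, name, value)
        self.coef_ = sum(y) / (len(y) + self.alpha)
        return self

    def iter_fits(self, param_iter, X, y=None):
        """Generate (params, fitted model) for each parameter setting."""
        settings, _costs = self._plan_refits(param_iter)
        for params in settings:
            yield params, self.refit(X, y, params)


X, y = [[0.0], [1.0], [2.0]], [1.0, 2.0, 3.0]
results = []
for params, model in ToyEstimator().iter_fits([{"alpha": 3.0}, {"alpha": 0.0}], X, y):
    # The model is the estimator itself, so read what is needed
    # before advancing the generator.
    results.append((params["alpha"], model.coef_))
```

Since the same object is yielded each time, a caller that wants to keep a
model around would have to copy it before asking for the next one.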

I like the look of this better, though it means there's no option for
cleverness about multiprocessing. And the recursive execution of a Pipeline
would be somewhat neater and not require memoizing for transform.
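
A rough sketch of that recursion, under my own assumptions: parameter
settings are dicts keyed by step name, iter_fit_transforms is a made-up name
for the fit_transform variant, and Scale and Threshold are toy steps. The
transformed data lives only in the generator's frame while the downstream
steps are consumed, which is why no memoizing is needed.

```python
def _key(params):
    """Hashable, order-independent view of a parameter dict."""
    return tuple(sorted(params.items()))


class Scale:
    """Toy transformer (hypothetical): multiplies every value by `factor`."""

    def __init__(self):
        self.n_fits = 0

    def iter_fit_transforms(self, settings, X):
        for params in settings:
            self.n_fits += 1
            yield params, [x * params["factor"] for x in X]


class Threshold:
    """Toy final step (hypothetical): counts values above `cut`."""

    def iter_fits(self, settings, X):
        for params in settings:
            yield params, sum(x > params["cut"] for x in X)


def pipeline_iter_fits(steps, settings, X):
    """Recursively yield (settings, result) over (name, step) pairs."""
    name, step = steps[0]
    if len(steps) == 1:
        for params, model in step.iter_fits([s[name] for s in settings], X):
            yield {name: params}, model
        return
    # Group settings that share this step's parameters, so the step is
    # fitted once per group rather than once per setting.
    groups = {}
    for s in settings:
        groups.setdefault(_key(s[name]), []).append(s)
    firsts = [dict(k) for k in groups]
    for params, Xt in step.iter_fit_transforms(firsts, X):
        # Look the group up by the yielded params, since the step may reorder.
        for rest, model in pipeline_iter_fits(steps[1:], groups[_key(params)], Xt):
            yield {name: params, **rest}, model


scaler = Scale()
steps = [("scale", scaler), ("clf", Threshold())]
settings = [{"scale": {"factor": f}, "clf": {"cut": c}}
            for f in (1, 10) for c in (5, 15)]
results = list(pipeline_iter_fits(steps, settings, [1, 2, 3]))
```

Four settings are evaluated here, but the first step is fitted only twice.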

Let's (passively) play with it?

~J
