I couldn't help but work on it, it seems.

The Pipeline's refit is trivial given that all sub-estimators have a refit
that will do nothing if certain parameters are not passed (and where not
all of them have set their noop_params, we can explicitly refit only from
the first step where a parameter is changed).

By trivial, I mean just take the current fit(_transform) implementation and
replace fit(_transform) with refit(_transform), passing each step its
parameters.
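
As a rough sketch of that loop (ToyStep, ToyPipeline and split_params are
made-up names for illustration, not scikit-learn API, and refit_transform
is the proposed method rather than an existing one; every toy step also
has a transform, which a real final estimator would not):

```python
def split_params(params):
    """Group 'step__param' keys by step name (hypothetical helper)."""
    grouped = {}
    for key, value in params.items():
        step, _, name = key.partition("__")
        grouped.setdefault(step, {})[name] = value
    return grouped


class ToyStep:
    """Stand-in for a sub-estimator; counts how often it is refit."""

    def __init__(self):
        self.params = {}
        self.n_fits = 0

    def refit_transform(self, X, **params):
        # Assumed semantics: update the given parameters, refit, transform.
        self.params.update(params)
        self.n_fits += 1
        return self.transform(X)

    def transform(self, X):
        return X  # identity, so the example stays self-contained


class ToyPipeline:
    def __init__(self, steps):
        self.steps = steps  # list of (name, ToyStep) pairs

    def refit(self, X, **params):
        grouped = split_params(params)
        dirty = False
        for name, step in self.steps:
            step_params = grouped.get(name, {})
            # Once one step changes, every later step sees new input
            # and has to be refit as well.
            dirty = dirty or bool(step_params) or step.n_fits == 0
            if dirty:
                X = step.refit_transform(X, **step_params)
            else:
                X = step.transform(X)
        return self


pipe = ToyPipeline([("scale", ToyStep()),
                    ("select", ToyStep()),
                    ("clf", ToyStep())])
pipe.refit([1, 2, 3])               # first call fits every step
pipe.refit([1, 2, 3], select__k=5)  # only "select" and "clf" are refit
fit_counts = [step.n_fits for _, step in pipe.steps]
```

Here the unchanged first step is only asked to transform on the second
call, which is the "refit from the first changed step" fallback.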

Its _plan_refits is not as trivial. First, let us assume no steps are set
as parameters <https://github.com/scikit-learn/scikit-learn/pull/1769>.
Consider the parameters as a tree, like that attached, with each layer
corresponding to a step and the parameters set there. (The tree attached
corresponds to a grid, but it needn't be so balanced.) If you reorder the
children of each node using the relevant _plan_refits (perhaps with
memoization for the common grid case), and aggregate the costs
appropriately, you should end up with a good plan.
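
A simplified sketch of the recursive descent, under assumptions of my own
choosing: each candidate is a tuple with one hashable setting per step,
every step refit costs 1, and children are merely grouped in first-seen
order rather than reordered by each step's own _plan_refits. The names
plan_refits and sequential_cost are hypothetical.

```python
from itertools import product


def plan_refits(candidates, depth=0):
    """Order candidates so shared settings of early steps are adjacent.

    Returns (ordered_candidates, n_step_refits). The tree is never
    built in memory; it is only walked by recursion.
    """
    if not candidates or depth == len(candidates[0]):
        return list(candidates), 0
    groups = {}
    for cand in candidates:
        groups.setdefault(cand[depth], []).append(cand)
    ordered, cost = [], 0
    for group in groups.values():
        sub_ordered, sub_cost = plan_refits(group, depth + 1)
        ordered.extend(sub_ordered)
        cost += 1 + sub_cost  # one refit of this step, plus its subtree
    return ordered, cost


def sequential_cost(order):
    """Count step refits when refitting from the first changed step."""
    cost, previous = 0, None
    for cand in order:
        shared = 0
        if previous is not None:
            for old, new in zip(previous, cand):
                if old != new:
                    break
                shared += 1
        cost += len(cand) - shared
        previous = cand
    return cost


# A 2 x 2 x 2 grid, deliberately ordered so the first step changes
# on every candidate.
grid = list(product([0, 1], repeat=3))
bad_order = sorted(grid, key=lambda cand: cand[::-1])
planned_order, planned_cost = plan_refits(bad_order)
```

With this toy grid the planned order needs 14 step refits where the
deliberately bad order needs 24; weighting steps by their real cost would
replace the constant 1.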

Where one can set the steps of the Pipeline as parameters, you simply need
to operate over a separate tree for each setting of steps. [The trees
aren't an in-memory data structure but a recursive descent.]

Of course, this still does some unnecessary work that your proposed
solution would avoid: transform will always be called for every step.
Memoization is potentially an easy solution to that.
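
One way the memoization could look, as a sketch only: the cache is keyed
on a hashable tuple of the settings of this step and all earlier ones,
which assumes the training data is the same across calls. The helper
make_memoized_transform is a hypothetical name.

```python
def make_memoized_transform(transform):
    """Wrap a transform so each parameter prefix is computed only once."""
    cache = {}
    calls = {"n": 0}

    def memoized(prefix, X):
        # prefix: settings of this step and every earlier step (assumption)
        if prefix not in cache:
            calls["n"] += 1
            cache[prefix] = transform(X)
        return cache[prefix]

    return memoized, calls


def double(X):
    return [2 * x for x in X]


memo, calls = make_memoized_transform(double)
first = memo(("scale=True",), [1, 2])
second = memo(("scale=True",), [1, 2])   # served from the cache
third = memo(("scale=False",), [1, 2])   # new prefix, computed again
```

The trade-off is memory: every cached prefix keeps a transformed copy of
the data alive.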

Now, I don't know enough about regularisation paths. I am worried they
don't fit naturally in this framework, because they need all candidate
values for the relevant parameter to be specified at once; I was hoping
someone would shout that at me when I proposed this. Could you please
clarify?

Is there something interesting about StandardScaler, or have you thrown it
in for fun, or as an example where transform is more expensive than fit?

- Joel

Attachment: pipeline_params_tree.pdf
Description: Adobe PDF document
