I couldn't help but work on it, it seems. The Pipeline's refit is trivial, given that every sub-estimator has a refit that does nothing when certain parameters are not passed (and in case not all of them have set their noop_params, we can explicitly refit only from the first step where a parameter is changed).
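To make the "refit only from the first changed step" idea concrete, here is a minimal sketch. Everything in it is hypothetical: ToyStep, ToyPipeline and refit_transform are illustrative names, not scikit-learn API, and I assume refitting always happens on the same data as the last fit.

```python
class ToyStep:
    """Toy transformer (hypothetical) that multiplies its input by `scale`."""

    def __init__(self, scale=1):
        self.scale = scale
        self.n_fits = 0  # counts how often this step was (re)fitted

    def fit_transform(self, X):
        self.n_fits += 1
        return [self.scale * x for x in X]


class ToyPipeline:
    """Toy pipeline (hypothetical) that caches the input seen by each step."""

    def __init__(self, steps):
        self.steps = steps   # list of (name, step) pairs
        self._inputs = []    # input of each step from the last run
        self._output = None  # output of the final step from the last run

    def _run_from(self, start, X):
        # Fit and transform every step from index `start` onwards.
        for i in range(start, len(self.steps)):
            self._inputs[i] = X
            X = self.steps[i][1].fit_transform(X)
        self._output = X
        return X

    def fit_transform(self, X):
        self._inputs = [None] * len(self.steps)
        return self._run_from(0, X)

    def refit_transform(self, **params):
        # Assumption: params use 'step__param' names and the data is unchanged.
        first_changed = None
        for i, (name, step) in enumerate(self.steps):
            prefix = name + "__"
            for key, value in params.items():
                if key.startswith(prefix):
                    attr = key[len(prefix):]
                    if getattr(step, attr) != value:
                        setattr(step, attr, value)
                        if first_changed is None:
                            first_changed = i
        if first_changed is None:
            return self._output  # nothing changed, so refit is a no-op
        return self._run_from(first_changed, self._inputs[first_changed])


first, second = ToyStep(scale=2), ToyStep(scale=3)
pipe = ToyPipeline([("first", first), ("second", second)])
pipe.fit_transform([1, 2])
pipe.refit_transform(second__scale=5)  # only the second step is refitted
```

In this toy version the earlier steps are skipped entirely, which is stronger than a per-step no-op refit; a real implementation would leave that decision to each sub-estimator.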
By trivial, I mean: take the current fit(_transform) implementation and replace fit(_transform) with refit(_transform), passing each step its parameters.

Its _plan_refits is not as trivial. First, let us assume no steps are set as parameters <https://github.com/scikit-learn/scikit-learn/pull/1769>. Consider the parameters as a tree, like that attached, with each layer corresponding to a step and the parameters set there. (The tree attached corresponds to a grid, but it needn't be so balanced.) If you reorder the children of each node using the relevant _plan_refits (perhaps with memoization for the common grid case), and aggregate the costs appropriately, you should end up with a good plan. Where the steps of the Pipeline can themselves be set as parameters, you simply need to operate over a separate tree for each setting of steps. [The trees aren't an in-memory data structure but a recursive descent.]

Of course, this still does some unnecessary work that your proposed solution would avoid: transform will always be called for every step. Memoization is potentially an easy solution to that.

Now, I don't know enough about regularisation paths. I am worried they don't fit naturally in this framework, because they need all candidate values for the relevant parameter to be specified at once; I was hoping someone would shout that at me when I proposed this. Could you please clarify?

Is there something interesting about StandardScaler, or have you thrown it in for fun? Or as an example where transform is more expensive than fit?

- Joel
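P.S. A rough sketch of the recursive descent over the parameter tree, under simplifying assumptions: plan_order and count_step_refits are hypothetical helpers, every step is treated as equally costly, and the children of a node are merely grouped rather than reordered by each step's own _plan_refits. Parameter values are assumed to be hashable.

```python
def step_key(candidate, step):
    """Settings of `candidate` that belong to `step` ('step__param' names)."""
    prefix = step + "__"
    return tuple(sorted((k, v) for k, v in candidate.items()
                        if k.startswith(prefix)))


def plan_order(candidates, step_names):
    """Order candidates so those sharing early-step settings are adjacent."""
    def descend(cands, depth):
        if depth == len(step_names) or len(cands) <= 1:
            return list(cands)
        # One child of the current tree node per distinct setting of this step.
        children = {}
        for cand in cands:
            key = step_key(cand, step_names[depth])
            children.setdefault(key, []).append(cand)
        plan = []
        for group in children.values():
            plan.extend(descend(group, depth + 1))
        return plan
    return descend(candidates, 0)


def count_step_refits(plan, step_names):
    """Count step fits if we refit from the first step whose settings changed."""
    total = 0
    previous = None
    for cand in plan:
        for depth, step in enumerate(step_names):
            if previous is None or step_key(cand, step) != step_key(previous, step):
                total += len(step_names) - depth
                break
        previous = cand
    return total


steps = ["reduce", "clf"]  # hypothetical step names
naive = [
    {"reduce__k": 5, "clf__C": 0.1},
    {"reduce__k": 10, "clf__C": 0.1},
    {"reduce__k": 5, "clf__C": 1.0},
    {"reduce__k": 10, "clf__C": 1.0},
]
planned = plan_order(naive, steps)
print(count_step_refits(naive, steps), count_step_refits(planned, steps))
```

On this 2x2 grid the interleaved order needs 8 step fits and the planned order needs 6, because candidates that share a setting for the first step are evaluated back to back.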
Attachment: pipeline_params_tree.pdf (Adobe PDF document)
_______________________________________________ Scikit-learn-general mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
