And that branch now includes a prototype of BaseSearchCV using iter_fits,
with enet_path backing an iter_fits method. (Unfortunately, I can't easily
do the same for lars_path: its output does not map as straightforwardly
onto estimator coefs as enet_path's does, and its input does not allow
specifying the alphas explicitly.)
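For concreteness, here is a rough sketch of how an iter_fits built on
enet_path might work. The function name, signature, and grouping key below
are my own assumptions for illustration, not the actual branch API: one
path call serves a whole group of candidate alphas, and we yield one
fitted clone per setting.

```python
import numpy as np
from itertools import groupby
from sklearn.base import clone
from sklearn.linear_model import ElasticNet, enet_path

def iter_fits_enet(estimator, param_settings, X, y):
    """Hypothetical iter_fits for ElasticNet (illustration only).

    Groups candidate settings by l1_ratio, computes one regularization
    path per group with enet_path, and yields (params, fitted clone)
    pairs. Assumes fit_intercept=False / pre-centered data.
    """
    keyfunc = lambda p: p.get('l1_ratio', 0.5)
    for l1_ratio, group in groupby(sorted(param_settings, key=keyfunc), keyfunc):
        by_alpha = {p['alpha']: p for p in group}
        # One coordinate-descent path covers every alpha in this group.
        # enet_path returns the alphas it actually used (sorted), so we
        # match coefs back to settings via the returned alphas.
        alphas_out, coefs, _ = enet_path(
            X, y, l1_ratio=l1_ratio, alphas=list(by_alpha))
        for alpha, coef in zip(alphas_out, coefs.T):
            params = by_alpha[float(alpha)]
            est = clone(estimator).set_params(**params)
            est.coef_ = coef
            est.intercept_ = 0.0  # pre-centered data assumed
            yield params, est
```

The point is only that fit work shared between candidates (the path) is
computed once per group rather than once per candidate.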
For example, take the grid-searched pipeline in
https://gist.github.com/jnothman/5624440 (StandardScaler, SelectKBest,
ElasticNet; cv=3, n_jobs=1):
(iter_fits)$ python -m cProfile example_pipeline.py | grep '(\(fit\|transform\|fit_transform\))'
        2    0.000    0.000    0.002    0.001 base.py:360(fit_transform)
      241    0.003    0.000    0.124    0.001 coordinate_descent.py:153(fit)
        1    0.000    0.000    0.445    0.445 grid_search.py:679(fit)
        1    0.000    0.000    0.002    0.002 pipeline.py:128(fit)
        7    0.000    0.000    0.001    0.000 preprocessing.py:301(fit)
      247    0.006    0.000    0.019    0.000 preprocessing.py:332(transform)
      265    0.007    0.000    0.043    0.000 univariate_selection.py:285(transform)
        7    0.000    0.000    0.006    0.001 univariate_selection.py:338(fit)
(master)$ python -m cProfile example_pipeline.py | grep '(\(fit\|transform\|fit_transform\))'
      482    0.002    0.000    0.289    0.001 base.py:343(fit_transform)
      481    0.012    0.000    0.102    0.000 base.py:61(transform)
      241    0.004    0.000    0.105    0.000 coordinate_descent.py:152(fit)
        1    0.000    0.000    1.208    1.208 grid_search.py:672(fit)
      241    0.002    0.000    0.402    0.002 pipeline.py:126(fit)
      241    0.002    0.000    0.032    0.000 preprocessing.py:301(fit)
      481    0.013    0.000    0.037    0.000 preprocessing.py:332(transform)
      241    0.003    0.000    0.184    0.001 univariate_selection.py:292(fit)
(And yes, even with this inefficient implementation of iter_fits, and no
major time-saving components in the pipeline, the iter_fits version seems
to run in 67% of the total time.)
A couple of caveats discovered en route: iter_fits reorders the
parameter_iterator used in BaseSearchCV, so:
* iter_fits must be deterministic in its reordering of the input
param_iterator to match results across different folds;
* GridSearchCV results reshaping is no longer straightforward; and
* where multiple candidates produce the same score, the reordering affects
which argmax is selected.
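To make the first caveat concrete, here is a small sketch (my own
illustration, not code from the branch) of a deterministic reordering: a
stable sort on a canonical string key, so every fold enumerates the
candidates in the same order even when parameter values (functions, numpy
arrays) are not orderable or hashable.

```python
def reorder_candidates(candidates, shared_keys):
    """Deterministically reorder candidate param dicts so that settings
    sharing the same values for shared_keys (e.g. an expensive earlier
    Pipeline step's params) become adjacent.

    repr() is used as a canonical sort key because parameter values may
    not be orderable or hashable; sorted() is stable, so ties keep their
    original relative order and the result is identical across folds.
    """
    def key(params):
        return tuple(repr(params[k]) for k in shared_keys)
    return sorted(candidates, key=key)

candidates = [
    {'sel__k': 2, 'clf__C': 1},
    {'sel__k': 1, 'clf__C': 1},
    {'sel__k': 2, 'clf__C': 10},
    {'sel__k': 1, 'clf__C': 10},
]
ordered = reorder_candidates(candidates, shared_keys=['sel__k'])
```

With a key like this, all sel__k=1 candidates come before all sel__k=2
candidates in every fold, so the earlier step's fit can be reused within
each group, and the reordering is reproducible.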
~J
On Tue, May 21, 2013 at 12:49 PM, Joel Nothman <[email protected]> wrote:
> Doing the ordering in the parameter space rather than in sampled
> candidates would also be difficult for regularization paths. And it becomes
> much more complicated when Pipeline steps can themselves be set, because
> then one parameter's value determines the availability and ordering of
> other parameter names.
>
> In the meantime, I've procrastinated by hacking up an implementation of
> iter_fits for Pipeline, as yet ignoring the setting of steps.
>
> With https://github.com/jnothman/scikit-learn/tree/iter_fits
>
> >>> from sklearn import (pipeline, grid_search, datasets, linear_model,
> ...                      feature_selection)
> >>> clf = pipeline.Pipeline([('sel', feature_selection.SelectKBest()),
> ...                          ('clf', linear_model.LogisticRegression())])
> >>> params = grid_search.ParameterGrid(
> ...     {'sel__k': [1, 2],
> ...      'sel__score_func': [feature_selection.chi2, feature_selection.f_classif],
> ...      'clf__C': [1, 10]})
> >>> iris = datasets.load_iris()
> >>> gen = clf.iter_fits(params, iris.data, iris.target == 1)
> >>> for params, est in gen:
> ...     print(params)
> {'sel__k': 2, 'clf__C': 1, 'sel__score_func': <function chi2 at 0x3803770>}
> {'sel__k': 2, 'clf__C': 10, 'sel__score_func': <function chi2 at 0x3803770>}
> {'sel__k': 1, 'clf__C': 1, 'sel__score_func': <function chi2 at 0x3803770>}
> {'sel__k': 1, 'clf__C': 10, 'sel__score_func': <function chi2 at 0x3803770>}
> {'sel__k': 1, 'clf__C': 1, 'sel__score_func': <function f_classif at 0x3803730>}
> {'sel__k': 1, 'clf__C': 10, 'sel__score_func': <function f_classif at 0x3803730>}
> {'sel__k': 2, 'clf__C': 1, 'sel__score_func': <function f_classif at 0x3803730>}
> {'sel__k': 2, 'clf__C': 10, 'sel__score_func': <function f_classif at 0x3803730>}
>
>
>
> On Tue, May 21, 2013 at 12:24 PM, Joel Nothman <[email protected]> wrote:
>
>> Hi Ken,
>>
>> It's not just the optimal parameter ordering, but the estimators noting
>> when fit does or does not need to be re-run. This is complicated but
>> important in the Pipeline case, where earlier steps affect later ones but
>> not the other way around. And some of the earlier steps may be expensive
>> to fit, such as PCA. So you would have to specify not only the optimal
>> ordering, but also when particular estimators and sub-estimators can be
>> reused without refitting.
>>
>> Currently, the parameter searches clone and fit the estimator for each
>> candidate. Anything that relies on some optimal ordering needs to not do
>> that.
>>
>> And apart from designing an API (which we'd still need to do for your
>> solution), and implementing planners for tricky cases such as Pipeline,
>> it's actually not that complicated.
>>
>> However, you are right to the extent that optimal ordering can often be
>> done when specifying the parameter space (manually or automatically),
>> before sampling candidates from it, and that may be a clever way to go
>> about it. But it may also be tricky to extend to, say, hyperopt parameter
>> spaces.
>>
>>
>> On Tue, May 21, 2013 at 12:11 PM, Kenneth C. Arnold <[email protected]> wrote:
>>
>>> I haven't been following the details of this thread, but I thought: why
>>> automate? GridSearch could, e.g., take an OrderedDict of parameters, and
>>> try combinations in C-array order. (For parallelism, maybe batches could be
>>> queued up in the opposite (i.e., Fortran) order, though I haven't thought
>>> that one through in detail.) Then the user who wants to turbocharge their
>>> grid search can think through the optimal parameter ordering themselves. Or
>>> a dedicated Planner or Driver could work out how to order parameters,
>>> distribute work between machines, and maybe even explore the parameter
>>> space intelligently (a la hyperopt), but the simple objects don't have to
>>> care.
>>>
>>>
>>> -Ken
>>>
>>>
>>> On Mon, May 20, 2013 at 9:58 PM, Joel Nothman <[email protected]> wrote:
>>>
>>>> Another advantage of the approach I proposed the other day is that its
>>>> overhead in sorting through the parameters is done once per search.
>>>> Solutions that integrate the planning into the fitting require the planning
>>>> to be done once per fold.
>>>>
>>>> A big frustration in implementing any of this: grouping by parameter
>>>> values is complicated by the fact that parameter values may not be
>>>> orderable, boolean-comparable or hashable (numpy arrays are none of these),
>>>> which means the standard groupby(sorted(...)) and defaultdict(list) cannot
>>>> be used to bin them together. (This could be avoided if we had your
>>>> original proposal of receiving a set of values for each parameter; but we
>>>> can't really make the assumption that everything is a grid.)
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> Try New Relic Now & We'll Send You this Cool Shirt
>>>> New Relic is the only SaaS-based application performance monitoring
>>>> service
>>>> that delivers powerful full stack analytics. Optimize and monitor your
>>>> browser, app, & servers with just a few lines of code. Try New Relic
>>>> and get this awesome Nerd Life shirt!
>>>> http://p.sf.net/sfu/newrelic_d2d_may
>>>>
>>>> _______________________________________________
>>>> Scikit-learn-general mailing list
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>>
>>>>
>>>
>>>
>>>
>>
>