Doing the ordering in the parameter space rather than over the sampled
candidates would also be difficult for regularization paths. And it gets
much more complicated when Pipeline steps themselves can be set, because
then the availability and ordering of parameter names depend on the value
of another parameter.

In the meantime, I've procrastinated by hacking up an implementation of
iter_fits for Pipeline, as yet ignoring the setting of steps.

With https://github.com/jnothman/scikit-learn/tree/iter_fits

>>> from sklearn import (pipeline, grid_search, datasets, linear_model,
...                      feature_selection)
>>> clf = pipeline.Pipeline([('sel', feature_selection.SelectKBest()),
...                          ('clf', linear_model.LogisticRegression())])
>>> params = grid_search.ParameterGrid({
...     'sel__k': [1, 2],
...     'sel__score_func': [feature_selection.chi2,
...                         feature_selection.f_classif],
...     'clf__C': [1, 10]})
>>> iris = datasets.load_iris()
>>> gen = clf.iter_fits(params, iris.data, iris.target == 1)
>>> for params, est in gen:
...     print(params)
{'sel__k': 2, 'clf__C': 1, 'sel__score_func': <function chi2 at 0x3803770>}
{'sel__k': 2, 'clf__C': 10, 'sel__score_func': <function chi2 at 0x3803770>}
{'sel__k': 1, 'clf__C': 1, 'sel__score_func': <function chi2 at 0x3803770>}
{'sel__k': 1, 'clf__C': 10, 'sel__score_func': <function chi2 at 0x3803770>}
{'sel__k': 1, 'clf__C': 1, 'sel__score_func': <function f_classif at 0x3803730>}
{'sel__k': 1, 'clf__C': 10, 'sel__score_func': <function f_classif at 0x3803730>}
{'sel__k': 2, 'clf__C': 1, 'sel__score_func': <function f_classif at 0x3803730>}
{'sel__k': 2, 'clf__C': 10, 'sel__score_func': <function f_classif at 0x3803730>}
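
Note that in the output above the candidates sharing the same sel__*
values come out next to each other, with clf__C varying fastest. As a rough
illustration of why that helps (a sketch only, not the code on the branch),
the following groups candidates by the parameters of one step and runs that
step's fit once per group. The names split_params, group_by_step,
iter_fits_sketch, fit_head and fit_tail are all hypothetical, and the two
fit callables are stand-ins for real estimators. Grouping is done by a
linear scan with ==, so it assumes parameter values compare to a plain
bool; numpy arrays would need special handling.

```python
def split_params(params, prefix):
    """Split one candidate into (params of the named step, everything else)."""
    marker = prefix + "__"
    head = {k: v for k, v in params.items() if k.startswith(marker)}
    tail = {k: v for k, v in params.items() if not k.startswith(marker)}
    return head, tail


def group_by_step(candidates, prefix):
    """Group candidates that agree on every parameter of one step.

    Uses a linear scan with == instead of sorting or hashing, so values
    such as functions or lists are fine. Values whose == does not return
    a plain bool (e.g. numpy arrays) are NOT handled by this sketch.
    """
    groups = []  # list of (head_params, [full candidate, ...])
    for params in candidates:
        head, _ = split_params(params, prefix)
        for seen_head, members in groups:
            if seen_head == head:
                members.append(params)
                break
        else:
            # No existing group matched, so start a new one.
            groups.append((head, [params]))
    return groups


def iter_fits_sketch(candidates, prefix, fit_head, fit_tail):
    """Yield (params, result), calling fit_head only once per group."""
    for head, members in group_by_step(candidates, prefix):
        fitted_head = fit_head(head)  # the (possibly expensive) early step
        for params in members:
            _, tail = split_params(params, prefix)
            yield params, fit_tail(fitted_head, tail)


# Toy usage: four candidates, but only two distinct settings of 'sel'.
candidates = [{"sel__k": k, "clf__C": C} for C in (1, 10) for k in (1, 2)]
head_fits = []  # records every time the early step is "fit"


def fit_head(head):
    head_fits.append(head)
    return head


def fit_tail(fitted_head, tail):
    return (fitted_head, tail)


results = list(iter_fits_sketch(candidates, "sel", fit_head, fit_tail))
```

Here the early step is fit twice rather than four times. The linear scan is
quadratic in the number of distinct settings, which seems acceptable for a
sketch but is part of what makes the general case annoying.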



On Tue, May 21, 2013 at 12:24 PM, Joel Nothman <[email protected]> wrote:

> Hi Ken,
>
> It's not just the optimal parameter ordering, but also the estimators
> noting when fit does or does not need to be performed again. This is
> complicated but important in the Pipeline case, where earlier steps affect
> later ones but not the other way around. And some of the earlier steps may
> be expensive to fit, such as PCA. So not only would you have to specify
> the optimal ordering, but also when particular estimators and
> sub-estimators can be used without refitting.
>
> Currently, the parameter searches clone and fit the estimator for each
> candidate. Anything that relies on some optimal ordering needs to not do
> that.
>
> And apart from designing an API (which we'd still need to do for your
> solution), and implementing planners for tricky cases such as Pipeline,
> it's actually not that complicated.
>
> However, you are right to the extent that optimal ordering can often be
> done when specifying the parameter space (manually or automatically),
> before sampling candidates from it, and that may be a clever way to go
> about it. But it may also be tricky to extend to, say, hyperopt parameter
> spaces.
>
>
> On Tue, May 21, 2013 at 12:11 PM, Kenneth C. Arnold <
> [email protected]> wrote:
>
>> I haven't been following the details of this thread, but I thought: why
>> automate? GridSearch could, e.g., take an OrderedDict of parameters, and
>> try combinations in C-array order. (For parallelism, maybe batches could be
>> queued up in the opposite (i.e., Fortran) order, though I haven't thought
>> that one through in detail.) Then the user who wants to turbocharge their
>> grid search can think through the optimal parameter ordering themselves. Or
>> a dedicated Planner or Driver could work out how to order parameters,
>> distribute work between machines, and maybe even explore the parameter
>> space intelligently (a la hyperopt), but the simple objects don't have to
>> care.
>>
>>
>> -Ken
>>
>>
>> On Mon, May 20, 2013 at 9:58 PM, Joel Nothman <
>> [email protected]> wrote:
>>
>>> Another advantage of the approach I proposed the other day is that the
>>> overhead of sorting through the parameters is incurred once per search.
>>> Solutions that integrate the planning into the fitting require the
>>> planning to be done once per fold.
>>>
>>> A big frustration in implementing any of this: grouping by parameter
>>> values is complicated by the fact that parameter values may not be
>>> orderable, boolean-comparable or hashable (numpy arrays are none of these),
>>> which means the standard groupby(sorted(...)) and defaultdict(list) cannot
>>> be used to bin them together. (This could be avoided if we had your
>>> original proposal of receiving a set of values for each parameter; but we
>>> can't really make the assumption that everything is a grid.)
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Try New Relic Now & We'll Send You this Cool Shirt
>>> New Relic is the only SaaS-based application performance monitoring
>>> service
>>> that delivers powerful full stack analytics. Optimize and monitor your
>>> browser, app, & servers with just a few lines of code. Try New Relic
>>> and get this awesome Nerd Life shirt!
>>> http://p.sf.net/sfu/newrelic_d2d_may
>>>
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>
>>>
>>
>>
>>
>>
>