Unless it is an estimator with warm_start=True, fit() should not be
affected by previous state (I hope I'm right in that :P).
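
A quick way to check this yourself, as a rough sketch: fit the same object twice and compare it with a fresh clone fitted only on the second dataset. The synthetic data and the choice of LogisticRegression are assumptions for illustration, not something from your setup.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X_a, y_a = rng.randn(60, 3), rng.randint(0, 2, 60)   # unrelated first dataset
X_b = rng.randn(60, 3)
y_b = (X_b[:, 0] > 0).astype(int)                    # second dataset

clf = LogisticRegression()       # warm_start is False by default
clf.fit(X_a, y_a)                # first fit
clf.fit(X_b, y_b)                # second fit on the same object, no clone

# clone() gives an unfitted copy with the same parameters
fresh = clone(clf).fit(X_b, y_b)

# If the first fit left no trace, both models should agree
same = (np.allclose(clf.coef_, fresh.coef_)
        and np.allclose(clf.intercept_, fresh.intercept_))
print(same)
```
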
And there's no shame in doing cross-validation by hand =) But it would
indeed be nice if stacking were easier in scikit-learn.
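
Something along these lines is what I have in mind for the by-hand version. It is only a sketch: the data is synthetic, the names X_page, X_para, page_clf and para_clf are made up, and it assumes a recent scikit-learn where the splitters live in sklearn.model_selection.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

rng = np.random.RandomState(0)
n = 200
X_page = rng.randn(n, 4)   # hypothetical page features, one row per paragraph
X_para = rng.randn(n, 5)   # hypothetical paragraph features
y_page = (X_page[:, 0] > 0).astype(int)
y_para = ((X_para[:, 0] + y_page + 0.3 * rng.randn(n)) > 0.5).astype(int)

page_clf = LogisticRegression()
para_clf = LogisticRegression()


def stack(idx):
    """Paragraph features plus the page classifier's P(class 1) as a column."""
    proba = page_clf.predict_proba(X_page[idx])[:, [1]]
    return np.hstack([X_para[idx], proba])


scores = []
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train, test in cv.split(X_para, y_para):
    # Both classifiers are refitted on each fold, reusing the same objects
    page_clf.fit(X_page[train], y_page[train])
    para_clf.fit(stack(train), y_para[train])
    scores.append(para_clf.score(stack(test), y_para[test]))

print(np.mean(scores))
```

Note that the probabilities used to train para_clf here are in-sample for page_clf; a stricter version would generate them out-of-fold within each training split.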
On Fri, Dec 20, 2013 at 6:47 AM, José Ricardo <[email protected]> wrote:
> Hi Joel, thank you for such a comprehensive answer. Only one more
> question, if you don't mind.
>
> I'm considering doing the cross-validation by hand. Are there any concerns
> about calling fit() multiple times on the same classifier (without cloning it)?
>
> Best regards,
>
> José
>
>
> On Wed, Dec 18, 2013 at 7:33 PM, Joel Nothman <[email protected]> wrote:
>
>> Hi José,
>>
>> Scikit-learn doesn't currently have anything out-of-the-box on this
>> front, and you've identified some ways in which the API makes it tricky.
>>
>> Yes, there could be a meta-estimator which turns a predictor into a
>> transformer (via predict, predict_proba, or decision_function), although
>> excessive meta-estimator nesting is never very neat. It could instead be a
>> mixin, in which case you would have to subclass the estimator to make it
>> into a transformer (although many estimators already have a mixin that
>> provides a feature selection transform() method).
>>
>> You could then incorporate these features either by training and fixing
>> the model (e.g. trained on other data), or including it in
>> cross-validation. In either case, the default clone operation that happens
>> in cross_val_score and *SearchCV will clear any fitted attributes, which
>> breaks a fixed model, and makes CV do repeated work.
>>
>> It might be nice to have a way to fix a model. CV doing repeated work is
>> a problem for pipelines in general, as a change to a later estimator's
>> parameter doesn't affect the fitting of an earlier estimator's model. The
>> proposed solution is to use memoisation (via joblib's Memory), but the
>> details, and how to supply this feature without adding complexity, are up
>> for debate (see e.g.
>> https://github.com/scikit-learn/scikit-learn/pull/2086).
>>
>> Cheers,
>>
>> - Joel
>>
>>
>> On Thu, Dec 19, 2013 at 7:23 AM, José Ricardo <[email protected]> wrote:
>>
>>> Hi, I'm trying to stack two classifiers. Right now, it's quite simple.
>>>
>>> I want to classify paragraphs of text and use the classification of
>>> the page they belong to as one of the features (pages can be classified
>>> into two classes).
>>>
>>> In other words: I want to use the page classifier's predict_proba as a
>>> feature of the paragraph classifier.
>>>
>>> Searching the scikit-learn docs, I didn't manage to find a standard
>>> way to stack classifiers like this. Is there any helper for this task?
>>>
>>> I created a wrapper that allows me to use the predict_proba method of a
>>> classifier as a feature. But when I try to cross-validate (via
>>> cross_val_score) the paragraph classifier, the page classifier is reset
>>> (cross_val_score tries to fit all classifiers again).
>>>
>>> Sorry for the long message. I'm still a machine learning beginner, and
>>> I'm wondering if there are better ways to accomplish this task.
>>>
>>> Any help will be appreciated.
>>>
>>> Best regards,
>>>
>>> José Ricardo
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Rapidly troubleshoot problems before they affect your business. Most IT
>>> organizations don't have a clear picture of how application performance
>>> affects their revenue. With AppDynamics, you get 100% visibility into
>>> your
>>> Java,.NET, & PHP application. Start your 15-day FREE TRIAL of
>>> AppDynamics Pro!
>>>
>>> http://pubads.g.doubleclick.net/gampad/clk?id=84349831&iu=/4140/ostg.clktrk
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>
>>>