Hi José,

Scikit-learn doesn't currently have anything out-of-the-box on this front,
and you've identified some ways in which the API makes it tricky.

Yes, there could be a meta-estimator which turns a predictor into a
transformer (via predict, predict_proba, or decision_function), although
excessive meta-estimator nesting is never very neat. It could instead be a
mixin, in which case you would have to subclass the estimator to make it
into a transformer (although many estimators already have a mixin that
provides a feature-selection transform() method).
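A minimal sketch of both options follows. It assumes a recent scikit-learn (with sklearn.model_selection). PredictionTransformer, PredictProbaTransformerMixin and LogisticRegressionTransformer are hypothetical names for illustration, not scikit-learn classes, and the data is synthetic:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline, make_union
from sklearn.preprocessing import FunctionTransformer


class PredictionTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical meta-estimator exposing a predictor's output as features."""

    def __init__(self, estimator, method="predict_proba"):
        self.estimator = estimator
        self.method = method  # "predict", "predict_proba" or "decision_function"

    def fit(self, X, y):
        # Fit a copy so the estimator passed in by the user stays unfitted.
        self.estimator_ = clone(self.estimator).fit(X, y)
        return self

    def transform(self, X):
        out = np.asarray(getattr(self.estimator_, self.method)(X))
        # predict and binary decision_function return 1-D arrays: make a column.
        return out.reshape(-1, 1) if out.ndim == 1 else out


class PredictProbaTransformerMixin:
    """Hypothetical mixin alternative: transform() returns predict_proba()."""

    def transform(self, X):
        return self.predict_proba(X)


# The mixin is listed first so its transform() takes precedence in the MRO.
class LogisticRegressionTransformer(PredictProbaTransformerMixin, LogisticRegression):
    pass


# Usage: original features plus the inner model's probabilities. Inside
# cross_val_score the whole pipeline is cloned, so the inner model is refitted
# on each training fold.
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
stacked = make_pipeline(
    make_union(FunctionTransformer(), PredictionTransformer(LogisticRegression())),
    LogisticRegression(),
)
scores = cross_val_score(stacked, X, y, cv=3)
```

The meta-estimator keeps the wrapped model as a plain constructor parameter, which is what lets clone() and the CV utilities treat it like any other estimator. The mixin avoids one level of nesting at the cost of a subclass per estimator.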

You could then incorporate these features either by training the model once
and keeping it fixed (e.g. trained on other data), or by fitting it as part
of cross-validation. In either case, the default clone operation performed
by cross_val_score and *SearchCV discards any fitted attributes, which breaks
a fixed model and makes CV repeat work.
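To illustrate: clone() keeps only constructor parameters, so a fitted model loses its fitted state. One workaround for the fixed-model case is sketched below. It assumes the page classifier is trained on pages disjoint from the paragraphs being cross-validated, computes its probabilities once outside CV, and appends them as ordinary feature columns. All arrays and variable names are synthetic placeholders, not José's data:

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)

# Synthetic stand-ins: a separately labelled set of pages ...
X_pages, y_pages = make_classification(n_samples=80, n_features=4, random_state=0)
# ... paragraph features and labels ...
X_para, y_para = make_classification(n_samples=120, n_features=6, random_state=1)
# ... and, for each paragraph, the features of the page it appears on.
X_page_of_para = rng.normal(size=(120, 4))

# Fit the page classifier once, on the separate page data.
page_clf = LogisticRegression().fit(X_pages, y_pages)

print(hasattr(page_clf, "coef_"))         # True: the model is fitted
print(hasattr(clone(page_clf), "coef_"))  # False: clone keeps only parameters

# Treat the fixed model's output as precomputed features, so the cloning done
# inside cross_val_score only ever touches the paragraph classifier.
page_proba = page_clf.predict_proba(X_page_of_para)  # shape (120, 2)
X_stacked = np.hstack([X_para, page_proba])
scores = cross_val_score(LogisticRegression(), X_stacked, y_para, cv=5)
```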

It might be nice to have a way to fix a model. CV doing repeated work is a
problem for pipelines in general: changing a later estimator's parameter
doesn't affect how an earlier estimator is fitted, yet the earlier one is
refitted anyway. The proposed solution is to use memoisation (via joblib's
Memory), but the details, and how to supply this feature without adding
complexity, are up for debate (see e.g.
https://github.com/scikit-learn/scikit-learn/pull/2086).
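A rough sketch of the memoisation idea using joblib.Memory directly follows. The upstream step's fit is wrapped in a cached function, so sweeping only a downstream parameter reloads the fitted upstream model from disk instead of refitting it. The helper fit_upstream, the PCA/LogisticRegression pairing and the temporary cache directory are assumptions for illustration, not what the pull request implements:

```python
import tempfile

from joblib import Memory
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

memory = Memory(location=tempfile.mkdtemp(), verbose=0)
n_upstream_fits = 0  # how many times the upstream step is actually fitted


@memory.cache
def fit_upstream(transformer, X, y):
    """Hypothetical helper: fit a copy of an upstream step; result is cached."""
    global n_upstream_fits
    n_upstream_fits += 1  # only runs on a cache miss
    return clone(transformer).fit(X, y)


X, y = make_classification(n_samples=200, n_features=20, random_state=0)
scores = {}
for C in [0.01, 0.1, 1.0, 10.0]:
    # Same PCA parameters and same data hash identically on every iteration,
    # so after the first one the fitted PCA is loaded from the cache.
    pca = fit_upstream(PCA(n_components=5), X, y)
    Xt = pca.transform(X)
    scores[C] = LogisticRegression(C=C).fit(Xt, y).score(Xt, y)

print(n_upstream_fits)
```

Later scikit-learn releases expose a similar idea through the memory parameter of Pipeline, which caches fitted transformers in the same way.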

Cheers,

- Joel


On Thu, Dec 19, 2013 at 7:23 AM, José Ricardo <[email protected]> wrote:

> Hi, I'm trying to stack two classifiers. Right now, it's quite simple.
>
> I want to classify paragraphs of text and want to use their page
> classification as one of the features (pages can be classified in two
> classes).
>
> In other words: I want to use the page classifier's predict_proba as a
> feature of the paragraph classifier.
>
> Searching the scikit-learn docs, I didn't manage to find a standard way
> to stack classifiers like this. Is there any helper for this task?
>
> I created a wrapper that allows me to use the predict_proba method of a
> classifier as a feature. But when I try to cross-validate the paragraph
> classifier (via cross_val_score), the page classifier is reset
> (cross_val_score refits all the classifiers).
>
> Sorry for the long text, but I'm wondering if there are better ways to
> accomplish this task, as I'm still a machine learning beginner.
>
> Any help will be appreciated.
>
> Best regards,
>
> José Ricardo
>
>
> ------------------------------------------------------------------------------
> Rapidly troubleshoot problems before they affect your business. Most IT
> organizations don't have a clear picture of how application performance
> affects their revenue. With AppDynamics, you get 100% visibility into your
> Java,.NET, & PHP application. Start your 15-day FREE TRIAL of AppDynamics
> Pro!
> http://pubads.g.doubleclick.net/gampad/clk?id=84349831&iu=/4140/ostg.clktrk
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>