AdaBoost seems to always enforce dense arrays, irrespective of the base
estimator:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/weight_boosting.py#L93

It should at least be possible to use AdaBoost with sparse matrices if the
base estimator supports them (which is the case for Perceptron). I had a
quick look at the AdaBoost code and X does not seem to be used in any
fancy way:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/weight_boosting.py#L436

So, as a first step, I would make sure that AdaBoost works with base
estimators that support sparse matrices. This should be far easier than
hacking sparse matrix support into decision trees.
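
To make the point concrete, here is a minimal pure-Python sketch of discrete
AdaBoost. It is not the scikit-learn implementation. The names SparseStump,
adaboost_fit and adaboost_predict are hypothetical, sparse rows are
represented as dicts (feature index -> value), and labels are assumed to be
-1 or +1. The boosting loop only hands X to the base estimator's fit and
predict, so the base estimator alone decides whether sparse input works.

```python
import math


class SparseStump:
    """Hypothetical weak learner: tests whether one feature of a sparse row is nonzero."""

    @staticmethod
    def _rule(row, feature, sign):
        # Missing keys are implicit zeros, as in a sparse matrix.
        return sign if row.get(feature, 0.0) > 0.0 else -sign

    def fit(self, X, y, sample_weight):
        features = sorted({j for row in X for j in row})
        best_err = float("inf")
        for j in features:
            for sign in (1, -1):
                # Weighted error of the rule "predict sign if feature j is present".
                err = sum(w for row, t, w in zip(X, y, sample_weight)
                          if self._rule(row, j, sign) != t)
                if err < best_err:
                    best_err, self.feature, self.sign = err, j, sign
        return self

    def predict(self, X):
        return [self._rule(row, self.feature, self.sign) for row in X]


def adaboost_fit(X, y, make_estimator, n_rounds=10):
    """Discrete AdaBoost for labels in {-1, +1}; X is only passed through."""
    n = len(y)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        est = make_estimator().fit(X, y, sample_weight=w)
        pred = est.predict(X)
        err = sum(wi for wi, p, t in zip(w, pred, y) if p != t)
        if err >= 0.5:
            break  # no better than chance on the reweighted data
        err = max(err, 1e-12)  # avoid division by zero for a perfect learner
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, est))
        # Increase the weight of misclassified samples, then renormalise.
        w = [wi * math.exp(-alpha * t * p) for wi, p, t in zip(w, pred, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, X):
    scores = [0.0] * len(X)
    for alpha, est in ensemble:
        for i, p in enumerate(est.predict(X)):
            scores[i] += alpha * p
    return [1 if s >= 0 else -1 for s in scores]


# Toy data: label is +1 when at least two of features 0, 1, 2 are present.
X = [{}, {0: 1.0}, {1: 1.0}, {2: 1.0},
     {0: 1.0, 1: 1.0}, {0: 1.0, 2: 1.0}, {1: 1.0, 2: 1.0},
     {0: 1.0, 1: 1.0, 2: 1.0}]
y = [-1, -1, -1, -1, 1, 1, 1, 1]

ensemble = adaboost_fit(X, y, SparseStump, n_rounds=3)
print(adaboost_predict(ensemble, X))
```

Nothing in adaboost_fit indexes into X or inspects its layout, which is why
a dense-array check in the ensemble itself is the only obstacle in this
sketch's setting.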

Mathieu


On Mon, Nov 25, 2013 at 7:56 PM, Olivier Grisel <[email protected]> wrote:

> 2013/11/22 Yi Pan <[email protected]>:
> > Dear scikit-learn persons,
> >
> > This is Pan Yi from the University of Washington, US. I am currently
> > working on a course project, exploring the performance of
> > AdaBoostClassifier when using the same base classifier, such as
> > DecisionTreeClassifier, Perceptron, KNeighborsClassifier, or mixing
> > different classifiers in one boosting. Because my input is sparse matrix
> > (41.8MB, mtx format), AdaBoostClassifier doesn't work unless I change it
> > to dense. The problem is that it will run out of memory soon.
> >
> >
> > I want to know whether AdaBoostClassifier and DecisionTreeClassifier have
> > been improved to work with sparse matrix input X.
>
> Unfortunately no, at least not for the DecisionTreeClassifier class.
>
> > If not, I need to implement my own version of AdaBoost that can take
> > sparse matrices. Could you give me some advice on what kind of changes
> > I should make in the code?
>
> You would need to write a sparse variant (probably for matrices in the
> CSC layout) of the cython tree code but this is probably not an easy
> task:
>
>
> https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/_tree.pyx
>
> For reference here is a linear regression model that works with the
> CSC sparse matrix representation:
>
>
> https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/cd_fast.pyx#L227-376
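
As a rough illustration of why a column-oriented layout is the one suggested
for tree code, the sketch below builds the three CSC arrays (indptr, indices,
data, the same attribute names scipy.sparse uses) in pure Python and reads
back the nonzero entries of a single feature. The helper names rows_to_csc
and column_nonzeros are hypothetical. The assumption here is that a split
search examines one feature at a time, so having each column's nonzeros
stored contiguously is convenient.

```python
def rows_to_csc(rows, n_features):
    """Convert sparse rows (dict: feature index -> value) to CSC arrays."""
    columns = [[] for _ in range(n_features)]
    for i, row in enumerate(rows):
        for j, value in row.items():
            if value != 0:
                columns[j].append((i, value))  # row indices arrive in increasing order
    indptr, indices, data = [0], [], []
    for col in columns:
        for i, value in col:
            indices.append(i)   # row index of this nonzero
            data.append(value)  # its value
        indptr.append(len(indices))  # end offset of this column
    return indptr, indices, data


def column_nonzeros(csc, j):
    """Return (row index, value) pairs for the nonzeros of feature j."""
    indptr, indices, data = csc
    start, end = indptr[j], indptr[j + 1]
    return list(zip(indices[start:end], data[start:end]))


rows = [{0: 2.0, 2: 1.0}, {1: 3.0}, {0: 4.0}]
csc = rows_to_csc(rows, n_features=3)
print(csc)
print(column_nonzeros(csc, 0))
```

Rows absent from a column's slice are implicit zeros, which is the part a
sparse tree implementation would have to handle explicitly when evaluating
thresholds.
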
>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
>
>
> ------------------------------------------------------------------------------
> Shape the Mobile Experience: Free Subscription
> Software experts and developers: Be at the forefront of tech innovation.
> Intel(R) Software Adrenaline delivers strategic insight and game-changing
> conversations that shape the rapidly evolving mobile landscape. Sign up
> now.
> http://pubads.g.doubleclick.net/gampad/clk?id=63431311&iu=/4140/ostg.clktrk
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>