Hi,

Something I was wondering is whether sparse support in decision trees would
actually be useful. Do decision trees (or ensembles of them like random
forests) work better than linear models for high-dimensional data?

It would be nice to take the News20 dataset, pre-select the top 10k
features (or more if possible) then measure test accuracy on the densified
dataset. I would be very interested in hearing the results.
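A minimal sketch of the proposed experiment, assuming a recent scikit-learn. The helper names (compare_on_densified, to_dense_float32) are hypothetical. Using chi2 to pick the top features, logistic regression as the linear baseline, and default model settings are also my assumptions, not an agreed protocol:

```python
import numpy as np
import scipy.sparse as sp
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier


def to_dense_float32(X):
    """Densify a (possibly sparse) matrix as float32 to halve memory vs. float64."""
    if sp.issparse(X):
        return X.astype(np.float32).toarray()
    return np.asarray(X, dtype=np.float32)


def compare_on_densified(X_train, y_train, X_test, y_test, k=10000,
                         n_estimators=100, random_state=0):
    """Keep the top-k chi2 features, densify, and return test accuracy per model."""
    # chi2 requires non-negative features (true for counts / tf-idf);
    # never request more features than the matrix actually has.
    selector = SelectKBest(chi2, k=min(k, X_train.shape[1]))
    X_tr = to_dense_float32(selector.fit_transform(X_train, y_train))
    X_te = to_dense_float32(selector.transform(X_test))

    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "decision_tree": DecisionTreeClassifier(random_state=random_state),
        "random_forest": RandomForestClassifier(
            n_estimators=n_estimators, random_state=random_state),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_tr, y_train)
        scores[name] = model.score(X_te, y_test)  # mean accuracy on the test set
    return scores
```

To run it on News20, pass the data and target returned by fetch_20newsgroups_vectorized(subset="train") and fetch_20newsgroups_vectorized(subset="test"); the dataset is downloaded on first use. Densifying as float32 halves the size of the dense copy, but it still grows as n_samples x k.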

And regardless of accuracy, some algorithms (e.g., GMM) scale very poorly
with n_features. I wonder whether the same is true for decision trees.
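One way to check this empirically is a hedged sketch on synthetic data. Here tree_fit_times is a hypothetical helper and the label rule is arbitrary. With the default max_features=None, every split considers all features, so fit time should grow roughly linearly with n_features:

```python
import time

import numpy as np
from sklearn.tree import DecisionTreeClassifier


def tree_fit_times(n_samples=1000, feature_counts=(100, 1000, 10000), random_state=0):
    """Return the wall-clock fit time (seconds) of a decision tree per n_features."""
    rng = np.random.RandomState(random_state)
    times = {}
    for n_features in feature_counts:
        X = rng.rand(n_samples, n_features).astype(np.float32)
        # Noisy label driven by the first five features, so the tree must grow
        # past the root; the remaining columns are irrelevant noise.
        noise = 0.5 * rng.rand(n_samples)
        y = (X[:, :5].sum(axis=1) + noise > 2.75).astype(int)
        # Default max_features=None: every split scans all n_features columns.
        clf = DecisionTreeClassifier(random_state=random_state)
        start = time.perf_counter()
        clf.fit(X, y)
        times[n_features] = time.perf_counter() - start
    return times
```

Printing tree_fit_times() gives a quick look at the scaling on a given machine; absolute timings will vary from machine to machine.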

Gilles, Peter, Arnaud, any opinion / experience?

Mathieu


On Wed, Jan 22, 2014 at 2:13 PM, Maheshakya Wijewardena <
[email protected]> wrote:

> Hi,
>
> I have been using the scikit-learn OneHotEncoder to encode my data, but the
> resulting sparse matrices are supported by only a few models, such as
> logistic regression and SVC. When I convert them to dense arrays (with a
> list comprehension or the toarray() method), the resulting arrays become
> too large to use with decision trees or other tree-based classifiers.
> I saw a GSoC project idea for implementing this, mentioned here:
>
> https://github.com/scikit-learn/scikit-learn/wiki/Google-summer-of-code-(GSOC)-2014
> I'm looking forward to applying for GSoC this year as well, so I would like
> to start working on this. Where can I get support for it? (There are no
> possible mentors assigned to this project.)
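To see why densifying one-hot output blows up, here is a small sketch on hypothetical integer-coded categorical data (one_hot_memory is an illustrative helper, not part of scikit-learn). It assumes a recent scikit-learn, where OneHotEncoder returns a SciPy sparse matrix by default. It compares the bytes held by the CSR matrix with what toarray() would allocate, without actually allocating the dense array:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder


def one_hot_memory(n_samples=10000, n_columns=5, n_categories=1000, random_state=0):
    """Return (shape, sparse_bytes, dense_bytes) for one-hot encoded random data."""
    rng = np.random.RandomState(random_state)
    # Hypothetical categorical data: n_columns columns, n_categories levels each.
    X = rng.randint(0, n_categories, size=(n_samples, n_columns))
    # One output column per category seen during fit; .tocsr() makes the format explicit.
    X_sparse = OneHotEncoder().fit_transform(X).tocsr()
    sparse_bytes = X_sparse.data.nbytes + X_sparse.indices.nbytes + X_sparse.indptr.nbytes
    # Size toarray() would need, computed from shape and dtype instead of densifying.
    dense_bytes = X_sparse.shape[0] * X_sparse.shape[1] * X_sparse.dtype.itemsize
    return X_sparse.shape, sparse_bytes, dense_bytes
```

With the defaults, the dense array would need roughly 10000 x 5000 x 8 bytes (about 400 MB), while the sparse matrix stores only one nonzero per input cell and stays under a megabyte.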
>
> Regards,
> Maheshakya
>
>
> ------------------------------------------------------------------------------
> CenturyLink Cloud: The Leader in Enterprise Cloud Services.
> Learn Why More Businesses Are Choosing CenturyLink Cloud For
> Critical Workloads, Development Environments & Everything In Between.
> Get a Quote or Start a Free Trial Today.
>
> http://pubads.g.doubleclick.net/gampad/clk?id=119420431&iu=/4140/ostg.clktrk
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>