2014/1/23 Felipe Eltermann <[email protected]>:
> I'm testing different classifiers for a BoW problem and last week I got
> disappointed that I couldn't use scikit's DecisionTree.
> However, using NaiveBayes was awesome! Thanks for this great piece of
> software.
> So, if you are planning to add the support for scipy sparse matrix on
> DecisionTree, I'd like to help.
>
> Gilles, I read /sklearn/tree/tree.py and found that there are 4 methods that
> receive X as a dense matrix:
> BaseDecisionTree.fit()
> BaseDecisionTree.predict()
> DecisionTreeClassifier.predict_proba()
> DecisionTreeClassifier.predict_log_proba()
>
> fit() calls some Cython classes, that I think you referred to:
> _tree.BestSplitter
> _tree.PresortBestSplitter
> _tree.RandomSplitter
> _tree.Gini
> _tree.Entropy
> _tree.MSE
> _tree.FriedmanMSE

As Gilles said, have a look at the Splitters first. You probably want
feature-wise (that is, column-wise) access to the input data, hence the
scipy.sparse.csc_matrix representation should be supported first. If
you are not familiar with the internal data structure of the CSC
representation, here is a piece of Cython code from another estimator
that deals efficiently with CSC sparse input data:

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/cd_fast.pyx#L227

which is called by:

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/coordinate_descent.py#L450

Also have a look at:

http://docs.scipy.org/doc/scipy/reference/sparse.html
http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csc_matrix.html#scipy.sparse.csc_matrix
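
To make the CSC layout concrete, here is a minimal Python sketch of
feature-wise access using the three arrays a csc_matrix exposes (data,
indices, indptr). The helper name column_nonzeros and the toy matrix are
made up for illustration; this is not code from the tree module, only a
picture of what a sparse-aware splitter would have to walk over:

```python
import numpy as np
from scipy.sparse import csc_matrix

# Toy input: 3 samples (rows) x 3 features (columns).
X = csc_matrix(np.array([
    [1.0, 0.0, 2.0],
    [0.0, 0.0, 3.0],
    [4.0, 5.0, 0.0],
]))


def column_nonzeros(X, j):
    """Return (row_indices, values) of the entries stored for column j.

    Hypothetical helper: in CSC format the stored entries of column j
    live in the slice indptr[j]:indptr[j + 1] of `indices` and `data`.
    """
    start, end = X.indptr[j], X.indptr[j + 1]
    return X.indices[start:end], X.data[start:end]


# Samples and values stored for feature 2.
rows, values = column_nonzeros(X, 2)
print(rows, values)
```

Samples that do not appear in the returned row indices have an implicit
zero for that feature, so a splitter working on this layout would need to
account for them without materializing the dense column.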

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
