2014/1/23 Felipe Eltermann <[email protected]>:
> I'm testing different classifiers for a BoW problem and last week I got
> disappointed that I couldn't use scikit's DecisionTree.
> However, using NaiveBayes was awesome! Thanks for this great piece of
> software.
> So, if you are planning to add the support for scipy sparse matrix on
> DecisionTree, I'd like to help.
>
> Gilles, I read /sklearn/tree/tree.py and found that there are 4 methods that
> receive X as a dense matrix:
> BaseDecisionTree.fit()
> BaseDecisionTree.predict()
> DecisionTreeClassifier.predict_proba()
> DecisionTreeClassifier.predict_log_proba()
>
> fit() calls some Cython classes, that I think you referred to:
> _tree.BestSplitter
> _tree.PresortBestSplitter
> _tree.RandomSplitter
> _tree.Gini
> _tree.Entropy
> _tree.MSE
> _tree.FriedmanMSE

As Gilles said, have a look at the Splitters first. You will probably want
feature-wise (that is, column-wise) access to the input data, so the
scipy.sparse.csc_matrix representation should be supported first.

If you are not familiar with the internal data structure of the CSC
representation, here is a piece of Cython code from another estimator that
deals efficiently with CSC sparse input data:

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/cd_fast.pyx#L227

which is called by:

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/coordinate_descent.py#L450

Also have a look at:

http://docs.scipy.org/doc/scipy/reference/sparse.html
http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csc_matrix.html#scipy.sparse.csc_matrix

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
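
For readers unfamiliar with the CSC layout discussed in this thread, here is a minimal pure-Python sketch of the idea. It is not the scikit-learn or scipy implementation. The helper names dense_to_csc and column_nonzeros are hypothetical, made up for illustration; only the three array names (data, indices, indptr) mirror the attributes of scipy.sparse.csc_matrix. The point is that the nonzero entries of column j sit in one contiguous slice, which is why feature-wise access is cheap in this format.

```python
def dense_to_csc(rows):
    """Build the three CSC arrays from a dense list of rows (illustrative helper)."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    data, indices, indptr = [], [], [0]
    # Walk the matrix column by column, storing only nonzero entries.
    for j in range(n_cols):
        for i in range(n_rows):
            value = rows[i][j]
            if value != 0:
                data.append(value)   # the nonzero value
                indices.append(i)    # its row index
        # indptr[j + 1] marks where column j ends in data/indices.
        indptr.append(len(data))
    return data, indices, indptr


def column_nonzeros(data, indices, indptr, j):
    """Return (row, value) pairs for the nonzero entries of column j."""
    start, end = indptr[j], indptr[j + 1]
    return list(zip(indices[start:end], data[start:end]))


X = [
    [1, 0, 2],
    [0, 0, 3],
    [4, 0, 0],
]
data, indices, indptr = dense_to_csc(X)

# Feature-wise access: only the stored entries of each column are visited.
for j in range(3):
    print(j, column_nonzeros(data, indices, indptr, j))
```

A splitter that scans one feature at a time would, under this layout, only need to touch the slice indptr[j]:indptr[j + 1] and could treat every row missing from that slice as an implicit zero.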
