Peter,

This looks very good! I will definitely have a look at it later.
However, as I warned in pull request #385, I have been making changes [1]
to the tree code and to the ensemble branch. I guess our future patches
are in conflict :( How should we proceed?

[1] https://github.com/glouppe/scikit-learn/tree/ensemble

Best,

Gilles

On 3 November 2011 23:40, Peter Prettenhofer <[email protected]> wrote:
> Hi everybody,
>
> I created an experimental branch [1] which uses numpy arrays (as Gael
> suggested) instead of the composite structure to represent the tree.
>
> The reason for this was two-fold: first, storage is more compact (no
> structure padding) and writing/reading to disc is more efficient and
> second, traversing the composite structure in Cython is inefficient
> compared to pure C. I assume that the reason for the latter is the
> reference counting overhead when we traverse the structure (look at
> the generated C code of the `apply_tree` function in `_tree.c`). I ran
> into this performance problem when I benched my gradient boosting code
> [2] against its R counterpart gbm.
>
> According to our covertype benchmark the new representation is a bit
> slower at training time due to the array re-sizing operations; it is
> about a factor of 4-5 faster at prediction time - competitive with
> liblinear on our benchmark! The graphviz exporter has not been updated
> yet - so one test fails.
>
> [1] https://github.com/pprett/scikit-learn/tree/tree-array-repr
> [2] https://github.com/pprett/scikit-learn/tree/gradient_boosting
>
> best,
> Peter
>
> 2011/10/28 Olivier Grisel <[email protected]>:
>> Victor replied to me in a private message: it might be caused by this
>> bug http://bugs.python.org/issue12775 .
>>
>> Brian, can you disable the gc and re-run your scripts to check
>> whether this is the case?
>>
>> import gc
>> gc.disable()
>>
>> Also Victor would like to know whether the situation is better in Python
>> 3.2+.
>>
>> --
>> Olivier
>>
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
> --
> Peter Prettenhofer
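
For anyone following the thread, here is a minimal sketch of the flat-array idea Peter describes: the tree is stored as parallel arrays indexed by node id rather than as linked node objects, so prediction only needs integer indexing. This is not the code from the tree-array-repr branch. The class name `ArrayTree`, the `LEAF` sentinel, and the method names are all hypothetical, and the stdlib `array` module stands in for numpy arrays so the example runs without dependencies.

```python
from array import array

LEAF = -1  # hypothetical sentinel meaning "this node has no child"


class ArrayTree:
    """Binary decision tree stored as parallel flat arrays (illustrative only)."""

    def __init__(self):
        self.left = array("l")       # index of left child, LEAF for leaves
        self.right = array("l")      # index of right child, LEAF for leaves
        self.feature = array("l")    # feature index tested at each node
        self.threshold = array("d")  # split threshold at each node
        self.value = array("d")      # prediction stored at each node

    def add_node(self, feature=LEAF, threshold=0.0, value=0.0):
        """Append a node and return its index; children are wired up later."""
        self.left.append(LEAF)
        self.right.append(LEAF)
        self.feature.append(feature)
        self.threshold.append(threshold)
        self.value.append(value)
        return len(self.value) - 1

    def set_children(self, node, left, right):
        """Record the child indices of an internal node."""
        self.left[node] = left
        self.right[node] = right

    def predict_one(self, x):
        """Walk from the root (index 0) to a leaf using only integer indexing."""
        node = 0
        while self.left[node] != LEAF:
            if x[self.feature[node]] <= self.threshold[node]:
                node = self.left[node]
            else:
                node = self.right[node]
        return self.value[node]


# Tiny hand-built tree: split on feature 0 at 0.5, then on feature 1 at 2.0.
tree = ArrayTree()
root = tree.add_node(feature=0, threshold=0.5)
leaf_a = tree.add_node(value=1.0)
inner = tree.add_node(feature=1, threshold=2.0)
leaf_b = tree.add_node(value=2.0)
leaf_c = tree.add_node(value=3.0)
tree.set_children(root, leaf_a, inner)
tree.set_children(inner, leaf_b, leaf_c)

samples = ([0.2, 9.0], [0.9, 1.0], [0.9, 5.0])
predictions = [tree.predict_one(x) for x in samples]
```

In this sketch every new node appends to all five arrays, which is the kind of growth step that a fixed-size numpy array would have to handle by re-sizing; that may be related to the training-time cost mentioned above, though the branch itself may grow its arrays differently.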
