Hi everybody,

I've created an experimental branch [1] that uses NumPy arrays (as Gael suggested) instead of the composite structure to represent the tree.
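For anyone who hasn't looked at the branch yet, here is a minimal pure-Python sketch of the idea. It assumes a binary tree stored as parallel NumPy arrays indexed by node id. It is not the code in [1]: the class name `ArrayTree`, the `LEAF = -1` sentinel, the doubling growth policy and all method names are hypothetical. The sketch shows both sides of the trade-off discussed in this message. `_grow` copies every array when capacity runs out, which is the training-time re-sizing cost. `apply` walks the tree by plain integer indexing into contiguous arrays instead of following Python object references. In pure Python the loop is still interpreted, so the speed-up only shows up once a loop like this is compiled, e.g. in Cython with typed arrays.

```python
import numpy as np

LEAF = -1  # hypothetical sentinel: a child index of -1 marks a leaf


class ArrayTree:
    """Binary decision tree stored as parallel NumPy arrays (sketch only)."""

    def __init__(self, capacity=1):
        self.node_count = 0
        self.children_left = np.full(capacity, LEAF, dtype=np.int32)
        self.children_right = np.full(capacity, LEAF, dtype=np.int32)
        self.feature = np.full(capacity, -1, dtype=np.int32)
        self.threshold = np.zeros(capacity, dtype=np.float64)
        self.value = np.zeros(capacity, dtype=np.float64)

    def _grow(self):
        # Double the capacity of every array and copy the old contents:
        # this copying is the re-sizing overhead paid during training.
        def grow(arr, fill):
            new = np.full(2 * arr.shape[0], fill, dtype=arr.dtype)
            new[: arr.shape[0]] = arr
            return new

        self.children_left = grow(self.children_left, LEAF)
        self.children_right = grow(self.children_right, LEAF)
        self.feature = grow(self.feature, -1)
        self.threshold = grow(self.threshold, 0.0)
        self.value = grow(self.value, 0.0)

    def add_node(self, parent, is_left, feature=-1, threshold=0.0, value=0.0):
        """Append a node and link it to its parent (pass LEAF for the root)."""
        if self.node_count == self.children_left.shape[0]:
            self._grow()
        node = self.node_count
        self.feature[node] = feature
        self.threshold[node] = threshold
        self.value[node] = value
        if parent != LEAF:
            if is_left:
                self.children_left[parent] = node
            else:
                self.children_right[parent] = node
        self.node_count += 1
        return node

    def apply(self, X):
        """Return the id of the leaf that each row of X falls into."""
        out = np.empty(X.shape[0], dtype=np.int32)
        for i in range(X.shape[0]):
            node = 0
            # Follow integer child indices until a leaf is reached.
            while self.children_left[node] != LEAF:
                if X[i, self.feature[node]] <= self.threshold[node]:
                    node = self.children_left[node]
                else:
                    node = self.children_right[node]
            out[i] = node
        return out

    def predict(self, X):
        return self.value[self.apply(X)]


# Usage: a single split on feature 0 at 0.5 (forces two re-sizes).
tree = ArrayTree(capacity=1)
root = tree.add_node(LEAF, True, feature=0, threshold=0.5)
left = tree.add_node(root, True, value=-1.0)
right = tree.add_node(root, False, value=1.0)
X = np.array([[0.2], [0.9]])
leaves = tree.apply(X)       # node 1 for x=0.2, node 2 for x=0.9
predictions = tree.predict(X)
```

Storing all nodes in a few flat arrays also makes the on-disk representation trivial: each array can be saved as-is, with no per-node object overhead or structure padding.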
The reason for this was twofold. First, storage is more compact (no structure padding), and writing to and reading from disk is more efficient. Second, traversing the composite structure in Cython is inefficient compared to pure C. I assume the latter is caused by reference-counting overhead while we traverse the structure (look at the generated C code of the `apply_tree` function in `_tree.c`). I ran into this performance problem when I benchmarked my gradient boosting code [2] against its R counterpart, gbm.

According to our covertype benchmark, the new representation is slightly slower at training time due to the array re-sizing operations, but it is about 4-5 times faster at prediction time, which makes it competitive with liblinear on that benchmark! The Graphviz exporter has not been updated yet, so one test fails.

[1] https://github.com/pprett/scikit-learn/tree/tree-array-repr
[2] https://github.com/pprett/scikit-learn/tree/gradient_boosting

best,
Peter

2011/10/28 Olivier Grisel <[email protected]>:
> Victor replied to me in a private message: it might be caused by this
> bug http://bugs.python.org/issue12775 .
>
> Brian, can you disable the gc and re-run your scripts to check
> whether this is the case?
>
> import gc
> gc.disable()
>
> Also Victor would like to know whether the situation is better in python 3.2+.
>
> --
> Olivier
>
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

--
Peter Prettenhofer
