Hi Gilles, thanks!
I'll invest some more time into the array repr (tests, visitor,
benchmarks) and ask Brian for feedback. If that's OK, I'd suggest we
merge the array repr into master and rebase ensemble (and gradient
boosting). I can do the rebasing; it shouldn't be a huge problem for
random forests since the modifications are pretty localized
(`apply_tree` plus fewer than 10 lines of code in `build_tree`).

best,
Peter

2011/11/4 Gilles Louppe <[email protected]>:
> Peter,
>
> This looks very good! I will definitely have a look at it later.
>
> However, as I warned in pull request #385, I have been making changes
> [1] to the tree code and to the ensemble branch. I guess our future
> patches are in conflict :( How should we proceed?
>
> [1] https://github.com/glouppe/scikit-learn/tree/ensemble
>
> Best,
>
> Gilles
>
>
> On 3 November 2011 23:40, Peter Prettenhofer
> <[email protected]> wrote:
>> Hi everybody,
>>
>> I created an experimental branch [1] which uses numpy arrays (as Gael
>> suggested) instead of the composite structure to represent the tree.
>>
>> The reason for this was two-fold: first, storage is more compact (no
>> structure padding) and writing/reading to disc is more efficient and
>> second, traversing the composite structure in Cython is inefficient
>> compared to pure C. I assume that the reason for the latter is the
>> reference counting overhead when we traverse the structure (look at
>> the generated C code of the `apply_tree` function in `_tree.c`). I ran
>> into this performance problem when I benched my gradient boosting code
>> [2] against its R counterpart gbm.
>>
>> According to our covertype benchmark the new representation is a bit
>> slower at training time due to the array re-sizing operations; it's
>> about a factor of 4-5 faster at prediction time - competitive with
>> liblinear on our benchmark! The graphviz exporter has not been updated
>> yet - so one test fails.
>>
>> [1] https://github.com/pprett/scikit-learn/tree/tree-array-repr
>> [2] https://github.com/pprett/scikit-learn/tree/gradient_boosting
>>
>> best,
>> Peter
>>
>> 2011/10/28 Olivier Grisel <[email protected]>:
>>> Victor replied to me in a private message: it might be caused by this
>>> bug http://bugs.python.org/issue12775 .
>>>
>>> Brian, can you disable the gc and re-run your scripts to check
>>> whether this is the case?
>>>
>>> import gc
>>> gc.disable()
>>>
>>> Also Victor would like to know whether the situation is better in Python
>>> 3.2+.
>>>
>>> --
>>> Olivier
>>>
>>> ------------------------------------------------------------------------------
>>> The demand for IT networking professionals continues to grow, and the
>>> demand for specialized networking skills is growing even more rapidly.
>>> Take a complimentary Learning@Cisco Self-Assessment and learn
>>> about Cisco certifications, training, and career opportunities.
>>> http://p.sf.net/sfu/cisco-dev2dev
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>
>>
>> --
>> Peter Prettenhofer
>

--
Peter Prettenhofer

------------------------------------------------------------------------------
RSA(R) Conference 2012
Save $700 by Nov 18
Register now
http://p.sf.net/sfu/rsa-sfdev2dev1
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
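
For readers following the thread, the idea of storing a tree in numpy
arrays instead of a composite node structure can be sketched roughly as
below. This is an illustration under assumptions, not the code from the
tree-array-repr branch: the array names (`children_left`,
`children_right`, `feature`, `threshold`, `value`), the `LEAF` sentinel
and the `predict` helper are all hypothetical; only the name
`apply_tree` appears in the thread. Each node is an index into parallel
arrays, so walking the tree is integer indexing rather than following
object references.

```python
import numpy as np

# Hypothetical layout: one entry per node in each parallel array.
# These names are assumptions for illustration, not the branch's API.
LEAF = -1  # sentinel marking "no child" / "no split feature"

children_left = np.array([1, LEAF, 3, LEAF, LEAF])
children_right = np.array([2, LEAF, 4, LEAF, LEAF])
feature = np.array([0, LEAF, 1, LEAF, LEAF])      # feature tested at each node
threshold = np.array([0.5, 0.0, 2.0, 0.0, 0.0])   # split threshold (unused at leaves)
value = np.array([0.0, 10.0, 0.0, 20.0, 30.0])    # prediction stored at each node


def apply_tree(x):
    """Return the index of the leaf node that sample x ends up in."""
    node = 0  # start at the root
    while children_left[node] != LEAF:
        # Go left when the tested feature is at or below the threshold.
        if x[feature[node]] <= threshold[node]:
            node = children_left[node]
        else:
            node = children_right[node]
    return node


def predict(X):
    """Look up the stored value of the leaf reached by each row of X."""
    return np.array([value[apply_tree(x)] for x in X])


X = np.array([[0.2, 5.0], [0.9, 1.0], [0.9, 3.0]])
print(predict(X))  # [10. 20. 30.]
```

In pure Python this loop is of course not fast; the point is only the
data layout. The speedup reported in the thread concerns traversal in
Cython, where flat arrays can be indexed without touching Python
objects.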
