Peter,

I have also made a lot of changes to tree.py and _tree.pyx myself, in
many places in the code. Wouldn't it be easier for you to merge your
code into my files? As far as I can see in [1, 2], your changes are
localized, so it would be quicker for you to merge them into my files
than for me to merge all of my changes into yours. But I don't mind; I
can also do it the other way around.

[1] https://github.com/pprett/scikit-learn/commit/dde0274ec03d5e74cce2475e0c794cd1e1e9a2ce#diff-1
[2] https://github.com/pprett/scikit-learn/commit/dde0274ec03d5e74cce2475e0c794cd1e1e9a2ce#diff-2

Gilles

On 4 November 2011 10:36, Peter Prettenhofer <[email protected]> wrote:
> Hi Gilles,
>
> thanks!
>
> I'll invest some more time into the array repr (tests, visitor,
> benchmarks) and ask Brian for feedback. If that's OK, I'd suggest we
> merge the array repr into master and rebase ensemble (and gradient
> boosting). I can do the rebasing; it shouldn't be a huge problem for
> random forests since the modifications are pretty localized
> (apply_tree + < 10 lines of code in `build_tree`).
>
> best,
> Peter
>
> 2011/11/4 Gilles Louppe <[email protected]>:
>> Peter,
>>
>> This looks very good! I will definitely have a look at it later.
>>
>> However, as I warned in pull request #385, I have been making changes
>> [1] to the tree code and to the ensemble branch. I guess our future
>> patches are in conflict :( How should we proceed?
>>
>> [1] https://github.com/glouppe/scikit-learn/tree/ensemble
>>
>> Best,
>>
>> Gilles
>>
>>
>> On 3 November 2011 23:40, Peter Prettenhofer
>> <[email protected]> wrote:
>>> Hi everybody,
>>>
>>> I created an experimental branch [1] which uses NumPy arrays (as Gael
>>> suggested) instead of the composite structure to represent the tree.
>>>
>>> The reason for this was twofold: first, storage is more compact (no
>>> structure padding) and writing/reading to disc is more efficient;
>>> second, traversing the composite structure in Cython is inefficient
>>> compared to pure C. I assume that the reason for the latter is the
>>> reference-counting overhead when we traverse the structure (look at
>>> the generated C code of the `apply_tree` function in `_tree.c`). I ran
>>> into this performance problem when I benchmarked my gradient boosting
>>> code [2] against its R counterpart, gbm.
>>>
>>> According to our covertype benchmark, the new representation is a bit
>>> slower at training time due to the array re-sizing operations, but it
>>> is about a factor of 4-5 faster at prediction time, which makes it
>>> competitive with liblinear on our benchmark! The graphviz exporter has
>>> not been updated yet, so one test fails.
>>>
>>> [1] https://github.com/pprett/scikit-learn/tree/tree-array-repr
>>> [2] https://github.com/pprett/scikit-learn/tree/gradient_boosting
>>>
>>> best,
>>> Peter
>>>
>>> 2011/10/28 Olivier Grisel <[email protected]>:
>>>> Victor replied to me in a private message: it might be caused by this
>>>> bug http://bugs.python.org/issue12775 .
>>>>
>>>> Brian, can you disable the gc and re-run your scripts to check
>>>> whether this is the case?
>>>>
>>>> import gc
>>>> gc.disable()
>>>>
>>>> Also, Victor would like to know whether the situation is better in
>>>> Python 3.2+.
>>>>
>>>> --
>>>> Olivier
>>>>
>>>> _______________________________________________
>>>> Scikit-learn-general mailing list
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>
>>> --
>>> Peter Prettenhofer
>>
>
> --
> Peter Prettenhofer
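The thread does not show how the tree-array-repr branch lays out its arrays, so the following is only a minimal sketch of the general idea Peter describes: instead of a composite structure of node objects, node i is described by entry i of several parallel NumPy arrays, and `apply_tree` walks child indices rather than object references. The array names (`children_left`, `children_right`, `feature`, `threshold`, `value`) and the `LEAF = -1` sentinel are assumptions made for illustration, not the branch's actual fields.

```python
import numpy as np

# Assumed layout (hypothetical names, not taken from the tree-array-repr
# branch): node i is described by entry i of several parallel arrays.
LEAF = -1  # sentinel stored in children_left/children_right for leaf nodes


def apply_tree(children_left, children_right, feature, threshold, X):
    """Return, for each row of X, the index of the leaf it falls into."""
    out = np.empty(X.shape[0], dtype=np.intp)
    for i in range(X.shape[0]):
        node = 0  # every sample starts at the root
        while children_left[node] != LEAF:
            # Internal node: go left if the split feature is <= threshold.
            if X[i, feature[node]] <= threshold[node]:
                node = children_left[node]
            else:
                node = children_right[node]
        out[i] = node
    return out


# A depth-1 tree: the root splits on feature 0 at 0.5; nodes 1 and 2 are leaves.
children_left = np.array([1, LEAF, LEAF], dtype=np.intp)
children_right = np.array([2, LEAF, LEAF], dtype=np.intp)
feature = np.array([0, -1, -1], dtype=np.intp)  # unused for leaves
threshold = np.array([0.5, 0.0, 0.0])           # unused for leaves
value = np.array([0.0, 10.0, 20.0])             # per-node prediction

X = np.array([[0.2, 1.0], [0.9, 1.0]])
leaves = apply_tree(children_left, children_right, feature, threshold, X)
print(leaves)         # [1 2]
print(value[leaves])  # [10. 20.]
```

With this layout the whole tree is a handful of contiguous arrays with no per-node object overhead, which is the storage argument made in the thread. The speed argument only applies once the loop is compiled (for example in Cython over typed arrays); this pure-Python loop illustrates the data structure, not the prediction-time speed-up.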
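Peter notes that the array representation is somewhat slower at training time because of array re-sizing: the number of nodes is not known in advance, so the arrays must grow while the tree is built. The sketch below shows one common way to bound that cost, doubling the capacity whenever it runs out so that total copying stays linear in the number of nodes. Whether the branch uses doubling is not stated in the thread; the class `TreeArrays` and its methods are hypothetical and exist only for this illustration.

```python
import numpy as np

LEAF = -1  # sentinel for "no child"


class TreeArrays:
    """Hypothetical growable storage for an array-based tree (illustration only)."""

    def __init__(self, capacity=4):
        self.node_count = 0
        self.n_resizes = 0
        self.children_left = np.full(capacity, LEAF, dtype=np.intp)
        self.children_right = np.full(capacity, LEAF, dtype=np.intp)
        self.feature = np.full(capacity, -1, dtype=np.intp)
        self.threshold = np.zeros(capacity, dtype=np.float64)

    @property
    def capacity(self):
        return self.children_left.shape[0]

    def _resize(self, new_capacity):
        # Allocate larger arrays and copy the existing nodes into them.
        for name, fill in (("children_left", LEAF), ("children_right", LEAF),
                           ("feature", -1), ("threshold", 0.0)):
            old = getattr(self, name)
            new = np.full(new_capacity, fill, dtype=old.dtype)
            new[: old.shape[0]] = old
            setattr(self, name, new)
        self.n_resizes += 1

    def add_node(self, parent, is_left, feature, threshold):
        """Append a node, link it to its parent, and return its index."""
        if self.node_count == self.capacity:
            # Doubling keeps the total copying cost linear in the node count.
            self._resize(max(1, 2 * self.capacity))
        node_id = self.node_count
        self.feature[node_id] = feature
        self.threshold[node_id] = threshold
        if parent != LEAF:
            if is_left:
                self.children_left[parent] = node_id
            else:
                self.children_right[parent] = node_id
        self.node_count += 1
        return node_id


# Build a chain of 10 nodes (each one the left child of the previous one).
tree = TreeArrays(capacity=4)
parent = LEAF
for depth in range(10):
    parent = tree.add_node(parent, is_left=True, feature=0, threshold=float(depth))
print(tree.node_count, tree.capacity, tree.n_resizes)  # 10 16 2
```

Growing by a constant amount instead of doubling would make the number of copies grow with the tree size, which is one plausible way re-sizing could show up in training-time benchmarks; pre-allocating from an upper bound on the node count (when one is known) would avoid copies altogether.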
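Olivier's suggestion (disable the garbage collector and re-run) can be turned into a side-by-side comparison in one script. This is a minimal sketch: `time_call` and `make_many_objects` are hypothetical helpers, not part of scikit-learn, and the workload is only a placeholder because Brian's scripts are not included in the thread. A large gap between the two timings would be consistent with, but would not prove, a GC-related slowdown such as the one in the linked Python issue.

```python
import gc
import time


def time_call(func, *args, disable_gc=False, **kwargs):
    """Run func once and return (result, seconds), optionally with the cyclic GC off.

    The previous GC state is restored afterwards, even if func raises.
    """
    was_enabled = gc.isenabled()
    if disable_gc:
        gc.disable()
    try:
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
    finally:
        if was_enabled:
            gc.enable()
    return result, elapsed


def make_many_objects(n):
    # Placeholder workload: many small container objects, whose allocation
    # triggers the cyclic collector's automatic collections.
    return [[i] for i in range(n)]


_, t_on = time_call(make_many_objects, 200_000)
_, t_off = time_call(make_many_objects, 200_000, disable_gc=True)
print(f"gc enabled: {t_on:.3f}s, gc disabled: {t_off:.3f}s")
```

Restoring the previous state in `finally` matters because `gc.disable()` is process-wide; leaving it off by accident would silently change the behaviour of everything that runs afterwards.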
