Peter,

This looks very good! I will definitely have a look at it later.

However, as I warned in pull request #385, I have been making changes
[1] to the tree code and to the ensemble branch. I guess our future
patches are in conflict :( How should we proceed?

[1] https://github.com/glouppe/scikit-learn/tree/ensemble

Best,

Gilles


On 3 November 2011 23:40, Peter Prettenhofer
<[email protected]> wrote:
> Hi everybody,
>
> I created an experimental branch [1] which uses numpy arrays (as Gael
> suggested) instead of the composite structure to represent the tree.
>
> The reason for this was two-fold: first, storage is more compact (no
> structure padding) and writing/reading to disk is more efficient; and
> second, traversing the composite structure in Cython is inefficient
> compared to pure C. I assume that the reason for the latter is the
> reference counting overhead when we traverse the structure (look at
> the generated C code of the `apply_tree` function in `_tree.c`). I ran
> into this performance problem when I benched my gradient boosting code
> [2] against its R counterpart gbm.
>
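To make the array idea concrete, here is a minimal sketch in Python/NumPy. It is not the code from the branch: the array names (`children_left`, `children_right`, `feature`, `threshold`, `value`) and the `-1` leaf marker are assumptions chosen for illustration, and only the function name `apply_tree` is borrowed from the message above. The point is just that traversal turns into index lookups in flat arrays instead of attribute access on node objects.

```python
import numpy as np

# Hypothetical layout: one entry per node in each parallel array.
# A child index of -1 marks a leaf (an assumption for this sketch).
LEAF = -1

children_left = np.array([1, LEAF, 3, LEAF, LEAF])
children_right = np.array([2, LEAF, 4, LEAF, LEAF])
feature = np.array([0, -1, 1, -1, -1])          # split feature per node
threshold = np.array([0.5, 0.0, 2.0, 0.0, 0.0])  # split threshold per node
value = np.array([0.0, 10.0, 0.0, 20.0, 30.0])   # prediction stored per node


def apply_tree(X):
    """Return the index of the leaf reached by each row of X."""
    out = np.empty(X.shape[0], dtype=np.intp)
    for i in range(X.shape[0]):
        node = 0  # start at the root
        while children_left[node] != LEAF:
            if X[i, feature[node]] <= threshold[node]:
                node = children_left[node]
            else:
                node = children_right[node]
        out[i] = node
    return out


X = np.array([[0.2, 0.0], [0.9, 1.0], [0.9, 3.0]])
leaves = apply_tree(X)
predictions = value[leaves]  # look up the value stored at each leaf
```

In a compiled version the same loop would run over typed buffers with no Python objects involved, which is presumably where the prediction-time gain comes from.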
> According to our covertype benchmark the new representation is a bit
> slower at training time due to the array re-sizing operations; it's
> about a factor of 4-5 faster at prediction time - competitive with
> liblinear on our benchmark! The graphviz exporter has not been updated
> yet - so one test fails.
>
> [1] https://github.com/pprett/scikit-learn/tree/tree-array-repr
> [2] https://github.com/pprett/scikit-learn/tree/gradient_boosting
>
> best,
>  Peter
>
> 2011/10/28 Olivier Grisel <[email protected]>:
>> Victor replied to me in a private message: it might be caused by this
>> bug http://bugs.python.org/issue12775 .
>>
>> Brian, can you disable the gc and re-run your scripts to check
>> whether this is the case?
>>
>>  import gc
>>  gc.disable()
>>
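If switching the collector off for the whole process is too blunt, a scoped variant may be easier to compare against. The following is only a sketch: `time_without_gc` and `build_many_objects` are hypothetical names, and the workload is a stand-in, not Brian's script. It disables the cyclic collector around one call, times it, and restores the previous state afterwards.

```python
import gc
import time


def time_without_gc(func, *args, **kwargs):
    """Run func with the cyclic GC disabled; return (result, seconds)."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
    finally:
        # Restore the collector only if it was on before the call.
        if was_enabled:
            gc.enable()
    return result, elapsed


def build_many_objects(n):
    # Stand-in workload (hypothetical): allocates many container objects.
    return [(i, [i]) for i in range(n)]


result, seconds = time_without_gc(build_many_objects, 100000)
```

Timing the same call once with and once without the wrapper should show whether the collector accounts for the slowdown.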
>> Also Victor would like to know whether the situation is better in
>> Python 3.2+.
>>
>> --
>> Olivier
>>
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>
>
>
> --
> Peter Prettenhofer
>
