Hi everybody,

I created an experimental branch [1] which uses numpy arrays (as Gael
suggested) instead of the composite structure to represent the tree.
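To make the idea concrete, here is a minimal sketch of a tree stored as parallel numpy arrays, where node i is described by entry i of each array. The array names, the -1 leaf sentinel and this pure-Python `apply_tree` are assumptions for illustration, not the code in the branch [1]. In Cython, the same loop over typed array buffers can be compiled down to plain C indexing without touching Python objects.

```python
import numpy as np

# Hypothetical layout: node i is described by entry i of each array.
# A leaf is marked by children_left[i] == LEAF (an assumed convention).
LEAF = -1


def apply_tree(X, children_left, children_right, feature, threshold):
    """Return the index of the leaf each row of X ends up in."""
    X = np.asarray(X, dtype=np.float64)
    out = np.empty(X.shape[0], dtype=np.intp)
    for i in range(X.shape[0]):
        node = 0  # start at the root
        while children_left[node] != LEAF:
            # Go left if the split feature is <= the node's threshold.
            if X[i, feature[node]] <= threshold[node]:
                node = children_left[node]
            else:
                node = children_right[node]
        out[i] = node
    return out


# A depth-1 tree (a "stump") splitting on feature 0 at 0.5.
children_left = np.array([1, LEAF, LEAF], dtype=np.intp)
children_right = np.array([2, LEAF, LEAF], dtype=np.intp)
feature = np.array([0, -1, -1], dtype=np.intp)
threshold = np.array([0.5, 0.0, 0.0])
value = np.array([0.0, -1.0, 1.0])  # prediction stored per node

leaves = apply_tree([[0.2], [0.9]], children_left, children_right, feature, threshold)
predictions = value[leaves]  # leaves are nodes 1 and 2 -> predictions -1.0 and 1.0
```

Prediction then reduces to a fancy-indexing lookup into `value`, so the only per-sample work is the index-chasing loop.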

The reason for this was two-fold: first, storage is more compact (no
structure padding) and writing/reading to disk is more efficient; and
second, traversing the composite structure in Cython is inefficient
compared to pure C. I assume that the latter is due to the
reference counting overhead when we traverse the structure (look at
the generated C code of the `apply_tree` function in `_tree.c`). I ran
into this performance problem when I benchmarked my gradient boosting
code [2] against its R counterpart gbm.
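To illustrate the storage point, the sketch below compares a hypothetical 7-node tree kept as parallel arrays with the same fields packed into one C-style aligned struct (modelled here with a numpy structured dtype, `align=True`), and round-trips the arrays through `np.savez`/`np.load`. The field names and the struct comparison are assumptions chosen for illustration; they are not taken from the branch.

```python
import io

import numpy as np

# Hypothetical 7-node tree (root, two internal nodes, four leaves).
tree = {
    "children_left": np.array([1, 3, 5, -1, -1, -1, -1], dtype=np.int32),
    "children_right": np.array([2, 4, 6, -1, -1, -1, -1], dtype=np.int32),
    "feature": np.array([0, 1, 1, -1, -1, -1, -1], dtype=np.int32),
    "threshold": np.array([0.5, 0.3, 0.7, 0.0, 0.0, 0.0, 0.0], dtype=np.float64),
}

# Parallel arrays: 3 * 4 + 8 = 20 bytes per node, no padding between fields.
total = sum(a.nbytes for a in tree.values())
print(total)  # 140 == 7 nodes * 20 bytes

# The same fields as one C-style aligned struct: the double must start at an
# 8-byte boundary, so 4 padding bytes follow the three int32 fields.
node_struct = np.dtype(
    [("left", np.int32), ("right", np.int32), ("feature", np.int32), ("threshold", np.float64)],
    align=True,
)
print(node_struct.itemsize)  # 24

# Disk I/O is a few bulk buffer writes/reads rather than a walk over node objects.
buf = io.BytesIO()
np.savez(buf, **tree)
buf.seek(0)
with np.load(buf) as loaded:
    restored = {name: loaded[name] for name in loaded.files}
```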

According to our covertype benchmark, the new representation is a bit
slower at training time due to the array re-sizing operations, but it
is about 4-5 times faster at prediction time, which makes it competitive
with liblinear on our benchmark! The graphviz exporter has not been
updated yet, so one test fails.
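The training-time cost comes from growing the arrays as nodes are added. One common way to keep that cost bounded, assuming the builder appends nodes one at a time, is to double the capacity whenever the arrays are full. The `GrowableTree` class and `_grow` helper below are hypothetical names for this sketch, not the branch's API.

```python
import numpy as np


def _grow(arr, fill):
    # Allocate a buffer twice as large and copy the old contents over.
    extra = np.full(arr.shape[0], fill, dtype=arr.dtype)
    return np.concatenate([arr, extra])


class GrowableTree:
    """Hypothetical builder that appends nodes to parallel arrays."""

    def __init__(self, capacity=4):
        self.node_count = 0
        self.n_resizes = 0
        self.children_left = np.full(capacity, -1, dtype=np.intp)
        self.children_right = np.full(capacity, -1, dtype=np.intp)
        self.threshold = np.zeros(capacity, dtype=np.float64)

    def add_node(self, parent, is_left, threshold):
        if self.node_count == self.threshold.shape[0]:
            # Double every array; this copy is the extra training-time cost.
            self.children_left = _grow(self.children_left, -1)
            self.children_right = _grow(self.children_right, -1)
            self.threshold = _grow(self.threshold, 0.0)
            self.n_resizes += 1
        node_id = self.node_count
        self.threshold[node_id] = threshold
        if parent >= 0:
            # Link the new node into its parent.
            if is_left:
                self.children_left[parent] = node_id
            else:
                self.children_right[parent] = node_id
        self.node_count += 1
        return node_id


# Build a 100-node chain where each node is the left child of the previous one.
t = GrowableTree(capacity=4)
parent = t.add_node(-1, False, 0.0)
for i in range(99):
    parent = t.add_node(parent, True, float(i))
print(t.node_count, t.threshold.shape[0], t.n_resizes)  # 100 128 5
```

With doubling, 100 nodes need only 5 reallocations (4, 8, 16, 32 and 64 elements copied, fewer than 2 * 100 in total), so each append is amortized O(1). The arrays could be trimmed to `node_count` after fitting to keep the stored tree compact.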

[1] https://github.com/pprett/scikit-learn/tree/tree-array-repr
[2] https://github.com/pprett/scikit-learn/tree/gradient_boosting

best,
 Peter

2011/10/28 Olivier Grisel <[email protected]>:
> Victor replied to me in a private message: it might be caused by this
> bug http://bugs.python.org/issue12775 .
>
> Brian, can you disable the gc and re-run your scripts to check
> whether this is the case?
>
>  import gc
>  gc.disable()
>
> Also Victor would like to know whether the situation is better in python 3.2+.
>
> --
> Olivier
>
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>



-- 
Peter Prettenhofer
