Re: [Scikit-learn-general] Storing and loading decision tree classifiers

Gilles Louppe Fri, 04 Nov 2011 02:58:05 -0700

Peter,

I have made a lot of changes myself in tree.py and _tree.pyx, spread over
many places in the code. Wouldn't it be easier for you to merge your
code into my files? As I see in [1, 2], your changes are localized, so
it would be quicker for you to merge them into my files than for me to
merge all of my changes into yours. But I don't mind doing it the other
way around.


[1] https://github.com/pprett/scikit-learn/commit/dde0274ec03d5e74cce2475e0c794cd1e1e9a2ce#diff-1
[2] https://github.com/pprett/scikit-learn/commit/dde0274ec03d5e74cce2475e0c794cd1e1e9a2ce#diff-2

Gilles

On 4 November 2011 10:36, Peter Prettenhofer
<[email protected]> wrote:
> Hi Gilles,
>
> thanks!
>
> I'll invest some more time into the array repr (tests, visitor,
> benchmarks) and ask Brian for feedback. If that's OK, I'd suggest we
> merge the array repr into master and rebase ensemble (and gradient
> boosting) onto it. I can do the rebasing; it shouldn't be a huge problem
> for random forests since the modifications are pretty localized
> (apply_tree plus fewer than 10 lines of code in `build_tree`).
>
> best,
>  Peter
>
> 2011/11/4 Gilles Louppe <[email protected]>:
>> Peter,
>>
>> This looks very good! I will definitely have a look at it later.
>>
>> However, as I warned in pull request #385, I have been making changes
>> [1] to the tree code and to the ensemble branch. I guess our upcoming
>> patches will conflict :( How should we proceed?
>>
>> [1] https://github.com/glouppe/scikit-learn/tree/ensemble
>>
>> Best,
>>
>> Gilles
>>
>>
>> On 3 November 2011 23:40, Peter Prettenhofer
>> <[email protected]> wrote:
>>> Hi everybody,
>>>
>>> I created an experimental branch [1] that uses NumPy arrays (as Gael
>>> suggested) instead of the composite structure to represent the tree.
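To make the idea concrete, here is a minimal sketch of a tree stored as parallel NumPy arrays (one entry per node) together with a pure-Python traversal that maps each sample to the leaf it lands in. The field names (`children_left`, `children_right`, `feature`, `threshold`, `value`), the `LEAF` sentinel and the function `apply_array_tree` are assumptions for illustration; they are not taken from the tree-array-repr branch, where the traversal would be written in Cython.

```python
import numpy as np

LEAF = -1  # hypothetical sentinel: a node without children is a leaf


def apply_array_tree(X, children_left, children_right, feature, threshold):
    """Return, for each row of X, the index of the leaf it falls into."""
    leaves = np.empty(X.shape[0], dtype=np.intp)
    for i in range(X.shape[0]):
        node = 0  # node 0 is the root
        while children_left[node] != LEAF:
            # Go left if the sample's feature value is <= the split threshold.
            if X[i, feature[node]] <= threshold[node]:
                node = children_left[node]
            else:
                node = children_right[node]
        leaves[i] = node
    return leaves


# Depth-2 example: node 0 splits on feature 0 at 0.5, node 2 splits on
# feature 1 at 2.0; nodes 1, 3 and 4 are leaves.
children_left = np.array([1, LEAF, 3, LEAF, LEAF], dtype=np.intp)
children_right = np.array([2, LEAF, 4, LEAF, LEAF], dtype=np.intp)
feature = np.array([0, LEAF, 1, LEAF, LEAF], dtype=np.intp)
threshold = np.array([0.5, 0.0, 2.0, 0.0, 0.0])
value = np.array([0.0, 10.0, 0.0, 20.0, 30.0])  # prediction stored per node

X = np.array([[0.2, 5.0], [0.9, 1.0], [0.9, 3.0]])
leaves = apply_array_tree(X, children_left, children_right, feature, threshold)
print(leaves)         # [1 3 4]
print(value[leaves])  # [10. 20. 30.]
```

Each node is just an index into flat, contiguous arrays, so a compiled version of this loop should need no per-node Python objects and no reference counting.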
>>>
>>> The reason for this was twofold: first, storage is more compact (no
>>> structure padding), and writing/reading to disk is more efficient;
>>> second, traversing the composite structure in Cython is inefficient
>>> compared to pure C. I assume the latter is due to the reference
>>> counting overhead when we traverse the structure (look at the
>>> generated C code of the `apply_tree` function in `_tree.c`). I ran
>>> into this performance problem when I benchmarked my gradient boosting
>>> code [2] against its R counterpart, gbm.
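Because an array-based tree is just a handful of arrays, storing and loading it can reuse NumPy's own serialization. The sketch below round-trips a hypothetical set of tree arrays through `np.savez` and `np.load`, using an in-memory buffer in place of a file; the array names are illustrative assumptions, and this is not necessarily how the branch itself persists trees.

```python
import io

import numpy as np

# Hypothetical tree arrays for a single split (names are assumptions).
tree = {
    "children_left": np.array([1, -1, -1], dtype=np.intp),
    "children_right": np.array([2, -1, -1], dtype=np.intp),
    "feature": np.array([0, -1, -1], dtype=np.intp),
    "threshold": np.array([0.5, 0.0, 0.0]),
    "value": np.array([0.0, 1.0, 2.0]),
}

# Write all arrays into one .npz archive; a file path works the same way.
buf = io.BytesIO()
np.savez(buf, **tree)

# Read them back; NpzFile loads each array lazily on access.
buf.seek(0)
with np.load(buf) as data:
    loaded = {name: data[name] for name in data.files}

# Contents and dtypes survive the round trip.
for name, arr in tree.items():
    assert np.array_equal(arr, loaded[name])
    assert arr.dtype == loaded[name].dtype
print(sorted(loaded))
```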
>>>
>>> According to our covertype benchmark, the new representation is a bit
>>> slower at training time due to the array re-sizing operations, but
>>> about 4-5 times faster at prediction time, which makes it competitive
>>> with liblinear on our benchmark! The graphviz exporter has not been
>>> updated yet, so one test fails.
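The extra training cost comes from growing the node arrays while the tree is built, since the final number of nodes is not known in advance. One common way to keep that cost amortized is to double the capacity whenever the array fills up. The class below is a hypothetical sketch of that strategy (`GrowableNodeArray` is an invented name, and the mail does not say which growth policy the branch uses).

```python
import numpy as np


class GrowableNodeArray:
    """Append-only float array that doubles its capacity when full (a sketch)."""

    def __init__(self, initial_capacity=4):
        self._data = np.empty(max(1, initial_capacity), dtype=np.float64)
        self._size = 0
        self.resizes = 0  # number of reallocations, for illustration

    def append(self, x):
        if self._size == self._data.shape[0]:
            # Reallocate with twice the capacity and copy the existing entries;
            # this copy is the re-sizing cost paid during tree construction.
            new_data = np.empty(2 * self._data.shape[0], dtype=self._data.dtype)
            new_data[: self._size] = self._data[: self._size]
            self._data = new_data
            self.resizes += 1
        self._data[self._size] = x
        self._size += 1

    def to_array(self):
        # Trim unused capacity before storing the finished tree.
        return self._data[: self._size].copy()


arr = GrowableNodeArray(initial_capacity=4)
for i in range(100):
    arr.append(float(i))
print(arr.resizes)           # 5  (capacity 4 -> 8 -> 16 -> 32 -> 64 -> 128)
print(arr.to_array().shape)  # (100,)
```

With doubling, the total copying work stays proportional to the final number of nodes, at the price of up to 2x temporary over-allocation.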
>>>
>>> [1] https://github.com/pprett/scikit-learn/tree/tree-array-repr
>>> [2] https://github.com/pprett/scikit-learn/tree/gradient_boosting
>>>
>>> best,
>>>  Peter
>>>
>>> 2011/10/28 Olivier Grisel <[email protected]>:
>>>> Victor replied to me in a private message: it might be caused by this
>>>> bug: http://bugs.python.org/issue12775
>>>>
>>>> Brian, can you disable the gc and re-run your scripts to check
>>>> whether this is the case?
>>>>
>>>>  import gc
>>>>  gc.disable()
>>>>
>>>> Victor would also like to know whether the situation is better in
>>>> Python 3.2+.
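One way to run that check without leaving the collector off for the rest of a script is a small context manager around `gc.disable()`/`gc.enable()`. This is a sketch: `gc_disabled` and the list-building workload are hypothetical stand-ins for Brian's scripts, and the timings it prints will vary by machine and Python version.

```python
import gc
import time
from contextlib import contextmanager, nullcontext


@contextmanager
def gc_disabled():
    """Temporarily turn off Python's cyclic garbage collector."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        # Restore the previous state even if the timed code raises.
        if was_enabled:
            gc.enable()


def build_many_objects(n=200_000):
    # Hypothetical workload: allocates many GC-tracked container objects,
    # which is what drives automatic collections in CPython.
    return [[i] for i in range(n)]


for label, ctx in [("gc enabled", nullcontext), ("gc disabled", gc_disabled)]:
    start = time.perf_counter()
    with ctx():
        build_many_objects()
    print(f"{label}: {time.perf_counter() - start:.3f}s")
```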
>>>>
>>>> --
>>>> Olivier
>>>>
>>>> ------------------------------------------------------------------------------
>>>> The demand for IT networking professionals continues to grow, and the
>>>> demand for specialized networking skills is growing even more rapidly.
>>>> Take a complimentary Learning@Cisco Self-Assessment and learn
>>>> about Cisco certifications, training, and career opportunities.
>>>> http://p.sf.net/sfu/cisco-dev2dev
>>>> _______________________________________________
>>>> Scikit-learn-general mailing list
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>>
>>>
>>>
>>>
>>> --
>>> Peter Prettenhofer
>>>
>>
>
>
>
> --
> Peter Prettenhofer
>
