[..]
>
> Interesting. What is the order of magnitude of the decrease in speed
> at fit time?

IMHO it's negligible.

Here are some timings for::

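    import numpy as np
    from sklearn import tree

    rs = np.random.RandomState(13)
    X = rs.rand(50000, 100)
    y = rs.randint(2, size=50000)
    # repeated with max_depth=9 and max_depth=20 for the tables below
    clf = tree.DecisionTreeClassifier(max_depth=2)
    # IPython magics: time fit and predict separately
    %timeit clf.fit(X, y)
    %timeit clf.predict(X)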


Array repr:      fit       predict
max_depth=2      1.11 s    14.5 ms
max_depth=9      4.03 s    15.6 ms
max_depth=20     7.92 s    22.4 ms

Comp repr:       fit       predict
max_depth=2      1.11 s    64.9 ms
max_depth=9      3.96 s    65.8 ms
max_depth=20     8.03 s    72.9 ms

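Fit times are essentially identical for both representations (within
about 2%). The array repr is roughly 3x to 4.5x faster at prediction
time (e.g. 14.5 ms vs. 64.9 ms at max_depth=2), and there's still some
room for improvement because it might be possible to vectorize the
predict computation (easy for decision stumps but more difficult for
trees of depth 3 or larger).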
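
To make the vectorization idea concrete, here is a minimal sketch of
what a level-wise NumPy predict over an array-based tree could look
like. This is not the code that produced the timings above; the array
names (children_left, children_right, feature, threshold, value) and
the use of -1 as the leaf marker are assumptions made for
illustration only.

```python
import numpy as np


def predict_array_tree(X, children_left, children_right,
                       feature, threshold, value):
    """Route all samples down the tree one level per loop iteration."""
    # Every sample starts at the root (node 0).
    node = np.zeros(X.shape[0], dtype=np.intp)
    while True:
        # Samples whose current node still has children (assumed: -1 = leaf).
        idx = np.flatnonzero(children_left[node] != -1)
        if idx.size == 0:
            break
        n = node[idx]
        # Compare each active sample against its own node's split.
        go_left = X[idx, feature[n]] <= threshold[n]
        node[idx] = np.where(go_left, children_left[n], children_right[n])
    return value[node]


# Hypothetical 5-node tree: the root splits on feature 0, its right child
# splits on feature 1, and the remaining nodes are leaves.
children_left = np.array([1, -1, 3, -1, -1])
children_right = np.array([2, -1, 4, -1, -1])
feature = np.array([0, 0, 1, 0, 0])          # ignored for leaves
threshold = np.array([0.5, 0.0, 0.5, 0.0, 0.0])
value = np.array([-1, 0, -1, 1, 2])          # only leaf entries are used

X_demo = np.array([[0.2, 0.9], [0.7, 0.3], [0.7, 0.8]])
pred = predict_array_tree(X_demo, children_left, children_right,
                          feature, threshold, value)
```

For a decision stump the loop body runs exactly once, which is why
that case is easy. For deeper trees the loop runs once per level and
the set of active samples shrinks as they reach leaves, so the gain
over a per-sample traversal is less clear-cut.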


>
> Do you think we could get the best of both worlds by keeping the cython
> struct internally at fit time and converting it to the array
> representation at the end of the fit function to gain the efficient
> serialization and best prediction performance?
>
> Two downsides to this approach:
> - probably more code to maintain (unless the fit impl in cython struct
> is significantly simpler than the array repr counterpart)
> - this will prevent us from implementing a tree fit with warm restart
> where the tree is further grown from an initial tree state (I wonder if
> there is any use case for that, it might not be an issue if not)
>

Given the above timings I think there is too little to be gained
considering the additional code complexity of such a hybrid approach.
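
For reference, the conversion step of such a hybrid could be sketched
roughly as below, in plain Python rather than Cython. The Node class
and the to_arrays helper are hypothetical and do not correspond to any
existing scikit-learn code; the array names and the -1 leaf marker are
likewise assumptions.

```python
import numpy as np


class Node:
    """Hypothetical linked tree node, standing in for the fit-time struct."""

    def __init__(self, feature=-1, threshold=0.0, value=-1,
                 left=None, right=None):
        self.feature = feature
        self.threshold = threshold
        self.value = value
        self.left = left
        self.right = right


def to_arrays(root):
    """Flatten a linked tree into parallel arrays (pre-order node ids)."""
    children_left, children_right = [], []
    feature, threshold, value = [], [], []

    def add(node):
        node_id = len(feature)
        feature.append(node.feature)
        threshold.append(node.threshold)
        value.append(node.value)
        # Assumed convention: -1 marks "no child", i.e. a leaf.
        children_left.append(-1)
        children_right.append(-1)
        if node.left is not None:
            children_left[node_id] = add(node.left)
            children_right[node_id] = add(node.right)
        return node_id

    add(root)
    return {
        "children_left": np.array(children_left),
        "children_right": np.array(children_right),
        "feature": np.array(feature),
        "threshold": np.array(threshold),
        "value": np.array(value),
    }


# Small hypothetical tree: root -> (leaf, internal -> (leaf, leaf)).
root = Node(feature=0, threshold=0.5,
            left=Node(value=0),
            right=Node(feature=1, threshold=0.5,
                       left=Node(value=1), right=Node(value=2)))
arrays = to_arrays(root)
```

Even in this toy form it is a second tree representation plus a
conversion routine that would have to be kept in sync with the
fitting code, which is the extra complexity referred to above.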


> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
>
> ------------------------------------------------------------------------------
> RSA(R) Conference 2012
> Save $700 by Nov 18
> Register now
> http://p.sf.net/sfu/rsa-sfdev2dev1
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>



-- 
Peter Prettenhofer

