[..]
>
> Interesting. What is the order of magnitude of the decrease in speed
> at fit time?
IMHO it's negligible. Here are some timings for::

    import numpy as np
    from sklearn import tree

    rs = np.random.RandomState(13)
    X = rs.rand(50000, 100)
    y = rs.randint(2, size=50000)
    clf = tree.DecisionTreeClassifier(max_depth=2)
    # %timeit is an IPython magic; run these two lines in an IPython session
    %timeit clf.fit(X, y)
    %timeit clf.predict(X)

Array repr:     fit       predict
max_depth=2     1.11 s    14.5 ms
max_depth=9     4.03 s    15.6 ms
max_depth=20    7.92 s    22.4 ms

Comp repr:      fit       predict
max_depth=2     1.11 s    64.9 ms
max_depth=9     3.96 s    65.8 ms
max_depth=20    8.03 s    72.9 ms

The array repr is significantly faster at prediction time (roughly 3x
to 4.5x in the timings above), while the fit times differ by less than
2%. There's still some room for improvement, because it might be
possible to vectorize the predict computation (easy for decision stumps
but more difficult for trees of depth 3 or larger).
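
To make the vectorization idea concrete, here is a rough NumPy sketch. The
array layout (children_left, children_right, feature, threshold, value, with
-1 marking a leaf) and the function names are assumptions for illustration,
not the actual layout used in the branch. The stump case is a single
np.where; the general case advances all samples one level per iteration.

```python
import numpy as np

def predict_stump(X, feature, threshold, left_value, right_value):
    """Depth-1 tree: one vectorized comparison over all samples."""
    return np.where(X[:, feature] <= threshold, left_value, right_value)

def predict_tree(X, children_left, children_right, feature, threshold, value):
    """Level-wise traversal of a tree stored as parallel arrays.

    Assumed (hypothetical) layout: node 0 is the root and
    children_left[i] == -1 marks node i as a leaf.
    """
    node = np.zeros(X.shape[0], dtype=np.intp)   # current node of each sample
    active = np.flatnonzero(children_left[node] != -1)
    while active.size:
        cur = node[active]
        go_left = X[active, feature[cur]] <= threshold[cur]
        node[active] = np.where(go_left, children_left[cur], children_right[cur])
        # keep only the samples that have not reached a leaf yet
        active = active[children_left[node[active]] != -1]
    return value[node]

# Small hand-built tree: the root splits on feature 0, its right child
# splits on feature 1; nodes 1, 3 and 4 are leaves.
children_left = np.array([1, -1, 3, -1, -1], dtype=np.intp)
children_right = np.array([2, -1, 4, -1, -1], dtype=np.intp)
feature = np.array([0, -2, 1, -2, -2], dtype=np.intp)
threshold = np.array([0.5, 0.0, 0.5, 0.0, 0.0])
value = np.array([0, 0, 0, 1, 2])

X_demo = np.array([[0.2, 0.9], [0.8, 0.1], [0.8, 0.9]])
pred = predict_tree(X_demo, children_left, children_right,
                    feature, threshold, value)
```

The loop runs at most once per tree level, so the Python overhead grows with
the depth rather than with the number of samples. Whether this beats a
per-sample loop in Cython would have to be measured.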
>
> Do you think we could get the best of both worlds by keeping the cython
> struct internally at fit time and converting it to the array
> representation at the end of the fit function to gain the efficient
> serialization and best prediction performance?
>
> Two downsides to this approach:
> - probably more code to maintain (unless the fit impl in cython struct
> is significantly simpler than the array repr counterpart)
> - this will prevent us from implementing a tree fit with warm restart where
> the tree is further grown from an initial tree state (I wonder if
> there is any use case for that, it might not be an issue if not)
>
Given the above timings I think there is too little to be gained
considering the additional code complexity of such a hybrid approach.
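
For reference, the conversion step of such a hybrid approach could look
roughly like the sketch below. The Node class is a hypothetical Python
stand-in for the struct used at fit time, and to_arrays and the array names
are made up for illustration; none of this is the code in the branch.

```python
import numpy as np

class Node:
    """Hypothetical linked node, standing in for the fit-time struct."""
    def __init__(self, value, feature=-1, threshold=0.0, left=None, right=None):
        self.value = value
        self.feature = feature
        self.threshold = threshold
        self.left = left
        self.right = right

def to_arrays(root):
    """Flatten a linked tree into parallel arrays (pre-order numbering).

    Leaves get -1 in both child arrays.
    """
    left, right, feature, threshold, value = [], [], [], [], []

    def visit(node):
        idx = len(value)                 # index assigned to this node
        left.append(-1)
        right.append(-1)
        feature.append(node.feature)
        threshold.append(node.threshold)
        value.append(node.value)
        if node.left is not None:        # internal node: number both subtrees
            left[idx] = visit(node.left)
            right[idx] = visit(node.right)
        return idx

    visit(root)
    return (np.array(left, dtype=np.intp), np.array(right, dtype=np.intp),
            np.array(feature, dtype=np.intp), np.array(threshold),
            np.array(value))

# Root splits on feature 0; its right child splits on feature 1.
root = Node(0, feature=0, threshold=0.5,
            left=Node(0),
            right=Node(0, feature=1, threshold=0.5,
                       left=Node(1), right=Node(2)))
children_left, children_right, feature, threshold, value = to_arrays(root)
```

The conversion itself is short, but it comes on top of keeping both
representations working, which is the extra complexity referred to above.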
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
>
> ------------------------------------------------------------------------------
> RSA(R) Conference 2012
> Save $700 by Nov 18
> Register now
> http://p.sf.net/sfu/rsa-sfdev2dev1
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
--
Peter Prettenhofer