Once a decision tree (or a forest) has been trained, I almost always
want to save the resulting classifier to disk and then load it again
at a later stage for testing.
My dataset is 5.2GB on disk: (690K * 2K) float32s. I can load this
into memory using `np.load('dataset.npy')` in 20 seconds on our
server.
When a decision tree is trained to depth 20 and pickled, it requires
between 200MB and 300MB on disk, but here is the kicker: it takes
*hours* to load it up. Last time I tried, it took 16 hours to load a
forest of 10 trees.
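One thing that can make a large difference with plain pickle: the default protocol (0) is ASCII-based and very slow for objects that contain large numerical arrays, while an explicit binary protocol is much more compact and faster to read back. A minimal sketch, using a dictionary of float32 arrays as a hypothetical stand-in for a trained tree:

```python
import pickle
import numpy as np

# Hypothetical stand-in for a trained classifier: any object holding
# large float32 arrays exhibits the same protocol-dependent behavior.
model = {"values": np.zeros((1000, 10), dtype=np.float32)}

# Request the highest binary protocol explicitly; pickle.dump defaults
# to the slow ASCII protocol 0 if none is given.
with open("model.pkl", "wb") as f:
    pickle.dump(model, f, protocol=pickle.HIGHEST_PROTOCOL)

with open("model.pkl", "rb") as f:
    restored = pickle.load(f)
```

If the trees were pickled without an explicit protocol, re-saving them once with a binary protocol may already shrink the files and speed up loading.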
My question is: is there a way to improve the performance of loading
classifiers, either by using different pickle options (of which I
don't know any, but there may be), by using a different scheme
(marshalling sounds promising based on [1]), or in some other way?
Perhaps I could implement a pickle loader in Cython?
[1] http://stackoverflow.com/questions/329249/why-is-marshal-so-much-faster-than-pickle
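As an alternative scheme to marshalling, joblib (installable separately, and also shipped with scikit-learn in some releases) serializes the NumPy arrays inside an object as raw binary buffers rather than pushing them through the pickle stream, which is typically much faster for array-heavy estimators. A sketch, again with a hypothetical array-holding object standing in for a trained forest:

```python
import joblib
import numpy as np

# Hypothetical stand-in for a trained forest; joblib.dump handles
# arbitrary picklable objects but special-cases the numpy arrays
# they contain, writing them as raw binary buffers.
model = {"values": np.zeros((1000, 10), dtype=np.float32)}

joblib.dump(model, "model.joblib")
restored = joblib.load("model.joblib")
```

joblib.dump also accepts a compress argument if disk size matters more than load speed.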
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general