Firstly, thanks for all the helpful comments. I didn't know the protocol made such a big difference, so until now I had unknowingly been using the default.
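In Python 2, `pickle` and `cPickle` default to protocol 0, the original text-based format, while `HIGHEST_PROTOCOL` selects the newest binary one. Below is a minimal sketch of the difference. It assumes Python 3, where `pickle` uses its C accelerator automatically and `cPickle` no longer exists. It also uses a hypothetical `Node` class and randomly grown trees in place of the real forests, and it passes protocol 0 explicitly to mimic the old Python 2 default:

```python
import pickle
import random
import time


class Node:
    """Hypothetical stand-in for a decision-tree node (not scikit-learn's tree class)."""

    def __init__(self, feature, threshold, left=None, right=None):
        self.feature = feature
        self.threshold = threshold
        self.left = left
        self.right = right


def build_tree(depth, rng):
    """Randomly grow a sparse (non-complete) binary tree of at most `depth` levels."""
    if depth == 0 or rng.random() < 0.3:
        return None
    return Node(rng.randrange(54), rng.random(),
                build_tree(depth - 1, rng), build_tree(depth - 1, rng))


def count_nodes(node):
    """Count the nodes in a tree built by build_tree."""
    if node is None:
        return 0
    return 1 + count_nodes(node.left) + count_nodes(node.right)


def roundtrip(obj, protocol):
    """Pickle `obj` with `protocol`, then unpickle it.

    Returns (pickle size in bytes, seconds spent in pickle.loads, loaded object).
    """
    data = pickle.dumps(obj, protocol)
    start = time.perf_counter()
    loaded = pickle.loads(data)
    return len(data), time.perf_counter() - start, loaded


rng = random.Random(0)
forest = [build_tree(20, rng) for _ in range(10)]  # hypothetical 10-tree "forest"

# Protocol 0 was the Python 2 default; Python 3 defaults to a binary protocol,
# so it is requested explicitly here for the comparison.
results = {}
for protocol in (0, pickle.HIGHEST_PROTOCOL):
    size, seconds, loaded = roundtrip(forest, protocol)
    results[protocol] = (size, seconds, loaded)
    print(f"protocol {protocol}: {size} bytes, loaded in {seconds:.4f} s")
```

Byte counts and timings depend on the machine and on the real node layout, so treat them only as a relative comparison. The sketch times a single in-memory load, so it does not by itself reproduce a slowdown that only appears when many forests are loaded one after another.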
That said, I left a test running last night on one of our centre's servers, and it took 8 hours to load 20 forests (each with 10 trees of depth 20, approx. 100K nodes per tree) using `pickle`. With `cPickle` it dropped to 6 hours. The trees aren't complete binary trees (roughly 100K nodes out of a possible 2 million). Loading a single one is really quick; the trouble seems to come when I load all of them into memory one after another, which gets progressively slower as more memory is used. I will try again, saving and loading with the highest protocol.

On 26 October 2011 19:58, Peter Prettenhofer wrote:
> I just dumped and loaded a fairly large tree (~40000 nodes; from
> bench_sgd_covertype.py) with cPickle, both operations performed in
> less than 1 sec (w/ and w/o HIGHEST_PROTOCOL).
>
> Brian: how large are your trees (are they complete binary trees?)
>
> best,
> Peter
>
>
> 2011/10/26 Peter Prettenhofer:
>> brian, try to save the tree using::
>>
>>     cPickle.dump(tree, f, cPickle.HIGHEST_PROTOCOL)
>>
>> if this doesn't solve the issue we should reconsider Gael's array
>> representation.
>>
>> best,
>> peter
>>
>> On 26.10.2011 14:37, "Andreas Mueller" wrote:
>>>
>>> > My question is: is there a way to improve the performance of loading
>>> > classifiers, either using different pickle options (of which I don't
>>> > know any, but there may be)
>>> >
>>> >
>>> Just to be sure, you used the latest pickling format, right?
>>> cPickle uses the oldest one by default afaik.
>>>
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>
> --
> Peter Prettenhofer

--
He is no fool who gives what he cannot keep to gain what he cannot lose.
 - Jim Elliot.
