I would have agreed if I were working on a machine with less memory, but the server has 144 GB and I'm only using a few percent of it.
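One way to check both points raised in this thread (the pickle protocol, and memory use while several forests are loaded consecutively) is to time each load and record the traced memory after it. The following is only a minimal sketch under stated assumptions: it uses Python 3's `pickle` (which takes the place of `cPickle`), and `make_fake_forest`, `dump_forest` and `load_all` are hypothetical helpers operating on a plain list-of-dicts stand-in, not on real scikit-learn trees.

```python
import io
import pickle
import time
import tracemalloc


def make_fake_forest(n_trees=3, n_nodes=1000):
    """Hypothetical stand-in for a forest: one list of node dicts per tree."""
    return [
        [
            {"feature": i % 10, "threshold": i * 0.5,
             "left": 2 * i + 1, "right": 2 * i + 2}
            for i in range(n_nodes)
        ]
        for _ in range(n_trees)
    ]


def dump_forest(forest, protocol=pickle.HIGHEST_PROTOCOL):
    """Serialize to bytes with the given protocol (highest by default)."""
    buf = io.BytesIO()
    pickle.dump(forest, buf, protocol)
    return buf.getvalue()


def load_all(blobs):
    """Load every blob in turn, keeping all results in memory.

    Returns the loaded objects and, per load, a tuple of
    (elapsed seconds, bytes currently traced by tracemalloc).
    """
    tracemalloc.start()
    loaded, stats = [], []
    for blob in blobs:
        t0 = time.perf_counter()
        loaded.append(pickle.loads(blob))
        elapsed = time.perf_counter() - t0
        current, _peak = tracemalloc.get_traced_memory()
        stats.append((elapsed, current))
    tracemalloc.stop()
    return loaded, stats


forest = make_fake_forest()
blobs = [dump_forest(forest) for _ in range(5)]
loaded, stats = load_all(blobs)
for i, (elapsed, mem) in enumerate(stats):
    print(f"load {i}: {elapsed:.4f}s, traced memory {mem / 1e6:.1f} MB")
```

If the per-load time climbs while traced memory stays far below the machine's capacity, that would point away from thrashing. Note that `tracemalloc` only sees allocations made through Python's allocator and adds overhead of its own, so the timings are indicative rather than exact.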
-----Original Message-----
From: Peter Prettenhofer <[email protected]>
Date: Thu, 27 Oct 2011 12:20:40
To: <[email protected]>
Reply-To: [email protected]
Subject: Re: [Scikit-learn-general] Storing and loading decision tree classifiers

100K nodes is not much larger than my test (60K)... have you checked
the memory consumption during the load operation? I suspect that you
run out of memory and the huge overhead is due to thrashing.

2011/10/27 Brian Holt <[email protected]>:
> Firstly, thanks for all the helpful comments. I didn't know that the
> protocol made such a big difference, so until now in ignorance I've
> been using the default.
>
> That said, I left a test running last night on one of our centre's
> servers and it took 8hrs to load 20 forests (each with 10 trees,
> depth 20, approx 100K nodes) using `pickle`. It dropped to 6 hours
> using `cPickle`. The trees aren't complete binary trees (100K nodes
> out of a possible 2 million).
>
> It's really quick to load a single one; what seems to be the trouble is
> when I load all of them into memory consecutively. It gets
> progressively slower as more memory is used. I will give it another
> try saving and loading using the highest protocol.
>
> On 26 October 2011 19:58, Peter Prettenhofer
> <[email protected]> wrote:
>> I just dumped and loaded a fairly large tree (~40000 nodes; from
>> bench_sgd_covertype.py) with cPickle, both operations performed in
>> less than 1 sec (w/ and w/o HIGHEST_PROTOCOL).
>>
>> Brian: how large are your trees (are they complete binary trees?)
>>
>> best,
>> Peter
>>
>> 2011/10/26 Peter Prettenhofer <[email protected]>:
>>> brian, try to save the tree using::
>>>
>>>     cPickle.dump(tree, f, cPickle.HIGHEST_PROTOCOL)
>>>
>>> if this doesn't solve the issue we should reconsider Gael's array
>>> representation.
>>>
>>> best,
>>> peter
>>>
>>> On 26.10.2011 14:37, "Andreas Mueller" <[email protected]> wrote:
>>>>
>>>> > My question is: is there a way to improve the performance of loading
>>>> > classifiers, either using different pickle options (of which I don't
>>>> > know any, but there may be)
>>>>
>>>> Just to be sure, you used the latest pickling format, right?
>>>> cPickle uses the oldest one by default afaik.
>
> --
> He is no fool who gives what he cannot keep to gain what he cannot lose.
> - Jim Elliot.

--
Peter Prettenhofer

------------------------------------------------------------------------------
The demand for IT networking professionals continues to grow, and the
demand for specialized networking skills is growing even more rapidly.
Take a complimentary Learning@Cisco Self-Assessment and learn
about Cisco certifications, training, and career opportunities.
http://p.sf.net/sfu/cisco-dev2dev
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
