On Fri, Oct 28, 2011 at 10:30:01AM +0100, Brian Holt wrote:
> cPickle with HIGHEST_PROTOCOL is significantly faster, it averages 15
> seconds to load the 10 tree forest compared to the 5 minutes without.
Good. So it seems that we do not need to modify the existing code.

> What still confuses me is why loading the forests and storing them in
> a list should be any slower than loading them individually.

Technically, I do not think that pickling/unpickling is an O(n)
algorithm, with n the number of objects. I think that it grows faster
than that. Having looked at the corresponding code, one of the reasons
is that pickling actually works on a graph of possibly self-referencing
objects (a list can contain a dict that contains the same list). Thus,
to avoid going into infinite loops, the pickler needs to do loop
detection, which it does by checking the 'id' of the different objects.

In short, the objects that you are storing have a specific structure
(they are unconnected). By storing them separately, you are benefiting
from your knowledge of that structure, but the pickling/unpickling
algorithm, which solves the general case, does not know about it.

Gaël
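
P.S. To make the loop detection concrete, here is a minimal sketch. It
assumes Python 3's pickle module rather than the cPickle discussed in
this thread, and the "forests" are hypothetical stand-in dicts, not real
scikit-learn estimators. It only illustrates the memo behaviour and
measures nothing.

```python
import pickle

# A list that contains a dict that contains the same list: a cycle.
cyclic = []
cyclic.append({"parent": cyclic})

# Pickling terminates because the pickler keeps a memo of the objects
# it has already seen, keyed on id(obj).
data = pickle.dumps(cyclic, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(data)
cycle_preserved = restored[0]["parent"] is restored

# The same memo means an object referenced twice is written once, and
# the two references still point to one object after loading.
shared = [1, 2, 3]
pair = pickle.loads(
    pickle.dumps([shared, shared], protocol=pickle.HIGHEST_PROTOCOL)
)
shared_preserved = pair[0] is pair[1]

# Hypothetical stand-ins for unconnected forests. Dumping them one at a
# time gives each call its own fresh memo, which is how the caller's
# knowledge that the objects are unconnected gets used.
forests = [{"forest_id": i, "nodes": list(range(5))} for i in range(3)]
blobs = [pickle.dumps(f, protocol=pickle.HIGHEST_PROTOCOL) for f in forests]
loaded = [pickle.loads(b) for b in blobs]
```

Whether the per-object dump is actually faster for a given forest is
something to time on the real data; the sketch only shows why the
general algorithm has bookkeeping that the separate dumps keep small.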
