On Fri, Apr 23, 2010 at 2:11 PM, Dan Gindikin <dgindi...@gmail.com> wrote:
> We were having performance problems unpickling a large pickle file: we were
> getting a 170s running time (which was fine), but 1100 MB of memory usage.
> Memory usage ought to have been about 300 MB; this was happening because of
> memory fragmentation, due to many unnecessary "put" opcodes in the pickle
> stream.
>
> We made a pickletools.optimize-inspired tool that could run directly on a
> pickle file and used pickletools.genops. This solved the unpickling problem
> (84s, 382 MB).
>
> However, the tool itself was using too much time and memory (1100s, 470 MB),
> so I recoded it to scan through the pickle stream directly, without going
> through pickletools.genops, giving (240s, 130 MB).
>
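For context, a minimal sketch of the genops-based approach Dan describes (the same idea as pickletools.optimize): scan the opcode stream, record which memo slots are ever fetched by a GET, and emit a new stream with the unreferenced PUTs dropped. The function name `strip_unused_puts` is illustrative, not from the thread, and note that materializing the full opcode list is exactly the memory cost that motivated his direct-scanning rewrite.

```python
import pickle
import pickletools

def strip_unused_puts(data):
    """Return a copy of pickle bytes `data` with unreferenced PUTs removed."""
    # Materialize (opcode, arg, position) triples for the whole stream.
    ops = list(pickletools.genops(data))
    # Memo slots that are actually fetched later in the stream.
    gets = {arg for opcode, arg, _ in ops
            if opcode.name in ("GET", "BINGET", "LONG_BINGET")}
    out = []
    for i, (opcode, arg, pos) in enumerate(ops):
        # Each opcode's bytes run from its position to the next opcode's.
        end = ops[i + 1][2] if i + 1 < len(ops) else len(data)
        if opcode.name in ("PUT", "BINPUT", "LONG_BINPUT") and arg not in gets:
            continue  # nothing ever fetches this memo slot; drop the PUT
        out.append(data[pos:end])
    return b"".join(out)
```

Unpickling the stripped stream yields an equal object graph, including shared references, since every PUT that a GET depends on is preserved.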
Collin Winter wrote a simple optimization pass for cPickle in Unladen
Swallow [1]. The code reads through the stream and removes all the
unnecessary PUTs in place.

[1]: http://code.google.com/p/unladen-swallow/source/browse/trunk/Modules/cPickle.c#735

> Other people that deal with large pickle files are probably having similar
> problems, and since this comes up when dealing with large data, it is
> precisely the situation in which you probably can't use pickletools.optimize
> or pickletools.genops. It feels like functionality that ought to be added to
> pickletools. Is there some way I can contribute this?

Just put your code on bugs.python.org and I will take a look.

-- Alexandre
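As a quick way to tell whether a given pickle would benefit from such a pass at all, one can count memo stores against memo fetches; a large surplus of PUTs over GETs means most of the memo traffic is dead weight. This diagnostic (`memo_stats` is a hypothetical helper, not part of pickletools) is a sketch using pickletools.genops:

```python
import pickle
import pickletools

def memo_stats(data):
    """Count memo-store vs. memo-fetch opcodes in pickle bytes `data`."""
    puts = gets = 0
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name in ("PUT", "BINPUT", "LONG_BINPUT", "MEMOIZE"):
            puts += 1
        elif opcode.name in ("GET", "BINGET", "LONG_BINGET"):
            gets += 1
    return puts, gets
```

A stream with many puts and zero gets, such as a pickle of a large nested structure with no shared references, is exactly the fragmentation-prone case Dan describes.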