Antoine Pitrou <pit...@free.fr> added the comment:

> The gz in question is 17 MB compressed and 247 MB uncompressed. Calling
> zcat as a forked subprocess, the Python process uses between 250 and
> 260 MB with the whole string in memory. Numbers for the gzip module
> aren't obtainable except for readline(), which doesn't use much memory
> but is very slow. Other methods thrash the machine to death.
> 
> The machine has 300 MB of free RAM out of a total of 1024 MB.

That would be the explanation. Reading the whole file at once and then
doing splitlines() on the result consumes twice the memory, since the
list of lines must be constructed while the original data is still
around. If you had more than 600 MB of free RAM, the splitlines()
solution would probably be adequate :-)
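
For reference, this is the pattern that doubles the peak memory (a
minimal sketch in current Python; "data.gz" is a placeholder name):

    import gzip

    with gzip.open("data.gz", "rb") as f:
        data = f.read()            # the full ~247 MB decompressed string
    lines = data.splitlines()      # second copy, built while data is alive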

Doing repeated calls to splitlines() on chunks of limited size (say
1 MB) would probably be fast enough without using too much memory, as in
the sketch below. It would be a bit less trivial to implement, though,
and it seems you are OK with the subprocess solution.
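
A rough sketch of that chunked approach (untested; iter_lines and
CHUNK_SIZE are illustrative names, not from the stdlib):

    import gzip

    CHUNK_SIZE = 1024 * 1024  # 1 MB chunks, as suggested above

    def iter_lines(path):
        # Read fixed-size chunks, split each into lines, and carry the
        # trailing partial line over into the next chunk, so peak memory
        # stays near CHUNK_SIZE instead of the full decompressed size.
        with gzip.open(path, "rb") as f:
            partial = b""
            while True:
                chunk = f.read(CHUNK_SIZE)
                if not chunk:
                    if partial:
                        yield partial  # final line with no newline
                    return
                lines = (partial + chunk).splitlines(True)
                if not lines[-1].endswith(b"\n"):
                    partial = lines.pop()  # incomplete line: hold it back
                else:
                    partial = b""
                for line in lines:
                    yield line

Usage would then be an ordinary loop, e.g.
for line in iter_lines("data.gz"): ... with "data.gz" again a
placeholder.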

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue7471>
_______________________________________