Martin Panter added the comment:

The gzip module (as well as lzma and bz2) should now use buffer and chunk 
sizes of 8 KiB (= io.DEFAULT_BUFFER_SIZE) for most read() and seek() type 
operations.

I have a patch that adds a buffer_size parameter to the three compression 
modules, if anyone is interested. It may need a bit of work, e.g. adding the 
parameter to open(), mimicking the built-in open() function when buffer_size=0, 
etc.

I did a quick test of seeking 100 MB into a gzip file, using the original 
Python 3.4.3 module, the current code that uses 8 KiB chunk sizes, and then my 
patched code with various chunk sizes. 8 KiB is significantly faster than the 
previous code (693 msec versus 2.36 sec per seek). My tests peak at about 
64 KiB, but I guess that depends on the computer (cache size etc). Anyway, 
8 KiB seems like a good compromise that does not hog all of the fast cache 
memory, so I suggest we close this bug.

The command line for timing looked like this (buffer_size is the parameter 
added by my patch):

python -m timeit -s 'import gzip' \
    'gzip.GzipFile("100M.gz", buffer_size=8192).seek(int(100e6))'

Python 3.4.3: 10 loops, best of 3: 2.36 sec per loop
Currently (8 KiB chunking): 10 loops, best of 3: 693 msec per loop
buffer_size=1024: 10 loops, best of 3: 2.46 sec per loop
buffer_size=8192: 10 loops, best of 3: 677 msec per loop
buffer_size=16 * 1024: 10 loops, best of 3: 502 msec per loop
buffer_size=int(60e3): 10 loops, best of 3: 400 msec per loop
buffer_size=64 * 1024: 10 loops, best of 3: 398 msec per loop
buffer_size=int(80e3): 10 loops, best of 3: 406 msec per loop
buffer_size=16 * 8192: 10 loops, best of 3: 469 msec per loop
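
For anyone who wants to try something similar without the patch, here is a 
rough sketch that only uses the stock gzip module. It does not change the 
module's internal chunk size (that needs the buffer_size patch); it just 
emulates a forward seek by reading and discarding chunks of a chosen size. The 
helper names make_gzip_bytes(), skip_forward() and time_skip() are made up for 
this illustration, and the all-zero test data compresses far better than real 
files, so the timings will not match the figures above.

```python
import gzip
import io
import time


def make_gzip_bytes(size):
    """Return a gzip payload that decompresses to `size` zero bytes."""
    raw = io.BytesIO()
    with gzip.GzipFile(fileobj=raw, mode="wb") as f:
        f.write(b"\0" * size)
    return raw.getvalue()


def skip_forward(fileobj, offset, chunk_size):
    """Advance a readable stream by reading and discarding chunks.

    Returns the number of bytes actually skipped, which is less than
    `offset` if the stream ends first.
    """
    remaining = offset
    while remaining > 0:
        chunk = fileobj.read(min(chunk_size, remaining))
        if not chunk:  # reached EOF before the target offset
            break
        remaining -= len(chunk)
    return offset - remaining


def time_skip(payload, offset, chunk_size):
    """Time one emulated seek; return (bytes skipped, final position, seconds)."""
    with gzip.GzipFile(fileobj=io.BytesIO(payload)) as f:
        start = time.perf_counter()
        skipped = skip_forward(f, offset, chunk_size)
        elapsed = time.perf_counter() - start
        return skipped, f.tell(), elapsed


if __name__ == "__main__":
    size = 1000 * 1000  # kept small so the demo finishes quickly
    payload = make_gzip_bytes(size)
    for chunk_size in (1024, 8192, 16 * 1024, 64 * 1024):
        skipped, pos, elapsed = time_skip(payload, size, chunk_size)
        print("chunk_size=%d: skipped %d bytes in %.1f msec"
              % (chunk_size, skipped, elapsed * 1e3))
```

To get numbers closer to a real workload, swap the in-memory payload for an 
actual large .gz file and repeat each measurement a few times.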

----------
status: open -> pending

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue20962>
_______________________________________