Ruben Vorderman <r.h.p.vorder...@lumc.nl> added the comment:

I have found that using the timeit module provides more precise measurements:

For a simple gzip header (as returned by gzip.compress or zlib.compress with wbits=31):
./python -m timeit -s "import io; data = b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00'; from gzip import _read_gzip_header" '_read_gzip_header(io.BytesIO(data))'


For a gzip header with FNAME (as written by the gzip program itself and by Python's GzipFile):
./python -m timeit -s "import io; data = b'\x1f\x8b\x08\x08j\x1a\x9ea\x02\xffcompressable_file\x00\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00'; from gzip import _read_gzip_header" '_read_gzip_header(io.BytesIO(data))'

For a gzip header with all flags set:
./python -m timeit -s 'import gzip, io; data = b"\x1f\x8b\x08\x1f\x00\x00\x00\x00\x00\xff\x05\x00extraname\x00comment\x00\xe9T"; from gzip import _read_gzip_header' '_read_gzip_header(io.BytesIO(data))'
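To make it easier to see which flags each of the three test headers carries, here is a minimal sketch that decodes the FLG byte (the fourth byte of the header). The bit values follow RFC 1952; the helper name header_flags is hypothetical and is not part of the gzip module or of the PR.

```python
import struct

# Flag bits of the FLG byte as defined in RFC 1952.
FLAG_NAMES = {1: "FTEXT", 2: "FHCRC", 4: "FEXTRA", 8: "FNAME", 16: "FCOMMENT"}

# The three headers used in the timeit commands above.
SIMPLE = b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00'
WITH_FNAME = b'\x1f\x8b\x08\x08j\x1a\x9ea\x02\xffcompressable_file\x00\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00'
ALL_FLAGS = b"\x1f\x8b\x08\x1f\x00\x00\x00\x00\x00\xff\x05\x00extraname\x00comment\x00\xe9T"


def header_flags(data):
    """Return the names of the flags set in a gzip header (hypothetical helper)."""
    magic, method, flags = struct.unpack("<2sBB", data[:4])
    if magic != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    return [name for bit, name in FLAG_NAMES.items() if flags & bit]


for label, data in [("simple", SIMPLE), ("fname", WITH_FNAME), ("all", ALL_FLAGS)]:
    print(label, header_flags(data))
```

The first header has no flags set, the second only FNAME, and the third all five.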


Since performance is most critical for in-memory compression and decompression, 
I have now optimized for the case where no flags are set:
before (current main): 500000 loops, best of 5: 469 nsec per loop
after (PR): 1000000 loops, best of 5: 390 nsec per loop

For the most common case of only FNAME set:
before: 200000 loops, best of 5: 1.48 usec per loop
after: 200000 loops, best of 5: 1.45 usec per loop

For the case where FHCRC is set:
before: 200000 loops, best of 5: 1.62 usec per loop
after: 100000 loops, best of 5: 2.43 usec per loop
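
The "N loops, best of 5" figures come from the timeit command line, which picks a loop count automatically and reports the fastest of five repeats. A rough sketch of the same procedure from Python code follows. It times the public gzip.decompress instead of the private _read_gzip_header, so it also includes decompression and the absolute numbers are not comparable to those above. The helper name best_of is hypothetical.

```python
import gzip
import timeit

# gzip.compress output has a header with no flags set.
DATA = gzip.compress(b"")


def best_of(func, repeat=5):
    """Return (loops, seconds per loop) for the fastest of `repeat` runs."""
    timer = timeit.Timer(func)
    # autorange() picks a loop count so that one run takes at least 0.2 s.
    loops, _ = timer.autorange()
    timings = timer.repeat(repeat=repeat, number=loops)
    return loops, min(timings) / loops


loops, per_loop = best_of(lambda: gzip.decompress(DATA))
print(f"{loops} loops, best of 5: {per_loop * 1e9:.0f} nsec per loop")
```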


So this PR is now a clear win for decompressing anything that has been 
compressed with gzip.compress. It is neutral for normal file decompression. 
For headers with the FHCRC flag set, there is a performance cost associated 
with correctly checking the header, but that is expected. It is better than 
the alternative of not checking it.
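
To illustrate what checking the header involves when FHCRC is set: RFC 1952 defines the header CRC16 as the two least significant bytes of the CRC32 over all header bytes before the CRC16 field, stored little-endian. The following is a minimal sketch of that check, not the code from the PR; the function names are hypothetical.

```python
import struct
import zlib


def header_crc16(header_without_crc):
    """CRC16 per RFC 1952: the low 16 bits of the CRC32 of the header bytes."""
    return zlib.crc32(header_without_crc) & 0xFFFF


def check_header_crc(header):
    """Check a complete header whose last two bytes are the stored CRC16."""
    body, stored = header[:-2], header[-2:]
    (expected,) = struct.unpack("<H", stored)
    return header_crc16(body) == expected


# Build a header with FHCRC (0x02) and FNAME (0x08) set, then append its CRC16.
body = b"\x1f\x8b\x08\x0a\x00\x00\x00\x00\x00\xff" + b"name\x00"
good = body + struct.pack("<H", header_crc16(body))
# Flip every bit of the last CRC byte to simulate a corrupted header.
bad = good[:-1] + bytes([good[-1] ^ 0xFF])

print(check_header_crc(good), check_header_crc(bad))
```

Every header byte has to be read and fed through the CRC before the header can be accepted, which is where the extra time for FHCRC headers goes.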

----------
Added file: https://bugs.python.org/file50459/benchmark_gzip_read_header.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue45509>
_______________________________________