Ruben Vorderman <r.h.p.vorder...@lumc.nl> added the comment:
I have found that using the timeit module provides more precise measurements.

For a simple gzip header (as returned by gzip.compress, or by zlib.compress with wbits=31):

    ./python -m timeit -s "import io; data = b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00'; from gzip import _read_gzip_header" '_read_gzip_header(io.BytesIO(data))'

For a gzip header with FNAME (as produced by gzip itself and by Python's GzipFile):

    ./python -m timeit -s "import io; data = b'\x1f\x8b\x08\x08j\x1a\x9ea\x02\xffcompressable_file\x00\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00'; from gzip import _read_gzip_header" '_read_gzip_header(io.BytesIO(data))'

For a gzip header with all flags set:

    ./python -m timeit -s 'import gzip, io; data = b"\x1f\x8b\x08\x1f\x00\x00\x00\x00\x00\xff\x05\x00extraname\x00comment\x00\xe9T"; from gzip import _read_gzip_header' '_read_gzip_header(io.BytesIO(data))'

Since performance is most critical for in-memory compression and decompression, I have now optimized for the case where no flags are set:

    before (current main): 500000 loops, best of 5: 469 nsec per loop
    after (PR):            1000000 loops, best of 5: 390 nsec per loop

For the most common case of only FNAME set:

    before: 200000 loops, best of 5: 1.48 usec per loop
    after:  200000 loops, best of 5: 1.45 usec per loop

For the case where FHCRC is set:

    before: 200000 loops, best of 5: 1.62 usec per loop
    after:  100000 loops, best of 5: 2.43 usec per loop

So this PR is now a clear win for decompressing anything that has been compressed with gzip.compress, and it is neutral for normal file decompression. When FHCRC is set there is a performance cost for correctly checking the header, but that is expected, and it is better than the alternative of not checking it.
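To make clear what the three benchmark inputs above actually contain, here is a rough sketch that decodes the header fields following the layout in RFC 1952. It is not the code from the PR and it does not use gzip._read_gzip_header; the names describe_gzip_header and _read_cstring and the returned dict are made up for illustration only.

```python
import io
import struct
import zlib

# Flag bits of the FLG byte, as defined in RFC 1952.
FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT = 1, 2, 4, 8, 16

# The three inputs from the timeit commands above.
SIMPLE = b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00'
WITH_FNAME = b'\x1f\x8b\x08\x08j\x1a\x9ea\x02\xffcompressable_file\x00\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00'
ALL_FLAGS = b"\x1f\x8b\x08\x1f\x00\x00\x00\x00\x00\xff\x05\x00extraname\x00comment\x00\xe9T"


def _read_cstring(stream):
    """Hypothetical helper: read bytes up to (not including) the NUL."""
    out = bytearray()
    while True:
        ch = stream.read(1)
        if not ch:
            raise EOFError("truncated zero-terminated field")
        if ch == b"\x00":
            return bytes(out)
        out += ch


def describe_gzip_header(data):
    """Hypothetical helper: decode the gzip header at the start of data."""
    stream = io.BytesIO(data)
    fixed = stream.read(10)
    if len(fixed) != 10:
        raise EOFError("truncated gzip header")
    # magic (2), method (1), flags (1), mtime (4, little-endian), xfl (1), os (1)
    magic, method, flags, mtime, xfl, os_code = struct.unpack("<2sBBIBB", fixed)
    if magic != b"\x1f\x8b":
        raise ValueError("not a gzip header")
    info = {"flags": flags, "mtime": mtime, "extra": None,
            "name": None, "comment": None, "crc_ok": None}
    if flags & FEXTRA:
        (xlen,) = struct.unpack("<H", stream.read(2))
        info["extra"] = stream.read(xlen)
    if flags & FNAME:
        info["name"] = _read_cstring(stream)
    if flags & FCOMMENT:
        info["comment"] = _read_cstring(stream)
    if flags & FHCRC:
        # CRC16 is the low 16 bits of the CRC-32 over all header bytes
        # that precede the CRC16 field itself.
        header_len = stream.tell()
        (stored,) = struct.unpack("<H", stream.read(2))
        info["crc_ok"] = stored == (zlib.crc32(data[:header_len]) & 0xFFFF)
    info["header_size"] = stream.tell()
    return info


for label, sample in [("simple", SIMPLE), ("FNAME", WITH_FNAME), ("all flags", ALL_FLAGS)]:
    print(label, describe_gzip_header(sample))
```

The extra work in the FHCRC branch (a CRC-32 over the header bytes) gives an idea of why that case has something additional to pay for once the check is done, although the actual implementation in the PR may be organised differently.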
----------
Added file: https://bugs.python.org/file50459/benchmark_gzip_read_header.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue45509>
_______________________________________