Martin Panter added the comment:

Thanks for the report. Can you confirm whether this demo illustrates your problem? 
I only have 2 GiB of memory, so for me it raises MemoryError instead, which seems 
reasonable for my situation.

from gzip import GzipFile
from io import BytesIO

# Write a small gzip stream into an in-memory buffer
file = BytesIO()
writer = GzipFile(fileobj=file, mode="wb")
writer.write(b"data")
writer.close()

# Read it back, asking for more than a C "unsigned int" can hold
file.seek(0)
reader = GzipFile(fileobj=file, mode="rb")
data = reader.read(2**32)  # Ideally this should return b"data"

Assuming that triggers the OverflowError, the heart of the problem is that the 
zlib.decompressobj.decompress() method does not accept such large values for its 
max_length limit:

>>> import zlib
>>> decompressor = zlib.decompressobj(wbits=16 + 15)
>>> decompressor.decompress(file.getvalue(), 2**32)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large for C unsigned int

I think the ideal fix would be to cap the limit at 2**32 - 1 in the zlib 
library. Would this be okay for a 3.5.1 bug fix release, or would it be 
considered a feature change?
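
To illustrate what I mean (a minimal Python-level sketch only; the real fix 
would live in the C code of Modules/zlibmodule.c, and decompress_capped() is 
just an illustrative helper, not a proposed API):

import zlib

UINT_MAX = 2**32 - 1  # largest value a C "unsigned int" can hold

def decompress_capped(decompressor, data, max_length=0):
    # Clamp the limit so the C-level conversion cannot overflow.
    # Input left unconsumed because of the cap remains available in
    # decompressor.unconsumed_tail, as with any max_length call.
    return decompressor.decompress(data, min(max_length, UINT_MAX))

Since max_length=0 already means "no limit", min() leaves that case alone; a 
capped call would simply return at most 2**32 - 1 bytes and let the caller 
feed unconsumed_tail back in.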

Failing that, another option would be to cap the limit in the gzip library, and 
just document the zlib limitation. I already have a patch in Issue 23200 
documenting another quirk when max_length=0.
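
For reference, the gzip-side cap could split an oversized read into chunks the 
decompressor accepts. An untested sketch (read_large() is illustrative, not 
part of the gzip internals):

def read_large(reader, size):
    # Never pass the decompressor a limit that overflows "unsigned int"
    chunks = []
    while size > 0:
        chunk = reader.read(min(size, 2**32 - 1))
        if not chunk:
            break  # end of stream before "size" bytes were read
        chunks.append(chunk)
        size -= len(chunk)
    return b"".join(chunks)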

The same problem may also apply to the LZMA and bzip2 modules; I need to check, 
perhaps with something like the snippet below.
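
Something along these lines (untested; both decompress() methods grew a 
max_length parameter in 3.5) should show whether they share the limitation:

import bz2, lzma

for module, factory in ((bz2, bz2.BZ2Decompressor),
                        (lzma, lzma.LZMADecompressor)):
    decompressor = factory()
    try:
        result = decompressor.decompress(module.compress(b"data"), 2**32)
        print(factory.__name__, "OK:", result)
    except OverflowError as err:
        print(factory.__name__, "fails:", err)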

----------
keywords: +3.5regression
nosy: +martin.panter
type: crash -> behavior

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue25626>
_______________________________________