[issue25626] Gzip fails for file over 2**32 bytes

2015-11-21 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 80f6f77a7cc3 by Martin Panter in branch '3.5':
Issue #25626: Change zlib to accept Py_ssize_t and cap to UINT_MAX
https://hg.python.org/cpython/rev/80f6f77a7cc3

New changeset afa1b6cd77a5 by Martin Panter in branch 'default':
Issue #25626: Merge zlib fix from 3.5
https://hg.python.org/cpython/rev/afa1b6cd77a5

New changeset 5117f0b3be09 by Martin Panter in branch 'default':
Issue #25626: Add news to 3.6 section
https://hg.python.org/cpython/rev/5117f0b3be09

--
nosy: +python-dev




[issue25626] Gzip fails for file over 2**32 bytes

2015-11-21 Thread Martin Panter

Martin Panter added the comment:

Thank you Serhiy for your help, and Ben for reporting this.

--
resolution:  -> fixed
stage: commit review -> resolved
status: open -> closed




[issue25626] Gzip fails for file over 2**32 bytes

2015-11-20 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Added new comments.

--




[issue25626] Gzip fails for file over 2**32 bytes

2015-11-20 Thread Martin Panter

Martin Panter added the comment:

Thanks for your testing, Serhiy. I can reproduce this on 32-bit Linux (also by 
building with CC="gcc -m32"). It is easy to produce the bug with flush(), but 
with Decompress.decompress() it would need over 2 GiB of memory and data to 
expand the buffer enough to trigger the error.

>>> zlib.decompressobj().flush(sys.maxsize + 1)
SystemError: Negative size passed to PyBytes_FromStringAndSize

In zlib-Py_ssize_t.patch, I added the bigmem tests, but I am not able to 
actually test them. The gzip test would require 8 GiB of memory. I also changed 
the converter to Py_ssize_t to fix this latest bug.

--
Added file: http://bugs.python.org/file41096/zlib-Py_ssize_t.patch




[issue25626] Gzip fails for file over 2**32 bytes

2015-11-20 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Ah, I found yet another bug.

>>> zlib.decompress(zlib.compress(b'abcd'), 0, sys.maxsize+1)
Traceback (most recent call last):
  File "", line 1, in 
SystemError: Negative size passed to PyBytes_FromStringAndSize

There are similar bugs in the decompressor's decompress() and flush() methods, 
but they are hard to reproduce.

The maximal length value should be bounded not only by UINT_MAX, but by 
PY_SSIZE_T_MAX too.

I would merge sval and uval into one variable of type Py_ssize_t.
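
As a minimal sketch of that bound in Python terms (the real fix belongs in C;
the helper name capped_length is invented for illustration, and 2**32 - 1 is
only the typical value of UINT_MAX, which is platform-defined):

import sys

UINT_MAX = 2**32 - 1  # assumed typical value; defined by the platform in C

def capped_length(requested):
    # A buffer length must fit both zlib's C `unsigned int` and CPython's
    # Py_ssize_t, so clamp larger requests instead of rejecting them.
    return min(requested, UINT_MAX, sys.maxsize)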

--




[issue25626] Gzip fails for file over 2**32 bytes

2015-11-20 Thread Martin Panter

Martin Panter added the comment:

New patch addressing comments. I used the undocumented API _PyLong_FromNbInt() 
to call __int__() rather than __index__().

--
Added file: http://bugs.python.org/file41107/zlib-Py_ssize_t.2.patch




[issue25626] Gzip fails for file over 2**32 bytes

2015-11-20 Thread Martin Panter

Martin Panter added the comment:

Actually it did make an existing bug a bit worse. The bug is triggered with 
valid sizes greater than LONG_MAX, due to an exception not being cleared, and 
my change made this bug more dramatic, turning the OverflowError into a crash. 
This new patch should fix that.

I’m now looking at adding the other tests that require allocating lots of 
memory.

--
Added file: http://bugs.python.org/file41090/zlib-size_t.2.patch




[issue25626] Gzip fails for file over 2**32 bytes

2015-11-20 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Thank you Martin, the patch LGTM.

The difference between __int__() and __index__() is that float has the former 
but not the latter. For better compatibility we have to use __int__() in 
maintained releases, but __index__() is obviously more appropriate 
semantically: a float buffer length doesn't make sense. We should change 
__int__ to __index__ in future releases, but that is a separate issue.
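
For a quick interactive illustration of that difference (operator.index() goes
through __index__, while int() falls back on __int__):

>>> import operator
>>> int(4.0)
4
>>> operator.index(4.0)
Traceback (most recent call last):
  ...
TypeError: 'float' object cannot be interpreted as an integer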

--




[issue25626] Gzip fails for file over 2**32 bytes

2015-11-20 Thread Serhiy Storchaka

Changes by Serhiy Storchaka:


--
assignee:  -> martin.panter
stage: patch review -> commit review




[issue25626] Gzip fails for file over 2**32 bytes

2015-11-19 Thread Martin Panter

Martin Panter added the comment:

Okay. For the gzip module, I cannot easily test this myself. Quickly looking at 
other cases, I guess it would look something like this, but I need to spend 
some time understanding the bigmemtest decorator properly:

@unittest.skipIf(sys.maxsize < _4G, "Requires non-32-bit system")
@test.support.bigmemtest(_4G, 1, dry_run=False)
def test_large_read(self, size):
    ...
    data = reader.read(size)  # Should not raise OverflowError
    self.assertEqual(data, b"data")

--




[issue25626] Gzip fails for file over 2**32 bytes

2015-11-19 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Could you please add tests for the other two functions? And tests for the gzip 
module?

--




[issue25626] Gzip fails for file over 2**32 bytes

2015-11-19 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

You can commit your patch right now (it shouldn't make things worse) and add 
new tests later.

--




[issue25626] Gzip fails for file over 2**32 bytes

2015-11-18 Thread Martin Panter

Martin Panter added the comment:

I am inclined to commit my patch to get this fixed in the upcoming 3.5.1 
release, unless anyone thinks that could be a problem.

--
components: +Extension Modules




[issue25626] Gzip fails for file over 2**32 bytes

2015-11-16 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Is 3.4 affected? A uint_converter was added in issue18294.

--




[issue25626] Gzip fails for file over 2**32 bytes

2015-11-16 Thread Martin Panter

Martin Panter added the comment:

3.4 shouldn’t be affected by the gzip regression. In 3.4 the gzip module does 
not set max_length to limit the decompression size.

In 3.4 the underlying zlib module will also raise OverflowError if 
max_length=2**32, and my patch could be applied, but I don’t think it is worth 
it at this stage.
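
For reference, the 3.4 behaviour can be checked interactively with something
like this (the exact OverflowError message may vary by version):

>>> import zlib
>>> zlib.decompressobj().decompress(b'', 2**32)
Traceback (most recent call last):
  ...
OverflowError: ...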

--




[issue25626] Gzip fails for file over 2**32 bytes

2015-11-15 Thread Serhiy Storchaka

Changes by Serhiy Storchaka:


--
nosy: +serhiy.storchaka




[issue25626] Gzip fails for file over 2**32 bytes

2015-11-15 Thread Martin Panter

Martin Panter added the comment:

Ben: With the cap added, when the lower-level zlib module is asked to 
decode, say, 5 GiB, it will decode at most 4 GiB (rather than raising an 
exception). The 4 GiB limit (UINT_MAX to be precise) is due to the underlying 
zlib library; it’s not Python’s doing, and the limit was present before 3.5 as 
well.

In 3.5, if you ask GzipFile.read() to decompress 5 GiB, it will actually feed 
the decompressor 8 KiB (io.DEFAULT_BUFFER_SIZE) of compressed data, which will 
decompress to at most 8 MB or so (max compression ratio is 1:1032). Then it 
will feed the next 8 KiB chunk. So it does not matter whether the limit is 4 or 
5 GiB, because there can only be ~8 MiB of data per internal decompress() call.

Before 3.5, it first feeds a 1 KiB chunk, but exponentially increases the chunk 
size up to 10 MiB in each internal iteration.
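
As a rough sketch (not the actual GzipFile code; read_limited is an invented
name) of the 3.5-style loop described above, for a plain zlib stream:

import io
import zlib

def read_limited(raw, size):
    # Decompress up to `size` bytes from the binary file-like `raw`,
    # feeding at most 8 KiB of compressed input per iteration.
    d = zlib.decompressobj()
    out = bytearray()
    while len(out) < size and not d.eof:
        data = d.unconsumed_tail or raw.read(io.DEFAULT_BUFFER_SIZE)
        if not data:
            break
        # max_length bounds this call's output, so each iteration yields
        # at most ~8 MiB (8 KiB * 1032) no matter how large `size` is.
        out += d.decompress(data, size - len(out))
    return bytes(out)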

If you think the new 3.5 implementation (with OverflowError fixed) is 
significantly less efficient, I can look at ways of improving it. But I am a 
bit skeptical. Usually I find once you process 10s or 100s of kB of data at 
once, any increases in the buffer size have negligible effect on the little 
time spent in native Python code, and times actually get worse as your buffers 
get too big, probably due to ineffective use of fast memory caching.

===

I propose this patch for the 3.5 branch, which implements the internal capping 
in the zlib module. It changes the zlib module to accept sizes greater than 
UINT_MAX, but internally limits them. I think this is the most practical way 
forward, because it does not require Python code to know the value of UINT_MAX 
(which could in theory vary by platform).

It would be good to get a review of my patch. In particular, is this change okay 
for the 3.5 maintenance branch? It would also be good to confirm that my patch 
fixes the original problem, i.e. read(2**32) works on computers with more than 
4 GiB of memory (which I cannot test).
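
For anyone with enough memory, a check might look something like this
("big-file.gz" is a placeholder name, not a file from the report):

import gzip

with gzip.open("big-file.gz", "rb") as reader:
    data = reader.read(2**32)  # raised OverflowError on 3.5.0
print(len(data))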

This regression was caused by Issue 23529, where I made sure that 
GzipFile.read(size) doesn’t buffer huge amounts of data from a decompression 
bomb.

--
keywords: +patch
stage:  -> patch review
Added file: http://bugs.python.org/file41053/zlib-size_t.patch




[issue25626] Gzip fails for file over 2**32 bytes

2015-11-14 Thread SilentGhost

Changes by SilentGhost:


--
components: +Library (Lib)
nosy: +nadeem.vawda, twouters
versions: +Python 3.6
