[issue26379] zlib decompress as_bytearray flag

2016-02-19 Thread Terry J. Reedy

Changes by Terry J. Reedy :


--
nosy: +nadeem.vawda, twouters

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26379] zlib decompress as_bytearray flag

2016-02-18 Thread Martin Panter

Martin Panter added the comment:

If you don’t know how large the buffer is when you call zlib.decompress(), 
there may be copying involved as the buffer is expanded. Have you found that 
copying the bytes result into a bytearray is significantly worse than 
expanding a bytearray (e.g. timing results)?
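One way to quantify the copy overhead being asked about here (a sketch, not from the thread; payload size and repeat count are arbitrary choices):

```python
import timeit
import zlib

payload = b"abc123" * 200_000          # roughly 1.2 MB of compressible data
comp = zlib.compress(payload)

# Baseline: decompress to an immutable bytes object.
t_bytes = timeit.timeit(lambda: zlib.decompress(comp), number=20)

# Current workaround: decompress, then copy the result into a bytearray.
t_copy = timeit.timeit(lambda: bytearray(zlib.decompress(comp)), number=20)

print(f"bytes only: {t_bytes:.3f}s   bytes + copy to bytearray: {t_copy:.3f}s")
```

The difference between the two timings is the cost of the extra full-size copy that the proposed flag would eliminate.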

The trouble with this proposal, to me, is that there does not seem to be high 
demand for it. That is why I suggested the more minimal, low-level 
decompress_into() version. I would like to hear other opinions.

As for updating GzipFile, that uses an intermediate BufferedReader, so we 
cannot completely avoid copying. But it might be worthwhile updating the 
internal readinto() method anyway. Possibly also with the zipfile module, 
though I am not so familiar with that.

--




[issue26379] zlib decompress as_bytearray flag

2016-02-18 Thread Joe Jevnik

Joe Jevnik added the comment:

The recipe you showed looks very complicated for the expected use case of 
decompressing all of the data into a mutable buffer. One difference I see 
between this and struct.pack_into or socket.recv_into is that in both of those 
cases we know how large our buffer needs to be. With zlib.decompress there is 
no simple way for users to preallocate a buffer to hold the result, so they 
need to resort to looping with a lot of resizes. I think that returning either 
a mutable or immutable buffer hits 95% of use cases. If we go with the 
decompress_into method I would strongly advise putting decompress_as_bytearray 
into the standard library so that users do not need to write this themselves.
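The resize loop described above can be sketched with the existing decompressobj API (my illustration, not code from the patch); each chunk is decompressed into a temporary bytes object and then appended, which is exactly the per-chunk copy the proposal wants to avoid:

```python
import zlib

def decompress_to_bytearray(data, bufsize=16384):
    # Decompress in bounded chunks and grow a mutable bytearray.
    d = zlib.decompressobj()
    out = bytearray()
    while not d.eof:
        # max_length bounds the output; unconsumed input is kept aside.
        out += d.decompress(data, bufsize)   # copies each chunk into out
        if not d.unconsumed_tail:
            break
        data = d.unconsumed_tail
    out += d.flush()
    return out
```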

Thank you for pointing me at the bz2 and LZMA modules. When we come to an 
agreement on the API I will update bz2 and LZMA so that they match. Should I 
also update the gzip module?

--




[issue26379] zlib decompress as_bytearray flag

2016-02-18 Thread Martin Panter

Martin Panter added the comment:

This is an unusual technique. I can’t think of any other standard library 
routine that returns a new bytearray instead of a bytes object.

Normally there are sister functions with an -into() suffix that accept a 
pre-allocated buffer. Examples are BufferedIOBase.readinto(), 
struct.pack_into(), socket.recv_into(). What do you think about adding a 
decompressobj.decompress_into(input, output) method instead? If necessary you 
could use it in a wrapper, something like:

def decompress_as_bytearray(data, wbits=15, bufsize=16384):
    decompressor = zlib.decompressobj(wbits=wbits)
    buffer = bytearray(bufsize + 1)  # Overallocate to help detect EOF
    offset = 0
    while True:
        with memoryview(buffer) as view:
            offset += decompressor.decompress_into(data, view[offset:])
        if offset < len(buffer):
            break
        data = decompressor.unconsumed_tail
        buffer *= 2  # Expand the buffer
    del buffer[offset:]
    if not decompressor.eof:
        raise zlib.error("Incomplete data")
    return buffer
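For comparison, the established *-into() convention cited above, shown here with struct (a real stdlib API, unlike the proposed decompress_into()):

```python
import struct

# The caller supplies a preallocated writable buffer; the function
# writes into it in place instead of returning a new object.
buf = bytearray(8)
struct.pack_into("<II", buf, 0, 1, 2)       # write two uint32s into buf
a, b = struct.unpack_from("<II", buf, 0)    # read them back
```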

If anything is added for zlib decompression, it would be good to add equivalent 
functionality in the bz2 and LZMA modules.

--
nosy: +martin.panter




[issue26379] zlib decompress as_bytearray flag

2016-02-17 Thread Joe Jevnik

New submission from Joe Jevnik:

Adds a keyword-only flag to zlib.decompress and zlib.Decompress.decompress 
which marks that the return type should be a bytearray instead of bytes.

The use case for this is reading compressed data into a mutable array to do 
further operations without requiring that the user copy the data first. If a 
user has chosen to compress the data, there is often a real cost to copying 
the uncompressed bytes into a mutable buffer.
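The extra copy in question, as I understand the proposal (the as_bytearray call in the comment is the proposed API from this patch, not something that exists today):

```python
import zlib

payload = b"x" * (1 << 20)
comp = zlib.compress(payload)

# Today: decompress() returns immutable bytes, so getting a mutable
# buffer requires a second, full-size copy of the uncompressed data.
mutable = bytearray(zlib.decompress(comp))

# Proposed (hypothetical flag from this patch): return a bytearray
# directly, skipping that copy:
#   mutable = zlib.decompress(comp, as_bytearray=True)
```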

The impetus for this change comes from a patch for Pandas 
(https://github.com/pydata/pandas/pull/12359). I have also worked on a similar 
feature for blosc, another popular compression library for Python 
(https://github.com/Blosc/python-blosc/pull/107).

My hope is to fix the hacky solution presented in the pandas patch by 
supporting this feature directly in the compression libraries.

As a side note: this is my first time using the argument clinic and I love it. 
It was so simple to add this argument, thank you very much for developing such 
an amazing tool!

--
components: Extension Modules
files: zlib.patch
keywords: patch
messages: 260424
nosy: ll
priority: normal
severity: normal
status: open
title: zlib decompress as_bytearray flag
type: enhancement
versions: Python 3.6
Added file: http://bugs.python.org/file41944/zlib.patch
