[issue15955] gzip, bz2, lzma: add option to limit output size

2014-06-15 Thread Nadeem Vawda

Nadeem Vawda added the comment:

Sorry, I just haven't had any free time lately, and may still not be able
to give this the attention it deserves for another couple of weeks.

Serhiy, would you be interested in reviewing Nikolaus' patch?

--

___



[issue15955] gzip, bz2, lzma: add option to limit output size

2014-04-06 Thread Nadeem Vawda

Nadeem Vawda added the comment:

I've posted a review at http://bugs.python.org/review/15955/. (For some reason, 
it looks like Rietveld didn't send out email notifications. But maybe it never 
sends a notification to the sender? Hmm.)

--

___



[issue15955] gzip, bz2, lzma: add option to limit output size

2014-03-30 Thread Nadeem Vawda

Nadeem Vawda added the comment:

Thanks for the patch, Nikolaus. I'm afraid I haven't had a chance to look
over it yet; this past week has been a bit crazy for me. I'll definitely
get back to you with a review in the next week, though.

--

___



[issue20781] BZ2File doesn't decompress some .bz2 files correctly

2014-02-27 Thread Nadeem Vawda

Nadeem Vawda added the comment:

> How does one create a multi-stream bzip2 file in the first place?

If you didn't do so deliberately, I would guess that you used a parallel
compression tool like pbzip2 or lbzip2 to create your bz2 file. These tools work
by splitting the input into chunks, compressing each chunk as a separate stream,
and then concatenating these streams afterward.

Another possibility is that you just concatenated two existing bz2 files, e.g.:

$ cat first.bz2 second.bz2 > multi.bz2


> And how do I tell it's multi-stream?

I don't know of any pre-existing tools to do this, but you can write a script
for it yourself, by feeding the file's data through a BZ2Decompressor. When the
decompress() method raises EOFError, you're at the end of the first stream. If
the decompressor's unused_data attribute is non-empty, or there is data that has
not yet been read from the input file, then it is either (a) a multi-stream bz2
file or (b) a bz2 file with other metadata tacked on to the end.

To distinguish between cases (a) and (b), take unused_data + rest_of_input_file
and feed it into a new BZ2Decompressor. If you don't get an IOError, then you've got
a multi-stream bz2 file.

(If you *do* get an IOError, then that's case (b) - someone's appended non-bz2
 data to the end of a bz2 file. For example, Gentoo and Sabayon Linux packages
 are bz2 files with package metadata appended, according to issue 19839.)
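
A rough sketch of that recipe, written against the 3.3 bz2 API (the function
name, path handling and chunk size below are illustrative, not something from
the stdlib):

    import bz2

    def classify_bz2(path, chunk_size=8192):
        # Returns 'single', 'multi' or 'trailing-data' for a .bz2 file.
        dec = bz2.BZ2Decompressor()
        with open(path, 'rb') as f:
            # Feed chunks until the end of the first stream is reached.
            while not dec.eof:
                chunk = f.read(chunk_size)
                if not chunk:
                    raise ValueError('truncated bz2 stream')
                dec.decompress(chunk)
            # Anything after the first stream: unused_data plus whatever is
            # still unread in the file.
            leftover = dec.unused_data + f.read()
        if not leftover:
            return 'single'
        try:
            bz2.BZ2Decompressor().decompress(leftover)
        except OSError:
            # Not a valid bz2 stream start, so it's appended metadata (case (b)).
            return 'trailing-data'
        return 'multi'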

--

___



[issue20781] BZ2File doesn't decompress some .bz2 files correctly

2014-02-26 Thread Nadeem Vawda

Nadeem Vawda added the comment:

As Serhiy said, multi-stream support was only added to the bz2 module in 3.3,
and there is no plan to backport this functionality to 2.7.

However, the bz2file package on PyPI [1] does support multi-stream inputs,
and you can use its BZ2File class as a drop-in replacement for the built-in
one on 2.7.

[1] https://pypi.python.org/pypi/bz2file
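
For example (a hedged sketch; the path is a placeholder, and bz2file.BZ2File
is documented to take the same arguments as the 3.3 bz2.BZ2File):

    import bz2file

    with bz2file.BZ2File('multi-stream.bz2') as f:
        data = f.read()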

--

___



[issue15955] gzip, bz2, lzma: add option to limit output size

2014-02-02 Thread Nadeem Vawda

Nadeem Vawda added the comment:

After some consideration, I've come to agree with Serhiy that it would be better
to keep a private internal buffer, rather than having the user manage unconsumed
input data. I'm also in favor of having a flag to indicate whether the
decompressor needs more input to produce more decompressed data. (I'd prefer to
call it 'needs_input' or similar, though - 'data_ready' feels too vague to me.)

In msg176883 and msg177228, Serhiy raises the possibility that the decompressor
might be unable to produce decompressed output from a given piece of (non-empty)
input, but will still leave the input unconsumed. I do not think that this can
actually happen (based on the libraries' documentation), but this API will work
even if that situation can occur.

So, to summarize, the API will look like this:

class LZMADecompressor:

    ...

    def decompress(self, data, max_length=-1):
        """Decompresses *data*, returning uncompressed data as bytes.

        If *max_length* is nonnegative, returns at most *max_length* bytes
        of decompressed data. If this limit is reached and further output
        can be produced, *self.needs_input* will be set to False. In this
        case, the next call to *decompress()* should provide *data* as b''
        to obtain more of the output.

        If all of the input data was decompressed and returned (either
        because this was less than *max_length* bytes, or because
        *max_length* was negative), *self.needs_input* will be set to True.
        """

    ...

Data not consumed due to the use of 'max_length' should be saved in an internal
buffer (that is not exposed to Python code at all), which is then prepended to
any data provided in the next call to decompress() before providing the data to
the underlying compression library. The cases where either the internal buffer
or the new data is empty should be optimized to avoid unnecessary allocations
or copies, since these will be the most common cases.

Note that this API does not need a Python-level 'unconsumed_tail' attribute -
its role is served by the internal buffer (which is private to the C module
implementation). This is not to be confused with the already-existing
'unused_data' attribute that stores data found after the end of the compressed
stream. 'unused_data' should continue to work as before, regardless of whether
decompress() is called with a max_length argument or not.
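
A hedged sketch of how a caller would drive this API (it matches the
decompressor interface that eventually shipped in Python 3.5; the chunk size
and the generator wrapper are illustrative):

    import lzma

    def iter_decompressed(path, chunk_size=8192):
        d = lzma.LZMADecompressor()
        with open(path, 'rb') as f:
            while not d.eof:
                if d.needs_input:
                    data = f.read(chunk_size)
                    if not data:
                        raise EOFError('compressed stream ended prematurely')
                else:
                    # The previous call hit max_length; more output is
                    # buffered internally, so pass b'' to drain it.
                    data = b''
                yield d.decompress(data, max_length=chunk_size)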

As a starting point I would suggest writing a patch for LZMADecompressor first,
since its implementation is a bit simpler than BZ2Decompressor. Once this patch
and an analogous one for BZ2Decompressor have been committed, we can then
convert GzipFile, BZ2File and LZMAFile to use this feature.

If you have any questions while you're working on this issue, feel free to send
them my way.

--

___



[issue20193] Derby: Convert the zlib, _bz2 and _lzma modules to use Argument Clinic

2014-01-26 Thread Nadeem Vawda

Nadeem Vawda added the comment:

The latest patch for zlib seems to be missing Modules/zlibmodule.clinic.c

> I suppose that zdict=b'' have same effect as not specifying zdict. Am I right?

Probably, but to be on the safe side I'd prefer that we preserve the behavior of
not calling deflateSetDictionary/inflateSetDictionary unless the caller
explicitly provides zdict. If you need to give a Python default value, use
None rather than b''.
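
For reference, a small sketch of how the zdict parameter is used from Python
code (3.3 and later); the dictionary contents here are arbitrary:

    import zlib

    zdict = b'common boilerplate shared by many payloads'
    c = zlib.compressobj(zdict=zdict)
    blob = c.compress(b'common boilerplate shared by many payloads: #42') + c.flush()

    d = zlib.decompressobj(zdict=zdict)    # the same dictionary must be supplied
    print(d.decompress(blob) + d.flush())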

--

___



[issue20193] Derby: Convert the zlib, _bz2 and _lzma modules to use Argument Clinic

2014-01-26 Thread Nadeem Vawda

Nadeem Vawda added the comment:

The patch for zlib looks good to me. Thanks for working on this, Serhiy.


> We're not allowing changes in semantics for Argument Clinic conversion for
> 3.4. If it doesn't currently accept None, we can't add it right now, and
> we'll have to save it for 3.5.

Fair enough.


> The behavior is preserved. This case is exact analogue of _sha1.sha1(). No one
> additional function called when the parameter is not specified, but if it is
> specified as b'', the function behaves identically to not passing in that
> parameter.

Ah OK, I misunderstood the Argument Clinic input code when I first read it.
Having actually read the docs, it makes sense.

--

___



[issue20358] test_curses is failing

2014-01-23 Thread Nadeem Vawda

Changes by Nadeem Vawda nadeem.va...@gmail.com:


--
nosy: +nadeem.vawda

___



[issue20358] test_curses is failing

2014-01-23 Thread Nadeem Vawda

Nadeem Vawda added the comment:

I can reproduce this (also on Ubuntu 13.10 64-bit). Maybe there's a bug
in the version of curses distributed with the latest Ubuntu release? It
looks like our only Ubuntu buildbot is using 8.04 (almost 6 years old!).

Also note that you won't be able to reproduce this with make test or
make testall (see issue 12669). make buildbottest does catch the bug,
though (which also rules out the possibility that the buildbots are just
skipping the test).

--

___



[issue15955] gzip, bz2, lzma: add option to limit output size

2014-01-22 Thread Nadeem Vawda

Nadeem Vawda added the comment:

No, I'm afraid I haven't had a chance to do any work on this issue since my last
message.

I would be happy to review a patch for this, but before you start writing one,
we should settle on how the API will look. I'll review the existing discussion
in detail over the weekend and come up with something that avoids the potential
problems raised by Serhiy.

--
versions: +Python 3.5 -Python 3.4

___



[issue20193] Derby: Convert the zlib, _bz2 and _lzma modules to use Argument Clinic

2014-01-22 Thread Nadeem Vawda

Nadeem Vawda added the comment:

The bz2 patch looks good to me, aside from a nit with the docstring for
BZ2Compressor.__init__.

The lzma patch produces a bunch of test failures for me. It looks like
the __init__ methods for LZMACompressor and LZMADecompressor aren't
accepting keyword args:

☿ ./python -c 'import lzma; lzma.LZMACompressor(format=lzma.FORMAT_XZ)'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: __init__ does not take keyword arguments

☿ ./python -c 'import lzma; lzma.LZMADecompressor(format=lzma.FORMAT_AUTO)'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: __init__ does not take keyword arguments

--

___



[issue20193] Derby: Convert the zlib, _bz2 and _lzma modules to use Argument Clinic

2014-01-19 Thread Nadeem Vawda

Nadeem Vawda added the comment:

The patches for bz2 and lzma look good to me, aside from one nit for lzma.

--

___



[issue20182] Derby #13: Convert 50 sites to Argument Clinic across 5 files

2014-01-08 Thread Nadeem Vawda

Changes by Nadeem Vawda nadeem.va...@gmail.com:


--
nosy: +nadeem.vawda

___



[issue20184] Derby #16: Convert 50 sites to Argument Clinic across 9 files

2014-01-08 Thread Nadeem Vawda

Changes by Nadeem Vawda nadeem.va...@gmail.com:


--
nosy: +nadeem.vawda

___



[issue20185] Derby #17: Convert 50 sites to Argument Clinic across 14 files

2014-01-08 Thread Nadeem Vawda

Changes by Nadeem Vawda nadeem.va...@gmail.com:


--
nosy: +nadeem.vawda

___



[issue19885] lzma segfault when __init__ with non-existent file after executing the constructor (Python 2.7)

2013-12-04 Thread Nadeem Vawda

Nadeem Vawda added the comment:

To clarify, which version(s) does this affect? I have not been able to
reproduce against 3.4, and 2.7 does not include the lzma module in the
first place.

--

___



[issue19878] bz2.BZ2File.__init__() cannot be called twice

2013-12-03 Thread Nadeem Vawda

Nadeem Vawda added the comment:

It appears that this *does* affect 2.7 (though not 3.2, 3.3 or 3.4, 
fortunately):

~/src/cpython/2.7☿ gdb --ex run --args ./python -c 'import bz2; obj = bz2.BZ2File("/dev/null"); obj.__init__("")'
«... snip banner ...»
Starting program: /home.u/nadeem/src/cpython/2.7/./python -c import\ bz2\;\ obj\ =\ bz2.BZ2File\(\"/dev/null\"\)\;\ obj.__init__\(\"\"\)
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Traceback (most recent call last):
  File "<string>", line 1, in <module>
IOError: [Errno 2] No such file or directory: ''

Program received signal SIGSEGV, Segmentation fault.
0x00431d3e in PyFile_DecUseCount (fobj=0x0) at Objects/fileobject.c:89
89          fobj->unlocked_count--;

--
assignee:  - nadeem.vawda
nosy: +nadeem.vawda
stage:  - needs patch

___



[issue19839] bz2: regression wrt supporting files with trailing garbage after EOF

2013-12-01 Thread Nadeem Vawda

Nadeem Vawda added the comment:

I'll have a patch for this in the next couple of days (and a similar one
for the lzma module, which has the same issue (even though it's not a
regression in that case)).

In the meanwhile, you can work around this by feeding the compressed data
to a BZ2Decompressor yourself - it stops at the end of the bz2 stream,
with any leftover data stored in its 'unused_data' attribute.
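
A hedged sketch of that workaround (the path and chunk size are placeholders):

    import bz2

    def read_bz2_ignoring_trailer(path, chunk_size=8192):
        dec = bz2.BZ2Decompressor()
        chunks = []
        with open(path, 'rb') as f:
            while not dec.eof:
                block = f.read(chunk_size)
                if not block:
                    break                           # stream was truncated
                chunks.append(dec.decompress(block))
            trailer = dec.unused_data + f.read()    # data after the bz2 stream
        return b''.join(chunks), trailer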

--
assignee:  - nadeem.vawda
stage:  - needs patch

___



[issue19395] unpickled LZMACompressor is crashy

2013-10-28 Thread Nadeem Vawda

Nadeem Vawda added the comment:

The part of this issue specific to LZMACompressor should now be fixed;
I've filed issue 19425 for the issue with Pool.map hanging.

--
resolution:  - fixed
stage: needs patch - committed/rejected
status: open - closed

___



[issue19425] multiprocessing.Pool.map hangs if pickling argument raises an exception

2013-10-28 Thread Nadeem Vawda

New submission from Nadeem Vawda:

[Split off from issue 19395]

The following code hangs after hitting a TypeError trying to pickle one
of the TextIOWrapper objects:

import multiprocessing

def read(f): return f.read()

files = [open(path) for path in 3 * ['/dev/null']]
pool = multiprocessing.Pool()
results = pool.map(read, files)
print(results)

This issue is present in 3.2, 3.3 and 3.4, but not in 2.7.

--
components: Library (Lib)
messages: 201580
nosy: cantor, jnoller, nadeem.vawda, pitrou, python-dev, sbt, tim.peters
priority: normal
severity: normal
stage: needs patch
status: open
title: multiprocessing.Pool.map hangs if pickling argument raises an exception
type: behavior
versions: Python 3.3, Python 3.4

___



[issue19227] test_multiprocessing_xxx hangs under Gentoo buildbots

2013-10-28 Thread Nadeem Vawda

Changes by Nadeem Vawda nadeem.va...@gmail.com:


--
nosy: +nadeem.vawda

___



[issue19395] unpickled LZMACompressor is crashy

2013-10-26 Thread Nadeem Vawda

Nadeem Vawda added the comment:

It looks like there's also a separate problem in the multiprocessing
module. The following code hangs after hitting a TypeError trying to
pickle one of the TextIOWrapper objects:

import multiprocessing

def read(f): return f.read()

files = [open(path) for path in 3 * ['/dev/null']]
pool = multiprocessing.Pool()
results = pool.map(read, files)
print(results)

--
nosy: +jnoller, sbt

___



[issue19395] unpickled LZMACompressor is crashy

2013-10-25 Thread Nadeem Vawda

Nadeem Vawda added the comment:

As far as I can tell, liblzma provides no way to serialize a compressor's
state, so the best we can do is raise a TypeError when attempting to
pickle the LZMACompressor (and likewise for LZMADecompressor).

Also, it's worth pointing out that the provided code wouldn't work even
if you could serialize LZMACompressor objects - each call to compress()
updates the compressor's internal state with information needed by the
final call to flush(), but each compress() call would be made on a
*copy* of the compressor rather than the original object. So flush()
would end up producing bogus data (and most likely all compress()
calls after the first would too).

If you are trying to do this because LZMA compression is too slow, I'd
suggest you try using zlib or bz2 instead - both of these algorithms
can compress faster than LZMA (at the expense of your compression ratio).
zlib is faster on both compression and decompression, while bz2 is slower
than lzma at decompression.

Alternatively, you can do parallel compression by calling lzma.compress()
on each block (instead of creating an LZMACompressor), and then joining
the results. But note that (a) this will give you a worse compression
ratio than serial compression (because it can't exploit redundancy shared
between blocks), and (b) using multiprocessing has a performance overhead
of its own, because you will need to copy the input when sending it to
the worker subprocess, and then copy the result when sending it back to
the main process.
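
A hedged sketch of that block-wise alternative (block size and pool handling
are illustrative; each block becomes an independent .xz stream, and their
concatenation is a multi-stream file that lzma.open()/xz can read back):

    import lzma
    import multiprocessing

    def compress_in_blocks(data, block_size=1 << 22):
        blocks = [data[i:i + block_size]
                  for i in range(0, len(data), block_size)]
        with multiprocessing.Pool() as pool:
            # lzma.compress is a module-level function, so no stateful
            # compressor object has to be pickled and sent to the workers.
            return b''.join(pool.map(lzma.compress, blocks))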

--

___



[issue19395] unpickled LZMACompressor is crashy

2013-10-25 Thread Nadeem Vawda

Nadeem Vawda added the comment:

Yes, that's because the builtin map function doesn't handle each input
in a separate process, so it uses the same LZMACompressor object
everywhere. Whereas multiprocessing.Pool.map creates a new copy of the
compressor object for each input, which is where the problem comes in.

--

___



[issue19222] Add 'x' mode to gzip.open()

2013-10-18 Thread Nadeem Vawda

Changes by Nadeem Vawda nadeem.va...@gmail.com:


--
assignee:  - nadeem.vawda
nosy: +nadeem.vawda
resolution:  - fixed
stage: patch review - committed/rejected
status: open - closed

___



[issue19223] Add 'x' mode to bz2.open()

2013-10-18 Thread Nadeem Vawda

Changes by Nadeem Vawda nadeem.va...@gmail.com:


--
assignee:  - nadeem.vawda
resolution:  - fixed
stage: patch review - committed/rejected
status: open - closed

___



[issue19201] Add 'x' mode to lzma.open()

2013-10-18 Thread Nadeem Vawda

Nadeem Vawda added the comment:

Fix committed. Thanks for the patches!

As Jesús and Terry have said, this won't be backported to 3.3/2.7, since
it is a new feature.


[oylenshpeegul]
| It's weird how different these three patches are! We're
| essentially doing the same thing: please allow the x option to pass
| through to builtins.open. Why don't these three modules look more alike?

Mostly because they were written at different times, by different people,
with different things to be backward-compatible with. Ideally they would
share the bulk of their code, but it's tricky to do that without changing
behavior in some corner cases.

--

___



[issue19201] Add 'x' mode to lzma.open()

2013-10-18 Thread Nadeem Vawda

Nadeem Vawda added the comment:

[terry.reedy]
| Arfrever's point about the order of characters makes me wonder why mode
| strings (as opposed to characters in the strings) are being checked.
| The following tests that exactly one of w, a, x appear in mode.
|  if len({'w', 'a', 'x'} & set(mode)) == 1:
| If mode is eventually passed to open(), the latter would do what ever
| it does with junk chars in mode (such as 'q').

There are two separate questions here - how rigid we are about modes
containing only valid characters, and how we handle invalid characters.

I don't think there's any point in passing through unrecognized chars
to builtins.open(), since it results in a ValueError either way.

On the first point, the code only accepts modes like 'r' and 'rb' (but
not 'br') for the sake of simplicity. There doesn't seem to be much
practical value in accepting arbitrarily-ordered modes, but if someone
has a compelling use-case (or a patch that's no more complex than the
status quo), please bring it up in a separate issue.
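
To make the trade-off concrete, here is a hedged sketch of the two styles
(neither is copied from the stdlib sources):

    # Strict whitelist, roughly what the current code does:
    _ACCEPTED_MODES = {'r', 'rb', 'w', 'wb', 'a', 'ab', 'x', 'xb'}

    def check_mode_strict(mode):
        if mode not in _ACCEPTED_MODES:
            raise ValueError('Invalid mode: %r' % (mode,))

    # Order-insensitive check along the lines quoted above:
    def check_mode_loose(mode):
        if len({'r', 'w', 'a', 'x'} & set(mode)) != 1:
            raise ValueError('Invalid mode: %r' % (mode,))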

--
resolution:  - fixed
stage: patch review - committed/rejected
status: open - closed

___



[issue18003] New lzma crazy slow with line-oriented reading.

2013-05-20 Thread Nadeem Vawda

Nadeem Vawda added the comment:

No, that is the intended behavior for binary streams - they operate at
the level of individual bytes. If you want to treat your input file as
Unicode-encoded text, you should open it in text mode. This will return a
TextIOWrapper which handles the decoding and line splitting properly.
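
For example (a hedged sketch; the path and encoding are placeholders):

    import lzma

    line_count = 0
    with lzma.open('corpus.txt.xz', 'rt', encoding='utf-8') as f:
        for line in f:          # decoded str lines, courtesy of TextIOWrapper
            line_count += 1
    print(line_count)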

--

___



[issue18003] New lzma crazy slow with line-oriented reading.

2013-05-19 Thread Nadeem Vawda

Nadeem Vawda added the comment:

I agree that making lzma.open() wrap its return value in a BufferedReader
(or BufferedWriter, as appropriate) is the way to go. I'm currently
travelling and don't have my SSH key with me - Serhiy, can you make the
change?

I'll put together a documentation patch that recommends using lzma.open()
rather than LZMAFile directly, and mentions the performance implications.


> Interestingly, opening in text (i.e. unicode) mode is almost as fast as with
> a BufferedReader:

This is because opening in text mode returns a TextIOWrapper, which is
written in C, and presumably does its own buffering on top of
LZMAFile.read1() instead of calling LZMAFile.readline().


> From my perspective default wrapping with io.BufferedReader is a great
> idea. I can't think of who would suffer. Maybe someone who wants to
> open thousands of simultaneous streams wouldn't appreciate the memory
> overhead. If that person exists then he would want an option to turn
> it off.

If someone doesn't want the BufferedReader/BufferedWriter, they can
create an LZMAFile directly; we don't plan to remove that possibility. So
I don't think that should be a problem.
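
A hedged sketch of the wrapping being discussed (the file name is a
placeholder); this is what the proposed lzma.open() change would do on the
caller's behalf:

    import io
    import lzma

    raw = lzma.LZMAFile('big.log.xz', 'rb')
    with io.BufferedReader(raw) as f:   # buffered readline()/line iteration
        for line in f:
            pass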

--

___



[issue18003] New lzma crazy slow with line-oriented reading.

2013-05-19 Thread Nadeem Vawda

Nadeem Vawda added the comment:

> I agree that making lzma.open() wrap its return value in a BufferedReader
> (or BufferedWriter, as appropriate) is the way to go.

On second thoughts, there's no need to change the behavior for mode='wb'.
We can just return a BufferedReader for mode='rb', and leave the current
behavior (returning a raw LZMAFile) in place for mode='wb'.


I also ran some additional benchmarks for the bz2 and gzip modules. It
looks like those two modules would also benefit from having their open()
functions use io.BufferedReader:

[lzma]

  $ time xzcat src.xz | wc -l
  1057980

  real    0m0.543s
  user    0m0.556s
  sys     0m0.024s
  $ ../cpython/python -m timeit -s 'import lzma, io' 'f = lzma.open("src.xz", "r")' 'for line in f: pass'
  10 loops, best of 3: 2.01 sec per loop
  $ ../cpython/python -m timeit -s 'import lzma, io' 'f = io.BufferedReader(lzma.open("src.xz", "r"))' 'for line in f: pass'
  10 loops, best of 3: 795 msec per loop

[bz2]

  $ time bzcat src.bz2 | wc -l
  1057980

  real    0m1.322s
  user    0m1.324s
  sys     0m0.044s
  $ ../cpython/python -m timeit -s 'import bz2, io' 'f = bz2.open("src.bz2", "r")' 'for line in f: pass'
  10 loops, best of 3: 3.71 sec per loop
  $ ../cpython/python -m timeit -s 'import bz2, io' 'f = io.BufferedReader(bz2.open("src.bz2", "r"))' 'for line in f: pass'
  10 loops, best of 3: 2.04 sec per loop

[gzip]

  $ time zcat src.gz | wc -l
  1057980

  real    0m0.310s
  user    0m0.296s
  sys     0m0.028s
  $ ../cpython/python -m timeit -s 'import gzip, io' 'f = gzip.open("src.gz", "r")' 'for line in f: pass'
  10 loops, best of 3: 1.94 sec per loop
  $ ../cpython/python -m timeit -s 'import gzip, io' 'f = io.BufferedReader(gzip.open("src.gz", "r"))' 'for line in f: pass'
  10 loops, best of 3: 556 msec per loop

--

___



[issue18003] New lzma crazy slow with line-oriented reading.

2013-05-18 Thread Nadeem Vawda

Nadeem Vawda added the comment:

Have you tried running the benchmark against the default (3.4) branch?
There was some significant optimization work done in issue 16034, but
the changes were not backported to 3.3.

--

___



[issue17843] Lib/test/testbz2_bigmem.bz2 trigger virus warnings

2013-04-30 Thread Nadeem Vawda

Nadeem Vawda added the comment:

Benjamin, please cherry-pick this for 2.7.4 as well (changesets b7bfedc8ee18 
and 529c4defbfd7).

--
stage: needs patch - commit review
versions: +Python 2.7

___



[issue17843] Lib/test/testbz2_bigmem.bz2 trigger virus warnings

2013-04-28 Thread Nadeem Vawda

Nadeem Vawda added the comment:

OK, 2.7 is done.

Georg, what do we want to do for 3.2? I've attached a patch.

--
assignee: nadeem.vawda - georg.brandl
keywords: +patch
Added file: http://bugs.python.org/file30049/bz2-viruswarning.diff

___



[issue17843] Lib/test/testbz2_bigmem.bz2 trigger virus warnings

2013-04-25 Thread Nadeem Vawda

Nadeem Vawda added the comment:

Oh dear. I'll update the test suite over the weekend. In the meanwhile, 
Christian, can you confirm which versions are affected? The file should only 
have been included in 2.7 and 3.2.

--
assignee:  - nadeem.vawda

___



[issue14398] bz2.BZ2DEcompressor.decompress fail on large files

2013-04-21 Thread Nadeem Vawda

Nadeem Vawda added the comment:

Hmm, so actually most of the bugs fixed in 2.7 and 3.2 weren't present
in 3.3 and 3.4, and those versions already had tests equivalent to the
tests I added for 2.7/3.2.

As for the changes that I did make to 3.3/3.4:

- two of the three cover cases that only occur if the output data is
  larger than ~32GiB. Even if we have a buildbot with enough memory for
  it (which I don't think we do), actually running such tests would take
  forever and then some.

- the third is for a condition that's actually pretty much impossible to
  trigger - grow_buffer() has to be called on a buffer that is already at
  least 8*((size_t)-1)/9 bytes long. On a 64-bit system this is
  astronomically large, while on a 32-bit system the OS will probably
  have reserved more than 1/9th of the virtual address space for itself,
  so it won't be possible to allocate a large enough buffer.

--
status: open - closed

___



[issue14398] bz2.BZ2DEcompressor.decompress fail on large files

2013-04-18 Thread Nadeem Vawda

Nadeem Vawda added the comment:

An oversight on my part, I think. I'll add tests for 3.x this weekend.

--
status: closed - open

___



[issue13898] Ignored exception in test_ssl

2013-03-03 Thread Nadeem Vawda

Nadeem Vawda added the comment:

This change fixes the problem (and doesn't break anything else that I can see):

--- a/Lib/test/test_ssl.py
+++ b/Lib/test/test_ssl.py
@@ -979,7 +979,7 @@
                 self.sslconn = self.server.context.wrap_socket(
                     self.sock, server_side=True)
                 self.server.selected_protocols.append(self.sslconn.selected_npn_protocol())
-            except ssl.SSLError as e:
+            except (ssl.SSLError, ConnectionResetError) as e:
                 # XXX Various errors can have happened here, for example
                 # a mismatching protocol version, an invalid certificate,
                 # or a low-level bug. This should be made more discriminating.

Does that look reasonable?

--
stage: needs patch - patch review

___



[issue13898] Ignored exception in test_ssl

2013-03-03 Thread Nadeem Vawda

Nadeem Vawda added the comment:

> You could add a comment explaining the issue.

Done.

This doesn't seem to affect 2.7. Marking as fixed in 3.2/3.3/3.4.

--
resolution:  - fixed
stage: patch review - committed/rejected
status: open - closed
versions:  -Python 2.7

___



[issue13886] readline-related test_builtin failure

2013-02-02 Thread Nadeem Vawda

Nadeem Vawda added the comment:

You're right; it breaks backspacing over multibyte characters. I should
have tested it more carefully before committing. I'll revert the changes.

--
resolution: fixed - 
stage: committed/rejected - needs patch
status: closed - open

___



[issue1159051] Handle corrupted gzip files with unexpected EOF

2013-02-02 Thread Nadeem Vawda

Nadeem Vawda added the comment:

I think the new behavior should be controlled by a constructor flag, maybe
named defer_errors. I don't like the idea of adding the flag to read(),
since that makes us diverge from the standard file interface. Making a
distinction between size > 0 and size=None seems confusing and error-prone,
not to mention that we (again) would have read() work differently from most
other file classes.

I'd prefer it if the new behavior is not enabled by default for size >= 0,
even if this wouldn't break well-behaved code. Having a flag that only
controls the size > 0 case is inelegant, and I don't think we should change
the default behavior unless there is a clear benefit to doing so.

--

___



[issue13886] readline-related test_builtin failure

2013-01-27 Thread Nadeem Vawda

Changes by Nadeem Vawda nadeem.va...@gmail.com:


--
assignee:  - nadeem.vawda
resolution:  - fixed
stage: patch review - committed/rejected
status: open - closed
versions: +Python 2.7

___



[issue1159051] Handle corrupted gzip files with unexpected EOF

2013-01-20 Thread Nadeem Vawda

Nadeem Vawda added the comment:

The updated patch looks good to me.

--

___



[issue1159051] Handle corrupted gzip files with unexpected EOF

2013-01-19 Thread Nadeem Vawda

Nadeem Vawda added the comment:

I've reviewed the patch and posted some comments on Rietveld.


> I doubt about backward compatibility. It's obvious that struct.error and
> TypeError are unintentional, and EOFError is purposed for this case. However
> users can catch undocumented but de facto exceptions and doesn't expect
> EOFError.

I think it's fine for us to change it to raise EOFError in these cases.

--

___



[issue15955] gzip, bz2, lzma: add option to limit output size

2013-01-19 Thread Nadeem Vawda

Nadeem Vawda added the comment:

> What if unconsumed_tail is not empty but less than needed to decompress at
> least one byte? We need read more data until unconsumed_tail grow enought to
> be decompressed.

This is possible in zlib, but not in bz2. According to the manual [1], it is
perfectly OK to supply one byte at a time.

For xz, I'm not sure whether this problem could occur. I had assumed that it
could not, but I may be mistaken ;-). Unfortunately liblzma has no proper
manual, so I'll have to dig into the implementation to find out, and I haven't
had the time to do this yet.


[As an aside, it would be nice if the documentation for the zlib module
 mentioned this problem. We can't assume that users of the Python module are
 familiar with the C API for zlib...]


[1] http://www.bzip.org/1.0.5/bzip2-manual-1.0.5.html

--

___



[issue16943] seriously? FileCookieJar can't really save ? save method is NotImplemented

2013-01-12 Thread Nadeem Vawda

Changes by Nadeem Vawda nadeem.va...@gmail.com:


--
resolution:  - duplicate
stage:  - committed/rejected
status: open - closed
superseder:  - seriously? urllib still doesn't support persistent connections?

___



[issue16828] bz2 error on compression of empty string

2013-01-02 Thread Nadeem Vawda

Nadeem Vawda added the comment:

Fixed. Thanks for the bug report and the patches!

--
assignee:  - nadeem.vawda
keywords: +3.3regression -patch
resolution:  - fixed
stage: patch review - committed/rejected
status: open - closed

___



[issue15955] gzip, bz2, lzma: add option to limit output size

2012-12-09 Thread Nadeem Vawda

Nadeem Vawda added the comment:

> # Using zlib's interface
> while not d.eof:
>     compressed = d.unconsumed_tail or f.read(8192)
>     if not compressed:
>         raise ValueError('End-of-stream marker not found')
>     output = d.decompress(compressed, 8192)
>     # process output
>
> This is not usable with bzip2. Bzip2 uses large block size and unconsumed_tail
> can be non empty but decompress() will return b''. With zlib you possible can
> see the same effect on some input when read by one byte.

I don't see how this is a problem. If (for some strange reason) the
application-specific processing code can't handle empty blocks properly, you can
just stick "if not output: continue" before it.


> Actually it should be:
>
> # Using zlib's interface
> while not d.eof:
>     output = d.decompress(d.unconsumed_tail, 8192)
>     while not output and not d.eof:
>         compressed = f.read(8192)
>         if not compressed:
>             raise ValueError('End-of-stream marker not found')
>         output = d.decompress(d.unconsumed_tail + compressed, 8192)
>     # process output
>
> Note that you should use d.unconsumed_tail + compressed as input, and therefore
> do an unnecessary copy of the data.

Why is this necessary? If unconsumed_tail is b'', then there's no need to
prepend it (and the concatenation would be a no-op anyway). If unconsumed_tail
does contain data, then we don't need to read additional compressed data from
the file until we've finished decompressing the data we already have.


> Without explicit unconsumed_tail you can write input data in the internal
> mutable buffer, it will be more effective for large buffer (handreds of KB)
> and small input chunks (several KB).

Are you proposing that the decompressor object maintain its own buffer, and
copy the input data into it before passing it to the decompression library?
Doesn't that just duplicate work that the library is already doing for us?

--

___



[issue15955] gzip, bz2, lzma: add option to limit output size

2012-12-02 Thread Nadeem Vawda

Nadeem Vawda added the comment:

I've tried reimplementing LZMAFile in terms of the decompress_into()
method, and it has ended up not being any faster than the existing
implementation. (It is _slightly_ faster for readinto() with a large
buffer size, but all other cases it was either of equal performance or
significantly slower.)

In addition, decompress_into() is more complicated to work with than I
had expected, so I withdraw my objection to the approach based on
max_length/unconsumed_tail.


> unconsumed_tail should be private hidden attribute, which automatically
> prepends any consumed data.

I don't think this is a good idea. In order to have predictable memory
usage, the caller will need to ensure that the current input is fully
decompressed before passing in the next block of compressed data. This
can be done more simply with the interface used by zlib. Compare:

while not d.eof:
    output = d.decompress(b'', 8192)
    if not output:
        compressed = f.read(8192)
        if not compressed:
            raise ValueError('End-of-stream marker not found')
        output = d.decompress(compressed, 8192)
    # process output

with:

# Using zlib's interface
while not d.eof:
    compressed = d.unconsumed_tail or f.read(8192)
    if not compressed:
        raise ValueError('End-of-stream marker not found')
    output = d.decompress(compressed, 8192)
    # process output


A related, but orthogonal proposal: We might want to make unconsumed_tail
a memoryview (provided the input data is know to be immutable), to avoid
creating an unnecessary copy of the data.

--

___



[issue15677] Gzip/zlib allows for compression level=0

2012-11-11 Thread Nadeem Vawda

Nadeem Vawda added the comment:

Committed. Thanks for the patch!

--
resolution:  - fixed
stage: commit review - committed/rejected
status: open - closed
type:  - enhancement

___



[issue16350] zlib.Decompress.decompress() after EOF discards existing value of unused_data

2012-11-10 Thread Nadeem Vawda

Nadeem Vawda added the comment:

New patch committed. Once again, thanks for all your work on this issue!

--
stage: patch review - committed/rejected
status: open - closed

___



[issue16411] zlib.Decompress.decompress() retains pointer to input buffer without acquiring reference to it

2012-11-10 Thread Nadeem Vawda

Nadeem Vawda added the comment:

Ah, that's much nicer than either of my ideas. Patch committed. Thanks!

--
resolution:  - fixed
stage: patch review - committed/rejected
status: open - closed

___



[issue16441] range usage in gzip module leads to excessive memory usage.

2012-11-08 Thread Nadeem Vawda

Nadeem Vawda added the comment:

Looks good to me. Go ahead.

You needn't add or change any tests for this, but you should run the
existing tests before committing, just to be safe.

--
nosy: +nadeem.vawda

___



[issue15955] gzip, bz2, lzma: add option to limit output size

2012-11-06 Thread Nadeem Vawda

Nadeem Vawda added the comment:

I suspect that it will be slower than the decompress_into() approach, but
as you say, we need to do benchmarks to see for sure.

--

___



[issue16350] zlib.Decompress.decompress() after EOF discards existing value of unused_data

2012-11-06 Thread Nadeem Vawda

Nadeem Vawda added the comment:

> These were not idle questions.  I wrote the patch, and I had to know
> what behavior is correct.

Ah, sorry. I assumed you were going to submit a separate patch to fix the
unconsumed_tail issues.

> Here's the patch.  It fixes potential memory bug (unconsumed_tail sets
> to NULL in case of out of memory), resets the unconsumed_tail to b''
> after EOF, updates unconsumed_tail and unused_data in flush().

Did you perhaps forget to attach the patch? The only ones I see are those
that you uploaded last week.

--

___



[issue16350] zlib.Decompress.decompress() after EOF discards existing value of unused_data

2012-11-04 Thread Nadeem Vawda

Nadeem Vawda added the comment:

Fixed. Thanks for the patch!


> This hacking is not needed, if first argument of PyBytes_FromStringAndSize()
> is NULL, the contents of the bytes object are uninitialized.

Oh, cool. I didn't know about that.


> What should unconsumed_tail be equal after EOF? b'' or unused_data?

Definitely b''. unconsumed_tail is meant to hold compressed data that should
be passed in to the next call to decompress(). If we are at EOF, then
decompress() should not be called again, and so it would be misleading to have
unconsumed_tail be non-empty.

--
resolution:  - fixed
stage: patch review - committed/rejected
status: open - closed

___



[issue16411] zlib.Decompress.decompress() retains pointer to input buffer without acquiring reference to it

2012-11-04 Thread Nadeem Vawda

New submission from Nadeem Vawda:

When calling zlib.Decompress.decompress() with a max_length argument,
if the input data is not fully consumed, the next_in pointer in the
z_stream struct is left pointing into the data object, but the
decompressor does not hold a reference to this object. This same
pointer is reused (perhaps unintentionally) if flush() is called
without calling decompress() again.

If the data object gets deallocated between the calls to decompress()
and to flush(), zlib will then try to access this deallocated memory,
and most likely return bogus output (or segfault). See the attached
script for a demonstration.
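
An illustrative sketch of the failure mode (this is not the attached file; the
sizes are arbitrary, and the final flush() only misbehaves on an affected
build):

    import zlib

    d = zlib.decompressobj()
    # Pass a temporary bytes object and cap the output size, so some input
    # is left unconsumed when decompress() returns.
    d.decompress(zlib.compress(b'x' * 100000), 16)
    # The temporary input object has now been freed, but z_stream.next_in
    # still points at the memory where it used to live.
    junk = [bytes(256) for _ in range(1000)]   # encourage reuse of that memory
    d.flush()                                  # may return bogus data or crash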

I see two potential solutions:

  1. Set avail_in to zero in flush(), so that it does not try to use
 leftover data (or whatever is else where that data used to be).

  2. Have decompress() check if there is leftover data, and if so,
 save a reference to the object until a) we consume the rest of
 the data in flush(), or b) discard it in a subsequent call to
 decompress().

Solution 2 would be less disruptive to code that depends on the existing
behavior (in non-pathological cases), but I don't like the maintenance
burden of adding yet another thing to keep track of to the decompressor
state. The PyZlib_objdecompress function is complex enough as it is, and
we can expect more bugs like this to creep in the more we cram additional
logic into it. So I'm more in favor of solution 1.

Any thoughts?

--
files: zlib_stale_ptr.py
messages: 174853
nosy: nadeem.vawda, serhiy.storchaka
priority: normal
severity: normal
stage: needs patch
status: open
title: zlib.Decompress.decompress() retains pointer to input buffer without 
acquiring reference to it
type: behavior
versions: Python 2.7, Python 3.2, Python 3.3, Python 3.4
Added file: http://bugs.python.org/file27889/zlib_stale_ptr.py

___



[issue16350] zlib.Decompress.decompress() after EOF discards existing value of unused_data

2012-11-04 Thread Nadeem Vawda

Nadeem Vawda added the comment:

> flush() does not update unconsumed_tail and unused_data.
>
> >>> import zlib
> >>> x = zlib.compress(b'abcdefghijklmnopqrstuvwxyz') + b'0123456789'
> >>> dco = zlib.decompressobj()
> >>> dco.decompress(x, 1)
> b'a'
> >>> dco.flush()
> b'bcdefghijklmnopqrstuvwxyz'
> >>> dco.unconsumed_tail
> b'NIMK\xcf\xc8\xcc\xca\xce\xc9\xcd\xcb/(,*.)-+\xaf\xa8\xac\x02\x00\x90\x86\x0b0123456789'
> >>> dco.unused_data
> b''

I see another bug here - described in issue 16411.

--

___



[issue15955] gzip, bz2, lzma: add option to limit output size

2012-11-04 Thread Nadeem Vawda

Nadeem Vawda added the comment:

I agree that being able to limit output size is useful and desirable, but
I'm not keen on copying the max_length/unconsumed_tail approach used by
zlib's decompressor class. It feels awkward to use, and it complicates
the implementation of the existing decompress() method, which is already
unwieldy enough.

As an alternative, I propose a thin wrapper around the underlying C API:

def decompress_into(self, src, dst, src_start=0, dst_start=0): ...

This would store decompressed data in a caller-provided bytearray, and
return a pair of integers indicating the end points of the consumed and
produced data in the respective buffers.

The implementation should be extremely simple - it does not need to do
any memory allocation or reference management.

I think it could also be useful for optimizing the implementation of
BZ2File and LZMAFile. I plan to write a prototype and run some benchmarks
some time in the next few weeks.

(Aside: if implemented for zlib, this could also be a nicer (I think)
 solution for the problem raised in issue 5804.)

--
stage:  - needs patch

___



[issue16316] Support xz compression in mimetypes module

2012-10-28 Thread Nadeem Vawda

Changes by Nadeem Vawda nadeem.va...@gmail.com:


--
resolution:  - fixed
stage: patch review - committed/rejected
status: open - closed

___



[issue16350] zlib.Decompress.decompress() after EOF discards existing value of unused_data

2012-10-28 Thread Nadeem Vawda

New submission from Nadeem Vawda:

From issue 5210:

amaury.forgeotdarc wrote:
> Hm, I tried a modified version of your first test, and I found another
> problem with the current zlib library;
> starting with the input:
> x = x1 + x2 + HAMLET_SCENE  # both compressed and uncompressed data
>
> The following scenario is OK:
> dco.decompress(x) # returns HAMLET_SCENE
> dco.unused_data   # returns HAMLET_SCENE
>
> But this one:
> for c in x:
>     dco.decompress(c) # will return HAMLET_SCENE, in several pieces
> dco.unused_data   # only one character, the last of (c in x)!
>
> This is a bug IMO: unused_data should accumulate all the extra uncompressed
> data.

Ideally, I would prefer to raise an EOFError if decompress() is called
after end-of-stream is reached (for consistency with BZ2Decompressor).
However, accumulating the data in unused_data is closer to being backward-
compatible, so it's probably the better approach to take.

--
components: Library (Lib)
files: zlib_unused_data_test.py
messages: 174056
nosy: amaury.forgeotdarc, nadeem.vawda
priority: normal
severity: normal
stage: needs patch
status: open
title: zlib.Decompress.decompress() after EOF discards existing value of 
unused_data
type: behavior
versions: Python 2.7, Python 3.2, Python 3.3, Python 3.4
Added file: http://bugs.python.org/file27767/zlib_unused_data_test.py

___



[issue5210] zlib does not indicate end of compressed stream properly

2012-10-28 Thread Nadeem Vawda

Nadeem Vawda added the comment:

This bug (zlib not providing a way to detect end-of-stream) has already
been fixed - see issue 12646.

I've opened issue 16350 for the unused_data problem.

--
resolution:  - out of date
stage: test needed - committed/rejected
status: open - closed
superseder:  - zlib.Decompress.decompress/flush do not raise any exceptions 
when given truncated input streams

___



[issue16350] zlib.Decompress.decompress() after EOF discards existing value of unused_data

2012-10-28 Thread Nadeem Vawda

Nadeem Vawda added the comment:

Interesting idea, but I'm not sure it would be worth the effort. It would
make the code and API more complicated, so it wouldn't really help users,
and would be an added maintenance burden.

--

___



[issue12692] test_urllib2net is triggering a ResourceWarning

2012-10-21 Thread Nadeem Vawda

Changes by Nadeem Vawda nadeem.va...@gmail.com:


--
resolution:  - fixed
stage: needs patch - committed/rejected
status: open - closed

___



[issue5148] gzip.open breaks with 'U' flag

2012-10-21 Thread Nadeem Vawda

Nadeem Vawda added the comment:

The data corruption issue is now fixed in the 2.7 branch.

In 3.x, using a mode containing 'U' results in an exception rather than silent 
data corruption. Additionally, gzip.open() has supported text modes 
(rt/wt/at) and newline translation since 3.3 [issue 13989].

--
resolution:  - fixed
stage: patch review - committed/rejected
status: open - closed
versions: +Python 2.7 -Python 2.6

___



[issue14398] bz2.BZ2DEcompressor.decompress fail on large files

2012-10-21 Thread Nadeem Vawda

Nadeem Vawda added the comment:

I'm working on it now. Will push in the next 15 minutes or so.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14398
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14398] bz2.BZ2DEcompressor.decompress fail on large files

2012-10-21 Thread Nadeem Vawda

Nadeem Vawda added the comment:

All fixed, along with some other similar but harder-to-trigger bugs.

Thanks for the bug report, Laurent!

--
resolution:  - fixed
stage: needs patch - committed/rejected
status: open - closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14398
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10050] urllib.request still has old 2.x urllib primitives

2012-10-14 Thread Nadeem Vawda

Nadeem Vawda added the comment:

Hmm, OK. URLopener and FancyURLopener do each issue a DeprecationWarning when 
used, though. If they are not actually deprecated, perhaps we should remove the 
warnings for the moment?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10050
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14214] test_concurrent_futures hangs

2012-10-13 Thread Nadeem Vawda

Changes by Nadeem Vawda nadeem.va...@gmail.com:


--
resolution:  - works for me
stage: needs patch - committed/rejected
status: open - closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14214
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14120] ARM Ubuntu 3.x buildbot failing test_dbm

2012-10-13 Thread Nadeem Vawda

Nadeem Vawda added the comment:

No sign of these failures any more; looks like that fixed it.

--
resolution:  - fixed
stage: needs patch - committed/rejected
status: open - closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14120
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14229] On KeyboardInterrupt, the exit code should mirror the signal number

2012-10-13 Thread Nadeem Vawda

Changes by Nadeem Vawda nadeem.va...@gmail.com:


--
resolution:  - rejected
stage:  - committed/rejected
status: open - closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14229
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10050] urllib.request still has old 2.x urllib primitives

2012-10-13 Thread Nadeem Vawda

Nadeem Vawda added the comment:

Are we still planning on removing URLopener and FancyURLopener in 3.4? The 
documentation for 3.3 does not list these classes as deprecated.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10050
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16034] bz2 module appears slower in Python 3.x versus Python 2.x

2012-10-08 Thread Nadeem Vawda

Nadeem Vawda added the comment:

I've released v0.95 of bz2file, which incorporates all the optimizations 
discussed here. The performance should be similar to 2.x's bz2 in most cases.

It is still a lot slower when calling read(10) or read(1), but I hope no-one is 
doing that anywhere where performance is important ;-)

One other note: bz2file's readline() is faster when running on 3.x than on 2.x 
(and in some cases faster than the 2.x stdlib version). This is probably due to 
improvements made to io.BufferedIOBase.readline() since 2.7, but I haven't had 
a chance to investigate this.

Let me know if you have any issues with the new release.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16034
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16034] bz2 module appears slower in Python 3.x versus Python 2.x

2012-10-01 Thread Nadeem Vawda

Nadeem Vawda added the comment:

Ah, nice - I didn't think of that optimization. Neater and faster.

I've committed this patch [e6d872b61c57], along with a minor bugfix  
[7252f9f95fe6], and another optimization for readline()/readlines() 
[6d7bf512e0c3]. [merge with default: a19f47d380d2]

If you're wondering why the Roundup Robot didn't update the issue 
automatically, it's because I made a typo in each of the commit messages. 
Apparently 16304 isn't the same as 16034. Who would have thought it? :P

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16034
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16034] bz2 module appears slower in Python 3.x versus Python 2.x

2012-09-30 Thread Nadeem Vawda

Nadeem Vawda added the comment:

 Yes, of course.

Awesome. I plan to do a new release for this in the next couple of days.


 We can even speed up 1.5 times the reading of small chunks, if we inline 
 _check_can_read() and _read_block().

Interesting idea, but I don't think it's worthwhile. It looks like this is only 
a noticeable improvement if size is 10 or 1, and I don't think these are common 
cases (especially not for users who care about performance). Also, I'm 
reluctant to have two copies of the code for _read_block(); it makes the code 
harder to read, and increases the chance of introducing a bug when changing the 
code.


 The same approach is applied for LZMAFile.

Of course. I'll apply these optimizations to LZMAFile next weekend.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16034
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16034] bz2 module appears slower in Python 3.x versus Python 2.x

2012-09-30 Thread Nadeem Vawda

Nadeem Vawda added the comment:

 Recursive inline _check_can_read() will be enough. Now this check calls 4 
 Python functions (_check_can_read(), readable(), _check_non_closed(), 
 closed). Recursive inlining only readable() in _check_can_read() is achieved 
 significant but less (about 30%) effect.

I've inlined readable() into _check_can_read() [3.3: 4258248a44c7 | default: 
abb5c5bde872]. This seems like a good balance between maximizing our 
performance in edge cases and not turning the code into a mess in the process ;)

Once again, thanks for your contributions!

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16034
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16034] bz2 module appears slower in Python 3.x versus Python 2.x

2012-09-29 Thread Nadeem Vawda

Nadeem Vawda added the comment:

Thanks for the bug report, Victor, and thank you Serhiy for the patch!

Serhiy, would you be OK with me also including this patch in the bz2file 
package?

--
resolution:  - fixed
stage:  - committed/rejected
status: open - closed
versions: +Python 3.4

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16034
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15955] gzip, bz2, lzma: add method to get decompressed size

2012-09-23 Thread Nadeem Vawda

Nadeem Vawda added the comment:

As far as I can tell, there is no way to find this out reliably without 
decompressing the entire file. With gzip, the file trailer contains the 
uncompressed size modulo 2^32, but this seems less than useful. It appears that 
the other two formats do not store the total uncompressed data size in any form.

For bz2 and lzma, one can get the uncompressed size by doing f.seek(0, 2) 
followed by f.tell(). However this approach is ugly and potentially very slow, 
so I would be reluctant to add a method based on it to the (BZ2|LZMA)File 
classes.
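For illustration, a rough sketch of both approaches (file names are hypothetical):

import bz2
import struct

# gzip: the ISIZE field in the trailer holds the uncompressed size
# modulo 2**32, so it is wrong for payloads of 4 GiB or more.
with open("example.gz", "rb") as f:
    f.seek(-4, 2)
    isize, = struct.unpack("<I", f.read(4))

# bz2 (and lzma): seek to the end of the decompressed stream and report
# the position - correct, but it decompresses the entire file to get there.
with bz2.BZ2File("example.bz2") as f:
    f.seek(0, 2)
    size = f.tell()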

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15955
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15666] PEP 3121, 384 refactoring applied to lzma module

2012-08-15 Thread Nadeem Vawda

Nadeem Vawda added the comment:

Thanks for the patch. Unfortunately I don't have much free time at the
moment, so it might be a few weeks before I get a chance to review it.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15666
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15664] test_curses not run with 'make test'

2012-08-15 Thread Nadeem Vawda

Changes by Nadeem Vawda nadeem.va...@gmail.com:


--
superseder:  - test_curses skipped on buildbots

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15664
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue12669] test_curses skipped on buildbots

2012-08-15 Thread Nadeem Vawda

Nadeem Vawda added the comment:

 Nadeem: is the failure you show in msg141798 with a version of test_curses 
 that uses pty.openpty?

Yes, I tried the following change:

--- a/Lib/test/test_curses.py
+++ b/Lib/test/test_curses.py
@@ -328,11 +328,12 @@
     curses.resetty()
 
 def test_main():
-    if not sys.__stdout__.isatty():
-        raise unittest.SkipTest("sys.__stdout__ is not a tty")
     # testing setupterm() inside initscr/endwin
     # causes terminal breakage
-    curses.setupterm(fd=sys.__stdout__.fileno())
+    #curses.setupterm(fd=sys.__stdout__.fileno())
+    import pty
+    _, pty = pty.openpty()
+    curses.setupterm(fd=pty)
     try:
         stdscr = curses.initscr()
         main(stdscr)

(I've never used openpty, either in Python or in C, so I can't vouch for
the correctness of this usage.)


 If it isn't: I'd expect more test failures on buildbot machines where the 
 buildbot agent is started as a system daemon, in which case the process 
 doesn't have a tty at all. Using pty.openpty it would be possible to ensure 
 that there is a pty that can be used for the test.

Looking at the actual buildbot results, most of the *nix bots I checked
are actually skipping this test; the only one I could find that wasn't is
the x86 Ubuntu Shared bot:
http://buildbot.python.org/all/builders/x86%20Ubuntu%20Shared%203.x/builds/6640/steps/test/logs/stdio

So it looks like on most of the bots, buildbot is running without a tty.
Then, test_main() sees that sys.__stdout__ isn't suitable to run the
test, and bails out.

It'd be great if you can come up with a fix that gets the test running
in this environment, but it'll probably be more complicated than just
slotting in a call to openpty().

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12669
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue12669] test_curses skipped on buildbots

2012-08-15 Thread Nadeem Vawda

Changes by Nadeem Vawda nadeem.va...@gmail.com:


--
stage:  - needs patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12669
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15654] PEP 384 Refactoring applied to bz2 module

2012-08-14 Thread Nadeem Vawda

Changes by Nadeem Vawda nadeem.va...@gmail.com:


--
nosy: +nadeem.vawda

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15654
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15546] Iteration breaks with bz2.open(filename,'rt')

2012-08-05 Thread Nadeem Vawda

Nadeem Vawda added the comment:

Before these fixes, it looks like all three classes' peek() methods were
susceptible to the same problem as read1().

The fixes for BZ2File.read1() and LZMAFile.read1() should have fixed peek() as
well; both methods are implemented in terms of _fill_buffer().

For GzipFile, peek() is still potentially broken - I'll push a fix shortly.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15546
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15546] Iteration breaks with bz2.open(filename,'rt')

2012-08-05 Thread Nadeem Vawda

Nadeem Vawda added the comment:

No, if _read() is called once the file is already at EOF, it raises an
EOFError (http://hg.python.org/cpython/file/8c07ff7f882f/Lib/gzip.py#l433),
which will then break out of the loop.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15546
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15546] Iteration breaks with bz2.open(filename,'rt')

2012-08-04 Thread Nadeem Vawda

Nadeem Vawda added the comment:

OK, BZ2File should now be fixed. It looks like LZMAFile and GzipFile may
be susceptible to the same problem; I'll push fixes for them shortly.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15546
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15546] Iteration breaks with bz2.open(filename,'rt')

2012-08-04 Thread Nadeem Vawda

Nadeem Vawda added the comment:

Done.

Thanks for the bug report, David.

--
resolution:  - fixed
stage:  - committed/rejected
status: open - closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15546
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15546] Iteration breaks with bz2.open(filename,'rt')

2012-08-03 Thread Nadeem Vawda

Nadeem Vawda added the comment:

I can't seem to reproduce this with an up-to-date checkout from Mercurial:

>>> import bz2
>>> g = bz2.open('access-log-0108.bz2','rt')
>>> next(g)
'140.180.132.213 - - [24/Feb/2008:00:08:59 -0600] "GET /ply/ply.html HTTP/1.1" 200 97238\n'

(where 'access-log-0108.bz2' is a file I created with the output above as
its first line, and a couple of other lines of random junk following that)

Would it be possible for you to upload the file you used to trigger this
bug?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15546
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15546] Iteration breaks with bz2.open(filename,'rt')

2012-08-03 Thread Nadeem Vawda

Nadeem Vawda added the comment:

The cause of this problem is that BZ2File.read1() sometimes returns b"", even
though the file is not at EOF. This happens when the underlying BZ2Decompressor
cannot produce any decompressed data from just the block passed to it in
_fill_buffer(); in this case, it needs to read more of the compressed stream to
make progress.

It would seem that BZ2File cannot satisfy the contract of the read1() method -
we can't guarantee that a single call to the read() method of the underlying
file will allow us to return a non-empty result, whereas returning b"" is
reserved for the case where we have reached EOF.

Simply removing the read1() method would just trade this problem for a bigger
one (resurrecting issue 10791), so I propose amending BZ2File.read1() to make
as many reads from the underlying file as necessary to return a non-empty
result.
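A minimal sketch of that behaviour (illustrative names only, not the actual
BZ2File code):

def read1_sketch(raw_file, decompressor, chunk_size=8192):
    # Keep feeding raw chunks to the decompressor until it produces some
    # output; only return b"" once the underlying file is exhausted.
    while True:
        chunk = raw_file.read(chunk_size)
        if not chunk:
            return b""
        data = decompressor.decompress(chunk)
        if data:
            return data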

Antoine, what do you think of this?

--
nosy: +pitrou

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15546
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15405] Invitation to connect on LinkedIn

2012-07-20 Thread Nadeem Vawda

Changes by Nadeem Vawda nadeem.va...@gmail.com:


--
resolution:  - invalid
stage:  - committed/rejected
status: open - closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15405
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15204] Deprecate the 'U' open mode

2012-06-28 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

+1 for the general idea of deprecating and eventually removing the U
modes.

But I agree with David that it doesn't make sense to have separate steps
for 3.5 and 3.6/4.0. If you make the code raise an exception when U is
used, how is that different from what will happen when you remove the
code for processing it? Surely we want it to eventually be treated just
like any other invalid mode string?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15204
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue13876] Sporadic failure in test_socket

2012-06-27 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

Merging nosy list from duplicate issue 15155.

--
nosy: +giampaolo.rodola, neologix, pitrou

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue13876
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue12559] gzip.open() needs an optional encoding argument

2012-06-26 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

I already fixed this without knowing about this issue; see 55202ca694d7.


storchaka:
 Why not use io.TextWrapper? I think it is the right answer for this issue.

The proposed patch (and the code I committed) *do* use TextIOWrapper.

Unless you mean that callers should create the TextIOWrapper themselves.
This is certainly possible, but quite inconvenient for something that is
conceptually simple, and not difficult to implement.


amaury.forgeotdarc:
 There remains a difference between open() and gzip.open():
 open(filename, 'r', encoding=None) is a text file (with a default encoding), 
 gzip.open() with the same arguments returns a binary file.

The committed code unfortunately still has gzip.open(filename, "r")
returning a binary file. This is something that cannot be fixed without
breaking backward compatibility.

However, it does provide a way to open a text file with the system's
default encoding (encoding=None, or no encoding argument specified).
To do this, you can use the rt/wt/at modes, just like with
builtins.open(). Of course, this also works if you do specify an encoding
explicitly.
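For example (file name hypothetical):

import gzip

with gzip.open("example.txt.gz", "wt", encoding="utf-8") as f:
    f.write("héllo\n")

with gzip.open("example.txt.gz", "rt", encoding="utf-8") as f:
    text = f.read()        # 'héllo\n' - decoded str, newlines translated

with gzip.open("example.txt.gz", "r") as f:
    raw = f.read()         # bytes - "r" is still a binary mode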

--
resolution:  - fixed
stage: patch review - committed/rejected
status: open - closed
versions: +Python 3.3 -Python 3.4

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12559
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10376] ZipFile unzip is unbuffered

2012-06-23 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

Patch looks fine to me.

Antoine, can you commit this? I'm currently away from the computer that
has my SSH key on it.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10376
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14684] zlib set dictionary support inflateSetDictionary

2012-06-21 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

 Just saw this on the checkins list; where are the other options documented? 

They aren't, AFAIK. I've been planning on adding them when I've got time
(based on the zlib manual at http://zlib.net/manual.html), but with the
upcoming feature freeze for 3.3, this issue was higher priority.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14684
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14684] zlib set dictionary support inflateSetDictionary

2012-06-20 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

Committed. Once again, thanks for the patch!

--
resolution:  - fixed
stage: patch review - committed/rejected
status: open - closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14684
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14684] zlib set dictionary support inflateSetDictionary

2012-06-19 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

 To restate my position: the need is for an immutable string of bytes, [...]

I disagree that we should require the dictionary to be immutable - if the
caller wishes to use a mutable buffer here, it is their responsibility to
ensure that it is not modified until the compressor is finished with it
(consenting adults and all that). The documentation can inform users of
this requirement.


 I believe the argument for aesthetics does not apply, as the constant
 dictionary constructor argument is a morally different kind of
 parameter, comparable to (say) the compression level.

Even so, the surrounding code sets a precedent for how it accepts binary
data buffers, and deviating from this existing convention should not be
taken lightly.


Nitpicking about the API aside, thanks for the patch :-)
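For reference, the feature looks roughly like this in use (a sketch assuming
the zdict keyword argument added by the patch; both ends must share the same
preset dictionary):

import zlib

zdict = b"common boilerplate shared by many small messages"

c = zlib.compressobj(zdict=zdict)
data = c.compress(b"common boilerplate shared by many small messages!") + c.flush()

d = zlib.decompressobj(zdict=zdict)
print(d.decompress(data))   # the original payload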

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14684
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14684] zlib set dictionary support inflateSetDictionary

2012-06-19 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

I plan to commit it (along with the buffer API changes) tomorrow.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14684
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15087] Add gzip function to read gzip'd strings

2012-06-16 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

There is already such a function, gzip.decompress() - it was added in 3.2.
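For example (gzip.compress(), used here for the round trip, was also added in 3.2):

import gzip

blob = gzip.compress(b"some payload")
print(gzip.decompress(blob))   # b'some payload'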

--
nosy: +nadeem.vawda
resolution:  - invalid
stage:  - committed/rejected
status: open - pending

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15087
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com


