[issue15955] gzip, bz2, lzma: add option to limit output size
Nadeem Vawda added the comment: Sorry, I just haven't had any free time lately, and may still not be able to give this the attention it deserves for another couple of weeks. Serhiy, would you be interested in reviewing Nikolaus' patch? -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue15955 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue15955] gzip, bz2, lzma: add option to limit output size
Nadeem Vawda added the comment: I've posted a review at http://bugs.python.org/review/15955/. (For some reason, it looks like Rietveld didn't send out email notifications. But maybe it never sends a notification to the sender? Hmm.)
[issue15955] gzip, bz2, lzma: add option to limit output size
Nadeem Vawda added the comment: Thanks for the patch, Nikolaus. I'm afraid I haven't had a chance to look over it yet; this past week has been a bit crazy for me. I'll definitely get back to you with a review in the next week, though.
[issue20781] BZ2File doesn't decompress some .bz2 files correctly
Nadeem Vawda added the comment:

| How does one create a multi-stream bzip2 file in the first place?

If you didn't do so deliberately, I would guess that you used a parallel compression tool like pbzip2 or lbzip2 to create your bz2 file. These tools work by splitting the input into chunks, compressing each chunk as a separate stream, and then concatenating these streams afterward. Another possibility is that you just concatenated two existing bz2 files, e.g.:

    $ cat first.bz2 second.bz2 > multi.bz2

| And how do I tell it's multi-stream?

I don't know of any pre-existing tools to do this, but you can write a script for it yourself, by feeding the file's data through a BZ2Decompressor. When the decompress() method raises EOFError, you're at the end of the first stream. If the decompressor's unused_data attribute is non-empty, or there is data that has not yet been read from the input file, then it is either (a) a multi-stream bz2 file or (b) a bz2 file with other metadata tacked on to the end.

To distinguish between cases (a) and (b), take unused_data + rest_of_input_file and feed it into a new BZ2Decompressor. If you don't get an IOError, then you've got a multi-stream bz2 file. (If you *do* get an IOError, then that's case (b) - someone's appended non-bz2 data to the end of a bz2 file. For example, Gentoo and Sabayon Linux packages are bz2 files with package metadata appended, according to issue 19839.)
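The procedure described above can be sketched as a short script (Python 3.3+). The function name and the classification labels are my own; this is an illustration, not an official tool:

```python
import bz2

def classify_bz2(path):
    """Classify a .bz2 file as 'single', 'multi', or 'trailing-garbage'."""
    with open(path, "rb") as f:
        data = f.read()
    decomp = bz2.BZ2Decompressor()
    decomp.decompress(data)      # consumes up to the end of the first stream
    rest = decomp.unused_data    # whatever followed the first stream
    if not rest:
        return "single"
    try:
        # Feed the leftover data into a fresh decompressor.
        bz2.BZ2Decompressor().decompress(rest)
    except OSError:              # IOError on older versions
        return "trailing-garbage"
    return "multi"
```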
[issue20781] BZ2File doesn't decompress some .bz2 files correctly
Nadeem Vawda added the comment: As Serhiy said, multi-stream support was only added to the bz2 module in 3.3, and there is no plan to backport this functionality to 2.7. However, the bz2file package on PyPI [1] does support multi-stream inputs, and you can use its BZ2File class as a drop-in replacement for the built-in one on 2.7. [1] https://pypi.python.org/pypi/bz2file
[issue15955] gzip, bz2, lzma: add option to limit output size
Nadeem Vawda added the comment: After some consideration, I've come to agree with Serhiy that it would be better to keep a private internal buffer, rather than having the user manage unconsumed input data. I'm also in favor of having a flag to indicate whether the decompressor needs more input to produce more decompressed data. (I'd prefer to call it 'needs_input' or similar, though - 'data_ready' feels too vague to me.)

In msg176883 and msg177228, Serhiy raises the possibility that the decompressor might be unable to produce decompressed output from a given piece of (non-empty) input, but will still leave the input unconsumed. I do not think that this can actually happen (based on the libraries' documentation), but this API will work even if that situation can occur.

So, to summarize, the API will look like this:

    class LZMADecompressor:
        ...
        def decompress(self, data, max_length=-1):
            """Decompresses *data*, returning uncompressed data as bytes.

            If *max_length* is nonnegative, returns at most *max_length*
            bytes of decompressed data. If this limit is reached and
            further output can be produced, *self.needs_input* will be
            set to False. In this case, the next call to *decompress()*
            should provide *data* as b'' to obtain more of the output.

            If all of the input data was decompressed and returned
            (either because this was less than *max_length* bytes, or
            because *max_length* was negative), *self.needs_input* will
            be set to True.
            """
        ...

Data not consumed due to the use of 'max_length' should be saved in an internal buffer (that is not exposed to Python code at all), which is then prepended to any data provided in the next call to decompress() before providing the data to the underlying compression library. The cases where either the internal buffer or the new data are empty should be optimized to avoid unnecessary allocations or copies, since these will be the most common cases.
Note that this API does not need a Python-level 'unconsumed_tail' attribute - its role is served by the internal buffer (which is private to the C module implementation). This is not to be confused with the already-existing 'unused_data' attribute that stores data found after the end of the compressed stream. 'unused_data' should continue to work as before, regardless of whether decompress() is called with a max_length argument or not. As a starting point I would suggest writing a patch for LZMADecompressor first, since its implementation is a bit simpler than BZ2Decompressor. Once this patch and an analogous one for BZ2Decompressor have been committed, we can then convert GzipFile, BZ2File and LZMAFile to use this feature. If you have any questions while you're working on this issue, feel free to send them my way.
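For illustration, a caller-side sketch of the proposed max_length/needs_input protocol (this matches the interface as it eventually landed in Python 3.5; the helper name is mine):

```python
import lzma

def iter_decompressed(compressed, chunk_size=8192):
    """Yield decompressed output in chunks of at most chunk_size bytes."""
    decomp = lzma.LZMADecompressor()
    data = compressed
    while not decomp.eof:
        chunk = decomp.decompress(data, max_length=chunk_size)
        if chunk:
            yield chunk
        # Unconsumed input is buffered inside the decompressor, so pass
        # b'' on subsequent calls until it asks for more input.
        data = b""
        if decomp.needs_input:
            break  # input exhausted (stream truncated if not decomp.eof)
```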
[issue20193] Derby: Convert the zlib, _bz2 and _lzma modules to use Argument Clinic
Nadeem Vawda added the comment: The latest patch for zlib seems to be missing Modules/zlibmodule.clinic.c

| I suppose that zdict=b'' have same effect as not specifying zdict. Am I right?

Probably, but to be on the safe side I'd prefer that we preserve the behavior of not calling deflateSetDictionary/inflateSetDictionary unless the caller explicitly provides zdict. If you need to give a Python default value, rather use None than b''.
[issue20193] Derby: Convert the zlib, _bz2 and _lzma modules to use Argument Clinic
Nadeem Vawda added the comment: The patch for zlib looks good to me. Thanks for working on this, Serhiy.

| We're not allowing changes in semantics for Argument Clinic conversion for 3.4. If it doesn't currently accept None, we can't add it right now, and we'll have to save it for 3.5.

Fair enough.

| The behavior is preserved. This case is exact analogue of _sha1.sha1(). No one additional function called when the parameter is not specified, but if it is specified as b'', the function behaves identically to not passing in that parameter.

Ah OK, I misunderstood the Argument Clinic input code when I first read it. Having actually read the docs, it makes sense.
[issue20358] test_curses is failing
Changes by Nadeem Vawda nadeem.va...@gmail.com: -- nosy: +nadeem.vawda
[issue20358] test_curses is failing
Nadeem Vawda added the comment: I can reproduce this (also on Ubuntu 13.10 64-bit). Maybe there's a bug in the version of curses distributed with the latest Ubuntu release? It looks like our only Ubuntu buildbot is using 8.04 (almost 6 years old!). Also note that you won't be able to reproduce this with make test or make testall (see issue 12669). make buildbottest does catch the bug, though (which also rules out the possibility that the buildbots are just skipping the test).
[issue15955] gzip, bz2, lzma: add option to limit output size
Nadeem Vawda added the comment: No, I'm afraid I haven't had a chance to do any work on this issue since my last message. I would be happy to review a patch for this, but before you start writing one, we should settle on how the API will look. I'll review the existing discussion in detail over the weekend and come up with something that avoids the potential problems raised by Serhiy. -- versions: +Python 3.5 -Python 3.4
[issue20193] Derby: Convert the zlib, _bz2 and _lzma modules to use Argument Clinic
Nadeem Vawda added the comment: The bz2 patch looks good to me, aside from a nit with the docstring for BZ2Compressor.__init__. The lzma patch produces a bunch of test failures for me. It looks like the __init__ methods for LZMACompressor and LZMADecompressor aren't accepting keyword args:

    ☿ ./python -c 'import lzma; lzma.LZMACompressor(format=lzma.FORMAT_XZ)'
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    TypeError: __init__ does not take keyword arguments
    ☿ ./python -c 'import lzma; lzma.LZMADecompressor(format=lzma.FORMAT_AUTO)'
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    TypeError: __init__ does not take keyword arguments
[issue20193] Derby: Convert the zlib, _bz2 and _lzma modules to use Argument Clinic
Nadeem Vawda added the comment: The patches for bz2 and lzma look good to me, aside from one nit for lzma.
[issue20182] Derby #13: Convert 50 sites to Argument Clinic across 5 files
Changes by Nadeem Vawda nadeem.va...@gmail.com: -- nosy: +nadeem.vawda
[issue20184] Derby #16: Convert 50 sites to Argument Clinic across 9 files
Changes by Nadeem Vawda nadeem.va...@gmail.com: -- nosy: +nadeem.vawda
[issue20185] Derby #17: Convert 50 sites to Argument Clinic across 14 files
Changes by Nadeem Vawda nadeem.va...@gmail.com: -- nosy: +nadeem.vawda
[issue19885] lzma segfault when __init__ with non-existent file after executing the constructor (Python 2.7)
Nadeem Vawda added the comment: To clarify, which version(s) does this affect? I have not been able to reproduce against 3.4, and 2.7 does not include the lzma module in the first place.
[issue19878] bz2.BZ2File.__init__() cannot be called twice
Nadeem Vawda added the comment: It appears that this *does* affect 2.7 (though not 3.2, 3.3 or 3.4, fortunately):

    ~/src/cpython/2.7☿ gdb --ex run --args ./python -c 'import bz2; obj = bz2.BZ2File("/dev/null"); obj.__init__("")'
    «... snip banner ...»
    Starting program: /home.u/nadeem/src/cpython/2.7/./python -c import\ bz2\;\ obj\ =\ bz2.BZ2File\("/dev/null"\)\;\ obj.__init__\(""\)
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    IOError: [Errno 2] No such file or directory: ''

    Program received signal SIGSEGV, Segmentation fault.
    0x00431d3e in PyFile_DecUseCount (fobj=0x0) at Objects/fileobject.c:89
    89          fobj->unlocked_count--;

--
assignee: -> nadeem.vawda
nosy: +nadeem.vawda
stage: -> needs patch
[issue19839] bz2: regression wrt supporting files with trailing garbage after EOF
Nadeem Vawda added the comment: I'll have a patch for this in the next couple of days (and a similar one for the lzma module, which has the same issue (even though it's not a regression in that case)). In the meanwhile, you can work around this by feeding the compressed data to a BZ2Decompressor yourself - it stops at the end of the bz2 stream, with any leftover data stored in its 'unused_data' attribute.

--
assignee: -> nadeem.vawda
stage: -> needs patch
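The workaround reads roughly like this on 3.3 (the function name is mine); BZ2Decompressor stops at the end of the stream and exposes any leftover bytes via unused_data:

```python
import bz2

def decompress_ignoring_trailing_data(path):
    """Decompress a single-stream bz2 file, ignoring trailing non-bz2
    data (e.g. appended package metadata)."""
    decomp = bz2.BZ2Decompressor()
    chunks = []
    with open(path, "rb") as f:
        while not decomp.eof:
            block = f.read(8192)
            if not block:
                break  # ran out of input before the end of the stream
            chunks.append(decomp.decompress(block))
    # decomp.unused_data now holds the start of the trailing garbage.
    return b"".join(chunks)
```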
[issue19395] unpickled LZMACompressor is crashy
Nadeem Vawda added the comment: The part of this issue specific to LZMACompressor should now be fixed; I've filed issue 19425 for the issue with Pool.map hanging.

--
resolution: -> fixed
stage: needs patch -> committed/rejected
status: open -> closed
[issue19425] multiprocessing.Pool.map hangs if pickling argument raises an exception
New submission from Nadeem Vawda: [Split off from issue 19395] The following code hangs after hitting a TypeError trying to pickle one of the TextIOWrapper objects:

    import multiprocessing

    def read(f):
        return f.read()

    files = [open(path) for path in 3 * ['/dev/null']]
    pool = multiprocessing.Pool()
    results = pool.map(read, files)
    print(results)

This issue is present in 3.2, 3.3 and 3.4, but not in 2.7.

--
components: Library (Lib)
messages: 201580
nosy: cantor, jnoller, nadeem.vawda, pitrou, python-dev, sbt, tim.peters
priority: normal
severity: normal
stage: needs patch
status: open
title: multiprocessing.Pool.map hangs if pickling argument raises an exception
type: behavior
versions: Python 3.3, Python 3.4
[issue19227] test_multiprocessing_xxx hangs under Gentoo buildbots
Changes by Nadeem Vawda nadeem.va...@gmail.com: -- nosy: +nadeem.vawda
[issue19395] unpickled LZMACompressor is crashy
Nadeem Vawda added the comment: It looks like there's also a separate problem in the multiprocessing module. The following code hangs after hitting a TypeError trying to pickle one of the TextIOWrapper objects:

    import multiprocessing

    def read(f):
        return f.read()

    files = [open(path) for path in 3 * ['/dev/null']]
    pool = multiprocessing.Pool()
    results = pool.map(read, files)
    print(results)

--
nosy: +jnoller, sbt
[issue19395] unpickled LZMACompressor is crashy
Nadeem Vawda added the comment: As far as I can tell, liblzma provides no way to serialize a compressor's state, so the best we can do is raise a TypeError when attempting to pickle the LZMACompressor (and likewise for LZMADecompressor).

Also, it's worth pointing out that the provided code wouldn't work even if you could serialize LZMACompressor objects - each call to compress() updates the compressor's internal state with information needed by the final call to flush(), but each compress() call would be made on a *copy* of the compressor rather than the original object. So flush() would end up producing bogus data (and most likely all compress() calls after the first would too).

If you are trying to do this because LZMA compression is too slow, I'd suggest you try using zlib or bz2 instead - both of these algorithms can compress faster than LZMA (at the expense of your compression ratio). zlib is faster on both compression and decompression, while bz2 is slower than lzma at decompression.

Alternatively, you can do parallel compression by calling lzma.compress() on each block (instead of creating an LZMACompressor), and then joining the results. But note that (a) this will give you a worse compression ratio than serial compression (because it can't exploit redundancy shared between blocks), and (b) using multiprocessing has a performance overhead of its own, because you will need to copy the input when sending it to the worker subprocess, and then copy the result when sending it back to the main process.
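The block-wise alternative might look like this (a sketch; the helper names are mine, and caveats (a) and (b) above apply). Each lzma.compress() call produces an independent .xz stream, and the concatenation of those streams can still be decompressed with lzma.decompress() or the xz tool:

```python
import lzma
from multiprocessing import Pool

def compress_block(block):
    # Each call produces a complete, self-contained .xz stream.
    return lzma.compress(block)

def parallel_compress(data, block_size=1 << 20):
    """Compress data as independently-compressed blocks, in parallel."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    with Pool() as pool:
        return b"".join(pool.map(compress_block, blocks))

if __name__ == "__main__":
    payload = b"some repetitive data " * 100000
    compressed = parallel_compress(payload)
    # lzma.decompress handles the concatenated streams transparently.
    assert lzma.decompress(compressed) == payload
```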
[issue19395] unpickled LZMACompressor is crashy
Nadeem Vawda added the comment: Yes, that's because the builtin map function doesn't handle each input in a separate process, so it uses the same LZMACompressor object everywhere. Whereas multiprocessing.Pool.map creates a new copy of the compressor object for each input, which is where the problem comes in.
[issue19222] Add 'x' mode to gzip.open()
Changes by Nadeem Vawda nadeem.va...@gmail.com:

--
assignee: -> nadeem.vawda
nosy: +nadeem.vawda
resolution: -> fixed
stage: patch review -> committed/rejected
status: open -> closed
[issue19223] Add 'x' mode to bz2.open()
Changes by Nadeem Vawda nadeem.va...@gmail.com:

--
assignee: -> nadeem.vawda
resolution: -> fixed
stage: patch review -> committed/rejected
status: open -> closed
[issue19201] Add 'x' mode to lzma.open()
Nadeem Vawda added the comment: Fix committed. Thanks for the patches! As Jesús and Terry have said, this won't be backported to 3.3/2.7, since it is a new feature. [oylenshpeegul] | It's weird how different these three patches are! We're | essentially doing the same thing: please allow the x option to pass | through to builtins.open. Why don't these three modules look more alike? Mostly because they were written at different times, by different people, with different things to be backward-compatible with. Ideally they would share the bulk of their code, but it's tricky to do that without changing behavior in some corner cases.
[issue19201] Add 'x' mode to lzma.open()
Nadeem Vawda added the comment:

[terry.reedy]
| Arfrever's point about the order of characters makes me wonder why mode
| strings (as opposed to characters in the strings) are being checked.
| The following tests that exactly one of w, a, x appear in mode.
|     if len({'w', 'a', 'x'} & set(mode)) == 1:
| If mode is eventually passed to open(), the latter would do what ever
| it does with junk chars in mode (such as 'q').

There are two separate questions here - how rigid we are about modes containing only valid characters, and how we handle invalid characters. I don't think there's any point in passing through unrecognized chars to builtins.open(), since it results in a ValueError either way.

On the first point, the code only accepts modes like 'r' and 'rb' (but not 'br') for the sake of simplicity. There doesn't seem to be much practical value in accepting arbitrarily-ordered modes, but if someone has a compelling use-case (or a patch that's no more complex than the status quo), please bring it up in a separate issue.

--
resolution: -> fixed
stage: patch review -> committed/rejected
status: open -> closed
[issue18003] New lzma crazy slow with line-oriented reading.
Nadeem Vawda added the comment: No, that is the intended behavior for binary streams - they operate at the level of individual bytes. If you want to treat your input file as Unicode-encoded text, you should open it in text mode. This will return a TextIOWrapper which handles the decoding and line splitting properly.
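Concretely, the difference is just the mode string (the file path used here is arbitrary, chosen for illustration):

```python
import lzma
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "example.xz")

# Create a small UTF-8 text file compressed with xz.
with lzma.open(path, "wt", encoding="utf-8") as f:
    f.write("first line\nsecond line\n")

# Binary mode yields raw bytes; line splitting works on b'\n' only,
# with no knowledge of the text encoding.
with lzma.open(path, "rb") as f:
    raw_lines = f.readlines()

# Text mode wraps the stream in a TextIOWrapper, which decodes to str
# and handles universal newlines.
with lzma.open(path, "rt", encoding="utf-8") as f:
    text_lines = f.readlines()
```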
[issue18003] New lzma crazy slow with line-oriented reading.
Nadeem Vawda added the comment: I agree that making lzma.open() wrap its return value in a BufferedReader (or BufferedWriter, as appropriate) is the way to go. I'm currently travelling and don't have my SSH key with me - Serhiy, can you make the change? I'll put together a documentation patch that recommends using lzma.open() rather than LZMAFile directly, and mentions the performance implications.

| Interestingly, opening in text (i.e. unicode) mode is almost as fast as with a BufferedReader:

This is because opening in text mode returns a TextIOWrapper, which is written in C, and presumably does its own buffering on top of LZMAFile.read1() instead of calling LZMAFile.readline().

| From my perspective default wrapping with io.BufferedReader is a great idea. I can't think of who would suffer. Maybe someone who wants to open thousands of simultaneous streams wouldn't appreciate the memory overhead. If that person exists then he would want an option to turn it off.

If someone doesn't want the BufferedReader/BufferedWriter, they can create an LZMAFile directly; we don't plan to remove that possibility. So I don't think that should be a problem.
[issue18003] New lzma crazy slow with line-oriented reading.
Nadeem Vawda added the comment:

| I agree that making lzma.open() wrap its return value in a BufferedReader
| (or BufferedWriter, as appropriate) is the way to go.

On second thoughts, there's no need to change the behavior for mode='wb'. We can just return a BufferedReader for mode='rb', and leave the current behavior (returning a raw LZMAFile) in place for mode='wb'.

I also ran some additional benchmarks for the bz2 and gzip modules. It looks like those two modules would also benefit from having their open() functions use io.BufferedReader:

[lzma]
    $ time xzcat src.xz | wc -l
    1057980
    real    0m0.543s
    user    0m0.556s
    sys     0m0.024s
    $ ../cpython/python -m timeit -s 'import lzma, io' 'f = lzma.open("src.xz", "r")' 'for line in f: pass'
    10 loops, best of 3: 2.01 sec per loop
    $ ../cpython/python -m timeit -s 'import lzma, io' 'f = io.BufferedReader(lzma.open("src.xz", "r"))' 'for line in f: pass'
    10 loops, best of 3: 795 msec per loop

[bz2]
    $ time bzcat src.bz2 | wc -l
    1057980
    real    0m1.322s
    user    0m1.324s
    sys     0m0.044s
    $ ../cpython/python -m timeit -s 'import bz2, io' 'f = bz2.open("src.bz2", "r")' 'for line in f: pass'
    10 loops, best of 3: 3.71 sec per loop
    $ ../cpython/python -m timeit -s 'import bz2, io' 'f = io.BufferedReader(bz2.open("src.bz2", "r"))' 'for line in f: pass'
    10 loops, best of 3: 2.04 sec per loop

[gzip]
    $ time zcat src.gz | wc -l
    1057980
    real    0m0.310s
    user    0m0.296s
    sys     0m0.028s
    $ ../cpython/python -m timeit -s 'import gzip, io' 'f = gzip.open("src.gz", "r")' 'for line in f: pass'
    10 loops, best of 3: 1.94 sec per loop
    $ ../cpython/python -m timeit -s 'import gzip, io' 'f = io.BufferedReader(gzip.open("src.gz", "r"))' 'for line in f: pass'
    10 loops, best of 3: 556 msec per loop
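Until open() does the wrapping by default, the pattern used in the benchmarks is a one-liner for the reading case (the helper name is mine):

```python
import io
import lzma

def open_xz_buffered(path):
    """Wrap a raw LZMAFile in io.BufferedReader, which makes readline()
    and line iteration much faster, as in the benchmarks above."""
    return io.BufferedReader(lzma.LZMAFile(path, "rb"))
```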
[issue18003] New lzma crazy slow with line-oriented reading.
Nadeem Vawda added the comment: Have you tried running the benchmark against the default (3.4) branch? There was some significant optimization work done in issue 16034, but the changes were not backported to 3.3.
[issue17843] Lib/test/testbz2_bigmem.bz2 trigger virus warnings
Nadeem Vawda added the comment: Benjamin, please cherry-pick this for 2.7.4 as well (changesets b7bfedc8ee18 and 529c4defbfd7).

--
stage: needs patch -> commit review
versions: +Python 2.7
[issue17843] Lib/test/testbz2_bigmem.bz2 trigger virus warnings
Nadeem Vawda added the comment: OK, 2.7 is done. Georg, what do we want to do for 3.2? I've attached a patch.

--
assignee: nadeem.vawda -> georg.brandl
keywords: +patch
Added file: http://bugs.python.org/file30049/bz2-viruswarning.diff
[issue17843] Lib/test/testbz2_bigmem.bz2 trigger virus warnings
Nadeem Vawda added the comment: Oh dear. I'll update the test suite over the weekend. In the meanwhile, Christian, can you confirm which versions are affected? The file should only have been included in 2.7 and 3.2.

--
assignee: -> nadeem.vawda
[issue14398] bz2.BZ2DEcompressor.decompress fail on large files
Nadeem Vawda added the comment: Hmm, so actually most of the bugs fixed in 2.7 and 3.2 weren't present in 3.3 and 3.4, and those versions already had tests equivalent to the tests I added for 2.7/3.2. As for the changes that I did make to 3.3/3.4:

- two of the three cover cases that only occur if the output data is larger than ~32GiB. Even if we have a buildbot with enough memory for it (which I don't think we do), actually running such tests would take forever and then some.
- the third is for a condition that's actually pretty much impossible to trigger - grow_buffer() has to be called on a buffer that is already at least 8*((size_t)-1)/9 bytes long. On a 64-bit system this is astronomically large, while on a 32-bit system the OS will probably have reserved more than 1/9th of the virtual address space for itself, so it won't be possible to allocate a large enough buffer.

--
status: open -> closed
[issue14398] bz2.BZ2DEcompressor.decompress fail on large files
Nadeem Vawda added the comment: An oversight on my part, I think. I'll add tests for 3.x this weekend.

--
status: closed -> open
[issue13898] Ignored exception in test_ssl
Nadeem Vawda added the comment: This change fixes the problem (and doesn't break anything else that I can see):

    --- a/Lib/test/test_ssl.py
    +++ b/Lib/test/test_ssl.py
    @@ -979,7 +979,7 @@
                     self.sslconn = self.server.context.wrap_socket(
                         self.sock, server_side=True)
                     self.server.selected_protocols.append(self.sslconn.selected_npn_protocol())
    -            except ssl.SSLError as e:
    +            except (ssl.SSLError, ConnectionResetError) as e:
                     # XXX Various errors can have happened here, for example
                     # a mismatching protocol version, an invalid certificate,
                     # or a low-level bug. This should be made more discriminating.

Does that look reasonable?

--
stage: needs patch -> patch review
[issue13898] Ignored exception in test_ssl
Nadeem Vawda added the comment:

| You could add a comment explaining the issue.

Done. This doesn't seem to affect 2.7. Marking as fixed in 3.2/3.3/3.4.

--
resolution: -> fixed
stage: patch review -> committed/rejected
status: open -> closed
versions: -Python 2.7
[issue13886] readline-related test_builtin failure
Nadeem Vawda added the comment: You're right; it breaks backspacing over multibyte characters. I should have tested it more carefully before committing. I'll revert the changes. -- resolution: fixed - stage: committed/rejected - needs patch status: closed - open ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue13886 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue1159051] Handle corrupted gzip files with unexpected EOF
Nadeem Vawda added the comment: I think the new behavior should be controlled by a constructor flag, maybe named defer_errors. I don't like the idea of adding the flag to read(), since that makes us diverge from the standard file interface. Making a distinction between size=0 and size=None seems confusing and error-prone, not to mention that we (again) would have read() work differently from most other file classes. I'd prefer it if the new behavior is not enabled by default for size=0, even if this wouldn't break well-behaved code. Having a flag that only controls the size=0 case is inelegant, and I don't think we should change the default behavior unless there is a clear benefit to doing so. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue1159051 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
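For illustration only (this is not the patch under discussion): the trade-off here is between raising on a truncated gzip file and handing back the data recovered so far. Both behaviors can be sketched with the current gzip and zlib modules; the sample data and the 4-byte truncation are arbitrary choices:

```python
import gzip
import io
import zlib

data = b'hello world\n' * 100
full = gzip.compress(data)
truncated = full[:-4]  # chop part of the 8-byte CRC/length trailer

# Default behavior: GzipFile raises once it hits the premature end.
try:
    gzip.GzipFile(fileobj=io.BytesIO(truncated)).read()
except EOFError:
    pass  # truncated stream detected

# A "deferred errors" style of reading: a decompressor object hands back
# whatever it could recover, and d.eof reveals that the stream was cut short.
d = zlib.decompressobj(wbits=31)  # 31 = decode the gzip framing too
partial = d.decompress(truncated)
assert partial == data
assert not d.eof
```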
[issue13886] readline-related test_builtin failure
Changes by Nadeem Vawda nadeem.va...@gmail.com: -- assignee: - nadeem.vawda resolution: - fixed stage: patch review - committed/rejected status: open - closed versions: +Python 2.7 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue13886 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue1159051] Handle corrupted gzip files with unexpected EOF
Nadeem Vawda added the comment: The updated patch looks good to me. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue1159051 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue1159051] Handle corrupted gzip files with unexpected EOF
Nadeem Vawda added the comment: I've reviewed the patch and posted some comments on Rietveld.

> I doubt about backward compatibility. It's obvious that struct.error and TypeError are unintentional, and EOFError is purposed for this case. However users can catch undocumented but de facto exceptions and doesn't expect EOFError.

I think it's fine for us to change it to raise EOFError in these cases. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue1159051 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue15955] gzip, bz2, lzma: add option to limit output size
Nadeem Vawda added the comment:

> What if unconsumed_tail is not empty but less than needed to decompress at least one byte? We need read more data until unconsumed_tail grow enought to be decompressed.

This is possible in zlib, but not in bz2. According to the manual [1], it is perfectly OK to supply one byte at a time. For xz, I'm not sure whether this problem could occur. I had assumed that it could not, but I may be mistaken ;-). Unfortunately liblzma has no proper manual, so I'll have to dig into the implementation to find out, and I haven't had the time to do this yet. [As an aside, it would be nice if the documentation for the zlib module mentioned this problem. We can't assume that users of the Python module are familiar with the C API for zlib...]

[1] http://www.bzip.org/1.0.5/bzip2-manual-1.0.5.html

-- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue15955 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
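The claim that bzip2 happily accepts one byte of input at a time is easy to verify against the bz2 module; a minimal sketch (sample data arbitrary):

```python
import bz2

data = b'The quick brown fox jumps over the lazy dog.\n' * 10
compressed = bz2.compress(data)

d = bz2.BZ2Decompressor()
out = bytearray()
for i in range(len(compressed)):
    # One byte per call; the decompressor buffers partial input internally.
    out += d.decompress(compressed[i:i + 1])

assert bytes(out) == data
assert d.eof
```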
[issue16943] seriously? FileCookieJar can't really save ? save method is NotImplemented
Changes by Nadeem Vawda nadeem.va...@gmail.com: -- resolution: - duplicate stage: - committed/rejected status: open - closed superseder: - seriously? urllib still doesn't support persistent connections? ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue16943 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16828] bz2 error on compression of empty string
Nadeem Vawda added the comment: Fixed. Thanks for the bug report and the patches! -- assignee: - nadeem.vawda keywords: +3.3regression -patch resolution: - fixed stage: patch review - committed/rejected status: open - closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue16828 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue15955] gzip, bz2, lzma: add option to limit output size
Nadeem Vawda added the comment:

> > # Using zlib's interface
> > while not d.eof:
> >     compressed = d.unconsumed_tail or f.read(8192)
> >     if not compressed:
> >         raise ValueError('End-of-stream marker not found')
> >     output = d.decompress(compressed, 8192)
> >     # process output
>
> This is not usable with bzip2. Bzip2 uses large block size and unconsumed_tail can be non empty but decompress() will return b''. With zlib you possible can see the same effect on some input when read by one byte.

I don't see how this is a problem. If (for some strange reason) the application-specific processing code can't handle empty blocks properly, you can just stick "if not output: continue" before it.

> Actually it should be:
>
>     # Using zlib's interface
>     while not d.eof:
>         output = d.decompress(d.unconsumed_tail, 8192)
>         while not output and not d.eof:
>             compressed = f.read(8192)
>             if not compressed:
>                 raise ValueError('End-of-stream marker not found')
>             output = d.decompress(d.unconsumed_tail + compressed, 8192)
>         # process output
>
> Note that you should use d.unconsumed_tail + compressed as input, and therefore do an unnecessary copy of the data.

Why is this necessary? If unconsumed_tail is b'', then there's no need to prepend it (and the concatenation would be a no-op anyway). If unconsumed_tail does contain data, then we don't need to read additional compressed data from the file until we've finished decompressing the data we already have.

> Without explicit unconsumed_tail you can write input data in the internal mutable buffer, it will be more effective for large buffer (handreds of KB) and small input chunks (several KB).

Are you proposing that the decompressor object maintain its own buffer, and copy the input data into it before passing it to the decompression library? Doesn't that just duplicate work that the library is already doing for us?
-- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue15955 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue15955] gzip, bz2, lzma: add option to limit output size
Nadeem Vawda added the comment: I've tried reimplementing LZMAFile in terms of the decompress_into() method, and it has ended up not being any faster than the existing implementation. (It is _slightly_ faster for readinto() with a large buffer size, but in all other cases it was either of equal performance or significantly slower.) In addition, decompress_into() is more complicated to work with than I had expected, so I withdraw my objection to the approach based on max_length/unconsumed_tail.

> unconsumed_tail should be private hidden attribute, which automatically prepends any consumed data.

I don't think this is a good idea. In order to have predictable memory usage, the caller will need to ensure that the current input is fully decompressed before passing in the next block of compressed data. This can be done more simply with the interface used by zlib. Compare:

    while not d.eof:
        output = d.decompress(b'', 8192)
        if not output:
            compressed = f.read(8192)
            if not compressed:
                raise ValueError('End-of-stream marker not found')
            output = d.decompress(compressed, 8192)
        # process output

with:

    # Using zlib's interface
    while not d.eof:
        compressed = d.unconsumed_tail or f.read(8192)
        if not compressed:
            raise ValueError('End-of-stream marker not found')
        output = d.decompress(compressed, 8192)
        # process output

A related, but orthogonal proposal: We might want to make unconsumed_tail a memoryview (provided the input data is known to be immutable), to avoid creating an unnecessary copy of the data. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue15955 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
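The zlib-style loop sketched above runs as-is against a real zlib decompressor; here it is made self-contained, with chunk sizes chosen arbitrarily small to force many iterations through the unconsumed_tail path:

```python
import io
import zlib

data = b'abcdefgh' * 4000
f = io.BytesIO(zlib.compress(data))

d = zlib.decompressobj()
out = bytearray()
while not d.eof:
    compressed = d.unconsumed_tail or f.read(256)
    if not compressed:
        raise ValueError('End-of-stream marker not found')
    # At most 64 bytes of output per call; any input that could not be
    # processed lands in d.unconsumed_tail and is fed back in next time.
    out += d.decompress(compressed, 64)

assert bytes(out) == data
```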
[issue15677] Gzip/zlib allows for compression level=0
Nadeem Vawda added the comment: Committed. Thanks for the patch! -- resolution: - fixed stage: commit review - committed/rejected status: open - closed type: - enhancement ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue15677 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16350] zlib.Decompress.decompress() after EOF discards existing value of unused_data
Nadeem Vawda added the comment: New patch committed. Once again, thanks for all your work on this issue! -- stage: patch review - committed/rejected status: open - closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue16350 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16411] zlib.Decompress.decompress() retains pointer to input buffer without acquiring reference to it
Nadeem Vawda added the comment: Ah, that's much nicer than either of my ideas. Patch committed. Thanks! -- resolution: - fixed stage: patch review - committed/rejected status: open - closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue16411 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16441] range usage in gzip module leads to excessive memory usage.
Nadeem Vawda added the comment: Looks good to me. Go ahead. You needn't add or change any tests for this, but you should run the existing tests before committing, just to be safe. -- nosy: +nadeem.vawda ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue16441 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue15955] gzip, bz2, lzma: add option to limit output size
Nadeem Vawda added the comment: I suspect that it will be slower than the decompress_into() approach, but as you say, we need to do benchmarks to see for sure. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue15955 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16350] zlib.Decompress.decompress() after EOF discards existing value of unused_data
Nadeem Vawda added the comment:

> These were not idle questions. I wrote the patch, and I had to know what behavior is correct.

Ah, sorry. I assumed you were going to submit a separate patch to fix the unconsumed_tail issues.

> Here's the patch. It fixes potential memory bug (unconsumed_tail sets to NULL in case of out of memory), resets the unconsumed_tail to b'' after EOF, updates unconsumed_tail and unused_data in flush().

Did you perhaps forget to attach the patch? The only ones I see are those that you uploaded last week. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue16350 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16350] zlib.Decompress.decompress() after EOF discards existing value of unused_data
Nadeem Vawda added the comment: Fixed. Thanks for the patch!

> This hacking is not needed, if first argument of PyBytes_FromStringAndSize() is NULL, the contents of the bytes object are uninitialized.

Oh, cool. I didn't know about that.

> What should unconsumed_tail be equal after EOF? b'' or unused_data?

Definitely b''. unconsumed_tail is meant to hold compressed data that should be passed in to the next call to decompress(). If we are at EOF, then decompress() should not be called again, and so it would be misleading to have unconsumed_tail be non-empty. -- resolution: - fixed stage: patch review - committed/rejected status: open - closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue16350 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16411] zlib.Decompress.decompress() retains pointer to input buffer without acquiring reference to it
New submission from Nadeem Vawda: When calling zlib.Decompress.decompress() with a max_length argument, if the input data is not fully consumed, the next_in pointer in the z_stream struct is left pointing into the data object, but the decompressor does not hold a reference to this object. This same pointer is reused (perhaps unintentionally) if flush() is called without calling decompress() again. If the data object gets deallocated between the calls to decompress() and to flush(), zlib will then try to access this deallocated memory, and most likely return bogus output (or segfault). See the attached script for a demonstration. I see two potential solutions:

1. Set avail_in to zero in flush(), so that it does not try to use leftover data (or whatever else is where that data used to be).

2. Have decompress() check if there is leftover data, and if so, save a reference to the object until a) we consume the rest of the data in flush(), or b) discard it in a subsequent call to decompress().

Solution 2 would be less disruptive to code that depends on the existing behavior (in non-pathological cases), but I don't like the maintenance burden of adding yet another thing to keep track of in the decompressor state. The PyZlib_objdecompress function is complex enough as it is, and we can expect more bugs like this to creep in the more we cram additional logic into it. So I'm more in favor of solution 1. Any thoughts?
-- files: zlib_stale_ptr.py messages: 174853 nosy: nadeem.vawda, serhiy.storchaka priority: normal severity: normal stage: needs patch status: open title: zlib.Decompress.decompress() retains pointer to input buffer without acquiring reference to it type: behavior versions: Python 2.7, Python 3.2, Python 3.3, Python 3.4 Added file: http://bugs.python.org/file27889/zlib_stale_ptr.py ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue16411 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
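The attached zlib_stale_ptr.py is not reproduced in this digest, but the scenario it demonstrates follows roughly this shape (a sketch; on current, fixed versions the flush() call is safe and simply drains the retained input):

```python
import zlib

comp = zlib.compress(b'a' * 10000)

d = zlib.decompressobj()
first = d.decompress(comp, 100)  # stop early; zlib keeps a tail of input
del comp                         # drop our only reference to the input buffer
rest = d.flush()                 # buggy versions could read freed memory here

assert first == b'a' * 100
assert first + rest == b'a' * 10000
```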
[issue16350] zlib.Decompress.decompress() after EOF discards existing value of unused_data
Nadeem Vawda added the comment: flush() does not update unconsumed_tail and unused_data.

    >>> import zlib
    >>> x = zlib.compress(b'abcdefghijklmnopqrstuvwxyz') + b'0123456789'
    >>> dco = zlib.decompressobj()
    >>> dco.decompress(x, 1)
    b'a'
    >>> dco.flush()
    b'bcdefghijklmnopqrstuvwxyz'
    >>> dco.unconsumed_tail
    b'NIMK\xcf\xc8\xcc\xca\xce\xc9\xcd\xcb/(,*.)-+\xaf\xa8\xac\x02\x00\x90\x86\x0b 0123456789'
    >>> dco.unused_data
    b''

I see another bug here - described in issue 16411. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue16350 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue15955] gzip, bz2, lzma: add option to limit output size
Nadeem Vawda added the comment: I agree that being able to limit output size is useful and desirable, but I'm not keen on copying the max_length/unconsumed_tail approach used by zlib's decompressor class. It feels awkward to use, and it complicates the implementation of the existing decompress() method, which is already unwieldy enough. As an alternative, I propose a thin wrapper around the underlying C API: def decompress_into(self, src, dst, src_start=0, dst_start=0): ... This would store decompressed data in a caller-provided bytearray, and return a pair of integers indicating the end points of the consumed and produced data in the respective buffers. The implementation should be extremely simple - it does not need to do any memory allocation or reference management. I think it could also be useful for optimizing the implementation of BZ2File and LZMAFile. I plan to write a prototype and run some benchmarks some time in the next few weeks. (Aside: if implemented for zlib, this could also be a nicer (I think) solution for the problem raised in issue 5804.) -- stage: - needs patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue15955 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
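The proposed decompress_into() was never added to the stdlib, but its shape can be approximated in pure Python on top of zlib's existing max_length support. The function name, signature, and return convention below are guesses at the proposal, not a real API:

```python
import zlib

def decompress_into(d, src, dst, src_start=0, dst_start=0):
    """Decompress src[src_start:] into the bytearray dst at dst_start.

    Returns (src_end, dst_end): the end points of the consumed and
    produced data, as in the proposal.  Pure-Python sketch only.
    """
    room = len(dst) - dst_start
    out = d.decompress(src[src_start:], room)
    dst[dst_start:dst_start + len(out)] = out
    consumed = (len(src) - src_start) - len(d.unconsumed_tail)
    return src_start + consumed, dst_start + len(out)

data = b'spam and eggs ' * 500
comp = zlib.compress(data)

d = zlib.decompressobj()
buf = bytearray(100)
src_end, dst_end = decompress_into(d, comp, buf)
assert dst_end == 100           # output capped by the destination buffer
assert bytes(buf) == data[:100]
```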
[issue16316] Support xz compression in mimetypes module
Changes by Nadeem Vawda nadeem.va...@gmail.com: -- resolution: - fixed stage: patch review - committed/rejected status: open - closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue16316 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16350] zlib.Decompress.decompress() after EOF discards existing value of unused_data
New submission from Nadeem Vawda: From issue 5210, amaury.forgeotdarc wrote:

> Hm, I tried a modified version of your first test, and I found another problem with the current zlib library; starting with the input:
>
>     x = x1 + x2 + HAMLET_SCENE  # both compressed and uncompressed data
>
> The following scenario is OK:
>
>     dco.decompress(x)  # returns HAMLET_SCENE
>     dco.unused_data    # returns HAMLET_SCENE
>
> But this one:
>
>     for c in x:
>         dco.decompress(c)  # will return HAMLET_SCENE, in several pieces
>     dco.unused_data  # only one character, the last of (c in x)!
>
> This is a bug IMO: unused_data should accumulate all the extra uncompressed data.

Ideally, I would prefer to raise an EOFError if decompress() is called after end-of-stream is reached (for consistency with BZ2Decompressor). However, accumulating the data in unused_data is closer to being backward-compatible, so it's probably the better approach to take. -- components: Library (Lib) files: zlib_unused_data_test.py messages: 174056 nosy: amaury.forgeotdarc, nadeem.vawda priority: normal severity: normal stage: needs patch status: open title: zlib.Decompress.decompress() after EOF discards existing value of unused_data type: behavior versions: Python 2.7, Python 3.2, Python 3.3, Python 3.4 Added file: http://bugs.python.org/file27767/zlib_unused_data_test.py ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue16350 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
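On versions where this has since been fixed, the accumulating behavior the report asks for can be checked directly; a small sketch with arbitrary sample data:

```python
import zlib

comp = zlib.compress(b'payload')
stream = comp + b'EXTRA-DATA'

d = zlib.decompressobj()
out = bytearray()
for i in range(len(stream)):
    # Byte-at-a-time, as in the failing scenario in the report.
    out += d.decompress(stream[i:i + 1])

assert bytes(out) == b'payload'
assert d.unused_data == b'EXTRA-DATA'  # accumulated, not just the last byte
```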
[issue5210] zlib does not indicate end of compressed stream properly
Nadeem Vawda added the comment: This bug (zlib not providing a way to detect end-of-stream) has already been fixed - see issue 12646. I've opened issue 16350 for the unused_data problem. -- resolution: - out of date stage: test needed - committed/rejected status: open - closed superseder: - zlib.Decompress.decompress/flush do not raise any exceptions when given truncated input streams ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue5210 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16350] zlib.Decompress.decompress() after EOF discards existing value of unused_data
Nadeem Vawda added the comment: Interesting idea, but I'm not sure it would be worth the effort. It would make the code and API more complicated, so it wouldn't really help users, and would be an added maintenance burden. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue16350 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12692] test_urllib2net is triggering a ResourceWarning
Changes by Nadeem Vawda nadeem.va...@gmail.com: -- resolution: - fixed stage: needs patch - committed/rejected status: open - closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12692 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue5148] gzip.open breaks with 'U' flag
Nadeem Vawda added the comment: The data corruption issue is now fixed in the 2.7 branch. In 3.x, using a mode containing 'U' results in an exception rather than silent data corruption. Additionally, gzip.open() has supported text modes (rt/wt/at) and newline translation since 3.3 [issue 13989]. -- resolution: - fixed stage: patch review - committed/rejected status: open - closed versions: +Python 2.7 -Python 2.6 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue5148 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14398] bz2.BZ2DEcompressor.decompress fail on large files
Nadeem Vawda added the comment: I'm working on it now. Will push in the next 15 minutes or so. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14398 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14398] bz2.BZ2DEcompressor.decompress fail on large files
Nadeem Vawda added the comment: All fixed, along with some other similar but harder-to-trigger bugs. Thanks for the bug report, Laurent! -- resolution: - fixed stage: needs patch - committed/rejected status: open - closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14398 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10050] urllib.request still has old 2.x urllib primitives
Nadeem Vawda added the comment: Hmm, OK. URLopener and FancyURLopener do each issue a DeprecationWarning when used, though. If they are not actually deprecated, perhaps we should remove the warnings for the moment? -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10050 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14214] test_concurrent_futures hangs
Changes by Nadeem Vawda nadeem.va...@gmail.com: -- resolution: - works for me stage: needs patch - committed/rejected status: open - closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14214 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14120] ARM Ubuntu 3.x buildbot failing test_dbm
Nadeem Vawda added the comment: No sign of these failures any more; looks like that fixed it. -- resolution: - fixed stage: needs patch - committed/rejected status: open - closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14120 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14229] On KeyboardInterrupt, the exit code should mirror the signal number
Changes by Nadeem Vawda nadeem.va...@gmail.com: -- resolution: - rejected stage: - committed/rejected status: open - closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14229 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10050] urllib.request still has old 2.x urllib primitives
Nadeem Vawda added the comment: Are we still planning on removing URLopener and FancyURLopener in 3.4? The documentation for 3.3 does not list these classes as deprecated. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10050 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16034] bz2 module appears slower in Python 3.x versus Python 2.x
Nadeem Vawda added the comment: I've released v0.95 of bz2file, which incorporates all the optimizations discussed here. The performance should be similar to 2.x's bz2 in most cases. It is still a lot slower when calling read(10) or read(1), but I hope no-one is doing that anywhere where performance is important ;-) One other note: bz2file's readline() is faster when running on 3.x than on 2.x (and in some cases faster than the 2.x stdlib version). This is probably due to improvements made to io.BufferedIOBase.readline() since 2.7, but I haven't had a chance to investigate this. Let me know if you have any issues with the new release. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue16034 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16034] bz2 module appears slower in Python 3.x versus Python 2.x
Nadeem Vawda added the comment: Ah, nice - I didn't think of that optimization. Neater and faster. I've committed this patch [e6d872b61c57], along with a minor bugfix [7252f9f95fe6], and another optimization for readline()/readlines() [6d7bf512e0c3]. [merge with default: a19f47d380d2] If you're wondering why the Roundup Robot didn't update the issue automatically, it's because I made a typo in each of the commit messages. Apparently 16304 isn't the same as 16034. Who would have thought it? :P -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue16034 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16034] bz2 module appears slower in Python 3.x versus Python 2.x
Nadeem Vawda added the comment:

> Yes, of course.

Awesome. I plan to do a new release for this in the next couple of days.

> We can even speed up 1.5 times the reading of small chunks, if we inline _check_can_read() and _read_block().

Interesting idea, but I don't think it's worthwhile. It looks like this is only a noticeable improvement if size is 10 or 1, and I don't think these are common cases (especially not for users who care about performance). Also, I'm reluctant to have two copies of the code for _read_block(); it makes the code harder to read, and increases the chance of introducing a bug when changing the code.

> The same approach is applied for LZMAFile.

Of course. I'll apply these optimizations to LZMAFile next weekend. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue16034 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16034] bz2 module appears slower in Python 3.x versus Python 2.x
Nadeem Vawda added the comment:

> Recursive inline _check_can_read() will be enough. Now this check calls 4 Python functions (_check_can_read(), readable(), _check_non_closed(), closed). Recursive inlining only readable() in _check_can_read() is achieved significant but less (about 30%) effect.

I've inlined readable() into _check_can_read() [3.3: 4258248a44c7 | default: abb5c5bde872]. This seems like a good balance between maximizing our performance in edge cases and not turning the code into a mess in the process ;) Once again, thanks for your contributions! -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue16034 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16034] bz2 module appears slower in Python 3.x versus Python 2.x
Nadeem Vawda added the comment: Thanks for the bug report, Victor, and thank you Serhiy for the patch! Serhiy, would you be OK with me also including this patch in the bz2file package? -- resolution: - fixed stage: - committed/rejected status: open - closed versions: +Python 3.4 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue16034 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue15955] gzip, bz2, lzma: add method to get decompressed size
Nadeem Vawda added the comment: As far as I can tell, there is no way to find this out reliably without decompressing the entire file. With gzip, the file trailer contains the uncompressed size modulo 2^32, but this seems less than useful. It appears that the other two formats do not store the total uncompressed data size in any form. For bz2 and lzma, one can get the uncompressed size by doing f.seek(0, 2) followed by f.tell(). However this approach is ugly and potentially very slow, so I would be reluctant to add a method based on it to the (BZ2|LZMA)File classes. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue15955 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
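The seek-to-end approach described above can be sketched as follows; it works, but the emulated seek really does decompress the entire stream, which is why it was deemed too slow to bless as a method:

```python
import bz2
import io

data = b'x' * 123456
f = bz2.BZ2File(io.BytesIO(bz2.compress(data)))

f.seek(0, io.SEEK_END)  # emulated seek: reads and discards the whole stream
size = f.tell()
f.close()

assert size == len(data)
```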
[issue15666] PEP 3121, 384 refactoring applied to lzma module
Nadeem Vawda added the comment: Thanks for the patch. Unfortunately I don't have much free time at the moment, so it might be a few weeks before I get a chance to review it. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue15666 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue15664] test_curses not run with 'make test'
Changes by Nadeem Vawda nadeem.va...@gmail.com: -- superseder: - test_curses skipped on buildbots ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue15664 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12669] test_curses skipped on buildbots
Nadeem Vawda added the comment:

> Nadeem: is the failure you show in msg141798 with a version of test_curses that uses pty.openpty?

Yes, I tried the following change:

    --- a/Lib/test/test_curses.py
    +++ b/Lib/test/test_curses.py
    @@ -328,11 +328,12 @@
         curses.resetty()

     def test_main():
    -    if not sys.__stdout__.isatty():
    -        raise unittest.SkipTest("sys.__stdout__ is not a tty")
         # testing setupterm() inside initscr/endwin
         # causes terminal breakage
    -    curses.setupterm(fd=sys.__stdout__.fileno())
    +    #curses.setupterm(fd=sys.__stdout__.fileno())
    +    import pty
    +    _, pty = pty.openpty()
    +    curses.setupterm(fd=pty)
         try:
             stdscr = curses.initscr()
             main(stdscr)

(I've never used openpty, either in Python or in C, so I can't vouch for the correctness of this usage.)

> If it isn't: I'd expect more test failures on buildbot machines where the buildbot agent is started as a system daemon, in which case the process doesn't have a tty at all. Using pty.openpty it would be possible to ensure that there is a pty that can be used for the test.

Looking at the actual buildbot results, most of the *nix bots I checked are actually skipping this test; the only one I could find that wasn't is the x86 Ubuntu Shared bot:

http://buildbot.python.org/all/builders/x86%20Ubuntu%20Shared%203.x/builds/6640/steps/test/logs/stdio

So it looks like on most of the bots, buildbot is running without a tty. Then, test_main() sees that sys.__stdout__ isn't suitable to run the test, and bails out. It'd be great if you can come up with a fix that gets the test running in this environment, but it'll probably be more complicated than just slotting in a call to openpty(). -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12669 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12669] test_curses skipped on buildbots
Changes by Nadeem Vawda nadeem.va...@gmail.com: -- stage: - needs patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12669 ___
[issue15654] PEP 384 Refactoring applied to bz2 module
Changes by Nadeem Vawda nadeem.va...@gmail.com: -- nosy: +nadeem.vawda ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue15654 ___
[issue15546] Iteration breaks with bz2.open(filename,'rt')
Nadeem Vawda added the comment: Before these fixes, it looks like all three classes' peek() methods were susceptible to the same problem as read1(). The fixes for BZ2File.read1() and LZMAFile.read1() should have fixed peek() as well; both methods are implemented in terms of _fill_buffer(). For GzipFile, peek() is still potentially broken - I'll push a fix shortly. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue15546 ___
[issue15546] Iteration breaks with bz2.open(filename,'rt')
Nadeem Vawda added the comment: No, if _read() is called once the file is already at EOF, it raises an EOFError (http://hg.python.org/cpython/file/8c07ff7f882f/Lib/gzip.py#l433), which will then break out of the loop. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue15546 ___
[issue15546] Iteration breaks with bz2.open(filename,'rt')
Nadeem Vawda added the comment: OK, BZ2File should now be fixed. It looks like LZMAFile and GzipFile may be susceptible to the same problem; I'll push fixes for them shortly. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue15546 ___
[issue15546] Iteration breaks with bz2.open(filename,'rt')
Nadeem Vawda added the comment: Done. Thanks for the bug report, David. -- resolution: - fixed stage: - committed/rejected status: open - closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue15546 ___
[issue15546] Iteration breaks with bz2.open(filename,'rt')
Nadeem Vawda added the comment: I can't seem to reproduce this with an up-to-date checkout from Mercurial:

>>> import bz2
>>> g = bz2.open('access-log-0108.bz2', 'rt')
>>> next(g)
'140.180.132.213 - - [24/Feb/2008:00:08:59 -0600] GET /ply/ply.html HTTP/1.1 200 97238\n'

(where 'access-log-0108.bz2' is a file I created with the output above as its first line, and a couple of other lines of random junk following that) Would it be possible for you to upload the file you used to trigger this bug? -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue15546 ___
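A fully self-contained version of this reproduction attempt (using a generated file in a temporary directory instead of the reporter's log) looks like this:

```python
import bz2
import os
import tempfile

# Stand-in for the reporter's log file, written to a temp directory.
path = os.path.join(tempfile.mkdtemp(), 'sample.bz2')

# 'wt' compresses text written through a TextIOWrapper.
with bz2.open(path, 'wt') as f:
    f.write('first line\n')
    f.write('second line\n')

# Iterating over a file opened in 'rt' mode should yield str lines.
with bz2.open(path, 'rt') as g:
    first = next(g)
print(first)   # first line
```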
[issue15546] Iteration breaks with bz2.open(filename,'rt')
Nadeem Vawda added the comment: The cause of this problem is that BZ2File.read1() sometimes returns b'', even though the file is not at EOF. This happens when the underlying BZ2Decompressor cannot produce any decompressed data from just the block passed to it in _fill_buffer(); in this case, it needs to read more of the compressed stream to make progress. It would seem that BZ2File cannot satisfy the contract of the read1() method - we can't guarantee that a single call to the read() method of the underlying file will allow us to return a non-empty result, whereas returning b'' is reserved for the case where we have reached EOF. Simply removing the read1() method would trade this problem for a bigger one (resurrecting issue 10791), so I propose amending BZ2File.read1() to make as many reads from the underlying file as necessary to return a non-empty result. Antoine, what do you think of this? -- nosy: +pitrou ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue15546 ___
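The proposed fix amounts to a read loop of roughly the following shape. This is only a sketch of the idea (the helper name read_some is made up here; it is not the actual BZ2File code): keep feeding compressed chunks to the decompressor until it yields data or the input is exhausted, so that b'' is returned only at genuine EOF:

```python
import bz2
import io

def read_some(fp, decompressor, chunk_size=8192):
    """Return some decompressed bytes, reading from the raw file as
    many times as needed; b'' is returned only at end of input."""
    while True:
        chunk = fp.read(chunk_size)
        if not chunk:
            return b''          # genuine EOF on the underlying file
        data = decompressor.decompress(chunk)
        if data:                # a single raw read may yield no output;
            return data         # loop until we can make progress

compressed = bz2.compress(b'hello world')
fp = io.BytesIO(compressed)
print(read_some(fp, bz2.BZ2Decompressor()))   # b'hello world'
```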
[issue15405] Invitation to connect on LinkedIn
Changes by Nadeem Vawda nadeem.va...@gmail.com: -- resolution: - invalid stage: - committed/rejected status: open - closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue15405 ___
[issue15204] Deprecate the 'U' open mode
Nadeem Vawda nadeem.va...@gmail.com added the comment: +1 for the general idea of deprecating and eventually removing the 'U' modes. But I agree with David that it doesn't make sense to have separate steps for 3.5 and 3.6/4.0. If you make the code raise an exception when 'U' is used, how is that different from what will happen when you remove the code for processing it? Surely we want it to eventually be treated just like any other invalid mode string? -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue15204 ___
[issue13876] Sporadic failure in test_socket
Nadeem Vawda nadeem.va...@gmail.com added the comment: Merging nosy list from duplicate issue 15155. -- nosy: +giampaolo.rodola, neologix, pitrou ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue13876 ___
[issue12559] gzip.open() needs an optional encoding argument
Nadeem Vawda nadeem.va...@gmail.com added the comment: I already fixed this without knowing about this issue; see 55202ca694d7.

> storchaka: Why not use io.TextWrapper? I think it is the right answer for this issue.

The proposed patch (and the code I committed) *do* use TextIOWrapper. Unless you mean that callers should create the TextIOWrapper themselves. This is certainly possible, but quite inconvenient for something that is conceptually simple, and not difficult to implement.

> amaury.forgeotdarc: There remains a difference between open() and gzip.open(): open(filename, 'r', encoding=None) is a text file (with a default encoding), gzip.open() with the same arguments returns a binary file.

The committed code unfortunately still has gzip.open(filename, 'r') returning a binary file. This is something that cannot be fixed without breaking backward compatibility. However, it does provide a way to open a text file with the system's default encoding (encoding=None, or no encoding argument specified). To do this, you can use the 'rt'/'wt'/'at' modes, just like with builtins.open(). Of course, this also works if you do specify an encoding explicitly. -- resolution: - fixed stage: patch review - committed/rejected status: open - closed versions: +Python 3.3 -Python 3.4 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12559 ___
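To illustrate the behaviour described above, here is a short sketch (file name is made up; written to a temp directory) showing the 'wt'/'rt' text modes alongside the backward-compatible binary default:

```python
import gzip
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'greeting.gz')

# 'wt'/'rt' select text mode; an encoding may be given explicitly,
# or left as None to use the platform default, just as with open().
with gzip.open(path, 'wt', encoding='utf-8') as f:
    f.write('héllo\n')

with gzip.open(path, 'rt', encoding='utf-8') as f:
    text = f.read()
print(text)   # héllo

# Plain 'r' (the default) still returns a binary file, for
# backward compatibility with pre-3.3 gzip.open().
with gzip.open(path) as f:
    raw = f.read()
print(type(raw))   # <class 'bytes'>
```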
[issue10376] ZipFile unzip is unbuffered
Nadeem Vawda nadeem.va...@gmail.com added the comment: Patch looks fine to me. Antoine, can you commit this? I'm currently away from the computer that has my SSH key on it. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10376 ___
[issue14684] zlib set dictionary support inflateSetDictionary
Nadeem Vawda nadeem.va...@gmail.com added the comment:

> Just saw this on the checkins list; where are the other options documented?

They aren't, AFAIK. I've been planning on adding them when I've got time (based on the zlib manual at http://zlib.net/manual.html), but with the upcoming feature freeze for 3.3, this issue was higher priority. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14684 ___
[issue14684] zlib set dictionary support inflateSetDictionary
Nadeem Vawda nadeem.va...@gmail.com added the comment: Committed. Once again, thanks for the patch! -- resolution: - fixed stage: patch review - committed/rejected status: open - closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14684 ___
[issue14684] zlib set dictionary support inflateSetDictionary
Nadeem Vawda nadeem.va...@gmail.com added the comment:

> To restate my position: the need is for an immutable string of bytes, [...]

I disagree that we should require the dictionary to be immutable - if the caller wishes to use a mutable buffer here, it is their responsibility to ensure that it is not modified until the compressor is finished with it (consenting adults and all that). The documentation can inform users of this requirement.

> I believe the argument for aesthetics does not apply, as the constant dictionary constructor argument is a morally different kind of parameter, comparable to (say) the compression level.

Even so, the surrounding code sets a precedent for how it accepts binary data buffers, and deviating from this existing convention should not be taken lightly. Nitpicking about the API aside, thanks for the patch :-) -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14684 ___
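The feature under discussion landed as the zdict argument to zlib.compressobj() and zlib.decompressobj(). A short usage sketch (the sample dictionary and message are made up) showing that the same dictionary must be supplied on both sides:

```python
import zlib

# A preset dictionary: byte sequences expected to recur in the input.
# For short messages sharing this vocabulary, it can shrink the output.
zdict = b'the quick brown fox jumped over the lazy dog'

comp = zlib.compressobj(zdict=zdict)
data = comp.compress(b'the lazy dog jumped') + comp.flush()

# Decompression needs the identical dictionary, or it will fail.
decomp = zlib.decompressobj(zdict=zdict)
print(decomp.decompress(data))   # b'the lazy dog jumped'
```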
[issue14684] zlib set dictionary support inflateSetDictionary
Nadeem Vawda nadeem.va...@gmail.com added the comment: I plan to commit it (along with the buffer API changes) tomorrow. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14684 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue15087] Add gzip function to read gzip'd strings
Nadeem Vawda nadeem.va...@gmail.com added the comment: There is already such a function, gzip.decompress() - it was added in 3.2. -- nosy: +nadeem.vawda resolution: - invalid stage: - committed/rejected status: open - pending ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue15087 ___
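For reference, a quick round-trip through the one-shot module-level helpers (gzip.compress() and gzip.decompress(), both available since 3.2):

```python
import gzip

blob = gzip.compress(b'some payload')   # one-shot, in-memory compression
print(gzip.decompress(blob))            # b'some payload'
```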