[issue5863] bz2.BZ2File should accept other file-like objects. (issue4274045)

2011-04-07 Thread Jesús Cea Avión

Changes by Jesús Cea Avión j...@jcea.es:


--
nosy: +jcea

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue5863
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue5863] bz2.BZ2File should accept other file-like objects. (issue4274045)

2011-04-03 Thread Roundup Robot

Roundup Robot devnull@devnull added the comment:

New changeset 2cb07a46f4b5 by Antoine Pitrou in branch 'default':
Issue #5863: Rewrite BZ2File in pure Python, and allow it to accept
http://hg.python.org/cpython/rev/2cb07a46f4b5
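
As an illustration of what this change enables, here is a minimal sketch, assuming Python 3.3 or later as released, where BZ2File accepts an already-open file object in place of a file name. The payload and variable names are made up for the example.

```python
import bz2
import io

payload = b"spam and eggs\n" * 100

# Compress into an in-memory file-like object instead of a named file.
raw = io.BytesIO()
with bz2.BZ2File(raw, "w") as f:
    f.write(payload)

# BZ2File does not close a file object it did not open itself,
# so the BytesIO buffer can be rewound and read back.
raw.seek(0)
with bz2.BZ2File(raw, "r") as f:
    restored = f.read()
```

The same pattern should apply to any object with the usual binary read() or write() methods, such as a socket file or an HTTP response body.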

--
nosy: +python-dev




[issue5863] bz2.BZ2File should accept other file-like objects. (issue4274045)

2011-04-03 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

Thank you very much, Nadeem. The patch is now in.

--
resolution:  -> fixed
stage: patch review -> committed/rejected
status: open -> closed




[issue5863] bz2.BZ2File should accept other file-like objects. (issue4274045)

2011-04-03 Thread Oliver Deppert

Oliver Deppert oliver.depp...@stud.tu-darmstadt.de added the comment:

Hi,

thanks for the patch. Could you also publish a version for older Python 2.x?

regards,
Olli

--
nosy: +Kontr-Olli




[issue5863] bz2.BZ2File should accept other file-like objects. (issue4274045)

2011-04-03 Thread Éric Araujo

Éric Araujo mer...@netwok.org added the comment:

As a new feature, this can’t go into older versions.

--




[issue5863] bz2.BZ2File should accept other file-like objects. (issue4274045)

2011-04-02 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

Here is an updated patch that adds read1() to BZ2File. This should fix things
for issue10791 from the bz2 side. I also took the opportunity to clean up
_read_block() to be more readable. As per Martin's suggestion on python-dev, I
put the copyright notice in the patch header, rather than in the code itself.
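
A small sketch of how read1() can be used, assuming the BZ2File.read1() method as it exists in Python 3.3 and later. The data and the size of 16 are arbitrary choices for the example.

```python
import bz2
import io

original = b"0123456789" * 1000
compressed = io.BytesIO(bz2.compress(original))

with bz2.BZ2File(compressed) as f:
    # read1() returns at most the requested number of bytes and tries
    # to avoid making multiple reads from the underlying stream.
    first = f.read1(16)
    rest = f.read()
```

Because read1() may return fewer bytes than requested, callers that need an exact amount should loop or use read() instead.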

--
Added file: http://bugs.python.org/file21502/bz2-v5.diff




[issue5863] bz2.BZ2File should accept other file-like objects. (issue4274045)

2011-04-02 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

Updated documentation.

--
Added file: http://bugs.python.org/file21503/bz2-v5-doc.diff




[issue5863] bz2.BZ2File should accept other file-like objects. (issue4274045)

2011-04-02 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

> Here is an updated patch that adds read1() to BZ2File. This should fix things
> for issue10791 from the bz2 side. I also took the opportunity to clean up
> _read_block() to be more readable. As per Martin's suggestion on python-dev, I
> put the copyright notice in the patch header, rather than in the code itself.

Thank you! A couple of comments:
- please avoid C++-style comments (// ...), some compilers don't like
them
- BZ2Decompressor.eof would be better as a T_BOOL than a T_INT, IMO
- BZ2Decompressor.eof should be documented as new in 3.3
- instead of using PyObject_GetBuffer(), I think it's better to call
PyArg_ParseTuple with the y* typecode: it makes sure it does the right
thing
- instead of int(size), use size = size.__index__() so as to forbid
floats

--




[issue5863] bz2.BZ2File should accept other file-like objects. (issue4274045)

2011-04-02 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

Thanks for the review. I've made most of the changes you suggested, but there's
one thing I wanted to check about:

> - instead of int(size), use size = size.__index__() so as to forbid floats

The tests for readline() and readlines() expect a TypeError if size is None.
Calling size.__index__() in this case raises an AttributeError instead. Should I
change the tests to expect an AttributeError? Alternatively, something like this
would more closely match the behaviour of the old code:

    try:
        size = size.__index__()
    except AttributeError:
        raise TypeError("Integer argument expected")

--




[issue5863] bz2.BZ2File should accept other file-like objects. (issue4274045)

2011-04-02 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

> The tests for readline() and readlines() expect a TypeError if size is None.
> Calling size.__index__() in this case raises an AttributeError instead. Should I
> change the tests to expect an AttributeError? Alternatively, something like this
> would more closely match the behaviour of the old code:
>
>     try:
>         size = size.__index__()
>     except AttributeError:
>         raise TypeError("Integer argument expected")

Ah, you're right, TypeError should be raised.
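
The behaviour agreed on here can be sketched as a standalone helper. The function name coerce_size is hypothetical and is not taken from the patch; it only demonstrates that floats and None are both rejected with TypeError.

```python
def coerce_size(size):
    """Return size as an int; reject floats and None with TypeError."""
    try:
        # Integer-like objects define __index__(); floats and None do not.
        return size.__index__()
    except AttributeError:
        raise TypeError("Integer argument expected")


accepted = coerce_size(10)

rejected = []
for bad in (2.5, None):
    try:
        coerce_size(bad)
    except TypeError:
        rejected.append(bad)
```

The standard library's operator.index() performs a similar check and raises TypeError on its own, so it may be a reasonable alternative to calling __index__() directly.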

--




[issue5863] bz2.BZ2File should accept other file-like objects. (issue4274045)

2011-04-02 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

Here's the updated patch.

--
Added file: http://bugs.python.org/file21507/bz2-v6.diff




[issue5863] bz2.BZ2File should accept other file-like objects. (issue4274045)

2011-04-02 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

... and the corresponding updated documentation patch.

--
Added file: http://bugs.python.org/file21508/bz2-v6-doc.diff




[issue5863] bz2.BZ2File should accept other file-like objects. (issue4274045)

2011-03-30 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

Nadeem,

> Can I ask that you not commit this
> patch until the CA has been submitted? I will need to clear it with my
> employer, and it might complicate things if the code in question has
> already been committed.

Apparently the PSF has received your contributor agreement. Does it mean the 
situation is cleared? I plan to do a review of your latest patch.

--




[issue5863] bz2.BZ2File should accept other file-like objects. (issue4274045)

2011-03-30 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

> Apparently the PSF has received your contributor agreement.

Great; I was just about to send them an email to check.

> Does it mean the situation is cleared? I plan to do a review of your latest
> patch.

Yes, everything's sorted out. Go ahead :)

--




[issue5863] bz2.BZ2File should accept other file-like objects. (issue4274045)

2011-03-24 Thread Andrew Svetlov

Changes by Andrew Svetlov andrew.svet...@gmail.com:


--
nosy: +asvetlov




[issue5863] bz2.BZ2File should accept other file-like objects. (issue4274045)

2011-03-21 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

From the discussion on python-dev, it seems that I will need to submit a
Contributor Agreement to the PSF. Can I ask that you not commit this
patch until the CA has been submitted? I will need to clear it with my
employer, and it might complicate things if the code in question has
already been committed.

--




[issue5863] bz2.BZ2File should accept other file-like objects. (issue4274045)

2011-03-21 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

> From the discussion on python-dev, it seems that I will need to submit a
> Contributor Agreement to the PSF. Can I ask that you not commit this
> patch until the CA has been submitted? I will need to clear it with my
> employer, and it might complicate things if the code in question has
> already been committed.

Ok, I was planning to do another review anyway.

--




[issue5863] bz2.BZ2File should accept other file-like objects. (issue4274045)

2011-03-20 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

Here is an updated patch, incorporating the feedback from your review.

The new patch no longer checks for errors in bz2CompressEnd()/bz2DecompressEnd()
in the dealloc functions for BZ2Compressor/BZ2Decompressor. I found that calling
PyErr_WriteUnraisable() results in spurious error messages if an exception is
raised by the init function, and in any case, the output would not be of much
use if a genuine error were to occur.

The patch adds implementations of most of the io.BufferedIOBase methods
(everything except detach(), read1() and truncate()), and includes unit tests
for fileno() and readinto().
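
For illustration, a minimal sketch of readinto() on a BZ2File, assuming Python 3.3 or later. The four-byte buffer and the sample data are arbitrary.

```python
import bz2
import io

source = io.BytesIO(bz2.compress(b"abcdefgh"))
buf = bytearray(4)

with bz2.BZ2File(source) as f:
    # readinto() fills the caller's buffer with decompressed bytes
    # and returns how many bytes were written into it.
    n = f.readinto(buf)
```

fileno() is not exercised here because an in-memory io.BytesIO has no file descriptor; with a real file on disk it would be expected to return the descriptor of the underlying file.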

--
Added file: http://bugs.python.org/file21314/bz2-v4.diff




[issue5863] bz2.BZ2File should accept other file-like objects. (issue4274045)

2011-03-20 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

Corresponding patch for the module docs.

--
Added file: http://bugs.python.org/file21315/bz2-v4-doc.diff




[issue5863] bz2.BZ2File should accept other file-like objects. (issue4274045)

2011-03-19 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

Given the absence of response on python-dev, I'd say simply remove the obsolete 
copyright notice.

--




[issue5863] bz2.BZ2File should accept other file-like objects. (issue4274045)

2011-03-14 Thread Michiel de Hoon

Michiel de Hoon mdeh...@users.sourceforge.net added the comment:

Would it be possible to add an open() function to the bz2 module? Currently 
gzip has such a function, but bz2 does not:

>>> import gzip
>>> gzip.open
<function open at 0x781f0>
>>> import bz2
>>> bz2.open
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'module' object has no attribute 'open'


--
nosy: +mdehoon




[issue5863] bz2.BZ2File should accept other file-like objects. (issue4274045)

2011-03-14 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

> Would it be possible to add an open() function to the bz2 module?
> Currently gzip has such a function, but bz2 does not:

Well, it could be a topic for a separate issue.

--




[issue5863] bz2.BZ2File should accept other file-like objects. (issue4274045)

2011-03-14 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

> Would it be possible to add an open() function to the bz2 module?

Yes, it would be quite trivial, though I don't think it would be worthwhile -
all it would do is provide a direct alias for the BZ2File constructor. But as
Antoine said, that is a topic for a separate issue.
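
To make the "direct alias" idea concrete, here is a hedged sketch of such a function. The name open_bz2 is hypothetical and chosen to avoid shadowing the builtin; it only forwards its arguments to the BZ2File constructor, assuming Python 3.3 or later.

```python
import bz2
import io


def open_bz2(filename, mode="r", compresslevel=9):
    """Hypothetical convenience wrapper that forwards to BZ2File."""
    return bz2.BZ2File(filename, mode, compresslevel=compresslevel)


# Round trip through an in-memory buffer.
raw = io.BytesIO()
with open_bz2(raw, "w") as f:
    f.write(b"hello bz2")
raw.seek(0)
with open_bz2(raw) as f:
    data = f.read()
```

Later Python releases do ship a bz2.open() function, which also supports text modes, so this wrapper is only an illustration of the simplest form discussed here.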

@Antoine:

Regarding the use of PY_SSIZE_T_CLEAN, I assume that Py_ssize_t is to be
preferred over plain ssize_t. Is this correct?

Also, I was wondering whether I need to add some sort of license boilerplate to
the beginning of bz2.py? With _bz2module.c, I presume I should retain the
copyright information from the old bz2module.c. Would something like this be ok?

   /* _bz2 - Low-level Python interface to libbzip2.
    *
    * Copyright (c) 2011  Nadeem Vawda nadeem.va...@gmail.com
    *
    * Based on bz2module.c:
    *
    * Copyright (c) 2002  Gustavo Niemeyer nieme...@conectiva.com
    * Copyright (c) 2002  Python Software Foundation; All Rights Reserved
    */

(Browsing through the source files in Lib/ and Modules/, there doesn't seem to
be a clear convention for this sort of thing...)

--




[issue5863] bz2.BZ2File should accept other file-like objects. (issue4274045)

2011-03-14 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

> Regarding the use of PY_SSIZE_T_CLEAN, I assume that Py_ssize_t is to be
> preferred over plain ssize_t. Is this correct?

Yes, ssize_t doesn't exist everywhere AFAIK.
(size_t does, or at least we assume it does)

> Also, I was wondering whether I need to add some sort of license boilerplate to
> the beginning of bz2.py? With _bz2module.c, I presume I should retain the
> copyright information from the old bz2module.c. Would something like this be ok?

Well, I would personally advocate not re-adding a license boilerplate,
since it doesn't serve a purpose (nearly all of Python is freely usable
under the PSF License, and the authors are documented by version
control).
You could ask on python-dev to get other opinions, though.

--




[issue5863] bz2.BZ2File should accept other file-like objects. (issue4274045)

2011-03-14 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

> Well, I would personally advocate not re-adding a license boilerplate,
> since it doesn't serve a purpose (nearly all of Python is freely usable
> under the PSF License, and the authors are documented by version control).

That sounds sensible to me. I'll see what the rest of python-dev thinks, though.

--




[issue5863] bz2.BZ2File should accept other file-like objects.

2011-03-13 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

Review posted at http://codereview.appspot.com/4274045/

--




[issue5863] bz2.BZ2File should accept other file-like objects. (issue4274045)

2011-03-13 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

Reviewers: nadeem vawda nadeem.vawda_gmail.com,

http://codereview.appspot.com/4274045/diff/1/Lib/bz2.py
File Lib/bz2.py (right):

http://codereview.appspot.com/4274045/diff/1/Lib/bz2.py#newcode25
Lib/bz2.py:25: class BZ2File:
Is there any reason it doesn't inherit io.BufferedIOBase?
(it should also bring you a couple of methods implemented for free:
readlines, writelines, __iter__, __next__, __enter__, __exit__)

You should probably also implement fileno() (simply return
self.fp.fileno()) and the `closed` property.
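
The point about inherited methods can be sketched with a deliberately tiny class. TinyBZ2Reader is hypothetical and is not the BZ2File from the patch: it decompresses everything up front for brevity, and defines only readable(), read() and fileno(), leaving iteration and context management to io.BufferedIOBase.

```python
import bz2
import io


class TinyBZ2Reader(io.BufferedIOBase):
    """Hypothetical read-only reader used only to show inherited methods."""

    def __init__(self, fp):
        self._fp = fp
        # Simplification: decompress the whole stream eagerly.
        self._data = io.BytesIO(bz2.decompress(fp.read()))

    def readable(self):
        return True

    def read(self, size=-1):
        return self._data.read(size)

    def fileno(self):
        # Delegate to the wrapped object, as suggested in the review.
        return self._fp.fileno()


source = io.BytesIO(bz2.compress(b"first\nsecond\n"))
with TinyBZ2Reader(source) as reader:  # __enter__/__exit__ are inherited
    lines = list(reader)               # __iter__/__next__ are inherited
```

The `closed` property also comes from the base class here, because the inherited close() is what records the closed state.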

http://codereview.appspot.com/4274045/diff/1/Lib/bz2.py#newcode386
Lib/bz2.py:386: class BZ2Compressor:
I don't think there's a point in a Python wrapper, since the wrapper is
so trivial. Just do the lock operations in C.
Same for BZ2Decompressor.

http://codereview.appspot.com/4274045/diff/1/Modules/_bz2module.c
File Modules/_bz2module.c (left):

http://codereview.appspot.com/4274045/diff/1/Modules/_bz2module.c#oldcode123
Modules/_bz2module.c:123: #ifdef WITH_THREAD
As mentioned in Lib/bz2.py, I would keep the lock on the C side since it
isn't significantly more complicated, and it avoids having to write a
Python wrapper around the compressor and decompressor types.

http://codereview.appspot.com/4274045/diff/1/Modules/_bz2module.c
File Modules/_bz2module.c (right):

http://codereview.appspot.com/4274045/diff/1/Modules/_bz2module.c#newcode3
Modules/_bz2module.c:3: #include "Python.h"
Since this is a new start, perhaps we should add
   #define PY_SSIZE_T_CLEAN
before including Python.h?
This will ensure no code in the module will rely on the old behaviour.

http://codereview.appspot.com/4274045/diff/1/Modules/_bz2module.c#newcode48
Modules/_bz2module.c:48: "libbzip2 was not compiled correctly");
Just a nit, but I'm not sure there's any point in renaming the bz2
library to libbzip2?
(also, under Windows I'm not sure the library naming convention is the
same as under Unix)

http://codereview.appspot.com/4274045/diff/1/Modules/_bz2module.c#newcode78
Modules/_bz2module.c:78: "Unrecognized error from libbzip2: %d",
bzerror);
Out of curiosity, did you encounter this condition?

http://codereview.appspot.com/4274045/diff/1/Modules/_bz2module.c#newcode122
Modules/_bz2module.c:122: c->bzs.avail_out = PyBytes_GET_SIZE(result);
Do note that avail_in and avail_out are 32-bit ints, and therefore this
is not 64-bit clean. I guess you're just copying the old code here, but
that would deserve a separate patch later. Perhaps add a comment in the
meantime.

http://codereview.appspot.com/4274045/diff/1/Modules/_bz2module.c#newcode209
Modules/_bz2module.c:209: "Provide a block of data to the compressor."},
You could instead re-use the old, more precise docstrings. Also, using
PyDoc_STRVAR is preferred so as to make it easier to modify multi-line
docstrings.

http://codereview.appspot.com/4274045/diff/1/setup.py
File setup.py (right):

http://codereview.appspot.com/4274045/diff/1/setup.py#newcode1236
setup.py:1236: exts.append( Extension('_bz2', ['_bz2module.c'],
The Windows build files probably need updating as well. Can you do it?
Otherwise I'll have a try.

Please review this at http://codereview.appspot.com/4274045/

Affected files:
   A Lib/bz2.py
   M Lib/test/test_bz2.py
   M Modules/_bz2module.c
   M setup.py

--
title: bz2.BZ2File should accept other file-like objects. -> bz2.BZ2File should accept other file-like objects. (issue4274045)




[issue5863] bz2.BZ2File should accept other file-like objects. (issue4274045)

2011-03-13 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

Thanks for the review. I'll try and have an updated patch ready by next weekend.

Regarding your comments:

> Is there any reason it doesn't inherit io.BufferedIOBase?
No, there isn't; I'll fix that in my revised patch.

> Since this is a new start, perhaps we should add
>    #define PY_SSIZE_T_CLEAN
> before including Python.h?
Sounds like a good idea.

> Just a nit, but I'm not sure there's any point in renaming the bz2 library
> to libbzip2?
> (also, under Windows I'm not sure the library naming convention is the same
> as under Unix)
Well, the official name for the library is libbzip2 (bzip.org). I thought that
the lib prefix would make it clearer that the error is referring to that
library and not _bz2module.c. But if you think it would be better not to make
this change, I'll leave it out.

> Modules/_bz2module.c:78: "Unrecognized error from libbzip2: %d", bzerror);
> Out of curiosity, did you encounter this condition?
No, I was just programming defensively (in case the underlying library adds
more error codes in future). Unlikely, but I would think it's better than
taking the risk of silently ignoring an error.

> Do note that avail_in and avail_out are 32-bit ints, and therefore this is
> not 64-bit clean. I guess you're just copying the old code here, but that
> would deserve a separate patch later. Perhaps add a comment in the meantime.
Good catch. I'll make a note of it. This would only be a problem for avail_in,
though. The output buffer never grows by more than BIGCHUNK (512KiB) at a time
(see grow_buffer()) so there is no risk of overflowing in avail_out.
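
The concern above is about C-level fields inside the extension module. As a loosely related illustration only, one general way for a caller to bound how much input is handed over per call is to feed the compressor in fixed-size chunks. The function name and the default chunk size below are hypothetical, and this is not a description of what _bz2module.c does internally.

```python
import bz2


def compress_in_chunks(data, chunk_size=1 << 20):
    """Compress data by feeding the compressor bounded slices."""
    comp = bz2.BZ2Compressor()
    view = memoryview(data)  # slicing a memoryview avoids copying
    pieces = []
    for start in range(0, len(view), chunk_size):
        pieces.append(comp.compress(view[start:start + chunk_size]))
    pieces.append(comp.flush())  # emit whatever is still buffered
    return b"".join(pieces)


sample = bytes(range(256)) * 4096
compressed = compress_in_chunks(sample, chunk_size=64 * 1024)
```

The output is a single ordinary bz2 stream, so it can be read back with bz2.decompress() regardless of the chunk size used.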

> The Windows build files probably need updating as well. Can you do it?
> Otherwise I'll have a try.
I'll give it a try, and let you know if I can't get it to work.

--




[issue5863] bz2.BZ2File should accept other file-like objects.

2011-03-11 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

Patch posted for review at http://codereview.appspot.com/4274045/. Still have 
to do a review though :)

--




[issue5863] bz2.BZ2File should accept other file-like objects.

2011-02-08 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

Here's a revised version of bz2-v3.diff, with docstrings that are more 
consistent with the updated documentation.

--




[issue5863] bz2.BZ2File should accept other file-like objects.

2011-02-08 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

Weird, the patch didn't upload...

--
Added file: http://bugs.python.org/file20721/bz2-v3b.diff




[issue5863] bz2.BZ2File should accept other file-like objects.

2011-02-05 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

Here's an update to the documentation for the bz2 module.

--
Added file: http://bugs.python.org/file20692/bz2-doc.diff




[issue5863] bz2.BZ2File should accept other file-like objects.

2011-01-30 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

OK, I've rewritten the whole bz2 module (patch attached), and I think it is now 
ready for review. The BZ2File implementation is a cleaned-up version of the one 
from my previous patch, with some further additions. I've factored out the 
common compressor/decompressor stuff into classes Compressor and Decompressor 
in the _bz2 extension module; with these, BZ2Compressor, BZ2Decompressor, 
compress() and decompress() are trivial to implement in Python.
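
A rough sketch of why the one-shot functions become trivial once incremental objects exist. It uses the public bz2.BZ2Compressor and bz2.BZ2Decompressor rather than the internal _bz2 classes mentioned above, and the function names are hypothetical. The decompression helper handles a single stream only.

```python
import bz2


def compress_oneshot(data, compresslevel=9):
    """Compress data in one call using an incremental compressor."""
    comp = bz2.BZ2Compressor(compresslevel)
    return comp.compress(data) + comp.flush()


def decompress_oneshot(data):
    """Decompress a single complete bz2 stream."""
    decomp = bz2.BZ2Decompressor()
    return decomp.decompress(data)


message = b"the quick brown fox " * 50
packed = compress_oneshot(message)
unpacked = decompress_oneshot(packed)
```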

My earlier efficiency concerns seem to have been unfounded; I ran some quick 
tests with a 4MB bz2 file, and there wasn't any measurable performance 
difference from the existing all-C implementation.

I have added a peek() method to BZ2File, in accordance with Antoine's 
suggestion, but it's not clear how it should interpret its argument. I followed 
the lead of io.BufferedReader, and simply ignored the arg, returning whatever 
data is already buffered. The patch also includes tests for peek() in 
test_bz2, based on test_io's BufferedRWPairTest.

Also, while looking at io.BufferedReader's implementation, I noticed that it 
doesn't actually seem to use raw.peek() at all. If this is correct, then 
perhaps peek() is unnecessary, and shouldn't be added.

The patch also adds a property 'eof' to BZ2Decompressor, so that the user can 
test whether EOF has been reached on the compressed stream.
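
A short sketch of how the eof property might be used, assuming the BZ2Decompressor.eof attribute as documented for Python 3.3 and later. The 64-byte chunk size is arbitrary.

```python
import bz2

original = b"example data " * 500
compressed = bz2.compress(original)

decomp = bz2.BZ2Decompressor()
chunks = []
for start in range(0, len(compressed), 64):
    chunks.append(decomp.decompress(compressed[start:start + 64]))
    if decomp.eof:
        # End of the compressed stream; feeding more data would
        # raise EOFError.
        break
result = b"".join(chunks)
```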

For the new files (Modules/_bz2module.c and Lib/bz2.py), I'm guessing there 
should be some license boilerplate stuff added at the top of each. I wasn't 
sure exactly what this should look like, though - some advice would be helpful 
here.

--
Added file: http://bugs.python.org/file20621/bz2-v3.diff




[issue5863] bz2.BZ2File should accept other file-like objects.

2011-01-26 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

> * The read*() methods are implemented very inefficiently. Since they
> have to deal with the bytes objects returned by
> BZ2Decompressor.decompress(), a large read results in lots of
> allocations that weren't necessary in the C implementation.

It probably depends on the buffer size. Trying to fix this /might/ be
premature optimization.

Also, as with GzipFile one goal should be for BZFile to be wrappable in
a io.BufferedReader, which has its own very fast buffering layer (and
also a fast readline() if you implement peek() in BZFile).
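
The wrapping described here can be sketched as follows, assuming Python 3.3 or later, where BZ2File provides readable() and readinto(). The sample lines are made up.

```python
import bz2
import io

compressed = io.BytesIO(bz2.compress(b"line one\nline two\n"))

# BufferedReader adds its own buffering layer on top of the
# decompressing file object.
reader = io.BufferedReader(bz2.BZ2File(compressed))
first_line = reader.readline()
remaining = reader.read()
reader.close()
```

The same approach is commonly used with io.TextIOWrapper when decoded text rather than bytes is wanted.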

> * Fixed a typo in test_bz2's testReadChunk10() that caused the test to
> pass regardless of whether the data read was correct
> (self.assertEqual(text, text) -> self.assertEqual(text, self.TEXT)).
> This one might be worth committing now, since it isn't dependent on
> the rewrite.

Ah, thank you. Will take a look.
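
The kind of typo described above is easy to reproduce. In this hedged sketch, ReadChunkDemo and its read_data() helper are hypothetical stand-ins for the real test; the helper deliberately returns wrong data so that the difference between the two assertions is visible.

```python
import unittest


class ReadChunkDemo(unittest.TestCase):
    TEXT = b"expected payload"

    def read_data(self):
        # Deliberately wrong, standing in for a buggy read.
        return b"corrupted payload"

    def test_vacuous(self):
        text = self.read_data()
        # Compares a value with itself, so it can never fail.
        self.assertEqual(text, text)

    def test_meaningful(self):
        text = self.read_data()
        # Comparing against the reference data detects the problem.
        with self.assertRaises(AssertionError):
            self.assertEqual(text, self.TEXT)
```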

--




[issue5863] bz2.BZ2File should accept other file-like objects.

2011-01-26 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

>> * The read*() methods are implemented very inefficiently. Since they
>> have to deal with the bytes objects returned by
>> BZ2Decompressor.decompress(), a large read results in lots of
>> allocations that weren't necessary in the C implementation.
>
> It probably depends on the buffer size. Trying to fix this /might/ be
> premature optimization.

Actually, looking at the code again (and not being half-asleep this time), I
think readline() and readlines() are fine. My worry is about read(), where the
problem isn't the size of the buffer but rather the fact that every byte that is
read gets copied around more than necessary:
* Read into the readahead buffer in _fill_readahead().
* Copy into 'data' in _read_block()
* Copy into newly-allocated bytes object for read()'s return value

But you're right; this is probably premature optimization. I'll do some proper
performance measurements before I jump into rewriting. In the meanwhile, FWIW,
I noticed that with the Python implementation, test_bz2 took 20% longer than
with my C implementation (~1.5s up from ~1.25s). I don't think this is a very
reliable indicator of real-world performance, though.

> Also, as with GzipFile one goal should be for BZFile to be wrappable in
> a io.BufferedReader, which has its own very fast buffering layer (and
> also a fast readline() if you implement peek() in BZFile).

Ah, OK. I suppose that is a sensible way of using it. peek() will be quite easy
to implement. How should it interpret its argument, though? PEP 3116 (New I/O)
makes no mention of the function. BufferedReader appears to ignore it and
return however much data is convenient.
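
The observation about BufferedReader.peek() can be checked with a few lines. This sketch uses a plain io.BytesIO as the raw stream; how many bytes peek() returns depends on what happens to be buffered, so the code does not rely on an exact length.

```python
import io

reader = io.BufferedReader(io.BytesIO(b"hello world"))

# The argument is only a hint: the result may be longer or shorter
# than requested, and the read position does not move.
peeked = reader.peek(1)
after = reader.read(5)
```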

--




[issue5863] bz2.BZ2File should accept other file-like objects.

2011-01-25 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

Interesting! If you are motivated, a further approach would be to expose the 
compressor and decompressor objects from the C extension, and write the file 
object in Python (as in Lib/gzip.py).

> One thing I was unsure of is how to handle exceptions that occur in
> BZ2File_dealloc(). Does the error status need to be cleared before it
> returns?

Yes, it should. Actually, it would be better to write out the exception using 
PyErr_WriteUnraisable().

> On a related note, the 'buffering' argument to __init__() is ignored,
> and I was wondering whether this should be documented explicitly?

Yes, it should probably be deprecated if it's not useful anymore.

By the way, the current patch produces reference leaks:

$ ./python -m test -R 3:2 test_bz2
[1/1] test_bz2
beginning 5 repetitions
12345
.
test_bz2 leaked [44, 44] references, sum=88

--
stage: needs patch -> patch review




[issue5863] bz2.BZ2File should accept other file-like objects.

2011-01-25 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

> Interesting! If you are motivated, a further approach would be to expose
> the compressor and decompressor objects from the C extension, and write
> the file object in Python (as in Lib/gzip.py).
I had initially considered doing something that, but I decided not to for 
reasons that I can't quite remember. However, in hindsight it seems like it 
would have been a better approach than doing everything in C. I'll start on it 
ASAP.

> > On a related note, the 'buffering' argument to __init__() is ignored,
> > and I was wondering whether this should be documented explicitly?
> Yes, it should probably be deprecated if it's not useful anymore.
How would I go about doing this? Would it be sufficient to raise a 
DeprecationWarning if the argument is provided by the caller, and add a note to 
the docstring and documentation?

--




[issue5863] bz2.BZ2File should accept other file-like objects.

2011-01-25 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

* I had initially considered doing something *like* that

--




[issue5863] bz2.BZ2File should accept other file-like objects.

2011-01-25 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

> How would I go about doing this? Would it be sufficient to raise a
> DeprecationWarning if the argument is provided by the caller, and add
> a note to the docstring and documentation?

Yes, totally.
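
The deprecation agreed on above might look roughly like the following. BZ2FileSketch is a hypothetical stand-in class used only to show the warning logic, and the warning text is an assumption rather than the wording of the actual patch.

```python
import warnings


class BZ2FileSketch:
    """Hypothetical stand-in used only to illustrate the deprecation."""

    def __init__(self, filename, mode="r", buffering=None):
        # Warn only when the caller explicitly passes the ignored argument.
        if buffering is not None:
            warnings.warn(
                "the 'buffering' argument is ignored and deprecated",
                DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller
            )
        self.filename = filename
        self.mode = mode


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    BZ2FileSketch("example.bz2", buffering=4096)  # warns
    BZ2FileSketch("example.bz2")                  # silent
```

Using None as the default makes it possible to tell "argument omitted" apart from any explicit value.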

--




[issue5863] bz2.BZ2File should accept other file-like objects.

2011-01-25 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

Here is a quick-and-dirty reimplementation of BZ2File in Python, on top of the 
existing C implementation of BZ2Compressor and BZ2Decompressor.

There are a couple of issues with this code that need to be fixed:
* BZ2Decompressor doesn't signal when it reaches the EOS marker, so it doesn't 
seem possible to detect a premature end-of-file. This was easy in the C 
implementation, which used bzDecompress() directly.
* The read*() methods are implemented very inefficiently. Since they have to 
deal with the bytes objects returned by BZ2Decompressor.decompress(), a large 
read results in lots of allocations that weren't necessary in the C 
implementation.

I hope to resolve both of these issues (and do a general code cleanup), by 
writing a C extension module that provides a thin wrapper around 
bzCompress()/bzDecompress(), and reimplementing the module's public interface 
in Python on top of it. This should reduce the size of the code by close to 
half, and make it easier to read and maintain. I'm not sure when I'll be able 
to get around to it, though, so I thought I should post what I've done so far.
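
On the first issue: in Python 3.3 and later, BZ2Decompressor exposes an eof attribute that is true once the end-of-stream marker has been seen. With that, a premature end-of-file can be detected as in the sketch below. The helper name decompress_checked and the sample data are made up for illustration.

```python
import bz2


def decompress_checked(data):
    """Decompress one bz2 stream, raising if it ends prematurely."""
    decomp = bz2.BZ2Decompressor()
    result = decomp.decompress(data)
    if not decomp.eof:
        # All input was consumed but no end-of-stream marker was found.
        raise EOFError("compressed stream ended before the EOS marker")
    return result


complete = bz2.compress(b"hello world" * 100)
truncated = complete[:-10]  # cut off the tail of the stream
```

Calling decompress_checked(complete) returns the original bytes, while decompress_checked(truncated) raises EOFError.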

Other changes in the patch:
* write(), writelines() and seek() now return meaningful values instead of 
None, in line with the behaviour of other file-like objects.
* Fixed a typo in test_bz2's testReadChunk10() that caused the test to pass 
regardless of whether the data read was correct (self.assertEqual(text, text) 
-> self.assertEqual(text, self.TEXT)). This one might be worth committing now, 
since it isn't dependent on the rewrite.

--
Added file: http://bugs.python.org/file20521/bz2module-v2.diff




[issue5863] bz2.BZ2File should accept other file-like objects.

2011-01-24 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

Here is a patch that rewrites BZ2File to implement the requested feature, and 
adds some tests using BytesIO objects.
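
As an illustration of the requested feature, here is a round trip through in-memory buffers. It assumes Python 3.3 or later, where BZ2File accepts an existing file object in place of a filename; the payload is arbitrary.

```python
import bz2
import io

# Compress into an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
with bz2.BZ2File(buffer, "w") as writer:
    writer.write(b"spam and eggs\n")

# BZ2File leaves a caller-supplied file object open, so the buffer
# contents are still available here.
compressed = buffer.getvalue()

# Read the data back from another in-memory buffer.
with bz2.BZ2File(io.BytesIO(compressed)) as reader:
    data = reader.read()
```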

Some notes:
* Iteration and the read*() methods now use the same buffering machinery, so 
they can be mixed freely. The test for issue8397 has been updated accordingly.
* readlines() now respects its size argument. The existing implementation 
appears to effectively ignore it.
* writelines() no longer uses the (deprecated) old buffer protocol, and is now 
much simpler.
* Currently, calling next() on a writable BZ2File results in a rather unhelpful 
error message; the patched version checks that the file is readable before 
trying to actually read.
* The docstrings have been rewritten to clarify that all of the methods deal 
with bytes and not text strings.

One thing I was unsure of is how to handle exceptions that occur in 
BZ2File_dealloc(). Does the error status need to be cleared before it returns?

The documentation for the bz2 module appears to be quite out of date; I will 
upload a patch in the next day or so.

On a related note, the 'buffering' argument to __init__() is ignored, and I was 
wondering whether this should be documented explicitly? The current 
documentation claims that it allows the caller to specify a buffer size, or 
request unbuffered I/O.

--
keywords: +patch
Added file: http://bugs.python.org/file20510/bz2module-v1.diff




[issue5863] bz2.BZ2File should accept other file-like objects.

2011-01-24 Thread Anthony Long

Anthony Long antl...@gmail.com added the comment:

Are there tests for this?

--
nosy: +antlong




[issue5863] bz2.BZ2File should accept other file-like objects.

2011-01-24 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

Yes, see bz2module-v1.diff.

--




[issue5863] bz2.BZ2File should accept other file-like objects.

2011-01-21 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

I have been working on a patch for this issue. I've implemented everything 
except for readline(), readlines() and the iterator protocol.

In the existing implementation, the reading methods seem to interact weirdly - 
iternext() uses a readahead buffer, while none of the other methods do. Does 
anyone know if there's a reason for this? I was planning on having all the 
reading methods use a common buffer, which should allow free mixing of read 
methods and iteration.

Looking at issue8397, I'm guessing it would be fine, but I wanted to 
double-check in case there's a quirk of the iteration protocol that I've 
overlooked, or something like that.
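
The kind of mixing in question looks like this. The sketch assumes a BZ2File in which all reading methods share one buffer, as in Python 3.3 and later; the sample lines are made up.

```python
import bz2
import io

payload = bz2.compress(b"first\nsecond\nthird\n")

with bz2.BZ2File(io.BytesIO(payload)) as f:
    head = f.readline()  # explicit read method
    rest = list(f)       # iteration continues where readline() stopped
```

With separate buffers for iteration and the read methods, switching between the two could lose or reorder data; with a common buffer the order of calls does not matter.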

--




[issue5863] bz2.BZ2File should accept other file-like objects.

2011-01-13 Thread wrobell

wrobell wrob...@pld-linux.org added the comment:

A use case

wget -O - http://planet.openstreetmap.org/planet-110112.osm.bz2 | tee 
planet.bz2 | osm2sql | psql osm

planet-*osm.bz2 files are 14GB at the moment. It would be great to read them 
from stdin while downloading from a server and uploading to a database at the 
same time.

Of course, you can insert bzip2 -d into the pipe... but then why bother 
with the bz2 module in Python? ;)
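
Until BZ2File accepts file objects, a pipeline stage like this can be approximated with the incremental decompressor. The sketch below is illustrative only: the function name iter_decompressed and the chunk size are assumptions, the eof check needs Python 3.3 or later, and a BytesIO stands in for sys.stdin.buffer.

```python
import bz2
import io


def iter_decompressed(fileobj, chunk_size=64 * 1024):
    """Yield decompressed blocks from a non-seekable binary stream."""
    decomp = bz2.BZ2Decompressor()
    while not decomp.eof:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break  # input ended (possibly before the end of the stream)
        data = decomp.decompress(chunk)
        if data:
            yield data


# In a real pipeline the stream would be sys.stdin.buffer.
stream = io.BytesIO(bz2.compress(b"<osm/>\n" * 1000))
total = sum(len(block) for block in iter_decompressed(stream))
```

This handles a single compressed stream and yields raw blocks rather than lines, which is part of why a proper file-like BZ2File is the more convenient interface.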

--
nosy: +wrobell




[issue5863] bz2.BZ2File should accept other file-like objects.

2011-01-13 Thread Éric Araujo

Éric Araujo mer...@netwok.org added the comment:

We’ve already agreed the feature is desirable; what’s missing is a patch, not 
user stories :)

--
nosy: +niemeyer
versions: +Python 3.3 -Python 3.2




[issue5863] bz2.BZ2File should accept other file-like objects.

2011-01-13 Thread wrobell

wrobell wrob...@pld-linux.org added the comment:

OK! :)

--




[issue5863] bz2.BZ2File should accept other file-like objects.

2010-12-14 Thread Xuanji Li

Xuanji Li xua...@gmail.com added the comment:

Sorry, I'm giving up.

The copyright notice for bz2module.c lists Gustavo Niemeyer as one of the 
holders, is he the maintainer? Maybe he should be notified of this bug.

--




[issue5863] bz2.BZ2File should accept other file-like objects.

2010-12-14 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

> Sorry, I'm giving up.

Indeed, I think only an extensive rewrite could fulfill the feature
request here.

> The copyright notice for bz2module.c lists Gustavo Niemeyer as one
> of the holders, is he the maintainer? Maybe he should be notified of
> this bug.

He hasn't been active for years, so I don't think he can still be
considered the maintainer.

--




[issue5863] bz2.BZ2File should accept other file-like objects.

2010-12-04 Thread Xuanji Li

Xuanji Li xua...@gmail.com added the comment:

I'll try working on a patch.

--
nosy: +xuanji




[issue5863] bz2.BZ2File should accept other file-like objects.

2010-11-29 Thread MizardX

MizardX miza...@gmail.com added the comment:

Would if I could. But, No.

--




[issue5863] bz2.BZ2File should accept other file-like objects.

2010-11-29 Thread Éric Araujo

Éric Araujo mer...@netwok.org added the comment:

That’s a perfectly fine reply.  Someone will see this feature request and 
propose a patch eventually.  Another way to help is to write tests, since those 
are in Python.

--




[issue5863] bz2.BZ2File should accept other file-like objects.

2010-11-29 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

For the record, this will need a comprehensive rewrite of bz2module, since it 
uses FILE pointers right now.

--
nosy: +pitrou




[issue5863] bz2.BZ2File should accept other file-like objects.

2010-11-29 Thread Raymond Hettinger

Raymond Hettinger rhettin...@users.sourceforge.net added the comment:

Without a patch and compelling use cases, this has no chance.  Recommend 
closing.

--
nosy: +rhettinger




[issue5863] bz2.BZ2File should accept other file-like objects.

2010-11-29 Thread Nadeem Vawda

Changes by Nadeem Vawda nadeem.va...@gmail.com:


--
nosy: +nvawda




[issue5863] bz2.BZ2File should accept other file-like objects.

2010-11-27 Thread Éric Araujo

Éric Araujo mer...@netwok.org added the comment:

Do you want to work on a patch?

--
components: +Extension Modules -IO, Library (Lib)
nosy: +eric.araujo
stage: unit test needed -> needs patch




[issue5863] bz2.BZ2File should accept other file-like objects.

2010-07-10 Thread Mark Lawrence

Changes by Mark Lawrence breamore...@yahoo.co.uk:


--
stage:  -> unit test needed
versions: +Python 3.2




[issue5863] bz2.BZ2File should accept other file-like objects.

2009-04-27 Thread MizardX

New submission from MizardX miza...@gmail.com:

bz2.BZ2File should, like gzip.GzipFile, accept a fileobj argument.

If implemented, you could much more easily pipe BZ2-data from other 
sources, such as stdin or a socket.
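
For comparison, this is the existing gzip.GzipFile behaviour the request refers to. The fileobj parameter is part of the real GzipFile API; the payload and buffers are arbitrary examples.

```python
import gzip
import io

# GzipFile can write to any file-like object passed as fileobj.
buffer = io.BytesIO()
with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
    gz.write(b"payload")

# It can read from any file-like object in the same way.
with gzip.GzipFile(fileobj=io.BytesIO(buffer.getvalue())) as gz:
    data = gz.read()
```

The same pattern with a socket's makefile() object or sys.stdin.buffer is what makes piping compressed data convenient.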

--
components: IO, Library (Lib)
messages: 86716
nosy: MizardX
severity: normal
status: open
title: bz2.BZ2File should accept other file-like objects.
type: feature request
