Nadeem Vawda <nadeem.va...@gmail.com> added the comment:

Here is a quick-and-dirty reimplementation of BZ2File in Python, on top of the 
existing C implementation of BZ2Compressor and BZ2Decompressor.

There are a couple of issues with this code that need to be fixed:
* BZ2Decompressor doesn't signal when it reaches the EOS marker, so it doesn't 
seem possible to detect a premature end-of-file. This was easy in the C 
implementation, which called BZ2_bzDecompress() directly.
* The read*() methods are implemented very inefficiently. Since they have to 
deal with the bytes objects returned by BZ2Decompressor.decompress(), a large 
read results in lots of allocations that weren't necessary in the C 
implementation.
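
To make the two issues above concrete, here is a minimal sketch. It assumes 
a Python version (3.3 or later) where BZ2Decompressor exposes an `eof` 
attribute, which was not available when this comment was written. The helper 
name decompress_all() and the chunk_size parameter are hypothetical, not part 
of the patch. Chunks are collected in a list and joined once, which avoids 
some of the repeated bytes allocations described in the second point.

```python
import bz2


def decompress_all(data, chunk_size=4096):
    """Decompress one complete bz2 stream held in memory.

    Hypothetical helper for illustration only. Raises EOFError if the
    input ends before the end-of-stream (EOS) marker is seen.
    """
    decomp = bz2.BZ2Decompressor()
    pieces = []  # join once at the end instead of concatenating bytes

    for start in range(0, len(data), chunk_size):
        pieces.append(decomp.decompress(data[start:start + chunk_size]))
        if decomp.eof:  # assumption: attribute exists (Python 3.3+)
            break

    if not decomp.eof:
        raise EOFError("compressed stream ended before the EOS marker")
    return b"".join(pieces)
```

Feeding this a stream with its last few bytes cut off should raise EOFError, 
while a complete stream round-trips to the original data.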

I hope to resolve both of these issues (and do a general code cleanup) by 
writing a C extension module that provides a thin wrapper around 
BZ2_bzCompress()/BZ2_bzDecompress(), and reimplementing the module's public 
interface in Python on top of it. This should reduce the size of the code by 
close to half, and make it easier to read and maintain. I'm not sure when 
I'll be able to get around to it, though, so I thought I should post what 
I've done so far.
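
As a rough illustration of the layering described here, the sketch below 
puts a small file-like writer in Python on top of a low-level compressor 
object. It is not code from the patch: the class name SketchBZ2Writer is 
hypothetical, and the existing bz2.BZ2Compressor stands in for the proposed 
thin C wrapper.

```python
import bz2
import io


class SketchBZ2Writer:
    """Hypothetical write-only file object layered on a compressor."""

    def __init__(self, fileobj, compresslevel=9):
        self._fp = fileobj
        # Stand-in for the thin C wrapper proposed above.
        self._comp = bz2.BZ2Compressor(compresslevel)

    def write(self, data):
        # The compressor may buffer internally and return b"" here.
        self._fp.write(self._comp.compress(data))
        # Report the number of uncompressed bytes accepted.
        return len(data)

    def close(self):
        # flush() emits any buffered data plus the end-of-stream marker.
        self._fp.write(self._comp.flush())


buf = io.BytesIO()
writer = SketchBZ2Writer(buf)
written = writer.write(b"spam and eggs")
writer.close()
```

All of the file-object logic lives in Python, and only the calls to 
compress() and flush() cross into C.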

Other changes in the patch:
* write(), writelines() and seek() now return meaningful values instead of 
None, in line with the behaviour of other file-like objects.
* Fixed a typo in test_bz2's testReadChunk10() that caused the test to pass 
regardless of whether the data read was correct (self.assertEqual(text, text) 
-> self.assertEqual(text, self.TEXT)). This one might be worth committing now, 
since it isn't dependent on the rewrite.
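
The effect of the test typo can be reproduced in isolation. The sketch below 
is not the real test_bz2 code: the class TautologyDemo, its methods and the 
deliberately wrong data are all hypothetical. The methods avoid the test_ 
prefix so that test runners do not collect them automatically.

```python
import unittest


class TautologyDemo(unittest.TestCase):
    TEXT = b"expected contents"

    def read_back(self):
        # Simulate a read that returns the wrong data.
        return b"corrupted contents"

    def check_tautological(self):
        text = self.read_back()
        self.assertEqual(text, text)  # compares a value with itself

    def check_fixed(self):
        text = self.read_back()
        self.assertEqual(text, self.TEXT)  # compares against the reference


def passes(method_name):
    """Run one method of TautologyDemo and report whether it succeeded."""
    result = unittest.TestResult()
    TautologyDemo(method_name).run(result)
    return result.wasSuccessful()
```

Here passes("check_tautological") reports success even though the data is 
wrong, while passes("check_fixed") reports the failure.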

----------
Added file: http://bugs.python.org/file20521/bz2module-v2.diff

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue5863>
_______________________________________