On 06/01/2012 00:10, Ian Kelly wrote:
On Thu, Jan 5, 2012 at 4:52 PM, random joe <pywi...@gmail.com> wrote:
 Sure. Take the most recent file as an example, "2012 - January.txt.gz".
 If you use the Python doc example, this is the result. If I use "r" or
 "rb" the result is the same.

 import gzip
 f1 = gzip.open('C:\\2012-January.txt.gz', 'rb')
 data = f1.read()
 data[:100]
 '\x1f\x8b\x08\x08x\n\x05O\x02\xff/srv/mailman/archives/private/python-
 list/2012-January.txt\x00\xec\xbdy\x7f\xdb\xc6\xb50\xfcw\xf0)\xa6z|+
 \xaa!!l\xdc\x14[\x8b-;V\xe2-\x92\x12'
 f2 = gzip.open('C:\\2012-January.txt.gz', 'r')
 data = f2.read()
 data[:100]
 '\x1f\x8b\x08\x08x\n\x05O\x02\xff/srv/mailman/archives/private/python-
 list/2012-January.txt\x00\xec\xbdy\x7f\xdb\xc6\xb50\xfcw\xf0)\xa6z|+
 \xaa!!l\xdc\x14[\x8b-;V\xe2-\x92\x12'

 The docs and Google provide no clear answer. I even tried 7zip and
 ended up with nothing but gibberish characters. There must be levels
 of compression or something. Why could they not simply use the tar
 format? Is there anywhere else one can download the archives?

Interesting.  I tried this on a Linux system using both gunzip and
your code, and both worked fine to extract that file.  I also tried
your code on a Windows system, and I get the same result that you do.
This appears to be a bug in the gzip module under Windows.

I think there may be something peculiar about the archive files that
the module is not handling correctly.  If I gunzip the file locally
and then gzip it again before trying to open it in Python, then
everything seems to be fine.

I've found that if I gunzip it twice (gunzip it and then gunzip the
result) using the gzip module I get the text file.
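
For anyone wanting to try the double gunzip, here is a minimal sketch. It assumes a Python version that has gzip.decompress and gzip.compress (3.2 or later), and it uses a small in-memory stand-in rather than the real archive; the helper name gunzip_all and the max_layers limit are my own, not part of the gzip module. It relies on the fact that gzip data starts with the bytes \x1f\x8b, which is also what the output quoted above starts with after the first pass.

```python
import gzip

# Every gzip stream starts with these two magic bytes.
GZIP_MAGIC = b"\x1f\x8b"


def gunzip_all(data, max_layers=5):
    """Strip gzip layers until the payload no longer looks like gzip.

    Returns the final bytes and the number of layers removed.
    max_layers is an arbitrary safety limit (hypothetical, for illustration).
    """
    layers = 0
    while data[:2] == GZIP_MAGIC and layers < max_layers:
        data = gzip.decompress(data)
        layers += 1
    return data, layers


# Stand-in for the archive: plain text that has been gzipped twice.
original = b"From: someone@example.com\nSubject: test\n"
double = gzip.compress(gzip.compress(original))

# A single pass still yields gzip data, as in the session quoted above.
once = gzip.decompress(double)

text, layers = gunzip_all(double)
```

With the real file you would read the bytes with open(path, 'rb').read() and pass them to gunzip_all. Checking the magic bytes is only a heuristic: a plain file that happens to begin with \x1f\x8b would be treated as gzip, and gzip.decompress would then raise an error.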
--
http://mail.python.org/mailman/listinfo/python-list
