On Thu, Jan 5, 2012 at 4:52 PM, random joe <pywi...@gmail.com> wrote:
> Sure. Take the most recent file as example. "2012 - January.txt.gz".
> If you use the python doc example this is the result. If i use "r" or
> "rb" the result is the same.
>
>>>> import gzip
>>>> f1 = gzip.open('C:\\2012-January.txt.gz', 'rb')
>>>> data = f1.read()
>>>> data[:100]
> '\x1f\x8b\x08\x08x\n\x05O\x02\xff/srv/mailman/archives/private/python-list/2012-January.txt\x00\xec\xbdy\x7f\xdb\xc6\xb50\xfcw\xf0)\xa6z|+\xaa!!l\xdc\x14[\x8b-;V\xe2-\x92\x12'
>>>> f2 = gzip.open('C:\\2012-January.txt.gz', 'r')
>>>> data = f2.read()
>>>> data[:100]
> '\x1f\x8b\x08\x08x\n\x05O\x02\xff/srv/mailman/archives/private/python-list/2012-January.txt\x00\xec\xbdy\x7f\xdb\xc6\xb50\xfcw\xf0)\xa6z|+\xaa!!l\xdc\x14[\x8b-;V\xe2-\x92\x12'
>
> The docs and google provide no clear answer. I even tried 7zip and
> ended up with nothing but gibberish characters. There must be levels
> of compression or something. Why could they not simply use the tar
> format? Is there anywhere else one can download the archives?
Interesting. I tried this on a Linux system using both gunzip and your code, and both extracted that file correctly. When I ran your code on a Windows system, I got the same result you did.

This appears to be a bug in the gzip module under Windows. I suspect there is something peculiar about the archive files that the module does not handle correctly. If I gunzip the file locally and then gzip it again before opening it in Python, everything seems to work.

-- 
http://mail.python.org/mailman/listinfo/python-list
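If you would rather check the file from Python than re-gzip it by hand, here is a minimal sketch. It rests on an assumption I have not confirmed: that the "peculiar" part is an extra layer of gzip compression. The quoted output does start with '\x1f\x8b', the two-byte gzip magic number, followed by what looks like a second gzip header naming /srv/mailman/archives/private/python-list/2012-January.txt. The helper name read_gzip_layers and the max_layers limit are my own, not part of the gzip module, and the code is for Python 3 (gzip.decompress does not exist in 2.x).

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)

def read_gzip_layers(path, max_layers=5):
    """Hypothetical helper: read a .gz file and keep decompressing while
    the result still starts with the gzip magic number.

    Assumes the archive may have been gzip-compressed more than once.
    Returns (data, number_of_layers_removed).
    """
    with gzip.open(path, "rb") as f:
        data = f.read()
    layers = 1
    # Each extra layer looks like a fresh gzip stream, so strip it too,
    # with a cap so odd input cannot loop indefinitely.
    while data[:2] == GZIP_MAGIC and layers < max_layers:
        data = gzip.decompress(data)
        layers += 1
    return data, layers
```

Usage would be something like `data, layers = read_gzip_layers(r'C:\2012-January.txt.gz')`; if layers comes back as 2, the file was compressed twice rather than being unreadable.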