On Tue, 16 Sep 2008 21:58:31 -0300, Sam <[EMAIL PROTECTED]> wrote:

Gabriel, et al.

It's hard to find a web site that uses deflate these days.

Luckily, slashdot to the rescue.

I even wrote a test script.

If someone can tell me what's wrong that would be great.

Here's what I get when I run it:
Data is compressed using deflate.  Length is:   107160
Traceback (most recent call last):
  File "my_deflate_test.py", line 19, in <module>
    data = zlib.decompress(data)
zlib.error: Error -3 while decompressing data: incorrect header check

And that's true. The slashdot server is sending bogus data:

py> import socket
py> s = socket.socket()
py> s.connect(('slashdot.org',80))
py> s.sendall("GET / HTTP/1.1\r\nHost: slashdot.org\r\nAccept-Encoding: deflate\r\n\r\n")
py> s.recv(500)
'HTTP/1.1 200 OK\r\nDate: Thu, 18 Sep 2008 20:48:34 GMT\r\nServer: Apache/1.3.41 (Unix) mod_perl/1.31-rc4\r\nSLASH_LOG_DATA: shtml\r\nX-Powered-By: Slash 2.0050 01220\r\nX-Bender: Alright! Closure!\r\nCache-Control: private\r\nPragma: private\r\nConnection: close\r\nContent-Type: text/html; charset=iso-8859-1\r\nVary: Accept-Encoding, User-Agent\r\nContent-Encoding: deflate\r\nTransfer-Encoding: chunked\r\n\r\n1c76\r\n\x02\x00\x00\x00\xff\xff\x00\xc1\x0f>\xf0<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"\n "http://www.w3.org/TR/html4/str...'

Note those 11 bytes starting with "\x02\x00\x00\x00\xff..." followed by the page contents in plain text. According to RFC 2616 (HTTP 1.1), the deflate content coding consists of the "zlib" format defined in RFC 1950 in combination with the "deflate" compression mechanism described in RFC 1951. RFC 1950 says that the lower 4 bits of the first byte in a zlib stream represent the compression method; the only compression method defined is "deflate", with value 8. The first byte of the slashdot data is \x02, so its lower 4 bits give a 2 instead, and the data is not a valid zlib stream.
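
To make that header check concrete, here is a minimal sketch in Python. The function name zlib_header_problem is made up for this example, and it only inspects the first two bytes (CMF and FLG in RFC 1950 terms), not the rest of the stream. The sample prefix is copied from the response shown above.

```python
import zlib

def zlib_header_problem(data):
    """Return None if data starts with a plausible RFC 1950 header,
    otherwise a short string saying what is wrong.
    (Hypothetical helper, for illustration only.)"""
    if len(data) < 2:
        return "too short for a zlib header"
    # bytearray gives integer byte values on both Python 2 and 3
    cmf, flg = bytearray(data[:2])
    method = cmf & 0x0F          # lower 4 bits of the first byte: CM
    if method != 8:
        return "compression method is %d, expected 8 (deflate)" % method
    if (cmf >> 4) > 7:           # upper 4 bits: CINFO, window size
        return "window size field is too large"
    if (cmf * 256 + flg) % 31 != 0:
        return "header checksum (FCHECK) failed"
    return None

# The 11 bytes that precede the plain-text page in the dump above
slashdot_prefix = b"\x02\x00\x00\x00\xff\xff\x00\xc1\x0f>\xf0"
print(zlib_header_problem(slashdot_prefix))
# -> compression method is 2, expected 8 (deflate)

# A stream produced by zlib itself passes the check
print(zlib_header_problem(zlib.compress(b"hello")))
# -> None
```

This is the same test that makes zlib.decompress() report "incorrect header check".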

#!/usr/bin/env python

import urllib2
import zlib

opener = urllib2.build_opener()
opener.addheaders = [('Accept-encoding', 'deflate')]

stream = opener.open('http://www.slashdot.org')
data = stream.read()
encoded = stream.headers.get('Content-Encoding')

if encoded == 'deflate':
    print "Data is compressed using deflate.  Length is:  ", str(len(data))
    data = zlib.decompress(data)
    print "After uncompressing, length is: ", str(len(data))
else:
    print "Data is not deflated."

The code is correct - try with another server. I tested it with a LightHTTPd server and it worked fine.
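
If switching servers is not an option, a workaround that is sometimes used is to retry the data as a raw RFC 1951 deflate stream, that is, one without the RFC 1950 zlib wrapper. What follows is only a sketch: decode_deflate is a hypothetical helper, and whether slashdot's response really is raw deflate is an assumption that has not been checked here against the full response. Passing a negative wbits value tells zlib not to expect the zlib header and trailer.

```python
import zlib

def decode_deflate(data):
    """Decode a 'Content-Encoding: deflate' body.
    Tries the zlib format (RFC 1950) first, as RFC 2616 requires,
    then falls back to raw deflate (RFC 1951) with no wrapper.
    (Hypothetical helper, for illustration only.)"""
    try:
        return zlib.decompress(data)
    except zlib.error:
        # Negative wbits: raw deflate stream, no header or checksum
        return zlib.decompress(data, -zlib.MAX_WBITS)

# Build a raw deflate stream locally to exercise the fallback path
raw = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
payload = raw.compress(b"hello world") + raw.flush()

print(decode_deflate(payload) == b"hello world")
print(decode_deflate(zlib.compress(b"hello world")) == b"hello world")
```

Note that the fallback gives up the Adler-32 check that the zlib format provides, so corrupted data is less likely to be detected.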

--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list
