Barry wrote in news:83dc485a-5a20-403b-99ee-c8c627bdbab3@m21g2000vbr.googlegroups.com in gmane.comp.python.general:

> Hi,
>
> The code below is giving me the error:
>
> Traceback (most recent call last):
>   File "C:\Users\Administratör\Desktop\test.py", line 4, in <module>
> UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1:
> unexpected code byte
>
>
> What am i doing wrong?

It may not be you: en.wiktionary.org is sending gzip-encoded content back
(0x8b at position 1 is the second byte of the gzip magic number, 0x1f 0x8b).
It seems to do this even if you set the Accept header, as in:

    request.add_header( "Accept", "text/html" )

But maybe I'm not doing it correctly.

    #encoding: utf-8
    import gzip
    import urllib.request
    from io import BytesIO

    request = urllib.request.Request(
        url='http://en.wiktionary.org/wiki/baby',
        headers={'User-Agent': 'Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11'}
    )

    response = urllib.request.urlopen(request)
    info = response.info()

    # Content-Encoding is None when the server sends the body uncompressed,
    # so print it as a separate argument rather than concatenating.
    enc = info[ 'Content-Encoding' ]
    print( "Encoding:", enc )

    # Wrap the raw bytes in a file-like object and gunzip only if needed.
    buf = BytesIO( response.read() )
    if enc == 'gzip':
        buf = gzip.GzipFile( mode = 'rb', fileobj = buf )

    html = buf.read().decode('utf-8')
    print( html.encode( "ascii", "backslashreplace" ) )

Rob.
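
P.S. A slightly more defensive variant of the approach above, as a rough sketch only. The Accept header negotiates the media type; content codings such as gzip are negotiated with Accept-Encoding, so the sketch sends "Accept-Encoding: identity" to ask for an uncompressed body. A server may still ignore that, so the body is also checked for the gzip magic bytes before decoding. The names decode_body and fetch_text are made up for illustration, and falling back to UTF-8 when no charset is declared is an assumption.

```python
import gzip
import urllib.request


def decode_body(raw, content_encoding=None, charset="utf-8"):
    """Return the body as text, gunzipping first if it is gzip-encoded."""
    # Trust the Content-Encoding header when present; otherwise sniff the
    # gzip magic number (0x1f 0x8b) at the start of the payload.
    declared = (content_encoding or "").strip().lower()
    if declared in ("gzip", "x-gzip") or raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    return raw.decode(charset)


def fetch_text(url, user_agent="Mozilla/5.0"):
    """Fetch url and return its body as text (hypothetical helper)."""
    request = urllib.request.Request(
        url,
        headers={
            "User-Agent": user_agent,
            # Ask for an uncompressed body; the server may not honor this.
            "Accept-Encoding": "identity",
        },
    )
    with urllib.request.urlopen(request) as response:
        enc = response.headers.get("Content-Encoding")
        # Assumption: fall back to UTF-8 if the server declares no charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return decode_body(response.read(), enc, charset)
```

Calling fetch_text('http://en.wiktionary.org/wiki/baby') should then give a str whether or not the server compresses the response; decode_body can also be used on its own with bytes you already have.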
-- http://mail.python.org/mailman/listinfo/python-list