Peter Pei wrote:
> I am trying to read a web page and save it in a .html file. The problem is
> that the web page is GB-2312 encoded, and I want to save it to the file with
> the same encoding or unicode. I have some code like this:
>
> url = 'http://blah/'
> headers = { 'User-Agent' : 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)' }
>
> req = urllib2.Request(url, None, headers)
> page = urllib2.urlopen(req).read()
>
> file = open('btchina.html','wb')
> file.write(page.encode('gb-2312'))
> file.close()
>
> It is obviously not working, and I am hoping someone can help me.

.read() returns the bytes exactly as it downloads them; it doesn't interpret them. If those bytes are GB-2312-encoded text, that's what they are, so there's no need to re-encode them. In Python 2, calling .encode() on a byte string first decodes it implicitly as ASCII, which fails on the non-ASCII bytes in the page. Just .write(page) (of course, this way you don't verify that the bytes really are valid GB-2312).

(BTW, don't use 'file' as a variable name. It shadows the built-in 'file' type, the type of the objects that open() returns.)

--
http://mail.python.org/mailman/listinfo/python-list
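
For illustration, here is a minimal sketch of the advice above. It is written for Python 3, not the Python 2 / urllib2 of the original code, and it skips the download entirely: the helper name save_page and the sample bytes are made up for this example, standing in for whatever urlopen(req).read() returned. It writes the bytes unchanged and, if asked, first checks that they decode as the expected encoding.

```python
import os
import tempfile


def save_page(page, path, encoding=None):
    """Write downloaded bytes to path unchanged (hypothetical helper).

    If encoding is given, first check that the bytes decode cleanly;
    bytes.decode raises UnicodeDecodeError if they do not.
    """
    if encoding is not None:
        page.decode(encoding)  # verification only; the result is discarded
    with open(path, 'wb') as f:  # 'f', not 'file'
        f.write(page)  # no re-encoding: the bytes are already encoded text


# Stand-in for urlopen(req).read(): made-up GB2312-encoded sample bytes.
# 'gb2312' is the name Python's codec registry uses for this encoding.
page = '<html>\u4e2d\u6587</html>'.encode('gb2312')

path = os.path.join(tempfile.mkdtemp(), 'btchina.html')
save_page(page, path, encoding='gb2312')

# Read the file back as raw bytes to compare with what was "downloaded".
with open(path, 'rb') as f:
    saved = f.read()
```

Leaving encoding as None gives the plain .write(page) behaviour with no check at all.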