Re: save gb-2312 web page in a .html file

Martin v. Löwis Wed, 26 Dec 2007 15:41:25 -0800

> .read() returns the bytes exactly how it downloads them. It doesn't
> interpret them. If those bytes are GB-2312-encoded text, that's what
> they are. There's no need to reencode them. Just .write(page) (of
> course, this way you don't verify that it's correct).


Alternatively, if the page is *not* gb-2312, you must first *decode*
it from its original encoding and then *encode* it as gb-2312. If the
original encoding is windows-1252, for example, you do

  page = page.decode("windows-1252")   # bytes -> text
  page = page.encode("gb2312")         # text -> bytes; Python spells the codec "gb2312"

Of course, for HTML, that may be tricky, as the file may include
an encoding declaration (XML declaration or http-equiv header). So if
you recode it, you might have to change such declarations as well.
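
As a rough sketch of what recoding plus fixing the declaration could look
like (Python 3 syntax; the helper name recode_html and the regular expression
are my own illustration, not a standard API, and a real HTML parser would be
more robust), assuming the declaration sits near the top of the file:

```python
import re

# Hypothetical pattern: matches charset=NAME (meta tag) or encoding="NAME"
# (XML declaration) and captures everything up to the encoding name.
_DECL_RE = re.compile(r'''(charset\s*=\s*["']?|encoding\s*=\s*["'])[\w.:-]+''',
                      re.IGNORECASE)

def recode_html(raw, source_encoding, target_encoding="gb2312"):
    """Decode raw bytes, rewrite the encoding declaration, re-encode."""
    text = raw.decode(source_encoding)
    # Only touch the start of the document, where declarations normally live,
    # so that ordinary body text mentioning "charset=" is left alone.
    head, rest = text[:1024], text[1024:]
    head = _DECL_RE.sub(lambda m: m.group(1) + target_encoding, head)
    # Characters that the target encoding lacks become numeric character
    # references (&#NNNN;) instead of raising UnicodeEncodeError.
    return (head + rest).encode(target_encoding, errors="xmlcharrefreplace")
```

The result is a bytes object, so it should be written to a file opened in
binary mode ("wb"), just like the unmodified page.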

Regards,
Martin
-- 
http://mail.python.org/mailman/listinfo/python-list
