> .read() returns the bytes exactly how it downloads them. It doesn't
> interpret them. If those bytes are GB-2312-encoded text, that's what
> they are. There's no need to reencode them. Just .write(page) (of
> course, this way you don't verify that it's correct).

Alternatively, if the page is *not* GB2312, you must first *decode*
it from its original encoding and then *encode* it as GB2312. If the
original encoding is windows-1252, for example, you do

  page = page.decode("windows-1252")
  page = page.encode("gb2312")  # Python's codec name has no hyphen
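
As a rough sketch of the same two steps wrapped in a function (the
name `recode` and its defaults are made up for illustration, and this
assumes Python 3, where the downloaded data is a bytes object):

```python
def recode(data: bytes, source: str = "windows-1252",
           target: str = "gb2312") -> bytes:
    """Decode bytes from `source`, then encode the text as `target`."""
    text = data.decode(source)   # bytes -> str
    # With the default errors="strict", this raises UnicodeEncodeError
    # if the text contains a character the target encoding lacks.
    return text.encode(target)   # str -> bytes


# Stand-in for the downloaded page; "\u00b1" is the plus-minus sign.
page = "2 \u00b1 3".encode("windows-1252")
converted = recode(page)
```

The result is still bytes, so it should be written to a file opened
in binary mode ("wb"); otherwise it would be encoded a second time.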

Of course, for HTML, that may be tricky, as the file may include
an encoding declaration (an XML declaration or a meta http-equiv
element). So if you recode it, you might have to change such
declarations as well.
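
One way to adjust such declarations on the decoded text, before
encoding it again, is sketched below. The helper name `redeclare` and
both regular expressions are my own assumptions, not a robust HTML
parser: they only handle the first meta charset and the first XML
declaration they find.

```python
import re

# charset=... inside a <meta ...> tag (http-equiv form or <meta charset=...>)
META_CHARSET = re.compile(
    r'(<meta\b[^>]*?charset\s*=\s*["\']?)([\w.:-]+)', re.IGNORECASE)
# encoding="..." inside an XML declaration
XML_ENCODING = re.compile(
    r'(<\?xml\b[^>]*?encoding\s*=\s*["\'])([\w.:-]+)', re.IGNORECASE)


def redeclare(text: str, new_charset: str) -> str:
    """Replace the declared encoding name, leaving the rest untouched."""
    text = META_CHARSET.sub(lambda m: m.group(1) + new_charset, text, count=1)
    text = XML_ENCODING.sub(lambda m: m.group(1) + new_charset, text, count=1)
    return text


html = ('<meta http-equiv="Content-Type" '
        'content="text/html; charset=windows-1252">')
fixed = redeclare(html, "gb2312")
```

You would call this between the decode and encode steps, so the
declaration in the file matches the bytes actually written.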

Regards,
Martin
-- 
http://mail.python.org/mailman/listinfo/python-list
