Lino Mastrodomenico added the comment:

FYI, the exact algorithm for determining the encoding of HTML documents is 
specified here:
http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#determining-the-character-encoding

There are lots of different algorithms documented all over the intertubes for 
determining HTML encoding; the one above is the one used by browsers.

But that algorithm should only be used as part of a full HTML parsing library 
(e.g. https://code.google.com/p/html5lib/); urlopen should not attempt to do 
encoding sniffing from the data transferred.
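
To illustrate the distinction, here is a minimal sketch of the non-sniffing 
approach: it reads only the charset declared in the Content-Type header and 
never inspects the body. The helper names (charset_from_content_type, 
decode_body) and the UTF-8 fallback are assumptions made for this example, 
not part of urllib.

```python
from email.message import Message


def charset_from_content_type(content_type, default=None):
    """Return the charset declared in a Content-Type header value.

    Hypothetical helper: it looks only at the header parameter and
    does no sniffing of the document body.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the value and returns `default`
    # when the header carries no charset parameter.
    return msg.get_content_charset(default)


def decode_body(body, content_type, fallback="utf-8"):
    """Decode raw bytes using the declared charset, else `fallback`.

    The fallback is an assumption of this sketch; an HTML parsing
    library would run the full detection algorithm instead.
    """
    charset = charset_from_content_type(content_type, fallback)
    return body.decode(charset, errors="replace")
```

On Python 3 the object returned by urlopen exposes the same lookup directly 
as response.headers.get_content_charset(). Anything beyond that, such as 
BOM or <meta> inspection, is left to the HTML parser.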

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue4733>
_______________________________________