Lino Mastrodomenico added the comment:

FYI, the exact algorithm for determining the encoding of HTML documents is specified at http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#determining-the-character-encoding
There are lots of different algorithms documented all over the intertubes for determining HTML encoding; the one above is the one used by browsers. But it should only be used as part of a full HTML parsing library (e.g. https://code.google.com/p/html5lib/); urlopen should not attempt to sniff the encoding from the data it transfers.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue4733>
_______________________________________
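To illustrate the division of labour argued for above, here is a minimal sketch (Python 3, standard library only) of a caller that trusts only the charset declared in the HTTP Content-Type header and never inspects the body. The helper names `charset_from_content_type` and `decode_body`, and the `latin-1` fallback, are assumptions made for this example; they are not part of urllib and not something proposed in the issue.

```python
from email.message import Message


def charset_from_content_type(content_type, default=None):
    """Return the charset parameter of a Content-Type header value.

    Hypothetical helper: it only parses the header string and never
    looks at the response body, so no sniffing takes place.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset lowercased, or the
    # failobj when the header carries no charset parameter.
    return msg.get_content_charset(failobj=default)


def decode_body(body, content_type, fallback="latin-1"):
    """Decode bytes using the declared charset, else a fallback.

    The fallback is an assumption of this sketch; a full HTML parser
    would run the WHATWG algorithm on the body instead.
    """
    charset = charset_from_content_type(content_type) or fallback
    try:
        return body.decode(charset, errors="replace")
    except LookupError:
        # The server declared a charset Python does not know about.
        return body.decode(fallback, errors="replace")
```

With a real response object, the header value could be obtained from `response.headers.get("Content-Type", "")`. Anything beyond this (BOM checks, meta-tag prescanning) would be left to an HTML parsing library such as html5lib.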