[issue4733] Add a "decode to declared encoding" version of urlopen to urllib

Lino Mastrodomenico Wed, 26 Sep 2012 12:32:09 -0700

Lino Mastrodomenico added the comment:

FYI, the exact algorithm for determining the encoding of HTML documents is 
specified at:
http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#determining-the-character-encoding


There are lots of different algorithms documented all over the intertubes for 
determining HTML encoding; the one above is the one used by browsers.

But that should only be used as part of a full HTML parsing library (e.g. 
https://code.google.com/p/html5lib/); urlopen itself should not attempt to 
sniff the encoding from the data transferred.
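
To illustrate the distinction, here is a minimal sketch of what "decode to 
declared encoding" could look like without any sniffing: it reads only the 
charset parameter of the Content-Type header and never inspects the body. The 
names declared_charset, decode_declared and urlopen_text are hypothetical, and 
the ISO-8859-1 fallback is an assumption, not something decided in this issue.

```python
from email.message import Message
from urllib.request import urlopen

# Assumption: fall back to ISO-8859-1 when no charset is declared.
DEFAULT_CHARSET = "iso-8859-1"


def declared_charset(content_type, default=DEFAULT_CHARSET):
    """Return the charset parameter of a Content-Type value, or default."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(failobj=default)


def decode_declared(body, content_type, default=DEFAULT_CHARSET):
    """Decode body using only the declared charset; never look at the bytes."""
    charset = declared_charset(content_type, default)
    try:
        return body.decode(charset)
    except LookupError:
        # The declared charset name is unknown to Python's codec registry.
        return body.decode(default)


def urlopen_text(url):
    """Hypothetical helper: fetch url and decode it per its HTTP header."""
    with urlopen(url) as response:
        content_type = response.headers.get("Content-Type", "")
        return decode_declared(response.read(), content_type)
```

Anything declared inside the document itself, such as a meta tag or a BOM, is 
deliberately ignored here; handling those is the job of the HTML parser that 
consumes the result.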

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue4733>
_______________________________________