On Monday 23 September 2013 23:32:39 Ángel González wrote:
> On 17/09/13 09:49, Tim Ruehsen wrote:
> > On Tuesday 17 September 2013 00:17:21 Ángel González wrote:
> >>> [1] http://nikitathespider.com/articles/EncodingDivination.html
> >>
> >> Note that these steps are outdated now (that was written at most at
> >> 2008).
> >
> > Outdated by exactly what ? RFC3986 is of 2005 and does not contradict to
> > [1]. See my explanation above.
>
> By the HTML Living Standard (formerly known as HTML5)
> http://www.whatwg.org/specs/web-apps/current-work/multipage/
>
> The Content-type header is sometimes overriden, ISO-8859-1 now means
> windows-1252,
> there are some well-defined guessing steps when there's such need...

Just for completeness: these guessing steps, called the "encoding sniffing algorithm", are described in section 12.2.2.2. But they are only meant for the situation the standard describes as "In some cases, it might be impractical to unambiguously determine the encoding before parsing the document.".

I found this iso-8859-1 / windows-1252 issue mentioned on the Wikipedia 'windows-1252' page, but couldn't find it on the HTML Living Standard pages. Could you give me a pointer, please?

How do you think we can address the iso / windows encoding issue (and should we)? As far as I understand, it only applies to HTML5...

Is there a practical need for the sniffing algorithm? Do you know of any real web sites / pages where the encoding is ambiguous?

Tim
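
P.S. To make the iso-8859-1 / windows-1252 difference concrete, here is a minimal Python sketch. It only shows how the bytes 0x80-0x9F decode under the two encodings. The helper effective_encoding() and the LABEL_OVERRIDES table are hypothetical names made up for this illustration; they are not taken from wget or from the standard's text, and the table lists just the two labels needed for the example.

```python
# Hypothetical label override table (an assumption for illustration only):
# treat a declared "iso-8859-1" as "windows-1252", as the quoted mail describes.
LABEL_OVERRIDES = {
    "iso-8859-1": "windows-1252",
    "latin1": "windows-1252",
}


def effective_encoding(label: str) -> str:
    """Return the codec name to decode with for a declared charset label."""
    key = label.strip().lower()
    return LABEL_OVERRIDES.get(key, key)


# Bytes as a Windows editor would typically write curly quotes and a euro sign.
raw = b"\x93quoted\x94 \x80 5"

# Strict ISO-8859-1: 0x80-0x9F become invisible C1 control characters.
as_latin1 = raw.decode("iso-8859-1")

# With the override: the same bytes become printable punctuation.
as_cp1252 = raw.decode(effective_encoding("ISO-8859-1"))

print(repr(as_latin1))
print(repr(as_cp1252))
```

Both decodes succeed, so the mismatch never raises an error; it only shows up as wrong characters in the decoded text (for wget, for example, in a URL extracted from a page).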