On 8/30/13 12:24, Mike Hoye wrote:
On 2013-08-30 11:17 AM, Adam Roach wrote:
It seems to me that there's an important balance here between (a)
letting developers discover their configuration error and (b)
allowing users to render misconfigured content without specialized
knowledge.
For what it's worth Internet Explorer handled this (before UTF-8 and
caring about JS performance were a thing) by guessing what encoding to
use, comparing a letter-frequency analysis of a page's content to a
table of which bytes are most common in which encodings of which
languages.
...
From both the developer and user perspectives, it amounted to
"something went wrong because of bad magic."
I'd like to clarify two points about what I'm proposing.
First, I'm not proposing that we do anything without explicit user
intervention, other than present an unobtrusive bar helping the user
understand why the headline they're trying to read renders as "Ð'
Ð"оÑ?дÑfме пÑEURедложили оÑ,обÑEURаÑ,ÑOE "Ð?обелÑ?"
Ñf Ðz(бамÑ< " rather than "В Госдуме предложили отобрать "Нобеля" у
Обамы". (No political statement intended here -- that's just the leading
headline on Pravda at the moment.)
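For illustration, here is a minimal Python sketch of how this kind of garbling arises: UTF-8 bytes get decoded with a single-byte legacy encoding. The function names are hypothetical, the choice of Latin-1 as the wrong encoding is an assumption (the actual fallback depends on the browser and locale), and the sample string is my reading of the garbled headline above.

```python
def misdecode(text: str, wrong_encoding: str = "latin-1") -> str:
    """Simulate a browser decoding UTF-8 bytes with the wrong encoding.

    Latin-1 is assumed here because it maps every byte to a character,
    so the decode never fails; it just yields nonsense.
    """
    return text.encode("utf-8").decode(wrong_encoding)


def repair(garbled: str, wrong_encoding: str = "latin-1") -> str:
    """Undo misdecode(): recover the bytes, then decode them as UTF-8."""
    return garbled.encode(wrong_encoding).decode("utf-8")


headline = "В Госдуме"  # assumed sample text, not taken from a live page
garbled = misdecode(headline)

# Each Cyrillic letter is two bytes in UTF-8, so it turns into two
# unrelated Latin-1 characters, the first of which is "Ð" or "Ñ".
print(repr(garbled))
print(repair(garbled))
```

Re-decoding as UTF-8 is lossless in this sketch because Latin-1 preserves every byte; a real page would be re-decoded from its original bytes rather than from the garbled text.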
If the user is happy with the encoding, they do nothing and go about
their business.
If the user determines that the rendering is, in fact, not what they
want, they can simply click the "Yes" button and, with high
probability, everything is right with the world again.
Also note that I'm not proposing that we try to do generic character set
and language detection. That's fraught with the perils you cite. The
topic we're discussing here is UTF-8, which can be easily detected with
extremely high confidence.
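To make the detection claim concrete, here is a rough Python sketch, not a description of any browser's actual implementation. The function name `looks_like_utf8` is hypothetical. It relies on the fact that every multi-byte UTF-8 sequence must be a lead byte followed by the right number of continuation bytes in the range 0x80-0xBF, a pattern that text in legacy single-byte encodings rarely satisfies by accident.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if data is valid UTF-8 and contains non-ASCII bytes.

    Pure ASCII is valid UTF-8 too, but it gives no evidence either way,
    so this sketch reports False for it.
    """
    if all(b < 0x80 for b in data):
        return False  # nothing to distinguish UTF-8 from a legacy encoding
    try:
        data.decode("utf-8")  # strict decoding rejects malformed sequences
    except UnicodeDecodeError:
        return False
    return True


sample = "Привет"  # assumed sample text for illustration

print(looks_like_utf8(sample.encode("utf-8")))    # valid multi-byte sequences
print(looks_like_utf8(sample.encode("cp1251")))   # legacy Cyrillic bytes
print(looks_like_utf8(b"plain ASCII text"))       # no evidence either way
```

In practice a check like this would run over a buffered prefix of the page, and confidence grows with the number of multi-byte sequences that validate.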
--
Adam Roach
Principal Platform Engineer
a...@mozilla.com
+1 650 903 0800 x863
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform