Re: Detection of unlabeled UTF-8

Anne van Kesteren Fri, 30 Aug 2013 12:18:31 -0700

On Fri, Aug 30, 2013 at 7:33 PM, Joshua Cranmer 🐧 <pidgeo...@gmail.com> wrote:
> The problem I have with this approach is that it assumes that the page is
> authored by someone who definitively knows the charset, which is not a
> scenario which universally holds. Suppose you have a page that serves up the
> contents of a plain text file, so your source data has no indication of its
> charset. What charset should the page report? The choice is between guessing
> (presumably UTF-8) or saying nothing (which causes the browser to guess
> Windows-1252, generally).


Where did the text file come from? There's a source somewhere... And
these days that's hardly how people create content anyway. And again,
it has already been pointed out that we cannot scan the entire byte
stream (since text/plain also goes through the HTML parser, the same
limitation applies to it, unless we make an exception, I suppose, but
what data supports that?), which would make the situation worse.
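To illustrate why a bounded scan is weak evidence, here is a minimal sketch of a detector that may only look at a fixed-size prefix before parsing has to start. It is not how any browser actually implements detection. The function name `sniff_utf8_prefix` and the 1024-byte default limit are hypothetical choices made for this example. If the prefix is pure ASCII, the sniffer has no evidence either way, even when a non-ASCII byte shows up a few bytes later.

```python
import codecs


def sniff_utf8_prefix(data: bytes, limit: int = 1024) -> str:
    """Hypothetical sniffer that only inspects the first `limit` bytes.

    Returns "utf-8" if the prefix contains non-ASCII text that decodes as
    UTF-8, "not-utf-8" if the prefix contains an invalid UTF-8 sequence,
    and "undecided" if the prefix decodes to pure ASCII (no evidence).
    """
    prefix = data[:limit]
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False: a multi-byte sequence cut off at the limit is held
        # in the decoder's buffer instead of being reported as an error.
        text = decoder.decode(prefix, final=False)
    except UnicodeDecodeError:
        return "not-utf-8"
    if text.isascii():
        # Nothing beyond ASCII seen yet, so any ASCII-compatible
        # encoding fits the data equally well.
        return "undecided"
    return "utf-8"


page = b"<p>" + b"a" * 2000 + "café".encode("utf-8")
print(sniff_utf8_prefix(page))                    # undecided
print(sniff_utf8_prefix(page, limit=len(page)))   # utf-8
```

Only a scan of the whole stream settles this page. A streaming parser that must commit to an encoding early cannot wait for that scan.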


-- 
http://annevankesteren.nl/
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform