2011-12-01 1:28, Faruk Ates wrote:

> My understanding is that all browsers* default to Western Latin (ISO-8859-1)
> encoding by default (for Western-world downloads/OSes) due to legacy content on the web.

Browsers default to various encodings, often windows-1252 (rather than ISO-8859-1). They may also investigate the actual data and make a guess based on it.
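The difference matters: the bytes 0x80–0x9F are C1 control characters in ISO-8859-1 but printable characters (smart quotes, dashes, the euro sign) in windows-1252. A small Python sketch to illustrate:

```python
# Byte 0x93/0x94 are curly double quotes in windows-1252,
# but unprintable C1 control characters in ISO-8859-1.
data = b"\x93quoted\x94"

as_cp1252 = data.decode("windows-1252")
as_latin1 = data.decode("iso-8859-1")

print(repr(as_cp1252))  # '“quoted”'
print(repr(as_latin1))  # '\x93quoted\x94'  (control characters)
```

This is why browsers treat a declared ISO-8859-1 as windows-1252 in practice: the latter renders legacy content sensibly, the former does not.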

> I'm wondering if it might not be good to start encouraging defaulting to UTF-8,

It would not. There’s no reason to recommend any particular defaulting, especially not something that deviates from past practices.

It might be argued that browsers should do better error detection and reporting, e.g. informing the user when a document’s encoding has not been declared at all and cannot be inferred fairly reliably (say, from a BOM). But I’m afraid the general feeling is that browsers should avoid warning users, as that tends to contradict authors’ purposes – and, in fact, things that are serious problems in principle mostly aren’t that serious in practice.
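Inferring an encoding from a BOM is about the only reliable case. A minimal sketch (function name and shape are mine, not any browser’s actual code):

```python
# Sketch: infer an encoding from a byte-order mark, if one is present.
def sniff_bom(data: bytes):
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le"
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"
    return None  # no BOM: the encoding must be declared or guessed

print(sniff_bom("\ufeffhello".encode("utf-8")))  # utf-8-sig
print(sniff_bom(b"hello"))                       # None
```

Without a BOM or a declaration, anything else is heuristics over the byte statistics, which is exactly where the guessing (and misguessing) happens.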

> We like to think that “every web developer is surely building things in UTF-8
> nowadays”, but this is far from true.

There are plenty of pages declared as UTF-8 but containing ASCII only, as well as pages mislabeled as UTF-8 but actually containing e.g. ISO-8859-1 data.
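Both failure modes are easy to demonstrate in Python. ASCII-only content is byte-identical under either label, so the declaration proves nothing; and a mismatched label produces the familiar mojibake:

```python
# Pure ASCII is indistinguishable: the same bytes under either encoding.
assert "hello".encode("utf-8") == "hello".encode("iso-8859-1")

# "ş" (U+015F) encoded as UTF-8 but read as windows-1252: classic mojibake.
utf8_bytes = "Ateş".encode("utf-8")      # b'Ate\xc5\x9f'
print(utf8_bytes.decode("windows-1252"))  # AteÅŸ
```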

> I still frequently break websites and webapps simply by entering my name (Faruk
> Ateş).

That’s because the server-side software (and possibly client-side software) cannot handle the letter “ş”. It would not help if the page were interpreted as UTF-8. If the author knows that a server-side form handler cannot process such characters, the page’s declared encoding will not save it.
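The point is that the failure is in the processing chain, not the page encoding: “ş” (U+015F) simply does not exist in ISO-8859-1, so any component that stores or transcodes text in that encoding must drop, mangle, or reject it. For example:

```python
# A backend that handles text as ISO-8859-1 has no code point at all
# for "ş" (U+015F); encoding to it fails outright.
name = "Faruk Ateş"
try:
    name.encode("iso-8859-1")
except UnicodeEncodeError as exc:
    print("cannot store:", exc)
```

Declaring the page as UTF-8 changes nothing here; the backend would fail the same way.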

Yucca
