On 06/09/13 16:34, Gervase Markham wrote:

Data! Sounds like a plan.

Or we could ask our friends at Google or some other search engine to run
a version of our detector over their index and see how often it says
"UTF-8" when our normal algorithm would say something else.

Gerv

This website has an interesting and apparently up-to-date set of statistics:

http://w3techs.com/technologies/overview/character_encoding/all

Their top ten encodings, as of today, are:

UTF-8: 76.7%
ISO-8859-1: 11.7%
Windows-1251 (Cyrillic): 2.9%
GB2312 (Chinese): 2.5%
Shift JIS (Japanese): 1.5%
Windows-1252 (superset of ISO-8859-1): 1.4%
GBK (Chinese): 0.7%
ISO-8859-2 (Eastern Europe, Latin script): 0.4%
EUC-JP (Japanese): 0.4%
Windows-1256 (Arabic): 0.4%

Interpreting these results precisely is tricky, since they don't say how they define and detect these encodings. But if their results are even approximately right, it's pretty clear that UTF-8 now dominates the web as the single most common charset/encoding by far.
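
For what it's worth, here is a rough sketch of the kind of check that could be run over a corpus to count how often a "this is UTF-8" test disagrees with the label a page would otherwise get. This is only an illustration, not our detector and certainly not whatever w3techs does: the names looks_like_utf8 and count_disagreements are made up, the declared label is just a stand-in for the result of the normal algorithm, and the test itself (non-ASCII bytes that decode as strict UTF-8) is an assumption about what a simple detector might do.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Hypothetical detector: True if data has non-ASCII bytes and is valid UTF-8."""
    if data.isascii():
        # Pure ASCII decodes identically under most legacy encodings,
        # so it says nothing either way.
        return False
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def count_disagreements(pages) -> int:
    """Count pages that look like UTF-8 but carry some other label.

    pages is an iterable of (label, raw_bytes) pairs, where label is
    assumed to be whatever the existing algorithm would have picked.
    """
    disagreements = 0
    for label, data in pages:
        labelled_utf8 = label.strip().lower() in ("utf-8", "utf8")
        if looks_like_utf8(data) and not labelled_utf8:
            disagreements += 1
    return disagreements


# Tiny made-up sample, just to show the shape of the input.
SAMPLE_PAGES = [
    ("ISO-8859-1", "café".encode("utf-8")),    # mislabelled UTF-8
    ("ISO-8859-1", "café".encode("latin-1")),  # genuinely Latin-1
    ("UTF-8", "café".encode("utf-8")),         # correctly labelled
    ("windows-1252", b"plain ascii"),          # ASCII only, ignored
]
```

Short legacy-encoded strings can occasionally happen to be valid UTF-8, so a real run would want a minimum amount of non-ASCII content before trusting the result.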

-- N.

_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
