On 06/09/13 16:34, Gervase Markham wrote:

Data! Sounds like a plan.

Or we could ask our friends at Google or some other search engine to run
a version of our detector over their index and see how often it says
"UTF-8" when our normal algorithm would say something else.
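For illustration, the comparison could be tallied roughly like this. This is a minimal sketch that assumes a page "looks like UTF-8" when it contains non-ASCII bytes and still decodes as strict UTF-8. The names looks_like_utf8, count_disagreements and legacy_guess are hypothetical. legacy_guess stands in for our normal detection algorithm, which isn't shown here.

```python
from collections import Counter
from typing import Callable, Iterable


def looks_like_utf8(data: bytes) -> bool:
    """Assumed heuristic: non-ASCII content that decodes as strict UTF-8."""
    if data.isascii():
        # Pure ASCII is valid in UTF-8 and in most legacy encodings alike,
        # so it says nothing about which one the page actually uses.
        return False
    try:
        data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True


def count_disagreements(pages: Iterable[bytes],
                        legacy_guess: Callable[[bytes], str]) -> Counter:
    """Tally what the (hypothetical) legacy detector said for every page
    the UTF-8 check claims but the legacy detector labels differently."""
    tally: Counter = Counter()
    for page in pages:
        if looks_like_utf8(page):
            label = legacy_guess(page)
            if label != "utf-8":
                tally[label] += 1
    return tally


if __name__ == "__main__":
    pages = ["café".encode("utf-8"), "café".encode("latin-1"), b"plain ascii"]
    # Hypothetical stand-in detector that always falls back to windows-1252.
    print(count_disagreements(pages, lambda page: "windows-1252"))
```

Only the UTF-8-encoded page is counted here. The Latin-1 bytes fail strict decoding, and the ASCII page is skipped as uninformative.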

This website has an interesting and apparently up-to-date set of statistics:


Their current top ten encodings, as of today, are:

UTF-8: 76.7%
ISO-8859-1: 11.7%
Windows-1251 (Cyrillic): 2.9%
GB2312 (Chinese): 2.5%
Shift JIS (Japanese): 1.5%
Windows-1252 (superset of ISO-8859-1): 1.4%
GBK (Chinese): 0.7%
ISO-8859-2 (Eastern Europe, Latin script): 0.4%
EUC-JP (Japanese): 0.4%
Windows-1256 (Arabic): 0.4%

The exact interpretation of these results is tricky, since they don't give their criteria for how they define and detect these encodings. But if their results are even approximately right, it's pretty clear that UTF-8 now dominates the web as by far the commonest charset/encoding.

-- N.

dev-platform mailing list
