Hi Jason,

Jason Orendorff wrote:
> And most (but not all) Unicode string implementations use UTF-16.
> Among languages and libraries that are very widely used, the majority
> is overwhelming: Java, Microsoft's CLR, Python, JavaScript, Qt,
> Xerces-C, and on and on.  (The few counterexamples use UTF-8: glib,
> expat.  And expat can be compiled to use UTF-16.)
If this is true, then I would expect to find relatively little mention of UTF-8 compared to UTF-16 on the internet. However, a Google search turns up *1,040,000* hits for *utf-16* versus *173,000,000* for *utf-8*. Now, of course I realize that this is a particularly crude technique for determining the relative popularity of UTF-8 and UTF-16, but even a very crude technique should not produce this much of a discrepancy on its own. Roughly 166 : 1 is quite a steep ratio.

I'm sure this all has a simple explanation, but if we're going to use popularity as a criterion for choosing a string representation, then we ought to be really sure that we've got that popularity lined up the right way around.

Incidentally: *497,000* for *utf-32*.
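
As a quick check on the arithmetic, here is a minimal Python sketch that divides each hit count by the UTF-16 count. The counts are only the search-engine estimates quoted in this message, and the names `hits` and `ratio_to` are made up for the illustration.

```python
# Hit counts as quoted in this message. These are search-engine
# estimates, not measurements of actual usage.
hits = {
    "utf-8": 173_000_000,
    "utf-16": 1_040_000,
    "utf-32": 497_000,
}


def ratio_to(baseline: str, counts: dict[str, int]) -> dict[str, float]:
    """Return each count divided by the count for `baseline` (hypothetical helper)."""
    base = counts[baseline]
    return {name: n / base for name, n in counts.items()}


ratios = ratio_to("utf-16", hits)
for name, r in ratios.items():
    # Print how many hits each encoding name gets per UTF-16 hit.
    print(f"{name}: {r:.2f} hits per utf-16 hit")
```

This says nothing about which encoding implementations use internally; it only makes the size of the gap in the search results explicit.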

Furthermore, the IETF likes UTF-8 best.  From the UTF-8 Wikipedia page:

> The Internet Engineering Task Force (IETF) requires all Internet protocols to identify the encoding used for character data with UTF-8 as at least one supported encoding.

Regards,
Jon

_______________________________________________
r6rs-discuss mailing list
[email protected]
http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss
