Hi Jason,
Jason Orendorff wrote:
> And most (but not all) Unicode string implementations use UTF-16.
> Among languages and libraries that are very widely used, the majority
> is overwhelming: Java, Microsoft's CLR, Python, JavaScript, Qt,
> Xerces-C, and on and on. (The few counterexamples use UTF-8: glib,
> expat. And expat can be compiled to use UTF-16.)

If this is true, then I would expect to find relatively little mention
of UTF-8 compared to UTF-16 on the internet. However, the Google test
turns up *1,040,000* hits for *utf-16* versus *173,000,000* for *utf-8*.
Now, of course I realize that this is a particularly crude technique for
determining the relative popularity of UTF-8 and UTF-16, but even a very
crude technique should not be off by this much. Roughly 166 : 1 in
favour of UTF-8 is quite a steep ratio.
I'm sure this all has a simple explanation, but if we're going to use
popularity as a criterion for choosing a string representation, then we
ought to be really sure that we've got that popularity lined up the
right way around.
Incidentally: *497,000* for *utf-32*.
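
For anyone who wants to check the arithmetic, here is a minimal sketch in
Python. The counts are simply the ones quoted in this message (not
re-measured), and the names `hits` and `ratio_to_utf8` are made up for
illustration.

```python
# Hit counts as quoted in this message; they are not re-measured here.
hits = {"utf-8": 173_000_000, "utf-16": 1_040_000, "utf-32": 497_000}

def ratio_to_utf8(encoding):
    """Return how many utf-8 hits there are per hit for `encoding`."""
    return hits["utf-8"] / hits[encoding]

for name in ("utf-16", "utf-32"):
    print(f"utf-8 : {name} is about {ratio_to_utf8(name):.0f} : 1")
# prints:
# utf-8 : utf-16 is about 166 : 1
# utf-8 : utf-32 is about 348 : 1
```

Raw hit counts say nothing about what in-memory representation an
implementation uses, so this only quantifies the search-result gap.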
Furthermore, the IETF likes UTF-8 best. From the UTF-8 Wikipedia page:

    The Internet Engineering Task Force (IETF) requires all Internet
    protocols to identify the encoding used for character data with
    UTF-8 as at least one supported encoding.

Regards,
Jon
_______________________________________________
r6rs-discuss mailing list
[email protected]
http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss