Jon Wilson wrote:
> Jason Orendorff wrote:
> > And most (but not all) Unicode string implementations use UTF-16.
> > Among languages and libraries that are very widely used, the majority
> > is overwhelming: Java, Microsoft's CLR, Python, JavaScript, Qt,
> > Xerces-C, and on and on. (The few counterexamples use UTF-8: glib,
> > expat. And expat can be compiled to use UTF-16.)
>
> If this were true, then I would expect to find relatively little
> mention of UTF-8 compared to UTF-16 on the internet. However, the
> Google test turns up *1,040,000* hits for *utf-16* versus
> *173,000,000* for *utf-8*. Now, of course I realize that this is a
> particularly crude technique for determining the relative popularity
> of UTF-8 and UTF-16, but even a very crude technique should not
> produce this much of a discrepancy. Roughly 166 : 1 is quite a steep
> ratio.
By this reckoning, UTF-8 is more popular than Unicode itself, which gets only 39,000,000 hits. Actually, according to Google, UTF-8 is more popular than Jesus. Incidentally, if you don't adjust for cluefulness, UTF-16 is more often called "Unicode". Dreadful but true, especially in the Windows and Java worlds. Bottom line: nobody thinks about this stuff except language designers and highly clueful library designers.
The Internet Engineering Task Force (IETF) requires all Internet protocols to identify the encoding used for character data, and to support UTF-8 as at least one of the available encodings.
As a *transmission* format, UTF-8 is much more common than UTF-16, for good reasons--but nowhere near as common as, say, Latin-1. In other words, when doing I/O, a transcoding step is usually necessary anyway.

-j

_______________________________________________
r6rs-discuss mailing list
[email protected]
http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss
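The transcoding step described above can be sketched in Python; this is a minimal illustration, not anyone's actual implementation, and the codec names used are Python's standard identifiers:

```python
# A minimal sketch of the transcoding step at the I/O boundary.
# The bytes below stand in for text held in a UTF-16 internal
# representation (as in Java or the CLR).

text = "café"

# Internal UTF-16 form (little-endian, no BOM):
internal = text.encode("utf-16-le")

# On output, transcode to the transmission format -- here UTF-8:
wire = internal.decode("utf-16-le").encode("utf-8")

# On input from a Latin-1 source, transcode the other way:
incoming = b"caf\xe9"                  # Latin-1 bytes for "café"
received = incoming.decode("latin-1")  # back to the internal string type
assert received == text
```

Whatever the internal representation, the decode/encode pair at the boundary is the same shape; only the codec names change.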
