Re: [r6rs-discuss] Strings

Jon Wilson Mon, 26 Mar 2007 12:23:07 -0800

Hi Jason,

Jason Orendorff wrote:

And most (but not all) Unicode string implementations use UTF-16.
Among languages and libraries that are very widely used, the majority
is overwhelming: Java, Microsoft's CLR, Python, JavaScript, Qt,
Xerces-C, and on and on.  (The few counterexamples use UTF-8: glib,
expat.  And expat can be compiled to use UTF-16.)

If this is true, then I would expect to find relatively little mention of UTF-8 compared to UTF-16 on the internet. However, the Google test turns up *1,040,000* hits for *utf-16* versus *173,000,000* for *utf-8*. Now, of course I realize that this is a particularly crude technique for determining the relative popularity of UTF-8 and UTF-16, but even a very crude technique does not cause this much of a discrepancy. Roughly 166 : 1 is quite a steep ratio.

I'm sure this all has a simple explanation, but if we're going to use popularity as a criterion for choosing a string representation, then we ought to be really sure that we've got that popularity lined up the right way around.
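Since the question is which string representation to choose, it may help to see what the encodings under discussion cost for different kinds of text. A minimal Python sketch, assuming Python 3 (the sample strings and the helper name `encoded_sizes` are illustrative choices, not anything proposed in the thread), that counts the bytes each encoding needs:

```python
# Illustrative samples: ASCII, accented Latin, CJK, and one character outside
# the Basic Multilingual Plane (U+1D11E), which UTF-16 stores as a surrogate pair.
SAMPLES = ["Scheme", "na\u00efve", "\u6587\u5b57\u5217", "\U0001D11E"]

# "-le" codecs are used so Python does not prepend a byte-order mark.
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def encoded_sizes(s: str) -> dict:
    """Return the number of bytes `s` occupies in each encoding."""
    return {enc: len(s.encode(enc)) for enc in ENCODINGS}


if __name__ == "__main__":
    for s in SAMPLES:
        # ascii() keeps the output printable regardless of terminal encoding.
        print(f"{ascii(s)}: {len(s)} code points, {encoded_sizes(s)}")
```

ASCII text is half the size in UTF-8 that it is in UTF-16, the CJK sample is smaller in UTF-16, and the non-BMP character takes four bytes in all three encodings (two 16-bit code units in UTF-16).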


Incidentally: *497,000* for *utf-32*.
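A minimal Python sketch that recomputes the ratios from the hit counts reported in this message (the counts are the March 2007 figures given here, not fresh data; the `HITS` table and `ratio` helper are illustrative):

```python
# Google hit counts as reported in this message (March 2007); not fresh data.
HITS = {"utf-8": 173_000_000, "utf-16": 1_040_000, "utf-32": 497_000}


def ratio(a: str, b: str) -> float:
    """Return how many times more hits encoding `a` got than encoding `b`."""
    return HITS[a] / HITS[b]


if __name__ == "__main__":
    for other in ("utf-16", "utf-32"):
        print(f"utf-8 : {other} = {ratio('utf-8', other):.1f} : 1")
```

With these counts the script prints ratios of about 166.3 : 1 against UTF-16 and 348.1 : 1 against UTF-32.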

Furthermore, the IETF likes UTF-8 best. From the UTF-8 Wikipedia page:

The Internet Engineering Task Force (IETF) requires all Internet protocols to identify the encoding used for character data with UTF-8 as at least one supported encoding.


Regards,
Jon

_______________________________________________
r6rs-discuss mailing list
[email protected]
http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss
