On Sat, 24-03-2007 at 13:31 -0400, [EMAIL PROTECTED] wrote:

> Summary
> "This document attempts to make the case that it is advantageous to use 
> UTF-16 (or 16-bit Unicode strings) for text processing..."

IMHO this is one of the worst mistakes Unicode is trying to make.
It convinces people that they should not worry about characters above
U+FFFF just because they are very rare. UTF-16 combines the worst
aspects of UTF-8 and UTF-32.

If size is important and a variable-width representation of code points
is acceptable, then UTF-8 is usually a better choice. If O(1) indexing
by code point is important, then UTF-32 is better. Nobody wants to
process text in terms of UTF-16 code units, and nobody wants
surrogate-pair handling sprinkled throughout their code; so if one
accepts an API which extracts variable-width characters, that API could
just as well deal with UTF-8, which is better for interoperability.
UTF-16 makes no sense.
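To make the variable-width point concrete, here is a minimal Python 3 sketch (my own illustration, not from the original post). It takes a single character above U+FFFF and shows that UTF-16 needs two code units (a surrogate pair) for it, so UTF-16 indexing by code unit is no more O(1) per code point than UTF-8 indexing by byte:

```python
# U+1D11E (MUSICAL SYMBOL G CLEF) lies above U+FFFF, outside the BMP.
s = "\U0001D11E"

utf16 = s.encode("utf-16-be")  # big-endian, no BOM prepended
utf8 = s.encode("utf-8")

print(len(s))            # 1 code point
print(len(utf16) // 2)   # 2 UTF-16 code units: the surrogate pair D834 DD1E
print(len(utf8))         # 4 UTF-8 bytes
```

So any code that indexes UTF-16 by code unit must still detect and skip surrogates to find code-point boundaries, which is exactly the kind of variable-width scanning UTF-8 already requires.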

-- 
   __("<         Marcin Kowalczyk
   \__/       [EMAIL PROTECTED]
    ^^     http://qrnik.knm.org.pl/~qrczak/


_______________________________________________
r6rs-discuss mailing list
[email protected]
http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss
