[r6rs-discuss] Strings

Jason Orendorff Sat, 24 Mar 2007 19:13:20 -0800

Marcin 'Qrczak' Kowalczyk <[EMAIL PROTECTED]> wrote:

> A disadvantage of UTF-16 is that character predicates like
> char-alphabetic? break for characters above U+FFFF.

This kind of bug is pretty common in Java, but it isn't a
necessary consequence of using UTF-16.

Nor does focusing on scalar values fix the problem:

 (define (all-alphabetic? s)
   (for-all char-alphabetic? (string->list s)))  ;BUG

This bug is both subtler and more likely to bite.
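
To make the failure concrete, here is a minimal sketch. It assumes the
problem in view is combining marks, and an R6RS implementation whose
char-alphabetic? follows the Unicode Alphabetic property. The names
precomposed and decomposed are mine, for illustration only.

```scheme
(import (rnrs))

;; all-alphabetic? exactly as written above (scalar value by scalar value).
(define (all-alphabetic? s)
  (for-all char-alphabetic? (string->list s)))

;; The same user-visible letter, e with acute accent, spelled two ways.
(define precomposed (string #\xE9))        ; U+00E9
(define decomposed  (string #\e #\x301))   ; U+0065 + U+0301 COMBINING ACUTE ACCENT

;; U+00E9 is a letter, so this should print #t.
(display (all-alphabetic? precomposed))
(newline)

;; U+0301 is a nonspacing mark without the Alphabetic property, so this
;; should print #f even though the string shows a single accented letter.
(display (all-alphabetic? decomposed))
(newline)
```

Both strings are already sequences of scalar values, so no choice of
encoding changes the second answer.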

You could fix both by providing higher-level APIs:
 (string-first s) ===> the first grapheme cluster
 (string-rest s) ===> everything else
and so on.  This path leads to realigning all the
string/character APIs around grapheme clusters rather
than scalar values.  I offer this because if the
editors want to do something unconventional, I think
this is the way to go.
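
A rough sketch of what such an API might look like, under a loud
simplifying assumption: a "cluster" here is one scalar value plus any
combining marks (general categories Mn, Mc, Me) that follow it.  Real
grapheme cluster boundaries (UAX #29) have more rules than this.  The
helpers combining-mark? and cluster-end, and the choice to judge a
cluster by its base character, are hypothetical and not part of any
proposal.

```scheme
(import (rnrs))

;; Assumption: only Mn, Mc and Me characters extend a cluster.
(define (combining-mark? c)
  (memq (char-general-category c) '(Mn Mc Me)))

;; Index just past the first cluster of s.  s must be non-empty.
(define (cluster-end s)
  (let loop ((i 1))
    (if (and (< i (string-length s))
             (combining-mark? (string-ref s i)))
        (loop (+ i 1))
        i)))

;; The first (simplified) grapheme cluster, as a string.
(define (string-first s)
  (substring s 0 (cluster-end s)))

;; Everything after the first cluster.
(define (string-rest s)
  (substring s (cluster-end s) (string-length s)))

;; Cluster-wise predicate.  Assumption: a cluster counts as alphabetic
;; when its base (first) character is alphabetic.
(define (all-alphabetic-clusters? s)
  (or (string=? s "")
      (and (char-alphabetic? (string-ref (string-first s) 0))
           (all-alphabetic-clusters? (string-rest s)))))
```

With these definitions, a string such as (string #\e #\x301) is walked
as one cluster rather than two characters, so the combining accent is
never tested on its own.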

-j

_______________________________________________
r6rs-discuss mailing list
[email protected]
http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss
