Jason Orendorff scripsit:

> I think people who favor strings-as-codepoint-vectors must also think
> that breaking a surrogate pair is really bad. But even with a
> codepoint-centric view of text you can unwittingly break a grapheme
> cluster, which amounts to the same sort of bug--it can lead to garbled
> text--and which is probably much *more* common in practice. I never
> hear anyone complain about that.

I absolutely disagree that these two problems are analogous at all:
separating surrogate pairs (a) is UTF-16-specific and (b) leaves the
result uninterpretable. Gumming up a grapheme cluster is more like an
off-by-one error in inserting a character: the output is garbled but
not garbage.

-- 
John Cowan <[EMAIL PROTECTED]> http://www.ccil.org/~cowan
One time I called in to the central system and started working on a big
thick 'sed' and 'awk' heavy duty data bashing script. One of the geologists
came by, looked over my shoulder and said 'Oh, that happens to me too.
Try hanging up and phoning in again.' --Beverly Erlebacher
_______________________________________________
r6rs-discuss mailing list
[email protected]
http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss
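
For anyone who wants to see the distinction drawn above in running code, here is a rough sketch in Python (not Scheme, and not from the original post). The sample strings and the helper name `decodes` are made up for illustration. It assumes a strict UTF-16 decoder, which is what Python's `utf-16-be` codec is by default: cutting a surrogate pair in half yields bytes that will not decode at all, while cutting a grapheme cluster at a code point boundary yields two perfectly valid strings, one of which has simply lost its accent.

```python
import unicodedata

def decodes(data: bytes) -> bool:
    """Return True if data is well-formed UTF-16 (big-endian)."""
    try:
        data.decode("utf-16-be")
        return True
    except UnicodeDecodeError:
        return False

# Case 1: break a surrogate pair (only possible at the UTF-16 level).
clef = "\U0001D11E"                # MUSICAL SYMBOL G CLEF, outside the BMP
utf16 = clef.encode("utf-16-be")   # two 16-bit code units: D834 DD1E
first_half = utf16[:2]             # lone high surrogate D834

pair_ok = decodes(utf16)           # the intact pair decodes
half_ok = decodes(first_half)      # the lone surrogate does not

# Case 2: break a grapheme cluster at a code point boundary.
word = "cafe\u0301"                # 'e' followed by COMBINING ACUTE ACCENT
truncated = word[:4]               # "cafe": valid text, accent lost
orphan = word[4:]                  # the combining mark on its own

# Both pieces still round-trip through an encoder and decoder.
round_trip = truncated.encode("utf-8").decode("utf-8")
orphan_is_combining = unicodedata.combining(orphan) != 0

print(pair_ok, half_ok)            # intact pair vs. lone surrogate
print(repr(truncated), repr(orphan), orphan_is_combining)
```

The first case is the "uninterpretable" one: a strict decoder rejects the result outright. The second is the "garbled but not garbage" one: every piece is still legal text, it just no longer says what was meant.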
