Ray Dillinger scripsit:

> If you are appending and taking substrings, the codepoint level is one
> of several wrong choices to make about where to allow string divisions,
> for exactly this reason.

You understate the case: *every* level is a wrong choice for some
purposes (and a right choice for others).

> What human beings think of as characters, are represented in unicode
> by a base codepoint plus nondefective sequence of combining modifiers
> and variant selectors, each of which is also a codepoint.

The DGC (default grapheme cluster) level, which you are describing, is
also arbitrary; for some languages it works well, for others not.  For
example, in all (mainstream) Indic scripts the DGC is a consonant with
zero or one vowel added, and this is indeed right for Tamil, whose users
think of it as a syllabary.  In Hindi, though, it's more common to think
of *all* the consonants before a vowel as being part of the character,
even though they are in different DGCs according to Unicode, because
that's the way they (mostly) ligature together.

-- 
Income tax, if I may be pardoned for saying so,         John Cowan
is a tax on income.  --Lord Macnaghten (1901)           [email protected]
_______________________________________________
r6rs-discuss mailing list
[email protected]
http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss
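
As a rough illustration of the Tamil/Hindi contrast in the message above, the Python sketch below groups each non-mark code point with the combining marks that follow it. This is only an approximation of default grapheme clusters, not the full Unicode segmentation algorithm, and `mark_clusters` is a hypothetical helper name introduced here for the example. The sample strings (Tamil "ki" and Hindi "stri") are also chosen just for illustration.

```python
import unicodedata

def mark_clusters(text):
    """Approximate grapheme clusters: attach every code point whose
    general category is a Mark (Mn, Mc, Me) to the preceding cluster.
    Assumption: this ignores the many other rules of real segmentation."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch      # combining mark: extend current cluster
        else:
            clusters.append(ch)     # anything else starts a new cluster
    return clusters

# Tamil "ki": KA + VOWEL SIGN I
tamil_ki = "\u0b95\u0bbf"
# Hindi "stri": SA + VIRAMA + TA + VIRAMA + RA + VOWEL SIGN II
hindi_stri = "\u0938\u094d\u0924\u094d\u0930\u0940"

for label, s in (("Tamil", tamil_ki), ("Hindi", hindi_stri)):
    parts = mark_clusters(s)
    print(label, "code points:", len(s), "clusters:", len(parts))
```

Under this approximation the Tamil syllable is a single cluster, while the Hindi conjunct splits into three clusters even though it is typically read as one written unit. That is the mismatch the message describes: neither the codepoint level nor the cluster level gives the "right" division for every script.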
