Ray Dillinger scripsit:

> 40 of which don't count because they're not part of the repertoire of
> normalized characters,

That is, under the normalizations that remove compatibility characters
(NFKC and NFKD). There are good reasons to keep compatibility
characters, though, and in that case this characterization is
inaccurate.

> and 88 of which are single characters that
> change under casing operations to single characters, confusing only
> those who have already confused character lengths with codepoint
> lengths.

"Character" is a vague term; it has five definitions in the Unicode
glossary. You are identifying characters with DCGs, which are sensible
for some languages and purposes but misfire for others. Tamil users
think of their abugida as a syllabary, and DCGs work well for them;
Hindi users think of their closely related abugida as either an
alphabet or a set of consonant clusters with vowel marks, depending on
the ligaturing behavior they are most familiar with. Likewise, in
Swedish ä and ö are as distinct from a and o as i is from j or G from
C; in German, the umlauted letters are mere variants of their normal
counterparts. Furthermore, Spanish é is just an e that bears word
stress, whereas in French é, è, and e are three separate entities.

The one true answer is that there is no one true answer. Codepoints
are the irreducible minimum level: when you go down to code units or
octets, you lose too much semantic import and are in the realm of
encodings of Unicode rather than Unicode itself. Above that there are
many ways to segment strings, some language-specific, some not. I
don't see much point in privileging one over another.

What's more, using DCGs means that strings are a denumerably infinite
domain of finite sequences over another denumerably infinite domain,
DCGs. Some might think that one denumerably infinite domain was
sufficient.

--
My corporate data's a mess!                     John Cowan
It's all semi-structured, no less.              http://www.ccil.org/~cowan
    But I'll be carefree                        [email protected]
    Using XSLT
On an XML DBMS.
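
To make the levels in the message above concrete, here is a rough
sketch in Python (not part of the original post, and not Scheme; the
helper name `levels` is made up for illustration). It counts the same
string as codepoints, UTF-16 code units, and UTF-8 octets, and shows
that canonical normalization changes the codepoint count, that only
the compatibility forms remove a compatibility character, and that a
case mapping need not go from one character to one character.

```python
import unicodedata

def levels(s):
    """Length of s at three levels: codepoints, then two encodings of them."""
    return {
        "codepoints": len(s),                                 # Python str is a codepoint sequence
        "utf16_code_units": len(s.encode("utf-16-le")) // 2,  # 2 octets per code unit
        "utf8_octets": len(s.encode("utf-8")),
    }

composed = "\u00e9"                                  # é as a single codepoint
decomposed = unicodedata.normalize("NFD", composed)  # e followed by U+0301

print(levels(composed))    # {'codepoints': 1, 'utf16_code_units': 1, 'utf8_octets': 2}
print(levels(decomposed))  # {'codepoints': 2, 'utf16_code_units': 2, 'utf8_octets': 3}

# U+FB01 (the "fi" ligature) is a compatibility character:
# NFC leaves it alone, NFKC replaces it with the two letters.
ligature = "\ufb01"
print(unicodedata.normalize("NFC", ligature) == ligature)   # True
print(unicodedata.normalize("NFKC", ligature))              # fi

# Casing: ä maps to a single codepoint, ß does not.
print(len("\u00e4".upper()), len("\u00df".upper()))         # 1 2
```

Python's standard library has no grapheme-cluster segmentation, so the
cluster level is deliberately left out here; that would need a
third-party library or a language-specific segmenter.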
_______________________________________________
r6rs-discuss mailing list
[email protected]
http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss
