On Mon, 22 Nov 2010 07:34:15 -0500 Michel Fortin <michel.for...@michelf.com> wrote:

> Just to add to the complexity: graphemes aren't always equivalent to
> user-perceived characters either. Ligatures can contain more than one
> user-perceived character. If you're looking for the substring
> "flourish" in a string, should it fail to match when it encounters
> "ﬂourish" just because of the "ﬂ" (fl) ligature? On most Mac
> applications it matches both thanks to sensible defaults in NSString's
> search and comparison algorithms.

That's true. I guess you're thinking of the distinction between the NFD/NFC "canonical" forms and the NFKD/NFKC ones (so-called "compatibility").

> So perhaps we need yet another layer over graphemes to represent
> user-perceived characters.

In my view, this is not the responsibility of a general-purpose tool. I may be wrong, but I think we are entering the field of application logic and semantics here. For me these are _not_ general-purpose points (though builtin types & libraries often do offer clearly non-general routines, like ones dealing with casing, or even less general: the set of ASCII letters). These issues would have to be dealt with either by apps or by domain-specific libraries. I find it wrong that Unicode even provides standard normalization forms for them (but fortunately common libs do not implement them AFAIK).

denis
-- -- -- -- -- -- --
vit esse estrany ☣

spir.wikidot.com
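
To make the canonical vs. compatibility distinction concrete, here is a minimal sketch in Python (chosen only for illustration; the thread itself names no language). The helper `matches` and the sample text are hypothetical; `unicodedata.normalize` is the standard-library call. It only shows how the two families of forms treat the "fl" ligature, not how any particular library should search.

```python
import unicodedata

def matches(needle: str, haystack: str, form: str) -> bool:
    """Hypothetical helper: substring test after normalizing both strings."""
    n = unicodedata.normalize(form, needle)
    h = unicodedata.normalize(form, haystack)
    return n in h

# U+FB02 is the single-code-point "fl" ligature.
text = "a \ufb02ourish of trumpets"

# Canonical form (NFC) leaves the ligature alone, so the plain search fails.
print(matches("flourish", text, "NFC"))   # False

# Compatibility form (NFKC) expands the ligature to "f" + "l", so it matches.
print(matches("flourish", text, "NFKC"))  # True
```

Whether a search should apply the compatibility forms at all is exactly the application-level decision discussed above.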