On Thu, Nov 17, 2011 at 7:45 AM, Nick Wellnhofer <[email protected]> wrote:
> On 17/11/2011 13:37, Robert Muir wrote:
>>
>> The point of the derived property is that there are sneaky
>> interactions between these.
>
> Having a look at the utf8proc code, the function utf8proc_decompose_char
> calls itself recursively when substituting characters. So it looks like it
> does support NFKC_Casefold properly.
>
> Nick
>
I don't think so: it seems to only decompose the 'output' of the case
folding mapping, and that is not enough.

If I remember correctly, the problem is that normalization depends on
context, so the algorithm must be done as stated in the standard:

  toNFKC_Casefold(X): Map each character C in X to NFKC_Casefold(C)
  and then normalize the resulting string to NFC

That is, do the mappings first, then normalize the whole string.

In ICU this is instead implemented as an additional normalization form,
so it is single-pass and non-recursive there.

-- 
lucidimagination.com
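
To illustrate the two-step definition quoted above, here is a minimal Python sketch. It is not how utf8proc or ICU implement this. The helper names are hypothetical, and the per-character mapping is only an approximation: Python's unicodedata module does not expose the NFKC_Casefold property, so the sketch substitutes NFKC plus str.casefold() and does not remove default ignorable code points. The point is only that mapping each character in isolation can leave the string unnormalized, which is why the final pass over the whole string is needed.

```python
import unicodedata


def approx_nfkc_casefold_char(c: str) -> str:
    """Hypothetical stand-in for NFKC_Casefold(C) on one character.

    Assumption: NFKC, then full case folding, then NFKC again. The real
    derived property also removes default ignorable code points.
    """
    folded = unicodedata.normalize("NFKC", c).casefold()
    return unicodedata.normalize("NFKC", folded)


def map_only(s: str) -> str:
    """Step 1 only: map each character independently and concatenate."""
    return "".join(approx_nfkc_casefold_char(c) for c in s)


def to_nfkc_casefold(s: str) -> str:
    """Step 1, then step 2: normalize the whole mapped string to NFC."""
    return unicodedata.normalize("NFC", map_only(s))


# "A" followed by U+0301 COMBINING ACUTE ACCENT
sample = "A\u0301"
print(len(map_only(sample)))          # 2: "a" and U+0301 stay separate
print(len(to_nfkc_casefold(sample)))  # 1: composed to U+00E1
```

Each character in the sample maps cleanly on its own, but the concatenated result is only in NFC after the whole-string pass, because composition depends on neighbouring characters.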
