On Thu, Nov 17, 2011 at 10:52 PM, Marvin Humphrey
<[email protected]> wrote:
>
> OK, I remain at least academically interested in what sort of performance
> advantages 'simple' case folding affords us, and at what penalty in terms of
> relevancy.
>

I think it depends on how it's implemented; I'm not sure there is really
a performance advantage to the simpler one. In ICU at least, the
recursive part of nfkc_cf is computed up front, into the data files,
and you get normalization + case folding at runtime in one pass (versus
utf8proc's multiple passes, and it's not clear all the corner cases are
working there).
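
To make the "multiple passes" point concrete, here is a rough Python
sketch that alternates full case folding and NFKC until the string stops
changing. It is only an illustration: fold_multipass is a made-up name,
this is not ICU's or utf8proc's code, and it does not remove default
ignorable code points, so it only approximates nfkc_cf.

```python
import unicodedata


def fold_multipass(text: str) -> str:
    """Approximate NFKC + full case folding by iterating to a fixed point.

    Hypothetical helper for illustration; not equivalent to ICU's nfkc_cf
    (for example, default ignorable code points are not removed).
    """
    while True:
        # One pass: full case folding first, then NFKC normalization.
        folded = unicodedata.normalize("NFKC", text.casefold())
        if folded == text:
            # Another pass changed nothing, so the result is stable.
            return folded
        text = folded
```

An input like "™" (U+2122) shows why one pass is not enough here: case
folding leaves it alone, NFKC then expands it to "TM", and only the next
pass folds that to lowercase. Precomputing that closure in the data file
is what lets a single runtime pass suffice.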

As far as relevance goes, I think realistically only German users (ß/SS)
or anyone with Ancient Greek text would care if you cheated and used the
simple one instead, especially if you are already normalizing anyway.
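
A small Python sketch of the ß/SS case, under one loud assumption:
Python's standard library has no simple case folding, so str.lower() is
used as a stand-in. For ß that matches what simple folding does (it
leaves ß unchanged), but it is not a general substitute. The function
names are made up for the example.

```python
def distinct_keys_full(words):
    """Index keys under full case folding (str.casefold)."""
    return {w.casefold() for w in words}


def distinct_keys_simple_standin(words):
    """Index keys under str.lower(), a stand-in for simple case folding.

    Assumption: lower() is not real simple folding; it merely agrees with
    it on these inputs, since neither one expands ß to ss.
    """
    return {w.lower() for w in words}


words = ["Straße", "STRASSE", "strasse"]
full_keys = distinct_keys_full(words)
simple_keys = distinct_keys_simple_standin(words)
```

With full folding all three spellings collapse to a single key; with the
stand-in, "Straße" keeps its own key and will not match a query for
"STRASSE".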

But that was just my point: if you are normalizing anyway, why not
just choose a normalization form that also does the case folding?

-- 
lucidimagination.com
