Date: Tue, 10 Dec 2002 19:07:55 +0900 (JST)
   From: Mark Keasling <[EMAIL PROTECTED]>
[...]
   I'm in the process of trying to figure out how this stuff works...
   Is it possible to separate the charset to utf-8 conversion from the text to
   search data transformation?

It would be technically possible. It's probably not the easiest thing
to do in the Cyrus code base.

Currently mkchartable.c does casemapping, character decomposition, and
whitespace elimination. It also applies some mappings
(charset/unifix.txt) that help with a language independant match but
may not be appropriate for collation or all UTF-8 comparators.

To make the chartable stuff work for Sieve & our current SEARCH, we
probably should build tables that just output decomposed (or fully
composed) UTF-8 characters.

We can then write a UTF-8 comparator library that, during comparison,
does the canonicalization.

The easier path to make Sieve work would be to just build two
completely seperate tables. I'd prefer to see the more general
solution.

While none of this is rocket science, it is heavily detailed oriented
and requires concentration.

Larry


Reply via email to