Date: Tue, 10 Dec 2002 19:07:55 +0900 (JST)
From: Mark Keasling <[EMAIL PROTECTED]>

[...]
I'm in the process of trying to figure out how this stuff works... Is it possible to separate the charset to utf-8 conversion from the text to search data transformation?
It would be technically possible. It's probably not the easiest thing to do in the Cyrus code base.

Currently mkchartable.c does case mapping, character decomposition, and whitespace elimination. It also applies some mappings (charset/unifix.txt) that help with language-independent matching but may not be appropriate for collation or for all UTF-8 comparators.

To make the chartable stuff work for both Sieve and our current SEARCH, we should probably build tables that just output decomposed (or fully composed) UTF-8 characters. We can then write a UTF-8 comparator library that does the canonicalization during comparison.

The easier path to make Sieve work would be to just build two completely separate tables. I'd prefer to see the more general solution.

While none of this is rocket science, it is heavily detail-oriented and requires concentration.

Larry
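
As a rough illustration of the split discussed above, here is a minimal sketch in Python (not Cyrus's C, and not the Cyrus implementation). It assumes stage one only converts a charset to canonically decomposed Unicode, and that case folding and whitespace elimination move into a comparison-time step. The names `decode_to_nfd`, `search_key`, and `contains` are hypothetical, and the unifix.txt mappings are not modeled.

```python
import unicodedata


def decode_to_nfd(raw: bytes, charset: str) -> str:
    """Stage 1 (hypothetical): charset -> canonically decomposed Unicode.

    Case and whitespace are left alone, so an exact comparator can
    still tell "Cafe" from "CAFE".
    """
    return unicodedata.normalize("NFD", raw.decode(charset))


def search_key(text: str) -> str:
    """Stage 2 (hypothetical): canonicalize at comparison time.

    Folds case and drops whitespace. This is an assumption about what a
    SEARCH-style comparator would want, not a copy of the existing tables.
    """
    folded = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in folded if not ch.isspace())


def contains(haystack: str, needle: str) -> bool:
    """Loose substring match built on the stage-2 key."""
    return search_key(needle) in search_key(haystack)


# The same word arriving in two different charsets.
latin1_text = decode_to_nfd(b"Caf\xe9 Society", "iso-8859-1")
utf8_text = decode_to_nfd("CAF\u00c9 SOC".encode("utf-8"), "utf-8")

exact_equal = latin1_text[:5] == utf8_text[:5]  # case-sensitive comparison
loose_match = contains(latin1_text, utf8_text)  # case/whitespace-insensitive
```

With this split, a strict comparator works directly on the stage-one output, while a loose search applies `search_key` to both sides. The cost is that canonicalization runs at comparison time instead of being baked into the tables.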