Hi. I have encountered a problem searching in my application because of inconsistant unicode normalization forms in the corpus (and the queries). I would like to normalize to form NFKD in an analyzer (I think). I was thinking about creating a filter similar to the lowercasefilter that would do the unicode normalization. Then I will add that filter to my existing snowball analyzer. I am about to embark on creating said analyzer/filter using the ICU (http://icu-project.org/) icu4j jar.
Is this already accounted for in standard lucene somewhere and I'm just missing it? Anything similar out there? Any other advice? Thanks, Dave Wooodward Library of Congress --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]