Re: i18n query normalization

Ken Krugler Tue, 23 Aug 2005 11:31:01 -0700

   We have a multilingual index and we need to match accented
characters with unaccented characters. For example, if a document
contains: mângão, the query: mangao should match it.


    I guess I would have to build some sort of analyzer/tokenizer for this.

    I was wondering if there are tokenizers already built for Lucene.

Search the archives for a discussion about this, back in June I believe. I'd suggested using ICU to generate sort keys, and indexing those.
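
A minimal sketch of the sort-key idea follows. It is not taken from the June thread. To stay self-contained it uses the JDK's java.text.Collator as a stand-in for ICU's collator, and the class and method names (SortKeyNormalizer, sortKeyTerm) are made up for illustration. At primary strength the collator ignores accent and case differences, so the accented and unaccented spellings should produce the same key bytes; hex-encoding those bytes gives a plain string that could be indexed as a term and generated the same way from the query text.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical helper: turns text into an accent-insensitive term by
// hex-encoding a primary-strength collation (sort) key.
final class SortKeyNormalizer {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    static String sortKeyTerm(String text) {
        // Assumption: the root locale's rules are adequate; a real index
        // might choose a collator per language instead.
        Collator collator = Collator.getInstance(Locale.ROOT);
        // PRIMARY strength compares base letters only, so accents
        // (secondary) and case (tertiary) differences are ignored.
        collator.setStrength(Collator.PRIMARY);
        // Treat precomposed and decomposed accented characters alike.
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);

        byte[] key = collator.getCollationKey(text).toByteArray();
        StringBuilder sb = new StringBuilder(key.length * 2);
        for (byte b : key) {
            sb.append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "m\u00e2ng\u00e3o" is the example word from the question.
        String indexed = sortKeyTerm("m\u00e2ng\u00e3o");
        String query = sortKeyTerm("mangao");
        System.out.println(indexed.equals(query));
    }
}
```

The same function has to be applied at index time and at query time, since the stored terms are opaque keys rather than readable text. Keys from different collator implementations or versions are not interchangeable, so the index would need rebuilding if the collator changes.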


-- Ken
--
Ken Krugler
TransPac Software, Inc.
<http://www.transpac.com>
+1 530-470-9200

