That's something we can try. I don't know how much performance we'd lose doing that, since our custom filter already has to decompose the tokens to do its operations. So instead of 0..1 conversions we'd be doing 1..2 conversions during indexing and searching.
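For reference, the NFC normalization Robert suggests below can be done with Java's built-in java.text.Normalizer (available since Java 6), applied before the text reaches StandardTokenizer. A minimal sketch; the class name is illustrative:

```java
import java.text.Normalizer;

public class NfcDemo {
    public static void main(String[] args) {
        // Decomposed (NFD-style) input: 'o' followed by combining acute accent U+0301
        String decomposed = "C\u006F\u0301mo";           // 5 chars

        // Compose to NFC so the tokenizer sees the precomposed U+00F3 ('ó')
        String composed = Normalizer.normalize(decomposed, Normalizer.Form.NFC);

        System.out.println(composed.length());           // 4
        System.out.println(Integer.toHexString(composed.charAt(1))); // f3
    }
}
```

Running the same normalization consistently at both index time and query time is what matters; mixing composed and decomposed forms is what causes the mismatch.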
-----Original Message-----
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Saturday, February 21, 2009 8:35 AM
To: java-user@lucene.apache.org
Subject: Re: 2.3.2 -> 2.4.0 StandardTokenizer issue

Normalize your text to NFC. Then it will be \u0043 \u00F3 \u006D \u006F and will work...