On Wed, Jan 9, 2013 at 10:57 AM, Steve Rowe <sar...@gmail.com> wrote:
> Trejkaz (and maybe Sai too): ICUTokenizer in Lucene's icu module may be of
> interest to you, along with the token filters in that same module.
>
> - Steve

ICUTokenizer sounds like it implements UAX #29, which is exactly the standard with all the issues I was describing. Unless it does more than that, I would recommend against using it as well.

TX