Robert, what if I need to do additional filtering after CollationKeyFilter, like stopwords removal, abbreviations handling, stemming etc? Will that be possible if I use CollationKeyFilter?
I also noticed CKF creates a String out of the char[]. If the code already does that, why not use String.toLowerCase(Locale)? Shai On Mon, Nov 30, 2009 at 9:46 PM, Simon Willnauer < [email protected]> wrote: > On Mon, Nov 30, 2009 at 8:08 PM, Robert Muir <[email protected]> wrote: > >> I am not sure if it is worth to add a new TokenFilter for Turkish > language. > >> I see there exist GreekLowerCaseFilter and RussianLowerCaseFilter. It > would > >> be nice to see TurkishLowerCaseFilter in Lucene. > >> > >> > >> > > just to clarify, GreekLowerCaseFilter really shouldn't exist either. The > > final sigma problem it has (where there are two lowercase forms depending > > upon position in word), this is also solved with unicode case folding or > > collation. This is a perfect example of how lowercase is the wrong > operation > > for search. > > > > and RussianLowerCaseFilter is deprecated now, it does the exact same > thing > > as LowerCaseFilter. > btw. we should fix supplementary chars in there too even if it is > deprecated. > > > > > -- > > Robert Muir > > [email protected] > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [email protected] > For additional commands, e-mail: [email protected] > >
