Re: Dubious stuff spotted in LowerCaseFilter

Dawid Weiss Thu, 22 Oct 2015 03:01:31 -0700

> LowerCaseFilter will not handle that. So whereas it is "safe" for
> English hard-coded strings, it isn't safe for all fields you might
> index in general.


This filter is a "safe" fallback that works identically regardless of
the locale you have on your computer (or on the server). This, I
believe, is good and avoids the nasty surprises of a locale-sensitive
environment. Contrary to intuition, locale-sensitive methods are more
often a source of headaches and problems than of whatever value they
provide.

If you process Turkish text, then I think you should be using the
dedicated TurkishLowerCaseFilter, which handles Turkish letter
conversion (dotted and dotless i) better.
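For illustration, the core Turkish rules can be sketched in plain Java.
This is a simplified, hypothetical `TurkishLower.lower` helper, not
Lucene's TurkishLowerCaseFilter source and not its API: capital `I`
becomes dotless `ı` (U+0131), `İ` (U+0130) becomes `i`, and `I` followed
by a combining dot above (U+0307) also becomes `i`. Everything else falls
back to `Character.toLowerCase(int)`.

```java
public class TurkishLower {
    private static final char COMBINING_DOT_ABOVE = '\u0307';
    private static final char DOTLESS_SMALL_I = '\u0131';
    private static final char DOTTED_CAPITAL_I = '\u0130';

    // Simplified Turkish lowercasing (sketch, not Lucene's implementation).
    static String lower(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            int len = Character.charCount(cp);
            if (cp == 'I') {
                boolean dotFollows = i + 1 < s.length()
                        && s.charAt(i + 1) == COMBINING_DOT_ABOVE;
                if (dotFollows) {
                    sb.append('i');
                    len = 2; // also consume the combining dot above
                } else {
                    sb.append(DOTLESS_SMALL_I);
                }
            } else if (cp == DOTTED_CAPITAL_I) {
                sb.append('i');
            } else {
                sb.appendCodePoint(Character.toLowerCase(cp));
            }
            i += len;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(lower("ISPARTA"));        // dotless i + "sparta"
        System.out.println(lower("\u0130STANBUL"));  // "istanbul"
    }
}
```

A locale-independent filter cannot apply these rules, because in most
other languages `I` must lowercase to a plain dotted `i`.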

> Hopefully Unicode will never add a code point which lowercases to one
> with less code units (or I guess changes one of the lower ones to
> lowercase to more than one...)

I agree this is an assumption, one I expect will hold... but if you
care to provide a patch, then a simple test case like the one I
provided would (I believe) be sufficient to ensure this situation is
caught early during automated testing.
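The test case mentioned above is not included in this message, so what
follows is only a hypothetical sketch of a check for the same invariant
(the class and method names are made up). It walks every Unicode code
point and reports the first one whose simple lowercase mapping,
`Character.toLowerCase(int)`, needs a different number of UTF-16 code
units than the original. In-place lowercasing of a `char[]` term buffer
depends on this never happening.

```java
public class LowerCaseWidthCheck {
    // Returns the first code point whose lowercase form has a different
    // UTF-16 length (1 vs. 2 chars), or -1 if no such code point exists.
    static int firstWidthChange() {
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            int lower = Character.toLowerCase(cp);
            if (Character.charCount(cp) != Character.charCount(lower)) {
                return cp;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int cp = firstWidthChange();
        if (cp == -1) {
            System.out.println("OK: lowercasing never changes a code point's UTF-16 length");
        } else {
            System.out.printf("Width change at U+%04X%n", cp);
        }
    }
}
```

Because `Character.toLowerCase(int)` is a one-to-one simple mapping, it
can never expand one code point into several; `String.toLowerCase` can
(for example, `"\u0130".toLowerCase(Locale.ROOT)` yields two chars, `i`
plus U+0307), which is one more reason not to mix the two APIs here.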

Dawid

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
