Re: Dubious stuff spotted in LowerCaseFilter

Trejkaz Thu, 22 Oct 2015 02:54:03 -0700

On Thu, Oct 22, 2015 at 7:05 PM, Uwe Schindler <u...@thetaphi.de> wrote:
> Hi,
>
>> Setting aside the fact that Character.toLowerCase is already dubious in some 
>> locales (e.g. Turkish),
>
> This is not true. Character.toLowerCase() works locale-independent.
> It is only String.toLowerCase that works using default locale.


But if you have a field like "title" and both the user and the system
are Turkish, the user would expect their locale's casing rules to apply
(capital I lowercasing to dotless ı), and LowerCaseFilter will not do
that. So while it is "safe" for hard-coded English strings, it isn't
safe for every field you might index.

Dawid's response shows, though, that at least for the time being,
there is nothing to worry about. Hopefully Unicode will never add a
code point which lowercases to one with fewer code units (or, I guess,
change one of the lower (BMP) ones to lowercase to more than one code
unit...)
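
For anyone who wants to re-check this on their own JDK, a rough sketch of
such a scan follows. It walks every code point and counts those whose
Character.toLowerCase result needs a different number of UTF-16 code
units. The class and method names are made up, and the count depends on
the Unicode tables shipped with the JDK you run it on, so I am not quoting
a result here.

```java
// Hypothetical checker; not part of Lucene or the JDK.
final class LowerCaseLengthCheck {
    // True if lowercasing this code point changes how many UTF-16 code
    // units (1 or 2) are needed to represent it.
    static boolean changesLength(int codePoint) {
        return Character.charCount(codePoint)
                != Character.charCount(Character.toLowerCase(codePoint));
    }

    // Count all code points for which changesLength is true.
    static int countLengthChanging() {
        int count = 0;
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (changesLength(cp)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("Code points whose lowercase form changes UTF-16 length: "
                + countLengthChanging());
    }
}
```

If that count ever becomes non-zero, in-place lowercasing of a char
buffer would no longer be safe without resizing.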

TX

