RE: Dubious stuff spotted in LowerCaseFilter

Uwe Schindler Thu, 22 Oct 2015 01:06:40 -0700

Hi,

> Setting aside the fact that Character.toLowerCase is already dubious in some 
> locales (e.g. Turkish),


This is not true. Character.toLowerCase() is locale-independent. Only
String.toLowerCase() (the variant without a Locale argument) uses the default locale.
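A minimal sketch of that distinction, assuming Java 7 or later (for Locale.forLanguageTag); the class name is hypothetical. It temporarily makes Turkish the JVM default and compares the two APIs on the letter I, whose Turkish lowercase form is the dotless ı (U+0131).

```java
import java.util.Locale;

public class LocaleLowerCaseDemo {
    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        Locale previous = Locale.getDefault();
        try {
            // Make Turkish the JVM default to show which API consults it.
            Locale.setDefault(turkish);

            // Character.toLowerCase has no Locale parameter, so 'I' still maps to 'i'.
            System.out.println(Character.toLowerCase('I') == 'i');                // true

            // The no-argument String.toLowerCase() uses Locale.getDefault();
            // under Turkish rules 'I' becomes dotless i (U+0131).
            System.out.println("TITLE".toLowerCase().equals("t\u0131tle"));       // true

            // Passing Locale.ROOT gives locale-neutral results for String too.
            System.out.println("TITLE".toLowerCase(Locale.ROOT).equals("title")); // true
        } finally {
            Locale.setDefault(previous); // restore the original default locale
        }
    }
}
```

In other words, code that must not depend on the platform locale should either work per code point with Character.toLowerCase or pass an explicit Locale (such as Locale.ROOT) to String.toLowerCase.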

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -----Original Message-----
> From: Trejkaz [mailto:trej...@trypticon.org]
> Sent: Thursday, October 22, 2015 7:15 AM
> To: Lucene Users Mailing List
> Subject: Dubious stuff spotted in LowerCaseFilter
> 
> Hi all.
> 
> LowerCaseFilter uses CharacterUtils.toLowerCase to perform its work.
> The latter method looks like this:
> 
> public final void toLowerCase(final char[] buffer, final int offset, final int limit)
> {
>   assert buffer.length >= limit;
>   assert offset <=0 && offset <= buffer.length;
>   for (int i = offset; i < limit;) {
>     i += Character.toChars(
>             Character.toLowerCase(
>                 codePointAt(buffer, i, limit)), buffer, i);
>   }
> }
> 
> Setting aside the fact that Character.toLowerCase is already dubious in some
> locales (e.g. Turkish), I notice that this is using the same "i" index counter
> to refer to both the source offset and the destination offset. So basically,
> this code has an undocumented assumption that Character.toLowerCase always
> returns a code point which takes up the same number of chars (UTF-16 code
> units) as the original one.
> 
> While I suppose this is probably the case, did someone actually verify it,
> say, by iterating over all code points? How confident are we that this will
> continue to be the case forever? :)
> 
> TX
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
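The assumption raised in the quoted message can be checked mechanically on a given JVM. The sketch below (class and method names are hypothetical) re-implements the quoted loop outside Lucene, using the JDK's Character.codePointAt(char[], int, int) in place of Lucene's codePointAt helper, adds an explicit guard that fails if a mapping would change the UTF-16 length, and scans every code point as suggested. It also assumes the quoted assertion "offset <=0" was meant to read "0 <= offset". The scan only reflects the Unicode data bundled with the JVM it runs on, so it cannot say anything about future versions.

```java
import java.util.ArrayList;
import java.util.List;

public class LowerCaseCheck {

    // Hypothetical standalone version of the quoted loop: lowercases
    // buffer[offset, limit) in place, reusing one index for reading and
    // writing, and throws if a mapping would change the UTF-16 length.
    static void toLowerCase(char[] buffer, int offset, int limit) {
        assert buffer.length >= limit;
        assert 0 <= offset && offset <= buffer.length; // assumed intent of "offset <=0"
        for (int i = offset; i < limit;) {
            int cp = Character.codePointAt(buffer, i, limit);
            int lower = Character.toLowerCase(cp);
            // Make the shared-index assumption explicit instead of relying on it silently.
            if (Character.charCount(lower) != Character.charCount(cp)) {
                throw new IllegalStateException(
                        String.format("U+%04X changes UTF-16 length when lowercased", cp));
            }
            // toChars writes 1 or 2 chars at i and returns how many it wrote.
            i += Character.toChars(lower, buffer, i);
        }
    }

    // Exhaustive check: every code point whose lowercase mapping occupies a
    // different number of chars (UTF-16 code units) than the original.
    static List<Integer> findLengthChangingCodePoints() {
        List<Integer> mismatches = new ArrayList<>();
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (Character.charCount(Character.toLowerCase(cp)) != Character.charCount(cp)) {
                mismatches.add(cp);
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        List<Integer> mismatches = findLengthChangingCodePoints();
        System.out.println("Length-changing code points on this JVM: " + mismatches.size());
        for (int cp : mismatches) {
            System.out.printf("U+%04X -> U+%04X%n", cp, Character.toLowerCase(cp));
        }
    }
}
```

Running main reports how many code points (if any) would break the shared-index loop on the current JDK. Because the answer depends on the Unicode version shipped with the JVM, a check like this would have to be rerun, for example as a unit test, after each JDK upgrade.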
