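I think the issue here is what happens if an "uppercase" code point requires a surrogate pair and its lowercase counterpart does not (or the other way around). The loop advances the single index by the number of chars written, not the number read, so the read and write positions would drift apart and neighbouring chars would end up re-read or overwritten.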
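A rough sketch of the "iterate all code points" check suggested in the original message, assuming only the standard java.lang.Character API. The class and method names (CaseWidthCheck, sameWidth, findWidthChanges) are made up for illustration and are not part of Lucene. It lists every code point whose lowercase mapping needs a different number of UTF-16 chars:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not Lucene code.
class CaseWidthCheck {

    /** True if cp and Character.toLowerCase(cp) occupy the same number of UTF-16 chars. */
    static boolean sameWidth(int cp) {
        return Character.charCount(cp) == Character.charCount(Character.toLowerCase(cp));
    }

    /** Collects all code points for which lowercasing changes the UTF-16 length. */
    static List<Integer> findWidthChanges() {
        List<Integer> mismatches = new ArrayList<>();
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (!sameWidth(cp)) {
                mismatches.add(cp);
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        List<Integer> mismatches = findWidthChanges();
        for (int cp : mismatches) {
            System.out.printf("U+%04X -> U+%04X%n", cp, Character.toLowerCase(cp));
        }
        System.out.println(mismatches.size() + " code point(s) change width when lowercased");
    }
}
```

The result depends on the Unicode tables shipped with the JDK it runs on, so an empty list today says nothing about future Unicode versions. If the assumption matters, it is probably better kept as a unit test than as a one-off check.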
Dawid

On Thu, Oct 22, 2015 at 10:05 AM, Uwe Schindler <u...@thetaphi.de> wrote:

> Hi,
>
> > Setting aside the fact that Character.toLowerCase is already dubious
> > in some locales (e.g. Turkish),
>
> This is not true. Character.toLowerCase() works locale-independent. It is
> only String.toLowerCase that works using default locale.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -----Original Message-----
> > From: Trejkaz [mailto:trej...@trypticon.org]
> > Sent: Thursday, October 22, 2015 7:15 AM
> > To: Lucene Users Mailing List
> > Subject: Dubious stuff spotted in LowerCaseFilter
> >
> > Hi all.
> >
> > LowerCaseFilter uses CharacterUtils.toLowerCase to perform its work.
> > The latter method looks like this:
> >
> >   public final void toLowerCase(final char[] buffer, final int offset, final int limit) {
> >     assert buffer.length >= limit;
> >     assert offset <=0 && offset <= buffer.length;
> >     for (int i = offset; i < limit;) {
> >       i += Character.toChars(
> >           Character.toLowerCase(
> >               codePointAt(buffer, i, limit)), buffer, i);
> >     }
> >   }
> >
> > Setting aside the fact that Character.toLowerCase is already dubious
> > in some locales (e.g. Turkish), I notice that this is using the same
> > "i" index counter to refer to both the source offset and the
> > destination offset. So basically, this code has an undocumented
> > assumption that Character.toLowerCase always returns a code point
> > which takes up the same number of characters as the original one.
> >
> > Whereas I do suppose that this might be the case, did someone
> > actually verify it? Say, by iterating all code points or something?
> > How confident are we that this will continue to be the case forever?
> > :)
> >
> > TX
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org