This sounds like a side effect of String.toLowerCase() and Character.toLowerCase() giving different, possibly locale-dependent, results.
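A minimal sketch of that difference, using the string from the test quoted in this message. The class and method names are made up for illustration, and perCharLower() only imitates per-character lowercasing; it is not LowerCaseFilter's actual code. Locale.ROOT stands in for a non-Turkish default locale, which is an assumption about the machine the original test ran on.

```java
import java.util.Locale;

// Hypothetical demo class, not part of Lucene or the original test.
class LowerCaseComparison {

    // Lowercases one char at a time. Character.toLowerCase(char) is a
    // one-to-one mapping and takes no locale into account.
    public static String perCharLower(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            sb.append(Character.toLowerCase(s.charAt(i)));
        }
        return sb.toString();
    }

    // Renders each UTF-16 code unit as U+XXXX so combining marks are visible.
    public static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format(Locale.ROOT, "U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String str = "V\u0130K\u0130PED\u0130"; // same input as the quoted test
        Locale turkish = Locale.forLanguageTag("tr");

        System.out.println("per char   : " + codeUnits(perCharLower(str)));
        System.out.println("Locale.ROOT: " + codeUnits(str.toLowerCase(Locale.ROOT)));
        System.out.println("Turkish    : " + codeUnits(str.toLowerCase(turkish)));
    }
}
```

On the JDKs I am aware of, the Locale.ROOT line should show a U+0307 (combining dot above) after each lowercased i, while the per-character line and the Turkish line should not. That would match the "lacks some of the combining marks" symptom if the test ran under a non-Turkish default locale.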
http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#toLowerCase() specifically mentions Turkish. A Google search for "Character.toLowerCase() turkish" gets hits which sound relevant.

--
Ian.

On Fri, Nov 30, 2012 at 3:30 AM, Trejkaz <trej...@trypticon.org> wrote:
> Hi all.
>
> trying to figure out what I was doing wrong in some of my own code so
> I looked to LowerCaseFilter since I thought I remembered it doing this
> correctly, and lo and behold, it failed the same test I had written.
>
> Is this a bug or an intentional difference in behaviour?
>
>     @Test
>     public void testConsistencyWithStringClass() {
>         // "Wikipedia" in Turkish, in uppercase.
>         String str = "V\u0130K\u0130PED\u0130";
>         TokenStream stream = new LowerCaseFilter(Version.LUCENE_36,
>             new WhitespaceTokenizer(Version.LUCENE_36, new StringReader(str)));
>         assertTrue(stream.incrementToken());
>         assertEquals(str.toLowerCase(),
>             stream.getAttribute(CharTermAttribute.class).toString());
>     }
>
> This test fails on the assertEquals() because the actual string which
> comes back lacks some of the combining marks.
>
> The reason is that LowerCaseFilter is using Character.toLowerCase(),
> which is exactly the method causing the bug I'm experiencing in my own
> code, because equalsIgnoreCase() is using it and it's giving
> questionable results.
>
> TX
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org