Hi all. I was trying to figure out what I was doing wrong in some of my own code, so I looked at LowerCaseFilter, since I thought I remembered it doing this correctly. Lo and behold, it failed the same test I had written.
Is this a bug or an intentional difference in behaviour?

    @Test
    public void testConsistencyWithStringClass() throws Exception {
        // "Wikipedia" in Turkish, in uppercase.
        String str = "V\u0130K\u0130PED\u0130";
        TokenStream stream = new LowerCaseFilter(Version.LUCENE_36,
            new WhitespaceTokenizer(Version.LUCENE_36, new StringReader(str)));
        assertTrue(stream.incrementToken());
        assertEquals(str.toLowerCase(),
            stream.getAttribute(CharTermAttribute.class).toString());
    }

This test fails on the assertEquals() because the string that actually comes back lacks the combining marks that String.toLowerCase() produces.

The reason is that LowerCaseFilter uses Character.toLowerCase(), which is exactly the method causing the bug I'm experiencing in my own code: equalsIgnoreCase() uses it too, and it's giving questionable results.

TX
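For anyone who wants to see the difference without Lucene on the classpath, here is a minimal sketch in plain Java. The class and method names (LowerCaseComparison, lowerPerChar, lowerWholeString) are made up for illustration, and the per-char loop is only my approximation of what a char-by-char filter does, not LowerCaseFilter's actual code. I also pass Locale.ROOT explicitly so the result doesn't depend on the default locale, whereas the test above uses the no-argument toLowerCase().

```java
import java.util.Locale;

public class LowerCaseComparison {

    // Lowercases one UTF-16 char at a time with Character.toLowerCase(char).
    // This is a 1:1 mapping, so it can never emit extra combining marks.
    // (Illustrative only; it does not handle surrogate pairs.)
    static String lowerPerChar(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            sb.append(Character.toLowerCase(s.charAt(i)));
        }
        return sb.toString();
    }

    // Lowercases the whole string, which applies the full Unicode case
    // mappings and may change the length of the string.
    static String lowerWholeString(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // "Wikipedia" in Turkish, in uppercase (U+0130 is dotted capital I).
        String str = "V\u0130K\u0130PED\u0130";
        String perChar = lowerPerChar(str);
        String whole = lowerWholeString(str);

        System.out.println("per-char length:     " + perChar.length());
        System.out.println("whole-string length: " + whole.length());
        System.out.println("equal: " + perChar.equals(whole));
    }
}
```

With a non-Turkish locale, the whole-string version maps U+0130 to "i" followed by U+0307 (combining dot above), while the per-char version maps it to a plain "i", so the two results differ in both length and content.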