Difference in behaviour between LowerCaseFilter and String.toLowerCase()

Trejkaz Thu, 29 Nov 2012 19:31:12 -0800

Hi all.

I was trying to figure out what I was doing wrong in some of my own code,
so I looked at LowerCaseFilter, since I thought I remembered it doing this
correctly. Lo and behold, it failed the same test I had written.


Is this a bug or an intentional difference in behaviour?

    @Test
    public void testConsistencyWithStringClass() throws IOException {
        // "Wikipedia" in Turkish, in uppercase.
        String str = "V\u0130K\u0130PED\u0130";
        TokenStream stream = new LowerCaseFilter(Version.LUCENE_36,
            new WhitespaceTokenizer(Version.LUCENE_36, new StringReader(str)));
        assertTrue(stream.incrementToken());
        // Note: str.toLowerCase() uses the JVM's default locale.
        assertEquals(str.toLowerCase(),
            stream.getAttribute(CharTermAttribute.class).toString());
    }

This test fails on the assertEquals() because the string which actually
comes back lacks the combining marks that String.toLowerCase() emits
when it lowercases U+0130.

The reason is that LowerCaseFilter uses Character.toLowerCase(), which
is exactly the method behind the bug I'm experiencing in my own code:
equalsIgnoreCase() uses it too, and it gives questionable results.
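
The difference can be reproduced with plain JDK calls, without Lucene. The following is only a minimal sketch: the class name CaseDemo and the helpers lowerPerChar() and codeUnits() are made up for illustration, and it assumes the filter's behaviour can be approximated by calling Character.toLowerCase() on each char in turn (it does not handle supplementary code points).

```java
import java.util.Locale;

public class CaseDemo {
    // Lowercases one char at a time (an approximation of a per-character
    // filter), so the result always has the same length as the input.
    static String lowerPerChar(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            sb.append(Character.toLowerCase(s.charAt(i)));
        }
        return sb.toString();
    }

    // Renders a string as space-separated UTF-16 code units, e.g. "U+0069 U+0307".
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "Wikipedia" in Turkish, in uppercase (same string as in the test).
        String str = "V\u0130K\u0130PED\u0130";

        // Per-char mapping: one char in, one char out.
        System.out.println(codeUnits(lowerPerChar(str)));

        // Whole-string mapping with an explicit, locale-neutral locale.
        System.out.println(codeUnits(str.toLowerCase(Locale.ROOT)));

        // Whole-string mapping under Turkish rules.
        System.out.println(codeUnits(str.toLowerCase(Locale.forLanguageTag("tr"))));
    }
}
```

With Locale.ROOT, String.toLowerCase() maps each U+0130 to the two-char sequence U+0069 U+0307, whereas the per-char version can only ever return a single U+0069. Under a Turkish locale, String.toLowerCase() also returns a plain U+0069, so whether the two approaches agree depends on the locale in effect.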

TX

