Dear Tatyana,

As you may know, our (Harmony) implementation simply wraps ICU4J's BreakIterator. The rules of ICU4J's BreakIterator comply with Unicode TR29, which differs from the rules used by the reference implementation (RI).

This is a common issue for most of the classes in "text". If we want our implementation to behave the same as the RI, we would need the RI's rules. However, I think those rules are probably covered by some kind of license. So a better solution may be to wrap ICU4J's implementation for all text (internationalization) classes. As far as I know, ICU4J is specialized for i18n.

Any comments? Thanks a lot.

Please refer to ICU's homepage: http://icu.sourceforge.net/

Richard Liang
China Software Development Lab, IBM



tatyana doubtsova (JIRA) wrote:
java.text.BreakIterator.getSentenceInstance().next() treats '\n' as the end of the sentence
-------------------------------------------------------------------------------------------

         Key: HARMONY-62
         URL: http://issues.apache.org/jira/browse/HARMONY-62
     Project: Harmony
        Type: Bug
  Components: Classlib
    Reporter: tatyana doubtsova


Problem details:
java.text.BreakIterator.getSentenceInstance().next() stops searching for the 
sentence end if a new-line character is found in the text, and returns the 
index just past the new-line character instead. According to the J2SE 1.4.2 
specification, next() should return the boundary following the current boundary.

Code for reproducing Test.java:
import java.text.BreakIterator;
public class Test {
    public static void main(String [] args)
    {
        BreakIterator it = BreakIterator.getSentenceInstance();
        it.setText("One sentence \n on two lines.");
        System.out.println(it.next());
    }
}

Steps to Reproduce:
1. Build Harmony (check-out on 2006-01-30) j2se subset as described in 
README.txt.
2. Compile Test.java using BEA 1.4 javac
javac -d . Test.java
3. Run java using compatible VM (J9)
java -showversion Test

Output:
java version 1.4.2 (subset)
(c) Copyright 1991, 2005 The Apache Software Foundation or its licensors, as 
applicable.
14

Output on BEA 1.4.2 to compare with:
28
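
To compare implementations more closely, it may help to list every boundary 
rather than only the first one. The following is a minimal sketch, not part of 
the original report: the class name SentenceBoundaries and the helper 
boundaries() are hypothetical, and the code uses generics, so it assumes Java 5 
or later rather than the 1.4 toolchain above. No expected output is shown 
because it depends on the break rules of the class library in use.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SentenceBoundaries {

    // Collects every boundary the iterator reports, walking from first()
    // until next() returns BreakIterator.DONE.
    static List<Integer> boundaries(BreakIterator it, String text) {
        it.setText(text);
        List<Integer> result = new ArrayList<Integer>();
        for (int b = it.first(); b != BreakIterator.DONE; b = it.next()) {
            result.add(b);
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "One sentence \n on two lines.";
        // The printed list depends on the sentence-break rules in use.
        System.out.println(
                boundaries(BreakIterator.getSentenceInstance(Locale.US), text));
    }
}
```

Whatever the rules, the list should start at 0 and end at the text length (28 
here); an extra entry in between shows where an implementation splits the 
sentence.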

Suggested junit test case:

package org.apache.harmony.tests.java.text;

import java.text.BreakIterator;
import java.util.Locale;

import junit.framework.TestCase;

public class BreakIteratorTest extends TestCase {

        public void test_next() {
                // Regression test for HARMONY-30
                BreakIterator bi = BreakIterator.getWordInstance(Locale.US);
                bi.setText("This is the test, WordInstance");
                int n = bi.first();
                n = bi.next();
                assertEquals("Assert 0: next() returns incorrect value ", 4, n);
                // Regression test for the current issue
                bi = BreakIterator.getSentenceInstance();
                bi.setText("One sentence \n on two lines.");
                n = bi.next();
                assertEquals("Assert 1: next() returns incorrect value ", 28, n);
        }
}

