Dear Tatyana,
As you may know, our (Harmony) implementation just wraps ICU4J's
BreakIterator. The rules of ICU4J's BreakIterator comply with
Unicode TR29, which differs from the rules used by the RI.
This is a common issue for most of the classes in "text". If we want
our implementation to behave the same as the RI, we would need the RI's
rules. However, I expect those rules are covered by some kind of
license. So a better solution may be to wrap ICU4J's implementation for
all text (internationalization) classes. As far as I know, ICU4J
specializes in i18n.
Any comments? Thanks a lot.
Please refer to ICU's homepage: http://icu.sourceforge.net/
Richard Liang
China Software Development Lab, IBM
tatyana doubtsova (JIRA) wrote:
java.text.BreakIterator.getSentenceInstance().next() treats '\n' as the end of
the sentence
-------------------------------------------------------------------------------------------
Key: HARMONY-62
URL: http://issues.apache.org/jira/browse/HARMONY-62
Project: Harmony
Type: Bug
Components: Classlib
Reporter: tatyana doubtsova
Problem details:
java.text.BreakIterator.getSentenceInstance().next() stops searching for the
sentence end if a new-line character is found in the text, and returns the
index of the last seen non-whitespace character. According to the J2SE 1.4.2
specification, next() should return the boundary following the current boundary.
Code for reproducing Test.java:
import java.text.BreakIterator;
public class Test {
    public static void main(String[] args)
    {
        BreakIterator it = BreakIterator.getSentenceInstance();
        it.setText("One sentence \n on two lines.");
        System.out.println(it.next());
    }
}
Steps to Reproduce:
1. Build Harmony (check-out on 2006-01-30) j2se subset as described in
README.txt.
2. Compile Test.java using BEA 1.4 javac
javac -d . Test.java
3. Run java using compatible VM (J9)
java -showversion Test
Output:
java version 1.4.2 (subset)
(c) Copyright 1991, 2005 The Apache Software Foundation or its licensors, as
applicable.
14
Output on BEA 1.4.2 to compare with:
28
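When comparing the two outputs above, it can help to list every boundary an implementation reports rather than only the first result of next(). The following is a minimal sketch, not part of the original report: the class name BoundaryDump and the helper boundaries() are hypothetical, and it uses generics, so it assumes a Java 5 or later compiler rather than the 1.4 toolchain mentioned in the steps. It also prints the length of the test string, which is 28, the value the RI returns.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Harmony or the RI.
public class BoundaryDump {

    // Collects every boundary the iterator reports for the given text,
    // starting at first() and calling next() until DONE is returned.
    static List<Integer> boundaries(BreakIterator it, String text) {
        it.setText(text);
        List<Integer> result = new ArrayList<>();
        for (int b = it.first(); b != BreakIterator.DONE; b = it.next()) {
            result.add(b);
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "One sentence \n on two lines.";
        // The text is 28 characters long; the '\n' sits at index 13.
        System.out.println("length = " + text.length());
        // The boundaries printed here depend on the implementation's rules.
        System.out.println("sentence boundaries = "
                + boundaries(BreakIterator.getSentenceInstance(Locale.US), text));
    }
}
```

Running this on both class libraries should show whether an extra boundary is reported at index 14, immediately after the new-line character. No expected boundary list is given here because it depends on the rule set in use.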
Suggested junit test case:
package org.apache.harmony.tests.java.text;
import java.text.BreakIterator;
import java.util.Locale;
import junit.framework.TestCase;
public class BreakIteratorTest extends TestCase {
    public void test_next() {
        // Regression test for HARMONY-30
        BreakIterator bi = BreakIterator.getWordInstance(Locale.US);
        bi.setText("This is the test, WordInstance");
        int n = bi.first();
        n = bi.next();
        assertEquals("Assert 0: next() returns incorrect value ", 4, n);
        // Regression test for the current issue
        bi = BreakIterator.getSentenceInstance();
        bi.setText("One sentence \n on two lines.");
        n = bi.next();
        assertEquals("Assert 1: next() returns incorrect value ", 28, n);
    }
}