Re: [classlib][luni]difference between RI and ICU
Hello,

I have raised a bug for ICU [1]; I'm not sure whether it will be fixed in ICU4J 3.6. :-)

[1] http://bugs.icu-project.org/cgi-bin/icu-bugs/incoming?findid=5391

On 9/12/06, Richard Liang [EMAIL PROTECTED] wrote:
> Hello, I will clarify this issue with the ICU team. ;-)
>
> Best regards,
> Richard

--
Richard Liang
China Development Lab, IBM

- Terms of use : http://incubator.apache.org/harmony/mailing.html
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
[classlib][luni]difference between RI and ICU
I encountered a problem when implementing isWhitespace(int) in j.l.Character: the RI and ICU disagree.

The RI spec says:
  It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', '\u2007', '\u202F').

but the ICU spec says:
  It is a Unicode space separator (category Zs), but is not a no-break space (\u00A0 or \u202F or \uFEFF).

So the RI excludes U+2007, whereas ICU excludes U+FEFF instead.

I looked up the definitions of these four characters on unicode.org:
  00A0;NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;NON-BREAKING SPACE;;;;
  2007;FIGURE SPACE;Zs;0;WS;<noBreak> 0020;;;;N;;;;;
  202F;NARROW NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;;;;;
  FEFF;ZERO WIDTH NO-BREAK SPACE;Cf;0;BN;;;;;N;BYTE ORDER MARK;;;;

I consider this a bug in ICU, because U+FEFF is not in category *Zs* as the ICU spec describes, and I propose reporting it to the ICU team.

Should we handle U+2007 ourselves to follow the RI, or just document this problem in the test case?

--
Tony Wu
China Software Development Lab, IBM
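To make the comparison concrete, here is a minimal Java sketch that encodes the two rules exactly as quoted above and evaluates them on the four code points. It is an illustration under assumptions: `WhitespaceRules`, `riRule` and `icuRule` are hypothetical names, the category test relies on the JDK's own `Character.getType(int)` data, and neither method is Harmony's or ICU4J's actual implementation. The full `Character.isWhitespace` Javadoc also accepts several ASCII control characters, which are left out here.

```java
// Sketch: evaluates the whitespace rule quoted from the RI spec and the rule
// quoted from the ICU spec on the four code points discussed in this thread.
public class WhitespaceRules {

    // RI rule as quoted: a Unicode space character (Zs, Zl or Zp) that is not
    // one of the non-breaking spaces U+00A0, U+2007, U+202F.
    // (The full Javadoc also accepts ASCII controls such as tab and line feed;
    // they are omitted in this sketch.)
    public static boolean riRule(int cp) {
        int type = Character.getType(cp);
        boolean spaceChar = type == Character.SPACE_SEPARATOR
                || type == Character.LINE_SEPARATOR
                || type == Character.PARAGRAPH_SEPARATOR;
        return spaceChar && cp != 0x00A0 && cp != 0x2007 && cp != 0x202F;
    }

    // ICU rule as quoted: category Zs, excluding U+00A0, U+202F and U+FEFF.
    public static boolean icuRule(int cp) {
        return Character.getType(cp) == Character.SPACE_SEPARATOR
                && cp != 0x00A0 && cp != 0x202F && cp != 0xFEFF;
    }

    public static void main(String[] args) {
        int[] codePoints = {0x00A0, 0x2007, 0x202F, 0xFEFF};
        for (int cp : codePoints) {
            boolean isZs = Character.getType(cp) == Character.SPACE_SEPARATOR;
            System.out.printf("U+%04X Zs=%b ri=%b icu=%b%n",
                    cp, isZs, riRule(cp), icuRule(cp));
        }
    }
}
```

Running `main` prints:

```
U+00A0 Zs=true ri=false icu=false
U+2007 Zs=true ri=false icu=true
U+202F Zs=true ri=false icu=false
U+FEFF Zs=false ri=false icu=false
```

Among these four code points the only disagreement is U+2007. Excluding U+FEFF has no effect under the quoted ICU rule, because U+FEFF is Cf rather than Zs.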
Re: [classlib][luni]difference between RI and ICU
Hello, I will clarify this issue with the ICU team. ;-)

Best regards,
Richard

Tony Wu wrote:
> I consider this a bug in ICU, because U+FEFF is not in category *Zs* as the ICU spec describes, and I propose reporting it to the ICU team.

--
Richard Liang
China Software Development Lab, IBM
Re: [classlib][luni]difference between RI and ICU
Tony Wu wrote:
> The RI spec says: It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', '\u2007', '\u202F').

Anyway, the spec is the first rule we follow.

> but the ICU spec says: It is a Unicode space separator (category Zs), but is not a no-break space (\u00A0 or \u202F or \uFEFF).
> So the RI excludes U+2007, whereas ICU excludes U+FEFF instead.
> I looked up the definitions of these four characters on unicode.org.

So cool... :-)

> I consider this a bug in ICU, because U+FEFF is not in category *Zs* as the ICU spec describes, and I propose reporting it to the ICU team.
> Should we handle U+2007 ourselves to follow the RI, or just document this problem in the test case?

IMO, it's natural to follow the RI; the challenge is to fix this gracefully on top of the ICU implementation.

--
Robert Hu
China Software Development Lab, IBM
Re: [classlib][luni]difference between RI and ICU
Robert Hu wrote:
> Tony Wu wrote:
>> The RI spec says: It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', '\u2007', '\u202F').
>
> Anyway, the spec is the first rule we follow.

The information from unicode.org is also a spec, and a more official one. Since the RI follows unicode.org, we should follow the RI as well, which in turn means following unicode.org.

--
Spark Shen
China Software Development Lab, IBM
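Spark's point can be checked mechanically against the records Tony quoted. The sketch below assumes that a "non-breaking space" in the RI's sense means a Zs character whose UnicodeData.txt decomposition carries the `<noBreak>` tag; that reading is our assumption, not something either spec states. `NoBreakSpaces` is a hypothetical helper, and only the four records from this thread are embedded rather than the whole file.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: derives the "non-breaking space" exclusion set from
// UnicodeData.txt-style records, assuming it means category Zs with a
// <noBreak> decomposition.
public class NoBreakSpaces {

    // The four records discussed in this thread (not the full file).
    public static final String[] RECORDS = {
        "00A0;NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;NON-BREAKING SPACE;;;;",
        "2007;FIGURE SPACE;Zs;0;WS;<noBreak> 0020;;;;N;;;;;",
        "202F;NARROW NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;;;;;",
        "FEFF;ZERO WIDTH NO-BREAK SPACE;Cf;0;BN;;;;;N;BYTE ORDER MARK;;;;",
    };

    public static List<String> noBreakSpaceSeparators(String[] records) {
        List<String> result = new ArrayList<>();
        for (String record : records) {
            // limit -1 keeps trailing empty fields
            String[] fields = record.split(";", -1);
            String code = fields[0];
            String category = fields[2];       // general category
            String decomposition = fields[5];  // decomposition mapping
            if (category.equals("Zs") && decomposition.startsWith("<noBreak>")) {
                result.add("U+" + code);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(noBreakSpaceSeparators(RECORDS));
    }
}
```

For these four records it prints `[U+00A0, U+2007, U+202F]`, which matches the RI's exclusion list; U+FEFF drops out because its category is Cf.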
Re: [classlib][luni]difference between RI and ICU
On 9/12/06, Tony Wu [EMAIL PROTECTED] wrote:
> Should we handle U+2007 ourselves to follow the RI, or just document this problem in the test case?

I think we could apply a workaround first, put a FIXME comment before it, and write a corresponding test case. When the ICU team responds (whether it accepts or rejects the report), we can make a decision then.

--
Andrew Zhang
China Software Development Lab, IBM
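A minimal Java sketch of the approach Andrew describes: keep delegating to ICU, patch only the known discrepancy behind a FIXME, and pin the expected RI behaviour in a test. `icuIsWhitespace` here is a hypothetical stand-in that emulates only the Zs rule quoted from the ICU spec in this thread; in Harmony it would be a call into ICU4J, whose real behaviour may differ. The class and method names are illustrative, not Harmony's.

```java
// Sketch of the suggested workaround: delegate to ICU, patch the known
// discrepancy, and mark the patch with FIXME until ICU responds.
public class CharacterWhitespace {

    // HYPOTHETICAL stand-in for ICU's isWhitespace(int). It emulates only the
    // rule quoted from the ICU spec (Zs, minus U+00A0, U+202F, U+FEFF).
    public static boolean icuIsWhitespace(int cp) {
        return Character.getType(cp) == Character.SPACE_SEPARATOR
                && cp != 0x00A0 && cp != 0x202F && cp != 0xFEFF;
    }

    public static boolean isWhitespace(int codePoint) {
        // FIXME: per the ICU spec quoted in this thread, ICU treats U+2007
        // FIGURE SPACE as whitespace, but the RI spec excludes it as a
        // non-breaking space. Remove once the ICU bug report is resolved.
        if (codePoint == 0x2007) {
            return false;
        }
        return icuIsWhitespace(codePoint);
    }

    // Minimal test documenting the expected RI behaviour.
    public static void main(String[] args) {
        check(!isWhitespace(0x2007), "U+2007 must not be whitespace");
        check(!isWhitespace(0x00A0), "U+00A0 must not be whitespace");
        check(!isWhitespace(0x202F), "U+202F must not be whitespace");
        check(isWhitespace(0x0020), "U+0020 must be whitespace");
        System.out.println("all checks passed");
    }

    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }
}
```

Keeping the patch as a single guarded check makes it easy to delete once the ICU bug report is resolved, whichever way it goes.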