Re: [classlib][luni]difference between RI and ICU

2006-09-17 Thread Richard Liang

Hello,

I have raised a bug against ICU [1]; I am not sure whether it will be
fixed in ICU4J 3.6. :-)

[1] http://bugs.icu-project.org/cgi-bin/icu-bugs/incoming?findid=5391

On 9/12/06, Richard Liang [EMAIL PROTECTED] wrote:

Hello,

I will clarify this issue with the ICU team. ;-)

Best regards,
Richard

Tony Wu wrote:
 I encountered a problem while implementing isWhitespace(int) in j.l.Character.
 There is a difference between the RI and ICU.

 The RI spec says:

 It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or
 PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0',
 '\u2007', '\u202F').

 but the ICU spec says:

 It is a Unicode space separator (category Zs), but is not a no-break
 space (\u00A0 or \u202F or \uFEFF).

 The RI excludes U+2007, whereas ICU excludes U+FEFF.

 I also looked up the definitions of these four characters on
 unicode.org:

 00A0;NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;NON-BREAKING SPACE;;;;
 2007;FIGURE SPACE;Zs;0;WS;<noBreak> 0020;;;;N;;;;;
 202F;NARROW NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;;;;;
 FEFF;ZERO WIDTH NO-BREAK SPACE;Cf;0;BN;;;;;N;BYTE ORDER MARK;;;;


 I consider this a bug in ICU, because U+FEFF is not in category *Zs* as
 the ICU spec describes, and I propose to report it to the ICU team.
 Should we handle U+2007 ourselves to follow the RI, or just document
 this problem in the test case?


--
Richard Liang
China Software Development Lab, IBM



-
Terms of use : http://incubator.apache.org/harmony/mailing.html
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





--
Richard Liang
China Development Lab, IBM




[classlib][luni]difference between RI and ICU

2006-09-12 Thread Tony Wu

I encountered a problem while implementing isWhitespace(int) in j.l.Character.
There is a difference between the RI and ICU.

The RI spec says:

It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or
PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0',
'\u2007', '\u202F').

but the ICU spec says:

It is a Unicode space separator (category Zs), but is not a no-break
space (\u00A0 or \u202F or \uFEFF).

The RI excludes U+2007, whereas ICU excludes U+FEFF.

I also looked up the definitions of these four characters on unicode.org:


00A0;NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;NON-BREAKING SPACE;;;;
2007;FIGURE SPACE;Zs;0;WS;<noBreak> 0020;;;;N;;;;;
202F;NARROW NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;;;;;
FEFF;ZERO WIDTH NO-BREAK SPACE;Cf;0;BN;;;;;N;BYTE ORDER MARK;;;;



I consider this a bug in ICU, because U+FEFF is not in category *Zs* as
the ICU spec describes, and I propose to report it to the ICU team.
Should we handle U+2007 ourselves to follow the RI, or just document this
problem in the test case?
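
The general categories in the UnicodeData entries quoted above can be cross-checked from Java itself. What follows is only a minimal sketch, not Harmony code: the class name SpaceCategories and the helper category() are made up for illustration, and the result depends on the Unicode tables shipped with the JDK it runs on.

```java
class SpaceCategories {
    // Map the general category of a code point to its short Unicode label.
    // Only the categories relevant to this discussion are named.
    static String category(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.SPACE_SEPARATOR:     return "Zs";
            case Character.LINE_SEPARATOR:      return "Zl";
            case Character.PARAGRAPH_SEPARATOR: return "Zp";
            case Character.FORMAT:              return "Cf";
            default:                            return "other";
        }
    }

    public static void main(String[] args) {
        // The four code points discussed in this thread.
        int[] codePoints = {0x00A0, 0x2007, 0x202F, 0xFEFF};
        for (int cp : codePoints) {
            System.out.printf("U+%04X %s%n", cp, category(cp));
        }
    }
}
```

If U+FEFF reports Cf rather than Zs, it can never satisfy a rule that starts from "category Zs", which is the inconsistency described above.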

--
Tony Wu
China Software Development Lab, IBM


Re: [classlib][luni]difference between RI and ICU

2006-09-12 Thread Richard Liang

Hello,

I will clarify this issue with the ICU team. ;-)

Best regards,
Richard

Tony Wu wrote:

I encountered a problem while implementing isWhitespace(int) in j.l.Character.
There is a difference between the RI and ICU.

The RI spec says:

It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or
PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0',
'\u2007', '\u202F').

but the ICU spec says:

It is a Unicode space separator (category Zs), but is not a no-break
space (\u00A0 or \u202F or \uFEFF).

The RI excludes U+2007, whereas ICU excludes U+FEFF.

I also looked up the definitions of these four characters on
unicode.org:



00A0;NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;NON-BREAKING SPACE;;;;
2007;FIGURE SPACE;Zs;0;WS;<noBreak> 0020;;;;N;;;;;
202F;NARROW NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;;;;;
FEFF;ZERO WIDTH NO-BREAK SPACE;Cf;0;BN;;;;;N;BYTE ORDER MARK;;;;



I consider this a bug in ICU, because U+FEFF is not in category *Zs* as
the ICU spec describes, and I propose to report it to the ICU team.
Should we handle U+2007 ourselves to follow the RI, or just document
this problem in the test case?



--
Richard Liang
China Software Development Lab, IBM 







Re: [classlib][luni]difference between RI and ICU

2006-09-12 Thread Robert Hu

Tony Wu wrote:

I encountered a problem while implementing isWhitespace(int) in j.l.Character.
There is a difference between the RI and ICU.

The RI spec says:

It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or
PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0',
'\u2007', '\u202F').

Anyway, the spec is the first rule we should follow.

but the ICU spec says:

It is a Unicode space separator (category Zs), but is not a no-break
space (\u00A0 or \u202F or \uFEFF).

The RI excludes U+2007, whereas ICU excludes U+FEFF.

I also looked up the definitions of these four characters on
unicode.org:



00A0;NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;NON-BREAKING SPACE;;;;
2007;FIGURE SPACE;Zs;0;WS;<noBreak> 0020;;;;N;;;;;
202F;NARROW NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;;;;;
FEFF;ZERO WIDTH NO-BREAK SPACE;Cf;0;BN;;;;;N;BYTE ORDER MARK;;;;

So cool... :-)


I consider this a bug in ICU, because U+FEFF is not in category *Zs* as
the ICU spec describes, and I propose to report it to the ICU team.
Should we handle U+2007 ourselves to follow the RI, or just document
this problem in the test case?

IMO, it's natural to follow the RI; the challenge is to fix it
gracefully on top of the ICU implementation.
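
One way to read "follow the RI on top of ICU" is a thin wrapper that special-cases the disputed code point and delegates everything else. The following is a minimal sketch under stated assumptions: WhitespaceWorkaround and QUOTED_ICU_RULE are hypothetical names, the predicate only models the single ICU sentence quoted above (it is not the ICU4J implementation and makes no ICU4J call), and IntPredicate requires Java 8 or later.

```java
import java.util.function.IntPredicate;

class WhitespaceWorkaround {
    // Hypothetical stand-in for the delegate: models only the quoted ICU
    // wording "category Zs, but not U+00A0, U+202F or U+FEFF".
    static final IntPredicate QUOTED_ICU_RULE = cp ->
            Character.getType(cp) == Character.SPACE_SEPARATOR
                    && cp != 0x00A0 && cp != 0x202F && cp != 0xFEFF;

    // Follow the RI spec for U+2007 and defer to the delegate otherwise.
    static boolean isWhitespace(int codePoint, IntPredicate delegate) {
        // FIXME: drop this special case once the delegate excludes U+2007.
        if (codePoint == 0x2007) {
            return false;
        }
        return delegate.test(codePoint);
    }

    public static void main(String[] args) {
        int[] codePoints = {0x0020, 0x00A0, 0x2007, 0x202F, 0xFEFF};
        for (int cp : codePoints) {
            System.out.printf("U+%04X delegate=%b wrapped=%b%n",
                    cp, QUOTED_ICU_RULE.test(cp), isWhitespace(cp, QUOTED_ICU_RULE));
        }
    }
}
```

Keeping the special case in one place means it can be deleted in a single step if the delegate's behaviour changes.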


--
Robert Hu
China Software Development Lab, IBM






Re: [classlib][luni]difference between RI and ICU

2006-09-12 Thread Spark Shen

Robert Hu wrote:

Tony Wu wrote:

I encountered a problem while implementing isWhitespace(int) in j.l.Character.
There is a difference between the RI and ICU.

The RI spec says:

It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or
PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0',
'\u2007', '\u202F').

Anyway, the spec is the first rule we should follow.

The information from unicode.org is also a spec, and unicode.org is more
authoritative. Since the RI follows unicode.org, we should follow the RI,
which in turn means following unicode.org.

but the ICU spec says:

It is a Unicode space separator (category Zs), but is not a no-break
space (\u00A0 or \u202F or \uFEFF).

The RI excludes U+2007, whereas ICU excludes U+FEFF.

I also looked up the definitions of these four characters on
unicode.org:



00A0;NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;NON-BREAKING SPACE;;;;
2007;FIGURE SPACE;Zs;0;WS;<noBreak> 0020;;;;N;;;;;
202F;NARROW NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;;;;;
FEFF;ZERO WIDTH NO-BREAK SPACE;Cf;0;BN;;;;;N;BYTE ORDER MARK;;;;

So cool... :-)


I consider this a bug in ICU, because U+FEFF is not in category *Zs* as
the ICU spec describes, and I propose to report it to the ICU team.
Should we handle U+2007 ourselves to follow the RI, or just document
this problem in the test case?

IMO, it's natural to follow the RI; the challenge is to fix it
gracefully on top of the ICU implementation.





--
Spark Shen
China Software Development Lab, IBM





Re: [classlib][luni]difference between RI and ICU

2006-09-12 Thread Andrew Zhang

On 9/12/06, Tony Wu [EMAIL PROTECTED] wrote:


I encountered a problem while implementing isWhitespace(int) in j.l.Character.
There is a difference between the RI and ICU.

The RI spec says:

 It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or
 PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0',
 '\u2007', '\u202F').

but the ICU spec says:

 It is a Unicode space separator (category Zs), but is not a no-break
 space (\u00A0 or \u202F or \uFEFF).

The RI excludes U+2007, whereas ICU excludes U+FEFF.

I also looked up the definitions of these four characters on
unicode.org:

 00A0;NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;NON-BREAKING SPACE;;;;
 2007;FIGURE SPACE;Zs;0;WS;<noBreak> 0020;;;;N;;;;;
 202F;NARROW NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;;;;;
 FEFF;ZERO WIDTH NO-BREAK SPACE;Cf;0;BN;;;;;N;BYTE ORDER MARK;;;;


I consider this a bug in ICU, because U+FEFF is not in category *Zs* as
the ICU spec describes, and I propose to report it to the ICU team.
Should we handle U+2007 ourselves to follow the RI, or just document this
problem in the test case?



I think we could use a workaround at first, add a FIXME comment before
the workaround, and write a corresponding test case.

Once the ICU team responds (whether it accepts or rejects the report),
we can make a decision.
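
The test case mentioned above could pin down the RI behaviour for the code points under discussion. What follows is only a minimal sketch, assuming a plain-Java check rather than a test framework: WhitespaceRegressionCheck and its methods are hypothetical names, and it runs against the JDK's own java.lang.Character as the reference, which is an assumption about where the expected values come from.

```java
class WhitespaceRegressionCheck {
    // Fail loudly if isWhitespace disagrees with the expected RI result.
    static void check(int codePoint, boolean expected) {
        boolean actual = Character.isWhitespace(codePoint);
        if (actual != expected) {
            throw new AssertionError(String.format(
                    "U+%04X: expected %b but got %b", codePoint, expected, actual));
        }
    }

    static void runAll() {
        check(0x0020, true);   // SPACE, category Zs
        check(0x2028, true);   // LINE SEPARATOR, category Zl
        check(0x00A0, false);  // NO-BREAK SPACE, excluded by the RI spec
        check(0x2007, false);  // FIGURE SPACE, excluded by the RI spec
        check(0x202F, false);  // NARROW NO-BREAK SPACE, excluded by the RI spec
        check(0xFEFF, false);  // ZERO WIDTH NO-BREAK SPACE, category Cf
    }

    public static void main(String[] args) {
        runAll();
        System.out.println("all checks passed");
    }
}
```

The U+2007 line is the one that would fail if the implementation simply forwarded to a rule that does not exclude FIGURE SPACE.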

--

Tony Wu
China Software Development Lab, IBM





--
Andrew Zhang
China Software Development Lab, IBM