Hi Andrew,

On 4/7/06, Andrew Zhang <[EMAIL PROTECTED]> wrote:
>
> Hello, Dmirty,
>
> I agree with you that Harmony's behavior is not consistent with java spec.
> :) As you may know, java.nio.charset.Charset wraps ICU to implement
> encode/decode operations.
>
> The following description is cited from ICU:
> (http://icu.sourceforge.net/userguide/unicodeBasics.html)
>
> *The names "UTF-16" and "UTF-32" are ambiguous. Depending on context, they
> refer either to character encoding forms where 16/32-bit words are processed
> and are naturally stored in the platform endianness, or they refer to the
> IANA-registered charset names, i.e., to character encoding schemes or byte
> serializations. In addition to simple byte serialization, the charsets with
> these names also use optional Byte Order Marks (see Serialized Formats
> <http://icu.sourceforge.net/userguide/unicodeBasics.html#serialized_formats>
> below).*

Thanks, it's a good point. However, I found the following text in the same document, and it makes me think that there is a bug in ICU. Please note the last sentence, which I believe describes our case exactly:

"In UTF-16 and UTF-32, where the signature also distinguishes between big-endian and little-endian byte orders, it is also called a byte order mark (BOM). The signature works for UTF-16 since the code point that has the byte-swapped encoding, FFFE₁₆, will never be a valid Unicode character. (It is a "non-character" code point.) In Internet protocols, if an encoding specification of "UTF-16" or "UTF-32" is used, it is expected that there is a signature byte sequence (BOM) that identifies the byte ordering, which is not the case for the encoding scheme/charset names with "BE" or "LE". If text is specified to be encoded in the UTF-16 or UTF-32 charset and does not begin with a BOM, then it must be interpreted as UTF-16BE or UTF-32BE, respectively."

Harmony and IBM jdk1.4.2 both use ICU to provide the java.nio.charset functionality, so they behave the same way in our case. This behavior does not follow the Java documentation (or I am misunderstanding something :) ).
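For reference, the rule in that last sentence can be checked with a few lines of Java. This is only a minimal sketch, not code from the Harmony test suite: the class name Utf16NoBomCheck and the helper decodeUtf16 are made up for illustration. It relies on the java.nio.charset.Charset javadoc, which says that the UTF-16 charset defaults to big-endian when the input has no byte-order mark.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

// Hypothetical helper class, for illustration only.
public class Utf16NoBomCheck {

    // Decode raw bytes with the plain "UTF-16" charset (not UTF-16BE/LE).
    static String decodeUtf16(byte[] bytes) {
        return Charset.forName("UTF-16").decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        // 'A' (U+0041) in big-endian order, with no BOM in front.
        byte[] noBom = {0x00, 0x41};
        // The same character preceded by a big-endian BOM (FE FF).
        byte[] beBom = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x41};
        // The same character in little-endian order, preceded by FF FE.
        byte[] leBom = {(byte) 0xFF, (byte) 0xFE, 0x41, 0x00};

        // Per the Charset javadoc, all three inputs should decode to "A".
        System.out.println("no BOM: " + "A".equals(decodeUtf16(noBom)));
        System.out.println("BE BOM: " + "A".equals(decodeUtf16(beBom)));
        System.out.println("LE BOM: " + "A".equals(decodeUtf16(leBom)));
    }
}
```

If the "no BOM" line reports false on some runtime while the two BOM cases report true, that would point at the missing big-endian default discussed above.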
Thus, we probably need to ask for this to be fixed in ICU, don't we? What do you think, does it make sense to file a bug against ICU?

Thanks.
--
Dmitry M. Kononov
Intel Managed Runtime Division