Hi Richard,

On 4/6/06, Richard Liang <[EMAIL PROTECTED]> wrote:
>
> And as described in Unicode, UTF-16 can be encoded as either big-endian
> or little-endian; a leading byte sequence corresponding to U+FEFF is
> used to distinguish the two byte orders.
>
> If the leading byte sequence is FE FF, the whole byte sequence will be
> regarded as big-endian.
> If the leading byte sequence is FF FE, the whole byte sequence will be
> regarded as little-endian.
>
> From your test, we can see that Harmony uses little-endian, while the RI
> uses big-endian.
>
> I'm sorry if my explanation confused you :-)
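
To make the detection rule concrete, here is a minimal Java sketch (the
class name is mine, for illustration only). Both arrays carry U+0041 'A',
each preceded by the byte-order mark for its byte order, so a conforming
"UTF-16" decoder prints "A" twice:

import java.io.UnsupportedEncodingException;

public class BomDetectionSketch {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // U+0041 'A' serialized both ways, each led by the matching BOM.
        byte[] bigEndian    = { (byte) 0xFE, (byte) 0xFF, 0x00, 0x41 };
        byte[] littleEndian = { (byte) 0xFF, (byte) 0xFE, 0x41, 0x00 };
        // The "UTF-16" decoder consumes the BOM, picks the byte order it
        // indicates, and decodes the rest accordingly: both lines print "A".
        System.out.println(new String(bigEndian, "UTF-16"));
        System.out.println(new String(littleEndian, "UTF-16"));
    }
}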


I absolutely agree with you. Thanks a lot for your explanation, and sorry
for the brief description of the issue.

As you correctly noticed, the cause of this issue is that Harmony uses the
little-endian byte order when an encoded UTF-16 sequence has no byte-order
mark. However, the spec covers exactly this case:

"When decoding, the UTF-16 charset interprets a byte-order mark to indicate
the byte order of the stream but defaults to big-endian if there is no
byte-order mark; when encoding, it uses big-endian byte order and writes a
big-endian byte-order mark."
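
Here is a minimal sketch of the failing case (again, the class name is
for illustration only). Two BOM-less bytes are decoded with the "UTF-16"
charset; a decoder that follows the spec defaults to big-endian and
reports U+41, while one that assumes little-endian, as Harmony currently
does, reports U+4100:

import java.io.UnsupportedEncodingException;

public class NoBomDefaultSketch {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // No byte-order mark: these bytes are U+0041 'A' if read
        // big-endian, but U+4100 if read little-endian.
        byte[] bomless = { 0x00, 0x41 };
        String decoded = new String(bomless, "UTF-16");
        // Per the spec the decoder must default to big-endian,
        // so this should print "U+41".
        System.out.println("U+" + Integer.toHexString(decoded.charAt(0)));
    }
}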

Thanks.

--
Dmitry M. Kononov
Intel Managed Runtime Division
