Hi Richard,

On 4/6/06, Richard Liang <[EMAIL PROTECTED]> wrote:
>
> And as described in Unicode, UTF-16 can be encoded as either big-endian
> or little-endian, but a leading byte sequence corresponding to U+FEFF
> will be used to distinguish the two byte orders.
>
> If the leading byte sequence is FE FF, the whole byte sequence will be
> regarded as big-endian.
> If the leading byte sequence is FF FE, the whole byte sequence will be
> regarded as little-endian.
>
> From your test, we can see that Harmony uses little-endian, while the
> RI uses big-endian.
>
> I'm sorry if my explanation makes you confused :-)
I absolutely agree with you. Thanks a lot for your explanation, and sorry
for my brief description of the issue. As you correctly noticed, the cause
of this issue is that Harmony uses the little-endian byte order when an
encoded UTF-16 sequence has no byte-order mark. However, the spec covers
exactly this case:

"When decoding, the UTF-16 charset interprets a byte-order mark to
indicate the byte order of the stream but defaults to big-endian if there
is no byte-order mark; when encoding, it uses big-endian byte order and
writes a big-endian byte-order mark."

Thanks.

--
Dmitry M. Kononov
Intel Managed Runtime Division
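P.S. In case it helps, here is a minimal self-contained sketch of the
behaviour the spec describes (the class name Utf16BomDemo is just mine for
illustration). On the RI it should print "A" twice and then the big-endian
byte-order mark FE FF followed by 00 41:

    public class Utf16BomDemo {
        public static void main(String[] args) throws Exception {
            // No byte-order mark: per the spec, the "UTF-16" charset
            // must default to big-endian when decoding.
            byte[] noBom = {0x00, 0x41};  // 'A' (U+0041), big-endian
            System.out.println(new String(noBom, "UTF-16"));  // expected: A

            // A leading FF FE mark switches decoding to little-endian.
            byte[] leBom = {(byte) 0xFF, (byte) 0xFE, 0x41, 0x00};
            System.out.println(new String(leBom, "UTF-16"));  // expected: A

            // Encoding: a big-endian byte-order mark is written first.
            byte[] encoded = "A".getBytes("UTF-16");
            for (int i = 0; i < encoded.length; i++) {
                System.out.printf("%02X ", encoded[i]);  // expected: FE FF 00 41
            }
            System.out.println();
        }
    }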