On 1/29/16 5:36 PM, Marek Janukowicz wrote:
I have trouble understanding how endianness works for UTF-16.

For example, the UTF-16 code for the 'ł' character is 0x0142. But this program shows
otherwise:

import std.stdio;

public void main () {
    ubyte[] properOrder = [0x01, 0x42];
    ubyte[] reverseOrder = [0x42, 0x01];
    writefln( "proper: %s, reverse: %s",
            cast(wchar[])properOrder,
            cast(wchar[])reverseOrder );
}

output:

proper: 䈁, reverse: ł

Is there anything I should know about UTF endianness?

It's not any different from endianness anywhere else.

In other words, a UTF-16 code unit is expected to be in the endianness of the platform you are running on.

If you are on x86 or x86_64 (very likely), then it should be little endian.
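For instance, you can confirm the native byte order at run time with std.system.endian; a minimal sketch:

import std.stdio;
import std.system : endian, Endian;

void main () {
    // On x86/x86_64 this prints "littleEndian".
    writeln( endian == Endian.littleEndian ? "littleEndian" : "bigEndian" );
}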

If your source of data is big-endian (or opposite from your native endianness), it will have to be converted before being treated as a wchar[].
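As an example, here is a minimal sketch of converting big-endian UTF-16 bytes into a native wchar[] (the function name fromBigEndianUTF16 is just illustrative):

import std.stdio;

// Assembles each big-endian byte pair into a native-order code unit.
wchar[] fromBigEndianUTF16 (const(ubyte)[] bytes) {
    assert( bytes.length % 2 == 0, "UTF-16 data must have an even byte count" );
    auto result = new wchar[](bytes.length / 2);
    foreach (i, ref c; result) {
        c = cast(wchar)( (bytes[2 * i] << 8) | bytes[2 * i + 1] );
    }
    return result;
}

void main () {
    ubyte[] bigEndianData = [0x01, 0x42]; // 'ł' stored big-endian
    writeln( fromBigEndianUTF16( bigEndianData ) ); // prints ł
}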

Note that the version identifiers BigEndian and LittleEndian can be used to compile the correct code for each case.
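A sketch of that, assuming the incoming data is UTF-16LE and using std.bitmanip.swapEndian (the function name here is again illustrative):

import std.bitmanip : swapEndian;

// Reinterprets a buffer of UTF-16LE bytes as native wchar[], swapping only
// when the native order differs from the source order.
wchar[] fromLittleEndianUTF16 (ubyte[] bytes) {
    auto units = cast(wchar[]) bytes; // pairs of bytes become code units

    version (BigEndian) {
        // Native order is big-endian, so every code unit must be byte-swapped.
        foreach (ref c; units)
            c = cast(wchar) swapEndian( cast(ushort) c );
    }
    // version (LittleEndian): nothing to do, the data is already native.

    return units;
}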

-Steve
