On 1/29/16 5:36 PM, Marek Janukowicz wrote:
I have trouble understanding how endianness works for UTF-16.
For example, the UTF-16 code for the 'ł' character is 0x0142. But this program
shows otherwise:
import std.stdio;

public void main () {
    ubyte[] properOrder = [0x01, 0x42];
    ubyte[] reverseOrder = [0x42, 0x01];
    writefln( "proper: %s, reverse: %s",
        cast(wchar[])properOrder,
        cast(wchar[])reverseOrder );
}
output:
proper: 䈁, reverse: ł
Is there anything I should know about UTF endianness?
It's no different from any other endianness. In other words, a UTF-16 code
unit is expected to be in the endianness of the platform you are running on.
If you are on x86 or x86_64 (very likely), then it should be little endian.
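For instance, you can see the native layout directly by going the opposite
way from your snippet (an untested sketch; on a little-endian machine it
should print [66, 1], i.e. 0x42 followed by 0x01):

import std.stdio;

void main () {
    wchar[] s = "ł"w.dup;       // one UTF-16 code unit, 0x0142
    writeln( cast(ubyte[])s );  // low byte comes first on a little-endian host
}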
If your source of data is big-endian (or generally the opposite of your
native endianness), it will have to be converted before you treat it as a
wchar[].
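Something along these lines should do it (a rough sketch, not compiled,
building on your example; the unconditional swap assumes a little-endian
host):

import std.bitmanip : swapEndian;
import std.stdio;

void main () {
    ubyte[] bigEndianBytes = [0x01, 0x42];          // UTF-16BE bytes for 'ł'
    wchar[] units = cast(wchar[])bigEndianBytes;    // still in big-endian order
    foreach (ref c; units)
        c = cast(wchar)swapEndian(cast(ushort)c);   // swap into native order
    writeln( units );                               // should print: ł
}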
Note that the version identifiers BigEndian and LittleEndian can be used to
compile the correct code for each case.
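For example, a sketch of a conversion helper (the name fromUTF16BE is just
something I made up):

// Convert raw UTF-16BE bytes into native-endian UTF-16 on any host.
wchar[] fromUTF16BE (const(ubyte)[] data) {
    auto result = cast(wchar[]) data.dup;
    version (LittleEndian) {
        import std.bitmanip : swapEndian;
        // Little-endian host: every code unit has to be byte-swapped.
        foreach (ref c; result)
            c = cast(wchar)swapEndian(cast(ushort)c);
    }
    // version (BigEndian): the bytes are already in native order,
    // so there is nothing to do.
    return result;
}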
-Steve