On 1/29/16 5:36 PM, Marek Janukowicz wrote:
I have trouble understanding how endianness works for UTF-16.

For example, the UTF-16 code for the 'ł' character is 0x0142. But this program shows
otherwise:

import std.stdio;

public void main () {
    ubyte[] properOrder = [0x01, 0x42];
    ubyte[] reverseOrder = [0x42, 0x01];
    writefln( "proper: %s, reverse: %s",
            cast(wchar[])properOrder,
            cast(wchar[])reverseOrder );
}

output:

proper: 䈁, reverse: ł

Is there anything I should know about UTF endianness?

It's not any different from endianness anywhere else.

In other words, a UTF-16 code unit is expected to be in the endianness of the platform you are running on.

If you are on x86 or x86_64 (very likely), then it should be little endian.
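For instance, you can confirm the native byte order at run time with std.system.endian; a minimal sketch:

import std.stdio;
import std.system : endian, Endian;

void main () {
    // On x86/x86_64 this prints "littleEndian".
    writeln( endian == Endian.littleEndian ? "littleEndian" : "bigEndian" );
}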

If your source of data is big-endian (or opposite from your native endianness), it will have to be converted before being treated as a wchar[].
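As an example, here is a minimal sketch of converting big-endian UTF-16 bytes into a native wchar[] (the function name fromBigEndianUTF16 is just illustrative):

import std.stdio;

// Assembles each big-endian byte pair into a native-order code unit.
wchar[] fromBigEndianUTF16 (const(ubyte)[] bytes) {
    assert( bytes.length % 2 == 0, "UTF-16 data must have an even byte count" );
    auto result = new wchar[](bytes.length / 2);
    foreach (i, ref c; result) {
        c = cast(wchar)( (bytes[2 * i] << 8) | bytes[2 * i + 1] );
    }
    return result;
}

void main () {
    ubyte[] bigEndianData = [0x01, 0x42]; // 'ł' stored big-endian
    writeln( fromBigEndianUTF16( bigEndianData ) ); // prints ł
}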

Note that the version identifiers BigEndian and LittleEndian can be used to compile the correct code for each case.
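A sketch of that, assuming the incoming data is UTF-16LE and using std.bitmanip.swapEndian (the function name here is again illustrative):

import std.bitmanip : swapEndian;

// Reinterprets a buffer of UTF-16LE bytes as native wchar[], swapping only
// when the native order differs from the source order.
wchar[] fromLittleEndianUTF16 (ubyte[] bytes) {
    auto units = cast(wchar[]) bytes; // pairs of bytes become code units

    version (BigEndian) {
        // Native order is big-endian, so every code unit must be byte-swapped.
        foreach (ref c; units)
            c = cast(wchar) swapEndian( cast(ushort) c );
    }
    // version (LittleEndian): nothing to do, the data is already native.

    return units;
}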

-Steve
