On Monday, 27 December 2021 at 07:12:24 UTC, rempas wrote:

> I don't understand that. Based on your calculations, the results should have been different. Also how are the numbers fixed? Like you said the amount of bytes of each encoding is not always standard for every character. Even if they were fixed this means 2-bytes for each UTF-16 character and 4-bytes for each UTF-32 character so still the numbers doesn't make sense to me. So still the number of the "length" property should have been the same for every encoding or at least for UTF-16 and UTF-32. So are the sizes of every character fixed or not?


Your string is represented by 8 code points. The number of code units needed to store them in memory depends on the encoding. D supports working with 3 different encodings (the Unicode standard defines more than these 3):

    string  utf8s  = "Hello 😂\n";
    wstring utf16s = "Hello 😂\n"w;
    dstring utf32s = "Hello 😂\n"d;
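
To see what your "length" property actually counts, here is a minimal, untested sketch (only needs std.stdio): .length returns the number of code units of that particular encoding, not the number of characters, so the same text gives three different values.

    import std.stdio;

    void main()
    {
        string  utf8s  = "Hello 😂\n";
        wstring utf16s = "Hello 😂\n"w;
        dstring utf32s = "Hello 😂\n"d;

        // .length counts code units, so the same text yields three different numbers
        writeln(utf8s.length,  " chars,  ", utf8s.length  * char.sizeof,  " bytes"); // 11 chars,  11 bytes
        writeln(utf16s.length, " wchars, ", utf16s.length * wchar.sizeof, " bytes"); // 9 wchars, 18 bytes
        writeln(utf32s.length, " dchars, ", utf32s.length * dchar.sizeof, " bytes"); // 8 dchars, 32 bytes
    }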

Here is the canonical Unicode representation of your string:

       H      e      l      l      o    (sp)     😂     \n
    U+0048 U+0065 U+006C U+006C U+006F U+0020 U+1F602 U+000A

Let's see how these 3 variables are represented in memory:

    utf8s : 48 65 6C 6C 6F 20 F0 9F 98 82 0a
11 char in memory using 11 bytes

    utf16s: 0048 0065 006C 006C 006F 0020 D83D DE02 000A
9 wchar in memory using 18 bytes

    utf32s: 00000048 00000065 0000006C 0000006C 0000006F 00000020 0001F602 0000000A
8 dchar in memory using 32 bytes
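
If you want to verify those bytes yourself, std.string.representation exposes the raw code units of each array as unsigned integers. A small, untested sketch; the expected output is exactly the dumps above:

    import std.stdio;
    import std.string : representation;

    void main()
    {
        string  utf8s  = "Hello 😂\n";
        wstring utf16s = "Hello 😂\n"w;
        dstring utf32s = "Hello 😂\n"d;

        // representation() returns the code units as ubyte/ushort/uint arrays
        writefln("%(%02X %)", utf8s.representation);  // 48 65 6C 6C 6F 20 F0 9F 98 82 0A
        writefln("%(%04X %)", utf16s.representation); // 0048 0065 ... D83D DE02 000A
        writefln("%(%08X %)", utf32s.representation); // 00000048 ... 0001F602 0000000A
    }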

As you can see, UTF-8 is generally the most compact form, which is why it is the preferred encoding for Unicode.

UTF-16 is supported mainly for legacy reasons: it is used in the Windows API and also internally in Java.

UTF-32 has one advantage: a 1-to-1 mapping between code point and array index. In practice it is not that much of an advantage, because code points and characters are distinct concepts. UTF-32 uses a lot of memory for practically no benefit (when you read on the forum about D's big auto-decoding mistake, this is what it is linked to).
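
That auto-decoding is easy to demonstrate: iterating the UTF-8 array by char walks code units, while asking for dchar makes D decode to code points on the fly. A minimal, untested sketch with the same string:

    import std.stdio;

    void main()
    {
        string s = "Hello 😂\n";

        size_t units, points;
        foreach (char c; s)  ++units;   // 11 iterations: one per UTF-8 code unit
        foreach (dchar c; s) ++points;  // 8 iterations: decoded to code points on the fly
        writeln(units, " code units, ", points, " code points");
    }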
