Strake dixit:

>In UTF-8 the maximum encoded character length is 6 bytes [1]

Right, but the largest codepoint in Unicode is U-0010FFFF,
which is F4 8F BF BF in UTF-8, so 4 octets suffice in practice.

Most things are in the BMP anyway – for example, the distance
between the lowest and highest encoded glyph in an X11 font
is roughly 2¹⁶, so you’ll normally need at most 3 octets per
character, but at additional cost for some operations (glyph
width and, to a very minor degree, movement across characters).
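
To make the octet counts concrete, here is a minimal sketch of a
UTF-8 encoder in C. It assumes the codepoint arrives as a uint32_t
and that only Unicode scalar values (up to U-0010FFFF, no surrogates)
are to be encoded; utf8_encode is a hypothetical name, not a function
from any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one codepoint as UTF-8 into out[] (room for 4 octets).
 * Returns the number of octets written, or 0 if cp is a surrogate
 * or lies above U+10FFFF. (Hypothetical helper, for illustration.)
 */
static size_t
utf8_encode(uint32_t cp, unsigned char *out)
{
	assert(out != NULL);

	if (cp < 0x80) {
		/* ASCII: one octet */
		out[0] = (unsigned char)cp;
		return 1;
	}
	if (cp < 0x800) {
		out[0] = (unsigned char)(0xC0 | (cp >> 6));
		out[1] = (unsigned char)(0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp < 0x10000) {
		/* rest of the BMP: three octets, surrogates excluded */
		if (cp >= 0xD800 && cp <= 0xDFFF)
			return 0;
		out[0] = (unsigned char)(0xE0 | (cp >> 12));
		out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[2] = (unsigned char)(0x80 | (cp & 0x3F));
		return 3;
	}
	if (cp <= 0x10FFFF) {
		/* beyond the BMP: four octets */
		out[0] = (unsigned char)(0xF0 | (cp >> 18));
		out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
		out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[3] = (unsigned char)(0x80 | (cp & 0x3F));
		return 4;
	}
	return 0;
}
```

With this, anything in the BMP comes out as at most 3 octets, and
U-0001FFFF comes out as F0 9F BF BF (4 octets).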

Actually, wint_t is the standard type to use for this. One
could also use wchar_t but that may be an unsigned short on
some systems, or a signed or unsigned int. uint32_t makes
sense, if one doesn’t want to go after the possible savings
on 16-bit Unicode systems, since signed integers in C are
almost Undefined anyway…
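
A rough sketch of what the type choice looks like in C, in case it
helps. The names codepoint_t, wchar_holds_unicode, cp_is_valid and
cp_from_wint are made up for illustration; only wint_t, WEOF,
WCHAR_MAX and uint32_t come from the standard headers.

```c
#include <assert.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical typedef: a fixed-width, unsigned codepoint type. */
typedef uint32_t codepoint_t;

/* Nonzero if this platform's wchar_t can hold every Unicode codepoint;
 * zero where wchar_t is only 16 bits wide. */
static int
wchar_holds_unicode(void)
{
	return WCHAR_MAX >= 0x10FFFF;
}

/* With an unsigned type there are no negative values to reject:
 * an upper bound plus the surrogate range is the whole check. */
static int
cp_is_valid(codepoint_t cp)
{
	return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

/* wint_t can additionally carry WEOF, so the caller has to filter
 * that out before treating the value as a character. */
static codepoint_t
cp_from_wint(wint_t wc)
{
	assert(wc != WEOF);
	return (codepoint_t)wc;
}
```

Whether the result of wchar_holds_unicode() is true depends on the
system, which is the reason for picking uint32_t when the storage
savings do not matter.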

bye,
//mirabilos
-- 
15:39⎜«mika:#grml» mira|AO: "with XFree86® that wouldn’t have happened" - muhaha
15:48⎜<thkoehler:#grml> so why do the xorg guys actually break
everything? :)    15:49⎜<novoid:#grml> thkoehler: because as kids they were never
allowed to knock over the tower they had built?      -- ~/.Xmodmap wonders…
