On 03/04/18 20:55, Darac Marjal wrote:
> If these things matter to you, it's better to convert from UTF-8 to
> Unicode, first. I tend to think of Unicode as an arbitrarily large code
> page. Each character maps to a number, but that number could be 1, 1000
> or 500_000 (Unicode seems to be growing with no end in sight).
> Internally, you might store those code points as Integers or Quad Words
> or whatever you like. Only once you're ready to transfer the text to
> another process (print on screen, save to a file, stream across a
> network), do you convert the Unicode back into UTF-8.
> 
> Basically, you consider UTF-8 to be a transfer-only format (like
> Base64). If you want to do anything non-trivial with it, decode it into
> Unicode.

Eh? UTF-8 is an encoding of Unicode. You can't "convert UTF-8 to
Unicode" - it already is Unicode. You could convert it to another
encoding, e.g. UTF-16 or UTF-32. Perhaps UTF-32 is what you mean, since
it is fixed-width.
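To illustrate the distinction both posts are circling (a minimal sketch in Python, not from either poster): the bytes on the wire are UTF-8, decoding them yields a sequence of Unicode code points you can index character by character, and UTF-32 is the fixed-width encoding of those same code points.

```python
# UTF-8 is the variable-width transfer encoding: "é" takes 2 bytes here.
data = "héllo".encode("utf-8")
print(len(data))            # 6 bytes for 5 characters

# Decoding gives a sequence of code points, indexable per character.
text = data.decode("utf-8")
print(len(text))            # 5 characters

# UTF-32 encodes the same code points at a fixed 4 bytes each.
fixed = text.encode("utf-32-le")
print(len(fixed))           # 5 * 4 = 20 bytes
assert fixed.decode("utf-32-le") == text
```

Either way the text is Unicode throughout; only the byte-level representation changes.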

Richard
