-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, Apr 03, 2018 at 09:14:22PM +1200, Richard Hector wrote:
> On 03/04/18 20:55, Darac Marjal wrote:
> > If these things matter to you, it's better to convert from UTF-8 to
> > Unicode, first. I tend to think of Unicode as an arbitrarily large code
> > page. Each character maps to a number, but that number could be 1, 1000
> > or 500_000 (Unicode seems to be growing without any end in sight).
> > Internally, you might store those code points as Integers or Quad Words
> > or whatever you like. Only once you're ready to transfer the text to
> > another process (print on screen, save to a file, stream across a
> > network), do you convert the Unicode back into UTF-8.
> >
> > Basically, you consider UTF-8 to be a transfer-only format (like
> > Base64). If you want to do anything non-trivial with it, decode it into
> > Unicode.
>
> Eh? UTF-8 is an encoding of Unicode. You can't "convert UTF-8 to
> Unicode" - it already is Unicode. You could convert it to another
> encoding, eg UTF-16 or UTF-32. Perhaps UTF-32 is what you mean, being
> fixed-width.

I think Darac was talking about UTF-32 [1], which is a fixed-width
encoding of Unicode.

Yes, Unicode is, strictly speaking, the abstract "mapping" between
integers ("code points") and characters. But a computer has no integers
as such, only encodings of them...

What's curious is that there's no UTF-24, even though Unicode currently
has all its code points below 2^21. That would make for a slightly more
compact fixed-width encoding.

I think these days fixed-width encodings are losing their charm a bit,
since memory access is getting much more expensive relative to CPU
power.

Things might change once again when the Chinese dominate culturally,
since UTF-8 plays out its advantage only with ASCII-dominated text. But
perhaps then another encoding will make more sense. Or UTF-24 is simply
born, for a 25% savings :-)

Cheers

[1] https://en.wikipedia.org/wiki/UTF-32

- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)

iEYEARECAAYFAlrDST0ACgkQBcgs9XrR2kadzQCeO8N1Kjua/p0aOdfE8QQTvv6R
PisAn0+0DbwWXb+cWYsUmMwqSqQN+BKQ
=y+Fu
-----END PGP SIGNATURE-----
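
PS: to put rough numbers on the size argument, here is a minimal Python
sketch. Note that "UTF-24" does not exist; utf24_encode() and
utf24_decode() are hypothetical helpers that simply pack each code point
into 3 big-endian bytes, and encoded_sizes() is likewise just a name made
up for this illustration. The UTF-8 and UTF-32 figures come from Python's
standard codecs.

```python
MAX_CODE_POINT = 0x10FFFF  # highest Unicode code point, below 2**21


def utf24_encode(text: str) -> bytes:
    """Hypothetical fixed-width encoding: 3 big-endian bytes per code point."""
    return b"".join(ord(ch).to_bytes(3, "big") for ch in text)


def utf24_decode(data: bytes) -> str:
    """Inverse of utf24_encode(); expects a length that is a multiple of 3."""
    if len(data) % 3 != 0:
        raise ValueError("length must be a multiple of 3")
    return "".join(
        chr(int.from_bytes(data[i:i + 3], "big"))
        for i in range(0, len(data), 3)
    )


def encoded_sizes(text: str) -> dict:
    """Byte counts of the same code points under different encodings."""
    return {
        "code_points": len(text),
        "utf8": len(text.encode("utf-8")),
        "utf32": len(text.encode("utf-32-be")),  # "-be" variant adds no BOM
        "utf24": len(utf24_encode(text)),        # hypothetical, see above
    }


if __name__ == "__main__":
    for sample in ("hello", "中文"):
        print(sample, encoded_sizes(sample))
```

For pure ASCII, UTF-8 wins clearly (1 byte per character against 3 or
4). For the two Chinese characters, UTF-8 and the 3-byte packing come
out the same, and both are 25% smaller than UTF-32.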