Re: Why UTF-8/16 character encodings?

Dmitry Olshansky Sat, 25 May 2013 13:00:24 -0700

25-May-2013 23:51, Joakim wrote:

On Saturday, 25 May 2013 at 19:03:53 UTC, Dmitry Olshansky wrote:

You can map a codepage to a subset of UCS :)
That's what they do internally anyway.
If I understand you right, you propose to define a string as a header that
denotes a set of windows in code space? I still fail to see how that
would scale; see below.
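
To illustrate the remark about mapping a codepage to a subset of UCS, here is a minimal Python sketch. It relies only on Python's built-in cp1251 and cp1252 codecs; the choice of byte 0xE4 is just an example picked for this illustration.

```python
# The same single byte means different things under different code pages,
# but each code page maps it to exactly one UCS code point.
raw = bytes([0xE4])

cyrillic = raw.decode("cp1251")  # Windows Cyrillic code page
western = raw.decode("cp1252")   # Windows Western European code page

print(hex(ord(cyrillic)), hex(ord(western)))
```

Each code page here is effectively a 256-entry table into the UCS code space.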

Something like that.  For a multi-language string encoding, the header
would contain a single byte for every language used in the string, along
with multiple index bytes to signify the start and finish of every run
of single-language characters in the string. So, a list of languages and
a list of pure single-language substrings.  This is just off the top of
my head, I'm not suggesting it is definitive.
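
A rough Python sketch of what such a header might look like, purely as an illustration of the idea above. The language IDs, the CODEPAGES table, and the function names are all hypothetical, and the run boundaries are kept as plain Python tuples rather than packed index bytes.

```python
# Hypothetical language IDs and the single-byte code page each one selects.
CODEPAGES = {1: "latin-1", 2: "cp1251"}

def encode(segments):
    """Pack (lang_id, text) pairs into (languages, runs, body)."""
    languages, runs, body = [], [], bytearray()
    for lang_id, text in segments:
        if lang_id not in languages:
            languages.append(lang_id)
        start = len(body)
        body += text.encode(CODEPAGES[lang_id])
        runs.append((lang_id, start, len(body)))  # half-open byte range
    return languages, runs, bytes(body)

def decode(runs, body):
    """Rebuild the text by decoding each run with its own code page."""
    return "".join(body[s:e].decode(CODEPAGES[l]) for l, s, e in runs)

def char_at(runs, body, index):
    """Decode the character stored at byte offset `index`."""
    for lang_id, start, end in runs:  # linear scan over the run list
        if start <= index < end:
            return body[index:index + 1].decode(CODEPAGES[lang_id])
    raise IndexError(index)

languages, runs, body = encode([(1, "abc "), (2, "где")])
```

In this sketch a byte cannot be interpreted on its own: char_at has to consult the run list first, and slicing or concatenating strings would mean rewriting the header as well.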


Runs away in horror :) It's a mess even before you get to the details.

Another point about sometimes using a 2-byte encoding: welcome to the nice world of BigEndian/LittleEndian, i.e. the very trap UTF-16 has stepped into.
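
A small Python sketch of the byte-order trap, using only the standard utf-16-be, utf-16-le and utf-8 codecs. The sample character is an arbitrary choice for this illustration.

```python
text = "д"  # U+0434, a single 16-bit code unit in UTF-16

big = text.encode("utf-16-be")     # most significant byte first
little = text.encode("utf-16-le")  # least significant byte first
print(big.hex(), little.hex())

# Reading the bytes with the wrong byte order does not fail here;
# it silently yields a different, unrelated character.
wrong = big.decode("utf-16-le")
print(hex(ord(wrong)))

# UTF-8 is defined as a byte sequence, so it has no byte-order variants.
utf8 = text.encode("utf-8")
```

This is why UTF-16 data needs either a byte order mark or an out-of-band agreement on endianness.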


--
Dmitry Olshansky
