25-May-2013 23:51, Joakim wrote:
On Saturday, 25 May 2013 at 19:03:53 UTC, Dmitry Olshansky wrote:
You can map a codepage to a subset of UCS :)
That's what they do internally anyway.
If I understand you right, you propose to define a string as a header
that denotes a set of windows in code space? I still fail to see how
that would scale; see below.
Something like that.  For a multi-language string encoding, the header
would contain a single byte for every language used in the string, along
with multiple index bytes to signify the start and finish of every run
of single-language characters in the string. So, a list of languages and
a list of pure single-language substrings.  This is just off the top of
my head, I'm not suggesting it is definitive.


Runs away in horror :) It's a mess even before you get to the details.
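
To make the quoted scheme concrete, here is a rough Python sketch of one way it could look. Everything in it is an assumption made for illustration: the language IDs, the choice of code pages, and the names Run, encode and char_at are hypothetical, and the run boundaries are kept as plain integers rather than packed index bytes. It is not the proposal's actual layout, which was never specified.

```python
from dataclasses import dataclass

# Hypothetical language IDs mapped to single-byte code pages.
# None of these values come from the original proposal.
CODEPAGES = {1: "latin-1", 2: "koi8-r"}


@dataclass
class Run:
    lang: int   # language ID (one byte in the proposed header)
    start: int  # index of the first payload byte of the run
    end: int    # index one past the last payload byte of the run


def encode(segments):
    """Encode (lang, text) pairs into a (runs, payload) pair."""
    runs, payload = [], bytearray()
    for lang, text in segments:
        start = len(payload)
        payload += text.encode(CODEPAGES[lang])
        runs.append(Run(lang, start, len(payload)))
    return runs, bytes(payload)


def char_at(runs, payload, i):
    """Decode the character at byte index i.

    The byte alone is ambiguous, so the header has to be searched
    for the run that covers index i.
    """
    for run in runs:
        if run.start <= i < run.end:
            return payload[i:i + 1].decode(CODEPAGES[run.lang])
    raise IndexError(i)


runs, payload = encode([(1, "abc "), (2, "где")])
print(char_at(runs, payload, 5))
```

Even in this toy form, decoding a single byte needs a lookup in the header, and any slice or concatenation has to rebuild the run list. Every switch between languages also adds another header entry.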

Another point about sometimes using a 2-byte encoding: welcome to the nice world of big-endian vs. little-endian byte order, i.e. the very trap UTF-16 has stepped into.
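
A small Python illustration of that trap, using only the standard UTF-16 codecs. The sample character is an arbitrary choice for the example.

```python
text = "Д"  # U+0414, CYRILLIC CAPITAL LETTER DE

# The same code unit serialized in both byte orders.
be = text.encode("utf-16-be")  # bytes 04 14
le = text.encode("utf-16-le")  # bytes 14 04

# Reading big-endian data as little-endian raises no error here;
# it silently yields a different character (U+1404).
wrong = be.decode("utf-16-le")
print(hex(ord(wrong)))

# The plain "utf-16" codec prepends a byte order mark so that a
# reader can detect which order was used.
with_bom = text.encode("utf-16")
print(with_bom[:2])
```

Any encoding with multi-byte code units needs either a fixed byte order or a marker like this, which is one more thing a header-based scheme would have to carry.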

--
Dmitry Olshansky
