NotFound wrote:
>> * Unicode isn't necessarily universal, and might stop being so in the
>> future. If a character is not representable in Unicode, and you chose
>> to use Unicode for everything, you're screwed
> 
> There are provisions for private-use codepoints.

If we use them in Parrot, we can't use them in HLLs, right? Do we really
want that?

>> * related to the previous point, some other character encodings might
>> not have a lossless round-trip conversion.
> 
> Do we need that? The intention is that strings are stored in the
> desired format and not recoded without a good reason.

But if you can't work with non-Unicode text strings, you have to convert
them, and in the process you possibly lose information. That's why we
want to enable text strings with non-Unicode semantics.

>>> need a translation table. The only point to solve is we need some
>>> special way to work with fixed-8 with no intended character
>>> representation.
>> Introducing the "no character set" character set is just a special case
>> of arbitrary character sets. I see no point in using the special case
>> over the generic one.
> 
> Because it is special, and we need to deal with its speciality in any
> case. Just concatenating it with any other is plain wrong. And just
> treating it as iso-8859-1 means it isn't being taken as plain binary
> at all.

Just as it is plain wrong to concatenate strings in two incompatible
character sets (unless you store the strings as trees, with each
substring carrying both its encoding and charset information. But even
then you still can't compare them, for example).
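To illustrate why naive concatenation across charsets is wrong, here is a Python sketch: the same byte value means different characters in different charsets, so concatenating the raw bytes silently corrupts one side:

```python
# The byte 0xE9 is 'é' in latin-1 but 'Θ' (Greek theta) in cp437.
a = "é".encode("latin-1")  # b'\xe9'
b = "Θ".encode("cp437")    # also b'\xe9'
combined = a + b           # the charset of each half is lost
# Decoding under either charset mangles the other half:
assert combined.decode("latin-1") == "éé"  # the Θ has silently become é
```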

> But the main point is that the encoding issue is complicated enough
> even inside Unicode, and adding another layer of complexity will make
> it worse.

I think that distinguishing incompatible character sets is no harder
than distinguishing text and binary strings. It's not another layer,
it's just a layer used in a more general way.
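As a sketch of what I mean (hypothetical Python, nothing like Parrot's actual string internals): a string that carries its charset can refuse unsafe operations, and the "no character set" case falls out of the same generic rule rather than needing special-casing:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaggedStr:
    """A byte string tagged with its charset. 'binary' means
    no character semantics at all -- handled by the generic rule,
    not as a special case."""
    data: bytes
    charset: str

    def __add__(self, other):
        # Concatenation is only defined within a single charset;
        # mixing incompatible ones is rejected instead of corrupting data.
        if self.charset != other.charset:
            raise TypeError(
                f"cannot concat {self.charset} with {other.charset}")
        return TaggedStr(self.data + other.data, self.charset)
```

So `TaggedStr(b"ab", "latin-1") + TaggedStr(b"\xe9", "cp437")` raises instead of producing mojibake, and `binary` strings get exactly the same treatment as any other charset.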

Moritz

-- 
Moritz Lenz
http://moritz.faui2k3.org/ |  http://perl-6.de/