NotFound wrote:
>> * Unicode isn't necessarily universal, or might stop being so in future.
>> If a character is not representable in Unicode, and you chose to use
>> Unicode for everything, you're screwed.
>
> There are provisions for private-use codepoints.

If we use them in Parrot, we can't use them in HLLs, right? Do we really
want that?

>> * Related to the previous point, some other character encodings might
>> not have a lossless round-trip conversion.
>
> Do we need that? The intention is that strings are stored in the
> format wanted and not recoded without a good reason.

But if you can't work with non-Unicode text strings, you have to convert
them, and in the process you may lose information. That's why we want to
enable text strings with non-Unicode semantics.

>>> need a translation table. The only point to solve is we need some
>>> special way to work with fixed-8 with no intended character
>>> representation.
>>
>> Introducing the "no character set" character set is just a special case
>> of arbitrary character sets. I see no point in using the special case
>> over the generic one.
>
> Because it is special, and we need to deal with its speciality in any
> case. Just concatenating it with any other is plain wrong. Just
> treating it as iso-8859-1 is not taken in as plain binary at all.

Just as it is plain wrong to concatenate strings in two incompatible
character sets (unless you store the strings as trees and have each
substring carry both its encoding and its charset information; but even
then you still can't compare them, for example).

> But the main point is that the encoding issues are complicated enough
> even inside Unicode, and adding another layer of complexity will make
> it worse.

I think that distinguishing incompatible character sets is no harder than
distinguishing text and binary strings. It's not another layer; it's the
same layer, used in a more general way.

Moritz

--
Moritz Lenz
http://moritz.faui2k3.org/ | http://perl-6.de/
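
The argument that "no character set" (binary) is just one more charset, rather than a special case, can be illustrated with a minimal Python sketch. This is not Parrot's string API: the class `TaggedString`, the `PROMOTIONS` table, the exception `IncompatibleCharsets`, and the charset names `binary` and `x-legacy` are all hypothetical. Each string carries a charset tag, and concatenation succeeds only when both operands share a charset or one can be promoted to the other without loss. Binary data then needs no dedicated code path: it simply has no promotions, so it only combines with itself.

```python
from dataclasses import dataclass

# Hypothetical table of lossless promotions between charsets. ASCII and
# ISO-8859-1 code points coincide with the first 128/256 Unicode code points,
# so promoting them to "unicode" keeps the stored values unchanged.
# "binary" and "x-legacy" appear nowhere here, so each combines only with itself.
PROMOTIONS = {
    ("ascii", "iso-8859-1"): "iso-8859-1",
    ("ascii", "unicode"): "unicode",
    ("iso-8859-1", "unicode"): "unicode",
}


class IncompatibleCharsets(Exception):
    """Raised when two strings cannot be combined without losing information."""


def common_charset(a: str, b: str) -> str:
    """Return a charset that both operands can be viewed in losslessly."""
    if a == b:
        return a
    for pair in ((a, b), (b, a)):
        if pair in PROMOTIONS:
            return PROMOTIONS[pair]
    raise IncompatibleCharsets(f"cannot combine {a!r} with {b!r}")


@dataclass(frozen=True)
class TaggedString:
    # Code points (or raw byte values when charset == "binary") plus a charset tag.
    units: tuple
    charset: str

    def __add__(self, other: "TaggedString") -> "TaggedString":
        # The same check covers binary and every other charset.
        target = common_charset(self.charset, other.charset)
        return TaggedString(self.units + other.units, target)


if __name__ == "__main__":
    text = TaggedString((0x61, 0x62), "ascii") + TaggedString((0xE9,), "iso-8859-1")
    print(text.charset)  # iso-8859-1
    try:
        TaggedString((0xE9,), "iso-8859-1") + TaggedString((0xE9,), "binary")
    except IncompatibleCharsets as err:
        print(err)  # cannot combine 'iso-8859-1' with 'binary'
```

Because the generated equality compares the charset tag as well as the stored values, two strings in different charsets never compare equal in this sketch, which mirrors the point above that mixed-charset strings cannot simply be compared.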