Chip Salzenberg wrote:

Would this be a good time to ask for explanation for C<str> being
never Unicode, while C<Str> is always Unicode, thus leading to an
inability to box a non-Unicode string?


That's not quite it. C<str> is forced to the Unicode level "Bytes", with encoding "raw", which happens to have no Unicode semantics attached to it.

And might I also ask why in Perl 6 (if not Parrot) there seems to be
no type support for strings with known encodings which are not subsets
of Unicode?


There are two different things to consider at the P6 level: Unicode level, and encoding. Level is one of Bytes, CodePoints, Graphemes, or Language Dependent Characters (aka LChars aka Chars). The level determines what a "character" means. This can all get a bit confusing for people who only speak English, since our language happens to map nicely onto all the levels at once, with no "merging of multiple code points into a grapheme" monkey business.
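To make the levels concrete, here is a minimal Python sketch (not Perl 6, and not any proposed P6 API) that counts the same text at three of the levels. The function names are made up for illustration, and the grapheme count is only an approximation that folds combining marks into the preceding character; real grapheme segmentation follows more rules than this.

```python
import unicodedata

def count_bytes(text, encoding="utf-8"):
    # Bytes level: the answer depends on the chosen encoding.
    return len(text.encode(encoding))

def count_codepoints(text):
    # CodePoints level: Python's str is a sequence of code points.
    return len(text)

def count_graphemes_approx(text):
    # Graphemes level, approximated: count only characters that are
    # not combining marks, so each mark merges into its base character.
    # This is an assumption-laden shortcut, not full segmentation.
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)

plain = "cat"
accented = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT

for sample in (plain, accented):
    print(count_bytes(sample), count_codepoints(sample),
          count_graphemes_approx(sample))
```

For plain English text all three counts agree, which is the "maps nicely onto all the levels" case; for the accented sample they all differ.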

Encoding is how a particular string gets mapped into bits. I see P6 as needing to support all the common encodings (raw, ASCII, UTF\d+(be|le)?, UCS\d+) "out of the box", but then allowing the user to add more as they see fit (EBCDIC, etc.).

Level and Encoding can be mixed and matched independently, except for the combos that don't make any sense.
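As a rough illustration of that independence, the Python sketch below (again not Perl 6; the helper names are hypothetical) keeps the code-point view of a string fixed while varying the encoding, and then shows one possible reading of a combination that makes no sense: text whose code points the chosen encoding cannot represent.

```python
def encoded_sizes(text, encodings=("utf-8", "utf-16-le", "utf-32-be")):
    # Same code points, different byte counts per encoding.
    return {enc: len(text.encode(enc)) for enc in encodings}

def can_encode(text, encoding):
    # True if every code point in text is representable in encoding.
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

text = "na\u00efve"  # five code points; U+00EF is a single precomposed one

print(len(text))                  # code-point count, encoding-independent
print(encoded_sizes(text))        # byte counts vary with the encoding
print(can_encode(text, "ascii"))  # U+00EF has no ASCII representation
print(can_encode("cat", "cp500")) # cp500 is one of Python's EBCDIC codecs
```

The code-point count stays at five whichever encoding is picked, while the byte count changes, and ASCII simply cannot carry this string at all.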

-- Rod Adams



