Re: String Theory

Rod Adams Sat, 26 Mar 2005 12:12:59 -0800

Chip Salzenberg wrote:

> Would this be a good time to ask for explanation for C<str> being never Unicode, while C<Str> is always Unicode, thus leading to an inability to box a non-Unicode string?

That's not quite it. C<str> is forced to the Unicode level "Bytes", with encoding "raw", which happens to carry no Unicode semantics.

> And might I also ask why in Perl 6 (if not Parrot) there seems to be no type support for strings with known encodings which are not subsets of Unicode?

There are two different things to consider at the P6 level: Unicode level and encoding. Level is one of Bytes, CodePoints, Graphemes, or Language-Dependent Characters (aka LChars, aka Chars), and it determines what a "character" means. This can all get a bit confusing for people who only speak English, since our language happens to map nicely onto all the levels at once, with no "merging of multiple code points into a grapheme" monkey business.
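
To make the levels concrete, here is a rough sketch in Python rather than P6 (nothing here is P6 syntax). The function name count_at_levels is made up for illustration, the byte count assumes UTF-8, and graphemes are only approximated as "code points that are not combining marks", which is far cruder than real grapheme segmentation.

```python
import unicodedata


def count_at_levels(text: str) -> dict:
    """Count the "characters" in text at three of the levels named above."""
    return {
        # Bytes: depends on the encoding; UTF-8 is assumed here.
        "bytes": len(text.encode("utf-8")),
        # CodePoints: a Python str is a sequence of code points.
        "codepoints": len(text),
        # Graphemes: approximation, counts only non-combining code points.
        "graphemes": sum(1 for ch in text if unicodedata.combining(ch) == 0),
    }


english = "cafe"           # plain ASCII: every level agrees
accented = "cafe\u0301"    # 'e' followed by COMBINING ACUTE ACCENT

print(count_at_levels(english))
print(count_at_levels(accented))
```

For the plain English word all three counts are 4. For the accented one they come out as 6 bytes, 5 code points, and 4 graphemes, which is exactly the disagreement between levels that English text hides.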

Encoding is how a particular string gets mapped into bits. I see P6 as needing to support all the common encodings (raw, ASCII, UTF\d+(be|le)?, UCS\d+) "out of the box", while allowing users to add more as they see fit (EBCDIC, etc.).
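
As an illustration of encoding as "string to bits", here is a small Python sketch (again not P6). It uses Python's built-in codecs, with cp037 standing in for EBCDIC; the variable names are just for the example.

```python
text = "A\u00e9"  # 'A' followed by U+00E9 (e with acute accent)

# The same two code points map to different byte sequences per encoding.
utf8 = text.encode("utf-8")         # 'A' is one byte, U+00E9 is two
utf16le = text.encode("utf-16-le")  # two bytes each, low byte first
utf16be = text.encode("utf-16-be")  # two bytes each, high byte first
ebcdic = text.encode("cp037")       # an EBCDIC code page, one byte each

# ASCII cannot represent U+00E9 at all, so encoding fails.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

for name, data in [("utf-8", utf8), ("utf-16-le", utf16le),
                   ("utf-16-be", utf16be), ("cp037", ebcdic)]:
    print(name, data.hex(), len(data))
print("ascii encodable:", ascii_ok)
```

Each byte string decodes back to the same text when read with the encoding that produced it, and to something else (or an error) when read with a different one.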

Level and Encoding can be mixed and matched independently, except for the combos that don't make any sense.
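
A hedged Python sketch of the mix-and-match idea follows. The function length and the level names are hypothetical, the grapheme count is the same crude "skip combining marks" approximation, and treating "raw" with any level above Bytes as a nonsensical combo is my own example of one, not a rule taken from any spec.

```python
import unicodedata


def length(data: bytes, encoding: str, level: str) -> int:
    """Length of encoded data, measured at the requested level."""
    if level == "bytes":
        return len(data)  # valid for every encoding, including "raw"
    if encoding == "raw":
        # Assumed nonsensical combo: raw bytes carry no Unicode semantics.
        raise ValueError("encoding 'raw' only makes sense at level 'bytes'")
    text = data.decode(encoding)
    if level == "codepoints":
        return len(text)
    if level == "graphemes":
        # Approximation: count code points that are not combining marks.
        return sum(1 for ch in text if unicodedata.combining(ch) == 0)
    raise ValueError(f"unknown level: {level}")


word = "cafe\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
for enc in ("utf-8", "utf-16-le"):
    data = word.encode(enc)
    print(enc, [length(data, enc, lvl)
                for lvl in ("bytes", "codepoints", "graphemes")])
```

The byte length changes with the encoding (6 for UTF-8, 10 for UTF-16LE), while the code point and grapheme lengths stay at 5 and 4 whichever encoding is underneath.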

-- Rod Adams
