Re: String Theory

Larry Wall Sat, 26 Mar 2005 19:49:13 -0800

On Fri, Mar 25, 2005 at 07:38:10PM -0000, Chip Salzenberg wrote:
: Would this be a good time to ask for explanation for C<str> being
: never Unicode, while C<Str> is always Unicode, thus leading to an
: inability to box a non-Unicode string?


As Rod said, "str" is just a way of declaring a byte buffer, for which
"characters", "graphemes", "codepoints", and "bytes" all mean the
same thing.  Conversion or coercion to more abstract types must be
specified explicitly.
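
As a rough analogy only (Python rather than Perl 6, and not a claim about how "str" is implemented), Python's bytes type behaves much like the byte buffer described above: every unit is one byte, and moving to the abstract string type needs an explicit decode that names an encoding. The variable names here are made up for illustration.

```python
# Analogy only: Python's bytes stands in for a raw byte buffer.
raw = "café".encode("utf-8")    # c, a, f, then two bytes for the accented e

# At the buffer level the only unit is the byte.
byte_count = len(raw)

# Moving to the more abstract type must be requested explicitly,
# and the encoding has to be named.
text = raw.decode("utf-8")
char_count = len(text)

# Mixing the two levels without a conversion is rejected.
try:
    raw + text
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

Here byte_count is 5 and char_count is 4, because the accented letter takes two bytes in UTF-8 but is a single code point once decoded.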

: And might I also ask why in Perl 6 (if not Parrot) there seems to be
: no type support for strings with known encodings which are not subsets
: of Unicode?

Well, because the main point of Unicode is that there *are* no encodings
that cannot be considered subsets of Unicode.  Perl 6 considers
itself to have abstract Unicode semantics regardless of the underlying
representation of the data, which could be Latin-1 or Big5 or UTF-76.

That being said, abstract Unicode itself has varying levels of
abstraction, which is how we end up with .codes, .graphs, and .chars
in addition to .bytes.
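
The levels can be illustrated with a small Python sketch. This is not Perl 6 and not a definition of .codes, .graphs, or .chars; the helper name counts is hypothetical, and the grapheme count is a crude approximation (code points that are not combining marks) rather than full Unicode grapheme segmentation.

```python
import unicodedata

def counts(s, encoding="utf-8"):
    """Count one string at three levels of abstraction (illustrative only)."""
    data = s.encode(encoding)
    return {
        # Depends on the chosen representation.
        "bytes": len(data),
        # Code points: what Python's len() reports for a str.
        "codes": len(s),
        # Crude grapheme estimate: skip combining marks.
        "graphs": sum(1 for ch in s if unicodedata.combining(ch) == 0),
    }

decomposed = counts("e\u0301")   # "e" followed by a combining acute accent
composed = counts("\u00e9")      # precomposed e with acute accent
```

Both strings display as one accented letter, so both have a grapheme count of 1. The decomposed form has 2 code points and 3 UTF-8 bytes, while the precomposed form has 1 code point and 2 UTF-8 bytes, which is why the answer to "how long is this string" depends on the level being asked about.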

Larry
