Dan Sugalski wrote:
> 
>...
> 
> Make sense? Parrot's set up such that the libraries to handle a particular
> kind of data (EBCDIC, Unicode, Shift-JIS, Big5/traditional, Finnish ASCII)
> will be dynamically loadable so we can add them after the fact and you
> don't have to pay the memory price.

I'd suggest that you document character set and encoding totally
separately. Character set is something that is visible to the Perl
programmer. Encoding is *only* an implementation issue that should be
invisible to the programmer doing ordinary string manipulations.

I think that the extra complexity of dealing with multiple character
sets has more cost than benefit. What will chr(10203) return? If I do a
grep for chr(10203) am I looking for the 10203'th character in the
character set of the data or the character that is logically the same as
the 10203'th character of the default character set on my platform? Or
the 10203'th character in Unicode?
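To make the ambiguity concrete, here is a minimal sketch in Python (standing in for Perl's chr). It assumes three single-byte character sets that ship with Python's codecs, so it uses index 0xE4 rather than 10203, which would not fit in one byte. The helper name char_at is hypothetical.

```python
def char_at(index, charset):
    """Return the character that a single-byte charset assigns to `index`."""
    return bytes([index]).decode(charset)

# The same index names a different character in each character set.
CHARSETS = ["latin-1", "cp1251", "cp037"]  # Western, Cyrillic, EBCDIC
interpretations = {cs: char_at(0xE4, cs) for cs in CHARSETS}

for charset, char in interpretations.items():
    print(f"{charset:8} index 0xE4 -> U+{ord(char):04X}")

# Under Unicode semantics the index has exactly one meaning.
print(f"unicode  index 0xE4 -> U+{ord(chr(0xE4)):04X}")
```

A grep for "character number N" gives a different answer depending on which of these tables is in force, which is the cost I am pointing at.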

> We will, FWIW, transcode to Unicode in those cases where we have to deal
> with data in multiple encodings and shouldn't just throw an error. While
> LCDs are bad, they're better than nothing...

In the paragraph above I would not use the word transcode. I would say
"strings conforming to multiple character sets are combined according to
Unicode semantics." Once again, I don't think the user cares about your
internal encoding. Python could switch to internal UTF-8 tomorrow and
nobody would notice except for performance implications. Will Perl 6 use
a bit, a byte or an integer to represent boolean values? Will Perl
programmers (not implementors) care? Probably only if they use "pack"
(which of course has Unicode equivalents)!
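As a rough illustration of "combined according to Unicode semantics", the Python sketch below joins two byte strings that arrive in different character sets. The function name combine and the sample data are assumptions for illustration; nothing here describes how Parrot does it.

```python
def combine(pieces):
    """Join (raw_bytes, charset) pairs into one Unicode string."""
    return "".join(raw.decode(charset) for raw, charset in pieces)

# Two inputs in different character sets (sample data, chosen arbitrarily).
western = ("caf\u00e9".encode("latin-1"), "latin-1")
cyrillic = ("\u0434\u0430".encode("cp1251"), "cp1251")

combined = combine([western, cyrillic])
print(len(combined))  # length is counted in characters, not bytes
```

The programmer sees one string of characters; which bytes sit underneath it never shows up in the result.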

As for the performance implications: if you use variable-width encodings
but do integer indexing in terms of *characters* then I think you're in
for a world of performance pain. On the other hand, if you do indexing
in terms of bytes, you're complicating the user model for a performance
gain. That's why I think that the only sane solution is fixed-width
encodings. That doesn't mean that you have to choose a particular width
-- it just means that every string has some width.
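The trade-off can be sketched in Python as follows. This is an illustration under stated assumptions, not a proposal for the actual internals: UTF-8 stands in for "a variable-width encoding", fixed-width units are stored little-endian with one whole code point per unit, and all function names are hypothetical.

```python
def utf8_width(lead):
    """Number of bytes in the UTF-8 sequence that starts with byte `lead`."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("not a UTF-8 lead byte")

def utf8_char_at(data, index):
    """Variable width: walk past `index` characters, so cost grows with index."""
    pos = 0
    for _ in range(index):
        pos += utf8_width(data[pos])
    return data[pos:pos + utf8_width(data[pos])].decode("utf-8")

def narrowest_width(text):
    """Pick the smallest unit (1, 2 or 4 bytes) that fits every character."""
    top = max(map(ord, text), default=0)
    return 1 if top < 0x100 else 2 if top < 0x10000 else 4

def pack_fixed(text, width):
    """Store each code point in exactly `width` bytes."""
    return b"".join(ord(c).to_bytes(width, "little") for c in text)

def fixed_char_at(data, index, width):
    """Fixed width: one multiplication finds the character, whatever the index."""
    start = index * width
    return chr(int.from_bytes(data[start:start + width], "little"))

text = "a\u00e9\u20ac\U0001F600"
utf8 = text.encode("utf-8")
width = narrowest_width(text)
fixed = pack_fixed(text, width)
print(utf8_char_at(utf8, 3) == fixed_char_at(fixed, 3, width))
```

Both lookups return the same character; the difference is that the first has to scan and the second does not. Choosing the width per string, as narrowest_width does, is what I mean by every string having some width.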

 Paul Prescod
