Re: case mangling and binary strings

Sam Ruby Tue, 02 Nov 2004 10:43:16 -0800

Dan Sugalski wrote:

At 1:16 PM -0500 11/2/04, Sam Ruby wrote:
Dan Sugalski wrote:
At 5:43 PM +0000 11/2/04, Nicholas Clark wrote:
On Tue, Nov 02, 2004 at 12:35:26PM -0500, Sam Ruby wrote:
However, str has an upper() method defined on it. The way it operates is to take the range of bytes that correspond to us-ascii and perform a us-ascii uppercase on them. The remaining bytes are left alone.
I'd prefer parrot not to do that, on the basis that Perl 5 supports EBCDIC platforms, and I feel parrot should too, by being completely character encoding agnostic. [Well, maybe better described as "atheist" :-) ]
I think I'd agree. Besides, it also means that we'd mis-mangle Leo's name if we upcased a binary version of it and that just doesn't seem right.
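
To make the behaviour under discussion concrete, here is a minimal Python sketch of an upper() that touches only the us-ascii range. The helper name `ascii_upper` and the sample name "jörg" are hypothetical and used only for illustration. The sketch assumes the bytes are Latin-1 in the first case and EBCDIC (Python's `cp037` codec) in the second.

```python
def ascii_upper(data: bytes) -> bytes:
    """Uppercase only bytes in the us-ascii range a-z; leave all others alone."""
    return bytes(b - 0x20 if 0x61 <= b <= 0x7A else b for b in data)


# Hypothetical name with a non-ASCII letter, stored as Latin-1 bytes.
name = "jörg".encode("latin-1")
mangled = ascii_upper(name)

# The o-umlaut byte (0xF6) lies outside us-ascii, so it is left in lower case
# while its neighbours are upcased: the name comes out half-converted.
print(mangled.decode("latin-1"))

# On EBCDIC data the same routine is worse than a no-op: the letters live
# outside 0x61-0x7A, so they are not upcased, while unrelated bytes that
# happen to fall inside that range are rewritten.
ebcdic = "a/b".encode("cp037")
print(ascii_upper(ebcdic) == "A/B".encode("cp037"))
```

Python 3's own `bytes.upper()` follows the same ASCII-only rule, which is why both objections above apply to it: the result depends on an encoding the byte string does not actually carry.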

I'll make mangling on binary data throw an optional exception and otherwise leave the string alone. And I'll also make sure we have at least a binary and ASCII charset checked in to start with. I might do Latin-1 as well if I'm feeling adventuresome and it's easy enough. (It's a good thing I don't have the Ora CJKV book easily at hand or I might take a shot at Shift-JIS and that'd blow a day or two... :)
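
One way to picture the plan above is sketched below in Python. This is not Parrot's actual charset interface: the class names, the `upcase` method, and the `strict` flag are all assumptions made for illustration. The binary charset either raises (when the caller opts in) or returns the data untouched, while the ASCII charset does a plain us-ascii upcase.

```python
class CaseManglingError(Exception):
    """Raised when case mangling is requested on data that has no case."""


class BinaryCharset:
    """Hypothetical charset for raw bytes: no characters, so no case."""

    def upcase(self, data: bytes, strict: bool = False) -> bytes:
        if strict:
            # The optional exception: only raised when the caller asks for it.
            raise CaseManglingError("cannot upcase binary data")
        # Otherwise leave the string alone.
        return data


class AsciiCharset:
    """Hypothetical charset for us-ascii text."""

    def upcase(self, data: bytes, strict: bool = False) -> bytes:
        # strict is accepted for interface symmetry and ignored here.
        return bytes(b - 0x20 if 0x61 <= b <= 0x7A else b for b in data)


blob = b"abc\xf6"
print(BinaryCharset().upcase(blob) == blob)
print(AsciiCharset().upcase(b"abc"))
```

Keeping the case rules on the charset rather than on the string routine means a later Latin-1 or Unicode charset can supply its own `upcase` without the binary case ever guessing at an encoding.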
I'm not clear what you intend by "get strings working right", but I take it that it involves ditching ICU. *shrug*
Making ICU optional, at least. It's too problematic on too many platforms, and just turns into a big headache. It seemed like a good idea at the time, and while it's still better than most of the alternatives, that doesn't, unfortunately, make it good.

I expect I'll put together a Unicode charset that uses ICU to do its thing, and go from there. We certainly need Unicode support, so it's not like we can't do it. (And we still don't have a better option, unfortunately.)


Restating my vote, then.

I don't care if Parrot uses ICU on any platform.

I do care that Parrot supports utf-8 on every platform.

- Sam Ruby
