At 11:14 AM 10/19/2001 -0400, James Mastros wrote:
>This is a first run at a patch to support the ord and chr opcodes. It
>mainly, I'm afraid, serves as an example to show that we need to be able to
>transcode out of the native encoding; I have to special-case it several ways
>otherwise.
>Limitations
> - Ord only works on native strings if they have 8 bit characters or INTVAL
> sized characters. (So utf16 probably won't work).
> - Both chr and ord assume that the byteorder of a UTF32 string matches the
> byteorder of an INTVAL.
Cool. I think, though, that we might want to push off ord and chr
to the strings themselves, if only to deal with the vagaries of
variable-length encodings. A UTF-8 character might be two or three (or four,
or six...) bytes long, but it is still a single code point.
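
A rough sketch of what pushing ord down to the encodings could look like, in C. Everything here is made up for illustration (encoding_vtable, latin1_ord, utf8_ord, ENC_LATIN1, ENC_UTF8); none of it is existing Parrot API. The UTF-8 decoder only handles the 1 to 4 byte forms and does not reject overlong sequences.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-encoding table: each encoding supplies its own ord. */
typedef struct {
    const char *name;
    /* Decode the code point starting at s[0], looking at no more than
     * avail bytes. On success, store the number of bytes consumed in
     * *len and return the code point. Return -1 on malformed or
     * truncated input. */
    int32_t (*ord)(const unsigned char *s, size_t avail, size_t *len);
} encoding_vtable;

/* Fixed-width case: one byte is one code point. */
static int32_t latin1_ord(const unsigned char *s, size_t avail, size_t *len)
{
    assert(s != NULL && len != NULL);
    if (avail < 1)
        return -1;
    *len = 1;
    return s[0];
}

/* Variable-width case: the lead byte says how many bytes follow. */
static int32_t utf8_ord(const unsigned char *s, size_t avail, size_t *len)
{
    size_t n, i;
    int32_t cp;

    assert(s != NULL && len != NULL);
    if (avail < 1)
        return -1;

    if (s[0] < 0x80)                { n = 1; cp = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { n = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; cp = s[0] & 0x07; }
    else
        return -1;                  /* stray continuation or unsupported lead byte */

    if (avail < n)
        return -1;                  /* truncated sequence */

    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return -1;              /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    *len = n;
    return cp;
}

static const encoding_vtable ENC_LATIN1 = { "latin1", latin1_ord };
static const encoding_vtable ENC_UTF8   = { "utf8",   utf8_ord };
```

The ord op itself would then just call through whichever table the string carries, e.g. ENC_UTF8.ord(bytes, nbytes, &len), and never need to know how wide a character is.
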
It also means that the string encodings can decide whether they want to
force composition or decomposition on their data, and we won't necessarily
have to weld any knowledge of encodings into the interpreter. We also
need to put a "current default encoding/type" field into the interpreter so
chr will Do The Right Thing for blocks marked "use utf8;" or "use big5
qw(trad);" or "use shiftJIS;".
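
To make the "current default encoding" idea concrete, here is a minimal C sketch, assuming the interpreter carries a pointer to the active encoding's chr routine. All of the names (chr_fn, interp_state, default_chr, op_chr, latin1_chr, utf8_chr) are hypothetical and not part of any existing interpreter structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: encode one code point into out (capacity cap bytes).
 * Returns the number of bytes written, or 0 if the code point cannot
 * be represented or does not fit. */
typedef size_t (*chr_fn)(int32_t cp, unsigned char *out, size_t cap);

static size_t latin1_chr(int32_t cp, unsigned char *out, size_t cap)
{
    assert(out != NULL);
    if (cp < 0 || cp > 0xFF || cap < 1)
        return 0;
    out[0] = (unsigned char)cp;
    return 1;
}

static size_t utf8_chr(int32_t cp, unsigned char *out, size_t cap)
{
    assert(out != NULL);
    if (cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                   /* out of range or a surrogate */
    if (cp < 0x80) {
        if (cap < 1) return 0;
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        if (cap < 2) return 0;
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cap < 3) return 0;
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cap < 4) return 0;
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical interpreter state: a pragma such as "use utf8;" would
 * swap default_chr for the enclosing block. */
typedef struct {
    chr_fn default_chr;
} interp_state;

/* The chr op knows nothing about encodings; it only dispatches. */
static size_t op_chr(const interp_state *in, int32_t cp,
                     unsigned char *out, size_t cap)
{
    assert(in != NULL && in->default_chr != NULL);
    return in->default_chr(cp, out, cap);
}
```

With default_chr set to latin1_chr, op_chr for code point 0xE9 writes one byte; with utf8_chr it writes the two bytes 0xC3 0xA9. How the pragma actually sets and restores the field per block is left open here.
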
Dan
--------------------------------------"it's like this"-------------------
Dan Sugalski even samurai
[EMAIL PROTECTED] have teddy bears and even
teddy bears get drunk