At 11:14 AM 10/19/2001 -0400, James Mastros wrote:
>This is a first run at a patch to support the ord and chr opcodes.  It
>mainly, I'm afraid, serves as an example to show that we need to be able to
>transcode out of the native encoding; I have to special-case it several ways
>otherwise.

>Limitations
>  - Ord only works on native strings if they have 8 bit characters or INTVAL
>    sized characters.  (So utf16 probably won't work).
>  - Both chr and ord assume that the byteorder of a UTF32 string matches the
>    byteorder of an INTVAL.

Cool. I think, though, that we might want to push ord and chr off 
to the strings themselves, if only to deal with the vagaries of 
variable-length encodings. UTF-8 characters might be two or three (or four, 
or six...) bytes long, but are still a single code point.
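
A minimal sketch of what that means for ord, assuming a hypothetical helper 
called utf8_ord (not part of the patch or of the interpreter). It reads one 
code point from the front of a UTF-8 buffer and reports how many bytes it 
consumed. It only covers the one- to four-byte forms and does not reject 
overlong encodings, so treat it as an illustration rather than a validator.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode the code point that starts at s[0].
 * Stores it in *cp and returns the number of bytes consumed, or 0 if
 * the sequence is truncated or malformed. Handles 1-4 byte forms only
 * and does not check for overlong encodings. */
size_t utf8_ord(const unsigned char *s, size_t len, uint32_t *cp)
{
    size_t n, i;
    uint32_t c;

    if (len == 0)
        return 0;

    /* The lead byte says how long the sequence is. */
    if (s[0] < 0x80)                { c = s[0];        n = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { c = s[0] & 0x1F; n = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { c = s[0] & 0x0F; n = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { c = s[0] & 0x07; n = 4; }
    else
        return 0;   /* stray continuation byte or unsupported lead byte */

    if (len < n)
        return 0;   /* buffer ends in the middle of the character */

    /* Each continuation byte contributes its low six bits. */
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }

    *cp = c;
    return n;
}
```

The caller gets back one code point no matter how many bytes it took, which 
is exactly the knowledge an opcode working on raw bytes would otherwise have 
to special-case.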

It also means that the string encodings can decide whether they want to 
force composition or decomposition on their data, and we won't necessarily 
have to weld any knowledge of encoding into the interpreter. We also 
need to put a "current default encoding/type" field into the interpreter so 
chr will Do The Right Thing for blocks marked "use utf8;" or "use big5 
qw(trad);" or "use shiftJIS;".
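
One way this could look, as a sketch only: each encoding supplies its own 
chr through a small function table, and the interpreter just holds a pointer 
to whichever encoding is the current default. Every name here (Encoding, 
Interp, default_encoding, op_chr) is made up for illustration and is not 
existing interpreter API; ord would be a second slot in the same table.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-encoding function table. */
typedef struct Encoding {
    const char *name;
    /* Write code point cp into buf; return bytes written, 0 on failure. */
    size_t (*chr)(uint32_t cp, unsigned char *buf, size_t cap);
} Encoding;

/* Hypothetical interpreter state: only the field discussed above. */
typedef struct Interp {
    const Encoding *default_encoding;
} Interp;

static size_t latin1_chr(uint32_t cp, unsigned char *buf, size_t cap)
{
    if (cp > 0xFF || cap < 1)
        return 0;               /* not representable in Latin-1 */
    buf[0] = (unsigned char)cp;
    return 1;
}

static size_t utf8_chr(uint32_t cp, unsigned char *buf, size_t cap)
{
    size_t n = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;

    if (cp > 0x10FFFF || cap < n)
        return 0;
    switch (n) {
    case 1:
        buf[0] = (unsigned char)cp;
        break;
    case 2:
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    case 3:
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    default:
        buf[0] = (unsigned char)(0xF0 | (cp >> 18));
        buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    }
    return n;
}

const Encoding latin1_encoding = { "latin1", latin1_chr };
const Encoding utf8_encoding   = { "utf8",   utf8_chr };

/* The opcode body knows nothing about encodings; it only dispatches. */
size_t op_chr(const Interp *interp, uint32_t cp,
              unsigned char *buf, size_t cap)
{
    return interp->default_encoding->chr(cp, buf, cap);
}
```

Under this sketch, a block marked "use utf8;" would amount to pointing 
default_encoding at utf8_encoding for the duration of that block, and chr 
of the same code point would then produce different bytes (or fail) 
depending on which encoding is in effect.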

                                        Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk
