Re: Unicode handling

nick Mon, 26 Mar 2001 12:23:43 -0800
Damien Neil <[EMAIL PROTECTED]> writes:
>> >So $c = chr(ord($c)) could change $c?  That seems odd.
>> 
>> It changes its _representation_ (e.g. from 0x41,ASCII to 0xC1,EBCDIC)
>> but not its "fundamental" 'LATIN CAPITAL LETTER A'-ness.
>> Then of course someone will want it to be the number 0x41 and not do
>> that 'cos they are using chr/ord to mess with JPEG image data...
>> So there needs to be a 'binary' encoding which they can use.
>
>That doesn't seem to be what Dan was saying, however.  

And Dan is the one "in charge" on this list - so my perl5.7-ish view
may be wrong.

>It would make
>perfect sense to me for chr(ord($c)) to return $c in a different
>encoding.  (Assuming, of course, that $c is a single character.)
>
>Assume ord is dependent on the current default encoding.
>
>  use utf8; # set default encoding.
>  my $e : ebcdic = 'a';
>  my $u = chr(ord($e));
>
>If ord is dependent on the current default encoding, I would expect
>the above to leave the UTF-8 string "a" in $u.  This makes sense to
>me.

Good.

>
>If ord is dependent on the encoding of the string it gets, as Dan
>was saying, then ord($e) is 0x81, 

It could still be 0x81 (from ebcdic) with the encoding carried
along with the _number_ if we thought that worth the trouble.
(It isn't too bad for assignment, but it is far from clear what
     2 (ebcdic) * 0xA1 (iso_8859_7)
might mean - perhaps we drop the tag if anything other than + or - happens.)
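
To make the "encoding carried along with the number" idea concrete, here
is a minimal perl5 sketch using overload. The class name TaggedOrd and
its methods are hypothetical, invented for illustration only, and
keeping the left operand's tag on + and - is an assumption, not
something settled in this thread.

```perl
use strict;
use warnings;

# Hypothetical class: a number that remembers the encoding it came from.
package TaggedOrd;

use overload
    '+'      => \&_add,
    '-'      => \&_sub,
    '*'      => \&_mul,
    '0+'     => sub { $_[0]{value} },
    '""'     => sub { $_[0]{value} },
    fallback => 1;

sub new {
    my ($class, $value, $encoding) = @_;
    return bless { value => $value, encoding => $encoding }, $class;
}

sub value    { $_[0]{value} }
sub encoding { $_[0]{encoding} }

# Plain numeric value of either a TaggedOrd or an ordinary scalar.
sub _num {
    my ($x) = @_;
    return ref($x) ? $x->{value} : $x;
}

# + and - keep the tag (assumption: the left operand's tag wins).
sub _add {
    my ($self, $other) = @_;
    return TaggedOrd->new($self->{value} + _num($other), $self->{encoding});
}

sub _sub {
    my ($self, $other, $swapped) = @_;
    my $result = $swapped
        ? _num($other) - $self->{value}
        : $self->{value} - _num($other);
    return TaggedOrd->new($result, $self->{encoding});
}

# Anything else (only * is shown here) drops the tag: result is a plain number.
sub _mul {
    my ($self, $other) = @_;
    return $self->{value} * _num($other);
}

package main;

my $e       = TaggedOrd->new(0x81, 'ebcdic');
my $next    = $e + 1;     # still tagged 'ebcdic'
my $product = TaggedOrd->new(2, 'ebcdic') * TaggedOrd->new(0xA1, 'iso_8859_7');

printf "%#x %s\n", $next->value, $next->encoding;
print ref($product) ? "tagged\n" : "plain number\n";
```

With this shape, the awkward 2 (ebcdic) * 0xA1 (iso_8859_7) case simply
yields an untagged number, which sidesteps the question of what a mixed
tag would mean.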

>and $u is "\x81".  This seems
>strange.
>
>Hmm.  It suddenly occurs to me that I may have been misinterpreting:
>ord is dependent on both the encoding of its argument (to determine
>the logical character contained in that argument) and the current
>default encoding (to determine the value in the current character set
>representing that character).
>
>                         - Damien
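
That last reading can be sketched in perl5 with the Encode module. The
helpers ord_in and chr_in are hypothetical names, the encodings are
passed explicitly because perl5 strings do not carry an encoding
attribute, and the sketch assumes a single-byte default encoding.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: ord that consults both encodings.
# $source_encoding says how to read the argument's bytes;
# $default_encoding says which character set the result is a value in.
sub ord_in {
    my ($bytes, $source_encoding, $default_encoding) = @_;
    my $char = decode($source_encoding, $bytes);   # the logical character
    my $out  = encode($default_encoding, $char);   # its bytes in the default
    return ord($out);                              # assumes one byte per char
}

# Hypothetical inverse: chr under the default encoding.
sub chr_in {
    my ($number, $default_encoding) = @_;
    return decode($default_encoding, pack('C', $number));
}

my $e = "\x81";   # 'a' in EBCDIC (cp37)

printf "%#x\n", ord_in($e, 'cp37', 'iso-8859-1');   # 0x61
printf "%#x\n", ord_in($e, 'cp37', 'cp37');         # 0x81

my $u = chr_in(ord_in($e, 'cp37', 'iso-8859-1'), 'iso-8859-1');
print "$u\n";                                       # a
```

Under this reading chr(ord($e)) gives back the same logical character,
re-expressed in the default encoding.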
-- 
Nick Ing-Simmons