Re: Unicode handling

Damien Neil Mon, 26 Mar 2001 10:22:43 -0800
On Mon, Mar 26, 2001 at 06:16:00PM +0000, [EMAIL PROTECTED] wrote:
> Damien Neil <[EMAIL PROTECTED]> writes:
> >On Mon, Mar 26, 2001 at 11:32:46AM -0500, Dan Sugalski wrote:
> >> At 05:09 PM 3/23/2001 -0800, Damien Neil wrote:
> >> >So the results of ord are dependent on a global setting for "current
> >> >character set" or some such, not on the encoding of the string that
> >> >is passed to it?
> >> 
> >> Nope, ord is dependent on the string it gets, as those strings know what 
> >> their encoding is. chr is the one dependent on the current default encoding.
> >
> >So $c = chr(ord($c)) could change $c?  That seems odd.
> 
> It changes its _representation_ (e.g. from 0x41,ASCII to 0xC1,EBCDIC)
> but not its "fundamental" 'LATIN CAPITAL LETTER A'-ness.
> Then of course someone will want it to be the number 0x41 and not do 
> that 'cos they are using chr/ord to mess with JPEG image data...
> So there needs to be a 'binary' encoding which they can use.
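A minimal Python sketch of the distinction in the quoted paragraph: one logical character with two byte representations, plus raw bytes that should never be reinterpreted as characters. It assumes EBCDIC code page 037 (Python's `cp037` codec); other EBCDIC variants exist but agree on the letters used here.

```python
# One logical character, two byte representations.
# Assumption: EBCDIC code page 037, via Python's built-in "cp037" codec.
import unicodedata

ch = "A"
ascii_bytes = ch.encode("ascii")    # stored as 0x41
ebcdic_bytes = ch.encode("cp037")   # stored as 0xC1

# Decoding either byte string recovers the same logical character.
assert ascii_bytes.decode("ascii") == ebcdic_bytes.decode("cp037") == ch
print(hex(ascii_bytes[0]), hex(ebcdic_bytes[0]), unicodedata.name(ch))
# prints: 0x41 0xc1 LATIN CAPITAL LETTER A

# A "binary" view: raw data such as a JPEG header is a sequence of numbers,
# so no character-set translation is applied to it.
jpeg_header = bytes([0xFF, 0xD8, 0xFF])
print([hex(b) for b in jpeg_header])
# prints: ['0xff', '0xd8', '0xff']
```

Keeping binary data as plain bytes is what lets someone "mess with JPEG image data" without any encoding layer changing the values underneath them.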

That doesn't seem to be what Dan was saying, however.  It would make
perfect sense to me for chr(ord($c)) to return $c in a different
encoding.  (Assuming, of course, that $c is a single character.)

Assume ord is dependent on the current default encoding.

  use utf8; # set default encoding.
  my $e : ebcdic = 'a';
  my $u = chr(ord($e));

If ord is dependent on the current default encoding, I would expect
the above to leave the UTF-8 string "a" in $u.  This makes sense to
me.

If ord is dependent on the encoding of the string it gets, as Dan
was saying, then ord($e) is 0x81, and $u is "\x81".  This seems
strange.
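The two readings above can be contrasted in a short Python sketch. It is hypothetical throughout: `EncodedString`, `ord_by_arg_encoding`, `ord_by_default_encoding` and `chr_default` are illustrative names, Python's `cp037` codec stands in for EBCDIC, and UTF-8 plays the role of the `use utf8` default.

```python
# Hypothetical model: a string that carries raw bytes plus its encoding name.
from dataclasses import dataclass


@dataclass
class EncodedString:
    data: bytes
    encoding: str


def ord_by_arg_encoding(s: EncodedString) -> int:
    """Reading 1: return the stored value in the argument's own encoding."""
    return s.data[0]


def ord_by_default_encoding(s: EncodedString) -> int:
    """Reading 2: decode to the logical character and return its number in
    the default character set (here Unicode, since the default is UTF-8)."""
    return ord(s.data.decode(s.encoding))


def chr_default(n: int, default: str = "utf-8") -> EncodedString:
    """chr builds a one-character string in the current default encoding."""
    return EncodedString(chr(n).encode(default), default)


e = EncodedString("a".encode("cp037"), "cp037")   # EBCDIC 'a' is stored as 0x81

u1 = chr_default(ord_by_arg_encoding(e))      # 0x81 -> U+0081, a control char
u2 = chr_default(ord_by_default_encoding(e))  # 0x61 -> "a"

print(hex(ord_by_arg_encoding(e)), u1.data)      # prints: 0x81 b'\xc2\x81'
print(hex(ord_by_default_encoding(e)), u2.data)  # prints: 0x61 b'a'
```

Under the first reading the round trip yields U+0081, a C1 control character, rather than "a"; under the second the logical character survives the change of encoding.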

Hmm.  It suddenly occurs to me that I may have been misinterpreting:
ord is dependent on both the encoding of its argument (to determine
the logical character contained in that argument) and the current
default encoding (to determine the value in the current character set
representing that character).
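The combined interpretation amounts to a two-step lookup. A Python sketch, with hypothetical names `ord2` and `chr2`; the `default_charset` argument stands in for the current default encoding and is assumed to be either the string "unicode" or the name of a single-byte Python codec such as "ascii", "latin-1" or "cp037".

```python
def ord2(data: bytes, arg_encoding: str, default_charset: str) -> int:
    """Return the number representing the single character in `data`
    within `default_charset`."""
    char = data.decode(arg_encoding)          # step 1: find the logical character
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    if default_charset == "unicode":
        return ord(char)                      # step 2a: Unicode code point
    encoded = char.encode(default_charset)    # step 2b: value in the default set
    if len(encoded) != 1:
        raise ValueError("default charset must be single-byte")
    return encoded[0]


def chr2(n: int, default_charset: str) -> str:
    """Inverse of ord2: number in `default_charset` -> logical character."""
    if default_charset == "unicode":
        return chr(n)
    return bytes([n]).decode(default_charset)


e = "a".encode("cp037")                      # EBCDIC 'a' is stored as 0x81
print(hex(ord2(e, "cp037", "unicode")))      # prints: 0x61
print(hex(ord2(e, "cp037", "cp037")))        # prints: 0x81
print(chr2(ord2(e, "cp037", "unicode"), "unicode"))  # prints: a
```

With this split, chr(ord($c)) may change the encoding of $c but never the character it holds, which matches the behaviour that seemed sensible earlier in the thread.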

                         - Damien