On Oct 29, 2010, at 2:30 AM, Aristotle Pagaltzis wrote:

> * Dan Muey <d...@cpanel.net> [2010-10-28 21:55]:
>> For example, note the difference in output between a Unicode
>> string and a byte string for character 257: as a Unicode string
>> it is 257; as a byte string it is 196.
> 
> That is not what’s going on.
> 
>    $ perl -E'say ord "1234"'
>    49
> 
> When you pass a multi-character string to `ord`, you get the code
> point of the first character.

Thank you for clarifying what I was highlighting. 

> You are missing the rest of the bytes from the UTF-8 encoding.
> 
> You are losing data.
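To make that concrete, here is a minimal sketch using only core Perl (`utf8::encode` is built in) of what happens to character 257 (U+0101) when it is encoded to UTF-8 and then handed to ord(). The encoded form is two bytes, 0xC4 0x81, and ord() reports only the first of them.

```perl
use strict;
use warnings;

my $char  = "\x{101}";      # one character: code point 257 (U+0101)
my $bytes = $char;
utf8::encode($bytes);       # in place: now the two UTF-8 bytes 0xC4 0x81

print length($char), ' ', ord($char), "\n";     # 1 257
print length($bytes), ' ', ord($bytes), "\n";   # 2 196 (first byte only)

# Looking at every byte shows the part that a single ord() call drops.
print join(' ', map { ord($_) } split //, $bytes), "\n";   # 196 129
```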

Thanks, I do understand that, and I appreciate you expounding on it further.
Allow me to explain why this question came up:

I am using Scalar::Quote on byte strings, and it uses ord() to decide whether
to use byte-escape notation (e.g. \xE3\x8A\xB7) or Unicode code point
notation (e.g. \x{32B7}).

multivac:~ dmuey$ perl -MScalar::Quote=Q -E 'say Q("Perl is the ㊷™");'
"Perl is the \xe3\x8a\xb7\xe2\x84\xa2"
multivac:~ dmuey$ 

multivac:~ dmuey$ perl -E 'say "Perl is the \xe3\x8a\xb7\xe2\x84\xa2";'
Perl is the ㊷™
multivac:~ dmuey$
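An ord()-based choice of notation like the one described above can be sketched as follows. `quote_by_ord` is a hypothetical function written only for illustration; it is not Scalar::Quote's actual code, and its escaping rules are assumed. What it shows is that a character can only be emitted in \x{...} form when its ord() exceeds 0xFF, which never happens for a string made of bytes.

```perl
use strict;
use warnings;

# Hypothetical quoter (NOT Scalar::Quote's implementation): picks the
# escape notation for each character from its ord() value.
sub quote_by_ord {
    my ($str) = @_;
    my $out = '';
    for my $ch (split //, $str) {
        my $cp = ord $ch;
        if ($ch eq '"' or $ch eq '\\') {
            $out .= "\\$ch";                   # escape quote and backslash
        } elsif ($cp >= 0x20 and $cp < 0x7F) {
            $out .= $ch;                       # printable ASCII as-is
        } elsif ($cp <= 0xFF) {
            $out .= sprintf '\\x%02x', $cp;    # fits in one byte: \xHH
        } else {
            $out .= sprintf '\\x{%X}', $cp;    # wide character: \x{HHHH}
        }
    }
    return qq{"$out"};
}

print quote_by_ord("Perl is the \xe3\x8a\xb7\xe2\x84\xa2"), "\n";
# "Perl is the \xe3\x8a\xb7\xe2\x84\xa2"
print quote_by_ord("Perl is the \x{32b7}\x{2122}"), "\n";
# "Perl is the \x{32B7}\x{2122}"
```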

It appears to do what I need, assuming two things:
 a) the string is a byte string
     (it is not in, e.g., perl -MScalar::Quote=Q -E 'say Q("Perl is the \x{32b7}\x{2122}");')
 b) we are not under "use utf8"
     (as we are in, e.g., perl -MScalar::Quote=Q -E 'use utf8; say Q("Perl is the ㊷™");')
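Both assumptions reduce to one invariant: no character in the string may have an ord() above 255. Below is a minimal sketch of checking that, and of turning a character string back into the byte string a byte-oriented quoter expects. `fits_in_bytes` is a hypothetical helper; `utf8::encode` is core Perl. One caveat: a string whose characters are all 255 or below is not necessarily UTF-8 bytes (it could be Latin-1 text), so the check only tells you that \x{...} cannot appear, not that the data is correctly encoded.

```perl
use strict;
use warnings;

# Hypothetical helper: true when every character's ord() is 255 or less.
sub fits_in_bytes {
    my ($str) = @_;
    return $str !~ /[^\x00-\xFF]/;
}

my $literal_bytes = "Perl is the \xe3\x8a\xb7";  # what a literal ㊷ is without "use utf8"
my $literal_chars = "Perl is the \x{32b7}";      # what it becomes under "use utf8"

print fits_in_bytes($literal_bytes) ? "bytes\n" : "characters\n";   # bytes
print fits_in_bytes($literal_chars) ? "bytes\n" : "characters\n";   # characters

# Encoding the character string to UTF-8 yields exactly the byte string.
my $encoded = $literal_chars;
utf8::encode($encoded);
print $encoded eq $literal_bytes ? "identical\n" : "different\n";   # identical
```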

I just wanted to verify that its use of ord() in its logic wouldn't
unexpectedly give me back \x{32B7} under some odd circumstance I overlooked.

Thanks again, everyone. I really appreciate it!

--
Dan Muey
