On Oct 29, 2010, at 2:30 AM, Aristotle Pagaltzis wrote:

> * Dan Muey <d...@cpanel.net> [2010-10-28 21:55]:
>> For example, note the differences in output between a unicode
>> string and a byte string regarding character 257, as a unicode
>> string it is 257, as a byte string it is 196.
>
> That is not what’s going on.
>
>     $ perl -E'say ord "1234"'
>     49
>
> When you pass a multi-character string to `ord`, you get the code
> point of the first character.
Thank you for clarifying what I was highlighting.

> You are missing the rest of the bytes from the UTF-8 encoding.
>
> You are losing data.

Thanks, I do understand that and appreciate you expounding on it further.

Allow me to explain why this question came up:

I am using Scalar::Quote on byte strings, and it uses ord() to decide whether to use byte string grapheme notation (e.g. \xE3\x8A\xB7) or Unicode string notation (e.g. \x{32B7}).

multivac:~ dmuey$ perl -MScalar::Quote=Q -E 'say Q("Perl is the ㊷™");'
"Perl is the \xe3\x8a\xb7\xe2\x84\xa2"
multivac:~ dmuey$

multivac:~ dmuey$ perl -E 'say "Perl is the \xe3\x8a\xb7\xe2\x84\xa2";'
Perl is the ㊷™
multivac:~ dmuey$

It appears to do what I need, assuming 2 things:

a) the string is a byte string (so not, e.g., perl -MScalar::Quote=Q -E 'say Q("Perl is the \x{32b7}\x{2122}");')
b) we are not under "use utf8" (so not, e.g., perl -MScalar::Quote=Q -E 'use utf8; say Q("Perl is the ㊷™");')

I just wanted to verify that its use of ord() in its logic wouldn't unexpectedly result in me getting back \x{32B7} under some weird circumstance I overlooked.

Thanks again, everyone. I really appreciate it!

--
Dan Muey