On Oct 28, 2010, at 5:27 PM, Michael Ludwig wrote:

> Dan Muey wrote on 28.10.2010 at 14:54 (-0500):
>
>> Am I correct in thinking that the only way to get ord() to return a
>> value over 256 is to send the character as a Unicode string instead of
>> a byte string?
>
> Yes.
>
>> In other words, is there any character that will make ord() return
>> over 256 when passed in as a byte string?
>
> If you pass a character as a byte string, then it's a byte string of 8
> bits per byte, and the maximum for a byte is 255.
>
>> For example, note the differences in output between a unicode string
>> and a byte string regarding character 257, as a unicode string it is
>> 257, as a byte string it is 196.
>
> Yes.
>
>   perl -Mutf8 -lwe 'print ord "Я"'  # 1071
>   perl        -lwe 'print ord "Я"'  # 208
Thanks for all of that, Michael, now I can rest easier! An educated assumption is better than an anecdotal guess.

>> The reason this is relevant is that on a given project I am using
>> byte-strings-only for consistency and some encoders (i.e.
>> Scalar::Quote::Q()) will change from
>> bytes-string-friendly-grapheme-cluster notation (e.g. \xE3\x8A\xB7)
>> to unicode-string-notation (e.g. \x{32B7}) and I want to be sure I
>> always use data that gets me the former rather than the latter :)
>
> Well, if you don't need character operations, it might work for you.
> Make sure to track whether or not your data is already encoded, and also
> to use the correct encoding.
>
> --
> Michael Ludwig

Yeah, it is a pretty strict environment where encoding is strictly handled and always UTF-8; the code won't ever `use utf8`, and the strings in question will only be output (i.e. no character operations).

Again, thank you very much!

--
Dan Muey