Re: Am I correct in thinking that the only way to get ord() to return a value over 256 is to send the character as a Unicode string instead of a byte string?

Dan Muey Thu, 28 Oct 2010 15:59:25 -0700

On Oct 28, 2010, at 5:27 PM, Michael Ludwig wrote:

> Dan Muey schrieb am 28.10.2010 um 14:54 (-0500):
> 
>> Am I correct in thinking that the only way to get ord() to return a
>> value over 256 is to send the character as a Unicode string instead of
>> a byte string?
> 
> Yes.
> 
>> In other words, is there any character that will make ord() return
>> over  256 when passed in as a byte string?
> 
> If you pass a character as a byte string, then it's a byte string of 8
> bits per byte, and the maximum for a byte is 255.
> 
>> For example, note the differences in output between a unicode string
>> and a byte string regarding character 257, as a unicode string it is
>> 257, as a byte string it is 196.
> 
> Yes.
> 
>  perl -Mutf8 -lwe 'print ord "Я"'  # 1071
>  perl        -lwe 'print ord "Я"'  #  208


Thanks for all of that, Michael; now I can rest easier! An educated assumption
is better than an anecdotal guess.
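
For the archives, here is a minimal sketch of the comparison discussed above. It assumes only the core Encode module; the sample strings are written as raw UTF-8 octets so that the result does not depend on the encoding of the source file or on `use utf8`. The %samples and %result names are just for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Each sample is spelled out as UTF-8 octets (an assumption made so the
# script behaves the same regardless of how the file itself is saved).
my %samples = (
    'U+0101' => "\xC4\x81",   # code point 257
    'U+042F' => "\xD0\xAF",   # code point 1071, the letter from the one-liners
);

my %result;
for my $name (sort keys %samples) {
    my $bytes = $samples{$name};
    my $chars = decode('UTF-8', $bytes);   # octets -> character string

    # ord() looks at the first element only: an octet for the byte string,
    # a full code point for the character string.
    $result{$name} = {
        byte_ord => ord($bytes),
        char_ord => ord($chars),
    };
    printf "%s: ord(bytes) = %d, ord(chars) = %d\n",
        $name, ord($bytes), ord($chars);
}
# prints:
# U+0101: ord(bytes) = 196, ord(chars) = 257
# U+042F: ord(bytes) = 208, ord(chars) = 1071
```

Because each element of a byte string is a single octet, ord() on it cannot report anything above 255.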

>> The reason this is relevant is that on a given project I am using
>> byte-strings-only for consistency and some encoders (i.e.
>> Scalar::Quote::Q()) will change from
>> byte-string-friendly-grapheme-cluster notation (e.g. \xE3\x8A\xB7)
>> to unicode-string-notation (e.g. \x{32B7}) and I want to be sure I
>> always use data that gets me the former rather than the latter :)
> 
> Well, if you don't need character operations, it might work for you.
> Make sure to track whether or not your data is already encoded, and also
> to use the correct encoding.
> 
> -- 
> Michael Ludwig

Yeah, it is a pretty strict environment: encoding is tightly controlled and
always UTF-8, the code won't ever `use utf8`, and the strings in question will
only be output (i.e. no character operations).
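
A rough sketch of how that guarantee could be enforced before data reaches an encoder. The escape_bytes() helper is hypothetical (it is not part of Scalar::Quote, and it does not reproduce that module's output format); it only shows the idea of forcing a scalar down to UTF-8 octets first. The utf8::is_utf8() and utf8::encode() functions are always available and do not require the `use utf8` pragma. One assumption to note: any scalar that carries Perl's internal UTF-8 flag is treated as a character string that still needs encoding.

```perl
use strict;
use warnings;

# Hypothetical helper: ensure the scalar holds UTF-8 octets, then render
# every octet outside printable ASCII (and the backslash itself) as \xHH.
sub escape_bytes {
    my ($string) = @_;

    # Assumption: a flagged scalar is character data that has not been
    # encoded yet, so convert it to UTF-8 octets first.
    utf8::encode($string) if utf8::is_utf8($string);

    # After encoding, every element is an octet, so ord() is at most 255
    # and two hex digits are always enough.
    $string =~ s/([^\x20-\x5B\x5D-\x7E])/sprintf '\\x%02X', ord $1/ge;
    return $string;
}

my $chars = "\x{32B7}";        # character string: one code point
my $bytes = "\xE3\x8A\xB7";    # the same text as three UTF-8 octets

print escape_bytes($chars), "\n";   # prints \xE3\x8A\xB7
print escape_bytes($bytes), "\n";   # prints \xE3\x8A\xB7
```

Either way the output uses the byte-oriented notation, which is the property wanted here.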

Again, thank you very much!

--
Dan Muey
