Peter Dintelmann <[EMAIL PROTECTED]> writes:
>When using the \N{...} notation from the charnames pragma one can easily
>run into a bit of trouble when the bytes pragma is also in effect.
>
>    $ perl -Mcharnames=:full -Mbytes -le 'print ord "\N{WHITE SMILING FACE}"'
>    Character 0x263a with name 'WHITE SMILING FACE' is above 0xFF at -e line 1
>    Propagated at -e line 1, within string
>    Execution of -e aborted due to compilation errors.
>
>I expected ord() to return the numeric value (226) of the first byte ("\342") of
>the character ("\342\230\272").

There is no such thing as 'the character "\342\230\272"';
that is 3 characters.

'use bytes' means characters are limited to the range 0..0xFF,
so you cannot put 0x263a in one.

>
>It is not clear to me why charnames checks for "use bytes" to be in
>effect because in this case a string is anyway viewed as a sequence
>of bytes by the functions affected by the bytes pragma.

Quite, and you can't put 0x263a in a byte, so what you get is
either (0x263a % 256), i.e. 58,
or a sequence of bytes "in some encoding".
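
To make the two outcomes concrete, here is a minimal sketch. It assumes
perl 5.8 with the Encode module and picks UTF-8 as the "some encoding";
the variable names are only for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $code_point = 0x263A;    # WHITE SMILING FACE

# Outcome 1: keep only the low 8 bits of the code point.
my $truncated = $code_point % 256;

# Outcome 2: encode the character (UTF-8 is an assumption here)
# and look at the octets that come out.
my $bytes  = encode('UTF-8', chr($code_point));
my @octets = map { ord } split //, $bytes;

print "truncated: $truncated\n";
print "UTF-8 octets: @octets\n";
```

With UTF-8 the octets are 226 152 186, i.e. "\342\230\272"; a different
encoding would give a different sequence.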

Which encoding do you want? perl uses UTF-8 internally on ASCII-oid
machines and UTF-EBCDIC on EBCDIC machines if a character is bigger
than 0xFF. But if it is 0xFF or less the char is used directly.
So given 0xC0, is that the character 0xC0 or the start of an encoded
character? How does your 'use bytes' code know?
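
A rough illustration of that ambiguity, assuming perl 5.8 with Encode.
It uses the two octets that UTF-8 assigns to character 0xC0, and reads
the same octets under two different assumptions; nothing in the octets
themselves says which reading is right.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The same two octets, interpreted two ways.
my $octets = "\xC3\x80";

# Read as UTF-8: one character, 0xC0.
my $as_utf8 = decode('UTF-8', $octets);

# Read as ISO-8859-1: two characters, 0xC3 and 0x80.
my $as_latin1 = decode('ISO-8859-1', $octets);

printf "as UTF-8:   %d character(s), first is 0x%X\n",
    length($as_utf8), ord($as_utf8);
printf "as Latin-1: %d character(s), first is 0x%X\n",
    length($as_latin1), ord($as_latin1);
```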

>
>Since I am not familiar with all the details and implications involved
>with the charnames pragma I do not know in which places the check for
>"use bytes" becomes important. Can anyone give me a short explanation
>please?

use bytes;

was meant for making absolutely sure that legacy code which lived in a
pre-Unicode world keeps behaving that way.

For perl5.6.*, which has weak Unicode support, there were some "tricks"
that worked by relying on how perl's internals represented things.

For perl5.8 it is better to avoid 'use bytes' and use the Encode module
instead.
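
Something along these lines should give the 226 you were after without
'use bytes'. It is only a sketch, and it assumes UTF-8 is the encoding
you want:

```perl
use strict;
use warnings;
use charnames ':full';
use Encode qw(encode);

my $char = "\N{WHITE SMILING FACE}";

# Ask for the octets explicitly instead of relying on perl's internals.
my $bytes = encode('UTF-8', $char);

my $code_point  = ord($char);                  # the character itself
my $first_octet = ord(substr($bytes, 0, 1));   # first byte of its UTF-8 form

print "code point:  $code_point\n";
print "first octet: $first_octet\n";
print "octet count: ", length($bytes), "\n";
```

The encoding is named in the code, so there is no guessing about what
the bytes mean.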




>
>TIA,
>
>    Peter
