RE: utf8 pragma, lexical scope

Jan Dubois Thu, 09 Sep 2010 13:13:33 -0700

On Thu, 09 Sep 2010, Michael Ludwig wrote:
> 
> What does not work, however, is to have a variable $käse under utf8
> and then try to refer to it from inside a "no utf8" block, using either
> encoding. Without the utf8 pragma, identifiers are not allowed to have
> funny characters. (Yes, it was a stupid exercise.)


The Perl parser is not UTF-8 clean internally, so I would recommend
not using non-ASCII characters in variable names for now, even if it
looks like it mostly works under "utf8".
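
A minimal sketch of that recommendation, assuming nothing beyond core
Perl: keep the identifier plain ASCII and let only the data carry the
non-ASCII text. The names $kaese and kaese_length are made up for
illustration, and the string uses an \x{E4} escape so the example does
not depend on how the file itself is encoded.

```perl
use strict;
use warnings;
use utf8;    # lexically scoped: tells perl this source text is UTF-8

# ASCII-only identifier (hypothetical name); the umlaut lives in the data.
my $kaese = "K\x{E4}se";

sub kaese_length {
    no utf8;    # switch the pragma off for this block only
    # The name is plain ASCII, so it is found the same way
    # whether or not utf8 is in effect here.
    return length $kaese;
}

print kaese_length(), "\n";
```

Because the variable name contains no non-ASCII bytes, it can be
referred to from both the "use utf8" and the "no utf8" scope.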

From perltodo.pod:

| =head2 Properly Unicode safe tokeniser and pads.
|
| The tokeniser isn't actually very UTF-8 clean. C<use utf8;> is a hack -
| variable names are stored in stashes as raw bytes, without the utf-8 flag
| set. The pad API only takes a C<char *> pointer, so that's all bytes too. The
| tokeniser ignores the UTF-8-ness of C<PL_rsfp>, or any SVs returned from
| source filters.  All this could be fixed.

Cheers,
-Jan
