Re: \w regular expressions unicode

Chas. Owens Wed, 22 Apr 2009 15:19:24 -0700

On Wed, Apr 22, 2009 at 18:12, Chas. Owens <chas.ow...@gmail.com> wrote:
> On Wed, Apr 22, 2009 at 17:54, Gunnar Hjalmarsson <nore...@gunnar.cc> wrote:
>> Chas. Owens wrote:
>>>
>>> On Wed, Apr 22, 2009 at 15:25, Gunnar Hjalmarsson <nore...@gunnar.cc>
>>> wrote:
>>> snip
>>>>>
>>>>> Yeah, it looks so. With "use utf8" (http://perldoc.perl.org/utf8.html)
>>>>> one
>>>>> can however make them parsed (decoded) (provided they are valid UTF-8).
>>>>
>>>> No. The utf8 pragma is about allowing UTF-8 encoded *symbols*, e.g.
>>>> variable
>>>> names or subroutine names.
>>>
>>> snip
>>>
>>> From perldoc utf8[1]:
>>>    Bytes in the source text that have their high-bit set will be
>>>    treated as being part of a literal UTF-X sequence. This includes
>>>    most literals such as identifier names, string constants, and
>>>    constant regular expression patterns.
>>>
>>> The utf8 pragma affects the whole file,
>>
>> Well, only the part of the file that is parsed after the
>>
>>    use utf8;
>>
>> statement, right?
> snip
>
> Hmm, I don't think it would reparse the whole file, but
> it does run in a BEGIN block...hmm, I must test it.
snip


#!/usr/bin/perl

use strict;

my $first;

BEGIN { $first = "é" };

my $next = "é";

use utf8;

my $last = "é";

print "first is ", utf8::is_utf8($first) ? "" : "not ", "UTF-8\n";
print "next is ", utf8::is_utf8($next) ? "" : "not ", "UTF-8\n";
print "last is ", utf8::is_utf8($last) ? "" : "not ", "UTF-8\n";

gives me

first is not UTF-8
next is not UTF-8
last is UTF-8

So I would say that it only takes effect for code that appears after the
"use utf8;" statement.
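
A related point is that the pragma is lexically scoped, so it also stops
applying at the end of the enclosing block. Here is a minimal sketch of
that, assuming the file is saved as UTF-8 (so "é" is two bytes on disk).
The variable names are only for illustration, and I use length() rather
than utf8::is_utf8() because it shows directly whether the literal was
decoded into one character:

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Assumption: this file is saved as UTF-8, so "é" is bytes 0xC3 0xA9.

my $before = "é";    # pragma not yet in effect: two bytes

my $inside;
{
    use utf8;        # in effect only until the closing brace
    $inside = "é";   # literal is decoded: one character
}

my $after = "é";     # outside the block: two bytes again

# Only ASCII is printed, so no output layer is needed.
printf "%-6s length %d\n", $_->[0], length $_->[1]
    for [ before => $before ], [ inside => $inside ], [ after => $after ];
```

If the file really is UTF-8, I would expect lengths of 2, 1, and 2, which
matches the result above: only literals parsed while the pragma is in
effect get decoded.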

-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.

