Re: regexps with multiple matches and extended ASCII characters

Jeff 'japhy' Pinyan Fri, 04 Jan 2002 07:38:01 -0800

On Jan 4, Birgit Kellner said:

>my $string = "!Rita 1983! and then some text and here is !Künne 1234!
>and !Kußmaul 2001!";
>while ($string =~ /!(\w+)\s(\d{4})!/gi) { print "$1 and $2\n";}
># prints "Rita and 1983"
>while ($string =~ /!(\C+)\s(\d{4})!/gi) { print "$1 and $2\n";}
># prints "Rita 1983! and then some text and here is !Künne 1234! and
>#!Kußmaul and 2001"
>
>And why is the regexp greedy with \C, but not with \w?


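\C matches any byte, including '!', whitespace and newlines, so \C+
runs to the end of the string and then backtracks until \s(\d{4})! can
match, which first happens at the final " 2001!".  \w matches only a
word character, so \w+ stops at the next space or '!'.  Both are greedy;
\w is simply more restrictive.  The real problem is that your locale
does not say that u-umlaut is a word character.  Perhaps you should
preface your code with the following: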

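  use POSIX 'setlocale';
  use locale;    # make \w, \s, lc() etc. follow the current locale

  # locale names vary by system, e.g. "de", "de_DE" or "de_DE.ISO-8859-1"
  setlocale( &POSIX::LC_ALL, "de" ) or die "cannot set locale\n";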

With that at the beginning of your code, the first while loop works
properly.
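To see the three behaviours side by side, here is a small sketch.  It
assumes the names are Latin-1 bytes (written as \x escapes so the
file's own encoding does not matter), and because \C was removed in
Perl 5.24 it uses "." with /s, which matches any byte of such a string.
The [^!\s]+ pattern at the end is an alternative workaround, not the
locale fix above: it ends the name at the next '!' or whitespace,
whatever bytes the name contains.

```perl
use strict;
use warnings;

# Assumption for this sketch: Latin-1 bytes, written as \x escapes so the
# result depends neither on the source encoding nor on the system locale.
my $string = "!Rita 1983! and then some text and here is !K\x{fc}nne 1234!\n"
           . "and !Ku\x{df}maul 2001!";

# 1. \w+ without "use locale": under Perl's default rules for byte
#    strings, \x{fc} and \x{df} are not word characters, so only the
#    "Rita 1983" entry matches.
my @word;
while ($string =~ /!(\w+)\s(\d{4})!/g) { push @word, "$1 and $2" }

# 2. "Any byte" (\C no longer exists, so "." with /s stands in for it).
#    The greedy + runs to the end of the string and backtracks only as far
#    as the final " 2001!", producing one long match.
my @any;
while ($string =~ /!(.+)\s(\d{4})!/gs) { push @any, "$1 and $2" }

# 3. Alternative (not from the original post): stop the name at the next
#    '!' or whitespace instead of at the first non-word character.
my @bounded;
while ($string =~ /!([^!\s]+)\s(\d{4})!/g) { push @bounded, "$1 and $2" }

printf "%-8s %d match(es)\n", 'word:',    scalar @word;
printf "%-8s %d match(es)\n", 'any:',     scalar @any;
printf "%-8s %d match(es)\n", 'bounded:', scalar @bounded;
```

Run without "use locale", this should report one match for \w+, one
long match for the any-byte pattern, and three for the bounded class.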

-- 
Jeff "japhy" Pinyan      [EMAIL PROTECTED]      http://www.pobox.com/~japhy/
RPI Acacia brother #734   http://www.perlmonks.org/   http://www.cpan.org/
** Look for "Regular Expressions in Perl" published by Manning, in 2002 **
<stu> what does y/// stand for?  <tenderpuss> why, yansliterate of course.

