Re: Odd regexp behavior

David Graff Wed, 26 Feb 2003 12:14:50 -0800

[EMAIL PROTECTED] said:
> $ perl -e '$x = "\x{2019}\nk"; $x =~ s/(\S)\n(\S)/$1 $2/sg; print "$x\n";'
> '    <= this denotes a \x{2019} followed by \n
> k
>
> $ perl -e '$x = "b\nk"; $x =~ s/(\S)\n(\S)/$1 $2/sg; print "$x\n";'
> b k 
> 
> [snip]
>
> $ perl -e 'print (("\x{2019}" =~ /\S/) . "\n");'
> 1


This behavior certainly does seem to contradict expectations.  I even 
thought that the third test might not be exactly equivalent to the 
first, so I tried this:

$ perl -e '$x = "\x{2019}"; print "x2019 matches \\S\n" if ( $x =~ /\S/ );'
x2019 matches \S
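For anyone who wants to check their own perl, the three quoted tests can be bundled into one script. It collects the results instead of printing U+2019 directly, which avoids a "Wide character in print" warning when STDOUT has no encoding layer. This is only a sketch for comparing interpreters, not a diagnosis. On the perl used in this thread the first substitution left the newline in place. On a current perl I would expect it to yield U+2019, a space, and "k".

```perl
use strict;
use warnings;

# Test 1: the substitution on a string containing U+2019.
my $x = "\x{2019}\nk";
(my $joined = $x) =~ s/(\S)\n(\S)/$1 $2/sg;

# Test 2: the same substitution on a plain ASCII string.
my $ascii = "b\nk";
(my $joined_ascii = $ascii) =~ s/(\S)\n(\S)/$1 $2/sg;

# Test 3: does U+2019 on its own match \S?
my $single = ("\x{2019}" =~ /\S/) ? 1 : 0;

# Print test 1 as code points so no wide character reaches STDOUT.
printf "unicode: %s\n", join ' ', map { sprintf '%04x', ord } split //, $joined;
printf "ascii:   %s\n", $joined_ascii;
printf "single \\S match: %d\n", $single;
```

If the "unicode" line still shows `000a` between `2019` and `006b`, the interpreter shows the same behavior reported above.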


But since perl provides many ways of doing the same thing (or at least 
trying to), there is an "idiom" that will produce the expected result:

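 require 5.008;

 use strict;
 use warnings;
 use Encode;

 # encode() turns the character string into UTF-8 bytes, so the
 # regex below operates on bytes rather than on characters.
 my $x = encode( "utf8", "\x{2019}\nk" );
 $x =~ s/(\S)\n(\S)/$1 $2/sg;
 print "$x\n";

 __END__

 __OUTPUT__
 ' k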

Even in this case, I was puzzled as to why I got the expected behavior
by using encode() this way, but not when I used decode()
instead. (Shouldn't I have expected it to be the other way around?)
Go figure...
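One plausible reading of the encode()/decode() asymmetry, offered as an assumption rather than a confirmed diagnosis: encode() returns UTF-8 octets with perl's internal UTF-8 flag off, so the regex sees five plain bytes, while decode() hands back a character string with the flag on. That is the same kind of string that misbehaved in the first test. The sketch below only makes the length and flag of each string visible. Encode::is_utf8 is used purely for inspection.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Character string: U+2019 RIGHT SINGLE QUOTATION MARK, "\n", "k".
my $chars = "\x{2019}\nk";

# encode(): characters -> UTF-8 octets (e2 80 99 0a 6b), flag off.
my $bytes = encode("utf8", $chars);

# decode(): octets -> characters again, flag back on.
my $again = decode("utf8", $bytes);

# Report length and internal UTF-8 flag for each string.
for my $pair ([chars => $chars], [bytes => $bytes], [again => $again]) {
    my ($name, $s) = @$pair;
    printf "%-5s length %d, utf8 flag %s\n",
        $name, length $s, Encode::is_utf8($s) ? "on" : "off";
}

# The substitution from the encode() idiom, applied to the byte string:
# each (\S) captures a single byte, here 0x99 and "k".
(my $copy = $bytes) =~ s/(\S)\n(\S)/$1 $2/sg;
printf "bytes after s///: %s\n", join ' ', map { sprintf '%02x', ord } split //, $copy;
```

On the byte string, the first (\S) matches only the last byte of the three-byte UTF-8 sequence. The substitution still reassembles a valid sequence because the other two bytes are left untouched.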

        Dave Graff
