[EMAIL PROTECTED] said:
> $ perl -e '$x = "\x{2019}\nk"; $x =~ s/(\S)\n(\S)/$1 $2/sg; print "$x\n";'
> ' <= this denotes a \x{2019} followed by \n
> k
>
> $ perl -e '$x = "b\nk"; $x =~ s/(\S)\n(\S)/$1 $2/sg; print "$x\n";'
> b k
>
> [snip]
>
> $ perl -e 'print (("\x{2019}" =~ /\S/) . "\n");'
> 1
This behavior certainly does seem to contradict expectations. I even
thought that the third test might not be exactly equivalent to the
first, so I tried this:
$ perl -e '$x = "\x{2019}"; print "x2019 matches \\S\n" if ( $x =~ /\S/ );'
x2019 matches \S
But since Perl provides many ways of doing the same thing (or at least
of trying to), there is an "idiom" that does produce the expected result:
require 5.008;
use Encode;
# encode() returns a string of UTF-8 octets, not characters
$x = encode( "utf8", "\x{2019}\nk" );
$x =~ s/(\S)\n(\S)/$1 $2/sg;
print "$x\n";
__END__
__OUTPUT__
' k
Even in this case, I was puzzled as to why I got the expected behavior
by using "encode()" this way, but not when I used "decode()" instead.
(I would have expected it to be the other way around.)
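One possible reading, offered as an assumption rather than a confirmed
diagnosis: encode() turns the three-character string into five UTF-8
octets (E2 80 99 0A 6B), so the substitution then runs on bytes, and the
first \S matches the trailing byte 0x99 instead of the character U+2019.
The sketch below only demonstrates that byte-level behavior. The helper
name join_wrapped is hypothetical and not from the posts above.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not from the original posts): replace a newline
# that has a non-space character on both sides with a single space.
sub join_wrapped {
    my ($text) = @_;
    $text =~ s/(\S)\n(\S)/$1 $2/sg;
    return $text;
}

my $chars  = "\x{2019}\nk";              # character string
my $octets = encode( "UTF-8", $chars );  # same text as UTF-8 octets

printf "chars:  length %d\n", length $chars;
printf "octets: length %d\n", length $octets;

# Dump the joined octet string as hex so the bytes are visible
# regardless of the terminal's encoding.
my $joined = join_wrapped($octets);
print join( " ", map { sprintf "%02X", ord($_) } split //, $joined ), "\n";
```

On the octet string the newline byte 0A becomes a space (20) and the
three bytes of the quote mark are left untouched, which is why a UTF-8
terminal shows the quote followed by " k".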
Go figure...
Dave Graff