Re: Odd regexp behavior
On Wed, 26 Feb 2003 22:20:19 +0200, Jarkko Hietaniemi [EMAIL PROTECTED] said:

> A bug? Was seemingly broken still in 5.8.0, but 5.8.1-to-be seems to
> get this right. (I don't off-hand remember this particular kind of
> problem but there were some s/// fixes that might have helped.)
> I'll add this to the test suite so that it stays fixed.

Just to give Perl::Repository::APC some real-life exposure, I ran
binsearchaperl against the trunk and the maint-5.8 branch. And the
result is...

The bug was fixed on the trunk in patch 18085; in maint-5.8 this was
integrated in 18095.

--
andreas
Re: Odd regexp behavior
[EMAIL PROTECTED] said:

>   $ perl -e '$x = "\x{2019}\nk"; $x =~ s/(\S)\n(\S)/$1 $2/sg; print "$x\n";'
>   '      <= this denotes a \x{2019} followed by \n
>   k
>   $ perl -e '$x = "b\nk"; $x =~ s/(\S)\n(\S)/$1 $2/sg; print "$x\n";'
>   b k
> [snip]
>   $ perl -e 'print (("\x{2019}" =~ /\S/) . "\n");'
>   1

This behavior certainly does seem to contradict expectations. I even
thought that the third test might not be exactly equivalent to the
first, so I tried this:

  $ perl -e '$x = "\x{2019}"; print "x2019 matches \\S\n" if ( $x =~ /\S/ );'
  x2019 matches \S

But since perl provides many ways of doing the same thing (or at least
trying to), there is an idiom that will produce the expected result:

  require 5.008;
  use Encode;
  $x = encode( "utf8", "\x{2019}\nk" );
  $x =~ s/(\S)\n(\S)/$1 $2/sg;
  print "$x\n";
  __END__
  __OUTPUT__
  ' k

Even in this case, I was puzzled as to why I got the expected behavior
by using the encode() method this way, but not when I used decode()
instead. (I should have expected it to be the other way around?)
Go figure...

Dave Graff
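A possible reading of the encode()/decode() puzzle, offered as a sketch rather than a confirmed diagnosis: encode() returns a string of octets, so the substitution runs over the bytes E2 80 99 0A 6B and \S only has to match the single octet 0x99 in front of the newline. decode() goes the other way, from octets to a character string, which is presumably the form that triggered the problem in the first place. The helper names join_lines and hex_octets below are hypothetical, introduced only for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper, not from the thread: the substitution under discussion.
sub join_lines {
    my ($text) = @_;
    $text =~ s/(\S)\n(\S)/$1 $2/sg;
    return $text;
}

# Hypothetical helper: show a byte string as space-separated hex octets.
sub hex_octets {
    my ($bytes) = @_;
    return join " ", map { sprintf("%02X", ord($_)) } split //, $bytes;
}

my $chars  = "\x{2019}\nk";             # character string: 3 characters
my $octets = encode("UTF-8", $chars);   # byte string: 5 octets

# On the byte string, \S is matched against the octet 0x99, not U+2019.
my $joined = join_lines($octets);
print "octets in:  ", hex_octets($octets), "\n";
print "octets out: ", hex_octets($joined), "\n";

# Decoding the result gives a character string with the two lines joined.
my $decoded = decode("UTF-8", $joined);
print "joined as expected\n" if $decoded eq "\x{2019} k";
```

The newline octet 0A becomes a space (20) in the output, which is why the encoded variant prints the apostrophe and the k on one line on a UTF-8 terminal.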
Re: Odd regexp behavior
> Dear UTF-8 regular expression gurus:
>
>   $ perl -e '$x = "\x{2019}\nk"; $x =~ s/(\S)\n(\S)/$1 $2/sg; print "$x\n";'
>   '      <= this denotes a \x{2019} followed by \n
>   k
>   $ perl -e '$x = "b\nk"; $x =~ s/(\S)\n(\S)/$1 $2/sg; print "$x\n";'
>   b k
>
> Any idea why the Unicode apostrophe is not matched by a \S in the
> first case, whereas the 'b' is?

A bug? Was seemingly broken still in 5.8.0, but 5.8.1-to-be seems to
get this right. (I don't off-hand remember this particular kind of
problem but there were some s/// fixes that might have helped.)
I'll add this to the test suite so that it stays fixed.

> Also note that
>
>   $ perl -e 'print (("\x{2019}" =~ /\S/) . "\n");'
>   1
>
> so \x{2019} *does* match \S in principle ... odd.
>
> (Perl v5.6.0 built for i386-linux)
>
> Markus
>
> --
> Markus Kuhn, Computer Lab, Univ of Cambridge, GB
> http://www.cl.cam.ac.uk/~mgk25/ | __oo_O..O_oo__

--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
There is this special biologist word we use for 'stable'.
It is 'dead'. -- Jack Cohen
Re: Odd regexp behavior
Looks like we may have some sort of bug here -- compare:

  # BAD:
  $ perl-5.8.0 -e '$x = "\x{2019}\nk"; $x =~ s/\S\n\S/. ./; print "$x\n";'
  Wide character in print at -e line 1.
  â      <= this is how my latin1 xterm reacts to \x{2019}
  k

  # NOT BAD:
  $ perl-5.8.0 -e '$x = "\x{2019}\nk"; $x =~ s/\S\s\S/. ./; print "$x\n";'
  . .

Dave Graff
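The comparison above lends itself to a small regression check. What follows is only a sketch of what such a check might look like, not the test that was actually added to the Perl test suite. It assumes a perl in which the fix is present (5.8.1 or later, going by the thread), and the helper name replace_pair is hypothetical.

```perl
use strict;
use warnings;
use Test::More tests => 4;

# Hypothetical helper, not from the thread: apply one of the two patterns
# from the report and return the modified copy.
sub replace_pair {
    my ($text, $pattern) = @_;
    $text =~ s/$pattern/. ./;
    return $text;
}

my %input = (
    'ASCII b' => "b\nk",
    'U+2019'  => "\x{2019}\nk",
);

# Both patterns should replace the whole three-character string,
# whether the first character is ASCII or a wide character.
for my $name (sort keys %input) {
    is(replace_pair($input{$name}, qr/\S\n\S/), ". .", "$name: \\S\\n\\S");
    is(replace_pair($input{$name}, qr/\S\s\S/), ". .", "$name: \\S\\s\\S");
}
```

On an affected 5.8.0 build, the 'U+2019' case with the literal \n pattern is the one that would be expected to fail, matching the "BAD" run in the report.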