Re: Odd regexp behavior

2003-02-27 Thread Andreas J. Koenig
 On Wed, 26 Feb 2003 22:20:19 +0200, Jarkko Hietaniemi [EMAIL PROTECTED] said:

   A bug?  Was seemingly broken still in 5.8.0, but 5.8.1-to-be seems
   to get this right.  (I don't off-hand remember this particular kind
   of problem but there were some s/// fixes that might have helped.)
   I'll add this to the test suite so that it stays fixed.

Just to give Perl::Repository::APC some real life exposure, I ran
binsearchaperl against trunk and maint-5.8 branch. And the result
is...

The bug was fixed on the trunk in patch 18085, in maint-5.8 this was
integrated in 18095.

-- 
andreas


Re: Odd regexp behavior

2003-02-26 Thread David Graff

[EMAIL PROTECTED] said:
 $ perl -e '$x = "\x{2019}\nk"; $x =~ s/(\S)\n(\S)/$1 $2/sg; print "$x\n";'
 ' <= this denotes a \x{2019} followed by \n
 k

 $ perl -e '$x = "b\nk"; $x =~ s/(\S)\n(\S)/$1 $2/sg; print "$x\n";'
 b k 
 
 [snip]

 $ perl -e 'print (("\x{2019}" =~ /\S/) . "\n");'
 1

This behavior certainly does seem to contradict expectations.  I even 
thought that the third test might not be exactly equivalent to the 
first, so I tried this:

$ perl -e '$x = "\x{2019}"; print "x2019 matches \\S\n" if ( $x =~ /\S/ );'
x2019 matches \S


But since perl provides many ways of doing the same thing (or at least 
trying to), there is an idiom that will produce the expected result:

 require 5.008;

 use Encode;

 $x = encode( "utf8", "\x{2019}\nk" );
 $x =~ s/(\S)\n(\S)/$1 $2/sg;
 print "$x\n";

 __END__

 __OUTPUT__
 ' k

Even in this case, I was puzzled as to why I got the expected behavior
by using the encode() method this way, but not when I used decode()
instead. (I should have expected it to be the other way around?)
Go figure...
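
One way to see what encode() changes here: it turns the character
string into UTF-8 octets, so the regexp no longer sees U+2019 as one
character but as the three bytes E2 80 99, and the \S before the
newline matches the last of them. (decode() goes the other way, from
octets to characters.) The following is only a minimal sketch with
illustrative variable names, assuming a Perl that ships Encode; it
shows the byte-level view and does not explain the 5.8.0 bug itself.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Character string: U+2019, newline, 'k' (3 characters).
my $chars = "\x{2019}\nk";

# Byte string: U+2019 becomes the octets E2 80 99 (5 octets in total).
my $bytes = encode("UTF-8", $chars);

printf "chars: length %d, utf8 flag %s\n",
    length($chars), utf8::is_utf8($chars) ? "on" : "off";
printf "bytes: length %d, utf8 flag %s\n",
    length($bytes), utf8::is_utf8($bytes) ? "on" : "off";

# On the byte string, the (\S) before \n captures the single octet \x99.
(my $joined = $bytes) =~ s/(\S)\n(\S)/$1 $2/sg;

# Show the result as hex octets so no terminal encoding is involved.
printf "joined octets: %s\n",
    join " ", map { sprintf("%02X", ord($_)) } split(//, $joined);
```

The last line prints the octets E2 80 99 20 6B, i.e. the UTF-8 form of
the apostrophe, a space, and 'k'.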

Dave Graff




Re: Odd regexp behavior

2003-02-26 Thread Jarkko Hietaniemi
 Dear UTF-8 regular expression gurus:
 
 $ perl -e '$x = "\x{2019}\nk"; $x =~ s/(\S)\n(\S)/$1 $2/sg; print "$x\n";'
 ' <= this denotes a \x{2019} followed by \n
 k
 $ perl -e '$x = "b\nk"; $x =~ s/(\S)\n(\S)/$1 $2/sg; print "$x\n";'
 b k
 
 Any idea, why the Unicode apostrophe is not matched by a \S in the first
 case, whereas the 'b' is?

A bug?  Was seemingly broken still in 5.8.0, but 5.8.1-to-be seems
to get this right.  (I don't off-hand remember this particular kind
of problem but there were some s/// fixes that might have helped.)
I'll add this to the test suite so that it stays fixed.
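
A regression test for this could look roughly like the sketch below.
It is an assumption about the shape of such a test, written against
Test::More, and not the test that actually went into the Perl test
suite; the variable names and test descriptions are made up for
illustration.

```perl
use strict;
use warnings;
use Test::More tests => 3;

# \S on its own already matched U+2019 in the reports above.
ok("\x{2019}" =~ /\S/, 'U+2019 matches \S on its own');

# The reported failure: \S next to a literal \n inside s///.
(my $y = "\x{2019}\nk") =~ s/(\S)\n(\S)/$1 $2/sg;
is($y, "\x{2019} k", 's/(\S)\n(\S)/$1 $2/ joins lines after U+2019');

# ASCII control case, which worked in the original report.
(my $ascii = "b\nk") =~ s/(\S)\n(\S)/$1 $2/sg;
is($ascii, "b k", 'same substitution on an ASCII string');
```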

 Also note that
 
 $ perl -e 'print (("\x{2019}" =~ /\S/) . "\n");'
 1
 
 so \x{2019} *does* match \S in principle ... odd.
 
 (Perl v5.6.0 built for i386-linux)
 
 Markus
 
 -- 
 Markus Kuhn, Computer Lab, Univ of Cambridge, GB
 http://www.cl.cam.ac.uk/~mgk25/ | __oo_O..O_oo__

-- 
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/ "There is this special
biologist word we use for 'stable'.  It is 'dead'." -- Jack Cohen


Re: Odd regexp behavior

2003-02-26 Thread David Graff

Looks like we may have some sort of bug here -- compare:

# BAD:
$ perl-5.8.0 -e '$x = "\x{2019}\nk"; $x =~ s/\S\n\S/. ./; print "$x\n";'
Wide character in print at -e line 1.
â  <= this is how my latin1 xterm reacts to \x{2019}
k

# NOT BAD:
$ perl-5.8.0 -e '$x = "\x{2019}\nk"; $x =~ s/\S\s\S/. ./; print "$x\n";'
. .
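
The two cases can also be run side by side in one script. This is
just a sketch of the comparison above, with a made-up @results array
to hold the outcomes; the binmode call is an addition that declares
STDOUT as UTF-8, which is what avoids the "Wide character in print"
warning whenever the apostrophe survives into the output. On a perl
where the bug is fixed, both patterns should give ". .".

```perl
use strict;
use warnings;

# Encode output as UTF-8 so printing U+2019 does not warn.
binmode(STDOUT, ":encoding(UTF-8)");

my @results;
for my $re (qr/\S\n\S/, qr/\S\s\S/) {
    # Fresh copy of the test string for each pattern.
    (my $x = "\x{2019}\nk") =~ s/$re/. ./;
    push @results, $x;
    print "$re => $x\n";
}
```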


Dave Graff