> Dear UTF-8 regular expression gurus:
>
> $ perl -e '$x = "\x{2019}\nk"; $x =~ s/(\S)\n(\S)/$1 $2/sg; print "$x\n";'
> ' <= this denotes a \x{2019} followed by \n
> k
> $ perl -e '$x = "b\nk"; $x =~ s/(\S)\n(\S)/$1 $2/sg; print "$x\n";'
> b k
>
> Any idea, why the Unicode apostrophe is not matched by a \S in the first
> case, whereas the 'b' is?
A bug? It was seemingly still broken in 5.8.0, but 5.8.1-to-be seems to
get this right. (I don't offhand remember this particular kind of problem,
but there were some s/// fixes that might have helped.) I'll add this to
the test suite so that it stays fixed.

> Also note that
>
> $ perl -e 'print (("\x{2019}" =~ /\S/) . "\n");'
> 1
>
> so \x{2019} *does* match \S in principle ... odd.
>
> (Perl v5.6.0 built for i386-linux)
>
> Markus
>
> --
> Markus Kuhn, Computer Lab, Univ of Cambridge, GB
> http://www.cl.cam.ac.uk/~mgk25/ | __oo_O..O_oo__

--
Jarkko Hietaniemi <[EMAIL PROTECTED]> http://www.iki.fi/jhi/
"There is this special biologist word we use for 'stable'.
It is 'dead'." -- Jack Cohen