Dear UTF-8 regular expression gurus:

  $ perl -e '$x = "\x{2019}\nk"; $x =~ s/(\S)\n(\S)/$1 $2/sg; print "$x\n";'
  '        <= this denotes a \x{2019}, followed by \n
  k
  $ perl -e '$x = "b\nk"; $x =~ s/(\S)\n(\S)/$1 $2/sg; print "$x\n";'
  b k
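One way to narrow this down is to run both inputs through the same substitution and dump the result as code points, which takes the terminal's encoding out of the picture. This is only a sketch: the helper name subst_codepoints and the U+XXXX output format are my own, not anything from the report above. If \S behaves consistently, both lines should show U+0020 in the middle; a U+000A there means the substitution did not fire for that input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: apply the substitution from the report and return
# the result as a space-separated list of code points.
sub subst_codepoints {
    my ($s) = @_;
    # Same substitution as above: replace a newline that sits between two
    # non-whitespace characters with a single space.
    $s =~ s/(\S)\n(\S)/$1 $2/sg;
    return join ' ', map { sprintf('U+%04X', ord($_)) } split(//, $s);
}

# The two inputs from the report: the apostrophe case and the ASCII case.
print subst_codepoints($_), "\n" for "\x{2019}\nk", "b\nk";
```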
Any idea why the Unicode apostrophe is not matched by \S in the first case, whereas the 'b' is? Also note that

  $ perl -e 'print (("\x{2019}" =~ /\S/) . "\n");'
  1

so \x{2019} *does* match \S in principle... odd.

(Perl v5.6.0 built for i386-linux)

Markus

-- 
Markus Kuhn, Computer Lab, Univ of Cambridge, GB
http://www.cl.cam.ac.uk/~mgk25/ | __oo_O..O_oo__