Hi everyone,
I've run into problems matching the regex [^\s] on Red Hat 8/9 with the version of Perl shipped with it (5.8.0). I've googled around and am aware that there are some problems with UTF-8 on this platform.
I'm trying to write a script that will work with this version and earlier versions of Perl (I can't install a new version as I'm sending out scripts to people who won't want to do this).
The problem:
------------
Given the string:
    $_ = "%define pfx x";
The regex:
    m,^%define\s+([^\s]+),;
Does not match on RH8/9 unless you change the LANG environment variable to a non-UTF-8 entry.
For some reason, the pragma: no utf8; doesn't seem to make any difference.
I can get it to work by changing the pattern to: m,^%define\s+([\S]+),; but this is not what I want because I have legacy scripts that I can't easily change. Furthermore, I want to use patterns like: [^\s/] (i.e. a negated class that excludes more than one kind of character).
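For reference, here is a minimal sketch of how a pattern like [^\s/] could be written with \S and a negative lookahead instead of a negated class. The sample input is made up, and I have not confirmed that the rewritten form avoids the problem on 5.8.0; it only shows that the two forms capture the same text where the negated class works.

```perl
use strict;
use warnings;

# Made-up sample input containing a '/' after the name.
my $line = "%define pfx/sub x";

# Legacy form: negated class excluding whitespace and '/'.
my ($legacy) = $line =~ m,^%define\s+([^\s/]+),;

# Rewritten form: \S guarded by a negative lookahead for '/'.
# Assumption: plain \S behaves like the [\S] that works for me.
my ($rewritten) = $line =~ m,^%define\s+((?:(?!/)\S)+),;

print "'$legacy' '$rewritten'\n";
```

This still means touching every legacy pattern, so it is not a real answer to my problem.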
I found a workaround. If I change the start-up line to include LANG=C, it works:
eval 'LANG=C exec perl -w -S $0 ${1+"$@"}' if $running_under_some_shell;
I've attached a test script that shows the problem (remove the LANG=C to make it break).
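A variant of this workaround that does not depend on the start-up line might look like the sketch below. It rests on the assumption that Perl 5.8.0 decides on UTF-8 handling from LC_ALL, LC_CTYPE and LANG; the helper names are hypothetical, and the re-exec is left commented out so the example has no side effects.

```perl
use strict;
use warnings;

# Hypothetical helper: list the locale variables (assumed to be the ones
# Perl 5.8.0 consults) whose value mentions UTF-8.
sub utf8_locale_vars {
    my ($env) = @_;
    return grep { defined $env->{$_} && $env->{$_} =~ /utf-?8/i }
           qw(LC_ALL LC_CTYPE LANG);
}

# Hypothetical helper: copy the environment, forcing those variables to "C".
sub c_locale_env {
    my ($env) = @_;
    my %new = %$env;
    $new{$_} = 'C' for utf8_locale_vars($env);
    return \%new;
}

# Usage sketch: re-run the script once under the C locale.
# if (utf8_locale_vars(\%ENV)) {
#     %ENV = %{ c_locale_env(\%ENV) };
#     exec $^X, '-w', $0, @ARGV or die "cannot re-exec: $!";
# }
```

Like the LANG=C start-up line, this sidesteps the problem rather than explaining it.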
Question:
---------
Does anyone know a better way of working around this problem (e.g. getting 'no utf8;' to work)?
TIA, Stuart
eval 'LANG=C exec perl -w -S $0 ${1+"$@"}'
    if $running_under_some_shell;
$running_under_some_shell = 0;

# This doesn't make any difference ?
#no utf8;

# The pattern to match in
$_ = "%define pfx x";

# write the file to a temp file and read back in
$tmpfile = "/tmp/trash991";
open F, ">$tmpfile" or die;
print F $_;
close F;
$/ = undef;
open F, $tmpfile or die;
$_ = <F>;
close F;
unlink $tmpfile;

# bad on Perl 5.8.0 with LANG set to any UTF-8
m,^%define\s+([^\s]+),;

# bad on 5.6
# m,^%define\s+([^\p{IsSpace}]+),;

# bad on 5.6
# m,^%define\s+([\P{IsSpace}]+),;

# okay on all but don't understand the difference
#m,^%define\s+([\S]+),;

print "'$1'\n";