I found what I believe to be a failure of grep to match something it
should. It appears that using a backreference in the pattern disables
the --ignore-case switch. I get the same (unexpected) results with GNU
grep 2.4.2, 2.5, and 2.5.1 on Solaris 8 and with grep 2.5 on
Mac OS X 10.3.9. The version of grep that Sun ships with Solaris 8 does
work as expected.
To make sure this hasn't been recently fixed, I downloaded
ftp://ftp.gnu.org/gnu/grep/grep-2.5.1a.tar.gz
and built grep from that.
manresa 181: uname -a
SunOS manresa 5.8 Generic_108528-24 sun4u sparc SUNW,Sun-Blade-100
manresa 182: src/grep --version
grep (GNU grep) 2.5.1
Copyright 1988, 1992-1999, 2000, 2001 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE.
manresa 183: cat /tmp/test
A abcd abcd
manresa 184: src/grep --ignore-case 'a \(abcd\) \1' /tmp/test
manresa 185: src/grep --ignore-case 'A \(abcd\) \1' /tmp/test
A abcd abcd
manresa 186: src/grep --ignore-case 'A \(ABCD\) \1' /tmp/test
manresa 187: src/grep --ignore-case 'A \(ABCD\) ABCD' /tmp/test
A abcd abcd
manresa 188: src/grep --ignore-case 'a \(ABCD\) ABCD' /tmp/test
A abcd abcd
It is my belief that all of these calls to grep should have returned the
line from the file.
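The five invocations above can be collected into a small script so the comparison is easy to rerun against another grep build. This is only a sketch: the GREP variable and the check function are names made up for it, mktemp is assumed to be available, and the short -i option is used in place of --ignore-case so that the same script can also be pointed at a non-GNU grep.

```shell
#!/bin/sh
# Sketch: rerun the five patterns from the transcript against one grep binary.
# GREP and check() are hypothetical names introduced for this sketch.
GREP=${GREP:-grep}

tmp=$(mktemp) || exit 1
trap 'rm -f "$tmp"' EXIT
printf 'A abcd abcd\n' > "$tmp"

check() {
    # Report whether the pattern in $1 selects a line from the file in $2.
    if "$GREP" -i "$1" "$2" >/dev/null; then
        printf 'match     %s\n' "$1"
    else
        printf 'no match  %s\n' "$1"
    fi
}

for pat in 'a \(abcd\) \1' 'A \(abcd\) \1' 'A \(ABCD\) \1' \
           'A \(ABCD\) ABCD' 'a \(ABCD\) ABCD'; do
    check "$pat" "$tmp"
done
```

Running it as, for example, GREP=src/grep sh script.sh and again with GREP=/usr/bin/grep should show which of the five patterns each binary accepts. If my reading is right, a grep that honors -i throughout would report a match for all five.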
The grep distributed with Solaris 8 acts as I expect:
manresa 192: /usr/bin/grep -i 'a \(abcd\) \1' /tmp/test
A abcd abcd
Another test case:
manresa 51: cat /tmp/test2
a abcd aBcD
manresa 52: src/grep --ignore-case 'a \(abcd\) \1' /tmp/test2
manresa 53: /usr/bin/grep -i 'a \(abcd\) \1' /tmp/test2
a abcd aBcD
In this case, however, the documentation is somewhat ambiguous.
--ignore-case is documented as "Ignore case distinctions in both the
PATTERN and the input files." A backreference is documented as "matches
the substring previously matched by the Nth parenthesized subexpression
of the regular expression." It isn't clear whether a backreference
must match that substring exactly or may match it ignoring case. The
grep shipped with Solaris, at least, matches the substring ignoring
case when -i is also given. I would argue that this is the correct
behavior, since --ignore-case says to ignore case distinctions in the
input files as well as in the pattern. However this is resolved, the
documentation should state which behavior applies.
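The two readings of the documentation can be emulated without combining -i and \1 at all, so the result does not depend on how a particular grep treats that combination. This is a sketch only: exact and folded are names made up here, and the bracket expressions and the tr call are stand-ins for --ignore-case, not what grep does internally.

```shell
#!/bin/sh
# Sketch: emulate both readings of a backreference without using -i.
# exact() and folded() are hypothetical helper names.
line='a abcd aBcD'

# Reading 1: case is ignored in the pattern, but \1 must repeat the
# captured text exactly. Bracket expressions stand in for -i.
exact() {
    printf '%s\n' "$1" | grep '[Aa] \([Aa][Bb][Cc][Dd]\) \1' >/dev/null
}

# Reading 2: case is ignored everywhere, including the text matched by \1.
# Folding the input to lower case first stands in for -i.
folded() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' |
        grep 'a \(abcd\) \1' >/dev/null
}

if exact "$line"; then
    echo 'exact reading: match'
else
    echo 'exact reading: no match'
fi

if folded "$line"; then
    echo 'folded reading: match'
else
    echo 'folded reading: no match'
fi
```

For the line from /tmp/test2, the exact reading rejects the line and the folded reading accepts it. The folded reading is the one that agrees with what /usr/bin/grep -i printed above.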
It appears that GNU Emacs 21.12.1 (on Mac OS X) does regular expression
matching as I expect.
When case-fold-search is t, the expression
(search-forward-regexp "a \\(abcd\\) \\1")
will match each of the lines:
a abcd abcd
A abcd abcd
a abcd aBcD