Werner LEMBERG <[EMAIL PROTECTED]> writes:
> I suggest to add that `\?', `\+', and `\|' should not be used in sed
> expressions
Thanks for suggesting that. The problem is a bit more general, so I
installed the following:
2005-12-12 Paul Eggert <[EMAIL PROTECTED]>
* doc/autoconf.texi (Limitations of Usual Tools):
Mention which characters can be escaped with \ in portable regular
expressions used in grep, sed, expr. Mention the leading ^ problem
with expr. Clean up some confusing wording. Mention which
grep options are portable.
--- autoconf.texi 2 Dec 2005 19:19:23 - 1.935
+++ autoconf.texi 12 Dec 2005 18:46:51 - 1.936
@@ -11891,6 +11891,10 @@ replacement @code{grep -E}. Also, some
not work on long input lines. To work around these problems, invoke
@code{AC_PROG_EGREP} and then use @code{$EGREP}.
+Portable extended regular expressions should use @samp{\} only to escape
+characters in the string @samp{$()[EMAIL PROTECTED]|}. For example, @[EMAIL PROTECTED]
+is not portable, even though it typically matches @[EMAIL PROTECTED]
+
The empty alternative is not portable, use @samp{?} instead. For
instance with Digital Unix v5.0:
@@ -11945,8 +11949,15 @@ Avoid this portability problem by avoidi
@item @command{expr} (@samp{:})
@c
@prindex @command{expr}
-Don't use @samp{\?}, @samp{\+} and @samp{\|} in patterns, as they are
-not supported on Solaris.
+Portable @command{expr} regular expressions should use @samp{\} to
+escape only characters in the string @samp{$()[EMAIL PROTECTED]@}}.
+For example, alternation, @samp{\|}, is common but Posix does not
+require its support, so it should be avoided in portable scripts.
+Similarly, @samp{\+} and @samp{\?} should be avoided.
+
+Portable @command{expr} regular expressions should not begin with
+@samp{^}. Patterns are automatically anchored so leading @samp{^} is
+not needed anyway.
The Posix standard is ambiguous as to whether
@samp{expr 'a' : '\(b\)'} outputs @samp{0} or the empty string.
@@ -12045,6 +12056,12 @@ while @acronym{GNU} @command{find} repor
@item @command{grep}
@c -
@prindex @command{grep}
+Portable scripts can rely on the @command{grep} options @option{-c},
+@option{-l}, @option{-n}, and @option{-v}, but should avoid other
+options. For example, don't use @option{-w}, as Posix does not require
+it and Irix 6.5.16m's @command{grep} does not support it.
+
+Some of the options required by Posix are not portable in practice.
Don't use @samp{grep -q} to suppress output, because many @command{grep}
implementations (e.g., Solaris) do not support @option{-q}.
Don't use @samp{grep -s} to suppress output either, because Posix
@@ -12070,12 +12087,17 @@ grep 'foo
bar' in.txt
@end example
-Alternation, @samp{\|}, is common but Posix does not require its
+Traditional @command{grep} implementations (e.g., Solaris) do not
+support the @option{-E} or @samp{-F} options. To work around these
+problems, invoke @code{AC_PROG_EGREP} and then use @code{$EGREP}, and
+similarly for @code{AC_PROG_FGREP} and @code{$FGREP}.
+
+Portable @command{grep} regular expressions should use @samp{\} only to
+escape characters in the string @samp{$()[EMAIL PROTECTED]@}}. For example,
+alternation, @samp{\|}, is common but Posix does not require its
support in basic regular expressions, so it should be avoided in
portable scripts. Solaris @command{grep} does not support it.
-
-Don't rely on @option{-w}, as Irix 6.5.16m's @command{grep} does not
-support it.
+Similarly, @samp{\+} and @samp{\?} should be avoided.
@item @command{join}
@@ -12264,8 +12286,8 @@ Patterns should not include the separato
of a character class. In conformance with Posix, the Cray
@command{sed} will reject @samp{s/[^/]*$//}: use @samp{s,[^/]*$,,}.
-Avoid empty patterns within parentheses (i.e., @samp{\(\)}). Posix is
-silent on whether they are allowed, and Unicos 9 @command{sed} rejects
+Avoid empty patterns within parentheses (i.e., @samp{\(\)}). Posix does
+not require support for empty patterns, and Unicos 9 @command{sed} rejects
them.
Unicos 9 @command{sed} loops endlessly on patterns like @samp{.*\n.*}.
@@ -12273,21 +12295,25 @@ Unicos 9 @command{sed} loops endlessly o
Sed scripts should not use branch labels longer than 8 characters and
should not contain comments.
-Don't include extra @samp{;}, as some @command{sed}, such as Net@acronym{BSD}
-1.4.2's, try to interpret the second as a command:
+Avoid redundant @samp{;}, as some @command{sed} implementations, such as
+Net@acronym{BSD} 1.4.2's, incorrectly try to interpret the second
+@samp{;} as a command:
@example
$ @kbd{echo a | sed 's/x/x/;;s/x/x/'}
sed: 1: "s/x/x/;;s/x/x/": invalid command code ;
@end example
-Input should have reasonably long lines, since some @command{sed} have
-an input buffer limited to 4000 bytes.
+Input should not have unreasonably