Thank you for the awesome software you provide to all of us!
I found an error in how expr match handles patterns with null (empty)
subexpressions.
What I tried to achieve was to append a path string to PATH, but only
if it was not already included in PATH; thus I tested with expr match,
which worked well until it didn't.
The platform:
expr (GNU coreutils) 9.7
Packaged by Debian (9.7-3)
expr (GNU coreutils) 9.7.282-42c457 (the version I tested the fix on)
Platform: Debian 13
Reproduction (shell):
VAR1=/usr/local/lib/erlang28/bin:/sbin:/bin:/usr/sbin:/usr/bin
expr $VAR1 : '\(.*:\)\{0,1\}/usr/local/lib/erlang28/bin\(:.*\)\{0,1\}'
(empty string)
echo $?
1
Expected behaviour (shell):
VAR1=/usr/local/lib/erlang28/bin:/sbin:/bin:/usr/sbin:/usr/bin
expr $VAR1 : '\(.*:\)\{0,1\}/usr/local/lib/erlang28/bin\(:.*\)\{0,1\}'
57
(the total number of characters matched)
echo $?
0
The expected behaviour is what expr shows on FreeBSD and OpenBSD. That
is a different code base, but I think it is more consistent with the
POSIX description of expr.
The problem in coreutils/expr is the assumption that a subpattern
cannot have zero length. POSIX states:
"A subexpression repeated by an <asterisk> ( '*' ) or an interval
expression shall not match a null expression unless this is the only
match for the repetition or it is necessary to satisfy the exact or
minimum number of occurrences for the interval expression."
and that's what happens here: the first subexpression must match the
minimum number of occurrences (here zero, for there to be any match at all).
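A smaller case that should exercise the same situation is sketched below. The group \(x\) can only take part in the match by occurring zero times. The name classify_expr and its labels are made up for illustration. Going by the behaviour described in this report, GNU expr would be expected to report an empty string and the BSD versions the match length 3; the sketch does not assume either outcome.

```shell
# Hypothetical helper: reports how the local expr treats a first
# subexpression that takes part in the match with zero occurrences.
classify_expr() {
    # \(x\)\{0,1\} must occur zero times for "abc" to match at all.
    # "|| true" keeps a non-zero expr status from aborting under "set -e".
    out=$(expr abc : '\(x\)\{0,1\}abc') || true
    case $out in
        3)  echo "match length" ;;
        '') echo "empty string" ;;
        *)  echo "other: $out" ;;
    esac
}

classify_expr
```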
If I prepend something to the constant string, e.g.:
VAR2=/some/path:${VAR1}
expr $VAR2 : '\(.*:\)\{0,1\}/usr/local/lib/erlang28/bin\(:.*\)\{0,1\}'
/some/path:
echo $?
0
everything works as expected.
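Independently of how this is resolved, the original goal (append a directory only if it is missing) can be reached without any \(...\) group, so the result does not depend on how a zero-occurrence subexpression is reported. This is a minimal sketch, not what was used in the tests above. The names path_contains and append_path and the directory /opt/bin are hypothetical, and the directory is used as a regular expression, so a character such as '.' in it matches loosely.

```shell
# Hypothetical helper: succeeds if directory $2 is an element of the
# colon-separated list $1. Wrapping both in colons removes the need
# for optional \(...\) groups at the start and end of the pattern.
path_contains() {
    expr ":$1:" : ".*:$2:" >/dev/null
}

# Hypothetical helper: prints list $1, with $2 appended if it is missing.
append_path() {
    if path_contains "$1" "$2"; then
        printf '%s\n' "$1"
    else
        # ${1:+$1:} avoids a leading colon when the list is empty.
        printf '%s\n' "${1:+$1:}$2"
    fi
}

VAR1=/usr/local/lib/erlang28/bin:/sbin:/bin:/usr/sbin:/usr/bin
append_path "$VAR1" /usr/local/lib/erlang28/bin   # already present: list unchanged
append_path "$VAR1" /opt/bin                      # missing: list with :/opt/bin appended
```

Without a group, expr prints the number of characters matched, which is at least 2 whenever the pattern matches, so the exit status alone tells whether the directory is present.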
To fix it, I suggest changing the logic that assesses the re_match
results:
diff --git a/src/expr.c b/src/expr.c
index cd87763df..31a1ab17b 100644
--- a/src/expr.c
+++ b/src/expr.c
@@ -600,15 +600,10 @@ docolon (VALUE *sv, VALUE *pv)
   if (0 <= matchlen)
     {
       /* Were \(...\) used? */
-      if (re_buffer.re_nsub > 0)
+      if (re_buffer.re_nsub > 0 && re_regs.end[1] > 0)
         {
-          if (re_regs.end[1] < 0)
-            v = str_value ("");
-          else
-            {
               sv->u.s[re_regs.end[1]] = '\0';
               v = str_value (sv->u.s + re_regs.start[1]);
-            }
         }
       else
         {
With this change, if the first subexpression has a length > 0 we return
the first subexpression; otherwise we return the total match length.
That gives the same behaviour as expr on FreeBSD and OpenBSD.
On the other hand, the Sun version of expr, at least on OmniOS
([email protected]:20250903T092704Z), exhibits the same behaviour as
GNU coreutils.
However, there is one comment suggesting that they considered the BSD
behaviour at some point...
So maybe you should keep the current behaviour and just document it in
the man page... My preference, for what it's worth, is the BSD behaviour.
Kind regards
Michael Figiel