Yesterday's change to regex.m4 has the effect that now, gnulib's regex code
gets used even on glibc systems. As a consequence, the ASAN+UBSAN build
in gnulib's CI now fails:
  FAIL: test-regex
  ../../gllib/regexec.c:188:36: runtime error: variable length array bound evaluates to non-positive value 0
What the clang UBSAN is complaining about is this definition of the
regexec function:
  int
  regexec (const regex_t *__restrict preg, const char *__restrict string,
           size_t nmatch, regmatch_t pmatch[_REGEX_NELTS (nmatch)], int eflags)
  { ... }
According to ISO C23 § 6.7.6.2.(5) the value of nmatch must be > 0 here.
Quote:
"If the size is an expression that is not an integer constant expression:
if it occurs in a declaration at function prototype scope, it is treated
as if it were replaced by *; otherwise, each time it is evaluated it
shall have a value greater than zero."
(Here we're in a function definition, not a function prototype.)
But the comments in regexec.c:174..175 indicate that nmatch is allowed to
be 0, and apparently the test suite exercises this case.
So, we can't use the syntax
  size_t nmatch, regmatch_t pmatch[nmatch]
here: it is undefined behaviour whenever nmatch is 0.
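
For illustration, here is a minimal sketch of a declaration that avoids the problem. It is not gnulib code: span_t and fill_spans are hypothetical names invented for this example. The parameter is declared with an incomplete array type, so there is no size expression to evaluate on entry, and a call with n == 0 is fine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for regmatch_t; not gnulib's definition.  */
typedef struct { int start; int end; } span_t;

/* The variable-length array form would be
     size_t fill_spans (size_t n, span_t out[n]) { ... }
   In a function definition, the bound n is evaluated on entry and
   must be > 0, so a call with n == 0 would be undefined behaviour.  */

/* Safe form: 'span_t out[]' is adjusted to 'span_t *out', and no
   array bound is evaluated.  */
size_t
fill_spans (size_t n, span_t out[])
{
  /* OUT may be NULL only when there is nothing to store.  */
  assert (n == 0 || out != NULL);
  for (size_t i = 0; i < n; i++)
    {
      out[i].start = (int) i;
      out[i].end = (int) i + 1;
    }
  return n;
}
```

A caller can then pass a count of 0 together with a null pointer, for example fill_spans (0, NULL), much like regexec() accepts nmatch == 0.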
I tried two patches, attached below. The second one has the advantage that
it leaves the declaration of regexec() intact, which is a plus for static
analyzers. But it introduces a new warning:
  In file included from ../../gllib/regex.c:71:
  ../../gllib/regexec.c:192:29: warning: argument 'pmatch' of type 'regmatch_t[]' with mismatched bound [-Warray-parameter]
    192 | size_t nmatch, regmatch_t pmatch[/* nmatch */], int eflags)
        | ^
  ../../gllib/regex.h:687:18: note: previously declared as 'regmatch_t[restrict __nmatch]' here
    687 | regmatch_t __pmatch[_Restrict_arr_
        | ^
So, I'm committing the first one.
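
To show how the first patch takes effect, here is a minimal sketch, assuming a macro that always expands to nothing. DEMO_NELTS and count_nonzero are hypothetical names for this example, not part of gnulib; DEMO_NELTS merely mirrors what _REGEX_NELTS looks like once it is defined to empty.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro mirroring _REGEX_NELTS when defined to empty:
   it expands to nothing, whatever its argument is.  */
#define DEMO_NELTS(n)

/* After preprocessing, the parameter reads 'const int values[]',
   so no array bound is evaluated when the function is entered.  */
size_t
count_nonzero (size_t n, const int values[DEMO_NELTS (n)])
{
  /* VALUES may be NULL only when there are no elements to read.  */
  assert (n == 0 || values != NULL);
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (values[i] != 0)
      count++;
  return count;
}
```

Since the same macro can appear in both the declaration and the definition, the two stay in sync, which is presumably why this variant does not trigger the mismatched-bound warning.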
Bruno
From e9e73bdeab431f29bb263b757bc8558796e475f6 Mon Sep 17 00:00:00 2001
From: Bruno Haible <[email protected]>
Date: Mon, 14 Apr 2025 16:00:13 +0200
Subject: [PATCH] regex: Fix undefined behaviour.
* lib/regex.h (_REGEX_NELTS): Define to empty; don't use ISO C99
variable-length arrays.
---
ChangeLog | 6 ++++++
lib/regex.h | 8 ++++++--
2 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index 4aa2a83c08..0b1d316a24 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,9 @@
+2025-04-14 Bruno Haible <[email protected]>
+
+ regex: Fix undefined behaviour.
+ * lib/regex.h (_REGEX_NELTS): Define to empty; don't use ISO C99
+ variable-length arrays.
+
2025-04-14 Bruno Haible <[email protected]>
select tests: Work around a Cygwin bug.
diff --git a/lib/regex.h b/lib/regex.h
index ff7e43b534..0eb72ce908 100644
--- a/lib/regex.h
+++ b/lib/regex.h
@@ -523,8 +523,12 @@ typedef struct
/* Declarations for routines. */
#ifndef _REGEX_NELTS
-# if (defined __STDC_VERSION__ && 199901L <= __STDC_VERSION__ \
- && !defined __STDC_NO_VLA__)
+/* The macro _REGEX_NELTS denotes the number of elements in a variable-length
+   array passed to a function.
+   It was meant to make use of ISO C99 variable-length arrays, but this does
+   not work: ISO C23 § 6.7.6.2.(5) requires the number of elements to be > 0,
+   but the NMATCH argument to regexec() is allowed to be 0. */
+# if 0
# define _REGEX_NELTS(n) n
# else
# define _REGEX_NELTS(n)
--
2.43.0
From 48e8974874bd5fad45904aed9679ee25b5caefbe Mon Sep 17 00:00:00 2001
From: Bruno Haible <[email protected]>
Date: Mon, 14 Apr 2025 16:15:27 +0200
Subject: [PATCH] regex: Fix undefined behaviour.
* lib/regex.h (_REGEX_NELTS): Add comment.
* lib/regexec.c (regexec): Don't use ISO C variable-length array syntax
for the pmatch parameter.
---
ChangeLog | 7 +++++++
lib/regex.h | 2 ++
lib/regexec.c | 6 +++++-
3 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/ChangeLog b/ChangeLog
index 4aa2a83c08..a835a069d6 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,10 @@
+2025-04-14 Bruno Haible <[email protected]>
+
+ regex: Fix undefined behaviour.
+ * lib/regex.h (_REGEX_NELTS): Add comment.
+ * lib/regexec.c (regexec): Don't use ISO C variable-length array syntax
+ for the pmatch parameter.
+
2025-04-14 Bruno Haible <[email protected]>
select tests: Work around a Cygwin bug.
diff --git a/lib/regex.h b/lib/regex.h
index ff7e43b534..191bd26836 100644
--- a/lib/regex.h
+++ b/lib/regex.h
@@ -522,6 +522,8 @@ typedef struct
/* Declarations for routines. */
+/* The macro _REGEX_NELTS denotes the number of elements in a variable-length
+ array passed to a function. */
#ifndef _REGEX_NELTS
# if (defined __STDC_VERSION__ && 199901L <= __STDC_VERSION__ \
&& !defined __STDC_NO_VLA__)
diff --git a/lib/regexec.c b/lib/regexec.c
index 6923394a08..1f902b1ef6 100644
--- a/lib/regexec.c
+++ b/lib/regexec.c
@@ -183,9 +183,13 @@ static reg_errcode_t extend_buffers (re_match_context_t *mctx, int min_len);
Return 0 if a match is found, REG_NOMATCH if not, REG_BADPAT if
EFLAGS is invalid. */
+/* The declaration of the PMATCH parameter cannot make use of ISO C99
+ variable-length arrays: ISO C23 § 6.7.6.2.(5) requires the number of
+ elements to be > 0, but the NMATCH argument is allowed to be 0. */
+
int
regexec (const regex_t *__restrict preg, const char *__restrict string,
- size_t nmatch, regmatch_t pmatch[_REGEX_NELTS (nmatch)], int eflags)
+ size_t nmatch, regmatch_t pmatch[/* nmatch */], int eflags)
{
reg_errcode_t err;
Idx start, length;
--
2.43.0