The regex repetition count parser in awk(1) places no upper limit on
the count in interval expressions such as {999999999}.
The repeat count is parsed at b.c:1463 as:
num = 10 * num + c - '0';
with no upper bound check. This value is then used in
replace_repeat() to compute a buffer size:
size += atomlen*(firstnum-1);
With large repeat counts, the multiplication overflows signed int.
If the result wraps to a small positive value, malloc allocates a
small buffer and the subsequent memcpy loop writes past it, causing
a heap buffer overflow.
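To make the wraparound concrete, the C sketch below computes the term atomlen*(firstnum-1) in 64-bit arithmetic and also shows what a 32-bit multiplication yields after wrapping modulo 2^32. The helpers repeat_extra_size() and wrapped_product() are hypothetical and not part of awk. The sketch assumes a 32-bit int and a 64-bit long long, and it uses unsigned arithmetic to model the wrap because signed overflow is undefined in C.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of awk: compute the extra bytes added
 * for firstnum copies of an atom of atomlen bytes, atomlen*(firstnum-1),
 * without signed overflow. Returns false if the result exceeds INT_MAX.
 */
static bool
repeat_extra_size(int atomlen, int firstnum, int *out)
{
	long long extra;

	assert(atomlen >= 0 && firstnum >= 1);
	/* With 32-bit int, both factors fit in 32 bits, so the 64-bit
	 * product cannot overflow. */
	extra = (long long)atomlen * (firstnum - 1);
	if (extra > INT_MAX)
		return false;
	*out = (int)extra;
	return true;
}

/*
 * The value a 32-bit multiplication produces after wrapping modulo 2^32.
 * Unsigned types are used so the wrap is well defined.
 */
static uint32_t
wrapped_product(int atomlen, int firstnum)
{
	return (uint32_t)atomlen * (uint32_t)(firstnum - 1);
}
```

For example, with atomlen = 4 and firstnum = 1073741826 (a count the unpatched parser accepts without overflowing num), the true product is 4294967300, but the wrapped 32-bit value is only 4.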
Additionally, the accumulation of num itself (10 * num + c - '0')
overflows signed int for counts above INT_MAX, i.e. once num exceeds
INT_MAX / 10 (about 214 million) and another digit follows.
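A general way to make such a decimal accumulation overflow-safe is to test before multiplying, since evaluating the overflowing expression is already undefined behavior. The following is a minimal sketch using a hypothetical helper, accumulate_digit(), which is not part of awk.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical helper: perform num = 10 * num + digit only if the result
 * fits in an int. 10 * num + digit <= INT_MAX holds exactly when
 * num <= (INT_MAX - digit) / 10 (integer division, non-negative operands).
 * On failure num is left unchanged.
 */
static bool
accumulate_digit(int *num, int digit)
{
	assert(*num >= 0 && digit >= 0 && digit <= 9);
	if (*num > (INT_MAX - digit) / 10)
		return false;
	*num = 10 * *num + digit;
	return true;
}
```

With 32-bit int and num = 214748364, digit 7 is accepted (giving 2147483647), while digit 8 is rejected.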
The 6-byte awk program /{00}/ triggers a SIGSEGV via this path.
POSIX defines RE_DUP_MAX as the upper bound for repetition counts,
with a minimum value of 255. OpenBSD's regex library already
enforces this limit (_POSIX_RE_DUP_MAX = 255 in limits.h).
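Fix: reject repetition counts exceeding _POSIX_RE_DUP_MAX, which
is consistent with the system regex library and POSIX. Because the
check runs after every digit, num never exceeds 10 * 255 + 9 = 2559
before FATAL() is called, so this also prevents the accumulation
overflow.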
Found by AFL++ fuzzing with UBSan.
Index: usr.bin/awk/b.c
===================================================================
RCS file: /cvs/src/usr.bin/awk/b.c,v
retrieving revision 1.55
diff -u -p -r1.55 b.c
--- usr.bin/awk/b.c 5 Feb 2025 20:32:56 -0000 1.55
+++ usr.bin/awk/b.c
@@ -1461,6 +1461,9 @@ int relex(void) /* lexical analyzer for
lastre);
} else if (isdigit(c)) {
num = 10 * num + c - '0';
+ if (num > _POSIX_RE_DUP_MAX)
+ FATAL("repetition count %.20s too large",
+ lastre);
digitfound = true;
} else if (c == ',') {
if (commafound)