The regex repetition count parser in awk(1) does not limit the
value of the bound in expressions like {999999999}.

The repeat count is parsed at b.c:1463 as:

  num = 10 * num + c - '0';

with no upper bound check.  This value is then used in
replace_repeat() to compute a buffer size:

  size += atomlen*(firstnum-1);

With large repeat counts, the multiplication overflows signed int.
If the result wraps to a small positive value, malloc allocates a
small buffer and the subsequent memcpy loop writes past it, causing
a heap buffer overflow.

Additionally, the num accumulation itself (10 * num + c - '0')
overflows signed int once num exceeds INT_MAX / 10 (~214 million
with 32-bit int) and another digit follows.
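
For illustration only (not part of the patch), both overflow
conditions can be tested without invoking undefined behaviour.
The helper names below are hypothetical and do not exist in b.c;
the size check assumes long long is wider than int:

```c
#include <limits.h>

/*
 * Hypothetical helper: returns 1 if "10 * num + d" would exceed
 * INT_MAX, for num >= 0 and a decimal digit value d in 0..9.
 * The comparison is rearranged so that nothing overflows here.
 */
static int
accum_overflows(int num, int d)
{
	return num > (INT_MAX - d) / 10;
}

/*
 * Hypothetical helper: returns 1 if "size + atomlen * (firstnum - 1)"
 * does not fit in an int.  The arithmetic is done in long long,
 * which this sketch assumes is wider than int.
 */
static int
size_overflows(int size, int atomlen, int firstnum)
{
	long long r;

	r = (long long)size + (long long)atomlen * ((long long)firstnum - 1);
	return r > INT_MAX || r < INT_MIN;
}
```

With 32-bit int, accum_overflows(214748364, 8) is true, and
size_overflows(0, 3, 999999999) is true because 3 * 999999998
exceeds INT_MAX.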

The 6-byte awk program /{00}/ triggers a SIGSEGV via this path.

POSIX defines RE_DUP_MAX as the upper bound for repetition counts,
with a minimum value of 255.  OpenBSD's regex library already
enforces this limit (_POSIX_RE_DUP_MAX = 255 in limits.h).

Fix: reject repetition counts exceeding _POSIX_RE_DUP_MAX, which
is consistent with the system regex library and POSIX.
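
The same idea as a standalone sketch, separate from the diff
below.  parse_repeat_count() is a hypothetical helper, not a
function in awk, and the #ifndef fallback is only an assumption
for systems whose limits.h does not expose _POSIX_RE_DUP_MAX:

```c
#include <ctype.h>
#include <limits.h>

#ifndef _POSIX_RE_DUP_MAX
#define _POSIX_RE_DUP_MAX 255	/* POSIX minimum; fallback for this sketch */
#endif

/*
 * Hypothetical helper: parse the leading decimal digits of s as a
 * repetition count.  Returns 0 and stores the value in *out on
 * success; returns -1 if there are no digits or if the count
 * exceeds _POSIX_RE_DUP_MAX.
 */
static int
parse_repeat_count(const char *s, int *out)
{
	int num = 0;
	int digitfound = 0;

	for (; isdigit((unsigned char)*s); s++) {
		num = 10 * num + (*s - '0');
		/* Checked after every digit, so num stays far below INT_MAX. */
		if (num > _POSIX_RE_DUP_MAX)
			return -1;
		digitfound = 1;
	}
	if (!digitfound)
		return -1;
	*out = num;
	return 0;
}
```

Because the bound is tested after each digit, num never grows
past 10 * 255 + 9 before the function bails out, so the
accumulation cannot overflow regardless of input length.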

Found by AFL++ fuzzing with UBSan.

Index: usr.bin/awk/b.c
===================================================================
RCS file: /cvs/src/usr.bin/awk/b.c,v
retrieving revision 1.55
diff -u -p -r1.55 b.c
--- usr.bin/awk/b.c     5 Feb 2025 20:32:56 -0000       1.55
+++ usr.bin/awk/b.c
@@ -1461,6 +1461,9 @@ int relex(void)           /* lexical analyzer for
                                        lastre);
                        } else if (isdigit(c)) {
                                num = 10 * num + c - '0';
+                               if (num > _POSIX_RE_DUP_MAX)
+                                       FATAL("repetition count %.20s too large",
+                                               lastre);
                                digitfound = true;
                        } else if (c == ',') {
                                if (commafound)
