On 3/31/26 14:40, Renaud Allard wrote:
> The regex repetition count parser in awk(1) does not limit the
> value of the bound in expressions like {999999999}.
>
> The repeat count is parsed at b.c:1463 as:
>
> num = 10 * num + c - '0';
>
> with no upper bound check. This value is then used in
> replace_repeat() to compute a buffer size:
>
> size += atomlen*(firstnum-1);
>
> With large repeat counts, the multiplication overflows signed int.
> If the result wraps to a small positive value, malloc allocates a
> small buffer and the subsequent memcpy loop writes past it, causing
> a heap buffer overflow.
>
> Additionally, the num accumulation itself (10 * num + c) overflows
> signed int for values exceeding ~214 million.
>
> The 6-byte awk program /{00}/ triggers a SIGSEGV via this path.
>
> POSIX defines RE_DUP_MAX as the upper bound for repetition counts,
> with a minimum value of 255. OpenBSD's regex library already
> enforces this limit (_POSIX_RE_DUP_MAX = 255 in limits.h).
>
> Fix: reject repetition counts exceeding _POSIX_RE_DUP_MAX, which
> is consistent with the system regex library and POSIX.
Can you explain why you chose _POSIX_RE_DUP_MAX over RE_DUP_MAX?
POSIX defines the _POSIX_ value as the minimum an implementation must
support, and regexec(3) tells us to use RE_DUP_MAX as well. The two
defines are identical on OpenBSD, but why limit to the lower of the two
rather than the implementation's actual limit?
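
If it helps the discussion, here is a minimal sketch of checking the
bound before the multiplication, so the accumulation itself can never
overflow regardless of which limit is used. parse_repeat() and its max
parameter are hypothetical and not part of b.c; the real code would
call FATAL() instead of returning false.

```c
#include <ctype.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not taken from b.c: parse the leading decimal
 * digits of s into *out.  Returns false if there are no digits or if
 * the value would exceed max (assumed to be >= 0).
 */
static bool
parse_repeat(const char *s, int max, int *out)
{
	int num = 0;
	bool digitfound = false;

	for (; isdigit((unsigned char)*s); s++) {
		int d = *s - '0';

		/*
		 * Reject before computing 10 * num + d, so the
		 * arithmetic below stays within [0, max].
		 */
		if (d > max || num > (max - d) / 10)
			return false;
		num = 10 * num + d;
		digitfound = true;
	}
	if (!digitfound)
		return false;
	*out = num;
	return true;
}
```

In b.c the value passed as max would be whichever of RE_DUP_MAX or
_POSIX_RE_DUP_MAX we settle on.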
martijn@
>
> Found by AFL++ fuzzing with UBSan.
>
> Index: usr.bin/awk/b.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/awk/b.c,v
> retrieving revision 1.55
> diff -u -p -r1.55 b.c
> --- usr.bin/awk/b.c 5 Feb 2025 20:32:56 -0000 1.55
> +++ usr.bin/awk/b.c
> @@ -1461,6 +1461,9 @@ int relex(void) /* lexical analyzer for
> lastre);
> } else if (isdigit(c)) {
> num = 10 * num + c - '0';
> + if (num > _POSIX_RE_DUP_MAX)
> + FATAL("repetition count %.20s too large",
> + lastre);
> digitfound = true;
> } else if (c == ',') {
> if (commafound)
>
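
Independent of the count limit, the size computation in
replace_repeat() could be guarded too. A rough sketch, assuming the
sizes are held as size_t; repeat_size() is a hypothetical helper, and
rejecting firstnum < 1 is my assumption about what the caller wants,
not what b.c does today.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from b.c: compute
 * base + atomlen * (firstnum - 1) into *out.
 * Returns false if firstnum < 1 or the result would not fit in size_t.
 */
static bool
repeat_size(size_t base, size_t atomlen, int firstnum, size_t *out)
{
	size_t n;

	if (firstnum < 1)
		return false;
	n = (size_t)firstnum - 1;

	/* atomlen * n must fit in what is left above base. */
	if (n != 0 && atomlen > (SIZE_MAX - base) / n)
		return false;

	*out = base + atomlen * n;
	return true;
}
```

With the count capped at 255 this check should never fire for sane
atom lengths, but it keeps the allocation size honest if the cap is
ever raised.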