On 3/31/26 14:40, Renaud Allard wrote:
> The regex repetition count parser in awk(1) does not limit the
> value of the bound in expressions like {999999999}.
> 
> The repeat count is parsed at b.c:1463 as:
> 
>   num = 10 * num + c - '0';
> 
> with no upper bound check.  This value is then used in
> replace_repeat() to compute a buffer size:
> 
>   size += atomlen*(firstnum-1);
> 
> With large repeat counts, the multiplication overflows signed int.
> If the result wraps to a small positive value, malloc allocates a
> small buffer and the subsequent memcpy loop writes past it, causing
> a heap buffer overflow.
> 
> Additionally, the num accumulation itself (10 * num + c) overflows
> signed int for values exceeding ~214 million.
> 
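To make the wrap described above concrete, here is a minimal sketch. wrapped_growth() is a hypothetical helper, not code from b.c, and it assumes a 32-bit int. It computes atomlen*(firstnum-1) exactly in 64 bits and then reduces it modulo 2^32, which is what typically happens in practice even though signed overflow is formally undefined:

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of awk: computes atomlen * (firstnum - 1)
 * exactly in 64 bits, then truncates to 32 bits to mimic the wrap a
 * 32-bit int usually shows (the real signed overflow is undefined).
 */
static uint32_t
wrapped_growth(int32_t atomlen, int32_t firstnum)
{
	int64_t exact = (int64_t)atomlen * ((int64_t)firstnum - 1);

	/* Conversion to an unsigned type is a well-defined reduction mod 2^32. */
	return (uint32_t)exact;
}
```

With atomlen 3 and firstnum 1431655767 the exact product is 4294967298, which reduces to 2, so the size would grow by only two bytes while the copy loop still runs for the full count.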
> The 6-byte awk program /{00}/ triggers a SIGSEGV via this path.
> 
> POSIX defines RE_DUP_MAX as the upper bound for repetition counts,
> with a minimum value of 255.  OpenBSD's regex library already
> enforces this limit (_POSIX_RE_DUP_MAX = 255 in limits.h).
> 
> Fix: reject repetition counts exceeding _POSIX_RE_DUP_MAX, which
> is consistent with the system regex library and POSIX.

Can you explain why you chose _POSIX_RE_DUP_MAX over RE_DUP_MAX?
POSIX defines the _POSIX_ variant as the minimum acceptable value, and
regexec(3) tells us to use RE_DUP_MAX as well. These two defines are
identical on OpenBSD, but why limit to the lower of the two?
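
For illustration only, a rough sketch of what a check against RE_DUP_MAX could look like, written so that the accumulation itself can never overflow. accumulate_repeat() is a hypothetical helper, not code from b.c, and the fallback define is an assumption for systems whose <limits.h> lacks RE_DUP_MAX:

```c
#include <limits.h>
#include <stdbool.h>

#ifndef RE_DUP_MAX
#define RE_DUP_MAX 255	/* assumed fallback: the POSIX minimum */
#endif

/*
 * Hypothetical helper: appends the decimal digit c to *num.
 * Returns false, leaving *num untouched, if the result would
 * exceed RE_DUP_MAX.
 */
static bool
accumulate_repeat(int *num, char c)
{
	int digit = c - '0';

	/* Compare before multiplying so 10 * *num cannot overflow. */
	if (*num > (RE_DUP_MAX - digit) / 10)
		return false;
	*num = 10 * *num + digit;
	return true;
}
```

A caller in the lexer would report the error when the helper returns false, in place of the unchecked accumulation.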

martijn@
> 
> Found by AFL++ fuzzing with UBSan.
> 
> Index: usr.bin/awk/b.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/awk/b.c,v
> retrieving revision 1.55
> diff -u -p -r1.55 b.c
> --- usr.bin/awk/b.c   5 Feb 2025 20:32:56 -0000       1.55
> +++ usr.bin/awk/b.c
> @@ -1461,6 +1461,9 @@ int relex(void)         /* lexical analyzer for
>                                       lastre);
>                       } else if (isdigit(c)) {
>                               num = 10 * num + c - '0';
> +                             if (num > _POSIX_RE_DUP_MAX)
> +                                     FATAL("repetition count %.20s too large",
> +                                             lastre);
>                               digitfound = true;
>                       } else if (c == ',') {
>                               if (commafound)
> 
