Hello,

Prompted by the recent bug reports, I decided to do some targeted
fuzzing on gnulib's regex module using afl.

So far I have found two obscure bugs and one pathological case.

The two bugs can be easily reproduced with:

  $ echo 1 | grep -E "(\'|^)(\1|)"
  grep: regexec.c:1375: pop_fail_stack: Assertion `num >= 0' failed.
  Aborted

  $ echo A | grep -E "$(printf '(\227|)(\\1\\1|t1|\\\2537)+')"
  Segmentation fault   ## stack overflow due to infinite recursion

And the following pathological case can easily consume hundreds of MB
of RAM (more "+" - more RAM):

  $ echo 1 | time grep -E '(.)++++++++++++++++++++++|'

Attached are valgrind/gdb details of each crash, and also a C
reproducer (in case it's easier to debug with a tiny C program instead
of grep).

(As usual, I don't have a fix yet...)

regards,
 - assaf
==6246== Memcheck, a memory error detector
==6246== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==6246== Using Valgrind-3.12.0.SVN and LibVEX; rerun with -h for copyright info
==6246== Command: ./src/grep -E (\\'|^)(\\1|)
==6246==
grep: regexec.c:1359: pop_fail_stack: Assertion `num >= 0' failed.
==6246==
==6246== Process terminating with default action of signal 6 (SIGABRT)
==6246==    at 0x50DDFFF: raise (raise.c:51)
==6246==    by 0x50DF429: abort (abort.c:89)
==6246==    by 0x50D6E66: __assert_fail_base (assert.c:92)
==6246==    by 0x50D6F11: __assert_fail (assert.c:101)
==6246==    by 0x138398: pop_fail_stack (regexec.c:1359)
==6246==    by 0x138831: set_regs (regexec.c:1463)
==6246==    by 0x136F42: re_search_internal (regexec.c:861)
==6246==    by 0x136088: re_search_stub (regexec.c:424)
==6246==    by 0x135BCF: rpl_re_search (regexec.c:289)
==6246==    by 0x10C702: EGexecute (dfasearch.c:357)
==6246==    by 0x10E763: grepbuf (grep.c:1397)
==6246==    by 0x10EC80: grep (grep.c:1528)
==6246==
==6246== HEAP SUMMARY:
==6246==     in use at exit: 55,444 bytes in 173 blocks
==6246==   total heap usage: 366 allocs, 193 frees, 95,752 bytes allocated
==6246==
==6246== LEAK SUMMARY:
==6246==    definitely lost: 0 bytes in 0 blocks
==6246==    indirectly lost: 0 bytes in 0 blocks
==6246==      possibly lost: 0 bytes in 0 blocks
==6246==    still reachable: 55,444 bytes in 173 blocks
==6246==         suppressed: 0 bytes in 0 blocks
==6246== Rerun with --leak-check=full to see details of leaked memory
==6246==
==6246== For counts of detected and suppressed errors, rerun with: -v
==6246== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==7809== Memcheck, a memory error detector
==7809== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==7809== Using Valgrind-3.12.0.SVN and LibVEX; rerun with -h for copyright info
==7809== Command: ./src/grep -E (?|)(\\1\\1|t1|\\?7)+
==7809==
==7809== Stack overflow in thread #1: can't grow stack to 0xffe801000
grep: stack overflow
==7809==
==7809== HEAP SUMMARY:
==7809==     in use at exit: 20,970 bytes in 157 blocks
==7809==   total heap usage: 363 allocs, 206 frees, 45,866 bytes allocated
==7809==
==7809== LEAK SUMMARY:
==7809==    definitely lost: 0 bytes in 0 blocks
==7809==    indirectly lost: 0 bytes in 0 blocks
==7809==      possibly lost: 128 bytes in 1 blocks
==7809==    still reachable: 20,842 bytes in 156 blocks
==7809==         suppressed: 0 bytes in 0 blocks
==7809== Rerun with --leak-check=full to see details of leaked memory
==7809==
==7809== For counts of detected and suppressed errors, rerun with: -v
==7809== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
/* gnulib regex crash reproducer
   Copyright (C) 2018 Assaf Gordon <assafgor...@gmail.com>
   License: GPLv3-or-later
*/
#define _GNU_SOURCE
#include <string.h>
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <err.h>

int main(void)
{
  const char *input = "1AAAAAA";
  static struct re_pattern_buffer regex;

#if 1
  /* Crash 1:
     with gnulib:
       regexec.c:1375: pop_fail_stack: Assertion `num >= 0' failed.

     with glibc:
       Invalid read of size 1
         at 0x4F07573: re_compile_pattern (regcomp.c:227)
         by 0x1088CF: main (1.c:35)
       Address 0x38 is not stack'd, malloc'd or (recently) free'd
  */
  const char *pat = "(\\'|^)(\\1|)";
  int no_sub = 1;
#else
  /* crash 2:
     too-deep recursion in check_dst_limits_calc_pos_1 (regexec.c:1906)
  */
  const char *pat = "(\227|)(\\1\\1|t1|\\\2537)+";
  int no_sub = 0;
#endif

  memset (&regex, 0, sizeof regex);

  struct re_pattern_buffer *preg = (no_sub)?NULL:&regex;
  regex.no_sub = no_sub;

  re_set_syntax(RE_SYNTAX_EGREP);

  const char *s = re_compile_pattern (pat, strlen(pat), preg);
  if (s)
    errx(1,"re_compile_pattern failed: %s\n", s);

  re_search(&regex, input, strlen(input),
            0,             /* start */
            strlen(input), /* range */
            NULL           /* registers */
            );

  return 0;
}
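The reproducer above covers only the two crashes. For the pathological
case, something along the following lines might be handy. It is only a
minimal sketch, under these assumptions: it uses the POSIX
regcomp()/regexec() interface instead of re_compile_pattern(), and
build_pattern, time_pattern and plus_sweep are hypothetical helper names
made up for illustration. It builds "(.)" followed by n "+" characters
and a trailing "|", then reports the CPU time spent compiling and
matching once against "1". Whether the cost is in compilation or in
matching has not been established, so both are timed together.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: write "(.)" followed by n '+' characters and a
   trailing '|' into buf.  Returns 0 on success, -1 if buf is too small. */
static int
build_pattern (char *buf, size_t size, size_t n)
{
  const char *head = "(.)";
  size_t head_len = strlen (head);

  if (size < head_len + n + 2)  /* n pluses, the '|' and the NUL */
    return -1;
  memcpy (buf, head, head_len);
  memset (buf + head_len, '+', n);
  buf[head_len + n] = '|';
  buf[head_len + n + 1] = '\0';
  return 0;
}

/* Hypothetical helper: compile pat as a POSIX extended regex, match it
   once against "1", and store the CPU seconds used in *seconds.
   Returns the regcomp() result code (0 on success). */
static int
time_pattern (const char *pat, double *seconds)
{
  regex_t re;
  clock_t start;
  int rc;

  assert (pat != NULL && seconds != NULL);
  start = clock ();
  rc = regcomp (&re, pat, REG_EXTENDED);
  if (rc == 0)
    {
      regexec (&re, "1", 0, NULL, 0);
      regfree (&re);
    }
  *seconds = (double) (clock () - start) / CLOCKS_PER_SEC;
  return rc;
}

/* Sweep n = 0..max_plus, printing one line per pattern.  Keep max_plus
   small: cost appears to rise quickly with each extra '+'. */
static void
plus_sweep (size_t max_plus)
{
  char pat[64];

  for (size_t n = 0; n <= max_plus; n++)
    {
      double secs;
      if (build_pattern (pat, sizeof pat, n) != 0)
        break;
      int rc = time_pattern (pat, &secs);
      printf ("%2zu pluses: regcomp=%d, %.3f s CPU\n", n, rc, secs);
    }
}
```

Calling plus_sweep (12) from a main() of your own, and running the
result under "time" or valgrind, should give a rough idea of how the
cost grows with the number of "+" without exhausting RAM. A non-zero
regcomp code simply means that libc rejected the pattern.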
Attachment: crash2.gdb.log.gz (application/gzip)