Hello,

Prompted by the recent bug reports, I decided to do some
targeted fuzzing on gnulib's regex module using afl.

So far I have found two obscure bugs and one pathological case.

The two bugs can be easily reproduced with:

   $ echo 1 |  grep -E "(\'|^)(\1|)"
   grep: regexec.c:1375: pop_fail_stack: Assertion `num >= 0' failed.
   Aborted

   $ echo A | grep -E "$(printf '(\227|)(\\1\\1|t1|\\\2537)+')"
   Segmentation fault  ## stack overflow due to infinite recursion
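
To check another grep build against both patterns without reading the
terminal output by hand, something like the following might help. It is
only a sketch: GREP is a hypothetical variable naming the binary under
test, and the 134/139 statuses assume a Linux shell that reports a
signal death as 128 plus the signal number. A build that catches the
stack overflow itself may report a plain error status instead.

```shell
#!/bin/sh
# Sketch only: GREP is a hypothetical knob, defaulting to the grep in PATH.
GREP=${GREP:-grep}

# Map a shell exit status to a short label.
# 134 = 128 + SIGABRT (6), 139 = 128 + SIGSEGV (11) on Linux.
describe_status () {
  case "$1" in
    0)   echo "matched" ;;
    1)   echo "no match" ;;
    2)   echo "grep error" ;;
    134) echo "SIGABRT" ;;
    139) echo "SIGSEGV" ;;
    *)   echo "status $1" ;;
  esac
}

# Run one pattern ($1) against one input line ($2) and print the outcome.
try_pattern () {
  status=0
  printf '%s\n' "$2" | "$GREP" -E "$1" >/dev/null 2>&1 || status=$?
  describe_status "$status"
}
```

Usage, with the two patterns from above:

   $ try_pattern "(\'|^)(\1|)" 1
   $ try_pattern "$(printf '(\227|)(\\1\\1|t1|\\\2537)+')" A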

And the following pathological case can easily consume hundreds of MB of
RAM (the more "+" characters, the more RAM):

   $ echo 1 | time grep -E '(.)++++++++++++++++++++++|'
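
To see how memory grows with the number of "+" characters, a rough
sketch along these lines could be used. The helper names, the GREP and
LIMIT_KB variables, and the 1 GB default cap are all made up for
illustration. It assumes GNU time is installed as /usr/bin/time (for
the %M format, peak resident set size in KB) and a shell whose ulimit
accepts -v.

```shell
#!/bin/sh
# Sketch only: GREP and LIMIT_KB are hypothetical knobs.
GREP=${GREP:-grep}

# Print '(.)' followed by N '+' characters and a trailing '|'.
make_pattern () {
  n=$1
  pluses=
  while [ "$n" -gt 0 ]; do
    pluses="$pluses+"
    n=$((n - 1))
  done
  printf '(.)%s|\n' "$pluses"
}

# Print the peak resident set size (KB) of grep for a pattern with N '+'.
# ulimit -v caps virtual memory inside the subshell so a runaway case
# fails instead of exhausting the machine.
peak_rss_kb () {
  (
    ulimit -v "${LIMIT_KB:-1048576}"
    echo 1 |
      /usr/bin/time -f '%M' "$GREP" -E "$(make_pattern "$1")" 2>&1 >/dev/null |
      tail -n 1
  )
}
```

Usage, starting small and increasing the count step by step:

   $ for n in 4 8 12 16; do echo "$n $(peak_rss_kb $n)"; done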


Attached are valgrind/gdb details of each crash,
and also a C reproducer (if it's easier to debug with a tiny
C program instead of grep).

(As usual, I don't have a fix yet...)



regards,
  - assaf

==6246== Memcheck, a memory error detector
==6246== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==6246== Using Valgrind-3.12.0.SVN and LibVEX; rerun with -h for copyright info
==6246== Command: ./src/grep -E (\\'|^)(\\1|)
==6246== 
grep: regexec.c:1359: pop_fail_stack: Assertion `num >= 0' failed.
==6246== 
==6246== Process terminating with default action of signal 6 (SIGABRT)
==6246==    at 0x50DDFFF: raise (raise.c:51)
==6246==    by 0x50DF429: abort (abort.c:89)
==6246==    by 0x50D6E66: __assert_fail_base (assert.c:92)
==6246==    by 0x50D6F11: __assert_fail (assert.c:101)
==6246==    by 0x138398: pop_fail_stack (regexec.c:1359)
==6246==    by 0x138831: set_regs (regexec.c:1463)
==6246==    by 0x136F42: re_search_internal (regexec.c:861)
==6246==    by 0x136088: re_search_stub (regexec.c:424)
==6246==    by 0x135BCF: rpl_re_search (regexec.c:289)
==6246==    by 0x10C702: EGexecute (dfasearch.c:357)
==6246==    by 0x10E763: grepbuf (grep.c:1397)
==6246==    by 0x10EC80: grep (grep.c:1528)
==6246== 
==6246== HEAP SUMMARY:
==6246==     in use at exit: 55,444 bytes in 173 blocks
==6246==   total heap usage: 366 allocs, 193 frees, 95,752 bytes allocated
==6246== 
==6246== LEAK SUMMARY:
==6246==    definitely lost: 0 bytes in 0 blocks
==6246==    indirectly lost: 0 bytes in 0 blocks
==6246==      possibly lost: 0 bytes in 0 blocks
==6246==    still reachable: 55,444 bytes in 173 blocks
==6246==         suppressed: 0 bytes in 0 blocks
==6246== Rerun with --leak-check=full to see details of leaked memory
==6246== 
==6246== For counts of detected and suppressed errors, rerun with: -v
==6246== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

==7809== Memcheck, a memory error detector
==7809== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==7809== Using Valgrind-3.12.0.SVN and LibVEX; rerun with -h for copyright info
==7809== Command: ./src/grep -E (?|)(\\1\\1|t1|\\?7)+
==7809== 
==7809== Stack overflow in thread #1: can't grow stack to 0xffe801000
grep: stack overflow
==7809== 
==7809== HEAP SUMMARY:
==7809==     in use at exit: 20,970 bytes in 157 blocks
==7809==   total heap usage: 363 allocs, 206 frees, 45,866 bytes allocated
==7809== 
==7809== LEAK SUMMARY:
==7809==    definitely lost: 0 bytes in 0 blocks
==7809==    indirectly lost: 0 bytes in 0 blocks
==7809==      possibly lost: 128 bytes in 1 blocks
==7809==    still reachable: 20,842 bytes in 156 blocks
==7809==         suppressed: 0 bytes in 0 blocks
==7809== Rerun with --leak-check=full to see details of leaked memory
==7809== 
==7809== For counts of detected and suppressed errors, rerun with: -v
==7809== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

/* gnulib regex crash reproducer
   Copyright (C) 2018 Assaf Gordon <assafgor...@gmail.com>
   License: GPLv3-or-later */
#define _GNU_SOURCE
#include <string.h>
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <err.h>

int main(void)
{
  const char *input = "1AAAAAA";
  static struct re_pattern_buffer regex;

#if 1
  /* Crash 1:
   with gnulib:
     regexec.c:1375: pop_fail_stack: Assertion `num >= 0' failed.
   with glibc:
      Invalid read of size 1
        at 0x4F07573: re_compile_pattern (regcomp.c:227)
        by 0x1088CF: main (1.c:35)
       Address 0x38 is not stack'd, malloc'd or (recently) free'd
*/
  const char *pat = "(\\'|^)(\\1|)";
  int no_sub = 1;
#else
  /* crash 2:
     too-deep recursion in check_dst_limits_calc_pos_1 (regexec.c:1906) */
  const char *pat = "(\227|)(\\1\\1|t1|\\\2537)+";
  int no_sub = 0;
#endif

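  memset (&regex, 0, sizeof regex);
  /* Note: when no_sub is nonzero, preg is NULL, so re_compile_pattern
     below receives a NULL buffer.  The glibc read at address 0x38
     quoted above is consistent with a NULL-based member access.  */
  struct re_pattern_buffer *preg = no_sub ? NULL : &regex;
  regex.no_sub = no_sub;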

  re_set_syntax(RE_SYNTAX_EGREP);
  const char *s = re_compile_pattern (pat, strlen(pat), preg);
  if (s)
    errx (1, "re_compile_pattern failed: %s", s); /* errx appends the newline */

  re_search (&regex, input, strlen (input),
             0,              /* start */
             strlen (input), /* range */
             NULL            /* registers */
             );
  return 0;
}

Attachment: crash2.gdb.log.gz
Description: application/gzip
