On Wed, Jan 29, 2014 at 1:43 AM, Santiago <[email protected]> wrote:
> Package: grep
> Version: 2.16
> Severity: important
>
> Hi there,
>
> I forward this bug from debian's BTS. Last changes in -P brought another
> problem. I've confirmed this behavior on last debian package:
>
> ----- Forwarded message from Vincent Lefevre <[email protected]> -----
>
> [snip]
>
>
> grep -P loops on some files with invalid UTF-8 sequences, e.g.
>
> $ /usr/bin/printf "\xe9\x65\n\xab\n" | grep -P '.e|.?z' | head
> �e
> �e
> �e
> �e
> �e
> �e
> �e
> �e
> �e
> �e
>
> (the infinite loop is interrupted here by a broken pipe due to
> the "head").
>
> It seems that the fix of
>
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=730472
Thanks for the heads-up. That appears to be a problem with pcre.
I've just built grep (git head) against pcre (git head) with gcc's
address sanitizer enabled, and adjusted your example slightly.
Now, libpcre gets an internal segfault:
$ printf "\xe9\n\xab\n" > k; src/grep -P 'e|.?z' k
ASAN:SIGSEGV
=================================================================
==11821==ERROR: AddressSanitizer: SEGV on unknown address 0x62cfffffffff (pc 0x0000004f0743 sp 0x7fff6b32f4a0 bp 0x7fff6b32f760 T0)
#0 0x4f0742 in match /w/co/pcre/pcre_exec.c:5943
#1 0x4f26d5 in pcre_exec /w/co/pcre/pcre_exec.c:6941
#2 0x46f421 in Pexecute /w/co/grep/src/pcresearch.c:178
#3 0x4717a3 in do_execute /w/co/grep/src/main.c:1075
#4 0x4717a3 in grepbuf /w/co/grep/src/main.c:1111
#5 0x472249 in grep /w/co/grep/src/main.c:1222
#6 0x472249 in grepdesc /w/co/grep/src/main.c:1476
#7 0x4073ca in main /w/co/grep/src/main.c:2396
#8 0x7f6f21a53cdc in __libc_start_main (/lib64/libc.so.6+0x1ecdc)
#9 0x408a54 (/w/u/w/co/grep/src/grep+0x408a54)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /w/co/pcre/pcre_exec.c:5943 match
==11821==ABORTING
Sorry, but I don't have time to debug further. A quick glance suggests
that the matcher is backing up too far:
(gdb) b __asan_report_error
Breakpoint 1 at 0x448c40: file ../../.././libsanitizer/asan/asan_report.cc, line 711.
(gdb) r
Starting program: /w/u/w/co/grep/src/grep -P e\|.\?z k
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff7ffa000
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Program received signal SIGSEGV, Segmentation fault.
0x00000000004f0743 in match (eptr=0x62cfffffffff "",
ecode=0x60700000df8a "\035zx",
mstart=0x62d00000b002 "\253\n", '\276' <repeats 198 times>...,
offset_top=2, md=0x7fffffffce30, eptrb=0x0, rdepth=0)
at pcre_exec.c:5943
5943 BACKCHAR(eptr);
(gdb) l
5938 {
5939 if (eptr == pp) goto TAIL_RECURSE;
5940 RMATCH(eptr, ecode, offset_top, md, eptrb, RM46);
5941 if (rrc != MATCH_NOMATCH) RRETURN(rrc);
5942 eptr--;
5943 BACKCHAR(eptr);
5944 if (ctype == OP_ANYNL && eptr > pp && UCHAR21(eptr) == CHAR_NL &&
5945 UCHAR21(eptr - 1) == CHAR_CR) eptr--;
5946 }
5947 }
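
To illustrate the "backing up too far" guess: I am assuming that BACKCHAR
is essentially a loop that steps backward while the current byte looks like
a UTF-8 continuation byte (top two bits 10). The 0xab that starts the second
line of k is such a byte, and in invalid input nothing guarantees that a lead
byte precedes it. If eptr ever lands before pp, the "eptr == pp" test on line
5939 can no longer succeed, so the loop would keep decrementing. Below is a
minimal sketch of a bounded back-up. The names is_continuation and
back_char_bounded are hypothetical, not PCRE functions, and this is not a
proposed patch:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a PCRE function): true if c has the form
   10xxxxxx, i.e. it looks like a UTF-8 continuation byte. */
static int
is_continuation (unsigned char c)
{
  return (c & 0xc0) == 0x80;
}

/* Hypothetical bounded back-up: step P backward over continuation
   bytes, but never move to before START.  The assumption here is that
   the unbounded form lacks the "p > start" test and so can run off the
   front of the subject when the input is not valid UTF-8. */
static const unsigned char *
back_char_bounded (const unsigned char *p, const unsigned char *start)
{
  assert (p >= start);
  while (p > start && is_continuation (*p))
    p--;
  return p;
}
```

With the bytes from k (0xe9 '\n' 0xab '\n') and the subject starting at the
0xab, back_char_bounded (subject, subject) returns subject itself rather
than reading the bytes in front of it.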