On 30 Jan 2013, Paul Eggert spake thusly:
> +    /* This test is from glibc bug 15078.
> +       The test case is from Andreas Schwab in
> +       <http://www.sourceware.org/ml/libc-alpha/2013-01/msg00967.html>.
> +       */
> +    static char const pat[] = "[^x]x";
> +    static char const data[] =
> +      "\xe1\x80\x80\xe1\x80\xbb\xe1\x80\xbd\xe1\x80\x94\xe1\x80"
> +      "\xba\xe1\x80\xaf\xe1\x80\x95\xe1\x80\xbax";
> +    re_set_syntax (0);
> +    memset (&regex, 0, sizeof regex);
> +    s = re_compile_pattern (pat, sizeof pat - 1, &regex);
> +    if (s)
> +      result |= 1;
> +    else if (re_search (&regex, data, sizeof data - 1,
> +                        0, sizeof data - 1, 0)
> +             != 20)
> +      result |= 1;
> +  }
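
For anyone who wants to poke at the quoted test outside of configure, here is a minimal standalone sketch. It assumes glibc's GNU regex interface (hence _GNU_SOURCE). The names search_offset, bug15078_pat and bug15078_data are made up for illustration and are not part of the patch, and -3 is an arbitrary marker for a pattern that fails to compile.

```c
#define _GNU_SOURCE   /* exposes re_search() and friends in glibc's <regex.h> */
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Pattern and data copied from the quoted test.  */
char const bug15078_pat[] = "[^x]x";
char const bug15078_data[] =
  "\xe1\x80\x80\xe1\x80\xbb\xe1\x80\xbd\xe1\x80\x94\xe1\x80"
  "\xba\xe1\x80\xaf\xe1\x80\x95\xe1\x80\xbax";

/* Hypothetical helper: compile PAT with syntax 0, as the quoted test
   does, and return whatever re_search() reports for DATA.  Returns -3
   (an arbitrary marker) if the pattern does not compile.  */
int
search_offset (char const *pat, size_t patlen,
               char const *data, size_t datalen)
{
  struct re_pattern_buffer regex;
  char const *s;
  int r;

  assert (pat != NULL && data != NULL);
  re_set_syntax (0);
  memset (&regex, 0, sizeof regex);
  s = re_compile_pattern (pat, patlen, &regex);
  if (s)
    return -3;
  r = re_search (&regex, data, datalen, 0, datalen, 0);
  regfree (&regex);
  return r;
}
```

To mirror the configure test, call setlocale (LC_ALL, "en_US.UTF-8") from main() first, then print search_offset (bug15078_pat, sizeof bug15078_pat - 1, bug15078_data, sizeof bug15078_data - 1). The value you get depends on the glibc in use and on whether that locale is installed.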

I note that a glibc 2.17 with 7e2f0d2d77e4bc273fe00f99d970605d8e38d4d6
and a445af0bc722d620afed7683cd320c0e4c7c6059 (Andreas's fix) applied
does not crash on this test -- but does not appear to work as it expects
(or as I'd expect) either, returning 0.

Introducing a single spurious character after the first byte of 'data',
like so:

    static char const data[] =
      "\xe1""a\x80\x80\xe1\x80\xbb\xe1\x80\xbd\xe1\x80\x94\xe1\x80"
      "\xba\xe1\x80\xaf\xe1\x80\x95\xe1\x80\xbax";

changes the return value of re_search() to 16, still not right.

(This is with locale set to en_US.UTF-8, just as in the original glibc
testcase, which *does* pass.)

This is the same behaviour that is exhibited in the glibc testcase
(posix/bug-regex34.c): there, re_search() also returns 0, but since that
test does not check the return value of re_search() but merely checks
that it does not segfault, the glibc test passes.

Perhaps this failure is known, but I would say that all is not yet well
in the state of regex.

-- 
NULL && (void)