Package: libc6
Version: 2.3.2.ds1-11
Severity: normal
Followup-For: Bug #216512

I ended up with a similar problem due to this bug. I originally tried to
report the bug using GNU's bug tracker, but since it's been defunct for
over a week now, I'll attach my bug report here. I believe Anders's patch
will fix the problem (my test program crashes at the same point), although
I haven't tested it because I don't want to sit through a glibc compile
right now.

Here's my GNU bug report:

>Submitter-Id:  net
>Originator:    Ben Winslow
>Organization:
>Confidential:  no
>Synopsis:      regexec() causes SIGSEGV with invalid multibyte string
>Severity:      serious
>Priority:      medium
>Category:      libc
>Class:         sw-bug
>Release:       libc-2.3.2
>Environment:
Host type: i386-pc-linux-gnu
System: Linux portal 2.6.1 #2 Thu Jan 29 01:34:38 EST 2004 i686 GNU/Linux
Architecture: i686

Addons: linuxthreads
Build CFLAGS: -g -O2
Build CC: gcc-3.3
Compiler version: 3.3.3 20031229 (prerelease) (Debian)
Kernel headers: UTS_RELEASE
Symbol versioning: yes
Build static: yes
Build shared: yes
Build pic-default: no
Build profile: yes
Build omitfp: no
Build bounded: no
Build static-nss: no

>Description:
regexec() (in find_collation_sequence_value) causes a segmentation
violation when an invalid multibyte string is passed in the 'string'
parameter.

Backtrace:
#0  0x400d847a in find_collation_sequence_value (mbs=0x805f170 "\xC2\xB8\xEF\xBF\xBD\bZ]TEST", mbs_len=2)
    at regexec.c:3644
#1  0x400d8223 in check_node_accept_bytes (preg=0xbffff420, node_idx=1016740, input=0x0, str_idx=1)
    at regexec.c:3534
#2  0x400d5ca4 in transit_state_mb (preg=0xbffff420, pstate=0x805f580, mctx=0xbffff124)
    at regexec.c:2305
#3  0x400d596e in transit_state (err=0xbffff0c8, preg=0xbffff420, mctx=0xbffff124, state=0x805f580, fl_search=0)
    at regexec.c:2067
#4  0x400d3951 in check_matching (preg=0xbffff420, mctx=0xbffff124, fl_search=0, fl_longest_match=0)
    at regexec.c:1009
#5  0x400d3193 in re_search_internal (preg=0xbffff420, string=0x804873f "\xEF\xBF\xBD\xC2\xB8\xEF\xBF\xBD", length=4, start=0, range=4, stop=1016740, nmatch=0, pmatch=0x0, eflags=0)
    at regexec.c:744
#6  0x400d2701 in __regexec (preg=0xbffff420, string=0x804873f "\xEF\xBF\xBD\xC2\xB8\xEF\xBF\xBD", nmatch=1016740, pmatch=0xf83a4, eflags=0)
    at regexec.c:221
#7  0x08048592 in main ()

>How-To-Repeat:
Set your locale to en_US.UTF-8.
Build and execute the following code:

------------------------------ 8< ------------------------------
#include <sys/types.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <locale.h>

int main(int argc, char *argv[]) {
	regex_t expression;
	char errbuf[512];
	int error;

	setlocale(LC_ALL, "");

	if ((error = regcomp(&expression, "[^a-z]test", REG_EXTENDED | REG_ICASE)) != 0) {
		regerror(error, &expression, errbuf, sizeof(errbuf));
		fprintf(stderr, "regexp compilation failed: %s\n", errbuf);
		return 1;
	}

	printf("regexec: %d\n", regexec(&expression, "\xe2\xc2\xb8\xe2", 0, NULL, 0));

	regfree(&expression);

	return 0;
}

-- System Information:
Debian Release: testing/unstable
  APT prefers unstable
  APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: i386 (i686)
Kernel: Linux 2.6.1
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8

Versions of packages libc6 depends on:
ii  libdb1-compat                 2.1.3-7    The Berkeley database routines [gl

-- no debconf information