Paolo Bonzini wrote: > On Sat, Jun 4, 2011 at 09:48, Jim Meyering <j...@meyering.net> wrote: >>> The b2 == EOF part is required for the somewhat similar bug I fixed >>> a month ago: >>> >>> fix a bug whereby echo c|grep '[c]' would fail for any c in 0x80..0xff >>> 8da41c930e03a8635cbd8c89e3e591374c232c89 >>> >>> The corresponding test demonstrates the need: >>> >>> tests: exercise bug with 0x80..0xff in [...] >>> d98338ebf842ec9b69631837eee50ebdcd543505 > > [\xff] is not well defined for a UTF-8 locale at all, actually. > Perhaps FETCH_WC should return wc = EOF in this case (and c = 255), > and it could be handled on a case-by-case basis elsewhere. > > But if wctob returns EOF, and b > UCHAR_MAX, you have introduced an > out-of-bounds access in setbit.
Yes, I saw that. That's why I added the guard I mentioned in previous mail. I would like a test case that would segfault without the (b < NOTCHAR) guard below. If someone can construct one, I'll be more than happy to add it to the test suite. Here's the patch I expect to push: >From 168577596e38981d93ea57d56d325172cfed7dc7 Mon Sep 17 00:00:00 2001 From: Jim Meyering <meyer...@redhat.com> Date: Thu, 2 Jun 2011 18:03:49 +0200 Subject: [PATCH 1/5] fix the [...] bug also for relatively unusual uni-byte encodings MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * src/dfa.c (setbit_case_fold): Also handle uni-byte locales like the one mentioned in the original report: see 2011-05-07 commit d98338eb. Re-reported by Santiago Ruano Rincón. Note that most uni-byte locales are not affected. * NEWS (Bug fixes): Mention it. --- NEWS | 4 ++++ src/dfa.c | 10 +++++++--- 2 files changed, 11 insertions(+), 3 deletions(-) diff --git a/NEWS b/NEWS index 312c803..67b3fad 100644 --- a/NEWS +++ b/NEWS @@ -4,6 +4,10 @@ GNU grep NEWS -*- outline -*- ** Bug fixes + echo c|grep '[c]' would fail for any c in 0x80..0xff, with a uni-byte + encoding for which the byte-to-wide-char mapping is nontrivial. For + example, the ISO-88591 locales are not affected, but ru_RU.KOI8-R is. + grep -P no longer aborts when PCRE's backtracking limit is exceeded Before, echo aaaaaaaaaaaaaab |grep -P '((a+)*)+$' would abort. Now, it diagnoses the problem and exits with status 2. diff --git a/src/dfa.c b/src/dfa.c index b41cbb6..83386aa 100644 --- a/src/dfa.c +++ b/src/dfa.c @@ -573,10 +573,14 @@ setbit_case_fold ( else { #if MBS_SUPPORT - int b2 = wctob ((unsigned char) b); - if (b2 == EOF || b2 == b) + /* Below, note how when b2 != b and we have a uni-byte locale + (MB_CUR_MAX == 1), we set b = b2. I.e., in a uni-byte locale, + we can safely call setbit with a non-EOF value returned by wctob. */ + int b2 = wctob (b); + if (b2 == EOF || b2 == b || (MB_CUR_MAX == 1 ? (b=b2), 1 : 0)) #endif - setbit (b, c); + if (b < NOTCHAR) + setbit (b, c); } } -- 1.7.6.rc0.254.gf37de -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org