Santiago Ruano Rincón wrote:
> Follow-up Comment #3, bug #33198 (project grep):
> It seems the problem is still unsolved. I've tried both, 2.8 and patching 2.7,
> but I got the same results. Igor Ladygin confirms this.
>
> santiago@nomada:~$ echo Пример| LC_ALL=ru_RU.KOI8-R grep -qE "[Пп]";
> echo $?
> 1

Thank you.
At first I was going to say this:

  You are using ru_RU.KOI8-R, which is a uni-byte locale, yet your
  inputs (both stdin and the grep regexp) use the two-byte UTF-8
  representation, П (\xd0\x9f), instead of the uni-byte П (\360).

But it fails even with the single-byte version.
So it is indeed a bug in grep, but at least this time it
affects relatively few locales.

Here's the fix I expect to use and a test case to exercise it.

From 8e214a2ecc4bac7f8341deb3646b6f1c3819dac3 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyer...@redhat.com>
Date: Thu, 2 Jun 2011 18:03:49 +0200
Subject: [PATCH 1/2] fix the range bug also for relatively unusual uni-byte encodings

* src/dfa.c (setbit_case_fold) Bug fix. FIXME
* NEWS (Bug fixes): Mention it.
---
 NEWS      |    4 ++++
 src/dfa.c |    7 +++++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/NEWS b/NEWS
index 312c803..67b3fad 100644
--- a/NEWS
+++ b/NEWS
@@ -4,6 +4,10 @@ GNU grep NEWS -*- outline -*-
 
 ** Bug fixes
 
+  echo c|grep '[c]' would fail for any c in 0x80..0xff, with a uni-byte
+  encoding for which the byte-to-wide-char mapping is nontrivial.  For
+  example, the ISO-8859-1 locales are not affected, but ru_RU.KOI8-R is.
+
   grep -P no longer aborts when PCRE's backtracking limit is exceeded
   Before, echo aaaaaaaaaaaaaab |grep -P '((a+)*)+$' would abort.  Now,
   it diagnoses the problem and exits with status 2.
diff --git a/src/dfa.c b/src/dfa.c
index b41cbb6..0ce6242 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -573,8 +573,11 @@ setbit_case_fold (
   else
     {
 #if MBS_SUPPORT
-      int b2 = wctob ((unsigned char) b);
-      if (b2 == EOF || b2 == b)
+      /* Below, note how when b2 != b and we have a uni-byte locale
+         (MB_CUR_MAX == 1), we set b = b2.  I.e., in a uni-byte locale,
+         we can safely call setbit with a non-EOF value returned by wctob.  */
+      int b2 = wctob (b);
+      if (b2 == EOF || b2 == b || (MB_CUR_MAX == 1 ? (b=b2), 1 : 0))
 #endif
         setbit (b, c);
     }
--
1.7.6.rc0.254.gf37de


From c93e621ac20d085abda4cf3c269f5cf902671a84 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyer...@redhat.com>
Date: Thu, 2 Jun 2011 11:01:35 +0200
Subject: [PATCH 2/2] tests: exercise a non-UTF8 multi-byte range bug: requires ru_RU.KOI8-R

* tests/mb-non-utf8-range: New file.
* tests/Makefile.am (TESTS): Add it.
* init.cfg (require_ru_RU_koi8_r): New function.
---
 tests/Makefile.am       |    1 +
 tests/init.cfg          |    9 +++++++++
 tests/mb-non-utf8-range |   41 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 51 insertions(+), 0 deletions(-)
 create mode 100644 tests/mb-non-utf8-range

diff --git a/tests/Makefile.am b/tests/Makefile.am
index a01b004..2d0527a 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -63,6 +63,7 @@ TESTS = \
   inconsistent-range \
   khadafy \
   max-count-vs-context \
+  mb-non-utf8-range \
   high-bit-range \
   options \
   pcre \
diff --git a/tests/init.cfg b/tests/init.cfg
index 3429f0d..f6ead9c 100644
--- a/tests/init.cfg
+++ b/tests/init.cfg
@@ -69,3 +69,12 @@ require_en_utf8_locale_()
     *) skip_test_ 'en_US.UTF-8 locale not found' ;;
   esac
 }
+
+require_ru_RU_koi8_r()
+{
+  path_prepend_ .
+  case $(get-mb-cur-max ru_RU.KOI8-R) in
+    1) ;;
+    *) skip_test_ 'ru_RU.KOI8-R locale not found' ;;
+  esac
+}
diff --git a/tests/mb-non-utf8-range b/tests/mb-non-utf8-range
new file mode 100644
index 0000000..a0b51dd
--- /dev/null
+++ b/tests/mb-non-utf8-range
@@ -0,0 +1,41 @@
+#!/bin/sh
+# Exercise a DFA range bug that arises only with a unibyte encoding
+# for which the wide-char-to-single-byte mapping is nontrivial.
+# E.g., the regexp, [C] would fail to match C in a unibyte locale like
+# ru_RU.KOI8-R for any C whose wide-char representation differed from
+# its single-byte equivalent.
+
+# Copyright (C) 2011 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+. "${srcdir=.}/init.sh"; path_prepend_ ../src
+require_ru_RU_koi8_r
+LC_ALL=ru_RU.KOI8-R
+export LC_ALL
+
+fail=0
+
+for i in 8 9 a b c d e f; do
+  for j in 0 1 2 3 4 5 6 7 8 9 a b c d e f; do
+    in=in-$i$j
+    b=$(printf "\\x$i$j")
+    echo "$b" > $in || framework_failure_
+    cp $in /t
+    grep "[$b]" $in > out || fail=1
+    compare out $in || fail=1
+  done
+done
+
+Exit $fail
--
1.7.6.rc0.254.gf37de
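
For anyone reproducing this by hand, it may help to confirm which bytes
actually reach grep.  Below is a minimal sketch, assuming a POSIX sh with
od and tr.  The hex helper and the variable names are hypothetical, chosen
only for illustration, and are not part of grep's test suite.  It prints
the two-byte UTF-8 form of П next to the single-byte KOI8-R form, and tries
the single-byte match only if the ru_RU.KOI8-R locale appears to be
installed.

```shell
#!/bin/sh
# Hypothetical helper: dump stdin as lowercase hex with no separators.
hex() { od -An -tx1 | tr -d ' \n'; }

# UTF-8 encoding of П (U+041F): two bytes, 0xd0 0x9f (octal 320 237).
utf8_hex=$(printf '\320\237' | hex)
# KOI8-R encoding of П: one byte, 0xf0 (octal 360).
koi8_hex=$(printf '\360' | hex)

echo "UTF-8:  $utf8_hex"
echo "KOI8-R: $koi8_hex"

# Assumption: "locale -a" lists the locale as ru_RU.koi8r or ru_RU.KOI8-R.
# If it is not listed, the grep check is skipped rather than reported
# as a failure.
if locale -a 2>/dev/null | LC_ALL=C grep -qi '^ru_RU\.koi8-\{0,1\}r$'; then
  pat=$(printf '[\360]')
  if printf '\360\n' | LC_ALL=ru_RU.KOI8-R grep -q "$pat"; then
    echo "single-byte bracket expression matched"
  else
    echo "single-byte bracket expression did NOT match"
  fi
else
  echo "ru_RU.KOI8-R not installed; skipping the grep check"
fi
```

On a UTF-8 terminal, typing П produces the two-byte form, so a printf
octal escape is the more reliable way to feed grep the KOI8-R byte.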