grep -F skips an extra character after a match that falls in the middle
of a multi-byte character.  As a result, it fails to match at the next
position.  A test case for this bug is already included in tests/sjis-mb.

For example, the following test fails on grep-2.19 or later:

$ printf '\203AA\n' >in
$ env LC_ALL=ja_JP.SHIFT-JIS src/grep -F A in

We expect grep to match at the second A and output the line, but it
outputs nothing.
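
To illustrate the skip logic, here is a minimal standalone sketch in C.  It
is not the code in src/kwsearch.c: the helper names (sjis_is_lead,
next_boundary, find_char) and the skip_extra flag are hypothetical, and the
lead-byte test is a simplified stand-in for the locale's real multi-byte
handling.  In Shift-JIS the bytes 0x83 0x41 ('\203A') form one two-byte
character whose second byte has the same value as ASCII 'A', so a byte-level
search for 'A' first hits the middle of that character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified Shift-JIS lead-byte test (an assumption of this sketch;
   it does not consult the locale).  */
static int
sjis_is_lead (unsigned char c)
{
  return (0x81 <= c && c <= 0x9F) || (0xE0 <= c && c <= 0xFC);
}

/* Return the first character boundary at or after POS, walking
   character by character from the start of BUF.  */
static size_t
next_boundary (char const *buf, size_t size, size_t pos)
{
  size_t i = 0;
  assert (pos <= size);
  while (i < pos)
    i += (sjis_is_lead ((unsigned char) buf[i]) && i + 1 < size) ? 2 : 1;
  return i;
}

/* Return the offset of the first occurrence of byte C that starts on
   a character boundary, or -1 if there is none.  A nonzero SKIP_EXTRA
   models the bug: after a match in the middle of a character, BEG is
   set to the boundary itself, and the loop's own increment then steps
   past it.  */
static long
find_char (char const *buf, size_t size, char c, int skip_extra)
{
  for (size_t beg = 0; beg < size; beg++)
    {
      char const *hit = memchr (buf + beg, c, size - beg);
      if (!hit)
        return -1;
      size_t pos = (size_t) (hit - buf);
      size_t mb_start = next_boundary (buf, size, pos);
      if (mb_start == pos)
        return (long) pos;
      /* The match was inside a multi-byte character.  Compensate for
         the loop increment so the search resumes exactly at MB_START.  */
      beg = skip_extra ? mb_start : mb_start - 1;
    }
  return -1;
}
```

With the input from the test above, find_char ("\203AA\n", 4, 'A', 1)
returns -1 (the second A is skipped), while find_char ("\203AA\n", 4, 'A', 0)
returns 2, the offset of the second A.
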
From bd5e4650a4d6af7322c623acff647444138e8360 Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka <[email protected]>
Date: Tue, 18 Nov 2014 13:36:42 +0900
Subject: [PATCH] grep: grep -F fails to match at the next position after
 matched middle of a multi-byte character

grep -F skips an extra character after a match that falls in the middle
of a multi-byte character.  As a result, it fails to match at the next
position.  A test case for this bug is already included in tests/sjis-mb.

src/kwsearch.c (Fexecute): Skip correctly after matched middle of a
multi-byte character.
---
 NEWS           | 4 ++++
 src/kwsearch.c | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index 1597528..6c5d8e1 100644
--- a/NEWS
+++ b/NEWS
@@ -48,6 +48,10 @@ GNU grep NEWS                                    -*- outline -*-
   of a multibyte character when using a '^'-anchored alternate in a pattern,
   leading it to print non-matching lines.  [bug present since "the beginning"]
 
+  grep -F no longer fails to match at the next position after matched
+  middle of a multi-byte character.
+  [bug introduced in grep-2.19]
+
   grep -E rejected unmatched ')', instead of treating it like '\)'.
   [bug present since "the beginning"]
 
diff --git a/src/kwsearch.c b/src/kwsearch.c
index aa965f6..c0cf0ad 100644
--- a/src/kwsearch.c
+++ b/src/kwsearch.c
@@ -135,7 +135,7 @@ Fexecute (char const *buf, size_t size, size_t *match_size,
         {
           /* The match was a part of multibyte character, advance at least
              one byte to ensure no infinite loop happens.  */
-          beg = mb_start;
+          beg = mb_start - 1;
           continue;
         }
       beg += offset;
-- 
2.1.3
