> Hello,
> 
> I found a strange bug in grep.
> some Japanese runes do not match ‘[^0-9]’.
> 
> for example ‘ま’ (307e) and ‘み’ (307f).
> 

i can't replicate this here with 9atom's fixes to grep.
with the same t3 file as you've got:

        ; wc -l /tmp/t3
             21 /tmp/t3
        ; grep -v '^[0-9]' /tmp/t3 | wc -l
             21

i have some other differences in grep, including -I (same
as -i, except that it folds runes), but i think the differences
in comp.c are what fix the bug.  in particular, you really
need that 0xffff entry in both tables (tab1 and tab2).

/n/sources/plan9/sys/src/cmd/grep/comp.c:135,145 - comp.c:135,147
  {
        0x007f,
        0x07ff,
+       0xffff,
  };
  Rune  tab2[] =
  {
        0x003f,
        0x0fff,
+       0xffff,
  };
  
  Re2
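
for what it's worth, here is a minimal sketch of why a third boundary
matters, assuming the tab1 entries mark the largest rune that fits in a
given utf-8 length (0x7f for one byte, 0x7ff for two, 0xffff for three).
splitrange and utfmax are made-up names, not the code in comp.c; the
sketch only shows a rune range being split into same-length pieces.

```c
#include <assert.h>

typedef unsigned int Rune;	/* stand-in for Plan 9's Rune type */

/* largest rune encodable in 1, 2 and 3 bytes of utf-8 */
static const Rune utfmax[] = { 0x007f, 0x07ff, 0xffff };
enum { Nutfmax = sizeof utfmax / sizeof utfmax[0] };

/*
 * split [lo, hi] into subranges whose members all encode to the
 * same number of utf-8 bytes.  writes (start, end) pairs to out,
 * which must have room for 2*Nutfmax runes; returns the pair count.
 * assumes lo <= hi <= 0xffff (bmp only).
 */
int
splitrange(Rune lo, Rune hi, Rune *out)
{
	int i, n;

	assert(lo <= hi && hi <= 0xffff);
	n = 0;
	for(i = 0; i < Nutfmax; i++){
		if(lo > utfmax[i])
			continue;	/* range starts above this length class */
		out[2*n] = lo;
		out[2*n+1] = hi < utfmax[i] ? hi : utfmax[i];
		n++;
		if(hi <= utfmax[i])
			break;
		lo = utfmax[i] + 1;
	}
	return n;
}
```

with this table, [0x3a, 0xffff] splits into 0x3a-0x7f, 0x80-0x7ff and
0x800-0xffff; ま (0x307e) lands in the last piece, the one bounded by
the 0xffff entry.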

the additional pairs and the correction to the combining case
here were not accepted to sources, but they allow for the large
character classes generated by folding.  many of the characters are
contiguous, so getting the contiguous case right is important.

/n/sources/plan9/sys/src/cmd/grep/comp.c:215,221 - comp.c:217,223
  Re2
  re2class(char *s)
  {
-       Rune pairs[200+2], *p, *q, ov;
+       Rune pairs[400+2], *p, *q, ov;
        int nc;
        Re2 x;
  
/n/sources/plan9/sys/src/cmd/grep/comp.c:234,240 - comp.c:236,242
                        break;
                p[1] = *p;
                p += 2;
-               if(p >= pairs + nelem(pairs) - 2)
+               if(p == pairs + nelem(pairs) - 2)
                        error("class too big");
                s += chartorune(p, s);
                if(*p != '-')
/n/sources/plan9/sys/src/cmd/grep/comp.c:254,260 - comp.c:256,262
        for(p=pairs+2; *p; p+=2) {
                if(p[0] > p[1])
                        continue;
-               if(p[0] > q[1] || p[1] < q[0]) {
+               if(p[0] > q[1]+1 || p[1] < q[0]) {
                        q[2] = p[0];
                        q[3] = p[1];
                        q += 2;
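
to illustrate the contiguous case, here is a rough standalone sketch of
merging sorted pairs, where a range that starts exactly one past the end
of the previous one is folded into it (the q[1]+1 test above).
mergepairs is a hypothetical helper that takes an explicit count; it is
not the loop in re2class, which walks a zero-terminated array.

```c
#include <assert.h>

typedef unsigned int Rune;	/* stand-in for Plan 9's Rune type */

/*
 * merge (start, end) pairs in place.  pairs holds n ranges as 2*n
 * runes, sorted by start, each with start <= end.  assumes no end
 * is the largest Rune value, so end+1 does not wrap.
 * returns the number of ranges left after merging.
 */
int
mergepairs(Rune *pairs, int n)
{
	int i, m;

	if(n == 0)
		return 0;
	m = 0;	/* index of the last merged range */
	for(i = 1; i < n; i++){
		Rune lo = pairs[2*i], hi = pairs[2*i+1];

		assert(lo <= hi && lo >= pairs[2*m]);
		if(lo > pairs[2*m+1] + 1){
			/* a real gap: start a new range */
			m++;
			pairs[2*m] = lo;
			pairs[2*m+1] = hi;
		}else if(hi > pairs[2*m+1])
			pairs[2*m+1] = hi;	/* overlapping or adjacent: extend */
	}
	return m + 1;
}
```

given the single-rune pairs a-a, b-b, c-c and then x-z, this leaves two
ranges, a-c and x-z.  with the plain > q[1] test the first three would
stay separate, which is how a folded class can use up the pairs array.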

i believe this case is also critical.  split the bmp off: end the
last range of a negated class at 0xffff rather than Runemask.

/n/sources/plan9/sys/src/cmd/grep/comp.c:275,281 - comp.c:277,283
                        x = re2or(x, rclass(ov, p[0]-1));
                        ov = p[1]+1;
                }
-               x = re2or(x, rclass(ov, Runemask));
+               x = re2or(x, rclass(ov, 0xffff));
        } else {
                x = rclass(p[0], p[1]);
                for(p+=2; *p; p+=2)
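
as a sketch of what the negated branch computes, assuming the pairs are
already sorted and merged: walk the ranges, emit each gap, and close the
last gap at a chosen maximum.  complement is a hypothetical function
with an explicit count and output array, not the re2or/rclass chain in
the diff; max plays the role of 0xffff (or Runemask).

```c
#include <assert.h>

typedef unsigned int Rune;	/* stand-in for Plan 9's Rune type */

/*
 * pairs: n sorted, disjoint (start, end) ranges, every end <= max.
 * out: room for 2*(n+1) runes.
 * writes the ranges of [0, max] not covered by pairs to out and
 * returns how many ranges were written.
 */
int
complement(const Rune *pairs, int n, Rune max, Rune *out)
{
	int i, m;
	Rune ov;	/* first rune not yet accounted for */

	m = 0;
	ov = 0;
	for(i = 0; i < n; i++){
		assert(pairs[2*i] <= pairs[2*i+1] && pairs[2*i+1] <= max);
		if(pairs[2*i] > ov){
			/* gap between the previous range and this one */
			out[2*m] = ov;
			out[2*m+1] = pairs[2*i] - 1;
			m++;
		}
		if(pairs[2*i+1] == max)
			return m;	/* nothing left above the last range */
		ov = pairs[2*i+1] + 1;
	}
	out[2*m] = ov;
	out[2*m+1] = max;
	return m + 1;
}
```

for the class 0-9 with max 0xffff this gives 0x00-0x2f and 0x3a-0xffff,
and ま (0x307e) falls in the second range.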

- erik
