This seems to fix the issue for me. OK?
martijn@

On Tue, 2020-06-23 at 19:29 -0700, Jordan Geoghegan wrote:
> Hello,
>
> I was working on a couple POSIX regular expressions to search for and
> validate IPv4 and IPv6 addresses with optional CIDR blocks, and
> encountered some strange behaviour from the base system grep.
>
> I wanted to validate my regex against a list of every valid IPv4
> address, so I generated a list with a zsh 1 liner:
>
> for i in {0..255}; do; echo $i.{0..255}.{0..255}.{0..255} ; done |
> tr '[:space:]' '\n' > IPv4.txt
>
> My intentions were to test the regex by running it with 'grep -c' to
> confirm there was indeed 2^32 addresses matched, and I also wanted to
> benchmark and compare performance between BSD grep, GNU grep and
> ripgrep. The command I used:
>
> grep -Eoc
> "((25[0-5]|(2[0-4]|1{0,1}[[:digit:]]){0,1}[[:digit:]])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[[:digit:]]){0,1}[[:digit:]])(/[1-9]|/[1-2][[:digit:]]|/3[0-2])?"
>
> My findings were surprising. Both GNU grep and ripgrep were able get
> through the file in roughly 10 and 20 minutes respectively, whereas the
> base system grep took over 20 hours! What interested me the most was
> that the base system grep when run with '-c' returned '0' for match
> count. It seems that 'grep -c' will have its counter overflow if there
> are more than 2^32-1 matches (4294967295) and then the counter will
> start counting from zero again for further matches.
>
> ryzen$ time zcat IPv4.txt.gz | grep -Eoc "((25[0-5]|(2[0-4]|1{0,1}...
> 0
> 1222m09.32s real 1224m28.02s user 1m16.17s system
>
> ryzen$ time zcat allip.txt.gz | ggrep -Eoc "((25[0-5]|(2[0-4]|1{0,1}...
> 4294967296
> 10m00.38s real 11m40.57s user 0m30.55s system
>
> ryzen$ time rg -zoc "((25[0-5]|(2[0-4]|1{0,1}...
> 4294967296
> 21m06.36s real 27m06.04s user 0m50.08s system
>
> # See the counter overflow/reset:
> jot 4294967350 | grep -c "^[[:digit:]]"
> 54
>
> All testing was done on a Ryzen desktop machine running 6.7 stable.
>
> The grep counting bug can be reproduced with this command:
> jot 4294967296 | nice grep -c "^[[:digit:]]"
>
> Regards,
>
> Jordan
>

Index: util.c
===================================================================
RCS file: /cvs/src/usr.bin/grep/util.c,v
retrieving revision 1.62
diff -u -p -r1.62 util.c
--- util.c	3 Dec 2019 09:14:37 -0000	1.62
+++ util.c	24 Jun 2020 06:46:52 -0000
@@ -106,7 +106,8 @@ procfile(char *fn)
 {
 	str_t ln;
 	file_t *f;
-	int c, t, z, nottext;
+	int t, z, nottext;
+	unsigned long long c;
 
 	mcount = mlimit;
 
@@ -169,7 +170,7 @@ procfile(char *fn)
 	if (cflag) {
 		if (!hflag)
 			printf("%s:", ln.file);
-		printf("%u\n", c);
+		printf("%llu\n", c);
 	}
 	if (lflag && c != 0)
 		printf("%s\n", fn);