This seems to fix the issue for me.

OK?

martijn@

On Tue, 2020-06-23 at 19:29 -0700, Jordan Geoghegan wrote:
> Hello,
> 
> I was working on a couple of POSIX regular expressions to search for and
> validate IPv4 and IPv6 addresses with optional CIDR blocks, and
> encountered some strange behaviour from the base system grep.
> 
> I wanted to validate my regex against a list of every valid IPv4
> address, so I generated a list with a zsh one-liner:
> 
>       for i in {0..255}; do; echo $i.{0..255}.{0..255}.{0..255} ; done | tr '[:space:]' '\n' > IPv4.txt
> 
> My intentions were to test the regex by running it with 'grep -c' to
> confirm that exactly 2^32 addresses were matched, and I also wanted to
> benchmark and compare performance between BSD grep, GNU grep and
> ripgrep. The command I used:
> 
>     grep -Eoc "((25[0-5]|(2[0-4]|1{0,1}[[:digit:]]){0,1}[[:digit:]])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[[:digit:]]){0,1}[[:digit:]])(/[1-9]|/[1-2][[:digit:]]|/3[0-2])?"
> 
> My findings were surprising. Both GNU grep and ripgrep were able to get
> through the file in roughly 10 and 20 minutes respectively, whereas the
> base system grep took over 20 hours! What interested me the most was
> that the base system grep, when run with '-c', returned '0' as the match
> count. It seems that the 'grep -c' counter overflows once there are
> more than 2^32-1 (4294967295) matches and then starts counting from
> zero again for further matches.
> 
>      ryzen$ time zcat IPv4.txt.gz | grep -Eoc "((25[0-5]|(2[0-4]|1{0,1}...
>      0
>      1222m09.32s real  1224m28.02s user     1m16.17s system
> 
>      ryzen$ time zcat allip.txt.gz | ggrep -Eoc "((25[0-5]|(2[0-4]|1{0,1}...
>      4294967296
>      10m00.38s real    11m40.57s user     0m30.55s system
> 
>      ryzen$ time rg -zoc "((25[0-5]|(2[0-4]|1{0,1}...
>      4294967296
>      21m06.36s real    27m06.04s user     0m50.08s system
> 
> # See the counter overflow/reset:
>      jot 4294967350 | grep -c "^[[:digit:]]"
>      54
> 
> All testing was done on a Ryzen desktop machine running 6.7-stable.
> 
> The grep counting bug can be reproduced with this command:
>     jot 4294967296 | nice grep -c "^[[:digit:]]"
> 
> Regards,
> 
> Jordan
> 
Index: util.c
===================================================================
RCS file: /cvs/src/usr.bin/grep/util.c,v
retrieving revision 1.62
diff -u -p -r1.62 util.c
--- util.c      3 Dec 2019 09:14:37 -0000       1.62
+++ util.c      24 Jun 2020 06:46:52 -0000
@@ -106,7 +106,8 @@ procfile(char *fn)
 {
        str_t ln;
        file_t *f;
-       int c, t, z, nottext;
+       int t, z, nottext;
+       unsigned long long c;
 
        mcount = mlimit;
 
@@ -169,7 +170,7 @@ procfile(char *fn)
        if (cflag) {
                if (!hflag)
                        printf("%s:", ln.file);
-               printf("%u\n", c);
+               printf("%llu\n", c);
        }
        if (lflag && c != 0)
                printf("%s\n", fn);
