On 2020-06-23 22:29, Jordan Geoghegan wrote: > Hello, > > I was working on a couple POSIX regular expressions to search for and > validate IPv4 and IPv6 addresses with optional CIDR blocks, and encountered > some strange behaviour from the base system grep. > > I wanted to validate my regex against a list of every valid IPv4 address, so > I generated a list with a zsh 1 liner: > > for i in {0..255}; do; echo $i.{0..255}.{0..255}.{0..255} ; done | tr > '[:space:]' '\n' > IPv4.txt > > My intentions were to test the regex by running it with 'grep -c' to confirm > there was indeed 2^32 addresses matched, and I also wanted to benchmark and > compare performance between BSD grep, GNU grep and ripgrep. The command I > used: > > grep -Eoc > "((25[0-5]|(2[0-4]|1{0,1}[[:digit:]]){0,1}[[:digit:]])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[[:digit:]]){0,1}[[:digit:]])(/[1-9]|/[1-2][[:digit:]]|/3[0-2])?" > > My findings were surprising. Both GNU grep and ripgrep were able get through > the file in roughly 10 and 20 minutes respectively, whereas the base system > grep took over 20 hours! What interested me the most was that the base system > grep when run with '-c' returned '0' for match count. It seems that 'grep -c' > will have its counter overflow if there are more than 2^32-1 matches > (4294967295) and then the counter will start counting from zero again for > further matches.
Does OpenBSD use an NFA/DFA regular expression implementation, or a backtracking one? If it uses the latter, then your regex is probably causing catastrophic backtracking. > Regards, > > Jordan Sincerely, Demi
signature.asc
Description: OpenPGP digital signature