On 2020-06-23 22:29, Jordan Geoghegan wrote:
> Hello,
> 
> I was working on a couple POSIX regular expressions to search for and 
> validate IPv4 and IPv6 addresses with optional CIDR blocks, and encountered 
> some strange behaviour from the base system grep.
> 
> I wanted to validate my regex against a list of every valid IPv4 address, so 
> I generated a list with a zsh 1 liner:
> 
>      for i in {0..255}; do; echo $i.{0..255}.{0..255}.{0..255} ; done | tr 
> '[:space:]' '\n' > IPv4.txt
> 
> My intentions were to test the regex by running it with 'grep -c' to confirm 
> there was indeed 2^32 addresses matched, and I also wanted to benchmark and 
> compare performance between BSD grep, GNU grep and ripgrep. The command I 
> used:
> 
>    grep -Eoc 
> "((25[0-5]|(2[0-4]|1{0,1}[[:digit:]]){0,1}[[:digit:]])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[[:digit:]]){0,1}[[:digit:]])(/[1-9]|/[1-2][[:digit:]]|/3[0-2])?"
> 
> My findings were surprising. Both GNU grep and ripgrep were able get through 
> the file in roughly 10 and 20 minutes respectively, whereas the base system 
> grep took over 20 hours! What interested me the most was that the base system 
> grep when run with '-c' returned '0' for match count. It seems that 'grep -c' 
> will have its counter overflow if there are more than 2^32-1 matches 
> (4294967295) and then the counter will start counting from zero again for 
> further matches.

Does OpenBSD use an NFA/DFA regular expression implementation, or a
backtracking one?  If it uses the latter, then your regex is probably
causing catastrophic backtracking.

> Regards,
> 
> Jordan

Sincerely,

Demi

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to