grep -f scales extremely poorly with number of lines in pattern file

Narfi Stefansson Mon, 20 Mar 2006 09:34:56 -0800

Hi all,

I was expecting it to take order n time to match against n patterns, where n 
is the number of lines in the pattern file.


What I am experiencing:  The execution time of grep -f behaves like n^a, with 
a somewhere in the range [2.5, 3.0]. 

The execution time with a 100 line pattern file was 0.17 sec, but the 
execution time with a 1000 line pattern file was 59.7 seconds, nowhere near 
the 10 * 0.17 sec = 1.7 seconds that I expected.



Details  and reproduction steps are given below.

Thanks,

Narfi

Here is what I did, I created 1000 strings and 1000 regexps:

cat /dev/urandom  | od -x | head -1000 > regexps.txt 
cat /dev/urandom  | od -x | head -1000 > text.txt

Avoid any problems with the locale:
export LC_ALL=C
export LANG=C
export LANGUAGE=C

I then timed the execution of matching against the first 100, 200, etc 
patterns in regexps.txt:
for n in 100 200 300 400 500 600 700 800 900 1000; do 
    head -$n regexps.txt > current_regexps.txt
    /usr/bin/time grep -f current_regexps.txt text.txt
done

Following is the user time in seconds as reported by /usr/bin/time, for n in 
100, 200, ..., 1000:

0.17 0.80 2.08 4.10 7.02 11.33 17.31 28.68 41.39 59.70

This is on Mandriva 2006.
% grep --version
grep (GNU grep) 2.5.1

grep -f scales extremely poorly with number of lines in pattern file

Reply via email to