Hi all,
I was expecting it to take order n time to match against n patterns, where n
is the number of lines in the pattern file.
What I am experiencing: The execution time of grep -f behaves like n^a, with
a somewhere in the range [2.5, 3.0].
The execution time with a 100 line pattern file was 0.17 sec, but the
execution time with a 1000 line pattern file was 59.7 seconds, nowhere near
the 10 * 0.17 sec = 1.7 seconds that I expected.
Details and reproduction steps are given below.
Thanks,
Narfi
Here is what I did, I created 1000 strings and 1000 regexps:
cat /dev/urandom | od -x | head -1000 > regexps.txt
cat /dev/urandom | od -x | head -1000 > text.txt
Avoid any problems with the locale:
export LC_ALL=C
export LANG=C
export LANGUAGE=C
I then timed the execution of matching against the first 100, 200, etc
patterns in regexps.txt:
for n in 100 200 300 400 500 600 700 800 900 1000; do
head -$n regexps.txt > current_regexps.txt
/usr/bin/time grep -f current_regexps.txt text.txt
done
Following is the user time in seconds as reported by /usr/bin/time, for n in
100, 200, ..., 1000:
0.17 0.80 2.08 4.10 7.02 11.33 17.31 28.68 41.39 59.70
This is on Mandriva 2006.
% grep --version
grep (GNU grep) 2.5.1