i have a >1GB text file (say, input), and want to count lines matching
some pattern (say, '^>>').  using grep, i got the following timings:

    time (grep -c '^>>' input)
    # ~6m20s

    time (grep '^>>' input | wc -l)
    # ~5m20s

sed is much faster:

    time (sed -n '/^>>/p' input | wc -l)
    # ~0m5s

what's the difference between grep and sed that makes grep so much
slower here?

interestingly,

    time (grep -cP '^>>' input)
    # ~0m0.2s

it could be that grep buffers the lines before it outputs them, and this
causes slowdown on large files, but then -P would not change it, would
it?  or does -P change not only regexing, but also outputting?
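
a rough sketch of how the buffering guess could be checked, not a diagnosis: run the same match once with -c (no matched lines are written at all), once through the pipe, and once with LC_ALL=C.  the locale part is purely an assumption on my side, since the locale is a variable i did not control in the timings above; the tiny temp file and the variable names are made up for illustration and stand in for the real input.

```shell
# hypothetical small sample standing in for the >1GB input
input=$(mktemp)
printf '>> a\nb\n>> c\n> d\n' > "$input"

# matching only: -c prints a count, matched lines are never written out
n_count=$(grep -c '^>>' "$input")

# matching plus output of every matched line through a pipe
# (tr strips the padding some wc implementations add)
n_pipe=$(grep '^>>' "$input" | wc -l | tr -d ' ')

# same match in the C locale (assumption: the locale may affect matching speed)
n_c=$(LC_ALL=C grep -c '^>>' "$input")

# all three counts should agree
echo "$n_count $n_pipe $n_c"
rm -f "$input"
```

on the real file, each variant would be wrapped in time (...) as in the commands above.  if the -c run is no faster than the piped one, output buffering is unlikely to be the cause; if the LC_ALL=C run is much faster, the matching side would be the more likely suspect.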

in all the examples above, the actual output (the line count) was correct.

vQ

