I have a text file larger than 1 GB (say, input) and want to count the lines
matching some pattern (say, '^>>'). Using grep, I got the following timings:
time (grep -c '^>>' input)
# ~6m20s
time (grep '^>>' input | wc -l)
# ~5m20s
sed is much faster:
time (sed -n '/^>>/p' input | wc -l)
# ~0m5s
What is the difference between grep and sed that makes grep so much
slower here?
Interestingly, adding -P makes grep very fast:
time (grep -cP '^>>' input)
# ~0m0.2s
It could be that grep buffers the lines before it outputs them, and that this
causes the slowdown on large files, but then -P would not change that, would
it? Or does -P change not only the regex matching but also the output?
In all the examples above, the actual output (the line count) was correct.
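
For anyone who wants to reproduce this, here is a minimal sketch that checks the variants agree on the count. The sample data is made up (it is not my real file), and the LC_ALL=C run is only an assumption worth testing (that locale handling rather than output buffering might be involved), not something I have confirmed.

```shell
#!/bin/sh
# Hypothetical stand-in for the real >1GB file: 1000 lines, every 10th starts with '>>'.
input=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 1000; i++) print ((i % 10 == 0) ? ">> header " i : "data " i) }' > "$input"

# The three portable variants from the post (tr strips padding some wc builds add).
count_grep_c=$(grep -c '^>>' "$input")
count_grep_wc=$(grep '^>>' "$input" | wc -l | tr -d ' ')
count_sed_wc=$(sed -n '/^>>/p' "$input" | wc -l | tr -d ' ')

# Assumption to test: same grep, but forced into the plain C locale.
count_grep_c_locale=$(LC_ALL=C grep -c '^>>' "$input")

echo "grep -c:          $count_grep_c"
echo "grep | wc -l:     $count_grep_wc"
echo "sed -n | wc -l:   $count_sed_wc"
echo "LC_ALL=C grep -c: $count_grep_c_locale"

# -P is a GNU grep extension and may be missing, so only run it if it works here.
if echo x | grep -qP 'x' 2>/dev/null; then
    echo "grep -cP:         $(grep -cP '^>>' "$input")"
fi

rm -f "$input"
```

Wrapping each command in time ( ... ), as in the timings above, and pointing it at the real file would show whether the locale setting changes the speed at all.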