I cannot share the log file in question (it contains sensitive customer data), but I did run the following test.

root@piks:/tmp# time result=`grep -a 'best_bid' ff.log`
real 0m0.026s
user 0m0.020s
sys 0m0.004s

root@piks:/tmp# time result=`grep -a 'best_bid\|fixed' ff.log`
real 0m1.881s
user 0m1.868s
sys 0m0.008s

root@piks:/tmp# wc -l ff.log
790754 ff.log

root@piks:/tmp# locale (this is normally UTF-8 but I changed it)
LANG=C
LANGUAGE=en_US:en
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=

root@piks:/tmp# cat /etc/debian_version
9.6

jan@mm1:~$ time result=`grep -a 'best_bid' ff.log`
real 0m0.039s
user 0m0.020s
sys 0m0.016s

jan@mm1:~$ time result=`grep -a 'best_bid\|fixed' ff.log`
real 0m0.173s
user 0m0.164s
sys 0m0.008s

jan@mm1:~$ wc -l ff.log
790754 ff.log

jan@mm1:~$ locale
LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

jan@mm1:~$ cat /etc/debian_version
7.11

This is the exact same log file (I scp-ed it over), and I ran the same
commands on both machines. Notice that the two-pattern grep takes roughly
ten times longer on the Debian 9 machine (1.881s) than on the Debian 7
machine (0.173s), which is similar to my experience with Debian 8. And mm1
is even the older machine, with less CPU and memory.

So even if something is wrong with the log file (or it contains non-textual
characters), that would not explain why grep is so much faster on older
Debian versions (and on an older machine).

Jan

On Thu, 15 Nov 2018 at 12:55, Santiago Ruano Rincón
<santiag...@riseup.net> wrote:

> Control: tag -1 + moreinfo
>
> On 15/11/18 at 12:18, Jan van den Berg wrote:
> > Just a fraction better (with -a and LANG=C)
> > Ran it multiple times, stays just under 6 seconds now:
> > real 0m5.835s
> > user 0m5.720s
> > sys 0m0.060s
> > Still a far cry from the original / other results (under a second).
> > The logfile it greps is valid XML data.
>
> It can be valid XML, but that doesn't mean it doesn't have non-textual
> characters (or invalid characters).
>
> Could you provide a way to reproduce this?
>
> Cheers,
>
> S
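
On the request for a way to reproduce this without the customer data, one
possible approach is sketched below. It is only a sketch under stated
assumptions: the file contents are invented (only the line count, 790754,
matches ff.log), the helper names now and time_grep are hypothetical, GNU
date is assumed for the %N format, and it is not confirmed that synthetic
data triggers the same slowdown as the real log. It runs the two patterns
from the test above under both locales and prints the match count and
wall-clock time for each.

```shell
#!/bin/sh
# Sketch only: builds a synthetic log (NOT the customer file) and times the
# two grep patterns from the report under the C and en_US.UTF-8 locales.

log=$(mktemp)                      # hypothetical stand-in for ff.log
trap 'rm -f "$log"' EXIT

# Same number of lines as ff.log (790754); the line contents are invented.
awk 'BEGIN {
    for (i = 1; i <= 790754; i++) {
        if (i % 1000 == 0)      print "<quote best_bid=\"" i "\"/>"
        else if (i % 1500 == 0) print "<price type=\"fixed\"/>"
        else                    print "<tick seq=\"" i "\"/>"
    }
}' > "$log"

# Sub-second timestamp; %N is a GNU date extension (assumption).
now() { date +%s.%N; }

# time_grep LOCALE PATTERN: print locale, pattern, match count, elapsed time.
time_grep() {
    start=$(now)
    matches=$(env LC_ALL="$1" grep -a -- "$2" "$log" | wc -l)
    end=$(now)
    elapsed=$(awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f", e - s }')
    printf '%-12s %-18s matches=%s %ss\n' "$1" "$2" "$matches" "$elapsed"
}

for loc in C en_US.UTF-8; do
    time_grep "$loc" 'best_bid'
    time_grep "$loc" 'best_bid\|fixed'
done
```

If en_US.UTF-8 is not generated on the test machine, grep falls back to the
C locale and both rounds measure the same thing, so check `locale -a` first.
Running the script unchanged on a Debian 7 and a Debian 9 machine would show
whether the difference appears with synthetic data at all.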