I cannot share the log file in question (it contains sensitive customer
data), but I did run the following test.

root@piks:/tmp# time result=`grep -a 'best_bid' ff.log`
real    0m0.026s
user    0m0.020s
sys     0m0.004s
root@piks:/tmp# time result=`grep -a 'best_bid\|fixed' ff.log`
real    0m1.881s
user    0m1.868s
sys     0m0.008s
root@piks:/tmp# wc -l ff.log
790754 ff.log
root@piks:/tmp# locale  (this is normally UTF-8 but I changed it)
LANG=C
LANGUAGE=en_US:en
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=

root@piks:/tmp# cat /etc/debian_version
9.6

jan@mm1:~$ time result=`grep -a 'best_bid' ff.log`
real    0m0.039s
user    0m0.020s
sys     0m0.016s
jan@mm1:~$  time result=`grep -a 'best_bid\|fixed' ff.log`
real    0m0.173s
user    0m0.164s
sys     0m0.008s
jan@mm1:~$ wc -l ff.log
790754 ff.log
jan@mm1:~$ locale
LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

jan@mm1:~$ cat /etc/debian_version
7.11

This is the exact same log file (I copied it over with scp), and I ran the
same commands on both machines. Notice that the two-pattern grep takes
1.881s on the Debian 9 machine but only 0.173s on the Debian 7 machine
(which is similar to my experience with Debian 8).
And mm1 is even the older machine, with less CPU and memory.
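
Since I cannot attach the real file, here is a rough sketch of how the
comparison could be run against a synthetic log. The generated content is
purely hypothetical (XML-like lines with the same line count as ff.log), so
it may well not trigger the same slowdown; it only shows the shape of the
test.

```shell
#!/bin/bash
# Sketch: compare a single-pattern grep with a two-pattern alternation.
# The log content below is synthetic and only mimics the line count of ff.log.
export LC_ALL=C
log=$(mktemp)

# Generate 790754 XML-like lines; every 1000th line mentions best_bid.
awk 'BEGIN {
    for (i = 1; i <= 790754; i++) {
        if (i % 1000 == 0) print "<quote><best_bid>" i "</best_bid></quote>"
        else               print "<quote><ask>" i "</ask></quote>"
    }
}' > "$log"

# Sanity check: count the lines and the matches for each pattern.
lines=$(wc -l < "$log" | tr -d ' ')
single=$(grep -ac 'best_bid' "$log")
double=$(grep -ac 'best_bid\|fixed' "$log")
echo "lines=$lines single=$single double=$double"

# Time both variants; the matched lines themselves are discarded.
for pattern in 'best_bid' 'best_bid\|fixed'; do
    echo "pattern: $pattern"
    time grep -a "$pattern" "$log" > /dev/null
done

rm -f "$log"
```

Running the same script on both Debian versions would show whether the
difference also appears without the customer data.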

So even if something is wrong with the log file (or it contains non-textual
characters), that would not explain why grep is so much faster on older
Debian versions (and on an older machine).
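
To check a file for non-textual bytes without sharing its contents,
something along these lines could be used. This is a minimal sketch: the
helper name count_nontext and the sample data are made up for illustration,
and "non-text" here simply means anything other than printable ASCII, tab,
newline, or carriage return, so valid UTF-8 characters are counted too.

```shell
#!/bin/sh
# Count bytes that are neither printable ASCII nor tab, newline, or CR.
# A non-zero count means the file has control bytes or non-ASCII bytes
# (the latter may still be perfectly valid UTF-8).
count_nontext() {
    LC_ALL=C tr -d '\11\12\15\40-\176' < "$1" | wc -c | tr -d ' '
}

# Hypothetical sample standing in for ff.log: one control byte (\001)
# and one two-byte UTF-8 character (\303\251).
sample=$(mktemp)
printf 'abc\n\001def\n\303\251\n' > "$sample"

nontext=$(count_nontext "$sample")
echo "non-text bytes: $nontext"

rm -f "$sample"
```

Calling count_nontext on the real log would give a single number that can be
reported here safely.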

Jan

On Thu, 15 Nov 2018 at 12:55, Santiago Ruano Rincón <
santiag...@riseup.net> wrote:

> Control: tag -1 + moreinfo
>
> On 15/11/18 at 12:18, Jan van den Berg wrote:
> >    Just a fraction better (with -a and LANG=C)
> >    Ran it multiple times, stays just under 6 seconds now:
> >    real    0m5.835s
> >    user    0m5.720s
> >    sys     0m0.060s
> >    Still a far cry from the original / other results (under a second).
> >    The logfile it greps is valid XML data.
>
> It can be valid XML, but that doesn't mean it doesn't have non-textual
> characters (or invalid characters).
>
> Could you provide a way to reproduce this?
>
> Cheers,
>
> S
>
