Bug#913657: grep: Regex grep on stretch is slower than jessie
I cannot share the log file in question (it contains sensitive customer data), but I did run the following test.

root@piks:/tmp# time result=`grep -a 'best_bid' ff.log`
real    0m0.026s
user    0m0.020s
sys     0m0.004s
root@piks:/tmp# time result=`grep -a 'best_bid\|fixed' ff.log`
real    0m1.881s
user    0m1.868s
sys     0m0.008s
root@piks:/tmp# wc -l ff.log
790754 ff.log
root@piks:/tmp# locale   (this is normally UTF-8, but I changed it)
LANG=C
LANGUAGE=en_US:en
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=
root@piks:/tmp# cat /etc/debian_version
9.6

jan@mm1:~$ time result=`grep -a 'best_bid' ff.log`
real    0m0.039s
user    0m0.020s
sys     0m0.016s
jan@mm1:~$ time result=`grep -a 'best_bid\|fixed' ff.log`
real    0m0.173s
user    0m0.164s
sys     0m0.008s
jan@mm1:~$ wc -l ff.log
790754 ff.log
jan@mm1:~$ locale
LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
jan@mm1:~$ cat /etc/debian_version
7.11

This is the exact same log file (I scp-ed it), and I ran the same commands. Notice that the alternation takes about ten times as long on the Debian 9 machine as on the Debian 7 machine (1.881s vs 0.173s), which matches my experience with Debian 8. And mm1 is even an older machine, with less CPU and memory.

So even if something is wrong with the log file (or it contains non-textual characters), that would not explain why grep is so much faster on older Debian versions (and on an older machine).

Jan

On Thu 15 Nov 2018 at 12:55, Santiago Ruano Rincón <santiag...@riseup.net> wrote:
> Control: tag -1 + moreinfo
>
> On 15/11/18 at 12:18, Jan van den Berg wrote:
> >Just a fraction better (with -a and LANG=C). I ran it multiple times;
> >it stays just under 6 seconds now:
> >real    0m5.835s
> >user    0m5.720s
> >sys     0m0.060s
> >Still a far cry from the original / other results (under a second).
> >The log file it greps is valid XML data.
>
> It can be valid XML, but that doesn't mean it doesn't have non-textual
> characters (or invalid characters).
>
> Could you provide a way to reproduce this?
>
> Cheers,
>
> S
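The two sessions above differ in both the Debian release and the locale (C on piks, en_US.UTF-8 on mm1), so the timings mix two variables. A minimal sketch for separating the locale effect on a single machine, assuming bash (for the `time` keyword and `TIMEFORMAT`) and an installed `C.UTF-8` locale (if it is missing, grep silently falls back to the C locale). The function name `compare_locales` is hypothetical:

```shell
#!/bin/bash
# Sketch: time the single-pattern and alternation greps from this thread
# under a byte-oriented locale (C) and a multibyte one (C.UTF-8, assumed
# to be installed) on the same machine and the same file.
# compare_locales is a hypothetical helper name.
compare_locales() {
    local file=$1
    local loc pat count elapsed
    local TIMEFORMAT='%3R s'   # bash: report only elapsed (real) time
    for loc in C C.UTF-8; do
        for pat in 'best_bid' 'best_bid\|fixed'; do
            # Time a full run; output is discarded so terminal speed does not count.
            elapsed=$( { time LC_ALL=$loc grep -a -- "$pat" "$file" > /dev/null; } 2>&1 )
            # Count matching lines separately so results can be compared.
            count=$(LC_ALL=$loc grep -a -c -- "$pat" "$file")
            printf '%-8s %-18s matches=%-8s %s\n' "$loc" "$pat" "$count" "$elapsed"
        done
    done
}

# Usage: ./compare_locales.sh ff.log
if [ $# -gt 0 ]; then
    compare_locales "$1"
fi
```

If the C.UTF-8 rows are much slower than the C rows on the same stretch machine, the locale rather than the data is the likelier factor.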
Bug#913657: grep: Regex grep on stretch is slower than jessie
Just a fraction better (with -a and LANG=C). I ran it multiple times; it stays just under 6 seconds now:

real    0m5.835s
user    0m5.720s
sys     0m0.060s

Still a far cry from the original / other results (under a second). The log file it greps is valid XML data.

Jan

On Wed 14 Nov 2018 at 15:19, Santiago Ruano Rincón <santiag...@riseup.net> wrote:
> Dear Jan,
>
> On 13/11/18 at 17:09, Jan van den Berg wrote:
> > Package: grep
> > Version: 2.27-2
> > Severity: normal
> >
> > Dear Maintainer,
> >
> > I just upgraded from Debian 8 to 9 and noticed that a script which I run
> > several times per day was really slow:
> >
> > real    0m6.384s
> > user    0m6.288s
> > sys     0m0.036s
> >
> > This used to take well under a second.
> >
> > I dug a little deeper and noticed the problem was here:
> >
> > grep 'best_bid\|fixed_' /var/www/logs/large_log_file
> >
> > I played around with the grep parameters and locale settings and
> > narrowed it down to the regex, because this is way faster:
> >
> > grep -F best_bid /var/www/logs/large_log_file
> > grep -F fixed /var/www/logs/large_log_file
> >
> > So much faster, in fact, that I can run two grep commands faster than one:
> >
> > real    0m0.199s
> > user    0m0.108s
> > sys     0m0.032s
> >
> > However, it is strange and unexpected that an unaltered grep script is
> > slower after an upgrade. I dug a little deeper and it seems related to
> > #761157 (and #18454) because of a change in the PCRE library between
> > jessie and stretch.
>
> I am not sure of that, since you are not using the -P matcher that
> relies on libpcre3.
>
> >
> > I have not seen a real fix yet (other than altering my script/grep
> > commands), but I expect the regex library needs work to match the
> > previous behaviour, so I'm deeming it a 'bug'?
> …
>
> There have been behaviour changes between the version of grep released
> in jessie and stretch. See e.g. #891086.
>
> Could you please run your script with the -a option, and also setting
> LANG=C? I suspect there is a non-textual file, a multi-byte encoding,
> or a wrong encoding causing your problem. Before going any further,
> please check that.
>
> Cheers,
>
> Santiago
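Santiago's suspicion above (non-textual content or a wrong encoding) can be checked without sharing the file. A minimal sketch, assuming `tr` and `wc` that handle octal escapes and byte counts as GNU coreutils does, plus an `iconv` that exits non-zero on invalid input (as glibc's does). The name `check_text` is hypothetical:

```shell
#!/bin/bash
# Sketch: check a file for the two kinds of input that GNU grep treats as
# binary data: NUL bytes, and byte sequences that are invalid in a UTF-8
# locale. check_text is a hypothetical helper name; assumes tr, wc, iconv.
check_text() {
    local file=$1
    local nuls
    # tr -cd '\000' deletes every byte except NUL; wc -c counts what remains.
    # $(( )) strips any padding some wc implementations add.
    nuls=$(( $(LC_ALL=C tr -cd '\000' < "$file" | wc -c) ))
    echo "NUL bytes: $nuls"
    # iconv exits non-zero when it meets a sequence that is not valid UTF-8.
    if iconv -f UTF-8 -t UTF-8 "$file" > /dev/null 2>&1; then
        echo "UTF-8: valid"
    else
        echo "UTF-8: invalid sequences present"
    fi
}

# Usage: ./check_text.sh /var/www/logs/large_log_file
if [ $# -gt 0 ]; then
    check_text "$1"
fi
```

In a UTF-8 locale, `grep -axv '.*' file` is another way to list the offending lines, since `.` does not match an encoding error.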
Bug#913657: grep: Regex grep on stretch is slower than jessie
Package: grep
Version: 2.27-2
Severity: normal

Dear Maintainer,

I just upgraded from Debian 8 to 9 and noticed that a script which I run several times per day was really slow:

real    0m6.384s
user    0m6.288s
sys     0m0.036s

This used to take well under a second.

I dug a little deeper and noticed the problem was here:

grep 'best_bid\|fixed_' /var/www/logs/large_log_file

I played around with the grep parameters and locale settings and narrowed it down to the regex, because this is way faster:

grep -F best_bid /var/www/logs/large_log_file
grep -F fixed /var/www/logs/large_log_file

So much faster, in fact, that I can run two grep commands faster than one:

real    0m0.199s
user    0m0.108s
sys     0m0.032s

However, it is strange and unexpected that an unaltered grep script is slower after an upgrade. I dug a little deeper and it seems related to #761157 (and #18454) because of a change in the PCRE library between jessie and stretch.

I have not seen a real fix yet (other than altering my script/grep commands), but I expect the regex library needs work to match the previous behaviour, so I'm deeming it a 'bug'?

--
Jan

-- System Information:
Debian Release: 9.6
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.9.0-8-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US:en (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages grep depends on:
ii  dpkg          1.18.25
ii  install-info  6.3.0.dfsg.1-1+b2
ii  libc6         2.24-11+deb9u3
ii  libpcre3      2:8.41-1+0~20180910100527.3+stretch~1.gbp97d153

grep recommends no packages.

Versions of packages grep suggests:
ii  libpcre3  2:8.41-1+0~20180910100527.3+stretch~1.gbp97d153

-- no debconf information
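The two `grep -F` runs in the report are not a drop-in replacement for the original command: a line containing both strings is printed twice, and the output is grouped by pattern instead of kept in file order. A minimal sketch of a single-pass equivalent, assuming GNU grep: repeated `-e` options give `-F` several literal strings at once. The function name `grep_bid_or_fixed` is hypothetical, and `fixed_` follows the original regex rather than the `fixed` used in the second `-F` run. Whether this is actually faster on stretch has not been measured here.

```shell
#!/bin/bash
# Sketch: a single-pass replacement for the two separate `grep -F` runs.
# Assumes GNU grep. grep_bid_or_fixed is a hypothetical name.
grep_bid_or_fixed() {
    # -F: each -e argument is a literal string, not a regex.
    # A line containing either string is printed once, in file order,
    # which is what the original 'best_bid\|fixed_' regex does.
    # LC_ALL=C: compare bytes instead of multibyte characters.
    LC_ALL=C grep -F -e best_bid -e fixed_ -- "$@"
}

# Usage: ./script /var/www/logs/large_log_file
if [ $# -gt 0 ]; then
    grep_bid_or_fixed "$@"
fi
```

Both strings are plain ASCII, so for valid UTF-8 input the C locale selects the same lines as en_US.UTF-8; it only changes how grep treats non-ASCII bytes.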