Bug#913657: grep: Regex grep on stretch is slower than jessie

2018-11-15, Jan van den Berg
I cannot really share the log file in question (it contains sensitive customer
data), but I did run the following test.

root@piks:/tmp# time result=`grep -a 'best_bid' ff.log`
real 0m0.026s
user 0m0.020s
sys  0m0.004s
root@piks:/tmp# time result=`grep -a 'best_bid\|fixed' ff.log`
real 0m1.881s
user 0m1.868s
sys  0m0.008s
root@piks:/tmp# wc -l ff.log
790754 ff.log
root@piks:/tmp# locale  (this is normally UTF-8 but I changed it)
LANG=C
LANGUAGE=en_US:en
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=

root@piks:/tmp# cat /etc/debian_version
9.6

jan@mm1:~$ time result=`grep -a 'best_bid' ff.log`
real 0m0.039s
user 0m0.020s
sys  0m0.016s
jan@mm1:~$  time result=`grep -a 'best_bid\|fixed' ff.log`
real 0m0.173s
user 0m0.164s
sys  0m0.008s
jan@mm1:~$ wc -l ff.log
790754 ff.log
jan@mm1:~$ locale
LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

jan@mm1:~$ cat /etc/debian_version
7.11

This is the exact same log file (I copied it over with scp), and I ran the same
commands. Notice how the alternation pattern takes about ten times longer on
the Debian 9 machine (1.881s) than on the Debian 7 machine (0.173s). The
Debian 7 result is similar to what I saw on Debian 8.
And mm1 is even an older machine, with less CPU and memory.

So even if something is wrong with the log file (or it contains non-textual
characters), this would not explain why grep is so much faster on older Debian
versions (and an older machine).
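One way to check the hypothesis raised in this thread (that the file contains non-textual or wrongly encoded bytes) is to let grep itself list the lines that are not valid text in a UTF-8 locale. The sketch below wraps the common GNU grep idiom `grep -axv '.*'` in a hypothetical helper; it assumes GNU grep and that a `C.UTF-8` locale is installed (any installed UTF-8 locale should behave the same way).

```shell
#!/bin/sh
# Hypothetical helper; assumes GNU grep and an installed C.UTF-8 locale.
# In a UTF-8 locale, '.' does not match bytes that are not valid UTF-8,
# so a line that '.*' cannot match in full (-x) contains an encoding error.
# -v selects those lines, -c counts them, -a forces text mode.
count_invalid_utf8_lines() {
    LC_ALL=C.UTF-8 grep -a -c -x -v '.*' "$1"
}
```

For example, `count_invalid_utf8_lines ff.log` printing `0` would suggest that invalid UTF-8 is not what makes the stretch run slower; a non-zero count would point back at the encoding question.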

Jan

On Thu 15 Nov 2018 at 12:55, Santiago Ruano Rincón <
santiag...@riseup.net> wrote:

> Control: tag -1 + moreinfo
>
> On 15/11/18 at 12:18, Jan van den Berg wrote:
> > Just a fraction better (with -a and LANG=C)
> > Ran it multiple times, stays just under 6 seconds now:
> > real 0m5.835s
> > user 0m5.720s
> > sys  0m0.060s
> > Still a far cry from the original / other results (under a second).
> > The logfile it greps is valid XML data.
>
> It can be valid XML, but that doesn't mean it doesn't have non-textual
> characters (or invalid characters).
>
> Could you provide a way to reproduce this?
>
> Cheers,
>
> S
>


2018-11-15, Jan van den Berg
Just a fraction better (with -a and LANG=C)

Ran it multiple times, stays just under 6 seconds now:

real 0m5.835s
user 0m5.720s
sys  0m0.060s

Still a far cry from the original / other results (under a second).
The logfile it greps is valid XML data.

Jan


On Wed 14 Nov 2018 at 15:19, Santiago Ruano Rincón <
santiag...@riseup.net> wrote:

> Dear Jan,
>
> On 13/11/18 at 17:09, Jan van den Berg wrote:
> > Package: grep
> > Version: 2.27-2
> > Severity: normal
> >
> > Dear Maintainer,
> >
> > I just upgraded from Debian 8 to 9 and noticed that a script which I run
> > several times per day was really slow:
> >
> > real 0m6.384s
> > user 0m6.288s
> > sys  0m0.036s
> >
> > This used to take well under a second.
> >
> > I dug a little deeper and noticed the problem was here:
> >
> > grep 'best_bid\|fixed_' /var/www/logs/large_log_file
> >
> > I played around with the grep parameters and locale settings, and narrowed
> > it down to the regex, because this is way faster:
> >
> > grep -F best_bid /var/www/logs/large_log_file
> > grep -F fixed /var/www/logs/large_log_file
> >
> > So much faster, in fact, that I can run two grep commands faster than one.
> >
> > real 0m0.199s
> > user 0m0.108s
> > sys  0m0.032s
> >
> > However, it is strange and unexpected that an unaltered grep script is
> > slower after an upgrade. I dug a little deeper and it seems related to
> > #761157 (and #18454) because of a change in the PCRE library between
> > jessie and stretch.
>
> I am not sure of that, since you are not using the -P matcher that
> relies on libpcre3.
>
> >
> > I have not seen a real fix yet (other than altering my script/grep
> > commands), but I expect the regex library needs work to match the previous
> > behaviour, so I'm deeming it a 'bug'?
> …
>
> There have been behaviour changes between the versions of grep released
> in jessie and stretch. See e.g. #891086.
>
> Could you please run your script with the -a option, and also with
> LANG=C? I suspect a non-textual file, a multi-byte encoding, or a wrong
> encoding is causing your problem. Before going any further, please check
> that.
>
> Cheers,
>
> Santiago
>


2018-11-13, Jan van den Berg
Package: grep
Version: 2.27-2
Severity: normal

Dear Maintainer,

I just upgraded from Debian 8 to 9 and noticed that a script which I run
several times per day was really slow:

real 0m6.384s
user 0m6.288s
sys  0m0.036s

This used to take well under a second.

I dug a little deeper and noticed the problem was here:

grep 'best_bid\|fixed_' /var/www/logs/large_log_file

I played around with the grep parameters and locale settings, and narrowed it
down to the regex, because this is way faster:

grep -F best_bid /var/www/logs/large_log_file
grep -F fixed /var/www/logs/large_log_file

So much faster, in fact, that I can run two grep commands faster than one.

real 0m0.199s
user 0m0.108s
sys  0m0.032s
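The two `-F` runs above can probably be folded into a single pass: GNU grep accepts several `-e` patterns and selects a line if any of them matches. A minimal sketch, assuming GNU grep; the helper names are hypothetical, and it searches for `fixed_` as in the original regex (the `-F` command above uses `fixed`).

```shell
#!/bin/sh
# Hypothetical helpers; assumes GNU grep (\| in a basic regex is a GNU extension).

# Original form: one basic regular expression with an alternation.
count_regex() {
    grep -c 'best_bid\|fixed_' "$1"
}

# Workaround: several fixed strings in one invocation.
# A line is selected if it contains any of the -e patterns.
count_fixed() {
    grep -F -c -e best_bid -e fixed_ "$1"
}
```

For example, `count_fixed /var/www/logs/large_log_file` should report the same number of lines as `count_regex` on the same file; whether the single `-F` pass is as fast on stretch as the two separate runs has not been measured.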

However, it is strange and unexpected that an unaltered grep script is slower
after an upgrade. I dug a little deeper and it seems related to #761157
(and #18454) because of a change in the PCRE library between jessie and
stretch.

I have not seen a real fix yet (other than altering my script/grep commands),
but I expect the regex library needs work to match the previous behaviour, so
I'm deeming it a 'bug'?

--
Jan


-- System Information:
Debian Release: 9.6
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.9.0-8-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), 
LANGUAGE=en_US:en (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages grep depends on:
ii  dpkg  1.18.25
ii  install-info  6.3.0.dfsg.1-1+b2
ii  libc6 2.24-11+deb9u3
ii  libpcre3  2:8.41-1+0~20180910100527.3+stretch~1.gbp97d153

grep recommends no packages.

Versions of packages grep suggests:
ii  libpcre3  2:8.41-1+0~20180910100527.3+stretch~1.gbp97d153

-- no debconf information