Your message dated Thu, 28 Apr 2016 15:37:46 +0200
with message-id <[email protected]>
and subject line Re: grep: pathetically slow for some REs; fine in old stable
has caused the Debian Bug report #649109,
regarding grep: pathetically slow for some REs; fine in old stable
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [email protected]
immediately.)


-- 
649109: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=649109
Debian Bug Tracking System
Contact [email protected] with problems
--- Begin Message ---
Package: grep
Version: 2.6.3-3
Severity: normal



*** FILE
//my comments on lines starting with //
//MAINTAINER(S) MAY WISH TO INCREASE BUG PRIORITY based on bug scope
//and impact (it may cause things to quite unexpectedly fail or
//consume excessive resources and time, where such was not the case
//before)
//bug may or may not be related to (or even the same as) bug 503658

//under at least certain not-too-unusual circumstances, grep RE
//performance is abysmal, e.g.:
$ time grep '^\(.\)\(.\).\2\1$' /usr/share/dict/words | wc -l
16

real    1m7.503s
user    1m7.432s
sys     0m0.012s
$
//top(1) also shows us excessive CPU consumption:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3193 mpaoli    20   0  7756 1136  692 R 97.8  0.2   0:24.09 grep
//I also ran strace(1); it showed nothing particularly unusual. The
//bug appears to be CPU bound: there are no obviously excessive system
//calls and no unusual delays on any system call in the strace(1)
//output

//however, when we add the -i option the performance for the above
//becomes quite reasonable:
$ time grep -i '^\(.\)\(.\).\2\1$' /usr/share/dict/words | wc -l
19

real    0m0.582s
user    0m0.580s
sys     0m0.004s
$

//likewise performance is fine if we use LC_ALL=C
$ time LC_ALL=C grep '^\(.\)\(.\).\2\1$' /usr/share/dict/words | wc -l
16

real    0m0.390s
user    0m0.392s
sys     0m0.000s
$

//bug is also present if we explicitly use LC_ALL=en_US.UTF-8
$ time LC_ALL=en_US.UTF-8 grep '^\(.\)\(.\).\2\1$' /usr/share/dict/words | wc -l
16

real    1m5.347s
user    1m5.320s
sys     0m0.008s
$
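
//a minimal sketch (not part of the session above) for repeating the
//locale comparison in one go; the helper name time_grep_in_locales is
//hypothetical, it assumes date(1) supports +%s, and each locale passed
//is assumed to exist on the system (see locale -a)

```shell
#!/bin/sh
# Hypothetical helper: run the same backreference BRE under each given
# locale and print the locale, the match count, and elapsed whole seconds.
# Usage: time_grep_in_locales FILE LOCALE...
time_grep_in_locales() {
    file=$1
    shift
    for loc in "$@"; do
        start=$(date +%s)
        # grep -c prints the number of matching lines (0 if none match)
        count=$(LC_ALL=$loc grep -c '^\(.\)\(.\).\2\1$' "$file")
        end=$(date +%s)
        printf '%s\t%s matches\t%ss\n' "$loc" "$count" "$((end - start))"
    done
}
```

//e.g.: time_grep_in_locales /usr/share/dict/words C en_US.UTF-8
//whole-second resolution is coarse, but enough to tell the
//sub-second runs from the minute-long ones reported here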

//bug appears to NOT be present in other common BRE utilities, e.g.
//sed(1), ex(1), ed(1):
$ time sed -ne '/^\(.\)\(.\).\2\1$/p' /usr/share/dict/words | wc -l
16

real    0m0.267s
user    0m0.256s
sys     0m0.012s
$ time ex /usr/share/dict/words << \__EOF__ | wc -l
> g/^\(.\)\(.\).\2\1$/p
> q
> __EOF__
16

real    0m1.004s
user    0m0.920s
sys     0m0.020s
$ time ed /usr/share/dict/words << \__EOF__ | wc -l
> g/^\(.\)\(.\).\2\1$/p
> q
> __EOF__
931708
16

real    0m0.300s
user    0m0.292s
sys     0m0.008s
$
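
//the counts above agree between grep and sed; a small sketch (my own
//addition, helper name same_matches is hypothetical) to check that the
//two utilities also select the same lines, run under LC_ALL=C so the
//check itself stays fast

```shell
#!/bin/sh
# Hypothetical helper: succeed only if grep and sed print the same
# lines for the backreference BRE used in this report.
# Usage: same_matches FILE
same_matches() {
    re='^\(.\)\(.\).\2\1$'
    from_grep=$(LC_ALL=C grep "$re" "$1")
    from_sed=$(LC_ALL=C sed -n "/$re/p" "$1")
    [ "$from_grep" = "$from_sed" ]
}
```

//e.g.: same_matches /usr/share/dict/words && echo same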

//for the examples above, almost any reasonably similar file could be
//used instead of /usr/share/dict/words; I specifically used (in case
//it matters):
$ dpkg -S /usr/share/dict/words
diversion by dictionaries-common from: /usr/share/dict/words
diversion by dictionaries-common to: /usr/share/dict/words.pre-dictionaries-common
wamerican, dictionaries-common: /usr/share/dict/words
$ dpkg -l dictionaries-common | tail -n 1
ii  dictionaries-common                  1.5.17                            Common utilities for spelling dictionary tools
$
//and locale information (except where explicitly shown set
//differently above)
$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
$

//bug is NOT present in old stable:
$ cat /etc/debian_version
5.0.9
$ time grep '^\(.\)\(.\).\2\1$' /usr/share/dict/words | wc -l
16

real    0m0.925s
user    0m0.896s
sys     0m0.000s
$

//even if we explicitly set LC_ALL=en_US.UTF-8, bug still not present in
//old stable:
$ time LC_ALL=en_US.UTF-8 grep '^\(.\)\(.\).\2\1$' /usr/share/dict/words | wc -l
16

real    0m0.825s
user    0m0.808s
sys     0m0.000s
$
//also bug not present in old stable with en_US.utf8
$ locale -a | fgrep -i en_us.utf
en_US.utf8
$ time LC_ALL=en_US.utf8 grep '^\(.\)\(.\).\2\1$' /usr/share/dict/words | wc -l
16

real    0m0.814s
user    0m0.812s
sys     0m0.000s
$


-- System Information:
Debian Release: 6.0.3
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-5-amd64 (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages grep depends on:
ii  dpkg                      1.15.8.11      Debian package management system
ii  install-info              4.13a.dfsg.1-6 Manage installed documentation in 
ii  libc6                     2.11.2-10      Embedded GNU C Library: Shared lib

grep recommends no packages.

Versions of packages grep suggests:
ii  libpcre3                      8.02-1.1   Perl 5 Compatible Regular Expressi



--- End Message ---
--- Begin Message ---
Closing this bug. It has been fixed since 2.7-1.

Santiago



--- End Message ---