Package: grep
Version: 2.6.3-3
Severity: normal


*** FILE
//my comments are on lines starting with //
//MAINTAINER(S) MAY WISH TO INCREASE BUG PRIORITY based on the bug's scope
//and impact (it may cause things to fail quite unexpectedly, or to
//consume excessive resources and time, where that was not the case
//before)
//this bug may or may not be related to (or the same as) bug 503658

//under at least certain not-too-unusual circumstances, grep RE
//performance is abysmal, e.g.:
$ time grep '^\(.\)\(.\).\2\1$' /usr/share/dict/words | wc -l
16

real    1m7.503s
user    1m7.432s
sys     0m0.012s
$
//top(1) also shows us excessive CPU consumption:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3193 mpaoli    20   0  7756 1136  692 R 97.8  0.2   0:24.09 grep
//I also used strace(1), which showed nothing particularly unusual:
//the bug is CPU bound (it consumes excess CPU), with no obvious excess
//of system calls and no unusual delays on any system call in the
//strace(1) output
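
//for reference: the BRE above selects 5-character lines whose 1st and
//5th characters match and whose 2nd and 4th characters match (i.e.
//5-character palindromes). A minimal sketch on a tiny inline word
//list; the function name and the sample words are made up for
//illustration, and LC_ALL=C is assumed only to keep the demo
//independent of the installed locales:

```shell
# The BRE from this report: \(.\) twice captures characters 1 and 2,
# a lone . matches character 3, then \2 and \1 require characters 4
# and 5 to mirror them.
pattern='^\(.\)\(.\).\2\1$'

# Hypothetical helper: filter stdin down to the lines the BRE matches.
five_char_palindromes() {
    env LC_ALL=C grep -- "$pattern"
}

# Sample input (made up): kayaks has 6 characters, hello does not mirror.
printf '%s\n' level radar hello kayaks civic | five_char_palindromes
# prints level, radar and civic, one per line
```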

//however, when we add the -i option the performance for the above
//becomes quite reasonable:
$ time grep -i '^\(.\)\(.\).\2\1$' /usr/share/dict/words | wc -l
19

real    0m0.582s
user    0m0.580s
sys     0m0.004s
$

//likewise performance is fine if we use LC_ALL=C
$ time LC_ALL=C grep '^\(.\)\(.\).\2\1$' /usr/share/dict/words | wc -l
16

real    0m0.390s
user    0m0.392s
sys     0m0.000s
$

//bug is also present if we explicitly use LC_ALL=en_US.UTF-8
$ time LC_ALL=en_US.UTF-8 grep '^\(.\)\(.\).\2\1$' /usr/share/dict/words | wc -l
16

real    1m5.347s
user    1m5.320s
sys     0m0.008s
$
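
//the locale comparison above can be scripted. A rough sketch, with
//assumptions: count_in_locale and WORDS are made-up names, date +%s
//(seconds since the epoch, as supported by GNU date) stands in for
//time(1), and a small generated sample is used when WORDS is unset, so
//WORDS has to point at a real word list for any slowdown to show:

```shell
pattern='^\(.\)\(.\).\2\1$'

# Hypothetical helper: count matching lines of file $2 with LC_ALL=$1.
# env(1) sets the locale for grep only, not for the calling shell.
count_in_locale() {
    env LC_ALL="$1" grep -c -- "$pattern" "$2"
}

# Assumption: WORDS names the word list; otherwise build a tiny sample.
words=${WORDS:-}
tmp=
if [ -z "$words" ]; then
    tmp=$(mktemp)
    words=$tmp
    printf '%s\n' level radar hello refer stats world > "$words"
fi

for loc in C en_US.UTF-8; do
    start=$(date +%s)
    n=$(count_in_locale "$loc" "$words")
    end=$(date +%s)
    printf '%s: %s matching lines in about %s s\n' "$loc" "$n" "$((end - start))"
done

# Remove the sample only if this script created it.
if [ -n "$tmp" ]; then
    rm -f "$tmp"
fi
```

//e.g. saved under a hypothetical name compare.sh and run as
//WORDS=/usr/share/dict/words sh compare.sh; the locales named in the
//loop should appear in the output of locale -a for the comparison to
//mean anything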

//bug appears to NOT be present in other common BRE utilities, e.g.
//sed(1), ex(1), ed(1):
$ time sed -ne '/^\(.\)\(.\).\2\1$/p' /usr/share/dict/words | wc -l
16

real    0m0.267s
user    0m0.256s
sys     0m0.012s
$ time ex /usr/share/dict/words << \__EOF__ | wc -l
> g/^\(.\)\(.\).\2\1$/p
> q
> __EOF__
16

real    0m1.004s
user    0m0.920s
sys     0m0.020s
$ time ed /usr/share/dict/words << \__EOF__ | wc -l
> g/^\(.\)\(.\).\2\1$/p
> q
> __EOF__
931708
16

real    0m0.300s
user    0m0.292s
sys     0m0.008s
$
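
//the line counts agree above; to check that grep and sed select
//exactly the same lines (not just the same number of them), something
//like the following sketch could be used. same_matches and the sample
//words are made up for illustration, and it relies on the pattern
//containing no "/" so that it can be dropped into a sed address as-is:

```shell
pattern='^\(.\)\(.\).\2\1$'

# Hypothetical helper: succeed if grep and sed print identical lines
# for file $1 under the current locale.
same_matches() {
    from_grep=$(grep -- "$pattern" "$1")
    from_sed=$(sed -n "/$pattern/p" "$1")
    [ "$from_grep" = "$from_sed" ]
}

# Made-up sample; pass /usr/share/dict/words instead for the real check.
sample=$(mktemp)
printf '%s\n' level radar hello refer stats world > "$sample"
if same_matches "$sample"; then
    echo "grep and sed agree"
else
    echo "grep and sed differ"
fi
rm -f "$sample"
```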

//for the examples above, almost any similar file could be used
//instead of /usr/share/dict/words; I specifically used (in case it
//matters):
$ dpkg -S /usr/share/dict/words
diversion by dictionaries-common from: /usr/share/dict/words
diversion by dictionaries-common to: /usr/share/dict/words.pre-dictionaries-common
wamerican, dictionaries-common: /usr/share/dict/words
$ dpkg -l dictionaries-common | tail -n 1
ii  dictionaries-common                  1.5.17                            Common utilities for spelling dictionary tools
$
//and locale information (except where explicitly shown set
//differently above)
$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
$

//bug is NOT present in old stable:
$ cat /etc/debian_version
5.0.9
$ time grep '^\(.\)\(.\).\2\1$' /usr/share/dict/words | wc -l
16

real    0m0.925s
user    0m0.896s
sys     0m0.000s
$

//even if we explicitly set LC_ALL=en_US.UTF-8, bug still not present in
//old stable:
$ time LC_ALL=en_US.UTF-8 grep '^\(.\)\(.\).\2\1$' /usr/share/dict/words | wc -l
16

real    0m0.825s
user    0m0.808s
sys     0m0.000s
$
//also bug not present in old stable with en_US.utf8
$ locale -a | fgrep -i en_us.utf
en_US.utf8
$ time LC_ALL=en_US.utf8 grep '^\(.\)\(.\).\2\1$' /usr/share/dict/words | wc -l
16

real    0m0.814s
user    0m0.812s
sys     0m0.000s
$


-- System Information:
Debian Release: 6.0.3
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-5-amd64 (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages grep depends on:
ii  dpkg                      1.15.8.11      Debian package management system
ii  install-info              4.13a.dfsg.1-6 Manage installed documentation in 
ii  libc6                     2.11.2-10      Embedded GNU C Library: Shared lib

grep recommends no packages.

Versions of packages grep suggests:
ii  libpcre3                      8.02-1.1   Perl 5 Compatible Regular Expressi


