On Tue, Oct 06, 2015 at 11:50:00PM +0200, Jan Pokorný wrote:
> On 06/10/15 10:28 +0200, Dejan Muhamedagic wrote:
> > On Mon, Oct 05, 2015 at 07:00:18PM +0300, Vladislav Bogdanov wrote:
> >> 14.09.2015 02:31, Andrew Beekhof wrote:
> >>>
> >>>> On 8 Sep 2015, at 10:18 pm, Ulrich Windl
> >>>> <ulrich.wi...@rz.uni-regensburg.de> wrote:
> >>>>
> >>>>>>> Vladislav Bogdanov <bub...@hoster-ok.com> wrote on 08.09.2015 at
> >>>>>>> 14:05 in
> >>>> message <55eecefb.8050...@hoster-ok.com>:
> >>>>> Hi,
> >>>>>
> >>>>> just discovered very interesting issue.
> >>>>> If there is a system user with very big UID (80000002 in my case),
> >>>>> then crm_report (actually 'grep' it runs) consumes too much RAM.
> >>>>>
> >>>>> Relevant part of the process tree at that moment looks like (word-wrap
> >>>>> off):
> >>>>> USER       PID %CPU %MEM     VSZ     RSS TTY STAT START TIME COMMAND
> >>>>> ...
> >>>>> root     25526  0.0  0.0  106364     636 ?   S    12:37 0:00  \_ /bin/sh /usr/sbin/crm_report --dest=/var/log/crm_report -f 0000-01-01 00:00:00
> >>>>> root     25585  0.0  0.0  106364     636 ?   S    12:37 0:00      \_ bash /var/log/crm_report/collector
> >>>>> root     25613  0.0  0.0  106364     152 ?   S    12:37 0:00          \_ bash /var/log/crm_report/collector
> >>>>> root     25614  0.0  0.0  106364     692 ?   S    12:37 0:00              \_ bash /var/log/crm_report/collector
> >>>>> root     27965  4.9  0.0  100936     452 ?   S    12:38 0:01              |   \_ cat /var/log/lastlog
> >>>>> root     27966 23.0 82.9 3248996 1594688 ?   D    12:38 0:08              |   \_ grep -l -e Starting Pacemaker
> >>>>> root     25615  0.0  0.0  155432     600 ?   S    12:37 0:00              \_ sort -u
> >>>>>
> >>>>> ls -ls /var/log/lastlog shows:
> >>>>> 40 -rw-r--r--. 1 root root 23360000876 Sep 8 04:36 /var/log/lastlog
> >>>>>
> >>>>> That is sparse binary file, which consumes only 40k of disk space.
> >>>>> At the same time its size is 23GB, and grep takes all the RAM trying to
> >>>>> grep a string from a 23GB of mostly zeroes without new-lines.
> >>>>>
> >>>>> I believe this is worth fixing,
> >>>
> >>> Shouldn’t this be directed to the grep folks?
> >>
> >> Actually, not everything in /var/log are textual logs. Currently
> >> findmsg() [z,bz,xz]cats _every_ file there and greps for a pattern.
> >> Shouldn't it skip some well-known ones? btmp, lastlog and wtmp are
> >> good candidates to be skipped. They are not intended to be handled
> >> as a text.
> >>
> >> Or may be just test that file is a text in a find_decompressor() and
> >> to not cat it if it is not?
> >>
> >> something like
> >> find_decompressor() {
> >>     if echo $1 | grep -qs 'bz2$'; then
> >>         echo "bzip2 -dc"
> >>     elif echo $1 | grep -qs 'gz$'; then
> >>         echo "gzip -dc"
> >>     elif echo $1 | grep -qs 'xz$'; then
> >>         echo "xz -dc"
> >>     elif file $1 | grep -qs 'text'; then
> >>         echo "cat"
> >>     else
> >>         echo "echo"
> >
> > Good idea.
>
> Even better might be using process substitution and avoid cat'ing if
> not needed even for plain text files, assuming GNU grep 2.13+ that,
> in combination with kernel, attempts to detect sparse files, marking
> them as binary files[1], which can then be utilized in combination
> with -I option.

Something like the below, maybe.
Untested direct-to-email PoC code.

if echo . | grep -q -I . 2>/dev/null; then
    have_grep_dash_I=true
else
    have_grep_dash_I=false
fi
# similar checks can be made for other decompressors

# usage: mygrep PATTERN FILE
mygrep()
{
    (
    # sub shell for ulimit

    # ulimit -v ... but maybe someone wants to mmap a huge file,
    # and limiting the virtual size cripples mmap unnecessarily,
    # so let's limit resident size instead.  Let's be generous, when
    # decompressing stuff that was compressed with xz -9, we may
    # need ~65 MB according to my man page, and if it was generated
    # by something else, the decompressor may need even more.
    # Grep itself should not use much more than single digit MB,
    # so if the pipeline below needs more than 200 MB resident,
    # we probably are not interested in that file in any case.
    # Note: current Linux kernels do not enforce the resident set
    # limit, so this may well be a no-op there.
    ulimit -m 200000

    # Actually no need for "local" anymore,
    # this is a subshell already. Just a habit.
    local pattern=$1 file=$2
    case $file in
    *.bz2) bzgrep -e "$pattern" "$file";; # or bzip2 -dc | grep, if you prefer
    *.gz)  zgrep -e "$pattern" "$file";;
    *.xz)  xzgrep -e "$pattern" "$file";;
    # ...
    *)
        local file_type=$(file "$file")
        case $file_type in
        *text*) grep -e "$pattern" "$file" ;;
        *)  # try anyway, let grep use its own heuristic
            $have_grep_dash_I &&
                grep --binary-files=without-match -e "$pattern" "$file" ;;
        esac ;;
    esac
    )
}

-- 
: Lars Ellenberg
: http://www.LINBIT.com | Your Way to High Availability
: DRBD, Linux-HA and Pacemaker support and consulting

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
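
The other idea raised in the thread, skipping files such as lastlog before they ever reach grep, could be sketched roughly as follows. This is only an illustration under assumptions, not code from crm_report: is_sparse and grep_log are hypothetical names, GNU stat from coreutils is assumed for the -c format (%s apparent size, %b allocated blocks, %B bytes per block), and the "less than half allocated" threshold is arbitrary. On a filesystem with transparent compression an ordinary text log might also look sparse to this check.

```shell
# Hypothetical helper: succeed if FILE occupies much less disk space
# than its apparent size, as a sparse /var/log/lastlog does.
# Assumes GNU stat: %s = size in bytes, %b = allocated blocks,
# %B = size in bytes of each block reported by %b.
is_sparse() {
    info=$(stat -c '%s %b %B' -- "$1") || return 2
    # Split "size blocks blocksize" into $1 $2 $3.
    set -- $info
    # Non-empty, and fewer than half of the bytes actually allocated.
    # The factor of two is an arbitrary threshold for illustration.
    [ "$1" -gt 0 ] && [ $(( $2 * $3 * 2 )) -lt "$1" ]
}

# Hypothetical wrapper: print FILE's name if it contains PATTERN.
# Sparse files are skipped outright; for everything else, -I makes
# grep treat a file it considers binary as not matching.
grep_log() {
    pattern=$1
    file=$2
    if is_sparse "$file"; then
        return 1
    fi
    grep -l -I -e "$pattern" -- "$file"
}
```

Something like grep_log 'Starting Pacemaker' /var/log/messages would then print the file name on a match, while the same call on a sparse lastlog returns non-zero without reading it. Compressed logs would still need the decompressor handling shown above.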