On Monday 29 June 2009 19:04:44 Steve wrote:
> Today my gentoo server that has sat happily churning my mundane (and
> lightweight) tasks froze and I noticed when it stopped serving DNS
> queries... and the server was even unresponsive from the command
> prompt. I rebooted.... and was a bit taken aback at what I found.
>
> The server currently runs, but has a load of over 60, where I'd expect a
> load of below 0.1. Investigations using top did not suggest that a
> single process was using vast amounts of processing time... but there
> were significantly more clamscan processes than I'd expect... and even
> more procmail processes....
>
> --
> $ ps auwx | grep clamscan | grep -v grep | wc -l
> 42
> $ ps auwx | grep procmail | grep -v grep | wc -l
> 94
> $ ps auwx | grep clamassassin | grep -v grep | wc -l
> 55
> --
>
> The first few lines from top say:
>
> --
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 15451 usr       20   0 35944  33m  872 D  2.7  3.3   0:00.60 clamscan
>   216 root      15  -5     0    0    0 S  0.7  0.0   0:03.80 kswapd0
> 15116 usr       20   0 76136  15m  668 D  0.7  1.6   0:03.30 clamscan
> 15299 usr       20   0  2584 1224  840 R  0.7  0.1   0:04.36 top
> 15428 usr       20   0 61288  57m  872 D  0.7  5.7   0:01.38 clamscan
>     1 root      20   0  1648  196  172 S  0.0  0.0   0:00.64 init
>     2 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 kthreadd
> --
>
> The procmail configuration I've adopted hasn't changed in years...
> --
> DEFAULT=$HOME/.maildir/
> SHELL=/bin/sh
> MAILDIR=$HOME/.maildir
>
> :0fw
>
> * < 1024000
>
> | /usr/bin/clamassassin | /usr/bin/spamc -f
>
> --
>
> I'm assuming that my suddenly starting to have problems with this is
> something to do with an update to clamd/clamassassin... I've a vague
> recollection that one or the other of them might have been updated when
> I last synchronised and emerged updates... but I can't remember.
>
> Any ideas? This isn't a heavily loaded server usually - I've more
> procmail processes than I usually receive in emails in an hour.
> Something's wrong - can anyone offer any hints? Has anyone else run
> into this problem? Is there a known 'quick fix'?

Looks like you have roughly 200 processes sitting there blocked on I/O
(state D in top). Is there anything related in the logs?

Your best bet is to examine emerge.log (better still, use genlop) and
find all recent upgrades that might affect this. Then roll them back one
by one until the problem goes away.

Once you know the errant package, we can start to examine diffs and see
why it might behave like that.

-- 
alan dot mckinnon at gmail dot com
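
P.S. If genlop is not installed, something like the following may serve as a rough stand-in for finding recent upgrades. It is only a sketch: it assumes the log lives at /var/log/emerge.log and that completed merges are logged as "<epoch>:  ::: completed emerge (N of M) category/package-version to /". The function names recent_merges and count_blocked are made up for illustration, and the third argument to recent_merges exists only so the cutoff can be pinned when trying it out.

```shell
#!/bin/sh
# Sketch: list packages merged in the last N days, straight from emerge.log.
# Assumed line shape (check your own log before relying on it):
#   <epoch>:  ::: completed emerge (1 of 3) category/package-version to /
recent_merges() {
    log="${1:-/var/log/emerge.log}"   # assumed default location
    days="${2:-14}"                   # how far back to look
    now="${3:-$(date +%s)}"           # current time; overridable for testing
    cutoff=$(( now - days * 86400 ))
    awk -v cutoff="$cutoff" '
        /::: completed emerge/ {
            ts = $1
            sub(/:$/, "", ts)         # strip the trailing colon from the epoch
            if (ts + 0 >= cutoff) print ts, $8
        }' "$log"
}

# Sketch: count processes in uninterruptible sleep (state D),
# which is what a pile of I/O-blocked clamscan processes looks like.
count_blocked() {
    ps -eo stat= | grep -c '^D'
}
```

Running "recent_merges /var/log/emerge.log 7" prints one "epoch package-version" pair per merge in the last week, oldest first, which gives the list of candidates to roll back.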