On 06/11/2011 05:45 PM, Massimiliano Ferrero wrote:
> No, it happens only under heavy load. Besides acting as the iSCSI
> target, the storage server performs several other functions. One of
> these is monitoring all the virtual machines with nagios and munin.
> Munin updates run every 5 minutes and write a lot of data to disk, and
> this has never been a problem.
> The only situation that triggers the problem is an operation that is
> really I/O intensive on the storage server, like dd or a file system
> check.
> The second time I experienced the problem, I was connected to the
> system over ssh during the dd, watching the log files. I saw the abort
> messages in real time and was able to stop the dd and react to the
> failure (restart cluster nodes, check virtual machines, etc.). I mention
> this only to say that it was not necessary to restart the iSCSI target
> and that the system was not unresponsive.
>
> I will update the bug as soon as I can test it on another machine.
I'm afraid this won't be an easy bug to root-cause, but let's give it a
shot. Next time you try to re-create the bug, capture dstat output. The
default dstat output is good enough to tell us what the system state was
during the starvation.

-- 
Ritesh Raj Sarraf | http://people.debian.org/~rrs
Debian - The Universal Operating System
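One possible way to capture that dstat output while re-running the dd is sketched below in Python. This is only a sketch under assumptions. It assumes the classic dstat command-line tool is installed. It uses dstat's default stat set (`-cdngy`: cpu, disk, net, paging, system) plus a timestamp column (`-t`), and writes the samples to a CSV file via `--output` so they can be attached to the bug. The helper names `build_dstat_cmd` and `capture_during`, the CSV file names, and the dd arguments are all hypothetical placeholders, not the reporter's actual setup.

```python
import subprocess
import time


def build_dstat_cmd(csv_path, delay=1):
    """Build a dstat command line: default stats plus timestamps.

    -t adds a timestamp column; -c -d -n -g -y (cpu, disk, net, paging,
    system) are dstat's default set. --output additionally writes every
    sample to csv_path as CSV. delay is the sampling interval in seconds.
    """
    return ["dstat", "-tcdngy", "--output", csv_path, str(delay)]


def capture_during(workload, csv_path="dstat-capture.csv", delay=1, settle=10):
    """Run dstat in the background while the workload command runs.

    Samples are taken for `settle` seconds before and after the workload,
    so the CSV shows the system both at rest and under load.
    """
    logger = subprocess.Popen(build_dstat_cmd(csv_path, delay))
    try:
        time.sleep(settle)                     # baseline before the load starts
        subprocess.run(workload, check=False)  # the I/O-heavy operation
        time.sleep(settle)                     # let the system settle again
    finally:
        logger.terminate()                     # dstat samples until it is stopped
        logger.wait()


if __name__ == "__main__":
    # Hypothetical workload: the target path and size are placeholders.
    capture_during(
        ["dd", "if=/dev/zero", "of=/srv/dd-test.img", "bs=1M", "count=4096"],
        csv_path="dstat-starvation.csv",
        delay=5,
    )
```

Starting dstat before the load and stopping it after a short settle period keeps the moment the aborts begin inside the recorded window. The timestamp column makes it easier to line the samples up with the abort messages in the logs.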