Thanks, had to install the sysstat package. I'll watch both of those (batch processing, writing to a file) and see if anything pops up.
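
For what it's worth, here is a rough sketch of how the "top -b" log could be scanned afterwards for snapshots that look busy, rather than reading through it by eye. It is only a sketch under some assumptions: the log file name (top.log) and both thresholds are made up for illustration, and the pattern expects the "Cpu(s): 1.1%us, ..." layout shown in the snapshot quoted below. Newer versions of top print that line differently and would need a different pattern.

```python
import re

# Matches fields such as "98.5%id" in the "Cpu(s):" summary line
# (layout assumed to match the snapshot quoted in this thread).
CPU_FIELD = re.compile(r"(\d+(?:\.\d+)?)%(\w+)")


def parse_cpu_line(line):
    """Return a dict such as {"us": 1.1, "id": 98.5, "wa": 0.0}."""
    return {name: float(value) for value, name in CPU_FIELD.findall(line)}


def busy_snapshots(lines, wa_threshold=20.0, idle_threshold=50.0):
    """Yield (header, stats) for snapshots with high iowait or low idle.

    Both thresholds are arbitrary starting points, not recommendations.
    """
    header = None
    for line in lines:
        if line.startswith("top - "):
            # First line of each batch-mode snapshot; it carries the time.
            header = line.strip()
        elif line.startswith("Cpu(s):"):
            stats = parse_cpu_line(line)
            high_wait = stats.get("wa", 0.0) >= wa_threshold
            low_idle = stats.get("id", 100.0) <= idle_threshold
            if high_wait or low_idle:
                yield header, stats


def scan_log(path):
    """Print every busy snapshot found in a saved "top -b" log."""
    with open(path) as log:
        for header, stats in busy_snapshots(log):
            print(header, stats)
```

Calling scan_log("top.log") would print the header line and CPU figures of each snapshot that crossed either threshold, which should at least narrow down the times to compare against the iostat output.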

On Wed, Feb 15, 2012 at 1:38 PM, The Donald Cowart <[email protected]> wrote:

> No problem Mike,
>
> Disk IO is actually easy to see, try "iostat 2". iostat gives
> io-statistics (durh :-) ) for the disks in a system. The 2 tells it
> to wait 2 seconds between runs, and it runs until you cancel it. You
> can also do "iostat 2 5", which is 5 runs 2 seconds apart. It's not
> easily parseable in its normal output, but it does make it easy for
> human eyes.
>
> --Donald
>
> On Wed, Feb 15, 2012 at 1:20 PM, Dean, Mike <[email protected]> wrote:
> > Donald,
> >
> > Thanks for the tips. Disk IO was my first thought, but I'm not sure of
> > the best way to keep an eye on that. Is there a util/command that I can
> > run/log to see the disk IO?
> >
> > Mike
> >
> >
> > On Wed, Feb 15, 2012 at 1:16 PM, The Donald Cowart <[email protected]>
> > wrote:
> >>
> >> Hmmm... based on this I have two ideas about what it might be.
> >>
> >> It could be something doing a bunch of DNS queries, so processes
> >> are waiting for dig results or TCP timeouts. Not sure how to prove
> >> that, though.
> >>
> >> It might be something compressing/rotating log files all at once, just
> >> enough to slow the system down while it's happening. Probably
> >> triggered within one of the applications for monitoring. You may want
> >> to record some iostat values too, just to see if disk activity is
> >> spiking during an event.
> >>
> >> I hope this helps!
> >>
> >> --Donald
> >>
> >> On Wed, Feb 15, 2012 at 12:57 PM, Dean, Mike <[email protected]> wrote:
> >> > Unfortunately, I have not been able to get a snapshot from top when
> >> > it is running slow. The slowdowns typically only last a few seconds
> >> > and do not occur with any sort of regularity (that I've been able to
> >> > determine, anyway).
> >> >
> >> > I started a "top -b" piped to a file to see if I can catch a snapshot
> >> > during a slow period.
> >> >
> >> > As for the box itself (and apologies for not including this info
> >> > originally), it is used as a network monitor/management station. Two
> >> > of the applications that run on it are Nagios (up/down and other
> >> > monitoring), which has various checks that are shell-based, Perl, and
> >> > compiled, and Smokeping, which sends out TCP and ICMP probes every 5
> >> > minutes to some hosts (less than 50) to record round trip times.
> >> >
> >> > It is also one of our syslog machines, with a script that runs every
> >> > 5 minutes parsing the log files (none of the files have grown to a
> >> > large size or increased in syslog input/output).
> >> >
> >> > And, none of these systems has been changed in the last week.
> >> >
> >> > So, other than top, are there other things to check or monitors to
> >> > set?
> >> >
> >> > Thanks again, in advance!
> >> >
> >> > On Wed, Feb 15, 2012 at 11:02 AM, The Donald Cowart <[email protected]>
> >> > wrote:
> >> >>
> >> >> Can you get the output from top during a slowdown or just after?
> >> >> Also, is the box's function a webserver, fileserver, mathematical
> >> >> processing, etc? Was the box rebooted after the patching?
> >> >>
> >> >> Something that may help: run "top -b" in a while loop (with a sleep
> >> >> in between runs) and dump it to a file or series of files, so you've
> >> >> got snapshots over time of the system performance to help
> >> >> troubleshoot this.
> >> >>
> >> >> --Donald
> >> >>
> >> >> On Wed, Feb 15, 2012 at 10:49 AM, Dean, Mike <[email protected]>
> >> >> wrote:
> >> >> > Hello all, hoping you can help. We have a RedHat box that two days
> >> >> > ago started having periods of slow performance. The slowdown is
> >> >> > bad enough that you can see it when trying to type at a terminal,
> >> >> > and some processes, such as SNMP, don't respond. Users have also
> >> >> > been disconnected.
> >> >> >
> >> >> > The last change that was made was applying the normal monthly
> >> >> > patches on 2/7 (the problem only started showing up yesterday).
> >> >> > According to the information from 'top', the system seems to be
> >> >> > fine. A typical snapshot looks like this:
> >> >> >
> >> >> > Tasks: 282 total,   8 running, 274 sleeping,   0 stopped,   0 zombie
> >> >> > Cpu(s):  1.1%us,  0.3%sy,  0.0%ni, 98.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> >> >> > Mem:   3909268k total,  1669580k used,  2239688k free,   231832k buffers
> >> >> > Swap:  6094840k total,        0k used,  6094840k free,   972756k cached
> >> >> >
> >> >> > Any ideas on where to look?
> >> >> >
> >> >> > Thanks!
> >> >> >
> >> >> > Mike
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Donald Cowart
> >> >> http://www.rdex.net/
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Donald Cowart
> >> http://www.rdex.net/
> >
> >
>
>
>
> --
> Donald Cowart
> http://www.rdex.net/
>

