Thanks, had to install the sysstat package.  I'll watch both of those
(batch processing, writing to a file) and see if anything pops up.
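
For anyone following along, here is a minimal sketch of how the two suggestions in this thread (batch-mode top in a loop, plus iostat written to a file) might be combined. The function name log_snapshots, the log file names, and the output directory are all made up for illustration; only "top -b" and "iostat <interval> <count>" come from the thread itself.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): append timestamped top and
# iostat snapshots to log files so a short slowdown can be found afterwards.
# Usage: log_snapshots COUNT INTERVAL_SECONDS OUTPUT_DIR
log_snapshots() {
    count=$1
    interval=$2
    outdir=$3
    mkdir -p "$outdir" || return 1

    i=0
    while [ "$i" -lt "$count" ]; do
        stamp=$(date '+%Y-%m-%d %H:%M:%S')

        # One batch-mode iteration of top, so the output is plain text.
        {
            echo "=== $stamp ==="
            top -b -n 1 2>&1 || echo "top failed or is not installed"
        } >> "$outdir/top.log"

        # iostat's first report covers the time since boot, so ask for two
        # reports one second apart; the second reflects current activity.
        {
            echo "=== $stamp ==="
            iostat 1 2 2>&1 || echo "iostat failed or is not installed (sysstat package)"
        } >> "$outdir/iostat.log"

        i=$((i + 1))
        sleep "$interval"
    done
}

# Example (assumed values): one snapshot every 5 seconds for about an hour.
# log_snapshots 720 5 /var/tmp/perf-logs
```

Searching the logs for the "===" timestamp lines nearest to a reported slowdown should show which processes were running and whether disk activity spiked at that moment.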

On Wed, Feb 15, 2012 at 1:38 PM, The Donald Cowart <[email protected]> wrote:

> No problem Mike,
>
> Disk IO is actually easy to see: try "iostat 2".  iostat gives
> io-statistics (durh :-) ) for the disks in a system.  The 2 tells it
> to wait 2 seconds between reports, and it runs until you cancel it.  You
> can also do "iostat 2 5", which is 5 reports 2 seconds apart.  Its normal
> output is not easy to parse, but it is easy on human eyes.
>
> --Donald
>
> On Wed, Feb 15, 2012 at 1:20 PM, Dean, Mike <[email protected]> wrote:
> > Donald,
> >
> > Thanks for the tips.  Disk IO was my first thought, but I'm not sure of the
> > best way to keep an eye on that.  Is there a util/command that I can run/log
> > to see the disk IO?
> >
> > Mike
> >
> >
> > On Wed, Feb 15, 2012 at 1:16 PM, The Donald Cowart <[email protected]>
> > wrote:
> >>
> >> Hmmm... based on this I have two ideas about what it might be:
> >>
> >> It could be something doing a bunch of DNS queries and so processes
> >> are waiting for dig results or TCP timeouts, not sure how to prove
> >> that though.
> >>
> >> It might be something compressing/rotating log files all at once, just
> >> enough to slow the system down while it's happening.  Probably
> >> triggered within one of the applications for monitoring.  You may want
> >> to record some iostat values too just to see if disk activity is
> >> spiking during an event.
> >>
> >> I hope this helps!
> >>
> >> --Donald
> >>
> >> On Wed, Feb 15, 2012 at 12:57 PM, Dean, Mike <[email protected]> wrote:
> >> > Unfortunately, I have not been able to get a snapshot from top when it is
> >> > running slow.  The slowdowns typically only last a few seconds and do not
> >> > occur with any sort of regularity (that I've been able to determine,
> >> > anyway).
> >> >
> >> > I started a "top -b" piped to a file to see if I can catch a snapshot during
> >> > a slow period.
> >> >
> >> > As for the box itself (and apologies for not including this info
> >> > originally), it is used as a network monitor/management station.  Two of the
> >> > applications that run on it are Nagios (up/down and other monitoring), which
> >> > has various checks that are shell-based, Perl, and compiled, and Smokeping,
> >> > which sends out TCP and ICMP probes every 5 minutes to some hosts (fewer than
> >> > 50) to record round-trip times.
> >> >
> >> > It is also one of our syslog machines, with a script that runs every 5
> >> > minutes parsing the log files (none of the files have grown to a large size
> >> > or increased in syslog input/output).
> >> >
> >> > And, none of these systems has been changed in the last week.
> >> >
> >> > So, other than top, are there other things to check or monitors to set?
> >> >
> >> > Thanks again, in advance!
> >> >
> >> > On Wed, Feb 15, 2012 at 11:02 AM, The Donald Cowart <[email protected]> wrote:
> >> >>
> >> >> Can you get the output from top during a slowdown or just after?
> >> >> Also, is the box's function a webserver, fileserver, mathematical
> >> >> processing, etc?  Was the box rebooted after the patching?
> >> >>
> >> >> Something that may help: run "top -b" in a while loop (with a sleep in
> >> >> between runs) and dump it to a file or series of files, so you've got
> >> >> snapshots over time of the system performance to help troubleshoot
> >> >> this.
> >> >>
> >> >> --Donald
> >> >>
> >> >> On Wed, Feb 15, 2012 at 10:49 AM, Dean, Mike <[email protected]> wrote:
> >> >> > Hello all, hoping you can help.  We have a RedHat box that two days ago
> >> >> > started having periods of slow performance.  The slowdown is bad enough
> >> >> > that you can see it when trying to type at a terminal, and some processes,
> >> >> > such as SNMP, don't respond.  Users have also been disconnected.
> >> >> >
> >> >> > The last change that was made was applying the normal monthly patches on
> >> >> > 2/7 (the problem only started showing up yesterday).  According to the
> >> >> > information from 'top', the system seems to be fine.  A typical snapshot
> >> >> > looks like this:
> >> >> >
> >> >> > Tasks: 282 total,   8 running, 274 sleeping,   0 stopped,   0 zombie
> >> >> > Cpu(s):  1.1%us,  0.3%sy,  0.0%ni, 98.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> >> >> > Mem:   3909268k total,  1669580k used,  2239688k free,   231832k buffers
> >> >> > Swap:  6094840k total,        0k used,  6094840k free,   972756k cached
> >> >> >
> >> >> > Any ideas on where to look?
> >> >> >
> >> >> > Thanks!
> >> >> >
> >> >> > Mike
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Donald Cowart
> >> >> http://www.rdex.net/
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Donald Cowart
> >> http://www.rdex.net/
> >
> >
>
>
>
> --
> Donald Cowart
> http://www.rdex.net/
>
