Re: Determining cause of system slowdowns

The Donald Cowart Wed, 15 Feb 2012 10:38:26 -0800

No problem Mike,

Disk IO is actually easy to see: try "iostat 2".  iostat gives
I/O statistics (duh :-) ) for the disks in a system.  The 2 tells it
to wait 2 seconds between reports, and it runs until you cancel it.
You can also do "iostat 2 5", which is 5 reports 2 seconds apart.  Its
normal output is not easy to parse, but it is easy on human eyes.
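
Since the slowdowns only last a few seconds, it may help to log the
output with a timestamp on every line so it can be matched against the
time of an event.  The following is only a rough sketch: the function
name stamp_lines and the log paths are made up for illustration, and it
assumes the sysstat version of iostat is installed.  Note that the first
report iostat prints covers the time since boot, so the later reports
are the ones to watch.

```shell
#!/bin/sh
# stamp_lines (hypothetical helper, not a standard tool):
# read lines on stdin and prefix each one with the current date and time.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s  %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}

# Possible usage (log paths are arbitrary examples):
#   iostat 2 | stamp_lines >> /tmp/iostat.log
#   top -b -d 2 | stamp_lines >> /tmp/top.log
```

The same filter works for any command that prints a line at a time, so
the iostat and "top -b" logs can be lined up by timestamp afterwards.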


--Donald

On Wed, Feb 15, 2012 at 1:20 PM, Dean, Mike <[email protected]> wrote:
> Donald,
>
> Thanks for the tips.  Disk IO was my first thought, but I'm not sure of the
> best way to keep an eye on that.  Is there a util/command that I can run/log
> to see the disk IO?
>
> Mike
>
>
> On Wed, Feb 15, 2012 at 1:16 PM, The Donald Cowart <[email protected]>
> wrote:
>>
>> Hmmm... based on this I have two ideas about what it might be,
>>
>> It could be something doing a bunch of DNS queries and so processes
>> are waiting for dig results or TCP timeouts, not sure how to prove
>> that though.
>>
>> It might be something compressing/rotating log files all at once, just
>> enough to slow the system down while it's happening.  Probably
>> triggered within one of the applications for monitoring.  You may want
>> to record some iostat values too just to see if disk activity is
>> spiking during an event.
>>
>> I hope this helps!
>>
>> --Donald
>>
>> On Wed, Feb 15, 2012 at 12:57 PM, Dean, Mike <[email protected]> wrote:
>> > Unfortunately, I have not been able to get a snapshot from top when it
>> > is
>> > running slow.  The slowdowns typically only last a few seconds and do
>> > not
>> > occur with any sort of regularity (that I've been able to determine,
>> > anyway).
>> >
>> > I started a "top -b" piped to a file to see if I can catch a snapshot
>> > during
>> > a slow period.
>> >
>> > As for the box itself (and apologies for not including this info
>> > originally), it is used as a network monitor/management station.  Two of
>> > the
>> > applications that run on it are Nagios (up/down and other monitoring)
>> > which
>> > has various checks that are shell based, Perl and compiled, and
>> > Smokeping,
>> > which sends out TCP and ICMP probes every 5 minutes to some hosts (less
>> > than
>> > 50) to record round trip times.
>> >
>> > It is also one of our syslog machines with a script that runs every 5
>> > minutes parsing the log files (none of the files have grown to a large
>> > size
>> > or increased in syslog input/output).
>> >
>> > And, none of these systems has been changed in the last week.
>> >
>> > So, other than top, are there other things to check or monitors to set?
>> >
>> > Thanks again, in advance!
>> >
>> > On Wed, Feb 15, 2012 at 11:02 AM, The Donald Cowart <[email protected]>
>> > wrote:
>> >>
>> >> Can you get the output from top during a slowdown or just after?
>> >> Also, is the box's function a webserver, fileserver, mathematical
>> >> processing, etc?  Was the box rebooted after the patching?
>> >>
>> >> Something that may help: run "top -b" in a while loop (with a sleep in
>> >> between runs) and dump it to a file or series of files, so you've got
>> >> snapshots over time of the system performance to help troubleshoot
>> >> this.
>> >>
>> >> --Donald
>> >>
>> >> On Wed, Feb 15, 2012 at 10:49 AM, Dean, Mike <[email protected]>
>> >> wrote:
>> >> > Hello all, hoping you can help.  We have a RedHat box that two days
>> >> > ago
>> >> > started having periods of slow performance.  The slowdown is bad
>> >> > enough
>> >> > that you can see it when trying to type at a terminal and some
>> >> > processes,
>> >> > such as SNMP, don't respond.  Users have also been disconnected.
>> >> >
>> >> > The last change that was made was applying the normal monthly patches
>> >> > on
>> >> > 2/7 (the problem only started showing up yesterday).  According to
>> >> > the
>> >> > information from 'top', the system seems to be fine.  A typical
>> >> > snapshot
>> >> > looks like this:
>> >> >
>> >> > Tasks: 282 total,   8 running, 274 sleeping,   0 stopped,   0 zombie
>> >> > Cpu(s):  1.1%us,  0.3%sy,  0.0%ni, 98.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
>> >> > Mem:   3909268k total,  1669580k used,  2239688k free,   231832k buffers
>> >> > Swap:  6094840k total,        0k used,  6094840k free,   972756k cached
>> >> >
>> >> > Any ideas on where to look?
>> >> >
>> >> > Thanks!
>> >> >
>> >> > Mike
>> >>
>> >>
>> >>
>> >> --
>> >> Donald Cowart
>> >> http://www.rdex.net/
>> >
>> >
>>
>>
>>
>> --
>> Donald Cowart
>> http://www.rdex.net/
>
>



-- 
Donald Cowart
http://www.rdex.net/

---------------------------------------------------------------------
Archive      http://marc.info/?l=jaxlug-list&r=1&w=2
RSS Feed     http://www.mail-archive.com/[email protected]/maillist.xml
Unsubscribe  [email protected]
