Re: Determining cause of system slowdowns

The Donald Cowart Wed, 15 Feb 2012 10:16:27 -0800

Hmmm... based on this, I have two ideas about what it might be:

It could be something doing a bunch of DNS queries, so processes
are waiting on DNS replies or TCP timeouts. I'm not sure how to prove
that, though.
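One rough way to try to prove it: time lookups through the system resolver on a schedule and log any that are slow. This is only a sketch. The function names `lookup_ms` and `watch_dns`, the 1000 ms threshold and the 10-second interval are illustrative choices, and it assumes GNU date (for `%N`) and glibc's `getent`, as on a stock RedHat box.

```shell
# Hypothetical helpers; assumes GNU date (for %N) and glibc's getent.

# Print how many milliseconds one lookup of $1 takes through the system
# resolver (NSS, i.e. /etc/hosts and DNS in the order /etc/nsswitch.conf gives).
lookup_ms() {
    lm_start=$(date +%s%N)
    getent hosts "$1" >/dev/null 2>&1
    lm_end=$(date +%s%N)
    echo $(( (lm_end - lm_start) / 1000000 ))
}

# Look up $1 every $3 seconds (default 10) and print a timestamped line
# whenever a lookup takes at least $2 ms (default 1000). $4 is how many
# lookups to do; 0 (the default) means run until killed.
watch_dns() {
    wd_host=$1 wd_limit=${2:-1000} wd_pause=${3:-10} wd_count=${4:-0}
    wd_i=0
    while [ "$wd_count" -eq 0 ] || [ "$wd_i" -lt "$wd_count" ]; do
        wd_ms=$(lookup_ms "$wd_host")
        if [ "$wd_ms" -ge "$wd_limit" ]; then
            echo "$(date '+%Y-%m-%d %H:%M:%S') slow lookup of $wd_host: ${wd_ms} ms"
        fi
        wd_i=$((wd_i + 1))
        sleep "$wd_pause"
    done
}
```

For example, `watch_dns some-monitored-host 1000 10 >> /var/tmp/dns-slow.log &` (the host name and log path are placeholders). Pick a name that is actually resolved via DNS rather than /etc/hosts. If slow lookups line up with the times users report a slowdown, that would point toward DNS.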


It might be something compressing/rotating log files all at once, just
enough to slow the system down while it's happening, probably
triggered by one of the monitoring applications.  You may want
to record some iostat values too, just to see if disk activity is
spiking during an event.
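A minimal sketch of that recording, extending the "top -b" loop idea with iostat: it appends timestamped snapshots to one file so a few-second slowdown can be matched against CPU, memory and disk activity afterwards. The names `snapshot` and `watch_system` are hypothetical, and iostat assumes the sysstat package is installed (it is skipped if not).

```shell
# Hypothetical helpers: append timestamped top/iostat snapshots to a file.

# Append one snapshot to the file named in $1.
snapshot() {
    snap_out=$1
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
        # One batch-mode iteration of top: summary header plus the busiest tasks.
        if command -v top >/dev/null 2>&1; then
            top -b -n 1 | head -n 20
        fi
        # Extended per-device stats; the first report is the average since
        # boot, the second covers the last second.
        if command -v iostat >/dev/null 2>&1; then
            iostat -dx 1 2
        fi
    } >> "$snap_out" 2>&1
}

# Take a snapshot every $2 seconds (default 5); $3 is how many to take,
# where 0 (the default) means run until killed.
watch_system() {
    ws_out=$1 ws_interval=${2:-5} ws_count=${3:-0}
    ws_i=0
    while [ "$ws_count" -eq 0 ] || [ "$ws_i" -lt "$ws_count" ]; do
        snapshot "$ws_out"
        ws_i=$((ws_i + 1))
        sleep "$ws_interval"
    done
}
```

For example, `watch_system /var/tmp/slowdown.log 5 &` runs until killed (the path is a placeholder). When someone reports a slowdown, search the log for that timestamp and compare the second iostat report with quieter periods.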

I hope this helps!

--Donald

On Wed, Feb 15, 2012 at 12:57 PM, Dean, Mike <[email protected]> wrote:
> Unfortunately, I have not been able to get a snapshot from top when it is
> running slow.  The slowdowns typically last only a few seconds and do not
> occur with any sort of regularity (that I've been able to determine, anyway).
>
> I started a "top -b" piped to a file to see if I can catch a snapshot during
> a slow period.
>
> As for the box itself (and apologies for not including this info
> originally), it is used as a network monitoring/management station.  Two of the
> applications that run on it are Nagios (up/down and other monitoring), which
> has various shell-based, Perl, and compiled checks, and Smokeping,
> which sends out TCP and ICMP probes every 5 minutes to fewer than 50
> hosts to record round-trip times.
>
> It is also one of our syslog machines, with a script that runs every 5
> minutes parsing the log files (none of the files has grown large, and
> syslog input/output has not increased).
>
> And, none of these systems has been changed in the last week.
>
> So, other than top, are there other things to check or monitors to set up?
>
> Thanks again, in advance!
>
> On Wed, Feb 15, 2012 at 11:02 AM, The Donald Cowart <[email protected]> wrote:
>>
>> Can you get the output from top during a slowdown or just after?
>> Also, is the box's function a webserver, fileserver, mathematical
>> processing, etc.?  Was the box rebooted after the patching?
>>
>> Something that may help: run "top -b" in a while loop (with a sleep
>> between runs) and dump it to a file or series of files, so you've got
>> snapshots of system performance over time to help troubleshoot
>> this.
>>
>> --Donald
>>
>> On Wed, Feb 15, 2012 at 10:49 AM, Dean, Mike <[email protected]> wrote:
>> > Hello all, hoping you can help.  We have a RedHat box that two days ago
>> > started having periods of slow performance.  The slowdown is bad enough
>> > that you can see it when trying to type at a terminal, and some processes,
>> > such as SNMP, don't respond.  Users have also been disconnected.
>> >
>> > The last change that was made was applying the normal monthly patches on
>> > 2/7 (the problem only started showing up yesterday).  According to the
>> > information from 'top', the system seems to be fine.  A typical snapshot
>> > looks like this:
>> >
>> > Tasks: 282 total,   8 running, 274 sleeping,   0 stopped,   0 zombie
>> > Cpu(s):  1.1%us,  0.3%sy,  0.0%ni, 98.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
>> > Mem:   3909268k total,  1669580k used,  2239688k free,   231832k buffers
>> > Swap:  6094840k total,        0k used,  6094840k free,   972756k cached
>> >
>> > Any ideas on where to look?
>> >
>> > Thanks!
>> >
>> > Mike
>>
>>
>>
>> --
>> Donald Cowart
>> http://www.rdex.net/
>
>



-- 
Donald Cowart
http://www.rdex.net/

---------------------------------------------------------------------
Archive      http://marc.info/?l=jaxlug-list&r=1&w=2
RSS Feed     http://www.mail-archive.com/[email protected]/maillist.xml
Unsubscribe  [email protected]