I'm currently experimenting with using check_snmp_load.pl to alarm on system 
overload.

Monitoring CPU usage gives me a lot of false alarms because the readings 
are instantaneous.

I'm getting good results by using the NETSL option to report load averages.  
I'm setting '-c 99,4,10' to basically ignore the 1 minute value and alarm 
on 5 and 15 minutes.

Unfortunately, unlike CPU percentages, load thresholds need to scale with 
the number of processors.  The NETSL option doesn't do that.

One option is to have a series of service commands based on the number of 
processors, but I'm considering writing a new mode that will use the 
"STAND" option to get the number of CPUs and then use that as a multiplication 
factor for the alarm thresholds.
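
A minimal sketch of the scaling idea, written in Python rather than the plugin's Perl. The function names are hypothetical, the thresholds are assumed to be per-CPU values, and the CPU count is passed in directly instead of being queried over SNMP:

```python
def scale_thresholds(per_cpu, cpu_count):
    """Multiply per-CPU load thresholds by the number of processors."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return tuple(t * cpu_count for t in per_cpu)


def check_load(loads, warn_per_cpu, crit_per_cpu, cpu_count):
    """Return a Nagios-style (exit_code, label) for 1/5/15 minute loads.

    Assumption: a threshold is breached when the load is >= the scaled
    value; a real plugin may compare differently.
    """
    warn = scale_thresholds(warn_per_cpu, cpu_count)
    crit = scale_thresholds(crit_per_cpu, cpu_count)
    if any(load >= limit for load, limit in zip(loads, crit)):
        return 2, "CRITICAL"
    if any(load >= limit for load, limit in zip(loads, warn)):
        return 1, "WARNING"
    return 0, "OK"


if __name__ == "__main__":
    # Same load averages, judged on a 4-CPU host and on a 1-CPU host.
    loads = (6.0, 5.0, 3.0)
    print(check_load(loads, (99, 3, 8), (99, 4, 10), 4))
    print(check_load(loads, (99, 3, 8), (99, 4, 10), 1))
```

With per-CPU critical values of 99,4,10, a 4-CPU host would alarm at 396,16,40, so a 5 minute load of 5.0 passes there but trips the threshold on a single-CPU host.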

Does that make sense?  Surely others have run into this problem.  How do you 
alarm on excessive load without causing lots of false alarms?

Robert





_______________________________________________
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null
