I'm currently experimenting with using check_snmp_load.pl to alarm on system overload.
Monitoring CPU usage gives me a lot of false alarms because the readings are instantaneous. I'm getting good results by using the NETSL option to report load averages instead. I'm setting '-c 99,4,10' to basically ignore the 1 minute value and alarm on the 5 and 15 minute values.

Unfortunately, unlike the CPU percentages, the load thresholds should be scaled by the number of processors, and the NETSL option doesn't do that. One option is to define a series of service commands, one per processor count, but I'm considering writing a new mode that will use the "STAND" option to get the number of CPUs and then use that as a multiplication factor for the alarm thresholds.

Does that make sense? Surely others have run into this problem. How do you alarm on excessive load w/o causing lots of false alarms?

Robert
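To make the idea concrete, here is a minimal sketch of the scaling logic in Python. It is not code from check_snmp_load.pl: the function names are made up for illustration, the CPU count is passed in as a plain number rather than fetched over SNMP, and the warning values in the usage note are hypothetical. Only the '99,4,10' critical triple comes from the setup described above.

```python
def scale_thresholds(per_cpu, n_cpus):
    """Multiply per-CPU load thresholds (1, 5, 15 min) by the CPU count."""
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    return tuple(t * n_cpus for t in per_cpu)


def check_load(loads, warn_per_cpu, crit_per_cpu, n_cpus):
    """Return a Nagios-style exit code: 0 OK, 1 WARNING, 2 CRITICAL.

    loads is the (1, 5, 15 minute) load average triple. n_cpus is assumed
    to have been obtained elsewhere, e.g. by a separate SNMP query.
    """
    warn = scale_thresholds(warn_per_cpu, n_cpus)
    crit = scale_thresholds(crit_per_cpu, n_cpus)
    if any(load >= limit for load, limit in zip(loads, crit)):
        return 2
    if any(load >= limit for load, limit in zip(loads, warn)):
        return 1
    return 0
```

With per-CPU critical thresholds of (99, 4, 10) and hypothetical warning thresholds of (99, 3, 8), a load of (12.0, 9.0, 5.0) would be CRITICAL on a single-CPU host but OK on a 4-CPU host, since the scaled 5 minute limits become 12 and 16.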