I am running a distributed monitoring system using Nagios 2.11 on FreeBSD 6.3. I use NSCA to send host and service events from the slave servers to the central server, and I have always had the following problem:
A distributed server notices a host service is "non-OK" and fires off check-host-alive. I have it set up to do check_ICMP, so it fires off five ICMP packets. Since the network isn't always perfect, those five packets sometimes get dropped. However, I have my max_retry_interval set to 3, so it fires off another check_ICMP, which completes just fine. As a result I see the following events take place on the slave server:

[01-16-2009 15:18:46] HOST ALERT: s3200.blah.net;UP;SOFT;2;OK - 10.XX.XX.XX: rta 100.294ms, lost 0%
[01-16-2009 15:18:46] HOST ALERT: s3200.blah.net;DOWN;SOFT;1;CRITICAL - 10.XX.XX.XX: rta nan, lost 100%

However, on the central server I see the following:

[01-16-2009 15:19:02] HOST NOTIFICATION: NOC-email;s3200.blah.net;UP;host-notify-by-email;OK - 10.XX.XX.XX: rta 100.294ms, lost 0%
[01-16-2009 15:19:01] HOST ALERT: s3200.blah.net;UP;HARD;1;OK - 10.XX.XX.XX: rta 100.294ms, lost 0%
[01-16-2009 15:19:01] HOST NOTIFICATION: NOC-email;s3200.blah.net;DOWN;host-notify-by-email;CRITICAL - 10.XX.XX.XX: rta nan, lost 100%
[01-16-2009 15:19:01] HOST ALERT: s3200.blah.net;DOWN;HARD;1;CRITICAL - 10.XX.XX.XX: rta nan, lost 100%

The central server immediately flags the host as DOWN, HARD, in spite of having the same max_retry_interval = 3 setting. On some hosts this generates a ton of false "HOST DOWN" notifications. Is there any way to fix it?

Jonathan Call
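
For anyone who wants to compare the two logs mechanically, here is a minimal Python sketch that splits a HOST ALERT line into its fields so the state type (SOFT vs. HARD) on the slave and on the central server can be checked side by side. The field layout is taken only from the log lines quoted above; the names HostAlert and parse_host_alert are made up for illustration and are not part of Nagios.

```python
from typing import NamedTuple


class HostAlert(NamedTuple):
    """Fields of one HOST ALERT log line (hypothetical helper type)."""
    timestamp: str
    host: str
    state: str
    state_type: str
    attempt: int
    output: str


def parse_host_alert(line: str) -> HostAlert:
    """Parse a line of the form
    [timestamp] HOST ALERT: host;state;state_type;attempt;plugin output
    as it appears in the logs quoted above."""
    stamp, _, rest = line.partition("] HOST ALERT: ")
    if not rest:
        raise ValueError(f"not a HOST ALERT line: {line!r}")
    # Split on the first four semicolons only; the remainder is plugin output.
    host, state, state_type, attempt, output = rest.split(";", 4)
    return HostAlert(stamp.lstrip("["), host, state, state_type, int(attempt), output)


slave = parse_host_alert(
    "[01-16-2009 15:18:46] HOST ALERT: s3200.blah.net;DOWN;SOFT;1;"
    "CRITICAL - 10.XX.XX.XX: rta nan, lost 100%"
)
central = parse_host_alert(
    "[01-16-2009 15:19:01] HOST ALERT: s3200.blah.net;DOWN;HARD;1;"
    "CRITICAL - 10.XX.XX.XX: rta nan, lost 100%"
)

# Same host, same state, same attempt number, but a different state type.
print(slave.state_type, central.state_type)
```

Both entries report attempt 1 of the same DOWN result; only the state type differs between the two servers, which is the discrepancy described above.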