The problem: how to filter out false alarms caused by brief outages in an unreliable network.
Think about a simple monitoring scenario in which you only want to ping various devices to see whether they are up. So you have 200 hosts, each with only one service (PING). For one reason or another, short breaks occur in the network: a particular host does not reply to PINGs for, e.g., 30 seconds. These breaks should not cause a notification to be sent.

As for services, the filtering is easy with max_check_attempts and retry_check_interval. But the host check becomes a problem: after the first PING failure (soft state) the host is checked, and there is no retry_check_interval for hosts, so the host is declared down (almost) immediately.

Notifications about hosts can be delayed using first_notification_delay. This seems to work fine except for one thing: flap detection. Even if the notification is not sent, the host (and service) is logged as having changed state, and when enough such state changes occur, the host (and service) is placed in a flapping state. I do not want to disable flap detection (or flapping notifications) completely, because they can be useful in many cases. What I would like to achieve is to exclude those short outages when the percent state change used for flap detection is computed. How can I accomplish that? Should I go ahead and disable host checks completely?

If only there were a retry_check_interval for hosts, it would solve all these problems. I think it is quite common for short outages to be network-related, i.e. the whole host is unresponsive rather than a single service. With that in mind, it seems odd that there is a retry_check_interval for services but not for hosts. Or would adding one break the scheduling logic?

Thanks for any help.
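The effect described above can be illustrated with a simplified model. This is a sketch, not Nagios code, and it rests on several assumptions: one check result per interval, a history window of 21 results, and an unweighted percent state change (Nagios itself weights recent changes more heavily). The hypothetical `hard_states` helper accepts a DOWN state only after `max_attempts` consecutive failures, roughly what a spaced-out host retry would provide, and accepts a recovery on the first success. The names `percent_state_change`, `hard_states`, `build_history`, and `max_attempts` are illustrative only.

```python
from typing import List, Set

# Simplified model, NOT Nagios source code. Assumptions:
# - one check result per interval, True = UP, False = DOWN
# - unweighted percent state change (Nagios weights recent changes more)


def percent_state_change(history: List[bool]) -> float:
    """Percentage of adjacent result pairs that differ (unweighted)."""
    if len(history) < 2:
        return 0.0
    changes = sum(1 for prev, cur in zip(history, history[1:]) if prev != cur)
    return 100.0 * changes / (len(history) - 1)


def hard_states(results: List[bool], max_attempts: int) -> List[bool]:
    """Hypothetical filter: accept DOWN only after max_attempts consecutive
    failures; accept a recovery on the first successful check."""
    confirmed = True  # assume the host starts UP
    failures = 0
    out = []
    for up in results:
        if up:
            failures = 0
            confirmed = True
        else:
            failures += 1
            if failures >= max_attempts:
                confirmed = False
        out.append(confirmed)
    return out


def build_history(length: int, down_at: Set[int]) -> List[bool]:
    """Results for `length` checks, DOWN at the given check indices."""
    return [i not in down_at for i in range(length)]


# Two one-check blips (5 and 12) and one genuine outage (16-19) in 21 checks.
raw = build_history(21, {5, 12, 16, 17, 18, 19})

print(f"raw results:      {percent_state_change(raw):.1f}%")
print(f"max_attempts = 1: {percent_state_change(hard_states(raw, 1)):.1f}%")
print(f"max_attempts = 3: {percent_state_change(hard_states(raw, 3)):.1f}%")
```

In this model, counting every raw result gives 6 state changes out of 20 (30.0%), while counting only confirmed states with `max_attempts = 3` gives 2 (10.0%): the blips vanish and only the real outage remains. With `max_attempts = 1`, which stands in for a host declared down on its first failure, the confirmed history equals the raw one, so every blip counts fully toward flapping.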
-tuomas

_______________________________________________
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users