I finally got this figured out, and thought I'd send a summary for the archives. Hopefully it will help others who find it through the search engines and archives.

Keywords: nagios debian parent child network outage unwanted notifications

Hendrik Bäcker wrote:
> Your wanted thing is called "Network Outage". This is done by setting
> parents and grandparents in the kind of the way that your nagios can
> reach each host. I think, you are getting this right but:
>
> Nagios 2 only checks services, let us say, every 5 minutes. If only
> one of 'x' services on a host returns with a non-OK state nagios will
> try to check if the host is reachable via the host-check-command, if
> given. If you don't set a host-check-command, nagios will never try
> this and you are getting a service alert.
>
>
> Network Outage detection relies on host checks.
> So: no host checks, no outage detection.
>
>> and
>>
>> service checks are performed as long as the host is known or presumed to
>> be up.
>>
> Service checks are performed as long as your nagios process is running
> and the service check timeperiod is active.
> AFAIK nagios tries to check a service even if it knows that the host
> is down, the only difference between host is up or down is, that you
> will receive x service alerts if x services are non-OK or just one
> host alert.
>
> I think your logic failure is the "way" that you are thinking how
> nagios works.
>
> Don't think on a checking way like this:
>
> Parent Host --> Host --> Service
>
> It is more like this:
>
> Service --> Host --> Parent
>
> Nagios intelligence is that it suppress notifications not the service
> checks.
>

This is pretty much the key. Perhaps the documentation could be clearer; it's possible to read the documentation the right way, but a lot of people thought my configs looked right, so it's easy to read it the wrong way as well.

The way I thought it would work is that one could define service checks for host "web", define "pix" as the parent of web, and that if host pix was down, service checks for web would stop, or at the very least not be reported on, since web certainly couldn't be functional if its parents were down.

However, that's not the way it works. Nagios 2.x tries hard not to do host checks; it only performs host checks if service checks fail. It also doesn't walk up the tree to see whether a parent's host check fails when a service check fails. Intuitively, that's the behavior I expected.

The solution requires that a host check be specified for every host with a service check. If the service check fails, nagios will perform said host check and determine that the host is unreachable. If a parent host (pix) is defined for the unreachable host (web), notifications will be suppressed. So if you don't want notifications about hosts beyond a gateway, you have to define a host check on each of those hosts, not just on the gateway parent.

As an aside, the documentation alludes to performance issues from host checks, most of which are based on pings. Is this due to the nastiness of handling ICMP packets on their return? I.e., when an ICMP packet is received, the kernel hands a copy of it to all processes listening for a reply; this gets ugly quickly if too many processes are pinging at once. If that's the case, I have some code I'd be happy to donate that solves that particular problem.

Thanks to all the list members who helped!

--- David
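
P.S. For the archives, the notification logic described in this thread can be sketched as a small toy model in Python. This is only an illustration of the behavior as summarized above, not Nagios code: the `Host` class, its fields, and `on_service_failure` are hypothetical names, and the model assumes a single parent per host and that a host without a host check is never marked as failed.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Host:
    """Hypothetical stand-in for a host definition plus its real state."""
    name: str
    responds: bool                  # would a host check (e.g. a ping) succeed?
    has_host_check: bool = True     # is a host check command configured?
    parent: Optional[str] = None    # name of the parent host, if any


def host_check_fails(host: Host) -> bool:
    """Assumption: a host with no check command is never checked, so it never fails."""
    return host.has_host_check and not host.responds


def on_service_failure(hosts: Dict[str, Host], name: str) -> str:
    """Decide what gets reported after a service check on host `name` fails.

    Order of checks follows the thread: Service --> Host --> Parent.
    """
    host = hosts[name]
    if not host_check_fails(host):
        # Host looks fine (or was never checked): the service itself is reported.
        return "service notification"
    parent = hosts.get(host.parent) if host.parent else None
    if parent is not None and host_check_fails(parent):
        # Host sits behind a failed parent: unreachable, notification suppressed.
        return "suppressed (unreachable)"
    # Host check failed but the parent (if any) is fine: the host itself is down.
    return "host notification (down)"


if __name__ == "__main__":
    pix = Host("pix", responds=False)
    web = Host("web", responds=False, parent="pix")
    web_unchecked = Host("web", responds=False, has_host_check=False, parent="pix")
    print(on_service_failure({"pix": pix, "web": web}, "web"))
    print(on_service_failure({"pix": pix, "web": web_unchecked}, "web"))
```

With pix down, the first call reports the notification as suppressed because web has its own host check; the second call still yields a service notification because web has no host check, which is the misconfiguration this thread was about.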
