Re: [Nagios-users] Host down, still doing active checks, causing multiple unwanted service failures

Toussaint OTTAVI Tue, 09 Dec 2008 06:22:50 -0800

Toussaint OTTAVI wrote:

Following this idea, I will investigate the following:
- Hosts linked by parent/child relationships according to the WAN topology (already working)
- For each host, I will create a "parent" service with only a check_alive command
- Every other service will be a child of this parent service

Replying to myself... After some investigation and reading the docs :-) it seems I confused "parent/child" with "dependency":

- The parent/child relationship is for hosts only and should map the network topology. When a host is DOWN, its children are set to UNREACHABLE. But this parent/child relationship does not exist for services.

- A dependency can be defined for either hosts or services. When the "depended upon" (master) object fails, the dependent object is not checked. But no assumption is made about the dependent object's status: it is not set to UNREACHABLE or UNKNOWN, as happens with the parent/child relationship.
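To make the distinction above concrete, here is a minimal sketch in Nagios object syntax, reusing the host names from the servicedependency later in this message. The generic-host template, the check-host-alive command, the aliases and the 192.0.2.x documentation addresses are assumptions for illustration; other required host directives (check period, contacts, and so on) are assumed to come from the template.

```
# Parent/child: a topology link between hosts. If Remote_WAN_Router is DOWN
# and REMOTE_HOST1 cannot be reached, Nagios reports REMOTE_HOST1 as
# UNREACHABLE rather than DOWN.
define host{
    use                 generic-host        ; assumed template name
    host_name           Remote_WAN_Router
    alias               Remote WAN router
    address             192.0.2.1           ; placeholder address
    check_command       check-host-alive    ; assumed command name
    }

define host{
    use                 generic-host        ; assumed template name
    host_name           REMOTE_HOST1
    alias               Remote host 1
    address             192.0.2.10          ; placeholder address
    parents             Remote_WAN_Router   ; topology link to the router
    check_command       check-host-alive    ; assumed command name
    }

# Dependency: while the master host is DOWN or UNREACHABLE, active checks of
# REMOTE_HOST1 are skipped, but its own state is not changed.
define hostdependency{
    host_name                       Remote_WAN_Router
    dependent_host_name             REMOTE_HOST1
    execution_failure_criteria      d,u
    notification_failure_criteria   d,u
    }
```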



Here's the current situation:

- Creating a dependency solves my problem: services are no longer checked when their hosts are unreachable due to a WAN failure. This is a smarter solution than my previous attempt using event_handlers and the DISABLE_ALL_SVC_CHECKS external command. Using wildcards, I only have to declare one dependency for all services on several hosts, like this:


 define servicedependency{
   host_name                          Remote_WAN_Router
   service_description                Remote WAN router ping test
   dependent_host_name                REMOTE_HOST1, REMOTE_HOST2, ..., REMOTE_HOSTn
   dependent_service_description      *
   inherits_parent                    1
   execution_failure_criteria         w,u,c
     }

- With that in place, when the WAN fails, the checks are not executed and the services keep their previous status. That's a good thing. But I would have preferred them to get the status UNKNOWN or UNREACHABLE. In fact, I would like the same parent/child behavior that exists for hosts, but for services.

- I'm not sure it will solve the "latency" problem: if a service check on a remote host runs before Remote_WAN_Router is detected as DOWN and the dependency takes effect, I'll still get CRITICAL failures for those services. The console will then display a mix of failed services (those checked before the WAN router check) and OK services (the previous state of services that are no longer checked because of the dependency). This display would be completely wrong!

Again, in such a situation, I think services whose status cannot be determined should be displayed as UNKNOWN, just as hosts are displayed as UNREACHABLE.
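The servicedependency above only takes effect if the master service it names exists on Remote_WAN_Router. A minimal sketch of that service, assuming the generic-service template and the check_ping command from the Nagios sample configuration (both are assumptions, as are the ping thresholds):

```
# Master service named in the servicedependency. While it is WARNING,
# UNKNOWN or CRITICAL (execution_failure_criteria w,u,c), active checks of
# every service on the dependent hosts are skipped.
define service{
    use                     generic-service                  ; assumed template name
    host_name               Remote_WAN_Router
    service_description     Remote WAN router ping test
    check_command           check_ping!100.0,20%!500.0,60%   ; assumed thresholds
    }
```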


Comments and ideas welcome.

Kind regards,
--

*Toussaint OTTAVI*
*MEDI INFORMATIQUE*
*Mail:* [EMAIL PROTECTED]

_______________________________________________
Nagios-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
