Hi list,

I've been investigating this problem for a while, but I couldn't find a good solution.

* Example situation :
Assume I have one host with 20 service checks.

* Problem :
If the host goes DOWN, Nagios keeps running service checks against it, so after a while all of its services go to a CRITICAL state. The console then shows:
 - 1 Host down,
 - 20 Services down
This information is not useful. The only thing I want to see in such a case is the "host down". The 20 "service down" entries are an obvious consequence, and they create visual clutter that makes it harder to identify the real problem.

* Expected behavior :
When a host is down, I would like to:
- See only one thing in red in the console: 1 HOST DOWN
- Disable all of its service checks (which at this point have no chance of succeeding)
- Put its services into "UNKNOWN" status

Comments:
In Nagios, there are parent/child dependencies between hosts. When a host is down, its child hosts are not tested, and their status becomes "UNREACHABLE". That is a good thing. The same applies between services. But, as far as I know, there is no dependency between a host and its own services. I have searched and read a lot of the documentation, and this seems to be "by design": there is no way to declare a service as a child of its (parent) host! I don't really understand the reasons for this choice, but I would like to work around it.

Then I experimented with event handlers. When a host's status changes, the event handler calls a script, which checks the status of the host that triggered it. If the host is DOWN or UNREACHABLE, the script sends Nagios an "external command" to disable all active service checks for that host. If the host is UP, it sends the external command to re-enable all service checks for that host. It works, but there is some latency between the moment the event handler disables the service checks and the moment Nagios actually stops running them. Usually, a few services are still checked and report an unwanted "FAILED" status. I think this is because those checks were already queued before the handler disabled them, so they are executed anyway. So I'm not 100% satisfied.
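
For illustration, here is a minimal Python sketch of the kind of host event handler described above. It assumes the handler is called with $HOSTSTATE$ and $HOSTNAME$ as arguments, and that the command file is at /usr/local/nagios/var/rw/nagios.cmd (the real path is whatever command_file is set to in nagios.cfg). The function names are only illustrative; DISABLE_HOST_SVC_CHECKS and ENABLE_HOST_SVC_CHECKS are the standard external commands.

```python
import time

# Assumed path; the real location is the command_file setting in nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def build_host_command(host_state, host_name, now=None):
    """Return the external command line matching the given host state."""
    timestamp = int(time.time() if now is None else now)
    if host_state in ("DOWN", "UNREACHABLE"):
        command = "DISABLE_HOST_SVC_CHECKS"
    elif host_state == "UP":
        command = "ENABLE_HOST_SVC_CHECKS"
    else:
        raise ValueError("unexpected host state: %r" % host_state)
    # External command format: [epoch seconds] COMMAND;arguments
    return "[%d] %s;%s\n" % (timestamp, command, host_name)


def send_command(line, command_file=COMMAND_FILE):
    """Write one command line to the Nagios command file (a named pipe)."""
    with open(command_file, "w") as pipe:
        pipe.write(line)
```

The handler itself would then call something like send_command(build_host_command(sys.argv[1], sys.argv[2])). Note that this does nothing about checks that are already queued, so the latency described above remains.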

The next step would be to use service event handlers to put every service into "UNKNOWN" status each time a service check is disabled. But I have two problems:
- In my external script, I cannot determine whether a service check is ENABLED or DISABLED. There are a lot of "macros" available, but none of them gives me this information.
- This may not solve the "latency" problem: if I manually set an "UNKNOWN" status on a DISABLED service while an active check is already in the queue, its result will still arrive later...
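
As a rough sketch of the "set to UNKNOWN" step, one option is to submit a passive result through the PROCESS_SERVICE_CHECK_RESULT external command with return code 3 (UNKNOWN). This assumes passive checks are accepted for the service; the function name and the example host and service names are hypothetical.

```python
import time

UNKNOWN = 3  # standard plugin return code for the UNKNOWN state


def build_unknown_result(host_name, service_description, reason, now=None):
    """Return an external command line that forces a service to UNKNOWN."""
    timestamp = int(time.time() if now is None else now)
    # Format: [time] PROCESS_SERVICE_CHECK_RESULT;host;service;code;output
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host_name, service_description, UNKNOWN, reason)


# Example with hypothetical names; the line would be written to the
# Nagios command file in the same way as any other external command.
example = build_unknown_result("web01", "HTTP", "Host is down", now=1234567890)
```

This does not address the second problem: the result of an active check that was already queued can still arrive after the passive result.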

Of course, the ideal situation would be to have a parent/child dependency between hosts and their services...

Any comments and suggestions are welcome. Thank you in advance for your help.

Kind regards
--

*Toussaint OTTAVI*
*MEDI INFORMATIQUE*
*Mail:* [EMAIL PROTECTED]
