Hi,

Marc Powell wrote:
Our ideas of accuracy would seem to differ ;)

Sometimes, in life, it's necessary to be able to say: "I don't know". When a host is simply powered off, or unreachable due to a network/WAN failure, Nagios displays each service check with a result that depends on how the plugin is written and on the exact time the latest service check occurred. Some results may be UNKNOWN, others may be CRITICAL, and the rest would be OK (if dependencies are used).

This really bothers me; I do think it is inaccurate. In such a situation, I would expect all the services to be in the "UNKNOWN" state.


We do not use email notifications, because we are only 2 guys, and this would generate too many messages.

It shouldn't. In your scenario of 1 host down with X number of services on it, you should only receive 1 down message and 1 recovery message per host event (unless you want more).

Nagios is smart enough, and its notifications are tunable enough, to avoid email notification floods. But other products, such as routers, firewalls or security software, are not. They used to fill our mailboxes with useless messages. That's the reason why I don't like email notifications, at least for general-purpose problems. I use them only for very critical events.

Moreover, the parent/child system was designed exactly to handle the situation where a host is unreachable. It allows notifications to be disabled for all the services that would necessarily fail or return wrong results while the host is unreachable. I would like to be able to use this system to also disable the "incorrect" service status display, so that when a host is unreachable, the display says "UNKNOWN" for all its services (just as hosts are displayed as UNREACHABLE).

This is the way I would like to see my results. This may not be the way other users would want to see them. But no two users are the same, have the same configuration, or have the same needs. I would just like to find a solution that lets me display my results in the way that is most usable and valuable for me.
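To make the display I have in mind concrete, here is a minimal sketch in Python. It is not Nagios code and does not use any Nagios API; the function name and the data layout are assumptions made up for illustration. It only shows the rule: if the host is DOWN or UNREACHABLE, show every service as UNKNOWN, otherwise show the last check result.

```python
# Hypothetical display-layer filter; nothing here is part of Nagios itself.
HOST_PROBLEM_STATES = ("DOWN", "UNREACHABLE")


def displayed_states(host_state, service_states):
    """Return the states to show for the services of one host.

    host_state     -- "UP", "DOWN" or "UNREACHABLE"
    service_states -- dict mapping service name to its last check result
                      ("OK", "WARNING", "CRITICAL" or "UNKNOWN")
    """
    if host_state in HOST_PROBLEM_STATES:
        # The host cannot be reached, so the stored results are either
        # stale or misleading: show "I don't know" for all of them.
        return {name: "UNKNOWN" for name in service_states}
    # Host is up: show the real results unchanged (as a copy).
    return dict(service_states)
```

For example, `displayed_states("UNREACHABLE", {"HTTP": "OK", "SMTP": "CRITICAL"})` would show both services as UNKNOWN, while the stored check results stay untouched.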


Possibly, but with an additional requirement that regularly scheduled host checks are enabled for those hosts. Those are still considered optional and have been undesirable for all versions of Nagios prior to the current one. If someone were to code the patch, they would need to ensure host checks were enabled for the hosts with this new feature enabled; otherwise the host would never be checked and would never return out of its critical state.

I agree with you. Checks should be for services, and hosts should only be "containers" for services. Having to enable checks also for the hosts is a little bit confusing for beginners. I also consider host checks as "undesirable".

But, if I understand correctly, host checks are there to determine parent/child reachability, which in turn makes it possible to determine the UNREACHABLE status and then suppress useless service failure notifications. So why not create a parent/child relationship between services? This would remove the need for host checks, and it would allow services to be displayed as UNREACHABLE or UNKNOWN if their parent service check fails.

Dependencies already exist for both hosts and services. Why not a parent/child/unreachable relationship too?
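As a rough illustration of what a parent/child relationship between services could mean, here is a small Python sketch. Nagios has no such feature as far as I know; the `parents` mapping, the function name and the service names are all hypothetical. A service whose parent (or any ancestor) is not OK is reported as UNREACHABLE instead of its own raw result.

```python
# Hypothetical model of service-to-service parent/child reachability.
# This is a suggestion sketch, not an existing Nagios mechanism.

def service_status(name, results, parents):
    """Return the status to report for service `name`.

    results -- dict: service name -> raw check result ("OK", "CRITICAL", ...)
    parents -- dict: service name -> parent service name (absent if no parent)
    """
    seen = set()
    parent = parents.get(name)
    # Walk up the chain of ancestors; `seen` guards against config loops.
    while parent is not None and parent not in seen:
        seen.add(parent)
        if results.get(parent) != "OK":
            # An ancestor failed, so this service cannot be judged.
            return "UNREACHABLE"
        parent = parents.get(parent)
    # All ancestors are OK (or there are none): report the real result.
    return results[name]
```

With a hypothetical "wan-ping" service as the parent of "remote-http", a WAN failure would leave "wan-ping" as CRITICAL and turn "remote-http" into UNREACHABLE, whatever its own plugin returned.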

Of course, this is only a feature suggestion, everybody should be free to use it or not. But I'll be happy to use it ;-)


This is promising. http://nagios.sourceforge.net/docs/3_0/objecttricks.html#same_host_dependency will help with the config if you haven't seen it.

It works fine. The ability to use wildcards is a great feature. Services no longer fail when a host is unreachable, but some problems (for me) remain:

- All services keep their previous status, which is usually OK. As I said before, in such a situation I would prefer UNKNOWN.
- "Latency" problem: some service checks are sometimes scheduled AFTER the WAN failure but BEFORE the dependency service check, so they fail. Using a "soft dependency" and scheduling the dependency service check more often helps to reduce this situation. But it still happens from time to time.

Am I the only one having this problem?
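The "latency" problem can be shown with a small timeline sketch in Python. This is only a model, not how the Nagios scheduler is implemented: the function name and the times (arbitrary seconds) are assumptions, and I assume the dependency takes effect as soon as the first dependency check after the failure has run.

```python
# Hypothetical timeline model of the race between a WAN failure, the
# dependency (master) service check and the dependent service checks.

def checks_in_blind_window(failure_time, dependency_check_times, service_check_times):
    """Return the service checks that run after the failure but before
    the dependency check has had a chance to notice it.

    failure_time           -- time of the WAN failure
    dependency_check_times -- list of times the dependency check runs
    service_check_times    -- dict: service name -> time of its next check
    """
    later = [t for t in dependency_check_times if t >= failure_time]
    # If no dependency check runs after the failure, it is never detected
    # within this schedule, so every later service check is exposed.
    detected_at = min(later) if later else float("inf")
    return {
        name: t
        for name, t in service_check_times.items()
        if failure_time <= t < detected_at
    }
```

With a failure at t=100 and the dependency check running at 60, 120 and 180, a service check scheduled at t=110 falls into the blind window and fails. Running the dependency check every 10 seconds shrinks the window, which matches what I observe: the problem becomes rarer but does not disappear.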

I don't consider it a problem myself, just that nagios doesn't work as you want it to in your environment. I personally prefer the current behavior since it provides more accurate information over a wider variety of outage scenarios.

Let's be clear. Nagios has no problem; it behaves exactly as it is intended to. The one who has a problem is ME. I need to present the results in a different way when a host is unreachable, and I'm looking for a solution to do that.

I would just like to know whether I am the only one who thinks the results of service checks for unreachable hosts should be displayable differently.

Kind regards,
--

*Toussaint OTTAVI*

*MEDI INFORMATIQUE*
*Mail:* [email protected]

_______________________________________________
Nagios-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nagios-users