Hi, I'm using Nagios 2.9 on FreeBSD, on a wide area network that has remote networks scattered across the USA and Mexico.
We have a problem where latency on some remote circuits rises due to congestion. This means that various service checks time out, as they take more than 10 seconds to complete. (Yes, this is a real problem, and we're addressing it. I'm using smokeping to track latency at these sites now, analyzing traffic, etc.)

When latency spikes, Nagios checks the host to see if it's alive. Latency is too high, so the host check times out as well. Host checks "crawl" up the chain to the parent router for the site, and flag it as down. The end result is that Nagios sees brief two-minute outages at the remote site.

When we get a Nagios alert, it goes into our trouble ticket system and is distributed to the appropriate administrator. When the ticket is issued for latency, however, it a) is viewed as a "false positive" and b) detracts from real remote site outages.

With Nagios 3 I could repeat the host check five minutes later before sending an alert. That's not an option in Nagios 2.9. I'm not entirely comfortable running beta code in this production environment, for political reasons rather than technical ones.

I'd like to separate the latency problem from a site-down problem. I can think of a couple of ways to do this:

1) Increase the 10-second maximum timeout for a service check to complete. Can this be done in Nagios?

2) Have the trouble ticket system be an escalation contact that is only notified after the problem persists for five minutes. We're not using escalations today, but they can't be too hard.

Has anyone dealt with this type of problem before? Any other suggestions or advice on monitoring and alarming in this sort of environment?

Thanks,
==ml

-- 
Michael W. Lucas [EMAIL PROTECTED], [EMAIL PROTECTED]
http://www.BlackHelicopters.org/~mwlucas/
Coming Soon: "Absolute FreeBSD" -- http://www.AbsoluteFreeBSD.com
On 5/4/2007, the TSA kept 3 pairs of my soiled undies "for security reasons."
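P.S. To make option 2 concrete, here is roughly what I imagine the escalation would look like. This is an untested sketch based on my reading of the object definitions, not a working config. The host name "site1-router" and the contact group "trouble-tickets" are made up for illustration. It also assumes the default interval_length of 60 seconds and a notification_interval of 5 on the host itself, so that the second notification goes out about five minutes after the first.

```
# Hypothetical host escalation for Nagios 2.x (names are placeholders).
# Notification #1 goes to the host's normal contact_groups;
# notification #2 and later go to the trouble-tickets group.
define hostescalation{
        host_name               site1-router     ; assumed host name
        contact_groups          trouble-tickets  ; assumed contact group
        first_notification      2                ; start at the 2nd notification
        last_notification       0                ; 0 = no upper limit
        notification_interval   5                ; in interval_length units
        }
```

The idea is that the trouble-tickets group would be left out of the host's normal contact_groups. If I understand escalations correctly, the first notification would then never reach the ticket system, and a two-minute latency blip that recovers before the second notification would not open a ticket.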
_______________________________________________
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users