On Jul 14, 2009, at 9:46 AM, Paul Corcoran wrote:

> HI,
>
> I run a distributed Nagios environment consisting of 1 parent server
> and 2 child servers.
>
> The child servers perform all the service checking while the parent
> server should be performing active service checks.

Are both the child servers and the central server performing active service checks?

> The host definitions are configured to perform host checks every 5
> minutes. The retry interval is 1 minute and the max attempts is set
> to 5.

On both? Or are you submitting passive host checks, or are you expecting the central machine to initiate its own active checks of hosts?

> We are monitoring 580 hosts and approx 4000 services.
>
> I noticed when a host down was detected the parent server did not
> perform any retries of the host. This led to the status of the host
> being stuck in a SOFT state and therefore no alerts were sent out as
> required. I noticed that the child server performed the host checks
> without any problem and the host was logged as being in a HARD down
> state after 5 failed attempts.

I'm not sure what configuration you could have that would lead to this. Can you post the host{} definition and any relevant log entries? Are you only sending a single passive host result while 'passive_host_checks_are_soft' is set in nagios.cfg?

> Is there a specific variable in nagios.cfg that explicitly tells the
> server to perform active checks?

There are a few --

- in nagios.cfg
  - execute_host_checks=<0/1>
- in your host definition
  - active_checks_enabled [0/1], an appropriate check_period, check_interval and retry_interval, and an appropriate check_command.

> Is it best practice to have the 2 child servers perform passive host
> checks?

I have no opinion on this other than to say that if you trust the remote Nagios instances to correctly report on services, they can usually be trusted to correctly report on hosts.

> Is it possible that processing all the passive service check info is
> causing the parent server to lag behind in its own process queue?

Not likely, IMHO, assuming you're using somewhat modern hardware. You can see for sure under Performance Info though. Look for high latencies (minutes)...
Latency measures how long after its scheduled time a check actually ran.

--
Marc

_______________________________________________
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null