Re: [Nagios-users] UNKNOWN service state question
On Feb 17, 2009, at 8:47 AM, Nicole Hähnel wrote:

> Hmm...
> And why is UNKNOWN a hard state and not soft?

Because it reached max_check_attempts. Nagios treats all non-OK states the same. If the plugin returns any non-OK state, the service is put into a soft state until max_check_attempts is reached, at which point the service becomes hard and the notification logic is processed. You can see this in nagios.log.

> I would say, unknown is like critical and I need OK to know if
> everything works all right in a network.

This seems contradictory. So it's all right with you if things are down for an hour without your knowing it, but not OK if something has been fixed and still shows as down in Nagios until the next normal check?

> So why should nagios wait for the next normal scheduled check if it is
> unknown if a service is OK?

Because that's what Nagios does. Once max_check_attempts is reached for *any* non-OK state, Nagios reverts back to normal_check_interval. That works well for most people and allows for a working monitoring system during major outages.

If that doesn't work for you, you could use an event handler to lower normal_check_interval upon reaching a hard non-OK state and raise it back to normal upon reaching a hard OK state.

http://www.nagios.org/developerinfo/externalcommands/commandlist.php

CHANGE_NORMAL_HOST_CHECK_INTERVAL
CHANGE_NORMAL_SVC_CHECK_INTERVAL

--
Marc

--
Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
-OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
-Strategies to boost innovation and cut costs with open source participation
-Receive a $600 discount off the registration fee with the source code: SFAD
http://p.sf.net/sfu/XcvMzF8H
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
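
The event-handler approach Marc suggests could look roughly like the sketch below. It only builds the external command line for CHANGE_NORMAL_SVC_CHECK_INTERVAL. The function name interval_command, the fast interval of 5 and the normal interval of 60 (both in interval_length units, one minute by default) are assumptions for illustration, not values from this thread. The state and state type would be passed in from the $SERVICESTATE$ and $SERVICESTATETYPE$ macros of the event handler definition.

```python
import time


def interval_command(host, service, state, state_type,
                     fast=5, normal=60, now=None):
    """Build a CHANGE_NORMAL_SVC_CHECK_INTERVAL external command line.

    Returns None for soft states, so the interval is only touched once
    a hard state has been reached. `fast` and `normal` are assumed values.
    """
    if state_type != "HARD":
        return None
    # Hard OK: restore the normal interval. Hard non-OK: check more often.
    interval = normal if state == "OK" else fast
    timestamp = int(time.time()) if now is None else int(now)
    return (f"[{timestamp}] CHANGE_NORMAL_SVC_CHECK_INTERVAL;"
            f"{host};{service};{interval}")


if __name__ == "__main__":
    # Hypothetical host and service names, for illustration only.
    print(interval_command("switch01", "SNMP Load", "UNKNOWN", "HARD"))
```

An event handler script would append the returned line, followed by a newline, to the external command file named by command_file in nagios.cfg.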
Re: [Nagios-users] UNKNOWN service state question
Hmm...
And why is UNKNOWN a hard state and not soft?

I would say, unknown is like critical and I need OK to know if everything works all right in a network.
So why should Nagios wait for the next normal scheduled check if it is unknown whether a service is OK?

Nicole

Thomas Guyot-Sionnest wrote:
> On 17/02/09 03:56 AM, Nicole Hähnel wrote:
>
>> Hi,
>>
>> I have several service checks via snmp which are checked only every hour.
>> These service checks return unknown if the host is down and nagios
>> doesn't know this yet.
>> If the host goes up the service state is still unknown in hard state and
>> is checked only once instead of the max_check_attempts of 3.
>> The problem is, it takes about one hour until the next service check or I
>> have to reschedule the next check
>> if I don't want the service to remain in the unknown state.
>>
>> Is this the right behavior for unknown states?
>> Why aren't unknown states treated as critical states?
>> The recovery of unknown state takes too long.
>>
>
> The recovery time is just the same as critical. The retry_interval is
> used only during SOFT NON-OK states.
>
> You should most likely check more often. Most people run checks every 5
> minutes, if not every minute. Hourly checks mean that it can take over an
> hour to detect a service failure.
>
> If you absolutely want that, you could use event handlers to force
> service checks upon host recovery, or configure adaptive monitoring so
> that services get checked more often during non-OK states.
>
> --
> Thomas
Re: [Nagios-users] UNKNOWN service state question
On 17/02/09 03:56 AM, Nicole Hähnel wrote:

> Hi,
>
> I have several service checks via snmp which are checked only every hour.
> These service checks return unknown if the host is down and nagios
> doesn't know this yet.
> If the host goes up the service state is still unknown in hard state and
> is checked only once instead of the max_check_attempts of 3.
> The problem is, it takes about one hour until the next service check or I
> have to reschedule the next check
> if I don't want the service to remain in the unknown state.
>
> Is this the right behavior for unknown states?
> Why aren't unknown states treated as critical states?
> The recovery of unknown state takes too long.

The recovery time is just the same as critical. The retry_interval is used only during SOFT NON-OK states.

You should most likely check more often. Most people run checks every 5 minutes, if not every minute. Hourly checks mean that it can take over an hour to detect a service failure.

If you absolutely want that, you could use event handlers to force service checks upon host recovery, or configure adaptive monitoring so that services get checked more often during non-OK states.

--
Thomas
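
As a rough sketch of the first option Thomas mentions, a host event handler could emit a SCHEDULE_FORCED_HOST_SVC_CHECKS external command once the host is back up. The function name forced_checks_command and the host name in the example are hypothetical; the host state and state type are assumed to come from the $HOSTSTATE$ and $HOSTSTATETYPE$ macros.

```python
import time


def forced_checks_command(host, host_state, state_type, now=None):
    """Build a SCHEDULE_FORCED_HOST_SVC_CHECKS external command line.

    Returns None unless the host has reached a hard UP state, so the
    services are only rechecked after a confirmed host recovery.
    """
    if host_state != "UP" or state_type != "HARD":
        return None
    timestamp = int(time.time()) if now is None else int(now)
    # The second timestamp is the requested check time: immediately.
    return f"[{timestamp}] SCHEDULE_FORCED_HOST_SVC_CHECKS;{host};{timestamp}"


if __name__ == "__main__":
    # Hypothetical host name, for illustration only.
    print(forced_checks_command("switch01", "UP", "HARD"))
```

As with any external command, the line would have to be written to the file configured as command_file in nagios.cfg.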
[Nagios-users] UNKNOWN service state question
Hi,

I have several service checks via snmp which are checked only every hour.
These service checks return unknown if the host is down and nagios doesn't know this yet.
If the host goes up the service state is still unknown in hard state and is checked only once instead of the max_check_attempts of 3.
The problem is, it takes about one hour until the next service check or I have to reschedule the next check if I don't want the service to remain in the unknown state.

Is this the right behavior for unknown states?
Why aren't unknown states treated as critical states?
The recovery of unknown state takes too long.

Thank you!
Nicole
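
The scheduling rule described in the replies to this question can be modelled in a few lines. This is a simplified sketch, not Nagios code: it only covers consecutive non-OK results, the function name schedule_non_ok is made up, and the retry_interval of 1 is an assumption (the hourly check interval and max_check_attempts of 3 come from the question).

```python
def schedule_non_ok(attempt, max_check_attempts=3,
                    normal_check_interval=60, retry_interval=1):
    """Return (state_type, minutes until the next check) for the
    attempt-th consecutive non-OK result, whatever the non-OK state is
    (WARNING, CRITICAL and UNKNOWN are treated the same).
    """
    if attempt < max_check_attempts:
        # Still soft: rechecked quickly at retry_interval.
        return ("SOFT", retry_interval)
    # Hard: back to the normal schedule, here one hour.
    return ("HARD", normal_check_interval)


if __name__ == "__main__":
    for attempt in range(1, 5):
        print(attempt, schedule_non_ok(attempt))
```

Once the state is hard, the model returns the normal interval for every further non-OK result, which matches the roughly one-hour wait described above.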