Re: [Nagios-users] UNKNOWN service state question

2009-02-17 Thread Marc Powell

On Feb 17, 2009, at 8:47 AM, Nicole Hähnel wrote:

> Hmm...
> And why is UNKNOWN a hard state and not soft?

Because it reached max_check_attempts. Nagios treats all non-OK states  
the same. If the plugin returns any non-OK state, the service is put  
into a soft state until max_check_attempts is reached, at which point  
the service becomes hard and the notification logic is processed. You  
can see this in nagios.log.
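
To make the soft/hard counting concrete, here is a rough Python sketch of
the rule described above. It is a simplified model, not Nagios source code:
the function name classify_checks is made up for illustration, and the
sketch deliberately ignores host state, flap detection and volatile
services.

```python
def classify_checks(results, max_check_attempts=3):
    """Label each check result as SOFT or HARD (simplified model).

    Every non-OK state (WARNING, CRITICAL, UNKNOWN) is counted the same way.
    Host state, flapping and volatile services are ignored here.
    """
    timeline = []
    attempt = 0              # consecutive non-OK results seen so far
    prev_state, prev_type = "OK", "HARD"
    for state in results:
        if state == "OK":
            # Recovering from a soft problem is a soft recovery;
            # any other OK result is a hard OK.
            soft_recovery = prev_state != "OK" and prev_type == "SOFT"
            state_type = "SOFT" if soft_recovery else "HARD"
            attempt = 0
        else:
            attempt = min(attempt + 1, max_check_attempts)
            state_type = "HARD" if attempt >= max_check_attempts else "SOFT"
        timeline.append((state, state_type))
        prev_state, prev_type = state, state_type
    return timeline


for entry in classify_checks(["OK", "UNKNOWN", "UNKNOWN", "UNKNOWN", "OK"]):
    print(entry)
```

Swapping UNKNOWN for CRITICAL in the input gives the same SOFT/HARD
sequence, which is the point: the state type depends on the attempt count,
not on which non-OK state the plugin returned.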

> I would say UNKNOWN is like CRITICAL, and I need OK to know that
> everything in the network is working all right.

This seems contradictory. Is it all right with you if things are down  
for an hour without your knowing it, but not OK if a problem has been  
fixed and still shows as down in Nagios until the next normal check?

> So why should Nagios wait for the next regularly scheduled check when it
> is unknown whether a service is OK?

Because that's what Nagios does. Once max_check_attempts is reached  
for *any* non-OK state, Nagios reverts to normal_check_interval.  
That works well for most people and allows for a working monitoring  
system during major outages. If that doesn't work for you, you could  
use an event handler to lower normal_check_interval upon reaching a  
hard non-OK state and raise it back to normal upon reaching a hard OK  
state.

http://www.nagios.org/developerinfo/externalcommands/commandlist.php

CHANGE_NORMAL_HOST_CHECK_INTERVAL
CHANGE_NORMAL_SVC_CHECK_INTERVAL
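
As a rough sketch of such an event handler in Python: the helper below
builds a CHANGE_NORMAL_SVC_CHECK_INTERVAL line for the external command
file. The command file path, the two interval values and the function names
are assumptions for illustration; adjust them to your installation. It
assumes the handler is called with $HOSTNAME$, $SERVICEDESC$,
$SERVICESTATE$ and $SERVICESTATETYPE$ as arguments.

```python
import time

# Assumptions: a typical source-install path and example intervals (minutes).
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"
FAST_INTERVAL = 5      # used while the service is in a hard non-OK state
NORMAL_INTERVAL = 60   # restored once the service is in a hard OK state


def build_command(host, service, state, state_type, now=None):
    """Return the external command line, or None for soft states."""
    if state_type != "HARD":
        return None  # only react once the state has become hard
    interval = NORMAL_INTERVAL if state == "OK" else FAST_INTERVAL
    timestamp = int(time.time() if now is None else now)
    return (f"[{timestamp}] CHANGE_NORMAL_SVC_CHECK_INTERVAL;"
            f"{host};{service};{interval}\n")


def send_command(command, command_file=COMMAND_FILE):
    """Write one command line to the Nagios external command file."""
    with open(command_file, "w") as handle:
        handle.write(command)


# Example: what the handler would send for a hard UNKNOWN state.
print(build_command("web01", "SNMP Load", "UNKNOWN", "HARD", now=1234567890), end="")
```

A wrapper script would pass its four command-line arguments to
build_command() and hand any non-None result to send_command().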

--
Marc


--
Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
-OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
-Strategies to boost innovation and cut costs with open source participation
-Receive a $600 discount off the registration fee with the source code: SFAD
http://p.sf.net/sfu/XcvMzF8H
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


Re: [Nagios-users] UNKNOWN service state question

2009-02-17 Thread Nicole Hähnel
Hmm...
And why is UNKNOWN a hard state and not soft?
I would say UNKNOWN is like CRITICAL, and I need OK to know that 
everything in the network is working all right.
So why should Nagios wait for the next regularly scheduled check when it 
is unknown whether a service is OK?


Nicole






Re: [Nagios-users] UNKNOWN service state question

2009-02-17 Thread Thomas Guyot-Sionnest

On 17/02/09 03:56 AM, Nicole Hähnel wrote:
> Hi,
> 
> I have several service checks via SNMP that run only once an hour.
> These service checks return UNKNOWN if the host is down and Nagios 
> doesn't know this yet.
> When the host comes back up, the service is still in a hard UNKNOWN state 
> and has been checked only once instead of the max_check_attempts of 3.
> The problem is that it takes about an hour until the next service check, 
> or I have to reschedule the next check myself
> if I don't want the service to remain in the UNKNOWN state.
> 
> Is this the right behavior for UNKNOWN states?
> Why aren't UNKNOWN states treated like CRITICAL states?
> Recovery from the UNKNOWN state takes too long.

The recovery time is just the same as for CRITICAL. The retry_interval is
used only during SOFT non-OK states.

You should most likely check more often. Most people run checks every 5
minutes, if not every minute. Hourly checks mean that it can take over an
hour to detect a service failure.
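
As a back-of-the-envelope illustration of that delay, the sketch below
computes the worst-case time from a failure to the hard state. The
max_check_attempts of 3 comes from the original post; the retry_interval of
1 minute and the function name are assumptions for the example.

```python
def worst_case_minutes_to_hard(check_interval, retry_interval, max_check_attempts):
    """Minutes from a failure until the hard state, in the worst case.

    Assumes the failure happens right after a successful check, intervals
    are given in minutes, and there is no scheduling latency.
    """
    first_detection = check_interval                        # wait for next normal check
    retries = (max_check_attempts - 1) * retry_interval     # remaining soft attempts
    return first_detection + retries


hourly = worst_case_minutes_to_hard(60, 1, 3)
five_minute = worst_case_minutes_to_hard(5, 1, 3)
print(hourly, five_minute)  # prints "62 7"
```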

If you absolutely want hourly checks, you could use event handlers to
force service checks upon host recovery, or configure adaptive monitoring
so that services get checked more often during non-OK states.
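
A minimal sketch of the first option, in Python: a host event handler that
builds a SCHEDULE_FORCED_HOST_SVC_CHECKS external command once the host is
back in a hard UP state. The function name is hypothetical, and it assumes
the handler receives $HOSTNAME$, $HOSTSTATE$ and $HOSTSTATETYPE$ as
arguments.

```python
import time


def build_host_recovery_command(host, state, state_type, now=None):
    """Return a command forcing all service checks on host, or None.

    Only a hard UP state triggers the command; the check time is "now".
    """
    if state != "UP" or state_type != "HARD":
        return None
    timestamp = int(time.time() if now is None else now)
    return f"[{timestamp}] SCHEDULE_FORCED_HOST_SVC_CHECKS;{host};{timestamp}\n"


print(build_host_recovery_command("web01", "UP", "HARD", now=1234567890), end="")
```

The returned line would then be written to the Nagios external command
file, whose location depends on the installation.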

--
Thomas



[Nagios-users] UNKNOWN service state question

2009-02-17 Thread Nicole Hähnel
Hi,

I have several service checks via SNMP that run only once an hour.
These service checks return UNKNOWN if the host is down and Nagios 
doesn't know this yet.
When the host comes back up, the service is still in a hard UNKNOWN state 
and has been checked only once instead of the max_check_attempts of 3.
The problem is that it takes about an hour until the next service check, 
or I have to reschedule the next check myself
if I don't want the service to remain in the UNKNOWN state.

Is this the right behavior for UNKNOWN states?
Why aren't UNKNOWN states treated like CRITICAL states?
Recovery from the UNKNOWN state takes too long.

Thank you!
Nicole
