Re: [Nagios-users] eventhandlers running when a dependent service dependency is not satisfied

2005-12-09 Thread Eli Stair


Thanks a million for pointing out the 'SCHEDULE_FORCED_SVC_CHECK', I'm 
now rewriting and testing the event handlers to take care of this.  If 
only there were a macro/variable of the master service... looking for a 
lightweight way to determine the service_description to pass to the 
macro that is the direct parent of the check that just failed.


WRT the SSH/SNMP dependency issue, I have a feeling that I'm missing 
something here altogether, or didn't include enough info in my initial 
report, as both you and Hugo mentioned a possible issue with this.


To be clear, I'm doing this only so that if a dependent service IS down 
(Ganglia) and SNMP has been shown to be up (after 
'SCHEDULE_FORCED_SVC_CHECK',) I need to (or want to) make sure that SSH 
is running before attempting to connect.  There are enough failure modes 
that occur causing SSH to die at the same time as other services that I 
want to avoid a bunch of high-latency/timeout/CPU event handlers running 
if they are bound to fail.


Thanks for the accurate pointer to that macro,

Cheers,

/eli


Here's the output of view config showing that it is configured the way I 
think... just not sure if that is something I don't want to do :)



Dependent Host  Dependent Service                       Master Host    Master Service  Dependency Type  Dependency Failure Options
deathstar1001   SNMP-- Ganglia running                  deathstar1001  SNMP            Notification     Warning, Unknown, Critical, Pending
deathstar1001   SNMP-- Ganglia running                  deathstar1001  SNMP            Check Execution  Warning, Unknown, Critical, Pending
deathstar1001   SNMP-- NTP running                      deathstar1001  SNMP            Notification     Warning, Unknown, Critical, Pending
deathstar1001   SNMP-- NTP running                      deathstar1001  SNMP            Check Execution  Warning, Unknown, Critical, Pending
deathstar1001   SNMP-- cron running                     deathstar1001  SNMP            Notification     Warning, Unknown, Critical, Pending
deathstar1001   SNMP-- cron running                     deathstar1001  SNMP            Check Execution  Warning, Unknown, Critical, Pending
deathstar1001   SNMP-- automounter running 4 instances  deathstar1001  SNMP            Notification     Warning, Unknown, Critical, Pending
deathstar1001   SNMP-- automounter running 4 instances  deathstar1001  SNMP            Check Execution  Warning, Unknown, Critical, Pending
deathstar1001   SNMP-- load -lt 4                       deathstar1001  SNMP            Notification     Warning, Unknown, Critical, Pending
deathstar1001   SNMP-- load -lt 4                       deathstar1001  SNMP            Check Execution  Warning, Unknown, Critical, Pending
deathstar1001   SNMP                                    deathstar1001  SSH             Notification     Warning, Unknown, Critical, Pending
deathstar1001   SNMP                                    deathstar1001  SSH             Check Execution  Warning, Unknown, Critical, Pending


John P. Rouillard wrote:

Hi Eli:

You didn't say what version of nagios you are running so I'll assume
2.0.

In message [EMAIL PROTECTED],
Eli Stair writes:


The question comes down to this:

 Should a failed service check for a dependent trigger a check of its 
parent before continuing?



IIRC from the code it does not force a check of the parent service. I
can see arguments for and against forcing a poll of the parent. Also
the documentation:

  http://nagios.sourceforge.net/docs/2_0/dependencies.html

in the How Service Dependencies Are Tested section, says:

  Nagios gets the current status of the service that is being depended upon.

not "Nagios re-polls the service being depended upon". A footnote
says:

  by default, Nagios will use the most current hard state of the
  service(s) that is/are being depended upon

an option in the config file will allow it to use the current soft
state instead. I use the soft state of the service being depended upon
myself.



If this is not the case, or default, is there _ANY_ way to implement this?



Sort of. The event handler for the child can send a
SCHEDULE_FORCED_SVC_CHECK external command for the parent specifying
the current time in seconds. See

 
http://www.nagios.org/developerinfo/externalcommands/commandinfo.php?command_id=129

for details. The command will be acted upon immediately since nagios
reads the external command file after an event handler runs. Use this
to force an update of the current service status for the parent. Parse
through the objects.cache (probably in /var/log/nagios/objects.cache)
file for the expanded servicedependency objects to find the service
dependencies that match your host/service.
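
A minimal sketch of how an event handler might build and submit that external command, written in Python purely for illustration. The line format (bracketed timestamp, command name, then semicolon-separated host, service, and check time) is the standard external command syntax; the function names and the command file path in the usage note are assumptions and will differ per site.

```python
import time


def format_forced_check(host, service, when=None, now=None):
    """Build one SCHEDULE_FORCED_SVC_CHECK line for the external command file.

    host/service name the PARENT service to re-check. `when` is the
    requested check time in epoch seconds (defaults to "now").
    """
    now = int(time.time()) if now is None else int(now)
    when = now if when is None else int(when)
    return "[%d] SCHEDULE_FORCED_SVC_CHECK;%s;%s;%d\n" % (now, host, service, when)


def submit_command(line, command_file):
    """Write one command line to the Nagios command file.

    command_file is site-specific (the command_file setting in nagios.cfg).
    It is normally a named pipe, so this open() blocks if Nagios is not
    running and reading from it.
    """
    with open(command_file, "w") as pipe:
        pipe.write(line)
```

From a handler for "SNMP-- Ganglia running" this could be called as submit_command(format_forced_check("deathstar1001", "SNMP"), "/usr/local/nagios/var/rw/nagios.cmd"), where the path is only a guess at a typical install location.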

I set my nagios options so that:
  
	max_check_attempts(dependent) * retry_check_interval(dependent)
	    > normal_check_interval(parent)

This way the parent service will be checked at least once during the
soft error interval of the dependent service.
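
As a quick sanity check of that rule, here is a small Python sketch. It reads the rule as "the dependent's soft error window must be longer than the parent's normal check interval", which is an interpretation of the paragraph above; the function name is hypothetical, and all intervals are assumed to be in the same units.

```python
def parent_checked_during_soft_window(max_check_attempts,
                                      retry_check_interval,
                                      parent_normal_check_interval):
    """Return True if the dependent's soft error window outlasts one
    normal check interval of the parent service."""
    soft_window = max_check_attempts * retry_check_interval
    return soft_window > parent_normal_check_interval
```

For example, a dependent with 4 attempts retried every 2 minutes gives an 8 minute window, which covers a parent checked every 5 minutes; 3 attempts at 1 minute would not.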



I want to avoid at all costs having an every-minute check of the parent
processes on many thousand hosts just to keep from having the child
process checks and event handlers going hay-wire.



You need to use the max_check_attempts to provide a buffer in which
the parent service will be checked. You can have your event handler
submit an external command on the first soft error and try to fix the
problem on a subsequent soft, or hard error. You don't have any of

Re: [Nagios-users] eventhandlers running when a dependent service dependency is not satisfied

2005-12-09 Thread John P. Rouillard

In message [EMAIL PROTECTED],
Eli Stair writes:
Thanks a million for pointing out the 'SCHEDULE_FORCED_SVC_CHECK', I'm 
now rewriting and testing the event handlers to take care of this.  If 
only there were a macro/variable of the master service... looking for a 
lightweight way to determine the service_description to pass to the 
macro that is the direct parent of the check that just failed.

One problem is that there can be multiple parents. It's a many-to-many
relationship. Parsing it from the objects cache is a pain, but it works.
You could run a script that inverts the objects.cache file for faster
lookup. Your plugin runs the script, which updates the inverted cache
only if it is older than objects.cache, and then you query the inverted
cache file.
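
A rough Python sketch of that inversion, for illustration only. It assumes the servicedependency blocks in objects.cache look like "define servicedependency {" followed by one "name value" pair per line and a closing "}", with host_name/service_description naming the parent and dependent_host_name/dependent_service_description naming the child. Check this against your own objects.cache before relying on it. The function names and the pickle-based cache format are made up for the example.

```python
import os
import pickle


def parse_service_dependencies(text):
    """Return {(child_host, child_service): [(parent_host, parent_service), ...]}."""
    parents = {}
    block = None  # dict of fields while inside a servicedependency block
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("define servicedependency"):
            block = {}
        elif line == "}" and block is not None:
            child = (block.get("dependent_host_name"),
                     block.get("dependent_service_description"))
            parent = (block.get("host_name"), block.get("service_description"))
            if None not in child and None not in parent:
                found = parents.setdefault(child, [])
                if parent not in found:  # notification and execution entries repeat
                    found.append(parent)
            block = None
        elif block is not None and line:
            # Split on the first run of whitespace so service names keep their spaces.
            parts = line.split(None, 1)
            if len(parts) == 2:
                block[parts[0]] = parts[1].strip()
    return parents


def load_parent_map(objects_cache, inverted_cache):
    """Rebuild the inverted cache only when it is older than objects.cache."""
    stale = (not os.path.exists(inverted_cache)
             or os.path.getmtime(inverted_cache) < os.path.getmtime(objects_cache))
    if stale:
        with open(objects_cache) as fh:
            parents = parse_service_dependencies(fh.read())
        with open(inverted_cache, "wb") as fh:
            pickle.dump(parents, fh)
        return parents
    with open(inverted_cache, "rb") as fh:
        return pickle.load(fh)
```

The handler would then look up (host, service) for the check that just failed and get back a list, since a child may have more than one parent.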

WRT the SSH/SNMP dependency issue, I have a feeling that I'm missing 
something here altogether, or didn't include enough info in my initial 
report, as both you and Hugo mentioned a possible issue with this.

To be clear, I'm doing this only so that if a dependent service IS down 
(Ganglia) and SNMP has been shown to be up (after 
'SCHEDULE_FORCED_SVC_CHECK',) I need to (or want to) make sure that SSH 
is running before attempting to connect.  There are enough failure modes 
that occur causing SSH to die at the same time as other services that I 
want to avoid a bunch of high-latency/timeout/CPU event handlers running 
if they are bound to fail.

SSH isn't required to do the monitoring. It's required for the
response. I would just handle the error in the event handler and
submit an appropriate passive response. Make the service have no valid
polling time and be volatile, reporting only on warning and
critical. This will make errors in the event handler be reported.
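
A sketch of what submitting such a passive result could look like, in Python for illustration. PROCESS_SERVICE_CHECK_RESULT takes host, service, return code, and plugin output separated by semicolons; the function name and the "Ganglia handler" service used in the usage note are hypothetical, and the passive-only volatile service itself still has to be defined in the Nagios config.

```python
import time

# Standard plugin return codes.
STATE_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def format_passive_result(host, service, state, output, now=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT line for the external command file."""
    now = int(time.time()) if now is None else int(now)
    # Each command must fit on a single line, so flatten any newlines.
    output = " ".join(output.splitlines())
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        now, host, service, STATE_CODES[state], output)
```

An event handler that fails to connect over SSH might write format_passive_result("deathstar1001", "Ganglia handler", "CRITICAL", "ssh connect failed") to the command file instead of failing silently.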

-- rouilj
John Rouillard
===
My employers don't acknowledge my existence much less my opinions.


