On 2009-06-03 12:03, Duncan Ferguson wrote:
> Because the check is automatically added at reload time rather than
> configured via the interface, you need to amend the nagconfgen.pl
> script within nagios/bin (around line 470)
Thanks for the info.
I have just been looking at the check_opsview_slave_node script and I
can see that it...
1. checks the time on the master
2. connects to the slave and checks its time
3. evaluates the difference against a threshold and alerts if over
that threshold.
The problem is that $now on the master is defined before running the
ssh command on the slave, whereas it should be defined after.
As it stands, the time it takes to run the ssh command is added to the
difference between the time on the master and the time on the slave,
which is incorrect and leads to false positives.
For example, here is the output (with debug added) of the
check_opsview_slave_node script as it stands now, with $now on the
master being defined prior to connecting to the slave...
[nag...@opsview]$ ./check_opsview_slave_node slave01
masternow=1244035619
slavenow=1244035625
difference=-6
duration=6.8445510864258
time=6.84455s;;
...whereas here is the output with $now defined after connecting to the slave...
[nag...@opsview]$ ./check_opsview_slave_node slave01
masternow=1244035671
slavenow=1244035671
difference=0
duration=6.65824294090271
time=6.65824s;;
As you can see the second version provides the correct output - that
regardless of how long it took to connect to the slave there is in
fact no difference between its time and that of the master.
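The effect can also be reproduced without a real slave. What follows is
only a sketch, not code from the plugin: fake_slave_time, the simulated
$clock and the two skew_* subs are names I have made up for illustration,
and it assumes the slave reports its time just as the ssh command
finishes and that both clocks are actually in sync.

```perl
use strict;
use warnings;

# Simulated epoch seconds shared by "master" and "slave" (i.e. no real skew).
my $clock = 1_000_000;

# Hypothetical stand-in for the ssh round trip: $delay seconds pass while
# the command runs, then the slave's current time is returned.
sub fake_slave_time {
    my ($delay) = @_;
    $clock += $delay;
    return $clock;
}

# Ordering used by the script as it stands: master time sampled first.
sub skew_sampled_before {
    my ($delay) = @_;
    my $now   = $clock;
    my $slave = fake_slave_time($delay);
    return $now - $slave;
}

# Ordering after the change: master time sampled once the slave has answered.
sub skew_sampled_after {
    my ($delay) = @_;
    my $slave = fake_slave_time($delay);
    my $now   = $clock;
    return $now - $slave;
}

print "before: ", skew_sampled_before(6), "\n";   # prints "before: -6"
print "after:  ", skew_sampled_after(6), "\n";    # prints "after:  0"
```

With a 6 second round trip the first ordering reports a difference of -6
even though the clocks agree, which matches the debug output above.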
To make this change here's the diff...
[nag...@opsview]$ diff check_opsview_slave_node check_opsview_slave_node_new
77d76
< my $now = time;
81a81
> my $now = time;
I have merely shifted the $now below the ssh command to the slave so
it now reads...
my @cmd = $slavenode->ssh_command("/usr/local/nagios/bin/retrieve_opsview_info");
open F,"-|",@cmd or $np->nagios_exit(CRITICAL, "Cannot run ssh command on master to slave");
my $info = <F>;
close F or $np->nagios_exit(CRITICAL, "Error retrieving slave information - slave is likely to be down");
my $now = time;
chomp $info;
I hope this is valid, makes sense, and is of help to others :-)