Hello,
I figured out what was wrong by enabling debug logging on the monitored
machine. There I found this:
root@bravo:/etc/icinga2/conf.d# cat /var/log/icinga2/debug.log |grep check_procs
[2015-08-12 18:35:19 +0200] notice/Process: Running command
'/usr/lib/nagios/plugins/check_procs' '-c' '400' '-w' '250': PID 748
[2015-08-12 18:35:19 +0200] notice/Process: PID 748
('/usr/lib/nagios/plugins/check_procs' '-c' '400' '-w' '250') terminated with
exit code 0
[2015-08-12 18:35:35 +0200] notice/Process: Running command
'/usr/lib/nagios/plugins/check_procs' '-c' '400' '-w' '320': PID 750
[2015-08-12 18:35:35 +0200] notice/Process: PID 750
('/usr/lib/nagios/plugins/check_procs' '-c' '400' '-w' '320') terminated with
exit code 0
[2015-08-12 18:36:19 +0200] notice/Process: Running command
'/usr/lib/nagios/plugins/check_procs' '-c' '400' '-w' '250': PID 784
[2015-08-12 18:36:19 +0200] notice/Process: PID 784
('/usr/lib/nagios/plugins/check_procs' '-c' '400' '-w' '250') terminated with
exit code 0
So it turned out that check_procs was being run twice: once with the default
thresholds and once with the correct ones. Looking a bit deeper, I found that
every run with the correct "320" threshold was directly preceded by an
ExecuteCommand message received from alpha, like this:
[2015-08-12 18:35:35 +0200] notice/ApiClient: Received 'event::ExecuteCommand'
message from 'alpha.example.com'
[2015-08-12 18:35:35 +0200] notice/Process: Running command
'/usr/lib/nagios/plugins/check_procs' '-c' '400' '-w' '320': PID 750
There was no corresponding message for the runs with the 250 threshold.
Looking closer at the config, it turns out that there was a default service
named "procs". I'm a bit hazy on exactly how, but I guess that must have caused
some kind of clash: the remote-executed service checks were running with one
set of parameters and the locally initiated ones with another, and the wrong
check results were then sent back to the master.
Our setup is not supposed to initiate any checks locally, so that configuration
was spurious and unintentional, and we simply removed it. After that,
everything works as expected.
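For illustration, the spurious local rule may have looked roughly like the
sketch below. This is an assumption modelled on the sample configuration that
ships with Icinga 2, not a copy of what was actually on bravo; the file
location and the assign expression in particular are guesses.

```icinga2
// Hypothetical sketch of a stock rule in conf.d/services.conf on bravo.
// Assumption: it resembles the sample configuration shipped with Icinga 2.
apply Service "procs" {
  import "generic-service"

  // Same CheckCommand as the rule on alpha, but scheduled locally on bravo.
  check_command = "procs"

  // NodeName is the local node's own name, so this rule would only match
  // the Host object that the client defines for itself.
  assign where host.name == NodeName
}
```

If the local Host object does not set vars.procs_warning, a check scheduled by
such a rule would fall back to the CheckCommand default of 250, which would
match the '-w' '250' runs in the log. Removing the rule and reloading Icinga 2
on the client leaves only the runs triggered from alpha.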
Nice little gotcha, and shows the utility of checking debug logs on the
monitored machine as well as the central one!
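For anyone wanting to do the same: the debug log is written by a FileLogger
object. A minimal sketch, assuming the stock debuglog feature; the object name
"debug-file" is an assumption, and the path is the one from the log excerpt
above.

```icinga2
// Sketch of a debug logger definition (assumed to match the stock
// debuglog feature; adjust the path to your installation).
object FileLogger "debug-file" {
  // Log everything at severity "debug" and above, which includes the
  // notice/Process and notice/ApiClient lines quoted in this mail.
  severity = "debug"
  path = "/var/log/icinga2/debug.log"
}
```

The debug log grows quickly, so it is probably best to disable it again once
the problem has been found.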
-----Original message-----
From: icinga-users [mailto:[email protected]] On behalf of Per von
Zweigbergk
Sent: 12 August 2015 12:28
To: [email protected]
Subject: [icinga-users] Setting threshholds in the host definition?
Hi.
I have the following setup:
A "central" Icinga2 instance (let’s call it "alpha"), that all the monitored
hosts connect back to, with their own local Icinga2 installs.
On the central icinga2 instance, a host (let’s call it "bravo") is defined like
this:
template Host "generic-host" {
  max_check_attempts = 3
  check_interval = 5m
  retry_interval = 1m
  check_command = "hostalive"
}
object Host "bravo.example.com" {
  import "generic-host"
  address = "bravo.example.com"
  vars.os = "Linux"
  vars.procs_warning = 320
  vars.procs_critical = 400
  vars.notification["mail"] = {
    groups = [ "icingaadmins" ]
  }
}
And it has a service applied like this:
template Service "generic-service" {
  max_check_attempts = 5
  check_interval = 5m
  retry_interval = 1m
}
template Service "remote-service" {
  import "generic-service"
  command_endpoint = host.name
}
apply Service "procs" {
  import "remote-service"
  check_command = "procs"
  assign where (host.vars.os == "Linux" || host.vars.os == "FreeBSD")
}
The CheckCommand for procs is the one that ships by default in
/usr/share/icinga2/include/command-plugins.conf, abbreviated below:
object CheckCommand "procs" {
  import "plugin-check-command"
  // ---- snip ----
  arguments = {
    "-w" = {
      value = "$procs_warning$"
      description = "Generate warning state if metric is outside this range"
    }
    "-c" = {
      value = "$procs_critical$"
      description = "Generate critical state if metric is outside this range"
    }
    // ---- snip ----
  }
  vars.procs_warning = 250
  vars.procs_critical = 400
}
What I'm expecting to happen, based on this documentation:
- http://docs.icinga.org/icinga2/snapshot/doc/module/icinga2/chapter/monitoring-basics#command-passing-parameters
- http://docs.icinga.org/icinga2/snapshot/doc/module/icinga2/chapter/monitoring-basics#macro-evaluation-order
... is that the variables "procs_warning" and "procs_critical" set on the host
should override the defaults in the CheckCommand definition.
What actually happens is that I still get spurious warnings based on the
default thresholds. For example, I have a warning if
What am I missing in my understanding of how this is supposed to work?
--
IT-assistans Sverige AB
Per von Zweigbergk
Phone: +46 (0)8 522 192 15
Mail/Lync: [email protected]
♻ This e-mail was sent using 100% recycled electrons.
_______________________________________________
icinga-users mailing list
[email protected]
https://lists.icinga.org/mailman/listinfo/icinga-users