Hello,

I was able to figure out what was wrong by enabling debug logging on the 
monitored machine. There I found this:

root@bravo:/etc/icinga2/conf.d# cat /var/log/icinga2/debug.log |grep check_procs
[2015-08-12 18:35:19 +0200] notice/Process: Running command '/usr/lib/nagios/plugins/check_procs' '-c' '400' '-w' '250': PID 748
[2015-08-12 18:35:19 +0200] notice/Process: PID 748 ('/usr/lib/nagios/plugins/check_procs' '-c' '400' '-w' '250') terminated with exit code 0
[2015-08-12 18:35:35 +0200] notice/Process: Running command '/usr/lib/nagios/plugins/check_procs' '-c' '400' '-w' '320': PID 750
[2015-08-12 18:35:35 +0200] notice/Process: PID 750 ('/usr/lib/nagios/plugins/check_procs' '-c' '400' '-w' '320') terminated with exit code 0
[2015-08-12 18:36:19 +0200] notice/Process: Running command '/usr/lib/nagios/plugins/check_procs' '-c' '400' '-w' '250': PID 784
[2015-08-12 18:36:19 +0200] notice/Process: PID 784 ('/usr/lib/nagios/plugins/check_procs' '-c' '400' '-w' '250') terminated with exit code 0

So it turned out that check_procs was somehow being run twice: once with the 
default thresholds and once with the correct ones. Looking a bit deeper, I 
found that every run with the correct "320" threshold was directly preceded by 
an ExecuteCommand message received from alpha, like this:

[2015-08-12 18:35:35 +0200] notice/ApiClient: Received 'event::ExecuteCommand' message from 'alpha.example.com'
[2015-08-12 18:35:35 +0200] notice/Process: Running command '/usr/lib/nagios/plugins/check_procs' '-c' '400' '-w' '320': PID 750

There was no such message before the runs with the 250 threshold.
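
For anyone who wants to check for the same pattern, here is a rough sketch of 
how the runs could be sorted automatically. It assumes the debuglog feature is 
enabled (icinga2 feature enable debuglog, then a restart) and that each log 
entry sits on one line in the file. The function name procs_runs is made up 
for this example, and "directly preceded" is taken literally, so treat the 
labels as a heuristic rather than proof of where a check came from.

```shell
#!/bin/sh
# Hypothetical helper (not part of Icinga 2): label each check_procs run in a
# debug log as "remote" if the line directly before it is an
# event::ExecuteCommand message, otherwise as "local".
procs_runs() {
    # $1: log file; the default is the path used earlier in this mail.
    awk '
        /Running command/ && /check_procs/ {
            origin = (prev ~ /event::ExecuteCommand/) ? "remote" : "local"
            print origin ": " $0
        }
        { prev = $0 }
    ' "${1:-/var/log/icinga2/debug.log}"
}
```

Running procs_runs with no argument reads /var/log/icinga2/debug.log; pass a 
different path as the first argument if your log lives elsewhere.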

Looking closer at the config, it turns out that there was a default service 
named "procs". I'm a bit hazy on exactly how, but I guess that must have caused 
some kind of clash: the remote-executed service checks ran with one set of 
parameters, the locally initiated ones ran with another, and the wrong check 
results were then sent back to the master.

The whole approach in our setup is that we don't want checks to be initiated 
locally, so that configuration was spurious and unintentional. We got rid of 
it, and after that everything works as expected.
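
In case it helps someone track down a similar leftover definition, this is a 
minimal sketch of how one might search the config tree for it. The function 
name find_service_definitions and the /etc/icinga2 default are assumptions for 
illustration; the search only matches "apply Service" or "object Service" 
followed by the quoted name on the same line.

```shell
#!/bin/sh
# Hypothetical helper: list every file and line under a config directory that
# declares a Service object or apply rule with the given name.
find_service_definitions() {
    # $1: service name, e.g. procs
    # $2: config directory; /etc/icinga2 is assumed as the default here
    grep -rnE "(apply|object)[[:space:]]+Service[[:space:]]+\"$1\"" "${2:-/etc/icinga2}"
}
```

For example, find_service_definitions procs prints each match as 
file:line:text, which shows whether a "procs" service is also defined in the 
local configuration of the monitored machine.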

Nice little gotcha, and shows the utility of checking debug logs on the 
monitored machine as well as the central one!

-----Original message-----
From: icinga-users [mailto:[email protected]] On behalf of Per von 
Zweigbergk
Sent: 12 August 2015 12:28
To: [email protected]
Subject: [icinga-users] Setting threshholds in the host definition?

Hi.

I have the following setup:

A "central" Icinga2 instance (let’s call it "alpha"), that all the monitored 
hosts connect back to, with their own local Icinga2 installs.

On the central icinga2 instance, a host (let’s call it "bravo") is defined like 
this:

template Host "generic-host" {
  max_check_attempts = 3
  check_interval = 5m
  retry_interval = 1m
  check_command = "hostalive"
}
object Host "bravo.example.com" {
  import "generic-host"
  address = "bravo.example.com"
  vars.os = "Linux"     
  vars.procs_warning = 320
  vars.procs_critical = 400
  vars.notification["mail"] = {
    groups = [ "icingaadmins" ]
  }
}

And it has a service applied like this:

template Service "generic-service" {
  max_check_attempts = 5
  check_interval = 5m
  retry_interval = 1m
}
template Service "remote-service" {
  import "generic-service"
  command_endpoint = host.name
}
apply Service "procs" {
  import "remote-service"
  check_command = "procs"
  assign where (host.vars.os == "Linux" || host.vars.os == "FreeBSD")
}

The CheckCommand for procs is the one that ships by default in 
/usr/share/icinga2/include/command-plugins.conf

Abbreviated below:

object CheckCommand "procs" {
        import "plugin-check-command"
        // ---- snip ----
        arguments = {
                "-w" = {
                        value = "$procs_warning$"
                        description = "Generate warning state if metric is outside this range"
                }
                "-c" = {
                        value = "$procs_critical$"
                        description = "Generate critical state if metric is outside this range"
                }
        // ---- snip ----
        }
        vars.procs_warning = 250
        vars.procs_critical = 400
}

What I'm expecting to happen, based on this documentation:

- http://docs.icinga.org/icinga2/snapshot/doc/module/icinga2/chapter/monitoring-basics#command-passing-parameters
- http://docs.icinga.org/icinga2/snapshot/doc/module/icinga2/chapter/monitoring-basics#macro-evaluation-order

... is that the variables "procs_warning" and "procs_critical" should override 
the defaults in the CheckCommand definition.

What actually happens is that I still get spurious warnings based on the 
default thresholds, for example, I have a warning if 

What am I missing in my understanding of how this is supposed to work?

-- 
IT-assistans Sverige AB
Per von Zweigbergk
Phone: +46 (0)8 522 192 15
Mail/Lync: [email protected]

♻ This e-mail was sent using 100% recycled electrons.

_______________________________________________
icinga-users mailing list
[email protected]
https://lists.icinga.org/mailman/listinfo/icinga-users