Hello,
I have been seeing some odd behavior with the retry interval. Since I
was using an older version of Nagios (3.5.1, to be precise), I decided
to try Naemon to see whether the same thing would happen, as I thought
this could be a long-solved bug.
To my surprise, the same behavior occurs on Naemon 1.0.6: it does not
respect the 1-minute retry interval, as you can see in the logs
below.
The monitoring has no latency, and this behavior happens across our
installations of different sizes.
define host {
    _HOST_ID              29
    host_name             ns3
    check_command         check-host-alive
    max_check_attempts    5
    check_interval        5
    check_period          24x7
    retry_interval        1
}
NAEMON 1.0.6
[Sat Feb 25 02:00:54 2017] HOST ALERT: ns3;DOWN;SOFT;1;CRITICAL - ns3.opservices.com.br: rta nan, lost 100%
[Sat Feb 25 02:01:16 2017] HOST ALERT: ns3;DOWN;SOFT;2;CRITICAL - ns3.opservices.com.br: rta nan, lost 100%
[Sat Feb 25 02:02:01 2017] HOST ALERT: ns3;DOWN;SOFT;3;CRITICAL - ns3.opservices.com.br: rta nan, lost 100%
[Sat Feb 25 02:02:15 2017] HOST ALERT: ns3;DOWN;SOFT;4;CRITICAL - ns3.opservices.com.br: rta nan, lost 100%
[Sat Feb 25 02:02:28 2017] HOST ALERT: ns3;DOWN;HARD;5;CRITICAL - ns3.opservices.com.br: rta nan, lost 100%
[Sat Feb 25 02:02:59 2017] HOST ALERT: ns3;UP;HARD;1;OK - ns3.opservices.com.br: , rta 167.203ms, lost 0%
NAGIOS 3.5.1
[Sat Feb 25 02:00:55 2017] HOST ALERT: ns3;DOWN;SOFT;1;CRITICAL - ns3.opservices.com.br: rta nan, lost 100%
[Sat Feb 25 02:01:29 2017] HOST ALERT: ns3;DOWN;SOFT;2;CRITICAL - ns3.opservices.com.br: rta nan, lost 100%
[Sat Feb 25 02:02:09 2017] HOST ALERT: ns3;DOWN;SOFT;3;CRITICAL - ns3.opservices.com.br: rta nan, lost 100%
[Sat Feb 25 02:02:29 2017] HOST ALERT: ns3;DOWN;SOFT;4;CRITICAL - ns3.opservices.com.br: rta nan, lost 100%
[Sat Feb 25 02:03:04 2017] HOST ALERT: ns3;DOWN;HARD;5;CRITICAL - ns3.opservices.com.br: rta nan, lost 100%
[Sat Feb 25 02:03:09 2017] HOST ALERT: ns3;UP;HARD;1;OK - ns3.opservices.com.br: , rta 171.734ms, lost 0%
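To make the gaps explicit, here is a small Python sketch that computes the
seconds between consecutive DOWN alerts in both logs and compares them with
the configured retry interval. It assumes interval_length is left at its
default of 60 seconds, so retry_interval 1 means 60 seconds; the names
OBSERVED and gaps_in_seconds are only for illustration. Note that these are
the times the alerts were logged, not the times the checks were scheduled.

```python
from datetime import datetime

# Timestamps copied from the HOST ALERT lines above (SOFT;1 through HARD;5).
OBSERVED = {
    "naemon 1.0.6": ["02:00:54", "02:01:16", "02:02:01", "02:02:15", "02:02:28"],
    "nagios 3.5.1": ["02:00:55", "02:01:29", "02:02:09", "02:02:29", "02:03:04"],
}

RETRY_INTERVAL = 1    # retry_interval from the host definition
INTERVAL_LENGTH = 60  # assumption: interval_length is the default 60 seconds


def gaps_in_seconds(stamps):
    """Return the seconds elapsed between each pair of consecutive timestamps."""
    times = [datetime.strptime(s, "%H:%M:%S") for s in stamps]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


def report():
    """Build one summary line per monitoring core."""
    expected = RETRY_INTERVAL * INTERVAL_LENGTH
    lines = []
    for name, stamps in OBSERVED.items():
        gaps = gaps_in_seconds(stamps)
        lines.append(f"{name}: gaps={gaps}s, expected about {expected}s each")
    return lines


if __name__ == "__main__":
    for line in report():
        print(line)
```

In both logs every gap between retries is shorter than 60 seconds.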
Any ideas?
Regards,
Alessandro Ren