Samuel Bancal wrote:
> Nagios Core 3.2.0
> nagios-plugins-1.4.14
> Ubuntu server 8.04.3 LTS
>
> Hi,
>
> I'm encountering problems to configure the notifications in case a
> server is no more responding to PING (ICMP).
> I don't understand why Nagios is jumping over steps when it's doing
> service-check "icmp".
> Here is the config :
>
> define host{
>     use                     generic-server
>     host_name               server1
>     alias                   server1
>     address                 the.ip.the.ip
>     hostgroups              prod-servers
>     contact_groups          group1
>     check_command           check-host-alive
>     check_period            24x7
>     check_interval          5
>     retry_interval          1
>     max_check_attempts      4
>     notification_period     24x7
>     notification_interval   60
>     notification_options    d,u,r
> }
>
> define service{
>     use                     generic-service
>     host_name               server1
>     service_description     ICMP
>     check_command           check_icmp!100.0,20%!500.0,60%
>     max_check_attempts      4
>     normal_check_interval   5
>     retry_check_interval    1
>     notification_options    w,u,c,r
>     notification_interval   60
>     notification_period     24x7
> }
> [...]
> define command{
>     command_name    check-host-alive
>     command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
> }
> define command{
>     command_name    check_icmp
>     command_line    $USER1$/check_icmp -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
> }
> [...]
>
> Here is an example of history that I get :
> Service Critical[2010-02-16 11:33:13] SERVICE ALERT: server1;ICMP;CRITICAL;SOFT;1;CRITICAL - the.ip.the.ip: rta nan, lost 100%
> Host Down[2010-02-16 11:33:43] HOST ALERT: server1;DOWN;SOFT;1;(Host Check Timed Out)
> Service Critical[2010-02-16 11:34:13] SERVICE ALERT: server1;ICMP;CRITICAL;HARD;1;CRITICAL - the.ip.the.ip: rta nan, lost 100%
> Host Down[2010-02-16 11:34:43] HOST ALERT: server1;DOWN;SOFT;2;(Host Check Timed Out)
> Host Down[2010-02-16 11:35:23] HOST ALERT: server1;DOWN;SOFT;3;(Host Check Timed Out)
> Host Down[2010-02-16 11:36:33] HOST ALERT: server1;DOWN;HARD;4;(Host Check Timed Out)
> Host Up[2010-02-16 11:37:43] HOST ALERT: server1;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 0.67 ms
> Service Ok[2010-02-16 11:39:13] SERVICE ALERT: server1;ICMP;OK;HARD;1;OK - the.ip.the.ip: rta 0.943ms, lost 0%
>
> Or later :
> Host Down[2010-02-16 11:42:03] HOST ALERT: server1;DOWN;SOFT;1;(Host Check Timed Out)
> Host Down[2010-02-16 11:43:13] HOST ALERT: server1;DOWN;SOFT;2;(Host Check Timed Out)
> Service Critical[2010-02-16 11:44:13] SERVICE ALERT: server1;ICMP;CRITICAL;HARD;1;CRITICAL - the.ip.the.ip: rta nan, lost 100%
> Host Down[2010-02-16 11:44:43] HOST ALERT: server1;DOWN;SOFT;3;(Host Check Timed Out)
> Host Up[2010-02-16 11:45:53] HOST ALERT: server1;UP;SOFT;4;PING OK - Packet loss = 0%, RTA = 0.64 ms
> Service Ok[2010-02-16 11:49:13] SERVICE ALERT: server1;ICMP;OK;HARD;1;OK - the.ip.the.ip: rta 0.948ms, lost 0%

If you're asking why Nagios runs a host check when it sees the service fail a check, that's normal behavior. When a service check fails, the first thing Nagios will do is look to see if the service failed because the host is down.
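
To make the "jumping over steps" in the quoted history easier to follow, here is a rough Python sketch of how the state type of a failing service check could be decided. It is a simplified model based on the rules in the Nagios "State Types" documentation, not Nagios's actual code; the function name `service_state_type` and its parameters are made up for illustration.

```python
def service_state_type(attempt: int, max_check_attempts: int, host_state: str) -> str:
    """Return "SOFT" or "HARD" for a non-OK service check result.

    Simplified model (an assumption, not the real Nagios implementation):
    - if the host is already DOWN or UNREACHABLE, the service problem is
      treated as HARD right away, without using up the retries;
    - otherwise the problem stays SOFT until max_check_attempts is reached.
    """
    if host_state in ("DOWN", "UNREACHABLE"):
        return "HARD"
    if attempt >= max_check_attempts:
        return "HARD"
    return "SOFT"


if __name__ == "__main__":
    # 11:33:13 in the quoted log: first failed ICMP check, host still seen as UP.
    print(service_state_type(attempt=1, max_check_attempts=4, host_state="UP"))
    # 11:34:13 in the quoted log: the host check has meanwhile reported DOWN.
    print(service_state_type(attempt=1, max_check_attempts=4, host_state="DOWN"))
```

Under this model the ICMP service reaches CRITICAL;HARD while its attempt counter is still 1, because the host was already considered down when the service result was processed. That matches the 11:34:13 and 11:44:13 entries in the quoted history.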