Hi admins!
The situation is as follows: Debian 4.0 + Nagios 3.0b1.

I defined a service template for SSH:

########################################################
define service{
        name                            check-ssh-service       ; The 'name' of this service template
        check_command                   check_ssh
        service_description             SSH
        active_checks_enabled           1                       ; Active service checks are enabled
        passive_checks_enabled          1                       ; Passive service checks are enabled/accepted
        parallelize_check               1                       ; Active service checks should be parallelized (disabling this can lead to major performance problems)
        obsess_over_service             1                       ; We should obsess over this service (if necessary)
        check_freshness                 0                       ; Default is to NOT check service 'freshness'
        notifications_enabled           1                       ; Service notifications are enabled
        event_handler_enabled           1                       ; Service event handler is enabled
        flap_detection_enabled          1                       ; Flap detection is enabled
        failure_prediction_enabled      1                       ; Failure prediction is enabled
        process_perf_data               1                       ; Process performance data
        retain_status_information       1                       ; Retain status information across program restarts
        retain_nonstatus_information    1                       ; Retain non-status information across program restarts
        is_volatile                     0                       ; The service is not volatile
        check_period                    24x7                    ; The service can be checked at any time of the day
        max_check_attempts              3                       ; Re-check the service up to 3 times in order to determine its final (hard) state
        normal_check_interval           10                      ; Check the service every 10 minutes under normal conditions
        retry_check_interval            1                       ; Re-check the service every minute until a hard state can be determined
        contact_groups                  linux-admins            ; Notifications get sent out to everyone in the 'linux-admins' group
        notification_options            w,u,c,r                 ; Send notifications about warning, unknown, critical, and recovery events
        notification_interval           60                      ; Re-notify about service problems every hour
        notification_period             24x7                    ; Notifications can be sent out at any time
        register                        0                       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
        }
########################################################

Then I defined a hostgroup to which the SSH service is applied:

########################################################
define hostgroup{
        hostgroup_name  vpn-server
        alias           VPN-Gateways
        members         vpn-gw1-remote,vpn-gw1-local
        }
########################################################

Next, I defined a service that uses the SSH template and is applied to the group "vpn-server":

########################################################
define service{
        use                     check-ssh-service
        notes                   SSH auf Linux-Servern   ; German: "SSH on Linux servers"
        hostgroup_name          vpn-server
        service_description     SSH
        }
########################################################

Finally, I defined two service escalations for SSH (the intervals are this short only for testing purposes):

########################################################
define serviceescalation{
        hostgroup_name          vpn-server
        service_description     SSH
        first_notification      1
        last_notification       5
        notification_interval   3
        contact_groups          linux-admins
        }

define serviceescalation{
        hostgroup_name          vpn-server
        service_description     SSH
        first_notification      5
        last_notification       0
        notification_interval   10
        contact_groups          linux-admins
        }
########################################################

"interval_length" is set to 60 seconds in nagios.cfg. So far, so good.

1. PROBLEM:
===========

Now I get the following notifications (these are the syslog entries):

########################################################
Aug 30 13:05:38 unicorn nagios: SERVICE ALERT: vpn-gw1-local;SSH;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10 seconds
Aug 30 13:06:38 unicorn nagios: SERVICE ALERT: vpn-gw1-local;SSH;CRITICAL;SOFT;2;CRITICAL - Socket timeout after 10 seconds
Aug 30 13:07:38 unicorn nagios: SERVICE ALERT: vpn-gw1-local;SSH;CRITICAL;HARD;3;CRITICAL - Socket timeout after 10 seconds
Aug 30 13:07:38 unicorn nagios: SERVICE NOTIFICATION: linuxadmin;vpn-gw1-local;SSH;CRITICAL;notify-service-by-email;CRITICAL - Socket timeout after 10 seconds
Aug 30 13:17:38 unicorn nagios: SERVICE NOTIFICATION: linuxadmin;vpn-gw1-local;SSH;CRITICAL;notify-service-by-email;CRITICAL - Socket timeout after 10 seconds
Aug 30 13:27:38 unicorn nagios: SERVICE NOTIFICATION: linuxadmin;vpn-gw1-local;SSH;CRITICAL;notify-service-by-email;CRITICAL - Socket timeout after 10 seconds
Aug 30 13:37:38 unicorn nagios: SERVICE NOTIFICATION: linuxadmin;vpn-gw1-local;SSH;CRITICAL;notify-service-by-email;CRITICAL - Socket timeout after 10 seconds
Aug 30 13:47:39 unicorn nagios: SERVICE NOTIFICATION: linuxadmin;vpn-gw1-local;SSH;CRITICAL;notify-service-by-email;CRITICAL - Socket timeout after 10 seconds
Aug 30 13:57:38 unicorn nagios: SERVICE NOTIFICATION: linuxadmin;vpn-gw1-local;SSH;CRITICAL;notify-service-by-email;CRITICAL - Socket timeout after 10 seconds
Aug 30 14:07:38 unicorn nagios: SERVICE NOTIFICATION: linuxadmin;vpn-gw1-local;SSH;CRITICAL;notify-service-by-email;CRITICAL - Socket timeout after 10 seconds
########################################################

The notifications are sent once the HARD state is reached, right? So the first notification is the one at 13:07:38. OK. But according to my config, notifications 1 to 5 should be re-sent every 3 minutes. So why did the second and all later notifications arrive 10 minutes apart?
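To make the mismatch concrete, here is a minimal Python sketch that derives the schedule the two escalations seem to ask for and compares it with the SERVICE NOTIFICATION timestamps in the syslog excerpt. The function names are made up for illustration and are not Nagios internals. It also assumes that, where both escalations match (notification 5), the smaller interval applies.

```python
from datetime import datetime, timedelta

INTERVAL_LENGTH = 60  # seconds per interval unit ("interval_length" in nagios.cfg)

# The two serviceescalation definitions, reduced to
# (first_notification, last_notification, notification_interval).
ESCALATIONS = [(1, 5, 3), (5, 0, 10)]


def interval_after(n):
    """Interval units expected to pass after notification number n."""
    matching = [interval for first, last, interval in ESCALATIONS
                if n >= first and (last == 0 or n <= last)]
    # Assumption: if both escalations match (n == 5), the smaller interval applies.
    return min(matching)


def expected_times(first_time, count):
    """Times of the first `count` notifications, starting at first_time."""
    times = [first_time]
    for n in range(1, count):
        step = timedelta(seconds=interval_after(n) * INTERVAL_LENGTH)
        times.append(times[-1] + step)
    return times


def parse(hms):
    """Parse an HH:MM:SS string; the date part is irrelevant here."""
    return datetime.strptime(hms, "%H:%M:%S")


# SERVICE NOTIFICATION timestamps copied from the syslog excerpt.
observed = [parse(t) for t in ("13:07:38", "13:17:38", "13:27:38", "13:37:38",
                               "13:47:39", "13:57:38", "14:07:38")]

expected = expected_times(observed[0], 5)
observed_gaps = [int((b - a).total_seconds())
                 for a, b in zip(observed, observed[1:])]

print("expected:", [t.strftime("%H:%M:%S") for t in expected])
print("observed gaps (s):", observed_gaps)
```

The expected list steps in 3-minute increments from 13:07:38, while every observed gap is within a second of 600 seconds, i.e. 10 minutes.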
I see only one place where a value of 10 minutes is set: the "normal_check_interval". (15 minutes later: OK, OK, I see that the escalation for notification 5 onwards is also set to 10 minutes... but that should not affect the notification interval for the first 4 messages. Theoretically... :-\ )

So, as I understand the problem, the "notification_interval" from the "serviceescalation" is ignored! What could help here?

P.S.: I checked the config of the service escalations via the web GUI, and the "Notification Interval" shown for my two escalations is "0"! HOW CAN THAT BE!?!?

2. PROBLEM:
===========

Furthermore, the "notification_interval" in the service definition is described as "Re-notify about service problems every XXX". Note: "about service problems". Now, if I set the notification_interval to a value lower than the "normal_check_interval", e.g. "9", I get the following warning messages from the Nagios pre-flight check:

########################################################
Warning: Service 'SSH' on host 'vpn-gw1-local' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Warning: Service 'SSH' on host 'vpn-gw1-remote' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
########################################################

But, hell, what does the "normal_check_interval" have to do with the "notification_interval"?! These two are completely different things! Or have I misunderstood something?

People - HELP!!

Thanks.
Ilya.
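On the second problem: read literally, the pre-flight warning describes a simple timing model, sketched below in Python. This is only an interpretation of the warning text, not Nagios source code. The function name, its parameters, and the `horizon` cut-off are hypothetical.

```python
def renotification_minutes(check_interval, notification_interval, horizon):
    """Minutes after the first notification at which re-notifications go out.

    Assumed model (taken from the warning text, not from the Nagios source):
    a re-notification can only be sent when a check has just run, and only
    if at least notification_interval minutes have passed since the last one.
    """
    sent = [0]  # the first notification, at minute 0
    for minute in range(check_interval, horizon + 1, check_interval):
        if minute - sent[-1] >= notification_interval:
            sent.append(minute)
    return sent


# notification_interval 9 with normal_check_interval 10, as in the warning
print(renotification_minutes(10, 9, 40))    # [0, 10, 20, 30, 40]
# the template values: notification_interval 60, normal_check_interval 10
print(renotification_minutes(10, 60, 120))  # [0, 60, 120]
```

Under this model, any notification interval shorter than the check interval ends up spaced at the check interval. Whether the same rule also applies to the "notification_interval" of a serviceescalation is not something the warning states.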
_______________________________________________
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users