Hi admins!

The setup: Debian 4.0 with Nagios 3.0b1.


I defined a service template for SSH:

########################################################
define service{
        name                            check-ssh-service       ; The 'name' of this service template
        check_command                   check_ssh
        service_description             SSH
        active_checks_enabled           1                       ; Active service checks are enabled
        passive_checks_enabled          1                       ; Passive service checks are enabled/accepted
        parallelize_check               1                       ; Active service checks should be parallelized (disabling this can lead to major performance problems)
        obsess_over_service             1                       ; We should obsess over this service (if necessary)
        check_freshness                 0                       ; Default is to NOT check service 'freshness'
        notifications_enabled           1                       ; Service notifications are enabled
        event_handler_enabled           1                       ; Service event handler is enabled
        flap_detection_enabled          1                       ; Flap detection is enabled
        failure_prediction_enabled      1                       ; Failure prediction is enabled
        process_perf_data               1                       ; Process performance data
        retain_status_information       1                       ; Retain status information across program restarts
        retain_nonstatus_information    1                       ; Retain non-status information across program restarts
        is_volatile                     0                       ; The service is not volatile
        check_period                    24x7                    ; The service can be checked at any time of the day
        max_check_attempts              3                       ; Re-check the service up to 3 times in order to determine its final (hard) state
        normal_check_interval           10                      ; Check the service every 10 minutes under normal conditions
        retry_check_interval            1                       ; Re-check the service every minute until a hard state can be determined
        contact_groups                  linux-admins            ; Notifications get sent out to everyone in the 'linux-admins' group
        notification_options            w,u,c,r                 ; Send notifications about warning, unknown, critical, and recovery events
        notification_interval           60                      ; Re-notify about service problems every hour
        notification_period             24x7                    ; Notifications can be sent out at any time
        register                        0                       ; DON'T REGISTER THIS DEFINITION - IT'S NOT A REAL SERVICE, JUST A TEMPLATE!
        }
########################################################


Then I defined a hostgroup to which the SSH service is applied:

########################################################
define hostgroup{
        hostgroup_name  vpn-server
        alias       VPN-Gateways
        members     vpn-gw1-remote,vpn-gw1-local
        }
########################################################


Next I defined a service that uses the SSH template and
is applied to the group "vpn-server":

########################################################
define service{
        use                     check-ssh-service
        notes                   SSH auf Linux-Servern
        hostgroup_name          vpn-server
        service_description     SSH
        }
########################################################


Finally, I defined two service escalations for SSH
(I set the intervals this short only for testing purposes):
########################################################
define serviceescalation{
        hostgroup_name          vpn-server
        service_description     SSH
        first_notification      1
        last_notification       5
        notification_interval   3
        contact_groups          linux-admins
        }

define serviceescalation{
        hostgroup_name          vpn-server
        service_description     SSH
        first_notification      5
        last_notification       0
        notification_interval   10
        contact_groups          linux-admins
        }
########################################################
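
To spell out what I expect from these two definitions, here is a small Python
sketch. It is only my reading of the documentation, not taken from the Nagios
source: I assume that when several escalations match a notification number the
smallest interval is used, and that the service's own notification_interval
applies when none match. The names (ESCALATIONS, matches, expected_interval)
are made up for illustration.

```python
# Values mirror the two serviceescalation definitions above and the
# template's notification_interval of 60. All names here are hypothetical.
ESCALATIONS = [
    {"first": 1, "last": 5, "interval": 3},
    {"first": 5, "last": 0, "interval": 10},  # last_notification 0 = no upper bound
]
SERVICE_INTERVAL = 60


def matches(esc, n):
    """True if notification number n falls inside the escalation's range."""
    return esc["first"] <= n and (esc["last"] == 0 or n <= esc["last"])


def expected_interval(n):
    """Interval (in interval_length units) I would expect after notification n.

    Assumption: the smallest interval among matching escalations wins;
    with no match, the service's own notification_interval is used.
    """
    hits = [e["interval"] for e in ESCALATIONS if matches(e, n)]
    return min(hits) if hits else SERVICE_INTERVAL


if __name__ == "__main__":
    for n in range(1, 8):
        print(n, expected_interval(n))
```

Under these assumptions notifications 1 to 5 would use the 3-minute interval
(number 5 matches both escalations) and everything after that the 10-minute one.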

"interval_length" is set to 60 seconds in nagios.cfg.

So far, so good.



1. PROBLEM:
===========

Now I get the following notifications (these are the syslog entries):

########################################################
Aug 30 13:05:38 unicorn nagios: SERVICE ALERT: vpn-gw1-local;SSH;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10 seconds
Aug 30 13:06:38 unicorn nagios: SERVICE ALERT: vpn-gw1-local;SSH;CRITICAL;SOFT;2;CRITICAL - Socket timeout after 10 seconds
Aug 30 13:07:38 unicorn nagios: SERVICE ALERT: vpn-gw1-local;SSH;CRITICAL;HARD;3;CRITICAL - Socket timeout after 10 seconds
Aug 30 13:07:38 unicorn nagios: SERVICE NOTIFICATION: linuxadmin;vpn-gw1-local;SSH;CRITICAL;notify-service-by-email;CRITICAL - Socket timeout after 10 seconds
Aug 30 13:17:38 unicorn nagios: SERVICE NOTIFICATION: linuxadmin;vpn-gw1-local;SSH;CRITICAL;notify-service-by-email;CRITICAL - Socket timeout after 10 seconds
Aug 30 13:27:38 unicorn nagios: SERVICE NOTIFICATION: linuxadmin;vpn-gw1-local;SSH;CRITICAL;notify-service-by-email;CRITICAL - Socket timeout after 10 seconds
Aug 30 13:37:38 unicorn nagios: SERVICE NOTIFICATION: linuxadmin;vpn-gw1-local;SSH;CRITICAL;notify-service-by-email;CRITICAL - Socket timeout after 10 seconds
Aug 30 13:47:39 unicorn nagios: SERVICE NOTIFICATION: linuxadmin;vpn-gw1-local;SSH;CRITICAL;notify-service-by-email;CRITICAL - Socket timeout after 10 seconds
Aug 30 13:57:38 unicorn nagios: SERVICE NOTIFICATION: linuxadmin;vpn-gw1-local;SSH;CRITICAL;notify-service-by-email;CRITICAL - Socket timeout after 10 seconds
Aug 30 14:07:38 unicorn nagios: SERVICE NOTIFICATION: linuxadmin;vpn-gw1-local;SSH;CRITICAL;notify-service-by-email;CRITICAL - Socket timeout after 10 seconds
########################################################


The notifications are sent once the HARD state is reached, right?
So the first notification is the one at 13:07:38. OK.

But according to my config, notifications 1 to 5 should be re-sent every 3
minutes.
So why did the second and all later notifications arrive 10 minutes apart?
The only other place where I have set 10 minutes is
"normal_check_interval".

(15 minutes later: OK, OK, I see, 10 minutes is also set for notifications 5
and up, but that should not affect the interval for the first 4 messages.
Theoretically... :-\ )


So, as I understand the problem:

The "notification_interval" from the "serviceescalation" is ignored!

What could help?


P.S.: I checked the config of the service escalations in the web GUI, and the
"Notification Interval" for both of my escalations is shown as "0"!
How can that be?!



2. PROBLEM:
===========

Furthermore, the "notification_interval" in the service definition is described
as "Re-notify about service problems every XXX".
Note: "about service problems".
Now, if I set notification_interval to a value lower than
"normal_check_interval", e.g. "9", I get the following warning
from the Nagios pre-flight check:

########################################################
Warning: Service 'SSH' on host 'vpn-gw1-local'  has a notification interval less than its check interval!  Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Warning: Service 'SSH' on host 'vpn-gw1-remote'  has a notification interval less than its check interval!  Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
########################################################
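
If I take the warning literally, a re-notification can only go out right after
a check, so the real spacing gets rounded up to a multiple of the check
interval. A rough Python sketch of that reading follows. It is my assumption
from the warning text, not something I verified in the Nagios source;
effective_interval is a made-up name, and I assume checks run exactly every
check_interval minutes.

```python
import math


def effective_interval(notification_interval, check_interval):
    """Minutes between re-notifications if they only go out right after a check.

    Assumption: a notification is sent at the first check that happens at or
    after notification_interval minutes have passed since the previous one.
    Both arguments must be positive; an interval of 0 is not modelled here.
    """
    if notification_interval <= 0 or check_interval <= 0:
        raise ValueError("intervals must be positive in this sketch")
    checks_needed = math.ceil(notification_interval / check_interval)
    return checks_needed * check_interval


if __name__ == "__main__":
    print(effective_interval(9, 10))   # value from the pre-flight warning case
    print(effective_interval(3, 10))   # first escalation's interval
    print(effective_interval(60, 10))  # template's notification_interval
```

With a check interval of 10, both 9 and 3 come out as 10 in this sketch, which
would at least match the 10-minute spacing in the syslog entries above.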

But, hell, what does "normal_check_interval" have to do with
"notification_interval"?!
These two are completely different things! Or have I misunderstood something?



People - HELP!!

Thanks.

Ilya.





_______________________________________________
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users