I'll preface this by saying that yes, I understand this is far past the
expected scope of support for the mailing list. I'm posting it here in the
hope that someone has dealt with this issue, or has done something similar
themselves, and can give me some pointers on where I may have gone wrong.

I'm trying to get escalations with increasing notification intervals
working under Opsview 3.9.0. I've edited nagconfgen.pl to append host and
service escalation objects to the generated host and service definitions,
which produces output like this (from hosts.cfg):

 

# test host definition
define host {
    host_name                  test
    alias                      test
    address                    10.0.0.10
    hostgroups                 test
    check_interval             0
    retry_interval             1
    max_check_attempts         2
    flap_detection_enabled     0
    icon_image                 windows.png
    icon_image_alt             LOGO - Windows
    vrml_image                 windows.png
    statusmap_image            windows.png
    action_url                 /info/host/275
    check_period               24x7
    contact_groups             hostgroup15_servicegroup1,hostgroup15_servicegroup10,hostgroup15_servicegroup13,hostgroup15_servicegroup14,hostgroup15_servicegroup2,hostgroup15_servicegroup3,hostgroup15_servicegroup4,hostgroup15_servicegroup42,hostgroup15_servicegroup45,hostgroup15_servicegroup46,hostgroup15_servicegroup47,hostgroup15_servicegroup5,hostgroup15_servicegroup8,hostgroup15_servicegroup9,ov_monitored_by_master
    check_command              check_host_15!-H $HOSTADDRESS$ -t 3 -w 500.0,80% -c 1000.0,100%
    parents                    opsview
    notifications_enabled      1
    notification_interval      3
    notification_period        24x7
    notification_options       u,d,r
    use                        host-global
}

define hostescalation{
    host_name               test
    first_notification      3
    last_notification       4
    notification_interval   10
    contact_groups          hostgroup15_servicegroup1,hostgroup15_servicegroup10,hostgroup15_servicegroup13,hostgroup15_servicegroup14,hostgroup15_servicegroup2,hostgroup15_servicegroup3,hostgroup15_servicegroup4,hostgroup15_servicegroup42,hostgroup15_servicegroup45,hostgroup15_servicegroup46,hostgroup15_servicegroup47,hostgroup15_servicegroup5,hostgroup15_servicegroup8,hostgroup15_servicegroup9,ov_monitored_by_master
}

define hostescalation{
    host_name               test
    first_notification      4
    last_notification       5
    notification_interval   30
    contact_groups          hostgroup15_servicegroup1,hostgroup15_servicegroup10,hostgroup15_servicegroup13,hostgroup15_servicegroup14,hostgroup15_servicegroup2,hostgroup15_servicegroup3,hostgroup15_servicegroup4,hostgroup15_servicegroup42,hostgroup15_servicegroup45,hostgroup15_servicegroup46,hostgroup15_servicegroup47,hostgroup15_servicegroup5,hostgroup15_servicegroup8,hostgroup15_servicegroup9,ov_monitored_by_master
}

define hostescalation{
    host_name               test
    first_notification      5
    last_notification       0
    notification_interval   240
    contact_groups          hostgroup15_servicegroup1,hostgroup15_servicegroup10,hostgroup15_servicegroup13,hostgroup15_servicegroup14,hostgroup15_servicegroup2,hostgroup15_servicegroup3,hostgroup15_servicegroup4,hostgroup15_servicegroup42,hostgroup15_servicegroup45,hostgroup15_servicegroup46,hostgroup15_servicegroup47,hostgroup15_servicegroup5,hostgroup15_servicegroup8,hostgroup15_servicegroup9,ov_monitored_by_master
}
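
For illustration, the kind of generation step described above can be sketched in Python. This is a hypothetical stand-in, not the actual nagconfgen.pl change (that script is Perl and its code isn't shown here); the function name render_host_escalations and the (first, last, interval) tier tuples are assumptions. Only the directive names come from the hostescalation objects above.

```python
# Hypothetical helper: a Python stand-in for the kind of logic added to
# nagconfgen.pl (the real script is Perl and is not reproduced here).

def render_host_escalations(host_name, contact_groups, tiers):
    """Return Nagios hostescalation definitions, one per (first, last, interval) tier.

    A last value of 0 means the escalation has no upper notification number.
    """
    blocks = []
    for first, last, interval in tiers:
        blocks.append(
            "define hostescalation{\n"
            f"    host_name               {host_name}\n"
            f"    first_notification      {first}\n"
            f"    last_notification       {last}\n"
            f"    notification_interval   {interval}\n"
            f"    contact_groups          {','.join(contact_groups)}\n"
            "}\n"
        )
    # Separate the objects with a blank line, as in the hosts.cfg excerpt.
    return "\n".join(blocks)


if __name__ == "__main__":
    # Tiers copied from the definitions above; the contact group list is shortened.
    tiers = [(3, 4, 10), (4, 5, 30), (5, 0, 240)]
    print(render_host_escalations("test", ["ov_monitored_by_master"], tiers))
```

Keeping the tiers as data makes it easy to regenerate every escalation for a host from one list instead of hand-editing each block.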

 

As I understand it, this won't make the notifications go out exactly
every 3 minutes at first. A re-notification can only be sent when a host
check runs at least 3 minutes after the previous notification, so the
actual spacing will vary with the host check and recheck intervals.
That's OK with me. The problem is that the escalations never happen at
all.

 

As I understand it, each escalation above should take over from the
host definition's default notification interval once its
first_notification number is reached. That's not happening. I've turned
on notification debug output, and this is what I see for the test host:

 

[1287524459.202392] [032.0] [pid=32412] ** Host Notification Attempt ** Host: 'test', Type: 0, Options: 0, Current State: 1, Last Notification: Wed Dec 31 18:00:00 1969
[1287524459.202545] [032.0] [pid=32412] Notification viability test passed.
[1287524459.202590] [032.1] [pid=32412] Current notification number: 1 (incremented)
[1287524459.202629] [032.1] [pid=32412] Host notification will NOT be escalated.
[1287524459.202700] [032.1] [pid=32412] Adding normal contacts for host to notification list.
[1287524463.103970] [032.0] [pid=32412] 4 contacts were notified.  Next possible notification time: Tue Oct 19 16:43:59 2010
[1287524463.104254] [032.0] [pid=32412] 4 contacts were notified.
--
[1287524576.160988] [032.0] [pid=32412] ** Host Notification Attempt ** Host: 'test', Type: 0, Options: 0, Current State: 1, Last Notification: Tue Oct 19 16:40:59 2010
[1287524576.161439] [032.1] [pid=32412] Its not yet time to re-notify the contacts about this host problem...
[1287524576.161512] [032.1] [pid=32412] Next acceptable notification time: Tue Oct 19 16:43:59 2010
[1287524576.161552] [032.0] [pid=32412] Notification viability test failed.  No notification will be sent out.
--
[1287524876.191478] [032.0] [pid=32412] ** Host Notification Attempt ** Host: 'test', Type: 0, Options: 0, Current State: 1, Last Notification: Tue Oct 19 16:40:59 2010
[1287524876.191803] [032.0] [pid=32412] Notification viability test passed.
[1287524876.191850] [032.1] [pid=32412] Current notification number: 2 (incremented)
[1287524876.191933] [032.1] [pid=32412] Host notification will NOT be escalated.
[1287524876.192002] [032.1] [pid=32412] Adding normal contacts for host to notification list.
[1287524879.592963] [032.0] [pid=32412] 4 contacts were notified.  Next possible notification time: Tue Oct 19 16:50:56 2010
[1287524879.593110] [032.0] [pid=32412] 4 contacts were notified.
--
[1287525175.904951] [032.0] [pid=32412] ** Host Notification Attempt ** Host: 'test', Type: 0, Options: 0, Current State: 1, Last Notification: Tue Oct 19 16:47:56 2010
[1287525175.905177] [032.0] [pid=32412] Notification viability test passed.
[1287525175.905222] [032.1] [pid=32412] Current notification number: 3 (incremented)
[1287525175.905272] [032.1] [pid=32412] Host notification will NOT be escalated.
[1287525175.905333] [032.1] [pid=32412] Adding normal contacts for host to notification list.
[1287525180.098830] [032.0] [pid=32412] 4 contacts were notified.  Next possible notification time: Tue Oct 19 16:55:55 2010
[1287525180.099015] [032.0] [pid=32412] 4 contacts were notified.
--
[1287525476.080139] [032.0] [pid=32412] ** Host Notification Attempt ** Host: 'test', Type: 0, Options: 0, Current State: 1, Last Notification: Tue Oct 19 16:52:55 2010
[1287525476.080561] [032.0] [pid=32412] Notification viability test passed.
[1287525476.080636] [032.1] [pid=32412] Current notification number: 4 (incremented)
[1287525476.080686] [032.1] [pid=32412] Host notification will NOT be escalated.
[1287525476.080739] [032.1] [pid=32412] Adding normal contacts for host to notification list.
[1287525479.943738] [032.0] [pid=32412] 4 contacts were notified.  Next possible notification time: Tue Oct 19 17:00:56 2010
[1287525479.943886] [032.0] [pid=32412] 4 contacts were notified.
--
[1287525775.530102] [032.0] [pid=32412] ** Host Notification Attempt ** Host: 'test', Type: 0, Options: 0, Current State: 1, Last Notification: Tue Oct 19 16:57:56 2010
[1287525775.530376] [032.0] [pid=32412] Notification viability test passed.
[1287525775.530462] [032.1] [pid=32412] Current notification number: 5 (incremented)
[1287525775.530502] [032.1] [pid=32412] Host notification will NOT be escalated.
[1287525775.530558] [032.1] [pid=32412] Adding normal contacts for host to notification list.
[1287525779.453501] [032.0] [pid=32412] 4 contacts were notified.  Next possible notification time: Tue Oct 19 17:05:55 2010
[1287525779.453648] [032.0] [pid=32412] 4 contacts were notified.
--
[1287526075.308761] [032.0] [pid=32412] ** Host Notification Attempt ** Host: 'test', Type: 0, Options: 0, Current State: 1, Last Notification: Tue Oct 19 17:02:55 2010
[1287526075.309239] [032.0] [pid=32412] Notification viability test passed.
[1287526075.309460] [032.1] [pid=32412] Current notification number: 6 (incremented)
[1287526075.309586] [032.1] [pid=32412] Host notification will NOT be escalated.
[1287526075.309689] [032.1] [pid=32412] Adding normal contacts for host to notification list.
[1287526081.749857] [032.0] [pid=32412] 4 contacts were notified.  Next possible notification time: Tue Oct 19 17:10:55 2010
[1287526081.750095] [032.0] [pid=32412] 4 contacts were notified.

 

As you can see, Nagios keeps notifying on the default 3-minute
notification interval, and the host escalations never trigger: every
attempt, including notifications 3 through 6, reports "Host notification
will NOT be escalated." I've run a pre-flight check on the configuration
and the Nagios binary reports no errors. The pre-flight check also shows
that it parses all three host escalations, since they appear in its list
of parsed objects.
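
For comparison with the log above, here is a sketch of escalation selection as the Nagios documentation describes it: an escalation applies when the current notification number falls within first_notification..last_notification (0 meaning open-ended), and when several apply at once, the smallest notification_interval is used. This is a model, not Nagios's own code; the class and function names are hypothetical, and it ignores escalation_period and escalation_options, which the posted definitions don't set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostEscalation:
    first_notification: int
    last_notification: int      # 0 = no upper bound
    notification_interval: int  # in interval_length units (minutes by default)


def matching(escalations, number):
    """Escalations whose first..last range covers this notification number."""
    return [
        e for e in escalations
        if number >= e.first_notification
        and (e.last_notification == 0 or number <= e.last_notification)
    ]


def effective_interval(escalations, number, default_interval):
    """Host's own interval if nothing matches, else the smallest matching one."""
    hits = matching(escalations, number)
    if not hits:
        return default_interval
    return min(e.notification_interval for e in hits)


# The three hostescalation objects from hosts.cfg above.
ESCALATIONS = [
    HostEscalation(3, 4, 10),
    HostEscalation(4, 5, 30),
    HostEscalation(5, 0, 240),
]

if __name__ == "__main__":
    for n in range(1, 7):
        print(n, effective_interval(ESCALATIONS, n, default_interval=3))
```

Under this model, notifications 3 through 6 should all have been escalated. Note also that the posted ranges overlap at notifications 4 and 5, so notification 4 would still use the 10-minute interval and the 30-minute tier would only govern notification 5.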

 

There are a few caveats to how I'm working, though. For now I haven't
replaced nagconfgen.pl entirely. Instead, I'm splicing the output of a
test run of my updated script, for the one test host I'm working with,
into the configuration generated by the stock script, then running
/etc/init.d/opsview reload to reload the edited .cfg files. I know for a
fact that this reparses the Nagios configuration, because I've used the
same reload to update the Nagios binary's logging settings and seen
those changes take effect.

 

Where should I be looking to debug this? Is there a way to test the
escalation behavior more quickly than setting it up on a host that's
down and watching which notifications come out? Thanks in advance!
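
One way to check a run faster than reading the debug log by eye is a small filter that pairs each "Current notification number" line with the escalation verdict that follows it. This is a hypothetical Python sketch: it assumes unwrapped log lines as Nagios writes them to the file named by debug_file in nagios.cfg (the lines quoted above were wrapped by the mail client), and it assumes the escalated message differs from "will NOT be escalated" only by dropping "NOT". The script and function names are made up.

```python
import re
import sys

# Patterns for the Nagios notification debug messages quoted in this thread.
# ASSUMPTION: the escalated variant reads "... will be escalated" (any case).
NUMBER_RE = re.compile(r"Current notification number: (\d+)")
ESCALATION_RE = re.compile(r"Host notification will (not )?be escalated", re.IGNORECASE)


def summarize(lines):
    """Return (notification_number, escalated) pairs found in debug-log lines."""
    results = []
    pending = None
    for line in lines:
        number = NUMBER_RE.search(line)
        if number:
            pending = int(number.group(1))
            continue
        verdict = ESCALATION_RE.search(line)
        if verdict and pending is not None:
            # group(1) is "NOT " for the non-escalated message, None otherwise.
            results.append((pending, verdict.group(1) is None))
            pending = None
    return results


if __name__ == "__main__":
    # Usage (hypothetical): python summarize_escalations.py /path/to/debug_file
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
            for number, escalated in summarize(fh):
                state = "escalated" if escalated else "NOT escalated"
                print(f"notification {number}: {state}")
```

Run against the log above, this would report every notification from 1 to 6 as NOT escalated.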

 

-Matt

* System Administrator ([email protected])
* Excel.Net,Inc. - http://www.excel.net/
* (920) 452-0455 x501 - Sheboygan/Plymouth area
* (888) 489-9995 x501 - Other areas, toll-free

_______________________________________________
Opsview-users mailing list
[email protected]
http://lists.opsview.org/lists/listinfo/opsview-users
