Re: [Nagios-users] Escalate after X warnings or criticals

2009-11-08 Thread Neil Ramsay
Hi Martin,

The escalation_options setting doesn't take the state into consideration
when counting notifications. So if you have an escalation rule on the 4th
notification and only escalate on Critical in escalation_options, then the
following scenario can occur:
You have 3 warning notifications and the 4th is Critical. It will then
escalate, as there have been 4 notifications and the current state is
Critical. I posted a help request on this issue a week or two ago and would
really like this to be patched or built into the next update.
http://article.gmane.org/gmane.network.nagios.user/64997/match=escalation+state
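
To make the scenario concrete, here is a minimal Python sketch of the
behaviour described above. It is only a model of what I observed, not Nagios
source code; the function escalation_matches and its parameters are names
made up for illustration. The single counter counts every notification
regardless of state, and escalation_options only filters on the state at the
moment the notification goes out.

```python
def escalation_matches(notification_number, current_state,
                       first_notification, last_notification,
                       escalation_options):
    """Hypothetical model of the observed behaviour, not Nagios code."""
    # last_notification == 0 means the escalation has no upper bound.
    in_range = notification_number >= first_notification and (
        last_notification == 0 or notification_number <= last_notification)
    # The state is only tested now; it plays no part in the counting.
    return in_range and current_state in escalation_options


# Three WARNING notifications followed by the first CRITICAL one.
states = ["WARNING", "WARNING", "WARNING", "CRITICAL"]
results = [
    escalation_matches(number, state, first_notification=4,
                       last_notification=0, escalation_options={"CRITICAL"})
    for number, state in enumerate(states, start=1)
]
print(results)
```

The last entry is the only match: the first Critical notification is already
notification number 4, so it escalates straight away.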

Cheers,

Neil

On Sat, Nov 7, 2009 at 12:56 AM, Martin Melin mme...@gmail.com wrote:

 The existing escalation_options directive in escalation definitions will
 likely get you this behavior without the need for a patch.

 http://nagios.sourceforge.net/docs/3_0/escalations.html - see the very
 bottom of this page as well as the object definition documentation for
 escalation_options.

 Regards,
 Martin Melin


 On Fri, Nov 6, 2009 at 3:49 AM, Mark Gius mg...@createspace.com wrote:

 Currently, service escalations contain first/last_notification
 directives that specify the range of notifications that the escalation
 should apply to.  This method of escalation has a weakness, however.

 At my work, we let warnings go to the default contact (which happens to
 be email), and escalate to a pager chain on critical.  However, if a
 service sits in WARNING for a length of time (which is likely to happen
 in the middle of the night), by the time the service enters a CRITICAL
 state the notification count exceeds our highest escalation, and our
 entire team gets paged immediately.

 What I'd like to see is the ability to distinguish between a WARNING
 notification and a CRITICAL notification in the escalation, and set up
 escalation chains that work based on the number of CRITICALs that have
 been sent, as opposed to the total number of notifications.

 I am planning on patching nagios to support this behavior if there isn't
 a way to achieve this behavior with the current implementation.  My plan
 is to add a warning/critical count to service, add a first/last
 warning/critical state to service escalations, and add the directives
 (first|last)_(warning|critical)_notification to the service escalation
 configs.  The idea is also to keep the current behavior
 (notification_count and first/last_notification would still be present),
 but allow finer grained control over when escalations are sent out.
 This way if somebody didn't want to use the finer grained control their
 behavior would stay the same.  My current plan is to match the
 escalation if _any_ of the 3 notification ranges match
 (all/warning/critical).
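
One possible reading of the plan above, as a rough Python sketch. Nothing
here exists in Nagios; Counters, in_range and proposed_match are hypothetical
names, and the sketch assumes that an unset range never matches and that
last == 0 means "no upper bound", as with last_notification today. It also
ignores the current state, which would still be filtered separately by
escalation_options.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# (first, last); last == 0 is assumed to mean "no upper bound".
Range = Optional[Tuple[int, int]]


def in_range(count: int, rng: Range) -> bool:
    """True if count falls inside rng; an unset range never matches."""
    if rng is None:
        return False
    first, last = rng
    return count >= first and (last == 0 or count <= last)


@dataclass
class Counters:
    """Per-service notification counters (hypothetical)."""
    total: int = 0
    warning: int = 0
    critical: int = 0

    def record(self, state: str) -> None:
        self.total += 1
        if state == "WARNING":
            self.warning += 1
        elif state == "CRITICAL":
            self.critical += 1


def proposed_match(counters: Counters, all_rng: Range = None,
                   warning_rng: Range = None,
                   critical_rng: Range = None) -> bool:
    """Match if any of the three ranges matches its own counter."""
    return (in_range(counters.total, all_rng)
            or in_range(counters.warning, warning_rng)
            or in_range(counters.critical, critical_rng))


# Five WARNINGs overnight, then the service goes CRITICAL.
counters = Counters()
matched = []
for state in ["WARNING"] * 5 + ["CRITICAL"] * 4:
    counters.record(state)
    matched.append(proposed_match(counters, critical_rng=(4, 0)))
print(matched)
```

With only a critical range of 4 to unlimited configured, the escalation
first matches on the fourth CRITICAL notification, however many WARNINGs
came before it.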

 Any advice on making this behavior happen with Nagios as-is, or
 suggestions/advice on the implementation are welcome.

 -Gius


 --
 Let Crystal Reports handle the reporting - Free Crystal Reports 2008
 30-Day
 trial. Simplify your report design, integration and deployment - and focus
 on
 what you do best, core application coding. Discover what's new with
 Crystal Reports now.  http://p.sf.net/sfu/bobj-july
 ___
 Nagios-users mailing list
 Nagios-users@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/nagios-users
 ::: Please include Nagios version, plugin version (-v) and OS when
 reporting any issue.
 ::: Messages without supporting info will risk being sent to /dev/null





Re: [Nagios-users] Configuring Service Dependencies in Nagios 3.1.2

2009-10-27 Thread Neil Ramsay
Hi Mirza,

From "Building a Monitoring Infrastructure with Nagios" by David Josephsen,
pg. 71:

Before Nagios checks the state of a service, it first checks the state of
all the services that the service depends upon (its parents). If all of
those services are okay, Nagios proceeds to check the child service. If any
of the parent services are down, Nagios assumes the child service is down as
well and stops checks and notifications on the child.
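
The rule quoted above can be written down as a small Python sketch. It models
only the quoted description, not how Nagios actually schedules checks; the
function should_check and both dictionaries are invented for illustration,
with service names borrowed from the question below.

```python
def should_check(service, depends_on, last_state):
    """True if every service that `service` depends on was last seen OK."""
    parents = depends_on.get(service, [])
    return all(last_state.get(parent) == "OK" for parent in parents)


# Hypothetical setup: smb and squid both depend on the frame relay link.
depends_on = {"smb": ["frame-relay"], "squid": ["frame-relay"]}
last_state = {"frame-relay": "CRITICAL", "smb": "OK", "squid": "OK"}

to_check = [name for name in ["frame-relay", "smb", "squid"]
            if should_check(name, depends_on, last_state)]
print(to_check)
```

While the frame relay is down, only the frame relay itself is still checked
and notified on; the two dependent services are skipped.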

Just set up the dependencies and test them. Testing is easy: just put in
wrong IP addresses for the services you want to simulate as down. You'll
soon see if your dependencies are working.

Cheers,

Neil


On Wed, Oct 28, 2009 at 4:01 AM, Mirza Dedic mi...@oppy.com wrote:

 Hello,

 I have a Nagios setup and I am monitoring LAN/WAN servers/services
 including network devices, smb shares, squid service status, server pings,
 and our frame relay.

 Now if my frame relay goes down between the two offices, obviously my
 Nagios will not be able to check the server and smb/squid services. Right
 now I get warning messages that it cannot reach the
 smb/squid/server/network. I want to get just a warning that it cannot reach
 the network, with the rest suppressed.

 I understand this can be done using service dependency, but I have also
 read that the way Nagios checks stuff is a little wacky, as in:

 What if it checks the ping of a server before the network? It will report
 the ping is critical, then once the network is queued to be checked, it will
 report the network is down?

 Which would kind of defeat the logic of a dependency check. Does anyone
 have any feedback as to whether this is still the case?

 I remember reading it on the forum, which is why I did not go through with
 it when I initially configured it all.

 But I am also trying to avoid 20 emails for 1 host that is down.

 Any suggestions or ideas?

 Thanks!


 --
 Come build with us! The BlackBerry(R) Developer Conference in SF, CA
 is the only developer event you need to attend this year. Jumpstart your
 developing skills, take BlackBerry mobile applications to market and stay
 ahead of the curve. Join us from November 9 - 12, 2009. Register now!
 http://p.sf.net/sfu/devconference
 ___
 Nagios-users mailing list
 Nagios-users@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/nagios-users
 ::: Please include Nagios version, plugin version (-v) and OS when
 reporting any issue.
 ::: Messages without supporting info will risk being sent to /dev/null


[Nagios-users] Escalation notification count and state change problem

2009-10-26 Thread Neil Ramsay
Hi,

I've searched and found similar posts but unfortunately no replies to this
type of problem. I'd expect this to be a common problem but maybe I've
misread the documentation.

On Nagios 3.2.0 we have service notifications set to go out for Warning and
for Critical states to an email address 24x7.

In addition during 'after hours', the on-call engineer receives SMS alerts
for all Critical notifications and the backup engineer should receive
escalations after the 4th Critical notification. However, last night the
backup engineer received an SMS on the second Critical notification.


define serviceescalation{
        hostgroup_name          switches,primary_nodes,secondary_nodes
        service_description     *
        first_notification      4
        last_notification       0
        notification_interval   5
        contact_groups          PrimaryAH,SecondaryAH
        escalation_period       afterhours
        escalation_options      u,c
        }

# Primary After-hours contacts
define contactgroup{
        contactgroup_name       PrimaryAH
        alias                   Primary After-Hours contact
        members                 supportEmail,Engineer1
        }

# Secondary After-hours contacts
define contactgroup{
        contactgroup_name       SecondaryAH
        alias                   Secondary After-Hours contact
        members                 supportEmail,Engineer2
        }





It appears that the escalation procedure ignores the actual state when
counting notifications. So if the 1st notification is critical and the 4th is
critical but the 2nd and 3rd are warnings, the 4th notification is escalated
because it is critical, even though only 1 critical notification had been
sent before it and the other 2 were warnings. I was hoping that Nagios would
escalate on the 4th critical notification.

See event log below:

Service Notification [10-26-2009 23:48:32] SERVICE NOTIFICATION:
Engineer2;winapps1;NT memory usage;CRITICAL;notify-service-by-sms;Mem: 984
MB (96%) / 1023 MB (3%) Paged Mem: 1189 MB (48%) / 2469 MB (51%)

Service Notification [10-26-2009 23:48:32] SERVICE NOTIFICATION:
Engineer1;winapps1;NT memory usage;CRITICAL;notify-service-by-sms;Mem: 984
MB (96%) / 1023 MB (3%) Paged Mem: 1189 MB (48%) / 2469 MB (51%)

Service Notification [10-26-2009 23:38:32] SERVICE NOTIFICATION:
supportEmail;winapps1;NT memory usage;WARNING;notify-service-by-email;Mem:
950 MB (92%) / 1023 MB (7%) Paged Mem: 1219 MB (49%) / 2469 MB (50%)

Service Notification [10-26-2009 23:28:32] SERVICE NOTIFICATION:
supportEmail;winapps1;NT memory usage;WARNING;notify-service-by-email;Mem:
881 MB (86%) / 1023 MB (13%) Paged Mem: 1192 MB (48%) / 2469 MB (51%)

Service Notification [10-26-2009 23:18:32] SERVICE NOTIFICATION:
supportEmail;winapps1;NT memory usage;CRITICAL;notify-service-by-email;Mem:
1006 MB (98%) / 1023 MB (1%) Paged Mem: 1152 MB (46%) / 2469 MB (53%)

Service Notification [10-26-2009 23:18:32] SERVICE NOTIFICATION:
Engineer1;winapps1;NT memory usage;CRITICAL;notify-service-by-sms;Mem: 1006
MB (98%) / 1023 MB (1%) Paged Mem: 1152 MB (46%) / 2469 MB (53%)
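
For anyone who wants to check their own logs for the same pattern, here is a
rough Python sketch that replays the entries above. The lines are listed
oldest first with the plugin output trimmed for brevity. It assumes that log
lines sharing a timestamp, host and service belong to one notification (one
line per contact), which is my reading of the log rather than something the
documentation states; the variable names are my own.

```python
import re

LOG_LINES = [
    "[10-26-2009 23:18:32] SERVICE NOTIFICATION: Engineer1;winapps1;NT memory usage;CRITICAL;notify-service-by-sms;Mem: 1006 MB (98%)",
    "[10-26-2009 23:18:32] SERVICE NOTIFICATION: supportEmail;winapps1;NT memory usage;CRITICAL;notify-service-by-email;Mem: 1006 MB (98%)",
    "[10-26-2009 23:28:32] SERVICE NOTIFICATION: supportEmail;winapps1;NT memory usage;WARNING;notify-service-by-email;Mem: 881 MB (86%)",
    "[10-26-2009 23:38:32] SERVICE NOTIFICATION: supportEmail;winapps1;NT memory usage;WARNING;notify-service-by-email;Mem: 950 MB (92%)",
    "[10-26-2009 23:48:32] SERVICE NOTIFICATION: Engineer1;winapps1;NT memory usage;CRITICAL;notify-service-by-sms;Mem: 984 MB (96%)",
    "[10-26-2009 23:48:32] SERVICE NOTIFICATION: Engineer2;winapps1;NT memory usage;CRITICAL;notify-service-by-sms;Mem: 984 MB (96%)",
]

PATTERN = re.compile(
    r"\[(?P<when>[^\]]+)\] SERVICE NOTIFICATION: "
    r"(?P<contact>[^;]+);(?P<host>[^;]+);(?P<service>[^;]+);(?P<state>[^;]+);")

# Group the per-contact lines into notifications (dicts keep insertion order).
notifications = {}
for line in LOG_LINES:
    m = PATTERN.match(line)
    if m is None:
        continue
    key = (m["when"], m["host"], m["service"])
    entry = notifications.setdefault(key, {"state": m["state"], "contacts": []})
    entry["contacts"].append(m["contact"])

# Running totals: every notification versus CRITICAL ones only.
total = 0
critical = 0
summary = []
for (when, _host, _service), entry in notifications.items():
    total += 1
    if entry["state"] == "CRITICAL":
        critical += 1
    summary.append((when[-8:], entry["state"], total, critical))

for row in summary:
    print(row)
```

At 23:48:32 the overall count reaches 4, which satisfies first_notification
4, while the count of Critical notifications is still only 2.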


Is there a way to avoid this behaviour?

Thanks

Neil