Re: [Nagios-users] Escalate after X warnings or criticals

2009-11-08 Thread Neil Ramsay
Hi Martin,

The escalation_options directive doesn't take the state into consideration
when counting notifications. So if you have an escalation rule starting at the
4th notification and only escalate on Critical in the escalation_options, then
the following scenario can occur:
You have 3 Warning notifications and the 4th is Critical; it will escalate,
because there have been 4 notifications and the current one is Critical. I
posted a help request on this issue a week or two ago and would really like
this to be patched or built into the next update.
http://article.gmane.org/gmane.network.nagios.user/64997/match=escalation+state

Cheers,

Neil

On Sat, Nov 7, 2009 at 12:56 AM, Martin Melin  wrote:

> The existing escalation_options directive in escalation definitions will
> likely get you this behavior without the need for a patch.
>
> http://nagios.sourceforge.net/docs/3_0/escalations.html - see the very
> bottom of this page as well as the object definition documentation for
> escalation_options.
>
> Regards,
> Martin Melin
>
>
> On Fri, Nov 6, 2009 at 3:49 AM, Mark Gius  wrote:
>
>> Currently, service escalations contain "first/last_notification"
>> directives that specify the range of notifications that the escalation
>> should apply to.  This method of escalation has a weakness, however.
>>
>> At my work, we let warnings go to the default contact (which happens to
>> be email), and escalate to a pager chain on critical.  However, if a
>> service sits in WARNING for a length of time (which is likely to happen
>> in the middle of the night), by the time the service enters a CRITICAL
>> state the notification count exceeds our highest escalation, and our
>> entire team gets paged immediately.
>>
>> What I'd like to see is the ability to distinguish between a WARNING
>> notification and a CRITICAL notification in the escalation, and set up
>> escalation chains that work based on the number of CRITICALs that have
>> been sent, as opposed to the total number of notifications.
>>
>> I am planning on patching Nagios to support this behavior if there isn't
>> a way to achieve it with the current implementation.  My plan is to add
>> warning/critical counts to services, add first/last warning/critical
>> settings to service escalations, and add the directives
>> "(first|last)_(warning|critical)_notification" to the service escalation
>> configs.  The idea is also to keep the current behavior
>> (notification_count and first/last_notification would still be present),
>> but allow finer-grained control over when escalations are sent out.
>> This way, if somebody didn't want to use the finer-grained control, their
>> behavior would stay the same.  My current plan is to match the
>> escalation if _any_ of the 3 notification ranges match
>> (all/warning/critical).
>>
>> Any advice on making this behavior happen with Nagios as-is, or
>> suggestions/advice on the implementation are welcome.
>>
>> -Gius
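Mark's proposed matching rule can be sketched as below. This is a minimal Python model under stated assumptions, not Nagios source code: the class and function names are hypothetical, a range left as None is treated as "not configured" (in current Nagios, first_notification and last_notification are required directives), and last = 0 follows the existing Nagios meaning of "no upper bound". Filtering on the current state via escalation_options is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NotificationRange:
    """first/last pair; last == 0 means no upper bound, as with last_notification."""
    first: int
    last: int = 0

    def contains(self, n: int) -> bool:
        return n >= self.first and (self.last == 0 or n <= self.last)


@dataclass
class ProposedEscalation:
    # Hypothetical fields mirroring the proposed directives.
    total: Optional[NotificationRange] = None     # first/last_notification
    warning: Optional[NotificationRange] = None   # first/last_warning_notification
    critical: Optional[NotificationRange] = None  # first/last_critical_notification


@dataclass
class NotificationCounters:
    """Per-service counters: the existing total plus the proposed per-state counts."""
    total: int = 0
    warning: int = 0
    critical: int = 0

    def record(self, state: str) -> None:
        self.total += 1
        if state == "WARNING":
            self.warning += 1
        elif state == "CRITICAL":
            self.critical += 1


def escalation_matches(esc: ProposedEscalation, counts: NotificationCounters) -> bool:
    """Match if ANY configured range (all/warning/critical) contains its counter."""
    pairs = [
        (esc.total, counts.total),
        (esc.warning, counts.warning),
        (esc.critical, counts.critical),
    ]
    return any(r is not None and r.contains(n) for r, n in pairs)


if __name__ == "__main__":
    # A night spent in WARNING followed by the first CRITICAL.
    counts = NotificationCounters()
    for state in ["WARNING"] * 4 + ["CRITICAL"]:
        counts.record(state)
    by_total = ProposedEscalation(total=NotificationRange(first=4))
    by_critical = ProposedEscalation(critical=NotificationRange(first=2))
    print(escalation_matches(by_total, counts))     # True: 5 notifications in total
    print(escalation_matches(by_critical, counts))  # False: only 1 CRITICAL so far
```

With the total-based range, the first CRITICAL after four WARNINGs already falls inside the escalation, which is the whole-team page Mark describes; the critical-based range waits for the second CRITICAL.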
>>
>>
>> --
>> Let Crystal Reports handle the reporting - Free Crystal Reports 2008
>> 30-Day
>> trial. Simplify your report design, integration and deployment - and focus
>> on
>> what you do best, core application coding. Discover what's new with
>> Crystal Reports now.  http://p.sf.net/sfu/bobj-july
>> ___
>> Nagios-users mailing list
>> Nagios-users@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/nagios-users
>> ::: Please include Nagios version, plugin version (-v) and OS when
>> reporting any issue.
>> ::: Messages without supporting info will risk being sent to /dev/null
>>

Re: [Nagios-users] Configuring Service Dependencies in Nagios 3.1.2

2009-10-27 Thread Neil Ramsay
Hi Mirza,

From "Building a Monitoring Infrastructure With Nagios" by David Josephsen
pg. 71

"Before Nagios checks the state of a service, it first checks the state of
all the services that the service depends upon (its parents). If all of
those services are okay, Nagios proceeds to check the child service. If any
of the parent services are down, Nagios assumes the child service is down as
well and stops checks and notifications on the child."

Just set up the dependencies and test them. Testing is easy: just put in
wrong IP addresses for the services you want to simulate as down, and you'll
soon see whether your dependencies are working.

Cheers,

Neil
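The behaviour described in the quote can be modelled in a few lines. This is a hedged Python sketch of the quoted description only, not of Nagios internals: the service names (frame_relay, server_ping, smb_share, squid) and the dictionary layout are hypothetical, only direct parents are consulted, and any non-OK parent is treated as failed (in Nagios itself the failure criteria of a dependency are configurable).

```python
from typing import Dict, List

# Hypothetical layout: child service -> list of parent services it depends on.
Parents = Dict[str, List[str]]
States = Dict[str, str]


def parents_ok(service: str, parents: Parents, states: States) -> bool:
    """True if every direct parent of `service` is currently OK."""
    return all(states.get(p) == "OK" for p in parents.get(service, []))


def alerting_services(parents: Parents, states: States) -> List[str]:
    """Non-OK services that would still be checked and notified.

    A child whose parent is not OK is assumed down and its notifications
    are suppressed, as the quoted description says.
    """
    return sorted(
        s for s, state in states.items()
        if state != "OK" and parents_ok(s, parents, states)
    )


if __name__ == "__main__":
    # Mirza's layout: everything at the remote office depends on the frame relay.
    parents = {
        "server_ping": ["frame_relay"],
        "smb_share": ["frame_relay"],
        "squid": ["frame_relay"],
    }
    states = {name: "CRITICAL" for name in ["frame_relay", "server_ping", "smb_share", "squid"]}
    print(alerting_services(parents, states))  # ['frame_relay']
```

In this model, setting frame_relay back to OK while the children stay CRITICAL makes each child alert on its own, which is the kind of result the wrong-IP test should reveal.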


On Wed, Oct 28, 2009 at 4:01 AM, Mirza Dedic  wrote:

> Hello,
>
> I have a Nagios setup and I am monitoring LAN/WAN servers/services
> including network devices, smb shares, squid service status, server pings,
> and our frame relay.
>
> Now if my frame relay goes down between the two offices, Nagios obviously
> will not be able to check the server and the smb/squid services. Right now
> I get a warning message that it cannot reach the smb/squid/server/network;
> I want to get just one warning that it cannot reach the network, with the
> rest suppressed.
>
> I understand this can be done using service dependency, but I have also
> read that the way Nagios checks stuff is a little wacky, as in:
>
> What if it checks the ping of a server before the network? It will report
> the ping is critical, then once the network is queued to be checked, it will
> report the network is down?
>
> Which would kind of defeat the logic of a dependency check. Does anyone
> have any feedback on whether this is still the case?
>
> I remember reading it on the forum, which is why I did not go through with
> it when I initially configured it all.
>
> But I am also trying to avoid 20 emails for 1 host that is down.
>
> Any suggestions or ideas?
>
> Thanks!
>
>
> --
> Come build with us! The BlackBerry(R) Developer Conference in SF, CA
> is the only developer event you need to attend this year. Jumpstart your
> developing skills, take BlackBerry mobile applications to market and stay
> ahead of the curve. Join us from November 9 - 12, 2009. Register now!
> http://p.sf.net/sfu/devconference
> ___
> Nagios-users mailing list
> Nagios-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when
> reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null
>

[Nagios-users] Escalation notification count and state change problem

2009-10-26 Thread Neil Ramsay
Hi,

I've searched and found similar posts but unfortunately no replies to this
type of problem. I'd expect this to be a common problem but maybe I've
misread the documentation.

On Nagios 3.2.0 we have service notifications set to go out for Warning and
for Critical states to an email address 24x7.

In addition during 'after hours', the on-call engineer receives SMS alerts
for all Critical notifications and the backup engineer should receive
escalations after the 4th Critical notification. However, last night the
backup engineer received an SMS on the second Critical notification.


define serviceescalation{
    hostgroup_name          switches,primary_nodes,secondary_nodes
    service_description     *
    first_notification      4
    last_notification       0
    notification_interval   5
    contact_groups          PrimaryAH,SecondaryAH
    escalation_period       afterhours
    escalation_options      u,c
    }

# Primary After-hours contacts
define contactgroup{
    contactgroup_name       PrimaryAH
    alias                   Primary After-Hours contact
    members                 supportEmail,Engineer1
    }

# Secondary After-hours contacts
define contactgroup{
    contactgroup_name       SecondaryAH
    alias                   Secondary After-Hours contact
    members                 supportEmail,Engineer2
    }

It appears that the escalation logic ignores the actual state when counting
notifications. So if the 1st notification is Critical, notifications 2-3 are
Warnings and the 4th is Critical, the 4th notification is escalated because
it is Critical, even though only one Critical notification was sent before it
and the other two were Warnings. I was hoping that Nagios would escalate on
the 4th Critical notification.

See event log below:

Service Notification [10-26-2009 23:48:32] SERVICE NOTIFICATION:
Engineer2;winapps1;NT memory usage;CRITICAL;notify-service-by-sms;Mem: 984
MB (96%) / 1023 MB (3%) Paged Mem: 1189 MB (48%) / 2469 MB (51%)

Service Notification [10-26-2009 23:48:32] SERVICE NOTIFICATION:
Engineer1;winapps1;NT memory usage;CRITICAL;notify-service-by-sms;Mem: 984
MB (96%) / 1023 MB (3%) Paged Mem: 1189 MB (48%) / 2469 MB (51%)

Service Notification [10-26-2009 23:38:32] SERVICE NOTIFICATION:
supportEmail;winapps1;NT memory usage;WARNING;notify-service-by-email;Mem:
950 MB (92%) / 1023 MB (7%) Paged Mem: 1219 MB (49%) / 2469 MB (50%)

Service Notification [10-26-2009 23:28:32] SERVICE NOTIFICATION:
supportEmail;winapps1;NT memory usage;WARNING;notify-service-by-email;Mem:
881 MB (86%) / 1023 MB (13%) Paged Mem: 1192 MB (48%) / 2469 MB (51%)

Service Notification [10-26-2009 23:18:32] SERVICE NOTIFICATION:
supportEmail;winapps1;NT memory usage;CRITICAL;notify-service-by-email;Mem:
1006 MB (98%) / 1023 MB (1%) Paged Mem: 1152 MB (46%) / 2469 MB (53%)

Service Notification [10-26-2009 23:18:32] SERVICE NOTIFICATION:
Engineer1;winapps1;NT memory usage;CRITICAL;notify-service-by-sms;Mem: 1006
MB (98%) / 1023 MB (1%) Paged Mem: 1152 MB (46%) / 2469 MB (53%)
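The sequence in the log can be replayed against the serviceescalation definition above (first_notification 4, last_notification 0, escalation_options u,c). This Python sketch is a simplified model of the behaviour the log shows, not Nagios source: it assumes every notification, WARNING or CRITICAL, increments one shared counter, and that escalation_options is checked only against the state of the current notification. The critical-only counter models the behaviour Neil was hoping for and is included purely for comparison.

```python
from typing import List, Tuple

FIRST_NOTIFICATION = 4   # first_notification 4
LAST_NOTIFICATION = 0    # last_notification 0 (no upper bound)
ESCALATION_STATES = {"UNKNOWN", "CRITICAL"}  # escalation_options u,c


def escalates(count: int, state: str) -> bool:
    """Escalation applies if `count` is in range and the current state is u or c."""
    in_range = count >= FIRST_NOTIFICATION and (
        LAST_NOTIFICATION == 0 or count <= LAST_NOTIFICATION
    )
    return in_range and state in ESCALATION_STATES


def replay(states: List[str]) -> List[Tuple[int, str, bool, bool]]:
    """Per notification: (number, state, escalated by total count, by critical count)."""
    results = []
    criticals = 0
    for number, state in enumerate(states, start=1):
        if state == "CRITICAL":
            criticals += 1
        results.append((number, state, escalates(number, state), escalates(criticals, state)))
    return results


if __name__ == "__main__":
    # Notifications from the log, oldest first: 23:18, 23:28, 23:38, 23:48.
    log = ["CRITICAL", "WARNING", "WARNING", "CRITICAL"]
    for row in replay(log):
        print(row)
```

The fourth row comes out as (4, 'CRITICAL', True, False): the total-count rule escalates, matching the 23:48 SMS to Engineer2, while a critical-only count would still be at 2 of the required 4.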


Is there a way to avoid this behaviour?

Thanks

Neil