Re: [Nagios-users] Escalate after X warnings or criticals
Hi Martin,

The escalation_options directive does not take the state into account when notifications are counted. So if an escalation is set to start at the 4th notification and escalation_options restricts it to Critical, the following scenario can occur: three Warning notifications go out, the 4th notification is Critical, and it escalates, because there have been 4 notifications and the current state is Critical.

I posted a help request on this issue a week or two ago and would really like this to be patched or built into the next update.

http://article.gmane.org/gmane.network.nagios.user/64997/match=escalation+state

Cheers,
Neil

On Sat, Nov 7, 2009 at 12:56 AM, Martin Melin wrote:

> The existing escalation_options directive in escalation definitions will
> likely get you this behavior without the need for a patch.
>
> http://nagios.sourceforge.net/docs/3_0/escalations.html - see the very
> bottom of this page as well as the object definition documentation for
> escalation_options.
>
> Regards,
> Martin Melin
>
> On Fri, Nov 6, 2009 at 3:49 AM, Mark Gius wrote:
>
>> Currently, service escalations contain "first/last_notification"
>> directives that specify the range of notifications that the escalation
>> should apply to. This method of escalation has a weakness, however.
>>
>> At my work, we let warnings go to the default contact (which happens to
>> be email), and escalate to a pager chain on critical. However, if a
>> service sits in WARNING for a length of time (which is likely to happen
>> in the middle of the night), by the time the service enters a CRITICAL
>> state the notification count exceeds our highest escalation, and our
>> entire team gets paged immediately.
>>
>> What I'd like to see is the ability to distinguish between a WARNING
>> notification and a CRITICAL notification in the escalation, and set up
>> escalation chains that work based on the number of CRITICALs that have
>> been sent, as opposed to the total number of notifications.
>>
>> I am planning on patching nagios to support this behavior if there isn't
>> a way to achieve this behavior with the current implementation. My plan
>> is to add a warning/critical count to service, add a first/last
>> warning/critical state to service escalations, and add the directives
>> "(first|last)_(warning|critical)_notification" to the service escalation
>> configs. The idea is also to keep the current behavior
>> (notification_count and first/last_notification would still be present),
>> but allow finer grained control over when escalations are sent out.
>> This way if somebody didn't want to use the finer grained control their
>> behavior would stay the same. My current plan is to match the
>> escalation if _any_ of the 3 notification ranges match
>> (all/warning/critical).
>>
>> Any advice on making this behavior happen with Nagios as-is, or
>> suggestions/advice on the implementation are welcome.
>>
>> -Gius
>>
>> ___
>> Nagios-users mailing list
>> Nagios-users@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/nagios-users
>> ::: Please include Nagios version, plugin version (-v) and OS when
>> reporting any issue.
>> ::: Messages without supporting info will risk being sent to /dev/null
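As a rough illustration of the proposal quoted above, the Python sketch below models an escalation that matches if any of three ranges matches: the total notification count, the WARNING count, or the CRITICAL count. It is not Nagios code. The names (Escalation, in_range, replay, warning_range, critical_range) are invented for the example, the per-state ranges stand in for Mark's proposed directives rather than existing Nagios options, and the sketch ignores escalation_options and never resets its counters.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical model of the proposal; none of these names come from Nagios.
Range = Optional[Tuple[int, int]]  # (first, last); last == 0 means "no upper limit"


def in_range(count: int, rng: Range) -> bool:
    """True if count falls inside the (first, last) range; None never matches."""
    if rng is None:
        return False
    first, last = rng
    return count >= first and (last == 0 or count <= last)


@dataclass
class Escalation:
    all_range: Range = None       # models first/last_notification
    warning_range: Range = None   # models the proposed first/last_warning_notification
    critical_range: Range = None  # models the proposed first/last_critical_notification

    def matches(self, total: int, warnings: int, criticals: int) -> bool:
        # Proposed rule: escalate if ANY of the three ranges matches.
        return (in_range(total, self.all_range)
                or in_range(warnings, self.warning_range)
                or in_range(criticals, self.critical_range))


def replay(states, escalation):
    """Return the 1-based notification numbers at which the escalation matches."""
    total = warnings = criticals = 0
    hits = []
    for state in states:
        total += 1
        if state == "WARNING":
            warnings += 1
        elif state == "CRITICAL":
            criticals += 1
        if escalation.matches(total, warnings, criticals):
            hits.append(total)
    return hits


# A service sits in WARNING overnight, then goes CRITICAL.
overnight = ["WARNING"] * 5 + ["CRITICAL"] * 3

by_total = Escalation(all_range=(3, 0))          # current style: 3rd notification onward
by_critical = Escalation(critical_range=(3, 0))  # proposed: 3rd CRITICAL onward

print(replay(overnight, by_total))     # [3, 4, 5, 6, 7, 8]
print(replay(overnight, by_critical))  # [8]
```

With a total-count range the escalation already matches while the service is still in WARNING; with a CRITICAL-count range it matches only once three CRITICAL notifications have gone out.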
Re: [Nagios-users] Configuring Service Dependencies in Nagios 3.1.2
Hi Mirza,

From "Building a Monitoring Infrastructure With Nagios" by David Josephsen, pg. 71:

"Before Nagios checks the state of a service, it first checks the state of all the services that the service depends upon (its parents). If all of those services are okay, Nagios proceeds to check the child service. If any of the parent services are down, Nagios assumes the child service is down as well and stops checks and notifications on the child."

Just set up the dependencies and test them. Testing is easy: put in wrong IP addresses for the services you want to simulate as down. You'll soon see whether your dependencies are working.

Cheers,
Neil

On Wed, Oct 28, 2009 at 4:01 AM, Mirza Dedic wrote:

> Hello,
>
> I have a Nagios setup and I am monitoring LAN/WAN servers/services
> including network devices, smb shares, squid service status, server pings,
> and our frame relay.
>
> Now if my frame relay goes down between the two offices, obviously my
> Nagios will not be able to check the server and smb/squid services. I then
> get a warning message that it cannot reach the smb/squid/server/network; I
> want to just get a warning that it cannot reach the network and have the
> rest suppressed.
>
> I understand this can be done using service dependency, but I have also
> read that the way Nagios checks stuff is a little wacky, as in:
>
> What if it checks the ping of a server before the network? It will report
> the ping is critical, then once the network is queued to be checked, it will
> report the network is down?
>
> That would kind of defeat the logic of a dependency check. Does anyone have
> any feedback as to whether this is still the case?
>
> I remember reading it on the forum, which is why I did not go through with
> it when I initially configured it all.
>
> But I am also trying to avoid 20 emails for 1 host that is down.
>
> Any suggestions or ideas?
>
> Thanks!
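The behavior described in the quoted passage can be modeled roughly as follows. This is a toy Python sketch, not how Nagios is implemented: the service names, the DEPENDS_ON table and the run_checks helper are all invented for illustration, and a real setup would express the relationship through servicedependency definitions and their failure criteria.

```python
# Toy model of the quoted passage: a child service is only checked (and only
# notifies) when every parent service it depends on is OK.
# All names below are made up for illustration.

OK, CRITICAL = "OK", "CRITICAL"

# child service -> list of parent services it depends on
DEPENDS_ON = {
    "server-ping": ["frame-relay"],
    "smb-share": ["frame-relay"],
    "squid": ["frame-relay"],
}


def run_checks(raw_state):
    """Return the list of services that would send a notification.

    raw_state maps each service name to the result its check would return.
    """
    notifications = []
    for service, state in raw_state.items():
        parents = DEPENDS_ON.get(service, [])
        # Parents are evaluated first, regardless of iteration order.
        if any(raw_state[p] != OK for p in parents):
            continue  # a parent is down: skip the child's check and notification
        if state != OK:
            notifications.append(service)
    return notifications


# The frame relay is down, so everything behind it fails too. The children
# are listed before the parent on purpose, to mimic "ping checked before
# network".
link_down = {
    "server-ping": CRITICAL,
    "smb-share": CRITICAL,
    "squid": CRITICAL,
    "frame-relay": CRITICAL,
}
print(run_checks(link_down))  # ['frame-relay']
```

In this model the order in which services are visited does not matter, because the parent's state is looked up before the child is allowed to notify.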
[Nagios-users] Escalation notification count and state change problem
Hi,

I've searched and found similar posts but unfortunately no replies to this type of problem. I'd expect this to be a common problem, but maybe I've misread the documentation.

On Nagios 3.2.0 we have service notifications set to go out for Warning and for Critical states to an email address 24x7. In addition, during 'after hours', the on-call engineer receives SMS alerts for all Critical notifications and the backup engineer should receive escalations after the 4th Critical notification. However, last night the backup engineer received an SMS on the second Critical notification.

define serviceescalation{
        hostgroup_name          switches,primary_nodes,secondary_nodes
        service_description     *
        first_notification      4
        last_notification       0
        notification_interval   5
        contact_groups          PrimaryAH,SecondaryAH
        escalation_period       afterhours
        escalation_options      u,c
        }

# Primary After-hours contacts
define contactgroup{
        contactgroup_name       PrimaryAH
        alias                   Primary After-Hours contact
        members                 supportEmail,Engineer1
        }

# Secondary After-hours contacts
define contactgroup{
        contactgroup_name       SecondaryAH
        alias                   Secondary After-Hours contact
        members                 supportEmail,Engineer2
        }

It appears that the escalation procedure ignores the actual state when counting notifications. So if the 1st notification is Critical and the 4th is Critical but the 2nd and 3rd are Warnings, the 4th notification is escalated because it is Critical, even though only one Critical notification had been sent before it and the other two were Warnings. I was hoping that Nagios would escalate on the 4th Critical notification.
See event log below:

[10-26-2009 23:48:32] SERVICE NOTIFICATION: Engineer2;winapps1;NT memory usage;CRITICAL;notify-service-by-sms;Mem: 984 MB (96%) / 1023 MB (3%) Paged Mem: 1189 MB (48%) / 2469 MB (51%)
[10-26-2009 23:48:32] SERVICE NOTIFICATION: Engineer1;winapps1;NT memory usage;CRITICAL;notify-service-by-sms;Mem: 984 MB (96%) / 1023 MB (3%) Paged Mem: 1189 MB (48%) / 2469 MB (51%)
[10-26-2009 23:38:32] SERVICE NOTIFICATION: supportEmail;winapps1;NT memory usage;WARNING;notify-service-by-email;Mem: 950 MB (92%) / 1023 MB (7%) Paged Mem: 1219 MB (49%) / 2469 MB (50%)
[10-26-2009 23:28:32] SERVICE NOTIFICATION: supportEmail;winapps1;NT memory usage;WARNING;notify-service-by-email;Mem: 881 MB (86%) / 1023 MB (13%) Paged Mem: 1192 MB (48%) / 2469 MB (51%)
[10-26-2009 23:18:32] SERVICE NOTIFICATION: supportEmail;winapps1;NT memory usage;CRITICAL;notify-service-by-email;Mem: 1006 MB (98%) / 1023 MB (1%) Paged Mem: 1152 MB (46%) / 2469 MB (53%)
[10-26-2009 23:18:32] SERVICE NOTIFICATION: Engineer1;winapps1;NT memory usage;CRITICAL;notify-service-by-sms;Mem: 1006 MB (98%) / 1023 MB (1%) Paged Mem: 1152 MB (46%) / 2469 MB (53%)

Is there a way to avoid this behaviour?

Thanks
Neil
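The log above is consistent with an escalation that counts every notification and only looks at the state of the one currently being sent. Below is a minimal Python sketch of that reading, using the first_notification, last_notification and escalation_options values from the definition in this message. The function name and data layout are invented, and this is a model of the observed behaviour, not Nagios source code.

```python
# Model of the behaviour observed in the log: the notification counter
# increases for every problem notification, and escalation_options is only
# compared against the state of the notification being sent.

FIRST_NOTIFICATION = 4      # from the serviceescalation definition
LAST_NOTIFICATION = 0       # 0 = keep escalating with no upper limit
ESCALATION_STATES = {"UNKNOWN", "CRITICAL"}  # escalation_options u,c

# (time, state) of each notification in the log, oldest first
log = [
    ("23:18:32", "CRITICAL"),
    ("23:28:32", "WARNING"),
    ("23:38:32", "WARNING"),
    ("23:48:32", "CRITICAL"),
]


def escalated(events):
    """Yield (time, notification_number, criticals_so_far) for escalated events."""
    criticals = 0
    for number, (time, state) in enumerate(events, start=1):
        if state == "CRITICAL":
            criticals += 1
        in_range = number >= FIRST_NOTIFICATION and (
            LAST_NOTIFICATION == 0 or number <= LAST_NOTIFICATION
        )
        if in_range and state in ESCALATION_STATES:
            yield time, number, criticals


for time, number, criticals in escalated(log):
    print(f"{time}: notification #{number} escalated, CRITICAL #{criticals}")
# 23:48:32: notification #4 escalated, CRITICAL #2
```

Under this model the 23:48:32 notification is the 4th overall but only the 2nd Critical, which matches the SMS the backup engineer received.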