[Nagios-users] Host seemingly not following escalation rules
Hello all,

I have a number of host objects that are sending notifications continuously (a down notification every half hour for the duration of the outage) and aren't following the host escalation rules that are defined for the group. I've looked at my config files and can't locate whatever is causing this set of hosts to ignore the escalation rules. It is a group of sites that are all members of the same host group and use the same template.

Below are the bits of config pertaining to one individual host: its template, contact group, host escalation and host group. If anyone could take a glance at it to see what I'm missing, or point me towards something else that I need to look at, that'd be awesome. (For what it's worth, all of my other host escalation rules are working fine; it's just this group of hosts that doesn't follow the rules.)

    define host {
        use                     cstore,host-pnp
        host_name               7608_Madeup_Avenue_Shell_Router
        alias                   7608_Madeup_Avenue_Shell_Router
        address                 10.6.8.31
        hostgroups              CStore-Sites
    }

    define hostgroup {
        hostgroup_name          CStore-Sites
    }

    define host {
        name                    cstore
        use                     generic-switch
        notification_period     workhours
        contact_groups          cstore_contacts
        register                0
        icon_image              cstore.png
        statusmap_image         cstore.gd2
        notification_options    d,r
    }

    define host {
        name                    generic-switch
        use                     generic-host
        check_period            24x7
        check_interval          5
        retry_interval          1
        max_check_attempts      10
        check_command           check-host-alive
        notification_period     24x7
        notification_interval   30
        notification_options    d,r
        contact_groups          admins
        register                0
    }

    define hostescalation {
        hostgroup_name          CStore-Sites
        contact_groups          cstore_contacts
        first_notification      3
        last_notification       3
        notification_interval   20
        escalation_period       workhours
        escalation_options      d,r
    }

    define contactgroup {
        contactgroup_name       cstore_contacts
        alias                   Retail Support
        members                 usernames
    }

Thanks,
Daniel Ceola
Systems DB Admin
The Wills Group
6355 Crain Hwy
La Plata, MD 20646
301-932-3600
301-932-3643 (direct line)

--
Don't let slow site performance ruin your business. Deploy New Relic APM
Deploy New Relic app performance management and know exactly
what is happening inside your Ruby, Python, PHP, Java, and .NET app
Try New Relic at no cost today and get our sweet Data Nerd shirt too!
http://p.sf.net/sfu/newrelic-dev2dev
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
Re: [Nagios-users] Host seemingly not following escalation rules
And what is host-pnp?

On 09/10/12 14:33, Daniel Ceola wrote:
> define host { use cstore,host-pnp
Re: [Nagios-users] Host seemingly not following escalation rules
It's probably a host definition used for pnp4nagios. It allows you to display/graph certain performance data. For more info: http://www.pnp4nagios.org/

-Chris B.

On 10/9/12 10:02 AM, Assaf Flatto wrote:
> And what is host-pnp ???
Re: [Nagios-users] Host seemingly not following escalation rules
On 09/10/12 15:37, Chris Baldwin wrote:
> It's probably a host definition used for pnp4nagios. It allows you to
> display/graph certain performance data. For more info:
> http://www.pnp4nagios.org/
> -Chris B.

Chris,

The name pnp sort of gave that away, but he is using two templates to define the host, and one might override definitions from the other. He should inspect both when asking us to debug his config, hence the question.
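To illustrate the concern about two templates: as I understand the documented behaviour, when a host lists several templates in `use`, Nagios takes each directive from the first template in the list that supplies it (including that template's own parents), so `cstore` wins over `host-pnp` wherever both set the same directive. The `host-pnp` body below is an assumption modelled on the stock pnp4nagios template; the poster's actual definition was never shown in this thread.

```
# ASSUMPTION: typical host-pnp template as shipped in the pnp4nagios docs.
# The real one from this installation was not posted.
define host {
    name            host-pnp
    action_url      /pnp4nagios/index.php/graph?host=$HOSTNAME$&srv=_HOST_
    register        0
}

# From the original post: cstore is consulted first, host-pnp second.
define host {
    use             cstore,host-pnp
    host_name       7608_Madeup_Avenue_Shell_Router
    alias           7608_Madeup_Avenue_Shell_Router
    address         10.6.8.31
    hostgroups      CStore-Sites
}

# Expected effective values under the assumptions above:
#   contact_groups          cstore_contacts   (set in cstore)
#   notification_interval   30                (generic-switch, reached via cstore)
#   action_url              /pnp4nagios/...   (only host-pnp sets it)
```

If `host-pnp` also set a notification directive, it would only take effect where the `cstore` chain leaves that directive undefined. The resolved, post-inheritance definitions can be checked in the file named by `object_cache_file` in nagios.cfg.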
Re: [Nagios-users] NRPE: Unable to read output; but works when run under strace ...
Hello Peter,

thanks for your reply. However, as previously written, I know of the peculiarities that might arise once sudo joins the team. In the issue at hand sudo was only used for illustration purposes; the issue itself doesn't even remotely touch sudo.

Furthermore, I never found it necessary to deal with !requiretty on Debian, but I did have to use this sudo option on RHEL. Still, no sudo involved in this case, sorry ...

Cheers,
Flo
Re: [Nagios-users] Host seemingly not following escalation rules
Correct, it was for pnp4nagios.

Just a bit ago I finally realized my goof. In the host template definition that I titled cstore, I had assigned the cstore contact group instead of the 'default' nagios contact, so those users were receiving the regular host notifications directly rather than only through the escalation. I have since changed it, and the hosts now seem to be following the escalation definitions properly.

Thanks,
Daniel Ceola

-----Original Message-----
From: Chris Baldwin [mailto:o...@umich.edu]
Sent: Tuesday, October 09, 2012 10:38 AM
To: nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] Host seemingly not following escalation rules

> It's probably a host definition used for pnp4nagios.
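A sketch of the fix Daniel describes, assuming the 'default' contact is the `admins` group already set in `generic-switch`. Dropping `contact_groups` from the `cstore` template lets it inherit `admins`, so `cstore_contacts` is reached only through the escalation. This is one reading of the change, not the poster's actual corrected config.

```
# cstore template without its own contact_groups line.
# ASSUMPTION: "default" means the admins group inherited from generic-switch.
define host {
    name                    cstore
    use                     generic-switch      ; contact_groups admins comes from here
    notification_period     workhours
    register                0
    icon_image              cstore.png
    statusmap_image         cstore.gd2
    notification_options    d,r
}

# Unchanged from the original post: cstore_contacts is notified
# only for notification number 3, during workhours.
define hostescalation {
    hostgroup_name          CStore-Sites
    contact_groups          cstore_contacts
    first_notification      3
    last_notification       3
    notification_interval   20
    escalation_period       workhours
    escalation_options      d,r
}
```

With this layout `admins` would still receive the regular notifications at the 30-minute interval from `generic-switch`; for the notification that matches the escalation, the escalation's contact group replaces the host's own contacts.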
Re: [Nagios-users] Repeating event handler in hard service state....
Simple script in cron...

-----Original Message-----
From: Peter Kaagman [mailto:p.kaag...@atlascollege.nl]
Sent: Monday, October 08, 2012 9:17 AM
To: nagios-users@lists.sourceforge.net
Subject: [Nagios-users] Repeating event handler in hard service state

Hi there list,

As I understand it, service-specific event handlers are triggered on every state change while a service is in a SOFT state, and once when a service enters a HARD state.

My problem is that I have a service (an IPSEC tunnel) which depends on an outside party. If the outside party is down (they do updates once a week), attempting a restart actually kills the tunnel. To solve this I would like to keep retrying the restart.

I could do this in a SOFT state by increasing max check attempts to a higher number, but then I would never get a notification. Letting the service go to a HARD state (to get the notification) would limit the restart attempt to the single event when the service enters the HARD state.

I think there are 2 possibilities:
- Keep the service in a SOFT state and send out a notification on attempt X.
- Let the service go to a HARD state but keep on trying the restart.

Is there any way I could achieve this? Or am I completely missing something?

Peter
Re: [Nagios-users] Repeating event handler in hard service state....
From: Matthew Jurgens [mailto:nagiosus...@edcint.co.nz]
Sent: Tuesday, October 9, 2012 0:27
To: Nagios Users List
Subject: Re: [Nagios-users] Repeating event handler in hard service state

> If you set the service to volatile it will run the event handler every
> time the service is not OK, even after multiple HARD states.
>
> The event handler at edcint.co.nz/checkwmiplus will also give you
> fine-grained control over exactly which states the event handler should
> act on, including specific text strings in the service output. This may
> add some flexibility so that you only restart the tunnel if you really
> need to.
>
> --
> Smartmon System Monitoring
> http://www.smartmon.com.au

[Peter Kaagman]
Thanks... that did the trick. It did not solve the notification part, but I can live with that.

Peter
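For reference, the volatile-service suggestion might look roughly like the sketch below. The host, service, template, check and script names are all hypothetical; only `is_volatile`, `event_handler` and the macros are standard Nagios directives and macros.

```
# Hypothetical service for the IPSEC tunnel; all names are placeholders.
define service {
    use                     generic-service         ; hypothetical template
    host_name               vpn-gateway             ; hypothetical host
    service_description     IPSEC Tunnel
    check_command           check_ipsec_tunnel      ; hypothetical check
    max_check_attempts      3
    is_volatile             1       ; re-run the handler on every non-OK check in a HARD state
    event_handler           restart-ipsec-tunnel
}

# Hypothetical handler command; the script path is an assumption.
# The macros pass the current state, state type and attempt number to the script.
define command {
    command_name    restart-ipsec-tunnel
    command_line    $USER1$/eventhandlers/restart-ipsec-tunnel.sh $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$
}
```

Per the Nagios documentation, a volatile service also logs the problem and notifies contacts on each non-OK check in a HARD state, so the notification volume is worth watching with this setting.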
Re: [Nagios-users] Repeating event handler in hard service state....
-----Original Message-----
From: Frank Bulk [mailto:frnk...@iname.com]
Sent: Wednesday, October 10, 2012 5:34
To: nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] Repeating event handler in hard service state

> Simple script in cron...

[Peter Kaagman]
That is in fact how it all started out for me: putting ping checks in cron jobs.
Re: [Nagios-users] NRPE: Unable to read output; but works when run under strace ...
-----Original Message-----
From: Florian Ernst [mailto:florian_er...@gmx.net]
Sent: Tuesday, October 9, 2012 20:14
To: Nagios Users List
Subject: Re: [Nagios-users] NRPE: Unable to read output; but works when run under strace ...

> Still, no sudo involved in this case, sorry ...

[Peter Kaagman]
Sorry that did not help you. Guess I should have read your post more closely...

Peter