[Nagios-users] Host seemingly not following escalation rules

2012-10-09 Thread Daniel Ceola
Hello all,

I have a number of host objects that are sending notifications continuously 
(down notification every half hour for the duration of the outage), and aren't 
following my host escalation rules that are defined for the group.  I've looked 
at my config files and can't seem to locate whatever is causing the set of 
hosts to not follow the escalation rules.  It is a group of sites that are all 
members of the same host group, and use the same template.

Below are the bits of config info pertaining to one individual host, its 
template, contact group, host escalation and host group.  If anyone could take 
a glance at it, to see if I'm missing something on why this set of hosts isn't 
following my escalation rules, or if you could point me towards something else 
that I need to look at, that'd be awesome. (For what it's worth, all of my 
other host escalation rules are working fine; it's just this one group of hosts 
that doesn't seem to like following the rules and is screwing with me.)


define host {
    use                     cstore,host-pnp
    host_name               7608_Madeup_Avenue_Shell_Router
    alias                   7608_Madeup_Avenue_Shell_Router
    address                 10.6.8.31
    hostgroups              CStore-Sites
}

define hostgroup{
    hostgroup_name          CStore-Sites
}

define host{
    name                    cstore
    use                     generic-switch
    notification_period     workhours
    contact_groups          cstore_contacts
    register                0
    icon_image              cstore.png
    statusmap_image         cstore.gd2
    notification_options    d,r
}

define host{
    name                    generic-switch
    use                     generic-host
    check_period            24x7
    check_interval          5
    retry_interval          1
    max_check_attempts      10
    check_command           check-host-alive
    notification_period     24x7
    notification_interval   30
    notification_options    d,r
    contact_groups          admins
    register                0
}


define hostescalation{
    hostgroup_name          CStore-Sites
    contact_groups          cstore_contacts
    first_notification      3
    last_notification       3
    notification_interval   20
    escalation_period       workhours
    escalation_options      d,r
}

define contactgroup{
    contactgroup_name       cstore_contacts
    alias                   Retail Support
    members                 usernames
}



Thanks,

Daniel Ceola
Systems & DB Admin

The Wills Group
6355 Crain Hwy
La Plata, MD 20646
301-932-3600
301-932-3643 (direct line)

--
Don't let slow site performance ruin your business. Deploy New Relic APM
Deploy New Relic app performance management and know exactly
what is happening inside your Ruby, Python, PHP, Java, and .NET app
Try New Relic at no cost today and get our sweet Data Nerd shirt too!
http://p.sf.net/sfu/newrelic-dev2dev
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null

Re: [Nagios-users] Host seemingly not following escalation rules

2012-10-09 Thread Assaf Flatto

And what is host-pnp ???


On 09/10/12 14:33, Daniel Ceola wrote:



Re: [Nagios-users] Host seemingly not following escalation rules

2012-10-09 Thread Chris Baldwin
It's probably a host definition used for pnp4nagios. It allows you to 
display/graph certain performance data.

For more info: http://www.pnp4nagios.org/
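
For reference, the host-pnp template suggested in the pnp4nagios documentation looks roughly like the fragment below. Daniel's own copy was not posted, so its contents here are an assumption, and the URL path assumes pnp4nagios is installed under /pnp4nagios. If his copy only sets action_url, it just adds a link to the host's graphs in the web interface and has no effect on notifications.

```
; Sketch of the stock pnp4nagios host template (assumed; Daniel's copy was not posted)
define host {
    name        host-pnp
    action_url  /pnp4nagios/index.php/graph?host=$HOSTNAME$&srv=_HOST_   ; link to this host's graphs
    register    0                                                       ; template only, not a real host
}
```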

-Chris B.


On 10/9/12 10:02 AM, Assaf Flatto wrote:
 And what is host-pnp ???


 On 09/10/12 14:33, Daniel Ceola wrote:



Re: [Nagios-users] Host seemingly not following escalation rules

2012-10-09 Thread Assaf Flatto
On 09/10/12 15:37, Chris Baldwin wrote:
 It's probably a host definition used for pnp4nagios. It allows you to
 display/graph certain performance data.

 For more info: http://www.pnp4nagios.org/

 -Chris B.
Chris

The name pnp sort of gave that away, but since he is using two templates 
to define the host, and one might override the other's definitions, he 
should include both when asking us to debug his config; hence the 
question.
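
To illustrate why both templates matter: as I understand the Nagios inheritance rules, when a host lists several templates in its use directive, a directive defined in more than one of them is taken from the template listed first, and later templates only fill in directives the earlier ones leave unset. The template and host names below are hypothetical, and this is a fragment; a real host also needs the usual required directives.

```
; Hypothetical templates illustrating multiple-inheritance precedence
define host{
    name            tmpl-first
    contact_groups  group-a
    register        0
}

define host{
    name            tmpl-second
    contact_groups  group-b         ; ignored for hosts that list tmpl-first before it
    notes           set only here   ; still inherited, since tmpl-first leaves it unset
    register        0
}

define host{
    use             tmpl-first,tmpl-second
    host_name       example-host    ; hypothetical host
    address         192.0.2.10      ; documentation-range address
    ; resulting values: contact_groups group-a, notes "set only here"
}
```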







Re: [Nagios-users] NRPE: Unable to read output; but works when run under strace ...

2012-10-09 Thread Florian Ernst
Hello Peter,

thanks for your reply.

However, as I wrote previously, I know of the peculiarities that might
arise once sudo joins the team, and in the issue at hand sudo is involved
only for illustration purposes; the issue itself doesn't even remotely
touch sudo.
Furthermore, I never found it necessary to deal with !requiretty on
Debian, though I did have to use this sudo option on RHEL. Still,
no sudo is involved in this case, sorry ...

Cheers,
Flo



Re: [Nagios-users] Host seemingly not following escalation rules

2012-10-09 Thread Daniel Ceola
Correct, it was for pnp4nagios. Just a bit ago I finally realized my goof. In 
the host template definition that I named cstore, I had assigned the 
cstore_contacts contact group instead of the 'default' nagios contact, so the 
users were getting every email through that. I have since changed it, and the 
hosts now seem to be following the escalation definitions properly.
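
A sketch of the corrected pieces, based on my reading of the posted config. It assumes the 'default' contact group is admins, since that is what generic-switch uses. When a host escalation matches a notification number, Nagios sends that notification to the escalation's contact groups instead of the host's own; every other notification goes to the host's contact_groups. Because cstore set contact_groups cstore_contacts, that group received notifications 1, 2, 4, 5 and so on at the 30-minute notification_interval inherited from generic-switch. The escalation covers only notification 3, so it made no visible difference.

```
; Template with the default group restored (admins assumed, as in generic-switch)
define host{
    name                    cstore
    use                     generic-switch
    notification_period     workhours
    contact_groups          admins          ; was cstore_contacts, which got every notification
    register                0
    icon_image              cstore.png
    statusmap_image         cstore.gd2
    notification_options    d,r
}

; Unchanged: cstore_contacts are now reached only through this escalation
define hostescalation{
    hostgroup_name          CStore-Sites
    contact_groups          cstore_contacts
    first_notification      3
    last_notification       3
    notification_interval   20
    escalation_period       workhours
    escalation_options      d,r
}
```

After an edit like this, running nagios -v against the main nagios.cfg before reloading catches syntax slips.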

Thanks,

Daniel Ceola


-Original Message-
From: Chris Baldwin [mailto:o...@umich.edu] 
Sent: Tuesday, October 09, 2012 10:38 AM
To: nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] Host seemingly not following escalation rules

It's probably a host definition used for pnp4nagios. It allows you to 
display/graph certain performance data.

For more info: http://www.pnp4nagios.org/

-Chris B.


On 10/9/12 10:02 AM, Assaf Flatto wrote:
 And what is host-pnp ???


 On 09/10/12 14:33, Daniel Ceola wrote:


Re: [Nagios-users] Repeating event handler in hard service state....

2012-10-09 Thread Frank Bulk
Simple script in cron...

-Original Message-
From: Peter Kaagman [mailto:p.kaag...@atlascollege.nl] 
Sent: Monday, October 08, 2012 9:17 AM
To: nagios-users@lists.sourceforge.net
Subject: [Nagios-users] Repeating event handler in hard service state

Hi there list,

As I understand it, service-specific event handlers are triggered for every
state change while a service is in a SOFT state, and once when the service
enters a HARD state.

Problem is that I have a service (an IPSEC tunnel) which is dependent on an
outside source. If the outside party fails (whenever they do updates once a
week) I actually kill the tunnel when attempting a restart. To solve this I
would like to keep trying the restart.

I could do this in a SOFT state by increasing the max check attempts to a
higher number... but then I would never get a notification. Letting the
service go to a HARD state (to get the notification) would limit the restart
attempt to just the one event when the service enters the HARD state.

I think there are 2 possibilities:

- Keep the service in a SOFT state and send out a notification on attempt X.
- Let the service go to a HARD state but keep on trying the restart.

Is there any way I could achieve this? Or am I completely missing something?

Peter




Re: [Nagios-users] Repeating event handler in hard service state....

2012-10-09 Thread Peter Kaagman


From: Matthew Jurgens [mailto:nagiosus...@edcint.co.nz]
Sent: Tuesday, October 9, 2012 0:27
To: Nagios Users List
Subject: Re: [Nagios-users] Repeating event handler in hard service state


If you set the service to volatile, it will run the event handler every time the 
service is not OK, even while it stays in a HARD state.
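
A minimal sketch of what Matthew describes, assuming the tunnel is monitored by a service on the gateway host. Only the is_volatile, event_handler and event_handler_enabled directives come from Nagios itself. The generic-service template, host name, check command and restart-ipsec-tunnel event handler command are hypothetical placeholders, not taken from the thread.

```
; Sketch only: names marked hypothetical are placeholders, not from the thread
define service{
    use                     generic-service       ; hypothetical template
    host_name               ipsec-gateway         ; hypothetical host
    service_description     IPSEC tunnel
    check_command           check_ipsec_tunnel    ; hypothetical check command
    is_volatile             1                     ; rerun the event handler on every non-OK result, even in a HARD state
    event_handler           restart-ipsec-tunnel  ; hypothetical command that restarts the tunnel
    event_handler_enabled   1
}
```

According to the Nagios documentation on volatile services, each non-OK result in a HARD state is also logged and can trigger a notification. The notification settings for such a service deserve a second look.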

The event handler at edcint.co.nz/checkwmiplus will also give you fine-grained 
control over exactly which states the event handler should act on, including 
specific text strings in the service output. This may add some flexibility so 
that you only restart the tunnel if you really need to.

--
Smartmon System Monitoring
http://www.smartmon.com.au

[Peter Kaagman]
Thanks... that did the trick. It did not solve the notification part, but I can 
live with that.

Peter


Re: [Nagios-users] Repeating event handler in hard service state....

2012-10-09 Thread Peter Kaagman


 -Original Message-
 From: Frank Bulk [mailto:frnk...@iname.com]
 Sent: Wednesday, October 10, 2012 5:34
 To: nagios-users@lists.sourceforge.net
 Subject: Re: [Nagios-users] Repeating event handler in hard service
 state
 
 Simple script in cron...
 
[Peter Kaagman] 
That is in fact how it all started out for me: putting ping checks in cron jobs.



Re: [Nagios-users] NRPE: Unable to read output; but works when run under strace ...

2012-10-09 Thread Peter Kaagman


 -Original Message-
 From: Florian Ernst [mailto:florian_er...@gmx.net]
 Sent: Tuesday, October 9, 2012 20:14
 To: Nagios Users List
 Subject: Re: [Nagios-users] NRPE: Unable to read output; but works
 when run under strace ...
 
 Hello Peter,
 
 thanks for your reply.
 
 However, as previously written, I know of the peculiarities that might arise
 once sudo joins the team, and in the issue at hand sudo is no more involved
 than being used for illustration purposes while the issue itself doesn't even
 remotely touch sudo at all.
 Furthermore, I never found it necessary to deal with !requiretty on
 Debian, but indeed had to make use of this sudo option on RHEL. Still, no
 sudo involved in this case, sorry ...
 
 Cheers,
 Flo

[Peter Kaagman] 
Sorry that did not help you. I guess I should have read your post more closely...

Peter
