[Nagios-users] Unneeded alerts from Nagios

2007-12-05 Thread Monappallil, George
hi:
I have a Nagios 2.9 instance running on an ESX Linux guest. The problem
we are seeing is that whenever we lose and regain network connectivity
to the host, Nagios wrongly sends a bunch of server DOWN and server UP
alerts for all the servers it is monitoring.
This is what my hosts.cfg looks like for a typical host:
define host{
        name                          generic-host    ; Generic template name
        notifications_enabled         1               ; Host notifications are enabled
        event_handler_enabled         1               ; Host event handler is enabled
        flap_detection_enabled        1               ; Flap detection is enabled
        process_perf_data             1               ; Process performance data
        retain_status_information     1               ; Retain status information
        retain_nonstatus_information  1               ; Retain non-status information
        register                      0               ; DONT REGISTER THIS DEFINITION
}
 
# This creates a generic host that your routers can use
# monitors host(s) 24x7, notifies on down and recovery, checks 15 times before going critical,
# notifies the contact_group every 30 minutes
define host{
        name                    basic-host
        use                     generic-host
        check_command           check-host-alive
        max_check_attempts      10
        notification_interval   30
        notification_period     24x7
        notification_options    d,r
        register                0
}
 
#adelphi
define host{
        use                     basic-host
        host_name               adelphi
        alias                   adelphi
        address                 172.xx.xx.xx (intentional)
        contact_groups          rpfl-it
}
 
This is what my services.cfg file looks like:
-
define service{
        name                          generic-service ; Generic service name
        active_checks_enabled         1               ; Active service checks are enabled
        passive_checks_enabled        1               ; Passive service checks are enabled/accepted
        parallelize_check             1               ; Active service checks should be parallelized
        obsess_over_service           1               ; We should obsess over this service
        check_freshness               0               ; Default is to NOT check service 'freshness'
        notifications_enabled         1               ; Service notifications are enabled
        event_handler_enabled         1               ; Service event handler is enabled
        flap_detection_enabled        1               ; Flap detection is enabled
        process_perf_data             1               ; Process performance data
        retain_status_information     1               ; Retain status information
        retain_nonstatus_information  1               ; Retain non-status information
        register                      0               ; DONT REGISTER THIS DEFINITION
}
 
define service{
        use                     generic-service
        name                    basic-service
        is_volatile             0
        check_period            24x7
        max_check_attempts      15
        normal_check_interval   10
        retry_check_interval    2
        notification_interval   0
        notification_period     none
        register                0
}
 
# Generic for all services
# PING - ensure HOSTS are available.
define service{
        use                     basic-service
        name                    ping-service
        service_description     PING
        notification_interval   30
        contact_groups          rpfl-it
        hostgroup_name          PROD1
        notification_options    c,r
        notification_period     24x7
        check_command           check_ping!1000.0,20%!2000.0,60%
}
-
 
The question I have is: why would Nagios send DOWN/UP alerts for all the
hosts it is monitoring when it is only the host Nagios runs on that
loses connectivity?
 
thanks in advance

-George 

 
-
SF.Net email is sponsored by: The Future of Linux Business White Paper
from Novell.  From the desktop to the data center, Linux is going
mainstream.  Let it simplify your IT future.
http://altfarm.mediaplex.com/ad/ck/8857-50307-18918-4
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null

Re: [Nagios-users] Unneeded alerts from Nagios

2007-12-05 Thread Marc Powell


> -Original Message-
> From: [EMAIL PROTECTED] [mailto:nagios-users-
> [EMAIL PROTECTED] On Behalf Of Monappallil, George
> Sent: Wednesday, December 05, 2007 2:17 PM
> To: nagios-users@lists.sourceforge.net
> Subject: [Nagios-users] Unneeded alerts from Nagios
> 
> hi:
> I have a nagios 2.9 instance running on an ESX linux guest. The problem we
> are seeing is that whenever we lose and regain network connectivity to the
> host, nagios wrongly sends a bunch of server down and server up alerts for
> all the servers that nagios is monitoring.


> the question I have is why would nagios send DOWN/UP alerts for all the
> hosts it is monitoring when it is just the host that it is on loses
> connectivity.

The real question is: why is this surprising? By your description, the
machine Nagios is running on loses network connectivity. Nagios then
cannot reach any of the network hosts it is monitoring, so it believes
they are down and sends notifications. You haven't given Nagios any way
to tell otherwise.

If you're unable to create a more stable environment for Nagios
(generally a mission-critical service), I'd recommend creating a host
check for the default gateway and setting that host as the parent of
all your other hosts. If those hosts become unreachable, Nagios will
check whether the default gateway is down and notify accordingly: the
gateway is reported DOWN, and the hosts behind it are marked UNREACHABLE
rather than DOWN.
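A minimal sketch of that suggestion in Nagios object syntax, reusing the basic-host template and the rpfl-it contact group from the original post. The host name default-gw and its address are hypothetical placeholders (substitute your real gateway), and the sketch assumes the gateway answers the existing check-host-alive ping check.

```
# Hypothetical host for the Nagios server's default gateway.
define host{
        use                     basic-host
        host_name               default-gw
        alias                   Default gateway
        address                 172.xx.xx.1     ; placeholder, not a real address
        contact_groups          rpfl-it
}

# Existing host, now declaring the gateway as its parent.
define host{
        use                     basic-host
        host_name               adelphi
        alias                   adelphi
        address                 172.xx.xx.xx    ; redacted as in the original post
        parents                 default-gw
        contact_groups          rpfl-it
}
```

When the Nagios box loses its link, checks of both adelphi and default-gw fail; default-gw has no parent, so it is reported DOWN, while adelphi is marked UNREACHABLE. Because basic-host sets notification_options d,r, UNREACHABLE states generate no problem notifications; adding u to that option would enable them.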

--
Marc



Re: [Nagios-users] Unneeded alerts from Nagios

2007-12-06 Thread mark redding
Hi,

> hi:
> I have a nagios 2.9 instance running on an ESX linux guest. The problem we
> are seeing is that whenever we lose and regain network connectivity to the
> host, nagios wrongly sends a bunch of server down and server up alerts for
> all the servers that nagios is monitoring.

It's best to have a 'parents' entry in your 'define host' definitions
that describes the path Nagios has to take in order to reach each host.

So a crude example might be:

The Nagios server is housed in your office.
Some web servers are housed in a data centre.
You have a router in the office and one in the data centre that provide the link.

So the web servers' parent is the data centre router, and the data
centre router's parent is the office router. If the office or data
centre router fails, Nagios knows it is not going to be able to reach
the web servers: if you are monitoring the routers, they should alert,
but the web servers should not.
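The office/data centre chain above might be written roughly as follows. All host names and the 192.0.2.x addresses (a documentation-only range) are hypothetical; the basic-host template and rpfl-it contact group are borrowed from George's config. The sketch assumes the office router sits on the same network segment as the Nagios server, which is why it has no parents line.

```
# Assumed to be on the Nagios server's local segment: no parent needed.
define host{
        use                     basic-host
        host_name               office-router
        alias                   Office router
        address                 192.0.2.1       ; placeholder address
        contact_groups          rpfl-it
}

# Reached through the office router.
define host{
        use                     basic-host
        host_name               datacentre-router
        alias                   Data centre router
        address                 192.0.2.2       ; placeholder address
        parents                 office-router
        contact_groups          rpfl-it
}

# Reached through the data centre router.
define host{
        use                     basic-host
        host_name               web1
        alias                   Web server 1
        address                 192.0.2.10      ; placeholder address
        parents                 datacentre-router
        contact_groups          rpfl-it
}
```

If datacentre-router stops responding, web1 is classified as UNREACHABLE rather than DOWN, so only the router produces a DOWN alert.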

That's my understanding of how it should work (and it's how I have my
Nagios system configured).
-- 
bright blessings,
Mark
