[Nagios-users] Unneeded alerts from Nagios
Hi,

I have a Nagios 2.9 instance running on an ESX Linux guest. The problem we are seeing is that whenever we lose and regain network connectivity to the host, Nagios wrongly sends a bunch of server down and server up alerts for all the servers that Nagios is monitoring.

This is what my hosts.cfg looks like for a typical host:

define host{
        name                            generic-host    ; Generic template name
        notifications_enabled           1               ; Host notifications are enabled
        event_handler_enabled           1               ; Host event handler is enabled
        flap_detection_enabled          1               ; Flap detection is enabled
        process_perf_data               1               ; Process performance data
        retain_status_information       1               ; Retain status information
        retain_nonstatus_information    1               ; Retain non-status information
        register                        0               ; DONT REGISTER THIS DEFINITION
        }

# This creates a generic host that your routers can use
# monitors host(s) 24x7, notifies on down and recovery, checks 15 times before going critical,
# notifies the contact_group every 30 minutes
define host{
        name                            basic-host
        use                             generic-host
        check_command                   check-host-alive
        max_check_attempts              10
        notification_interval           30
        notification_period             24x7
        notification_options            d,r
        register                        0
        }

#adelphi
define host{
        use                             basic-host
        host_name                       adelphi
        alias                           adelphi
        address                         172.xx.xx.xx (intentional)
        contact_groups                  rpfl-it
        }

This is what my services.cfg file looks like:

define service{
        name                            generic-service ; Generic service name
        active_checks_enabled           1               ; Active service checks are enabled
        passive_checks_enabled          1               ; Passive service checks are enabled/accepted
        parallelize_check               1               ; Active service checks should be parallelized
        obsess_over_service             1               ; We should obsess over this service
        check_freshness                 0               ; Default is to NOT check service 'freshness'
        notifications_enabled           1               ; Service notifications are enabled
        event_handler_enabled           1               ; Service event handler is enabled
        flap_detection_enabled          1               ; Flap detection is enabled
        process_perf_data               1               ; Process performance data
        retain_status_information       1               ; Retain status information
        retain_nonstatus_information    1               ; Retain non-status information
        register                        0               ; DONT REGISTER THIS DEFINITION
        }

define service{
        use                             generic-service
        name                            basic-service
        is_volatile                     0
        check_period                    24x7
        max_check_attempts              15
        normal_check_interval           10
        retry_check_interval            2
        notification_interval           0
        notification_period             none
        register                        0
        }

# Generic for all services
# PING - ensure HOSTS are available.
define service{
        use                             basic-service
        name                            ping-service
        service_description             PING
        notification_interval           30
        contact_groups                  rpfl-it
        hostgroup_name                  PROD1
        notification_options            c,r
        notification_period             24x7
        check_command                   check_ping!1000.0,20%!2000.0,60%
        }

The question I have is: why would Nagios send DOWN/UP alerts for all the hosts it is monitoring when it is only the host it runs on that loses connectivity?

Thanks in advance,
George

SF.Net email is sponsored by: The Future of Linux Business White Paper from Novell. From the desktop to the data center, Linux is going mainstream. Let it simplify your IT future.
http://altfarm.mediaplex.com/ad/ck/8857-50307-18918-4
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
Re: [Nagios-users] Unneeded alerts from Nagios
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:nagios-users-[EMAIL PROTECTED] On Behalf Of Monappallil, George
> Sent: Wednesday, December 05, 2007 2:17 PM
> To: nagios-users@lists.sourceforge.net
> Subject: [Nagios-users] Unneeded alerts from Nagios
>
> Hi,
> I have a Nagios 2.9 instance running on an ESX Linux guest. The problem we
> are seeing is that whenever we lose and regain network connectivity to the
> host, Nagios wrongly sends a bunch of server down and server up alerts for
> all the servers that Nagios is monitoring.
> This is what my hosts.cfg looks like for a typical host:
> define host{
>         name                            generic-host    ; Generic template name
>         notifications_enabled           1               ; Host notifications are enabled
>         event_handler_enabled           1               ; Host event handler is enabled
>         flap_detection_enabled          1               ; Flap detection is enabled
>         process_perf_data               1               ; Process performance data
>         retain_status_information       1               ; Retain status information
>         retain_nonstatus_information    1               ; Retain non-status information
>         register                        0               ; DONT REGISTER THIS DEFINITION
>         }
>
> # This creates a generic host that your routers can use
> # monitors host(s) 24x7, notifies on down and recovery, checks 15 times before going critical,
> # notifies the contact_group every 30 minutes
> define host{
>         name                            basic-host
>         use                             generic-host
>         check_command                   check-host-alive
>         max_check_attempts              10
>         notification_interval           30
>         notification_period             24x7
>         notification_options            d,r
>         register                        0
>         }
>
> #adelphi
> define host{
>         use                             basic-host
>         host_name                       adelphi
>         alias                           adelphi
>         address                         172.xx.xx.xx (intentional)
>         contact_groups                  rpfl-it
>         }
> The question I have is: why would Nagios send DOWN/UP alerts for all the
> hosts it is monitoring when it is only the host it runs on that loses
> connectivity?

The question is: why is this surprising? Your description is that the machine Nagios is running on loses network connectivity. Nagios cannot reach the network hosts that it is monitoring, so it believes them to be down and sends notifications. You've not given Nagios any way to tell otherwise.

If you're unable to create a more stable environment for Nagios (generally a mission-critical service), I'd recommend creating a host/service check for the default gateway and setting that host as the parent of all your other hosts. If the hosts become unreachable, Nagios will check whether the default gateway is down and notify appropriately.

--
Marc
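A minimal sketch of what the gateway-as-parent suggestion could look like in hosts.cfg, reusing the basic-host template and the rpfl-it contact group from the original post. The host name default-gw and the address 192.0.2.1 are placeholders made up for illustration, not values from the poster's setup.

```cfg
# Hypothetical gateway host; name and address are placeholders
define host{
        use                             basic-host
        host_name                       default-gw
        alias                           Default gateway
        address                         192.0.2.1
        contact_groups                  rpfl-it
        }

# Existing host from the original post, with the gateway added as its parent
define host{
        use                             basic-host
        host_name                       adelphi
        alias                           adelphi
        address                         172.xx.xx.xx
        parents                         default-gw
        contact_groups                  rpfl-it
        }
```

With this in place, a failed check of adelphi while default-gw is also failing should leave adelphi UNREACHABLE rather than DOWN. Because basic-host sets notification_options to d,r (no u), no notification would be expected for the unreachable children, only for the gateway itself.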
Re: [Nagios-users] Unneeded alerts from Nagios
Hi,

> Hi,
> I have a Nagios 2.9 instance running on an ESX Linux guest. The problem we
> are seeing is that whenever we lose and regain network connectivity to the
> host, Nagios wrongly sends a bunch of server down and server up alerts for
> all the servers that Nagios is monitoring.

It is best to have a 'parents' entry in each 'define host' definition that describes the path Nagios has to take in order to reach that host.

So a crude example may be:

The Nagios server is housed in your office. Some web servers are housed in a data centre. You have a router in the office and one in the datacentre that provide a link.

So:
  the web server's parent is the datacentre router
  the datacentre router's parent is the office router

If the office or datacentre router fails, then Nagios knows that it is not going to be able to reach the web servers. If you are monitoring the routers then they should alert, but the web servers should not.

That's my understanding of how it should work (and is what I have configured for my Nagios system).

--
bright blessings,
Mark
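The office/datacentre example above could be written roughly as follows. This is only a sketch: the host names office-router, dc-router and web1 and all three addresses are hypothetical, and the definitions assume a template like the basic-host one from the original post supplies the remaining required directives.

```cfg
# Router in the office, next to the Nagios server (no parent needed)
define host{
        use                             basic-host
        host_name                       office-router
        alias                           Office router
        address                         192.0.2.1
        contact_groups                  rpfl-it
        }

# Router in the datacentre, reached through the office router
define host{
        use                             basic-host
        host_name                       dc-router
        alias                           Datacentre router
        address                         198.51.100.1
        parents                         office-router
        contact_groups                  rpfl-it
        }

# Web server in the datacentre, reached through the datacentre router
define host{
        use                             basic-host
        host_name                       web1
        alias                           Web server 1
        address                         198.51.100.10
        parents                         dc-router
        contact_groups                  rpfl-it
        }
```

Each host names only its immediate parent; Nagios follows the chain from there, so web1 does not need to list office-router as well.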