Marc, I double-checked the disk space last night thinking that might be the issue, but I have plenty of space:

Filesystem            Size  Used Avail Use% Mounted on
/dev/hda3             109G   70G   33G  68% /
/dev/hda1              99M   28M   66M  30% /boot

As for the processes, I also thought of that scenario. All were killed prior to restarting.

I'm going to build a version of nagios with debugging turned on this morning and run it.

Thanks!

Mike

Here are a couple of samples of my hosts/services from the sensor:

############################################################################
define host {
        host_name                       Switch-35
        alias                           Switch-35
        address                         10.xx.xx.xx
        hostgroups                      Company_Switches
        max_check_attempts              10
        check_interval                  1
        active_checks_enabled           0
        passive_checks_enabled          1
        check_period                    24x7
        obsess_over_host                1
        check_freshness                 0
        event_handler_enabled           1
        flap_detection_enabled          1
        process_perf_data               0
        retain_status_information       1
        retain_nonstatus_information    1
        contact_groups                  Support
        notification_interval           2
        notification_period             24x7
        notification_options            d,u,r
        notifications_enabled           0
        register                        1
}
############################################################################

############################################################################
define service {
        hostgroup_name                  Company_Switches
        service_description             check_ping
        is_volatile                     1
        check_command                   check_ping!150.0,20%!200.0,60%
        max_check_attempts              2
        normal_check_interval           1
        retry_check_interval            1
        passive_checks_enabled          0
        active_checks_enabled           1
        check_period                    24x7
        parallelize_check               0
        obsess_over_service             1
        check_freshness                 0
        event_handler_enabled           0
        flap_detection_enabled          1
        process_perf_data               1
        retain_status_information       1
        retain_nonstatus_information    1
        contact_groups                  Support
        notification_interval           99
        notification_period             24x7
        notification_options            w,u,c,r,f
        notifications_enabled           0
        register                        1
}
############################################################################

Hosts/Services from the Central Server:

############################################################################
define host {
        host_name                       Switch-35
        alias                           Switch-35
        address                         10.xx.xx.xx
        hostgroups                      Company_Switches
        max_check_attempts              1
        check_interval                  1
        active_checks_enabled           0
        passive_checks_enabled          1
        check_period                    24x7
        obsess_over_host                1
        check_freshness                 0
        event_handler_enabled           1
        flap_detection_enabled          1
        process_perf_data               0
        retain_status_information       1
        retain_nonstatus_information    1
        contact_groups                  Support
        notification_interval           1
        notification_period             24x7
        notification_options            d,u,r
        notifications_enabled           1
        register                        1
}
############################################################################

############################################################################
define service {
        hostgroup_name                  Company_Switches
        service_description             check_ping
        is_volatile                     1
        check_command                   check_stale
        max_check_attempts              1
        normal_check_interval           2
        retry_check_interval            1
        active_checks_enabled           0
        passive_checks_enabled          1
        check_period                    24x7
        parallelize_check               1
        obsess_over_service             1
        check_freshness                 2
        freshness_threshold             660
        event_handler_enabled           1
        low_flap_threshold              0
        high_flap_threshold             0
        flap_detection_enabled          1
        process_perf_data               1
        retain_status_information       1
        retain_nonstatus_information    1
        contact_groups                  Support
        notification_interval           0
        notification_period             24x7
        notification_options            w,u,c,r
        notifications_enabled           1
        register                        1
}
############################################################################

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Marc Powell
Sent: Wednesday, February 15, 2006 8:21 AM
To: Nagios Users
Subject: RE: [Nagios-users] Nagios Hang?

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:nagios-users-[EMAIL PROTECTED] On Behalf Of Mike Koponick
> Sent: Wednesday, February 15, 2006 10:10 AM
> To: Nagios Users
> Subject: [Nagios-users] Nagios Hang?
>
> I'm running Nagios 2.0 (Stable) on Redhat 9.0, in a distributed
> environment. I'm utilizing NSCA for checks and all appears to be working
> properly.
>
> I'm running into several issues that seemed to have "started all of a
> sudden".
>
> 1) On my distributed server, I don't see syslog messages any longer,
> with the exception of "INITIAL SERVICE STATE" messages. Syslog is working,
> and in the nagios.cfg file, "nagios.cfg:use_syslog=1" I used to see all
> the check messages, etc. Nothing in the configuration has changed to the
> best of my knowledge.
>

Make sure you haven't run out of disk space. Verify your log_ settings in nagios.cfg.

> 2) Nagios appears to "hang" on the remote sensor. Once I receive
> notifications that network devices are down, I never see a recovery of the
> network devices, even though they are recovered. The work around is to
> restart nagios with "service nagios restart". Sometimes, this takes
> multiple tries.

Could be related to multiple nagios processes as below. One daemon sees the down and another sees the up. What have you verified so far? I'd check disk space, use strace to see what the daemon is doing, turn up logging as much as possible for both nagios and nsca and watch the logs.

> 3) When I have a massive network outage, I receive the appropriate
> alerts but I receive multiple "PROBLEM" notifications. I'm only using
> service checks (I'm only using check_ping currently) and the
> notification_interval set to "0", which according to the documentation
> should limit the amount of messages I'm receiving to "1", unless I'm using
> the service escalations, which I am not at this time. I am not receiving
> multiple notifications for "OK" messages, which is what I would expect.

Without seeing any example host and service config information this sounds very much like you might have multiple nagios daemons running at the same time. Stop nagios, make sure they're _all_ stopped and restart nagios.

--
Marc

-------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc. Do you grep through log files for problems? Stop! Download the new AJAX search engine that makes searching your log files as easy as surfing the web.
DOWNLOAD SPLUNK! http://sel.as-us.falkag.net/sel?cmd=k&kid3432&bid#0486&dat1642
_______________________________________________
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
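
For the "stop nagios, make sure they're _all_ stopped" step suggested in the thread, here is a rough sketch of how the check could be scripted. It is only an illustration under stated assumptions: the daemon's command name is assumed to be `nagios`, the helper names `leftover_pids` and `list_processes` are made up for this example, and it relies on a `ps` that accepts `-e -o pid= -o comm=`. It is meant to be run after the init script has stopped Nagios, so that any PID it reports is a leftover daemon (or a check child that has not exited yet).

```python
import subprocess


def leftover_pids(ps_output, command_name="nagios"):
    """Return the PIDs in `ps_output` whose command name equals `command_name`.

    `ps_output` is expected to hold one "PID COMMAND" pair per line, which is
    what `ps -e -o pid= -o comm=` prints (no header line).
    The default command name "nagios" is an assumption about the install.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 1)  # split into PID and the rest
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        pid, comm = fields
        if comm.strip() == command_name:
            pids.append(int(pid))
    return pids


def list_processes():
    """Return the current process table as "PID COMMAND" lines."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "comm="],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `leftover_pids(list_processes())` right after stopping the service should return an empty list; anything else is a process to investigate (for example with strace) or kill before starting Nagios again.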