A combination of tweaks seems to have fixed this. Lowering service_reaper_frequency and turning on smart interleaving together make Nagios quite a bit better at catching problems.
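To illustrate what "smart" scheduling changes, here is a rough Python sketch of the Nagios 2.x service scheduling values as described in its check-scheduling documentation. With the smart (s) method, the inter-check delay is the average check interval divided by the number of services, capped so that all initial checks start within max_service_check_spread. The smart interleave factor is services per host, rounded up. This is an approximation, not Nagios's own code: the function names and example numbers are hypothetical, and delay methods other than n and s are not modelled. service_reaper_frequency (how often, in seconds, Nagios processes the results of finished service checks) is not modelled either.

```python
import math


def service_inter_check_delay(method: str, total_services: int,
                              avg_check_interval_min: float,
                              max_check_spread_min: float,
                              interval_length_s: int = 60) -> float:
    """Approximate the Nagios 2.x service inter-check delay, in seconds.

    Hypothetical helper: only the 'n' and 's' methods are modelled.
    """
    if method == "n":
        # 'n' (none): no delay, so every service check is scheduled at once.
        return 0.0
    if method != "s":
        raise ValueError("only the 'n' and 's' methods are modelled here")
    # 's' (smart): average check interval in seconds / number of services.
    delay = avg_check_interval_min * interval_length_s / total_services
    # max_service_check_spread caps the delay so that the initial checks
    # of all services start within the spread window after startup.
    max_delay = max_check_spread_min * interval_length_s / total_services
    return min(delay, max_delay)


def smart_interleave_factor(total_services: int, total_hosts: int) -> int:
    """Approximate the 's' (smart) interleave factor: services per host, rounded up."""
    return math.ceil(total_services / total_hosts)
```

With service_inter_check_delay_method=n, as in the quoted config below, the delay is 0 and all service checks are queued at the same moment. With s, they are staggered across the check interval, subject to the max_service_check_spread cap.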
> Nagios has been very unstable over the past couple of weeks. The only
> change I've made is upgrading from 2.2 to 2.4. It could just be that I
> have some bad configuration options, but I'm not sure. I just had a
> server go down for an hour, and Nagios never caught it. In general,
> since upgrading to 2.0 Nagios seems very slow to catch broken
> services/hosts, although it usually does check them (not always). When
> I look at my status overview right now, Nagios says the Last Check for
> almost every service is two days old. Any ideas on what I'm doing wrong?
>
> Here's my main Nagios config file without comments or blank lines:
>
> log_file=/nagios/services/nagios/var/nagios.log
> cfg_file=/nagios/services/nagios/etc/checkcommands.cfg
> cfg_file=/nagios/services/nagios/etc/contact-templates.cfg
> cfg_file=/nagios/services/nagios/etc/contactgroups.cfg
> cfg_file=/nagios/services/nagios/etc/contacts.cfg
> cfg_file=/nagios/services/nagios/etc/escalations.cfg
> cfg_file=/nagios/services/nagios/etc/host-templates.cfg
> cfg_file=/nagios/services/nagios/etc/service-templates.cfg
> cfg_file=/nagios/services/nagios/etc/misccommands.cfg
> cfg_file=/nagios/services/nagios/etc/time_periods.cfg
> cfg_file=/nagios/services/nagios/etc/nagios-commands.cfg
> cfg_file=/nagios/services/nagios/etc/nagios-hostgroups.cfg
> cfg_file=/nagios/services/nagios/etc/nagios-hosts.cfg
> cfg_file=/nagios/services/nagios/etc/nagios-service-templates.cfg
> cfg_file=/nagios/services/nagios/etc/nagios-services.cfg
> object_cache_file=/nagios/services/nagios/var/objects.cache
> resource_file=/nagios/services/nagios/etc/resource.cfg
> temp_file=/nagios/services/nagios/var/nagios.tmp
> status_file=/nagios/services/nagios/var/status.dat
> aggregate_status_updates=1
> status_update_interval=15
> nagios_user=nagios
> nagios_group=nagios
> enable_notifications=1
> execute_service_checks=1
> accept_passive_service_checks=0
> execute_host_checks=1
> accept_passive_host_checks=0
> enable_event_handlers=1
> log_rotation_method=d
> log_archive_path=/nagios/services/nagios/var/archives
> check_external_commands=1
> command_check_interval=60s
> command_file=/nagios/services/nagios/var/rw/nagios.cmd
> downtime_file=/nagios/services/nagios/var/downtime.dat
> comment_file=/nagios/services/nagios/var/comments.dat
> lock_file=/nagios/services/nagios/var/nagios.lock
> retain_state_information=1
> state_retention_file=/nagios/services/nagios/var/retention.dat
> use_retained_scheduling_info=1
> retention_update_interval=0
> use_retained_program_state=1
> use_syslog=1
> log_notifications=1
> log_service_retries=1
> log_host_retries=1
> log_event_handlers=1
> log_initial_states=0
> log_external_commands=1
> log_passive_checks=0
> sleep_time=0.25
> service_inter_check_delay_method=n
> max_service_check_spread=5
> service_interleave_factor=s
> max_concurrent_checks=300
> service_reaper_frequency=40
> host_inter_check_delay_method=n
> max_host_check_spread=5
> interval_length=60
> auto_reschedule_checks=0
> auto_rescheduling_interval=30
> auto_rescheduling_window=30
> use_agressive_host_checking=0
> enable_flap_detection=0
> low_service_flap_threshold=5.0
> high_service_flap_threshold=20.0
> low_host_flap_threshold=5.0
> high_host_flap_threshold=20.0
> soft_state_dependencies=0
> service_check_timeout=60
> host_check_timeout=30
> event_handler_timeout=30
> notification_timeout=30
> ocsp_timeout=5
> perfdata_timeout=5
> obsess_over_services=0
> process_performance_data=0
> check_for_orphaned_services=0
> check_service_freshness=0
> freshness_check_interval=60
> check_host_freshness=0
> host_freshness_check_interval=60
> date_format=us
> illegal_object_name_chars=`~!$%^&*|'"<>?,()'=
> illegal_macro_output_chars=`~$&|'"<>
> use_regexp_matching=0
> use_true_regexp_matching=0
> [EMAIL PROTECTED]
> [EMAIL PROTECTED]
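One way to check whether service checks are actually running, rather than relying on the Last Check column in the web UI, is to scan the file named by status_file directly. The Python sketch below makes several assumptions. It expects status.dat to consist of blocks of the form `name {` ... `}` with one key=value pair per line. It treats per-service blocks as being named `service` (Nagios 2.x) or `servicestatus` (later versions). It also assumes that last_check holds a Unix timestamp. The function names and the sample data are hypothetical.

```python
import time
from typing import Dict, Iterator, List, Optional, Tuple

# Assumed names of per-service blocks: "service" in Nagios 2.x,
# "servicestatus" in later releases.
SERVICE_BLOCKS = {"service", "servicestatus"}


def parse_status(text: str) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (block_name, fields) for each 'name {' ... '}' block."""
    block: Optional[str] = None
    fields: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        if block is None and line.endswith("{"):
            block = line[:-1].strip()
            fields = {}
        elif block is not None and line == "}":
            yield block, fields
            block = None
        elif block is not None and "=" in line:
            # Split on the first '=' only; values may contain '='.
            key, _, value = line.partition("=")
            fields[key] = value


def stale_services(text: str, max_age_s: int,
                   now: Optional[float] = None) -> List[Tuple[str, str, int]]:
    """Return (host, service, age_seconds) for services not checked recently.

    A last_check of 0 (never checked) yields a very large age, so it is
    reported as stale too.
    """
    now = time.time() if now is None else now
    stale = []
    for block, fields in parse_status(text):
        if block not in SERVICE_BLOCKS:
            continue
        age = int(now - int(fields.get("last_check", "0")))
        if age > max_age_s:
            stale.append((fields.get("host_name", "?"),
                          fields.get("service_description", "?"), age))
    return stale
```

For example, passing the contents of /nagios/services/nagios/var/status.dat (the status_file path from the config above) with max_age_s=900 would list every service that has not been checked in the last 15 minutes.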
-------------------
BitPusher, LLC
http://www.bitpusher.com/
1.888.9PUSHER
(415) 724.7998 - Mobile

_______________________________________________
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null