Bug#292473: [Pkg-nagios-devel] Bug#292473: acknowledged by developer (Bug#292473: fixed in nagios 2:1.3-cvs.20050402-1)
On 16 Apr 2005, sean finney wrote:

> > Nothing.
>
> and what about in the nagios logs?

Nothing.

> > > - are there any cronjobs that coincide with this?
> >
> > The crontab is ~100 lines long but nothing is related to Nagios
> > (except the stupid script that restarts it in this case).
>
> are there any cronjobs (even unrelated) that run around this time
> though? my thought is that something like a log rotation or a mysql
> dump might be stealing all of some kind of resource, causing the
> forks in nagios to fail.

As I told you, the server is constantly loaded by all those cronjobs and daemons; the load average is between 1 and 4. Nagios should not fail in this situation.

> could you post (or send privately if you prefer) your nagios.cfg?

Sure, here's the config file (comments excluded):

log_file=/var/log/nagios/nagios.log
cfg_file=/var/cache/nagios/plugins-auto.cfg
cfg_file=/etc/nagios/misccommands.cfg
cfg_file=/etc/nagios/contactgroups.cfg
cfg_file=/etc/nagios/contacts.cfg
cfg_file=/etc/nagios/dependencies.cfg
cfg_file=/etc/nagios/hostgroups.cfg
cfg_file=/etc/nagios/hosts.cfg
cfg_file=/etc/nagios/services.cfg
cfg_file=/etc/nagios/timeperiods.cfg
resource_file=/etc/nagios/resource.cfg
status_file=/var/log/nagios/status.log
nagios_user=nagios
nagios_group=nagios
check_external_commands=1
command_check_interval=-1
command_file=/var/run/nagios/nagios.cmd
comment_file=/var/log/nagios/comment.log
downtime_file=/var/log/nagios/downtime.log
lock_file=/var/log/nagios/nagios.lock
temp_file=/var/cache/nagios/nagios.tmp
log_rotation_method=d
log_archive_path=/var/log/nagios/archives
use_syslog=1
log_notifications=1
log_service_retries=1
log_host_retries=1
log_event_handlers=1
log_initial_states=0
log_external_commands=1
log_passive_service_checks=1
inter_check_delay_method=s
service_interleave_factor=s
max_concurrent_checks=5
service_reaper_frequency=10
sleep_time=1
service_check_timeout=60
host_check_timeout=30
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
perfdata_timeout=5
retain_state_information=1
state_retention_file=/var/cache/nagios/status.sav
retention_update_interval=60
use_retained_program_state=0
interval_length=60
use_agressive_host_checking=0
execute_service_checks=1
accept_passive_service_checks=1
enable_notifications=1
enable_event_handlers=1
process_performance_data=0
obsess_over_services=0
check_for_orphaned_services=0
check_service_freshness=1
freshness_check_interval=60
aggregate_status_updates=1
status_update_interval=15
enable_flap_detection=0
low_service_flap_threshold=5.0
high_service_flap_threshold=20.0
low_host_flap_threshold=5.0
high_host_flap_threshold=20.0
date_format=us
illegal_object_name_chars=`~!$%^*|'?,()=
illegal_macro_output_chars=`~$|'
admin_email=nagios
admin_pager=pagenagios

This is mainly the default configuration file.

-- 
Cyril Bouthors
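For anyone following along who wants to pull the scheduling-related values out of a nagios.cfg like the one posted above, here is a minimal sketch. It assumes only the plain `key=value` layout visible in the posted file, with `#` comment lines and keys such as `cfg_file` allowed to repeat. The function name `parse_nagios_cfg` and the shortened `SAMPLE` text are illustrative and are not part of Nagios.

```python
from collections import defaultdict


def parse_nagios_cfg(text):
    """Parse key=value lines into {key: [values]}; keys may repeat (e.g. cfg_file)."""
    settings = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, since values may themselves contain "=".
        key, sep, value = line.partition("=")
        if not sep:
            continue
        settings[key.strip()].append(value.strip())
    return dict(settings)


# Shortened excerpt of the posted configuration (illustration only).
SAMPLE = """\
log_file=/var/log/nagios/nagios.log
cfg_file=/etc/nagios/hosts.cfg
cfg_file=/etc/nagios/services.cfg
max_concurrent_checks=5
service_reaper_frequency=10
service_check_timeout=60
check_for_orphaned_services=0
"""

cfg = parse_nagios_cfg(SAMPLE)
for key in (
    "max_concurrent_checks",
    "service_reaper_frequency",
    "service_check_timeout",
    "check_for_orphaned_services",
):
    # Report the last occurrence of each key.
    print(key, "=", cfg[key][-1])
```

Splitting on the first `=` only matters for lines such as `illegal_object_name_chars`, whose value ends in `=`.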
Bug#292473: [Pkg-nagios-devel] Bug#292473: acknowledged by developer (Bug#292473: fixed in nagios 2:1.3-cvs.20050402-1)
hi cyril,

On Tue, Apr 19, 2005 at 04:03:43PM +0300, Cyril Bouthors wrote:
> As I told you, the server is constantly loaded by all those cronjobs
> and daemons, the load average is between 1 and 4. Nagios should not
> fail in this situation.

this is true, but i'm wondering what makes your situation so different from other people's (my nagios server has had loads like this and not exhibited this behavior).

> max_concurrent_checks=5

could you see if scaling this down, and/or check_interval for your services in services.cfg, has any effect?

	sean
Bug#292473: [Pkg-nagios-devel] Bug#292473: acknowledged by developer (Bug#292473: fixed in nagios 2:1.3-cvs.20050402-1)
On 14 Apr 2005, sean finney wrote:

> - how often does this happen?

About every 2 hours.

By the way, http://cyril.bouthors.org/tmp/status.cgi.html is now working again.

> - is it regular, or sporadic?

Sporadic.

> - is there anything else from your syslog from around these times?

Nothing.

> - are there any cronjobs that coincide with this?

The crontab is ~100 lines long but nothing is related to Nagios (except the stupid script that restarts it in this case).

> - what else is running on this server?

Apache, MySQL, CVS, NFS, arpwatch, snmpd, log2mail, DHCP, SSH, RSYNC, Munin, MRTG, Exim4, ircd-hybrid, hddtemp, Bind. None of those interferes with Nagios.

-- 
Cyril Bouthors
Bug#292473: [Pkg-nagios-devel] Bug#292473: acknowledged by developer (Bug#292473: fixed in nagios 2:1.3-cvs.20050402-1)
hi cyril,

On Fri, Apr 15, 2005 at 10:45:51AM +0300, Cyril Bouthors wrote:
> > - is there anything else from your syslog from around these times?
>
> Nothing.

and what about in the nagios logs?

> > - are there any cronjobs that coincide with this?
>
> The crontab is ~100 lines long but nothing is related to Nagios
> (except the stupid script that restarts it in this case).

are there any cronjobs (even unrelated) that run around this time though? my thought is that something like a log rotation or a mysql dump might be stealing all of some kind of resource, causing the forks in nagios to fail.

> > - what else is running on this server?
>
> Apache, MySQL, CVS, NFS, arpwatch, snmpd, log2mail, DHCP, SSH, RSYNC,
> Munin, MRTG, Exim4, ircd-hybrid, hddtemp, Bind. None of those
> interferes with Nagios.

hmm.. could you post (or send privately if you prefer) your nagios.cfg? looking at that may give me an idea of some settings changes that might help as well.

	sean
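One way to check the "unrelated cronjob" idea is to line up the times Nagios stopped working against the start times of other jobs. The sketch below is only an illustration: the function `jobs_near_failures`, the five-minute window, and every timestamp and job name in it are made up for the example and do not come from the reporter's server.

```python
from datetime import datetime, timedelta


def jobs_near_failures(failures, job_runs, window=timedelta(minutes=5)):
    """Map each failure time to the jobs that started within `window` before it."""
    result = {}
    for failure in failures:
        result[failure] = sorted(
            name
            for name, started in job_runs
            # Keep jobs that started at or before the failure, within the window.
            if timedelta(0) <= failure - started <= window
        )
    return result


# Hypothetical data, for illustration only.
failures = [datetime(2005, 4, 15, 6, 27), datetime(2005, 4, 15, 8, 31)]
job_runs = [
    ("logrotate", datetime(2005, 4, 15, 6, 25)),
    ("mysqldump", datetime(2005, 4, 15, 3, 0)),
    ("mrtg", datetime(2005, 4, 15, 8, 30)),
]

hits = jobs_near_failures(failures, job_runs)
for when, names in hits.items():
    print(when.isoformat(), names or "no job in window")
```

A job that shows up before most failures would be worth a closer look; a job that shows up once proves little, since a box with ~100 crontab lines will nearly always have something running nearby.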
Bug#292473: [Pkg-nagios-devel] Bug#292473: acknowledged by developer (Bug#292473: fixed in nagios 2:1.3-cvs.20050402-1)
On Wed, Apr 13, 2005 at 05:58:48PM +0300, Cyril Bouthors wrote:
> I've reopened that bug because I'm still facing the exact same issue
> with 1.3-cvs.20050402-1. I don't think it has something to do with
> the load because it still continues to do the same for hours and
> days if the load goes back to 0.

okay... well let's start from square one again:

- how often does this happen?
- is it regular, or sporadic?
- is there anything else from your syslog from around these times?
- are there any cronjobs that coincide with this?
- what else is running on this server?

	sean