Bug#292473: [Pkg-nagios-devel] Bug#292473: acknowledged by developer (Bug#292473: fixed in nagios 2:1.3-cvs.20050402-1)

2005-04-19 Thread Cyril Bouthors
On 16 Apr 2005, sean finney wrote:
 Nothing.

 and what about in the nagios logs?

Nothing.

 - are there any cronjobs that coincide with this?

 The crontab is ~100 lines long but nothing is related to Nagios
 (except the stupid script that restarts it in this case).

 are there any cronjobs (even unrelated) that run around this time
 though?  my thought is that something like a log rotation or a mysql
 dump might be stealing all of some kind of resource, causing the
 forks in nagios to fail.

As I told you, the server is constantly loaded by all those cronjobs
and daemons; the load average is between 1 and 4.

Nagios should not fail in this situation.

 could you post (or send privately if you prefer) your nagios.cfg?

Sure, here's the config file (comments excluded):

log_file=/var/log/nagios/nagios.log
cfg_file=/var/cache/nagios/plugins-auto.cfg
cfg_file=/etc/nagios/misccommands.cfg
cfg_file=/etc/nagios/contactgroups.cfg
cfg_file=/etc/nagios/contacts.cfg
cfg_file=/etc/nagios/dependencies.cfg
cfg_file=/etc/nagios/hostgroups.cfg
cfg_file=/etc/nagios/hosts.cfg
cfg_file=/etc/nagios/services.cfg
cfg_file=/etc/nagios/timeperiods.cfg
resource_file=/etc/nagios/resource.cfg
status_file=/var/log/nagios/status.log
nagios_user=nagios
nagios_group=nagios
check_external_commands=1
command_check_interval=-1
command_file=/var/run/nagios/nagios.cmd
comment_file=/var/log/nagios/comment.log
downtime_file=/var/log/nagios/downtime.log
lock_file=/var/log/nagios/nagios.lock
temp_file=/var/cache/nagios/nagios.tmp
log_rotation_method=d
log_archive_path=/var/log/nagios/archives
use_syslog=1
log_notifications=1
log_service_retries=1
log_host_retries=1
log_event_handlers=1
log_initial_states=0
log_external_commands=1
log_passive_service_checks=1
inter_check_delay_method=s
service_interleave_factor=s
max_concurrent_checks=5
service_reaper_frequency=10
sleep_time=1
service_check_timeout=60
host_check_timeout=30
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
perfdata_timeout=5
retain_state_information=1
state_retention_file=/var/cache/nagios/status.sav
retention_update_interval=60
use_retained_program_state=0
interval_length=60
use_agressive_host_checking=0
execute_service_checks=1
accept_passive_service_checks=1
enable_notifications=1
enable_event_handlers=1
process_performance_data=0
obsess_over_services=0
check_for_orphaned_services=0
check_service_freshness=1
freshness_check_interval=60
aggregate_status_updates=1
status_update_interval=15
enable_flap_detection=0
low_service_flap_threshold=5.0
high_service_flap_threshold=20.0
low_host_flap_threshold=5.0
high_host_flap_threshold=20.0
date_format=us
illegal_object_name_chars=`~!$%^*|'?,()=
illegal_macro_output_chars=`~$|'
admin_email=nagios
admin_pager=pagenagios

This is mainly the default configuration file.
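Because the file above is just key=value lines, a small parser makes it easy to pull out or compare settings such as max_concurrent_checks. The following is a minimal Python sketch. It assumes that lines starting with # are comments and that cfg_file and cfg_dir are the only directives that may repeat. parse_nagios_cfg and MULTI_VALUED are hypothetical names, not part of Nagios.

```python
# Directives that may appear more than once (an assumption for this sketch).
MULTI_VALUED = {"cfg_file", "cfg_dir"}


def parse_nagios_cfg(text):
    """Return a dict of directive -> value, skipping comments and blank lines.

    Multi-valued directives such as cfg_file map to a list of values.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        # Split on the first "=" only, so values such as
        # illegal_object_name_chars=`~!$%^*|'?,()= keep their trailing "=".
        key, sep, value = line.partition("=")
        key = key.strip()
        if not sep or not key:
            continue  # not a key=value line; ignore it
        value = value.strip()
        if key in MULTI_VALUED:
            settings.setdefault(key, []).append(value)
        else:
            settings[key] = value
    return settings
```

For the configuration posted above, parse_nagios_cfg(open("/etc/nagios/nagios.cfg").read())["max_concurrent_checks"] should give the string "5". The path /etc/nagios/nagios.cfg is an assumption; the thread does not say where the file lives.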
-- 
Cyril Bouthors





2005-04-19 Thread sean finney
hi cyril,

On Tue, Apr 19, 2005 at 04:03:43PM +0300, Cyril Bouthors wrote:
 As I told you, the server is constantly loaded by all those cronjobs
 and daemons; the load average is between 1 and 4.
 
 Nagios should not fail in this situation.

this is true, but i'm wondering what makes your situation so different
from other people's (my nagios server has had loads like this and
not exhibited this behavior).

 max_concurrent_checks=5

could you see whether scaling this down (and/or the check_interval for
your services in services.cfg) has any effect?
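max_concurrent_checks caps how many service checks Nagios will run in parallel. To picture what lowering it changes, here is a minimal Python sketch of such a cap built from threads and a semaphore. It is not Nagios's scheduler: fake_check only sleeps in place of a real plugin, and run_checks, fake_check and the 0.05 s duration are hypothetical.

```python
import threading
import time


def run_checks(n_checks, max_concurrent, duration=0.05):
    """Run n_checks simulated checks, never more than max_concurrent at once.

    Returns the peak number of checks observed running simultaneously.
    """
    gate = threading.BoundedSemaphore(max_concurrent)
    lock = threading.Lock()
    running = 0
    peak = 0

    def fake_check():
        nonlocal running, peak
        with gate:  # blocks while max_concurrent checks are already running
            with lock:
                running += 1
                peak = max(peak, running)
            time.sleep(duration)  # stand-in for a plugin doing real work
            with lock:
                running -= 1

    threads = [threading.Thread(target=fake_check) for _ in range(n_checks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak
```

In this sketch run_checks(20, 5) never reports more than 5 checks running at once; a lower cap means fewer simultaneous workers at the cost of checks waiting longer to start.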


sean

-- 





2005-04-15 Thread Cyril Bouthors
On 14 Apr 2005, sean finney wrote:
 - how often does this happen?

About every 2 hours.

By the way, http://cyril.bouthors.org/tmp/status.cgi.html is now
working again.

 - is it regular, or sporadic?

Sporadic.

 - is there anything else from your syslog from around these times?

Nothing.

 - are there any cronjobs that coincide with this?

The crontab is ~100 lines long but nothing is related to Nagios
(except the stupid script that restarts it in this case).
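The thread does not show Cyril's restart script. As an illustration only, a watchdog of that kind can test whether Nagios is still alive by reading the PID that Nagios writes to its lock_file when started as a daemon (/var/log/nagios/nagios.lock in the configuration above) and sending that PID signal 0. A minimal Python sketch; nagios_running is a hypothetical name.

```python
import os


def nagios_running(lock_file):
    """Return True if the PID stored in lock_file belongs to a live process."""
    try:
        with open(lock_file) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False  # no lock file, unreadable, or not a PID
    if pid <= 0:
        return False  # 0 or negative would address a process group, not one process
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only, nothing is delivered
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # process exists but belongs to another user
    return True
```

nagios_running("/var/log/nagios/nagios.lock") returns False when the lock file is missing or the PID in it no longer exists.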

 - what else is running on this server?

Apache, MySQL, CVS, NFS, arpwatch, snmpd, log2mail, DHCP, SSH, RSYNC,
Munin, MRTG, Exim4, ircd-hybrid, hddtemp, Bind.

None of those interferes with Nagios.
-- 
Cyril Bouthors





2005-04-15 Thread sean finney
hi cyril,

On Fri, Apr 15, 2005 at 10:45:51AM +0300, Cyril Bouthors wrote:
  - is there anything else from your syslog from around these times?
 
 Nothing.

and what about in the nagios logs?

  - are there any cronjobs that coincide with this?
 
 The crontab is ~100 lines long but nothing is related to Nagios
 (except the stupid script that restarts it in this case).

are there any cronjobs (even unrelated) that run around this time though?
my thought is that something like a log rotation or a mysql dump might
be stealing all of some kind of resource, causing the forks in nagios
to fail.
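The hypothesis here is that another job temporarily exhausts some resource so that Nagios's fork() calls fail. fork(2) fails with EAGAIN when a process limit is hit and with ENOMEM when the kernel cannot allocate memory for the new process. One way to look at this from the nagios account is to inspect the per-user process limit and probe whether a fork currently succeeds. A minimal Python sketch, assuming a Unix system with the resource module; process_limit and try_fork are hypothetical helpers, and this is not how Nagios itself reports fork errors.

```python
import errno
import os
import resource


def process_limit():
    """Return the (soft, hard) RLIMIT_NPROC limits for the current user."""
    return resource.getrlimit(resource.RLIMIT_NPROC)


def try_fork():
    """Fork a child that exits at once; return 'ok' or the errno name on failure."""
    try:
        pid = os.fork()
    except OSError as exc:
        # EAGAIN: a process limit was reached; ENOMEM: not enough memory.
        return errno.errorcode.get(exc.errno, str(exc.errno))
    if pid == 0:
        os._exit(0)  # child: exit immediately without running cleanup handlers
    os.waitpid(pid, 0)  # parent: reap the child so it does not linger as a zombie
    return "ok"
```

Running try_fork() as the nagios user from cron around the times the failures occur would show whether fork() is returning EAGAIN or ENOMEM then.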

  - what else is running on this server?
 
 Apache, MySQL, CVS, NFS, arpwatch, snmpd, log2mail, DHCP, SSH, RSYNC,
 Munin, MRTG, Exim4, ircd-hybrid, hddtemp, Bind.
 
 None of those interferes with Nagios.

hmm..

could you post (or send privately if you prefer) your nagios.cfg?
looking at that may give me an idea of some settings changes that might
help as well.


sean

-- 





2005-04-13 Thread sean finney
On Wed, Apr 13, 2005 at 05:58:48PM +0300, Cyril Bouthors wrote:
 I've reopened that bug because I'm still facing the exact same issue
 with 1.3-cvs.20050402-1. I don't think it has anything to do with the
 load, because it still continues to do the same for hours and days even
 when the load goes back to 0.

okay... well let's start from square one again:

- how often does this happen?
- is it regular, or sporadic?
- is there anything else from your syslog from around these times?
- are there any cronjobs that coincide with this?
- what else is running on this server?
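To check whether any cron jobs coincide with a failure in a ~100-line crontab, a helper can list the entries whose minute and hour fields fire near the failure time. A minimal Python sketch: it examines only the minute and hour fields, so it can list jobs that would not actually run on that day, and it skips @-macros such as @daily entirely. field_matches and jobs_near are hypothetical names.

```python
def field_matches(field, value, lo, hi):
    """Return True if a cron minute/hour field such as '*', '25', '*/15',
    '1-10/3' or '5,20' matches value (lo..hi is the field's legal range)."""
    for part in field.split(","):
        rng, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if rng == "*":
            start, end = lo, hi
        elif "-" in rng:
            a, b = rng.split("-", 1)
            start, end = int(a), int(b)
        else:
            start = end = int(rng)
        if start <= value <= end and (value - start) % step == 0:
            return True
    return False


def jobs_near(crontab_lines, hour, minute, window=5):
    """Return crontab entries scheduled within `window` minutes of hour:minute."""
    target = hour * 60 + minute
    hits = []
    for line in crontab_lines:
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("@"):
            continue  # blank, comment, or @-macro (not handled here)
        fields = line.split()
        if len(fields) < 6 or "=" in fields[0]:
            continue  # environment assignment or malformed line
        for offset in range(-window, window + 1):
            t = (target + offset) % (24 * 60)  # wrap around midnight
            if field_matches(fields[0], t % 60, 0, 59) and field_matches(
                fields[1], t // 60, 0, 23
            ):
                hits.append(line)
                break
    return hits
```

For example, jobs_near(open("/etc/crontab").read().splitlines(), 14, 30) would list entries scheduled between 14:25 and 14:35; the /etc/crontab user column does not matter because only the first two fields are read.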


sean

-- 

