Re: [Nagios-users] serious performance issue

2009-04-09 Thread Surajit Mukherjee
Hi All,

I am facing the same kind of issue too. I am using Nagios 3.0.6 on
Red Hat 5.

Archived logs are not showing up in the notification area, and instead it
says:

Error: Cannot open log file '/usr/local/nagios/var/archives/nagios-04-10-2009-00.log' for reading!
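
The CGI is asking for one specific archive file, so a sensible first check is whether that file exists and whether the account the web server runs under can read it. The Python sketch below assumes the nagios-MM-DD-YYYY-HH.log naming shown in the error message and the /usr/local/nagios/var/archives directory from the same message; archive_name and check_archive are hypothetical helpers, not part of Nagios.

```python
import os
from datetime import datetime

# Assumption: archives use the nagios-MM-DD-YYYY-HH.log naming seen in the
# error message and live in the directory that message names.
ARCHIVE_DIR = "/usr/local/nagios/var/archives"


def archive_name(rotated_at: datetime) -> str:
    """Build the archive file name for a given rotation time."""
    return rotated_at.strftime("nagios-%m-%d-%Y-%H.log")


def check_archive(archive_dir: str, rotated_at: datetime) -> str:
    """Say whether the expected archive exists and this user can read it."""
    path = os.path.join(archive_dir, archive_name(rotated_at))
    if not os.path.exists(path):
        # exists() is also False when a parent directory is not searchable.
        return f"missing or unreachable: {path}"
    if not os.access(path, os.R_OK):
        return f"not readable by this user: {path}"
    return f"ok: {path}"


if __name__ == "__main__":
    # The file named in the error message: 2009-04-10, hour 00.
    print(check_archive(ARCHIVE_DIR, datetime(2009, 4, 10, 0)))
```

Run it as the web server's user rather than as root, since os.access answers for whoever runs the script. Because os.path.exists also returns False when a parent directory cannot be searched, a "missing or unreachable" result can still turn out to be a permissions problem on the archives directory.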

Please help.

Surajit 



From: shadih rahman [mailto:shadhi...@gmail.com] 
Sent: Thursday, April 09, 2009 7:25 PM
To: fancyrabbit
Cc: nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] serious performance issue


Re: [Nagios-users] serious performance issue

2009-04-09 Thread shadih rahman
Now Nagios is not running any checks at all. I get a lot of "looks like
it was orphaned" messages, and then Nagios just sits there. Can someone
help me with this? Below are some entries from nagios.debug and nagios.log,
along with my nagiostats output and nagios.cfg. Thanks in advance.




nagios.debug:

[1239284464.560241] [016.2] [pid=15690] Found another host check event for this host @ Thu Apr  9 08:59:56 2009
[1239284464.560248] [016.2] [pid=15690] New host check event occurs after the existing event, so we'll ignore it.
[1239284464.560253] [016.2] [pid=15690] Keeping original host check event (ignoring the new one).
[1239284464.560261] [016.1] [pid=15690] ** Async check result for host 'iab323pc20.atg.columbia.edu' handled: new state=0
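
The bracketed numbers at the start of each nagios.debug and nagios.log entry are Unix timestamps (nagios.debug adds microseconds). Converting one makes it easier to compare when an entry was written with the times it mentions. A minimal Python sketch; debug_timestamp is a hypothetical helper, and it reports UTC because the server's time zone is not given in the thread.

```python
from datetime import datetime, timezone


def debug_timestamp(line: str) -> datetime:
    """Parse the leading [epoch] or [epoch.usec] field of a log line as UTC."""
    stamp = line[1:line.index("]")]
    return datetime.fromtimestamp(float(stamp), tz=timezone.utc)


entry = ("[1239284464.560241] [016.2] [pid=15690] Found another host check "
         "event for this host @ Thu Apr  9 08:59:56 2009")
logged_at = debug_timestamp(entry)
print(logged_at.strftime("%Y-%m-%d %H:%M:%S %Z"))  # 2009-04-09 13:41:04 UTC
```

If the server keeps US Eastern daylight time (an assumption, suggested only by the columbia.edu host names), 13:41:04 UTC is 09:41:04 local, roughly 41 minutes after the 08:59:56 the existing host check event was scheduled for, which fits the very high latency figures in the nagiostats output.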



nagios.log:


[1239254607] Warning: The check of host 'et251pc70.atg.columbia.edu' looks like it was orphaned (results never came back).  I'm scheduling an immediate check of the host...
[1239254607] Warning: The check of host 'et251pc71.atg.columbia.edu' looks like it was orphaned (results never came back).  I'm scheduling an immediate check of the host...
[1239254607] Warning: The check of host 'et251pc72.atg.columbia.edu' looks like it was orphaned (results never came back).  I'm scheduling an immediate check of the host...
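
When nagios.log fills up with these warnings, a per-host tally helps tell whether a few hosts are affected or checks everywhere are failing to report back. The Python sketch below matches only the host-orphan wording quoted above; service-orphan warnings are not matched by this pattern and would need their own. ORPHAN_RE and orphaned_hosts are hypothetical names.

```python
import re
from collections import Counter

# Matches the host-orphan warning format quoted from nagios.log above.
ORPHAN_RE = re.compile(
    r"^\[(\d+)\] Warning: The check of host '([^']+)' looks like it was orphaned"
)


def orphaned_hosts(lines):
    """Count orphan warnings per host name in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = ORPHAN_RE.match(line)
        if match:
            counts[match.group(2)] += 1
    return counts


sample = [
    f"[1239254607] Warning: The check of host '{host}' looks like it was "
    "orphaned (results never came back).  I'm scheduling an immediate check "
    "of the host..."
    for host in ("et251pc70.atg.columbia.edu",
                 "et251pc71.atg.columbia.edu",
                 "et251pc72.atg.columbia.edu")
]

for host, count in orphaned_hosts(sample).most_common():
    print(f"{count:5d}  {host}")
```

Against the live log, pass an open file instead of the sample list, for example inside `with open("/var/log/nagios/nagios.log") as fh:` (the log_file path from the posted nagios.cfg), since iterating a file yields one line at a time.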


nagiostats:

Nagios Stats 3.0.6
Copyright (c) 2003-2008 Ethan Galstad (www.nagios.org)
Last Modified: 12-01-2008
License: GPL

CURRENT STATUS DATA
--
Status File:/var/log/nagios/status.dat
Status File Age:0d 0h 0m 4s
Status File Version:3.0.6

Program Running Time:   0d 15h 37m 5s
Nagios PID: 15690
Used/High/Total Command Buffers:0 / 1 / 4096

Total Services: 2783
Services Checked:   2783
Services Scheduled: 2782
Services Actively Checked:  2783
Services Passively Checked: 0
Total Service State Change: 0.000 / 38.820 / 0.328 %
Active Service Latency: 244.062 / 37353.761 / 22185.948 sec
Active Service Execution Time:  0.010 / 15.072 / 0.293 sec
Active Service State Change:0.000 / 38.820 / 0.328 %
Active Services Last 1/5/15/60 min: 0 / 0 / 0 / 0
Passive Service Latency:0.000 / 0.000 / 0.000 sec
Passive Service State Change:   0.000 / 0.000 / 0.000 %
Passive Services Last 1/5/15/60 min:0 / 0 / 0 / 0
Services Ok/Warn/Unk/Crit:  2571 / 14 / 143 / 55
Services Flapping:  19
Services In Downtime:   0

Total Hosts:3037
Hosts Checked:  3005
Hosts Scheduled:3030
Hosts Actively Checked: 3037
Host Passively Checked: 0
Total Host State Change:0.000 / 57.170 / 0.448 %
Active Host Latency:0.000 / 36712.008 / 19785.947 sec
Active Host Execution Time: 0.000 / 30.011 / 1.589 sec
Active Host State Change:   0.000 / 57.170 / 0.448 %
Active Hosts Last 1/5/15/60 min:0 / 0 / 0 / 299
Passive Host Latency:   0.000 / 0.000 / 0.000 sec
Passive Host State Change:  0.000 / 0.000 / 0.000 %
Passive Hosts Last 1/5/15/60 min:   0 / 0 / 0 / 0
Hosts Up/Down/Unreach:  2854 / 183 / 0
Hosts Flapping: 16
Hosts In Downtime:  0

Active Host Checks Last 1/5/15 min: 0 / 0 / 0
   Scheduled:   0 / 0 / 0
   On-demand:   0 / 0 / 0
   Parallel:0 / 0 / 0
   Serial:  0 / 0 / 0
   Cached:  0 / 0 / 0
Passive Host Checks Last 1/5/15 min:0 / 0 / 0
Active Service Checks Last 1/5/15 min:  0 / 0 / 0
   Scheduled:   0 / 0 / 0
   On-demand:   0 / 0 / 0
   Cached:  0 / 0 / 0
Passive Service Checks Last 1/5/15 min: 0 / 0 / 0

External Commands Last 1/5/15 min:  0 / 0 / 0
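
nagiostats prints latency and execution time as min / max / average triples, in seconds. Converting the latency averages to hours makes the scale of the backlog obvious. A small Python sketch; parse_triple is a hypothetical helper written against the lines above, not a nagiostats feature.

```python
import re

# nagiostats prints "<label>: <min> / <max> / <avg> <unit>" for these metrics.
TRIPLE_RE = re.compile(r"^([^:]+):\s*([\d.]+) / ([\d.]+) / ([\d.]+)")


def parse_triple(line):
    """Return (label, min, max, avg) from one nagiostats min/max/avg line."""
    match = TRIPLE_RE.match(line)
    if match is None:
        raise ValueError(f"not a min/max/avg line: {line!r}")
    low, high, avg = (float(v) for v in match.group(2, 3, 4))
    return match.group(1).strip(), low, high, avg


stats = [
    "Active Service Latency: 244.062 / 37353.761 / 22185.948 sec",
    "Active Host Latency:0.000 / 36712.008 / 19785.947 sec",
]

for line in stats:
    label, low, high, avg = parse_triple(line)
    print(f"{label}: average {avg / 3600:.2f} h, worst {high / 3600:.2f} h")
```

Here the averages work out to about 6.2 hours for services and 5.5 hours for hosts. For comparison, the first report in this thread showed an average active service latency of 1469.130 sec after 2h 5m of runtime, against 22185.948 sec after 15h 37m here, so the backlog kept growing rather than settling.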



nagios.cfg:

log_file=/var/log/nagios/nagios.log
cfg_file=/etc/nagios/commands.cfg
cfg_file=/etc/nagios/contacts.cfg
cfg_file=/etc/nagios/timeperiods.cfg
cfg_file=/etc/nagios/templates.cfg
cfg_dir=/etc/nagios/hosts
cfg_dir=/etc/nagios/services
object_cache_file=/var/log/nagios/objects.cache
precached_object_file=/var/log/nagios/objects.precache
resource_file=/etc/nagios/resource.cfg
status_file=/var/log/nagios/status.dat
status_update_interval=60
nagios_user=nagios
nagios_group=nagios
check_external_commands=1
command_check_interval=-1
command_file=/var/log/nagios/rw/nagios.cmd
external_command_buffer_slots=4096
lock_file=/var/log/nagios/nagios.lock
temp_file=/var/log/nagios/nagios.tmp
temp_path=/tmp
event_broker_options=8
broker_module=/usr/lib64/nagios/ndomod.o config_file=/etc/

Re: [Nagios-users] serious performance issue

2009-04-07 Thread fancyrabbit
I ran into almost the same issue.
After changing to enable_embedded_perl=0, the load average went up, but
latencies became lower.

On Wed, Apr 8, 2009 at 11:54 AM, shadih rahman  wrote:


[Nagios-users] serious performance issue

2009-04-07 Thread shadih rahman
I am seeing a ton of orphaned-check messages for both services and hosts.
I am running Nagios on a quad-core 2.2 GHz machine with 4 GB of memory. I
will paste my configuration file below. The machine sends NDO data to a
local database on a 170 GB hard drive. Nagios is obsessing over both hosts
and services and sending the data to a machine with an identical
configuration; I am doing failover using NSCA. Please advise.
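
With obsess_over_services=1 and obsess_over_hosts=1, Nagios runs the OCSP or OCHP command (send_service_check or send_host_check in this config) for the check results it processes. If each of those runs has to finish before Nagios moves on, which is an assumption worth checking against the documentation for your version, their combined run time puts a ceiling on how many results per second Nagios can absorb. The Python arithmetic below sketches that ceiling; only the object counts come from the nagiostats output in this thread, and the 5-minute interval and 0.1 s per command are hypothetical placeholders to replace with measured values.

```python
# Back-of-the-envelope check of serial obsess overhead.
TOTAL_SERVICES = 2783   # from nagiostats in this thread
TOTAL_HOSTS = 3037      # from nagiostats in this thread
CHECK_INTERVAL_S = 300  # HYPOTHETICAL: each object checked every 5 minutes
OBSESS_COST_S = 0.1     # HYPOTHETICAL: wall time of one OCSP/OCHP run


def obsess_load(objects, interval_s, cost_s):
    """Return (results per second, seconds of obsess work per wall second)."""
    results_per_s = objects / interval_s
    return results_per_s, results_per_s * cost_s


rate, load = obsess_load(TOTAL_SERVICES + TOTAL_HOSTS, CHECK_INTERVAL_S, OBSESS_COST_S)
print(f"{rate:.1f} results/s need {load:.2f} s of obsess work per second")
if load >= 1.0:
    # Only meaningful under the serial-execution assumption stated above.
    print("more than one second of work per second: under these assumptions the backlog keeps growing")
```

The same arithmetic shows why the per-command cost matters so much: with ocsp_timeout=5 as configured, a single slow send_nsca run may take up to 5 seconds before Nagios gives up on it.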





nagios.cfg



log_file=/var/log/nagios/nagios.log
cfg_file=/etc/nagios/commands.cfg
cfg_file=/etc/nagios/contacts.cfg
cfg_file=/etc/nagios/timeperiods.cfg
cfg_file=/etc/nagios/templates.cfg
cfg_dir=/etc/nagios/hosts
cfg_dir=/etc/nagios/services
object_cache_file=/var/log/nagios/objects.cache
precached_object_file=/var/log/nagios/objects.precache
resource_file=/etc/nagios/resource.cfg
status_file=/var/log/nagios/status.dat
status_update_interval=60
nagios_user=nagios
nagios_group=nagios
check_external_commands=1
command_check_interval=-1
command_file=/var/log/nagios/rw/nagios.cmd
external_command_buffer_slots=8192
lock_file=/var/log/nagios/nagios.lock
temp_file=/var/log/nagios/nagios.tmp
temp_path=/tmp
event_broker_options=8
broker_module=/usr/lib64/nagios/ndomod.o config_file=/etc/nagios/ndomod.cfg
log_rotation_method=m
log_archive_path=/var/log/nagios/archives
use_syslog=1
log_notifications=1
log_service_retries=1
log_host_retries=1
log_event_handlers=1
log_initial_states=0
log_external_commands=1
log_passive_checks=1
service_inter_check_delay_method=n
max_service_check_spread=30
service_interleave_factor=s
host_inter_check_delay_method=s
max_host_check_spread=30
max_concurrent_checks=0
check_result_reaper_frequency=2
max_check_result_reaper_time=10
check_result_path=/var/log/nagios/spool/checkresults
max_check_result_file_age=3600
cached_host_check_horizon=15
cached_service_check_horizon=15
enable_predictive_host_dependency_checks=1
enable_predictive_service_dependency_checks=1
soft_state_dependencies=1
auto_reschedule_checks=1
auto_rescheduling_interval=30
auto_rescheduling_window=180
sleep_time=0.25
service_check_timeout=30
host_check_timeout=20

event_handler_timeout=30
notification_timeout=60
ocsp_timeout=5
perfdata_timeout=5
retain_state_information=1
state_retention_file=var/log/nagios/retention.dat
retention_update_interval=60
use_retained_program_state=1
use_retained_scheduling_info=1
retained_host_attribute_mask=0
retained_service_attribute_mask=0
retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0
retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0
interval_length=60
use_aggressive_host_checking=0
execute_service_checks=1
accept_passive_service_checks=1
execute_host_checks=1
accept_passive_host_checks=1
enable_notifications=1
enable_event_handlers=1
process_performance_data=0
obsess_over_services=1
ocsp_command=send_service_check
ochp_command=send_host_check
obsess_over_hosts=1
translate_passive_host_checks=0
passive_host_checks_are_soft=0
check_for_orphaned_services=1
check_for_orphaned_hosts=1
check_service_freshness=1
service_freshness_check_interval=60
check_host_freshness=0
host_freshness_check_interval=60
additional_freshness_latency=15
enable_flap_detection=1
low_service_flap_threshold=5.0
high_service_flap_threshold=20.0
low_host_flap_threshold=5.0
high_host_flap_threshold=20.0
date_format=us
enable_embedded_perl=1
use_embedded_perl_implicitly=1
illegal_object_name_chars=`~!$%^&*|'"<>?,()=
illegal_macro_output_chars=`~$&|'"<>
use_regexp_matching=0
use_true_regexp_matching=0
admin_email=sr2...@columbia.edu
daemon_dumps_core=0
use_large_installation_tweaks=1
enable_environment_macros=1
debug_level=-1debug_verbosity=2
debug_file=/var/log/nagios/nagios.debug
max_debug_file_size=100
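
Two lines in the configuration as posted stand out on a read-through: state_retention_file=var/log/nagios/retention.dat is a relative path (every other path in the file starts with /), and debug_level=-1debug_verbosity=2 looks like two directives that ran together on one line. A rough Python check along those lines is sketched below; its rules are heuristics chosen for this thread, not rules Nagios itself enforces, and lint_nagios_cfg is a hypothetical name.

```python
import re

# Heuristics (assumptions, not Nagios rules):
#  * directives ending in _file or _path are expected to hold absolute paths;
#  * a value containing "word=" hints that two directives ran together on one
#    line. broker_module is skipped because its value legitimately carries
#    module arguments such as config_file=...
PATH_SUFFIXES = ("_file", "_path")
FUSED_RE = re.compile(r"[a-z_]+=")


def lint_nagios_cfg(text):
    """Return (line number, message) pairs for suspicious nagios.cfg lines."""
    problems = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.endswith(PATH_SUFFIXES) and not value.startswith("/"):
            problems.append((lineno, f"{name} is not absolute: {value}"))
        if name != "broker_module" and FUSED_RE.search(value):
            problems.append((lineno, f"{name} may hold a second directive: {value}"))
    return problems


sample = """\
status_file=/var/log/nagios/status.dat
state_retention_file=var/log/nagios/retention.dat
broker_module=/usr/lib64/nagios/ndomod.o config_file=/etc/nagios/ndomod.cfg
debug_level=-1debug_verbosity=2
"""

for lineno, message in lint_nagios_cfg(sample):
    print(f"line {lineno}: {message}")
```

Fed the full configuration above (read from wherever nagios.cfg actually lives), these heuristics should flag only the state_retention_file and debug_level lines.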

My nagiostats output:

[sr2690>nagiostats

Nagios Stats 3.0.6
Copyright (c) 2003-2008 Ethan Galstad (www.nagios.org)
Last Modified: 12-01-2008
License: GPL

CURRENT STATUS DATA
--
Status File:/var/log/nagios/status.dat
Status File Age:0d 0h 0m 19s
Status File Version:3.0.6

Program Running Time:   0d 2h 5m 28s
Nagios PID: 12139
Used/High/Total Command Buffers:0 / 0 / 8192

Total Services: 2783
Services Checked:   2783
Services Scheduled: 2782
Services Actively Checked:  2783
Services Passively Checked: 0
Total Service State Change: 0.000 / 52.830 / 0.263 %
Active Service Latency: 1.304 / 12092.843 / 1469.130 sec
Active Service Execution Time:  0.011 / 15.103 / 0.468 sec
Active Service State Change:0.000 / 52.830 / 0.263 %
Active Services Last 1/5/15/60 min: 0 / 0 / 0 / 129
Passive Service Latency:0.000 / 0.000 / 0.000 sec
Passive Service State Change:   0.000 / 0.000 / 0.000