Re: [Nagios-users] serious performance issue
Hi All,

I am also facing the same kind of issue. I am using Nagios 3.0.6 on Red Hat 5. I am not getting archived logs in the notification area, and it says:

Error: Cannot open log file '/usr/local/nagios/var/archives/nagios-04-10-2009-00.log' for reading!

Please help.

Surajit

From: shadih rahman [mailto:shadhi...@gmail.com]
Sent: Thursday, April 09, 2009 7:25 PM
To: fancyrabbit
Cc: nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] serious performance issue

Now my Nagios is not running any checks at all. I get a lot of "looks like it was orphaned" messages and then Nagios just sits there. Can someone help me with this? I will add some entries from nagios.debug and nagios.log along with my nagios.cfg. Thanks in advance.

nagios.debug:

[1239284464.560241] [016.2] [pid=15690] Found another host check event for this host @ Thu Apr 9 08:59:56 2009
[1239284464.560248] [016.2] [pid=15690] New host check event occurs after the existing event, so we'll ignore it.
[1239284464.560253] [016.2] [pid=15690] Keeping original host check event (ignoring the new one).
[1239284464.560261] [016.1] [pid=15690] ** Async check result for host 'iab323pc20.atg.columbia.edu' handled: new state=0

nagios.log:

[1239254607] Warning: The check of host 'et251pc70.atg.columbia.edu' looks like it was orphaned (results never came back). I'm scheduling an immediate check of the host...
[1239254607] Warning: The check of host 'et251pc71.atg.columbia.edu' looks like it was orphaned (results never came back). I'm scheduling an immediate check of the host...
[1239254607] Warning: The check of host 'et251pc72.atg.columbia.edu' looks like it was orphaned (results never came back). I'm scheduling an immediate check of the host...
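For the "Cannot open log file" error reported at the top of this message, a first step is to confirm whether the archive file exists and whether it is readable. The following is only a minimal sketch: `check_archive` is a hypothetical helper, the path is the one from the error message, and the result reflects the permissions of whichever user runs the script (the web interface may run under a different account).

```python
import os


def check_archive(path):
    """Classify a log archive path as 'missing', 'unreadable', or 'ok'."""
    if not os.path.exists(path):
        return "missing"
    if not os.access(path, os.R_OK):
        return "unreadable"
    return "ok"


if __name__ == "__main__":
    # Path copied from the error message; adjust for your installation.
    path = "/usr/local/nagios/var/archives/nagios-04-10-2009-00.log"
    print(path, "->", check_archive(path))
```

If the result is "missing", the file may simply never have been written for that date; if "unreadable", file ownership and permissions are the place to look.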
Re: [Nagios-users] serious performance issue
Now my Nagios is not running any checks at all. I get a lot of "looks like it was orphaned" messages and then Nagios just sits there. Can someone help me with this? I will add some entries from nagios.debug and nagios.log along with my nagios.cfg. Thanks in advance.

nagios.debug:

[1239284464.560241] [016.2] [pid=15690] Found another host check event for this host @ Thu Apr 9 08:59:56 2009
[1239284464.560248] [016.2] [pid=15690] New host check event occurs after the existing event, so we'll ignore it.
[1239284464.560253] [016.2] [pid=15690] Keeping original host check event (ignoring the new one).
[1239284464.560261] [016.1] [pid=15690] ** Async check result for host 'iab323pc20.atg.columbia.edu' handled: new state=0

nagios.log:

[1239254607] Warning: The check of host 'et251pc70.atg.columbia.edu' looks like it was orphaned (results never came back). I'm scheduling an immediate check of the host...
[1239254607] Warning: The check of host 'et251pc71.atg.columbia.edu' looks like it was orphaned (results never came back). I'm scheduling an immediate check of the host...
[1239254607] Warning: The check of host 'et251pc72.atg.columbia.edu' looks like it was orphaned (results never came back). I'm scheduling an immediate check of the host...

nagiostats:

Nagios Stats 3.0.6
Copyright (c) 2003-2008 Ethan Galstad (www.nagios.org)
Last Modified: 12-01-2008
License: GPL

CURRENT STATUS DATA
--
Status File: /var/log/nagios/status.dat
Status File Age: 0d 0h 0m 4s
Status File Version: 3.0.6

Program Running Time: 0d 15h 37m 5s
Nagios PID: 15690
Used/High/Total Command Buffers: 0 / 1 / 4096

Total Services: 2783
Services Checked: 2783
Services Scheduled: 2782
Services Actively Checked: 2783
Services Passively Checked: 0
Total Service State Change: 0.000 / 38.820 / 0.328 %
Active Service Latency: 244.062 / 37353.761 / 22185.948 sec
Active Service Execution Time: 0.010 / 15.072 / 0.293 sec
Active Service State Change: 0.000 / 38.820 / 0.328 %
Active Services Last 1/5/15/60 min: 0 / 0 / 0 / 0
Passive Service Latency: 0.000 / 0.000 / 0.000 sec
Passive Service State Change: 0.000 / 0.000 / 0.000 %
Passive Services Last 1/5/15/60 min: 0 / 0 / 0 / 0
Services Ok/Warn/Unk/Crit: 2571 / 14 / 143 / 55
Services Flapping: 19
Services In Downtime: 0

Total Hosts: 3037
Hosts Checked: 3005
Hosts Scheduled: 3030
Hosts Actively Checked: 3037
Host Passively Checked: 0
Total Host State Change: 0.000 / 57.170 / 0.448 %
Active Host Latency: 0.000 / 36712.008 / 19785.947 sec
Active Host Execution Time: 0.000 / 30.011 / 1.589 sec
Active Host State Change: 0.000 / 57.170 / 0.448 %
Active Hosts Last 1/5/15/60 min: 0 / 0 / 0 / 299
Passive Host Latency: 0.000 / 0.000 / 0.000 sec
Passive Host State Change: 0.000 / 0.000 / 0.000 %
Passive Hosts Last 1/5/15/60 min: 0 / 0 / 0 / 0
Hosts Up/Down/Unreach: 2854 / 183 / 0
Hosts Flapping: 16
Hosts In Downtime: 0

Active Host Checks Last 1/5/15 min: 0 / 0 / 0
  Scheduled: 0 / 0 / 0
  On-demand: 0 / 0 / 0
  Parallel: 0 / 0 / 0
  Serial: 0 / 0 / 0
  Cached: 0 / 0 / 0
Passive Host Checks Last 1/5/15 min: 0 / 0 / 0
Active Service Checks Last 1/5/15 min: 0 / 0 / 0
  Scheduled: 0 / 0 / 0
  On-demand: 0 / 0 / 0
  Cached: 0 / 0 / 0
Passive Service Checks Last 1/5/15 min: 0 / 0 / 0

External Commands Last 1/5/15 min: 0 / 0 / 0

nagios.cfg:

log_file=/var/log/nagios/nagios.log
cfg_file=/etc/nagios/commands.cfg
cfg_file=/etc/nagios/contacts.cfg
cfg_file=/etc/nagios/timeperiods.cfg
cfg_file=/etc/nagios/templates.cfg
cfg_dir=/etc/nagios/hosts
cfg_dir=/etc/nagios/services
object_cache_file=/var/log/nagios/objects.cache
precached_object_file=/var/log/nagios/objects.precache
resource_file=/etc/nagios/resource.cfg
status_file=/var/log/nagios/status.dat
status_update_interval=60
nagios_user=nagios
nagios_group=nagios
check_external_commands=1
command_check_interval=-1
command_file=/var/log/nagios/rw/nagios.cmd
external_command_buffer_slots=4096
lock_file=/var/log/nagios/nagios.lock
temp_file=/var/log/nagios/nagios.tmp
temp_path=/tmp
event_broker_options=8
broker_module=/usr/lib64/nagios/ndomod.o config_file=/etc/
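The nagios.log lines in this message are prefixed with Unix epoch timestamps, which makes it hard to see at a glance when the orphaned-check warnings occurred or how many hosts are affected. Below is a minimal sketch that converts the timestamps to UTC and counts warnings per host. The regular expression only covers the host warning format shown in the excerpt above; `parse_orphans` and `SAMPLE` are illustrative names, not part of Nagios.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Matches only the host form of the warning, as it appears in the excerpt.
ORPHAN_RE = re.compile(
    r"^\[(\d+)\] Warning: The check of host '([^']+)' looks like it was orphaned"
)


def parse_orphans(lines):
    """Return (Counter mapping host -> warning count, list of UTC datetimes)."""
    hosts = Counter()
    times = []
    for line in lines:
        match = ORPHAN_RE.match(line)
        if match:
            epoch = int(match.group(1))
            times.append(datetime.fromtimestamp(epoch, tz=timezone.utc))
            hosts[match.group(2)] += 1
    return hosts, times


# Lines copied from the nagios.log excerpt in this thread.
SAMPLE = [
    "[1239254607] Warning: The check of host 'et251pc70.atg.columbia.edu' looks like it was orphaned (results never came back). I'm scheduling an immediate check of the host...",
    "[1239254607] Warning: The check of host 'et251pc71.atg.columbia.edu' looks like it was orphaned (results never came back). I'm scheduling an immediate check of the host...",
    "[1239254607] Warning: The check of host 'et251pc72.atg.columbia.edu' looks like it was orphaned (results never came back). I'm scheduling an immediate check of the host...",
]

if __name__ == "__main__":
    hosts, times = parse_orphans(SAMPLE)
    print(len(hosts), "hosts with orphaned checks")
    print("first warning at", times[0].isoformat())
```

Run against a whole nagios.log, the same function would show whether the warnings arrive in bursts at one instant (as in the excerpt) or are spread over time.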
Re: [Nagios-users] serious performance issue
I ran into almost the same issue. After setting enable_embedded_perl=0, the load average went up but latencies became lower.

On Wed, Apr 8, 2009 at 11:54 AM, shadih rahman wrote:
> I am seeing a ton of orphaned error messages for both services and hosts. I
> am running Nagios on a quad-core 2.2 GHz machine with 4 GB of memory. I
> will paste my configuration file below. I have the machine sending NDO data
> to a local database sitting on a 170 GB hard drive. Nagios is obsessing over
> both hosts and services and sending data to a machine with an identical
> configuration. I am doing failover using NSCA. Please advise on this.
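The reply above changes a single directive, enable_embedded_perl. When comparing two installations it can help to pull a handful of directives out of nagios.cfg rather than reading the whole file. The following is a rough sketch that parses `key=value` lines; the directive names are copied from the configuration posted in this thread, while `parse_cfg` and the `WATCHED` list are illustrative choices and not a recommendation of particular values.

```python
def parse_cfg(text):
    """Parse nagios.cfg-style 'key=value' lines into a dict of lists.

    Values are kept in lists because directives such as cfg_file may repeat.
    Blank lines, '#' comments, and lines without '=' are skipped.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values containing '=' survive.
        key, value = line.split("=", 1)
        settings.setdefault(key.strip(), []).append(value.strip())
    return settings


# A few directives copied from the configuration in this thread.
SAMPLE = """\
cfg_file=/etc/nagios/commands.cfg
cfg_file=/etc/nagios/contacts.cfg
max_concurrent_checks=0
check_result_reaper_frequency=2
enable_embedded_perl=1
use_embedded_perl_implicitly=1
use_large_installation_tweaks=1
"""

WATCHED = [
    "enable_embedded_perl",
    "use_embedded_perl_implicitly",
    "max_concurrent_checks",
]

if __name__ == "__main__":
    cfg = parse_cfg(SAMPLE)
    for name in WATCHED:
        # Report the last occurrence of each watched directive.
        print(name, "=", cfg.get(name, ["(not set)"])[-1])
```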
[Nagios-users] serious performance issue
I am seeing a ton of orphaned error messages for both services and hosts. I am running Nagios on a quad-core 2.2 GHz machine with 4 GB of memory. I will paste my configuration file below. I have the machine sending NDO data to a local database sitting on a 170 GB hard drive. Nagios is obsessing over both hosts and services and sending data to a machine with an identical configuration. I am doing failover using NSCA. Please advise on this.

nagios.cfg

log_file=/var/log/nagios/nagios.log
cfg_file=/etc/nagios/commands.cfg
cfg_file=/etc/nagios/contacts.cfg
cfg_file=/etc/nagios/timeperiods.cfg
cfg_file=/etc/nagios/templates.cfg
cfg_dir=/etc/nagios/hosts
cfg_dir=/etc/nagios/services
object_cache_file=/var/log/nagios/objects.cache
precached_object_file=/var/log/nagios/objects.precache
resource_file=/etc/nagios/resource.cfg
status_file=/var/log/nagios/status.dat
status_update_interval=60
nagios_user=nagios
nagios_group=nagios
check_external_commands=1
command_check_interval=-1
command_file=/var/log/nagios/rw/nagios.cmd
external_command_buffer_slots=8192
lock_file=/var/log/nagios/nagios.lock
temp_file=/var/log/nagios/nagios.tmp
temp_path=/tmp
event_broker_options=8
broker_module=/usr/lib64/nagios/ndomod.o config_file=/etc/nagios/ndomod.cfg
log_rotation_method=m
log_archive_path=/var/log/nagios/archives
use_syslog=1
log_notifications=1
log_service_retries=1
log_host_retries=1
log_event_handlers=1
log_initial_states=0
log_external_commands=1
log_passive_checks=1
service_inter_check_delay_method=n
max_service_check_spread=30
service_interleave_factor=s
host_inter_check_delay_method=s
max_host_check_spread=30
max_concurrent_checks=0
check_result_reaper_frequency=2
max_check_result_reaper_time=10
check_result_path=/var/log/nagios/spool/checkresults
max_check_result_file_age=3600
cached_host_check_horizon=15
cached_service_check_horizon=15
enable_predictive_host_dependency_checks=1
enable_predictive_service_dependency_checks=1
soft_state_dependencies=1
auto_reschedule_checks=1
auto_rescheduling_interval=30
auto_rescheduling_window=180
sleep_time=0.25
service_check_timeout=30
host_check_timeout=20
event_handler_timeout=30
notification_timeout=60
ocsp_timeout=5
perfdata_timeout=5
retain_state_information=1
state_retention_file=var/log/nagios/retention.dat
retention_update_interval=60
use_retained_program_state=1
use_retained_scheduling_info=1
retained_host_attribute_mask=0
retained_service_attribute_mask=0
retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0
retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0
interval_length=60
use_aggressive_host_checking=0
execute_service_checks=1
accept_passive_service_checks=1
execute_host_checks=1
accept_passive_host_checks=1
enable_notifications=1
enable_event_handlers=1
process_performance_data=0
obsess_over_services=1
ocsp_command=send_service_check
ochp_command=send_host_check
obsess_over_hosts=1
translate_passive_host_checks=0
passive_host_checks_are_soft=0
check_for_orphaned_services=1
check_for_orphaned_hosts=1
check_service_freshness=1
service_freshness_check_interval=60
check_host_freshness=0
host_freshness_check_interval=60
additional_freshness_latency=15
enable_flap_detection=1
low_service_flap_threshold=5.0
high_service_flap_threshold=20.0
low_host_flap_threshold=5.0
high_host_flap_threshold=20.0
date_format=us
enable_embedded_perl=1
use_embedded_perl_implicitly=1
illegal_object_name_chars=`~!$%^&*|'"<>?,()=
illegal_macro_output_chars=`~$&|'"<>
use_regexp_matching=0
use_true_regexp_matching=0
admin_email=sr2...@columbia.edu
daemon_dumps_core=0
use_large_installation_tweaks=1
enable_environment_macros=1
debug_level=-1debug_verbosity=2
debug_file=/var/log/nagios/nagios.debug
max_debug_file_size=100

my nagiostats output

[sr2690>nagiostats

Nagios Stats 3.0.6
Copyright (c) 2003-2008 Ethan Galstad (www.nagios.org)
Last Modified: 12-01-2008
License: GPL

CURRENT STATUS DATA
--
Status File: /var/log/nagios/status.dat
Status File Age: 0d 0h 0m 19s
Status File Version: 3.0.6

Program Running Time: 0d 2h 5m 28s
Nagios PID: 12139
Used/High/Total Command Buffers: 0 / 0 / 8192

Total Services: 2783
Services Checked: 2783
Services Scheduled: 2782
Services Actively Checked: 2783
Services Passively Checked: 0
Total Service State Change: 0.000 / 52.830 / 0.263 %
Active Service Latency: 1.304 / 12092.843 / 1469.130 sec
Active Service Execution Time: 0.011 / 15.103 / 0.468 sec
Active Service State Change: 0.000 / 52.830 / 0.263 %
Active Services Last 1/5/15/60 min: 0 / 0 / 0 / 129
Passive Service Latency: 0.000 / 0.000 / 0.000 sec
Passive Service State Change: 0.000 / 0.000 / 0.000
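The "Active Service Latency" line in nagiostats reports minimum / maximum / average latency in seconds, and raw second counts this large are hard to judge by eye. As a rough sketch, the snippet below parses such a line and converts the average to minutes and hours. The two sample lines are copied from the nagiostats outputs posted in this thread (after about 2 hours and about 15.5 hours of running time); `parse_triple` is a hypothetical helper, not part of nagiostats.

```python
import re

# "<label>: <min> / <max> / <avg> sec", with optional space after the colon.
TRIPLE_RE = re.compile(r"^([^:]+):\s*([\d.]+) / ([\d.]+) / ([\d.]+) sec$")


def parse_triple(line):
    """Return (label, min, max, avg) from a nagiostats min/max/avg line."""
    match = TRIPLE_RE.match(line.strip())
    if match is None:
        raise ValueError("not a min / max / avg line: %r" % line)
    label = match.group(1).strip()
    low, high, avg = (float(match.group(i)) for i in (2, 3, 4))
    return label, low, high, avg


# Lines copied from the two nagiostats outputs in this thread.
SAMPLES = [
    "Active Service Latency: 1.304 / 12092.843 / 1469.130 sec",
    "Active Service Latency: 244.062 / 37353.761 / 22185.948 sec",
]

if __name__ == "__main__":
    for line in SAMPLES:
        label, low, high, avg = parse_triple(line)
        print("%s: avg %.1f min (%.2f h), max %.2f h"
              % (label, avg / 60, avg / 3600, high / 3600))
```

On these two samples the average service latency grows from roughly 24 minutes to a little over 6 hours, which is consistent with the later report that checks had effectively stopped running.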