Try lowering max_check_result_reaper_time (it's set to 30 in the nagios.cfg quoted below). I had good luck playing with that value.

Thanks
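Here are rough numbers on why I'd look there first. This is a back-of-the-envelope sketch in Python that only uses figures from your mail: 6790 passive results per minute, 566 hosts checked every 5 minutes, and check_result_reaper_frequency=10.

It makes two simplifying assumptions:
- results arrive at a steady rate;
- passive results are worked through by the same reaper events as active ones.

Also, results_per_reaper_run is a made-up helper name.

Going by the comments in your nagios.cfg, a single reaper event may keep control for up to max_check_result_reaper_time seconds before Nagios returns to its other duties. Your 30 s cap is three times your 10 s reaper interval, so my guess is that scheduled host checks end up waiting behind the reaper.

```python
# Back-of-the-envelope arithmetic only: assumes a steady arrival rate and that
# passive results go through the same reaper events as active ones.
# results_per_reaper_run is a hypothetical helper name, not a Nagios API.


def results_per_reaper_run(passive_per_min: float, hosts: int,
                           host_interval_min: float,
                           reaper_freq_s: float) -> float:
    """Estimate how many check results queue up between two reaper events."""
    passive_per_s = passive_per_min / 60.0            # passive service results/sec
    host_per_s = hosts / (host_interval_min * 60.0)   # active host results/sec
    return (passive_per_s + host_per_s) * reaper_freq_s


if __name__ == "__main__":
    # Figures from the quoted mail: 6790 passive/min, 566 hosts every 5 min,
    # check_result_reaper_frequency=10, max_check_result_reaper_time=30.
    backlog = results_per_reaper_run(6790, 566, 5, 10)
    reaper_freq_s, reaper_cap_s = 10, 30
    print(f"~{backlog:.0f} results per reaper event")
    print(f"reaper may run up to {reaper_cap_s / reaper_freq_s:.0f}x its interval")
```

With your numbers it prints roughly 1151 results per reaper event and a 3x ratio. Lowering max_check_result_reaper_time shrinks that ratio. The presumable trade-off is that some results are left for the next reaper event.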
On Tue, May 4, 2010 at 8:13 PM, Trisha Hoang <tri...@rockyou.com> wrote:
> Hi,
> The Nagios *master* has really high host latency and I'm not sure how to
> tweak it. I ran the check_ping plugin on a handful of hosts and the RTA
> averaged 0.2 seconds, so it's not the network.
>
> *Environment:*
> - 565 hosts
> - 6790 passive checks from the slaves
> - not using the event broker
> - the master server *actively* executes the host checks every 5 minutes and
>   *passively* processes checks every 1 minute
> - not processing performance data
>
> *Nagiostats*
>
> Nagios Stats 3.2.1
> Copyright (c) 2003-2008 Ethan Galstad (www.nagios.org)
> Last Modified: 03-09-2010
> License: GPL
>
> CURRENT STATUS DATA
> ------------------------------------------------------
> Status File: /var/log/nagios/status.dat
> Status File Age: 0d 0h 0m 23s
> Status File Version: 3.2.1
>
> Program Running Time: 0d 1h 32m 19s
> Nagios PID: 28282
> Used/High/Total Command Buffers: 1316 / 3066 / 4096
>
> Total Services: 7745
> Services Checked: 7745
> Services Scheduled: 1381
> Services Actively Checked: 955
> Services Passively Checked: 6790
> Total Service State Change: 0.000 / 9.740 / 0.007 %
> Active Service Latency: 18.948 / 205.144 / 165.751 sec
> Active Service Execution Time: 0.007 / 9.051 / 0.055 sec
> Active Service State Change: 0.000 / 5.460 / 0.006 %
> Active Services Last 1/5/15/60 min: 0 / 0 / 0 / 0
> Passive Service Latency: 34.359 / 190.247 / 76.739 sec
> Passive Service State Change: 0.000 / 9.740 / 0.008 %
> Passive Services Last 1/5/15/60 min: 0 / 3054 / 6774 / 6784
> Services Ok/Warn/Unk/Crit: 7720 / 1 / 0 / 24
> Services Flapping: 27
> Services In Downtime: 0
>
> Total Hosts: 566
> Hosts Checked: 566
> Hosts Scheduled: 566
> Hosts Actively Checked: 566
> Host Passively Checked: 0
> Total Host State Change: 0.000 / 0.000 / 0.000 %
> Active Host Latency: 0.000 / 3410.087 / 2413.051 sec
> Active Host Execution Time: 0.007 / 10.010 / 0.063 sec
> Active Host State Change: 0.000 / 0.000 / 0.000 %
> Active Hosts Last 1/5/15/60 min: 0 / 8 / 10 / 565
> Passive Host Latency: 0.000 / 0.000 / 0.000 sec
> Passive Host State Change: 0.000 / 0.000 / 0.000 %
> Passive Hosts Last 1/5/15/60 min: 0 / 0 / 0 / 0
> Hosts Up/Down/Unreach: 563 / 3 / 0
> Hosts Flapping: 1
> Hosts In Downtime: 0
>
> Active Host Checks Last 1/5/15 min: 5 / 32 / 75
>    Scheduled: 0 / 0 / 0
>    On-demand: 5 / 32 / 75
>    Parallel: 1 / 11 / 23
>    Serial: 0 / 0 / 0
>    Cached: 4 / 21 / 52
> Passive Host Checks Last 1/5/15 min: 0 / 0 / 0
> Active Service Checks Last 1/5/15 min: 0 / 0 / 0
>    Scheduled: 0 / 0 / 0
>    On-demand: 0 / 0 / 0
>    Cached: 0 / 0 / 0
> Passive Service Checks Last 1/5/15 min: 2 / 1455 / 1455
>
> External Commands Last 1/5/15 min: 1302 / 6063 / 20253
>
> *Nagios.cfg*
>
> # EXTERNAL COMMAND CHECK INTERVAL
> # This is the interval at which Nagios should check for external commands.
> # This value works off the interval_length you specify later. If you leave
> # that at its default value of 60 (seconds), a value of 1 here will cause
> # Nagios to check for external commands every minute. If you specify a
> # number followed by an "s" (e.g. 15s), this will be interpreted to mean
> # actual seconds rather than a multiple of the interval_length variable.
> # Note: In addition to reading the external command file at regularly
> # scheduled intervals, Nagios will also check for external commands after
> # event handlers are executed.
> # NOTE: Setting this value to -1 causes Nagios to check the external
> # command file as often as possible.
>
> #command_check_interval=15s
> command_check_interval=-1
>
> # SERVICE INTER-CHECK DELAY METHOD
> # This is the method that Nagios should use when initially
> # "spreading out" service checks when it starts monitoring. The
> # default is to use smart delay calculation, which will try to
> # space all service checks out evenly to minimize CPU load.
> # Using the dumb setting will cause all checks to be scheduled
> # at the same time (with no delay between them)! This is not a
> # good thing for production, but is useful when testing the
> # parallelization functionality.
> # n = None - don't use any delay between checks
> # d = Use a "dumb" delay of 1 second between checks
> # s = Use "smart" inter-check delay calculation
> # x.xx = Use an inter-check delay of x.xx seconds
>
> service_inter_check_delay_method=s
>
> # MAXIMUM SERVICE CHECK SPREAD
> # This variable determines the timeframe (in minutes) from the
> # program start time that an initial check of all services should
> # be completed. Default is 30 minutes.
>
> max_service_check_spread=30
>
> # SERVICE CHECK INTERLEAVE FACTOR
> # This variable determines how service checks are interleaved.
> # Interleaving the service checks allows for a more even
> # distribution of service checks and reduced load on remote
> # hosts. Setting this value to 1 is equivalent to how versions
> # of Nagios previous to 0.0.5 did service checks. Set this
> # value to s (smart) for automatic calculation of the interleave
> # factor unless you have a specific reason to change it.
> # s = Use "smart" interleave factor calculation
> # x = Use an interleave factor of x, where x is a
> #     number greater than or equal to 1.
>
> service_interleave_factor=s
>
> # HOST INTER-CHECK DELAY METHOD
> # This is the method that Nagios should use when initially
> # "spreading out" host checks when it starts monitoring. The
> # default is to use smart delay calculation, which will try to
> # space all host checks out evenly to minimize CPU load.
> # Using the dumb setting will cause all checks to be scheduled
> # at the same time (with no delay between them)!
> # n = None - don't use any delay between checks
> # d = Use a "dumb" delay of 1 second between checks
> # s = Use "smart" inter-check delay calculation
> # x.xx = Use an inter-check delay of x.xx seconds
>
> host_inter_check_delay_method=s
>
> # MAXIMUM HOST CHECK SPREAD
> # This variable determines the timeframe (in minutes) from the
> # program start time that an initial check of all hosts should
> # be completed. Default is 30 minutes.
>
> max_host_check_spread=30
>
> # MAXIMUM CONCURRENT SERVICE CHECKS
> # This option allows you to specify the maximum number of
> # service checks that can be run in parallel at any given time.
> # Specifying a value of 1 for this variable essentially prevents
> # any service checks from being parallelized. A value of 0
> # will not restrict the number of concurrent checks that are
> # being executed.
>
> max_concurrent_checks=0
>
> # HOST AND SERVICE CHECK REAPER FREQUENCY
> # This is the frequency (in seconds!) that Nagios will process
> # the results of host and service checks.
>
> check_result_reaper_frequency=10
>
> # MAX CHECK RESULT REAPER TIME
> # This is the max amount of time (in seconds) that a single
> # check result reaper event will be allowed to run before
> # returning control back to Nagios so it can perform other
> # duties.
>
> max_check_result_reaper_time=30
>
> # CHECK RESULT PATH
> # This is the directory where Nagios stores the results of host and
> # service checks that have not yet been processed.
> #
> # Note: Make sure that only one instance of Nagios has access
> # to this directory!
>
> check_result_path=/var/log/nagios/spool/checkresults
>
> # MAX CHECK RESULT FILE AGE
> # This option determines the maximum age (in seconds) which check
> # result files are considered to be valid. Files older than this
> # threshold will be mercilessly deleted without further processing.
>
> max_check_result_file_age=3600
>
> # CACHED HOST CHECK HORIZON
> # This option determines the maximum amount of time (in seconds)
> # that the state of a previous host check is considered current.
> # Cached host states (from host checks that were performed more
> # recently than the timeframe specified by this value) can immensely
> # improve performance in regards to the host check logic.
> # Too high of a value for this option may result in inaccurate host
> # states being used by Nagios, while a lower value may result in a
> # performance hit for host checks. Use a value of 0 to disable host
> # check caching.
>
> #cached_host_check_horizon=15
> cached_host_check_horizon=60
>
> # CACHED SERVICE CHECK HORIZON
> # This option determines the maximum amount of time (in seconds)
> # that the state of a previous service check is considered current.
> # Cached service states (from service checks that were performed more
> # recently than the timeframe specified by this value) can immensely
> # improve performance in regards to predictive dependency checks.
> # Use a value of 0 to disable service check caching.
>
> cached_service_check_horizon=15
>
> # ENABLE PREDICTIVE HOST DEPENDENCY CHECKS
> # This option determines whether or not Nagios will attempt to execute
> # checks of hosts when it predicts that future dependency logic tests
> # may be needed. These predictive checks can help ensure that your
> # host dependency logic works well.
> # Values:
> # 0 = Disable predictive checks
> # 1 = Enable predictive checks (default)
>
> enable_predictive_host_dependency_checks=1
>
> # ENABLE PREDICTIVE SERVICE DEPENDENCY CHECKS
> # This option determines whether or not Nagios will attempt to execute
> # checks of service when it predicts that future dependency logic tests
> # may be needed. These predictive checks can help ensure that your
> # service dependency logic works well.
> # Values:
> # 0 = Disable predictive checks
> # 1 = Enable predictive checks (default)
>
> enable_predictive_service_dependency_checks=1
>
> # AUTO-RESCHEDULING OPTION
> # This option determines whether or not Nagios will attempt to
> # automatically reschedule active host and service checks to
> # "smooth" them out over time. This can help balance the load on
> # the monitoring server.
> # WARNING: THIS IS AN EXPERIMENTAL FEATURE - IT CAN DEGRADE
> # PERFORMANCE, RATHER THAN INCREASE IT, IF USED IMPROPERLY
>
> auto_reschedule_checks=0
>
> # AUTO-RESCHEDULING INTERVAL
> # This option determines how often (in seconds) Nagios will
> # attempt to automatically reschedule checks. This option only
> # has an effect if the auto_reschedule_checks option is enabled.
> # Default is 30 seconds.
> # WARNING: THIS IS AN EXPERIMENTAL FEATURE - IT CAN DEGRADE
> # PERFORMANCE, RATHER THAN INCREASE IT, IF USED IMPROPERLY
>
> auto_rescheduling_interval=30
>
> # AUTO-RESCHEDULING WINDOW
> # This option determines the "window" of time (in seconds) that
> # Nagios will look at when automatically rescheduling checks.
> # Only host and service checks that occur in the next X seconds
> # (determined by this variable) will be rescheduled. This option
> # only has an effect if the auto_reschedule_checks option is
> # enabled. Default is 180 seconds (3 minutes).
> # WARNING: THIS IS AN EXPERIMENTAL FEATURE - IT CAN DEGRADE
> # PERFORMANCE, RATHER THAN INCREASE IT, IF USED IMPROPERLY
>
> auto_rescheduling_window=180
>
> # SLEEP TIME
> # This is the number of seconds to sleep between checking for system
> # events and service checks that need to be run.
>
> sleep_time=0.25
>
> # TIMEOUT VALUES
> # These options control how much time Nagios will allow various
> # types of commands to execute before killing them off. Options
> # are available for controlling maximum time allotted for
> # service checks, host checks, event handlers, notifications, the
> # ocsp command, and performance data commands. All values are in
> # seconds.
>
> service_check_timeout=60
> host_check_timeout=30
> event_handler_timeout=30
> notification_timeout=30
> ocsp_timeout=5
> perfdata_timeout=5
>
> # AGGRESSIVE HOST CHECKING OPTION
> # If you don't want to turn on aggressive host checking features, set
> # this value to 0 (the default). Otherwise set this value to 1 to
> # enable the aggressive check option. Read the docs for more info
> # on what aggressive host check is or check out the source code in
> # base/checks.c
>
> use_aggressive_host_checking=0
>
> # SERVICE CHECK EXECUTION OPTION
> # This determines whether or not Nagios will actively execute
> # service checks when it initially starts. If this option is
> # disabled, checks are not actively made, but Nagios can still
> # receive and process passive check results that come in. Unless
> # you're implementing redundant hosts or have a special need for
> # disabling the execution of service checks, leave this enabled!
> # Values: 1 = enable checks, 0 = disable checks
>
> execute_service_checks=0
>
> # PASSIVE SERVICE CHECK ACCEPTANCE OPTION
> # This determines whether or not Nagios will accept passive
> # service check results when it initially (re)starts.
> # Values: 1 = accept passive checks, 0 = reject passive checks
>
> accept_passive_service_checks=1
>
> # HOST CHECK EXECUTION OPTION
> # This determines whether or not Nagios will actively execute
> # host checks when it initially starts. If this option is
> # disabled, checks are not actively made, but Nagios can still
> # receive and process passive check results that come in. Unless
> # you're implementing redundant hosts or have a special need for
> # disabling the execution of host checks, leave this enabled!
> # Values: 1 = enable checks, 0 = disable checks
>
> execute_host_checks=1
>
> # PASSIVE HOST CHECK ACCEPTANCE OPTION
> # This determines whether or not Nagios will accept passive
> # host check results when it initially (re)starts.
> # Values: 1 = accept passive checks, 0 = reject passive checks
>
> accept_passive_host_checks=0
>
> # OBSESS OVER SERVICE CHECKS OPTION
> # This determines whether or not Nagios will obsess over service
> # checks and run the ocsp_command defined below. Unless you're
> # planning on implementing distributed monitoring, do not enable
> # this option. Read the HTML docs for more information on
> # implementing distributed monitoring.
> # Values: 1 = obsess over services, 0 = do not obsess (default)
>
> obsess_over_services=0
>
> # OBSESSIVE COMPULSIVE SERVICE PROCESSOR COMMAND
> # This is the command that is run for every service check that is
> # processed by Nagios. This command is executed only if the
> # obsess_over_services option (above) is set to 1. The command
> # argument is the short name of a command definition that you
> # define in your host configuration file. Read the HTML docs for
> # more information on implementing distributed monitoring.
>
> #ocsp_command=somecommand
>
> # OBSESS OVER HOST CHECKS OPTION
> # This determines whether or not Nagios will obsess over host
> # checks and run the ochp_command defined below. Unless you're
> # planning on implementing distributed monitoring, do not enable
> # this option. Read the HTML docs for more information on
> # implementing distributed monitoring.
> # Values: 1 = obsess over hosts, 0 = do not obsess (default)
>
> obsess_over_hosts=0
>
> # OBSESSIVE COMPULSIVE HOST PROCESSOR COMMAND
> # This is the command that is run for every host check that is
> # processed by Nagios. This command is executed only if the
> # obsess_over_hosts option (above) is set to 1. The command
> # argument is the short name of a command definition that you
> # define in your host configuration file. Read the HTML docs for
> # more information on implementing distributed monitoring.
>
> #ochp_command=somecommand
>
> # SERVICE FRESHNESS CHECK OPTION
> # This option determines whether or not Nagios will periodically
> # check the "freshness" of service results. Enabling this option
> # is useful for ensuring passive checks are received in a timely
> # manner.
> # Values: 1 = enable freshness checking, 0 = disable freshness checking
>
> check_service_freshness=1
>
> # SERVICE FRESHNESS CHECK INTERVAL
> # This setting determines how often (in seconds) Nagios will
> # check the "freshness" of service check results. If you have
> # disabled service freshness checking, this option has no effect.
>
> #service_freshness_check_interval=60
> service_freshness_check_interval=420
>
> # HOST FRESHNESS CHECK OPTION
> # This option determines whether or not Nagios will periodically
> # check the "freshness" of host results. Enabling this option
> # is useful for ensuring passive checks are received in a timely
> # manner.
> # Values: 1 = enable freshness checking, 0 = disable freshness checking
>
> check_host_freshness=0
> #check_host_freshness=1
>
> # HOST FRESHNESS CHECK INTERVAL
> # This setting determines how often (in seconds) Nagios will
> # check the "freshness" of host check results. If you have
> # disabled host freshness checking, this option has no effect.
>
> #host_freshness_check_interval=60
> host_freshness_check_interval=420
>
> # ADDITIONAL FRESHNESS THRESHOLD LATENCY
> # This setting determines the number of seconds that Nagios
> # will add to any host and service freshness thresholds that
> # it calculates (those not explicitly specified by the user).
>
> #additional_freshness_latency=15
> additional_freshness_latency=180
>
> # LARGE INSTALLATION TWEAKS OPTION
> # This option determines whether or not Nagios will take some shortcuts
> # which can save on memory and CPU usage in large Nagios installations.
> # Read the documentation for more information on the benefits/tradeoffs
> # of enabling this option.
> # Values: 1 - Enable tweaks
> #         0 - Disable tweaks (default)
>
> use_large_installation_tweaks=1
>
> # CHILD PROCESS MEMORY OPTION
> # This option determines whether or not Nagios will free memory in
> # child processes (processes used to execute system commands and host/
> # service checks). If you specify a value here, it will override
> # program defaults.
> # Value: 1 - Free memory in child processes
> #        0 - Do not free memory in child processes
>
> #free_child_process_memory=1
>
> # CHILD PROCESS FORKING BEHAVIOR
> # This option determines how Nagios will fork child processes
> # (used to execute system commands and host/service checks). Normally
> # child processes are fork()ed twice, which provides a very high level
> # of isolation from problems. Fork()ing once is probably enough and will
> # save a great deal on CPU usage (in large installs), so you might
> # want to consider using this. If you specify a value here, it will
> # override program defaults.
> # Value: 1 - Child processes fork() twice
> #        0 - Child processes fork() just once
>
> #child_processes_fork_twice=1
> child_processes_fork_twice=0
>
> # DEBUG LEVEL
> # This option determines how much (if any) debugging information will
> # be written to the debug file. OR values together to log multiple
> # types of information.
> # Values:
> #     -1 = Everything
> #      0 = Nothing
> #      1 = Functions
> #      2 = Configuration
> #      4 = Process information
> #      8 = Scheduled events
> #     16 = Host/service checks
> #     32 = Notifications
> #     64 = Event broker
> #    128 = External commands
> #    256 = Commands
> #    512 = Scheduled downtime
> #   1024 = Comments
> #   2048 = Macros
>
> debug_level=16
>
> # DEBUG VERBOSITY
> # This option determines how verbose the debug log output will be.
> # Values: 0 = Brief output
> #         1 = More detailed
> #         2 = Very detailed
>
> debug_verbosity=1
>
> Thanks in advance for your help.
> Trisha
>
> ------------------------------------------------------------------------------
>
> _______________________________________________
> Nagios-users mailing list
> Nagios-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when
> reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null

--
Cordially,
Shadhin Rahman
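One more note on reading the nagiostats output quoted above. The latency and execution-time rows are printed as min / max / average. So the number to worry about is the 2413 s average active host latency (max 3410 s), not the 0.000 minimum.

Below is a small Python sketch that pulls those "... sec" rows out of raw nagiostats text and picks the row with the highest average. It assumes the text has no "> " quote prefixes. parse_latency and SAMPLE are names I made up; the sample lines are copied from your output.

```python
import re
from typing import Dict, Tuple

# Matches nagiostats rows such as
#   "Active Host Latency: 0.000 / 3410.087 / 2413.051 sec"
# where the three numbers are min / max / average.
ROW = re.compile(
    r"^\s*(?P<label>[^:]+):\s*"
    r"(?P<min>[\d.]+)\s*/\s*(?P<max>[\d.]+)\s*/\s*(?P<avg>[\d.]+)\s*sec\s*$"
)


def parse_latency(text: str) -> Dict[str, Tuple[float, float, float]]:
    """Return {label: (min, max, avg)} for every '... sec' row in the text."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows[m.group("label").strip()] = (
                float(m.group("min")),
                float(m.group("max")),
                float(m.group("avg")),
            )
    return rows


SAMPLE = """\
Active Service Latency: 18.948 / 205.144 / 165.751 sec
Active Service Execution Time: 0.007 / 9.051 / 0.055 sec
Passive Service Latency: 34.359 / 190.247 / 76.739 sec
Active Host Latency: 0.000 / 3410.087 / 2413.051 sec
Active Host Execution Time: 0.007 / 10.010 / 0.063 sec
"""

if __name__ == "__main__":
    stats = parse_latency(SAMPLE)
    worst = max(stats, key=lambda label: stats[label][2])  # highest average
    print(worst, stats[worst])
```

Rows ending in "%" or in plain counts, such as the state-change and "Last 1/5/15 min" rows, are skipped because the pattern requires the trailing "sec".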