Hi all, I'm having a big problem with Nagios service check latencies.
When the network is operating normally, with few hosts or services in critical states, everything runs smoothly. However, today we had some problems and a large number of NRPE checks were timing out after 10 seconds. In this situation the service check latency skyrocketed to around 12 minutes! We have a grapher system that parses performance data into RRD databases, and because of this latency the RRD databases weren't being updated. This went on for a few hours, not just while notifications were being sent, until I removed those services from the config files.

Does anyone have any tips on how I can prevent this from happening? Below you can find the output of nagios -s and nagiostats. I don't have nagiostats output from when we had lots of critical services, but I can try to reproduce the conditions if that would be useful. I'm very interested in keeping latencies to a minimum, even when things go haywire!

Thanks all.

=========== /usr/nagios/bin/nagios -s /etc/nagios/nagios.cfg ===========

Nagios 2.8
Copyright (c) 1999-2007 Ethan Galstad (http://www.nagios.org)
Last Modified: 04-10-2007
License: GPL

Projected scheduling information for host and service checks
is listed below. This information assumes that you are going
to start running Nagios with your current config files.

HOST SCHEDULING INFORMATION
---------------------------
Total hosts: 433
Total scheduled hosts: 2
Host inter-check delay method: SMART
Average host check interval: 86400.00 sec
Host inter-check delay: 900.00 sec
Max host check spread: 30 min
First scheduled check: Fri Jun 15 16:23:29 2007
Last scheduled check: Fri Jun 15 16:38:29 2007

SERVICE SCHEDULING INFORMATION
-------------------------------
Total services: 2126
Total scheduled services: 2126
Service inter-check delay method: SMART
Average service check interval: 300.00 sec
Inter-check delay: 0.14 sec
Interleave factor method: SMART
Average services per host: 4.91
Service interleave factor: 5
Max service check spread: 30 min
First scheduled check: Fri Jun 15 16:24:29 2007
Last scheduled check: Fri Jun 15 16:29:29 2007

CHECK PROCESSING INFORMATION
----------------------------
Service check reaper interval: 10 sec
Max concurrent service checks: Unlimited

PERFORMANCE SUGGESTIONS
-----------------------
I have no suggestions - things look okay.
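
The SMART scheduling figures in the nagios -s output can be reproduced with simple arithmetic. This is a minimal sketch, assuming Nagios spreads checks evenly over the average check interval (interval divided by the number of scheduled checks), caps that delay so every check starts within the max check spread, and sets the interleave factor to ceil(services / hosts). The function names are hypothetical and are not Nagios internals.

```python
import math


def smart_inter_check_delay(avg_interval_s: float, scheduled: int, max_spread_min: float) -> float:
    """Assumed SMART delay: spread checks evenly over one average interval,
    but never so far apart that the last one starts after the max spread."""
    delay = avg_interval_s / scheduled
    cap = (max_spread_min * 60.0) / scheduled
    return min(delay, cap)


def smart_interleave_factor(total_services: int, total_hosts: int) -> int:
    """Assumed SMART interleave: average services per host, rounded up."""
    return math.ceil(total_services / total_hosts)


# Inputs copied from the nagios -s output.
service_delay = smart_inter_check_delay(300.0, 2126, 30)
host_delay = smart_inter_check_delay(86400.0, 2, 30)
interleave = smart_interleave_factor(2126, 433)

print(f"service inter-check delay: {service_delay:.2f} sec")
print(f"host inter-check delay: {host_delay:.2f} sec")
print(f"service interleave factor: {interleave}")
```

Under these assumptions the script prints 0.14 sec, 900.00 sec and 5, matching the output above. The host delay comes out at 900 sec rather than 43200 sec only because two hosts must fit inside the 30 minute spread.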

=========== /usr/nagios/bin/nagiostats ===========

Nagios Stats 2.8
Copyright (c) 2003-2007 Ethan Galstad (www.nagios.org)
Last Modified: 04-10-2007
License: GPL

CURRENT STATUS DATA
----------------------------------------------------
Status File: /var/nagios/status.log
Status File Age: 0d 0h 0m 25s
Status File Version: 2.8

Program Running Time: 0d 0h 16m 31s
Nagios PID: 26886
Used/High/Total Command Buffers: 0 / 0 / 4096
Used/High/Total Check Result Buffers: 134 / 134 / 4096

Total Services: 2126
Services Checked: 2126
Services Scheduled: 2126
Active Service Checks: 2126
Passive Service Checks: 0
Total Service State Change: 0.000 / 28.950 / 0.096 %
Active Service Latency: 18.980 / 120.190 / 66.213 sec
Active Service Execution Time: 0.079 / 60.078 / 1.633 sec
Active Service State Change: 0.000 / 28.950 / 0.096 %
Active Services Last 1/5/15/60 min: 141 / 1621 / 2126 / 2126
Passive Service State Change: 0.000 / 0.000 / 0.000 %
Passive Services Last 1/5/15/60 min: 0 / 0 / 0 / 0
Services Ok/Warn/Unk/Crit: 2101 / 2 / 0 / 23
Services Flapping: 0
Services In Downtime: 0

Total Hosts: 433
Hosts Checked: 433
Hosts Scheduled: 2
Active Host Checks: 433
Passive Host Checks: 0
Total Host State Change: 0.000 / 32.110 / 0.528 %
Active Host Latency: 0.000 / 316.979 / 1.378 sec
Active Host Execution Time: 0.070 / 2.638 / 2.585 sec
Active Host State Change: 0.000 / 32.110 / 0.528 %
Active Hosts Last 1/5/15/60 min: 0 / 15 / 31 / 120
Passive Host State Change: 0.000 / 0.000 / 0.000 %
Passive Hosts Last 1/5/15/60 min: 0 / 0 / 0 / 0
Hosts Up/Down/Unreach: 428 / 5 / 0
Hosts Flapping: 0
Hosts In Downtime: 0
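
A rough way to see why a batch of 10 second NRPE timeouts hurts: by Little's law, the number of checks that must be running at once equals the check start rate times the average execution time. This sketch plugs in the figures from the nagios -s and nagiostats output, assuming the third nagiostats figure ("1.633") is the average. The outage sizes (100, 500 and 1000 timed-out checks) are hypothetical, since the real count during the incident is unknown, and the model covers only the arithmetic, not how Nagios 2.8 actually forks and reaps checks.

```python
# Little's law: checks in flight = start rate * average execution time.


def check_rate(total_services: int, interval_s: float) -> float:
    """Checks started per second if every service runs once per interval."""
    return total_services / interval_s


def required_concurrency(rate_per_s: float, avg_exec_s: float) -> float:
    """Average number of checks running at the same time."""
    return rate_per_s * avg_exec_s


def blended_exec_time(total: int, timing_out: int, timeout_s: float, normal_exec_s: float) -> float:
    """Average execution time when `timing_out` checks each run until the timeout."""
    ok = total - timing_out
    return (timing_out * timeout_s + ok * normal_exec_s) / total


TOTAL_SERVICES = 2126   # from nagiostats
INTERVAL_S = 300.0      # average service check interval from nagios -s
NORMAL_EXEC_S = 1.633   # assumed to be the average of "0.079 / 60.078 / 1.633 sec"
NRPE_TIMEOUT_S = 10.0   # NRPE checks timed out after 10 seconds

rate = check_rate(TOTAL_SERVICES, INTERVAL_S)
print(f"check rate: {rate:.2f} checks/sec")
print(f"normal: ~{required_concurrency(rate, NORMAL_EXEC_S):.1f} checks in flight")

# HYPOTHETICAL outage sizes, for illustration only.
for timing_out in (100, 500, 1000):
    avg = blended_exec_time(TOTAL_SERVICES, timing_out, NRPE_TIMEOUT_S, NORMAL_EXEC_S)
    print(f"{timing_out} timeouts: avg exec {avg:.2f} s, "
          f"~{required_concurrency(rate, avg):.1f} checks in flight")
```

The point of the sketch is only that the required concurrency grows linearly with the share of checks sitting in a timeout; whether that is what drove latency to 12 minutes here is an assumption, not something the numbers prove.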

_______________________________________________
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null