Hi! Recently I have run into the very same performance issues as Daniel Meyer (or so it seems). However, I'm not quite sure about it. Here's the gist of it.
Currently, service check latency slowly creeps up. It starts out at a little over 1s, and after about 12 hours it is at around 90s. It keeps climbing after that. Here's the output of nagios -s:

Nagios 2.6
Copyright (c) 1999-2006 Ethan Galstad (http://www.nagios.org)
Last Modified: 11-27-2006
License: GPL

Warning: Contact group 'Singles-Truppe' is not used in any host/service definitions or host/service escalations!

Projected scheduling information for host and service checks is listed below. This information assumes that you are going to start running Nagios with your current config files.

HOST SCHEDULING INFORMATION
---------------------------
Total hosts:                      330
Total scheduled hosts:            0
Host inter-check delay method:    SMART
Average host check interval:      0.00 sec
Host inter-check delay:           0.00 sec
Max host check spread:            10 min
First scheduled check:            N/A
Last scheduled check:             N/A

SERVICE SCHEDULING INFORMATION
-------------------------------
Total services:                   2836
Total scheduled services:         2836
Service inter-check delay method: SMART
Average service check interval:   2225.56 sec
Inter-check delay:                0.21 sec
Interleave factor method:         SMART
Average services per host:        8.59
Service interleave factor:        9
Max service check spread:         10 min
First scheduled check:            Tue Dec 19 11:21:45 2006
Last scheduled check:             Tue Dec 19 11:31:47 2006

CHECK PROCESSING INFORMATION
----------------------------
Service check reaper interval:    5 sec
Max concurrent service checks:    Unlimited

PERFORMANCE SUGGESTIONS
-----------------------
I have no suggestions - things look okay.

This all looks peachy - I think. What I don't get is this line:

Average service check interval:   2225.56 sec

It seems to me that this is either a skewed value, stemming from my history of looong latencies (at one point we were beyond 9000 seconds), *or* it is indicative of a misconfiguration on my part. If the latter is the case, I'd be eager, nay ecstatic, to hear what I did wrong.
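As a rough sanity check on the scheduling figures, here is a small Python sketch that tries to reproduce two of the reported values from the others. It rests on two assumptions about how the SMART methods work, neither of which I have verified against the Nagios source: that the inter-check delay is the average check interval divided by the number of services, capped so that all initial checks fit into the max check spread, and that the interleave factor is the average services per host rounded up. The variable names are mine.

```python
import math

# Figures copied from the `nagios -s` output above.
total_hosts = 330
total_services = 2836
max_service_check_spread_min = 10
avg_service_check_interval_sec = 2225.56

# Assumption: SMART first spreads the checks evenly over the average interval...
uncapped_delay = avg_service_check_interval_sec / total_services

# ...and then caps the delay so all initial checks fit into the max spread.
capped_delay = (max_service_check_spread_min * 60) / total_services
inter_check_delay = min(uncapped_delay, capped_delay)

# Assumption: SMART interleave factor = average services per host, rounded up.
services_per_host = total_services / total_hosts
interleave_factor = math.ceil(services_per_host)

print(f"uncapped delay:    {uncapped_delay:.2f} sec")
print(f"capped delay:      {capped_delay:.2f} sec")
print(f"inter-check delay: {inter_check_delay:.2f} sec")
print(f"services per host: {services_per_host:.2f}")
print(f"interleave factor: {interleave_factor}")
```

Under those assumptions the sketch lands on the reported 0.21 sec inter-check delay and interleave factor of 9, so those two values at least look consistent with the rest of the output. It says nothing about where the 2225.56 sec average itself comes from.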
Here are a few of the config vars that might influence this:

sleep_time=0.25
service_reaper_frequency=5
max_concurrent_checks=0
max_host_check_spread=10
host_inter_check_delay_method=s
service_interleave_factor=s
command_check_interval=1
obsess_over_services=0
aggregate_status_updates=1
status_update_interval=20

Also, here's the output from nagiostats:

Nagios Stats 2.6
Copyright (c) 2003-2005 Ethan Galstad (www.nagios.org)
Last Modified: 11-27-2006
License: GPL

CURRENT STATUS DATA
----------------------------------------------------
Status File:                          /var/nagios/status.dat
Status File Age:                      0d 0h 0m 3s
Status File Version:                  2.6

Program Running Time:                 0d 1h 59m 5s

Total Services:                       2836
Services Checked:                     2836
Services Scheduled:                   2758
Active Service Checks:                2836
Passive Service Checks:               0
Total Service State Change:           0.000 / 12.370 / 0.007 %
Active Service Latency:               0.006 / 10.237 / 0.906 sec
Active Service Execution Time:        0.047 / 10.159 / 0.180 sec
Active Service State Change:          0.000 / 12.370 / 0.007 %
Active Services Last 1/5/15/60 min:   477 / 2678 / 2745 / 2754
Passive Service State Change:         0.000 / 0.000 / 0.000 %
Passive Services Last 1/5/15/60 min:  0 / 0 / 0 / 0
Services Ok/Warn/Unk/Crit:            2814 / 6 / 0 / 16
Services Flapping:                    0
Services In Downtime:                 0

Total Hosts:                          330
Hosts Checked:                        330
Hosts Scheduled:                      0
Active Host Checks:                   330
Passive Host Checks:                  0
Total Host State Change:              0.000 / 0.000 / 0.000 %
Active Host Latency:                  0.000 / 1.000 / 0.888 sec
Active Host Execution Time:           0.030 / 4.059 / 0.112 sec
Active Host State Change:             0.000 / 0.000 / 0.000 %
Active Hosts Last 1/5/15/60 min:      0 / 12 / 12 / 12
Passive Host State Change:            0.000 / 0.000 / 0.000 %
Passive Hosts Last 1/5/15/60 min:     0 / 0 / 0 / 0
Hosts Up/Down/Unreach:                329 / 1 / 0
Hosts Flapping:                       0
Hosts In Downtime:                    0

Hardware is a dual-2.8GHz Xeon, 2G RAM and a 100 FDX interface. LoadAvg is around 1.6, sometimes gets to 1.9. CPUs are both around 40% idle most of the time.
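To follow the latency creep over time instead of comparing single snapshots by eye, the min / max / avg lines of the nagiostats output could be pulled apart with a few lines of Python. This is only a sketch: `parse_triples` and the regular expression are illustrative names of my own, and the code assumes the human-readable line layout shown above, with a trailing `sec` or `%` unit.

```python
import re

# Sample lines copied verbatim from the nagiostats output above.
SAMPLE = """\
Active Service Latency: 0.006 / 10.237 / 0.906 sec
Active Service Execution Time: 0.047 / 10.159 / 0.180 sec
Active Services Last 1/5/15/60 min: 477 / 2678 / 2745 / 2754
Active Host Latency: 0.000 / 1.000 / 0.888 sec
"""

# Matches "<name>: <min> / <max> / <avg> <unit>" and nothing else,
# so four-value and unit-less lines are skipped.
TRIPLE = re.compile(
    r"^\s*(?P<name>[^:]+):\s+"
    r"(?P<min>[\d.]+)\s*/\s*(?P<max>[\d.]+)\s*/\s*(?P<avg>[\d.]+)\s*"
    r"(?P<unit>sec|%)\s*$"
)


def parse_triples(text):
    """Return {metric name: {"min", "max", "avg", "unit"}} for matching lines."""
    stats = {}
    for line in text.splitlines():
        match = TRIPLE.match(line)
        if match:
            stats[match.group("name").strip()] = {
                "min": float(match.group("min")),
                "max": float(match.group("max")),
                "avg": float(match.group("avg")),
                "unit": match.group("unit"),
            }
    return stats


stats = parse_triples(SAMPLE)
for name, values in stats.items():
    print(f"{name}: avg {values['avg']} {values['unit']}, max {values['max']} {values['unit']}")
```

Run from cron against fresh nagiostats output and appended to a log, something like this would show whether the average latency rises steadily or in jumps.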
I see about 300 context switches and 500 interrupts per second. The network load is negligible, ditto the packet rate. Going by these figures I don't see a performance problem per se, but maybe I have overlooked a metric that describes the "usual" bottleneck of installations.

Any help is appreciated.

Regards,
Tobias

PS: I'll send another mail with my questions regarding scheduling as they're more general in nature.