Forgive me if this has been covered previously: I am not very active on the lists. I found some posts about this subject from a long time ago, back in v1, but I am wondering whether this is still an issue or whether it can be resolved.
I run a fairly large Nagios config (version 2.2, 672 hosts / 3612 services) and have been struggling lately to understand why my latency is so high (0.08 / 2310.79 / 1451.444 sec) and why my scheduling queue is about 30 minutes behind schedule. It's probably because I have mostly active checks, many of which are nrpe checks that take a few seconds each. It takes about 5 hours for Nagios to catch up on the scheduling queue after a restart.
I've read over the documentation dozens of times and think I understand the basic scheduling logic. I have toyed with all the available options, but one thing that became obvious to me seems to be disregarded, and that is the check interval. When the checks get scheduled, they start alphabetically based on hostname, and get "interleaved" based on the interleave factor if that option is turned on.
Now, suppose I have 260 hosts named An through Zn, and most of the hosts run a slew of checks that are slow and only scheduled to run once an hour. However, hosts Xn, Yn and Zn are critical servers with checks that are supposed to run every 2 minutes. Also, suppose for now that interleaving is turned off. When Nagios starts up, it schedules the checks without regard to the normal_check_interval. This means the checks for hosts X, Y and Z have to wait until A-W get processed, and may not get scheduled for (as in my case) a long time, missing their 2 minute window. Of course, turning on interleaving can alleviate SOME of this, but that seems hit and miss depending on the alphabetical placement of your critical hosts, and as you can imagine, if you multiply the numbers, the problem gets worse.
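To put rough numbers on the scenario above, here is a small Python sketch. It is only a toy model of the ordering as I described it, not Nagios's actual scheduler, and the checks per host, check duration and parallelism figures are made-up assumptions for illustration.

```python
from string import ascii_uppercase

# All three numbers are illustrative assumptions, not measurements.
CHECKS_PER_HOST = 5      # assumed number of services per host
SECONDS_PER_CHECK = 3.0  # assumed average duration of one nrpe check
PARALLEL_CHECKS = 10     # assumed number of checks running at once


def build_hosts():
    """Hosts A0..Z9: 26 letters x 10 = 260 hosts, as in the scenario."""
    return [f"{letter}{n}" for letter in ascii_uppercase for n in range(10)]


def first_start_delay(hosts, target_prefixes):
    """Seconds until the first check of a target host can start,
    if checks are started strictly in the order the hosts are given."""
    for position, host in enumerate(hosts):
        if host[0] in target_prefixes:
            checks_ahead = position * CHECKS_PER_HOST
            return checks_ahead * SECONDS_PER_CHECK / PARALLEL_CHECKS
    raise ValueError("no target host found")


hosts = sorted(build_hosts())  # alphabetical by hostname
delay = first_start_delay(hosts, {"X", "Y", "Z"})
print(f"first critical check starts after ~{delay:.0f} s")
```

With these assumed figures, 230 hosts sit ahead of the first X host, so its first check starts after roughly 345 seconds, well past the 2 minute interval.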
It seems that in this scenario it would make sense to have a configuration option that would allow you to initially schedule the highest priority checks first (those with the lowest normal_check_interval) so that they can finish and get rescheduled right away. Another thought would be to use an external script to parse the config, sort the checks by check interval, and then manipulate the scheduling queue. I would be interested to hear what others are doing to overcome this. I don't want to bother the group with the details of tuning my config; I would rather discuss the theory of this type of scheduling logic.
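As a rough idea of what the external script might look like, here is a minimal Python sketch. It assumes every service is written out as a plain "define service{ ... }" block with a single host_name; templates ("use"), hostgroups and comma-separated host lists are ignored. The helper names (parse_services, sort_by_interval, to_external_commands) and the 5 second spacing are hypothetical. The output lines follow the SCHEDULE_SVC_CHECK external command format; whether feeding them to the command file at startup behaves well on a given version is something I have not verified.

```python
import re

# Matches the body of each "define service{ ... }" block (simplified).
SERVICE_BLOCK = re.compile(r"define\s+service\s*\{(.*?)\}", re.DOTALL)


def parse_services(config_text):
    """Return one dict of directive -> value per service definition."""
    services = []
    for body in SERVICE_BLOCK.findall(config_text):
        fields = {}
        for line in body.splitlines():
            line = line.split(";", 1)[0].strip()  # drop inline ';' comments
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)
            if len(parts) == 2:
                fields[parts[0]] = parts[1].strip()
        services.append(fields)
    return services


def sort_by_interval(services):
    """Lowest normal_check_interval first; services without one go last."""
    return sorted(
        services,
        key=lambda s: (float(s.get("normal_check_interval", "inf")),
                       s.get("host_name", "")),
    )


def to_external_commands(services, start, spacing):
    """Build SCHEDULE_SVC_CHECK lines, one check every `spacing` seconds."""
    lines = []
    for i, svc in enumerate(services):
        when = start + i * spacing
        lines.append(
            f"[{start}] SCHEDULE_SVC_CHECK;"
            f"{svc['host_name']};{svc['service_description']};{when}"
        )
    return lines


SAMPLE = """
define service{
    host_name               A1
    service_description     Disk
    check_command           check_nrpe!check_disk
    normal_check_interval   60
    }
define service{
    host_name               X1
    service_description     HTTP
    check_command           check_http
    normal_check_interval   2    ; critical server
    }
"""

ordered = sort_by_interval(parse_services(SAMPLE))
commands = to_external_commands(ordered, start=1000, spacing=5)
for command in commands:
    print(command)
```

In this sample the 2 minute check on X1 ends up ahead of the hourly check on A1, which is the ordering I am after.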
_______________________________________________
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users