On 16 Nov 2007, at 21:23, [EMAIL PROTECTED] wrote:
> My first problem, and I am not sure it is actually a problem, is
> that when I do a reload of nagios (/etc/init.d/nagios reload) it
> takes, what seems to me to be, a long time. It is usually around
> 90-120 seconds for Nagios to start allowing use of the web
> interface once the reload is initiated. A check of the files
> reveals no errors (save one warning for a host with no services)
> and the nagios process shows in a ps awux list. However the web
> interface shows the "Whoops! Error: Could not read host and service
> status information!" during the 90-120 second delay I mentioned
> earlier.
>

Hi Mark,

There seem to be quite a few emails in this list about NDOUtils being
a bit slow. We saw this about 6 months ago and have been optimising
the hell out of it, but it boils down to this:

  - NDO updates are synchronously applied to the database

This means that Nagios has to wait for the DB to finish the update
before it continues. I believe Ethan plans to do some work on NDO
after Nagios 3 is released.

We've done various tricks to try and reduce the time for a reload,
which we will blog about on http://altinity.org soon, but I just
haven't found the time to do it. The first couple of things that come
to mind are:

  - indexes should be re-arranged so that the time column is first.
    Currently, a lot of indexes have instance_id first. However, when
    you are doing a delete based on time, such an index is effectively
    useless, so MySQL has to do a complete table scan to work out
    which rows need to be deleted. This will cause MySQL to take a lot
    of time. This is the single biggest thing that you can do

  - reduce the number of times ndo2db calls the housekeeping routine.
    By default, it is every 60 seconds. We've reduced it to once every
    600 seconds. It could probably be even less frequent. One thing
    I've just thought of is to have ndo2db NOT do any housekeeping and
    do it yourself (MySQL is multi-user after all)

  - reduce the amount of data sent.
    We stop the broker module from sending system commands, log
    entries and passive commands

  - we've also patched Nagios to not send status data on a reload. By
    default, Nagios will send data to ndo about the status of all
    hosts/services on a reload. This is not required because the db
    already knows what the status of everything was before the
    reload!

  - we're currently testing a de-coupling of NDOMOD from ndo2db. The
    idea is that NDOMOD writes files and then a separate daemon loads
    those files into ndo2db. This effectively means that NDO updates
    are now asynchronous, though there is now a delay in the updates

We've also made a patch to Nagios 2.9 (which Ethan has applied to
Nagios 3), where the status file is kept between reloads, so you
don't get the dreaded "Could not read host and service status
information" error. That is available at
http://altinity.blogs.com/dotorg/2007/09/nagios-patch-da.html.

We love NDOutils - a lot of our features in Opsview depend on it,
including our favourite, Hostgroup Hierarchy
(http://opsview.org/hostgrouphierarchy). So we're interested in
making NDOutils work as fast as possible too.

Ton

http://www.altinity.com
T: +44 (0)870 787 9243
F: +44 (0)845 280 1725
Skype: tonvoon

_______________________________________________
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
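
For anyone who wants to picture the de-coupling idea described above
(the producer writes files, a separate process loads them into the
database later), here is a rough sketch in Python. It is an
illustration only, not the actual NDOMOD/ndo2db patch: the function
names, the JSON file format, the "events" table and the use of SQLite
in place of the real MySQL schema are all assumptions made for the
example.

```python
import json
import os
import sqlite3
import tempfile
import time


def write_event(spool_dir, event):
    """Producer side (stand-in for NDOMOD): write one event to a spool file.

    The producer returns as soon as the file is on disk, so it never
    waits for the database. All names here are hypothetical.
    """
    fd, tmp_path = tempfile.mkstemp(dir=spool_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(event, handle)
    # Rename only once the file is complete, so the loader never reads
    # a half-written file. The timestamp prefix gives a rough ordering.
    final_name = "%020d-%s.event" % (time.time_ns(), os.path.basename(tmp_path))
    final_path = os.path.join(spool_dir, final_name)
    os.rename(tmp_path, final_path)
    return final_path


def load_spool(spool_dir, conn):
    """Consumer side (stand-in for the separate loader daemon).

    Loads every completed spool file into the database and returns the
    number of files processed.
    """
    loaded = 0
    for name in sorted(os.listdir(spool_dir)):
        if not name.endswith(".event"):
            continue  # skip files that are still being written
        path = os.path.join(spool_dir, name)
        with open(path) as handle:
            event = json.load(handle)
        conn.execute(
            "INSERT INTO events (host, state, entry_time) VALUES (?, ?, ?)",
            (event["host"], event["state"], event["entry_time"]),
        )
        conn.commit()
        os.remove(path)  # delete the file only after the row is committed
        loaded += 1
    return loaded


if __name__ == "__main__":
    spool = tempfile.mkdtemp()
    db = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
    db.execute("CREATE TABLE events (host TEXT, state INTEGER, entry_time INTEGER)")
    write_event(spool, {"host": "web1", "state": 0, "entry_time": int(time.time())})
    print("loaded", load_spool(spool, db), "event(s)")
```

The trade-off is the one mentioned above: the producer is no longer
blocked by a slow database, but the database lags behind by however
long the loader takes to pick the files up.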