I've been chasing the cause of this issue for a few months, and today I finally had time to dig in and test it properly. When we pull data from avail.cgi for reporting, a host that was recently rebooted will often show some services as CRITICAL from the time of the reboot until that day's midnight log rotation. After manually editing the archived history logs that avail.cgi reads, I found that this happens when the post-reboot service check logs a SOFT OK, but not when it logs a HARD OK.
For now I've set up a cron job that runs after archiving and simply converts all SOFT OK entries to HARD OK entries. As far as I can tell, this has made my data correct. I have a couple of questions about this:

1) Is this skewing my data in any way? After the conversion, the timestamps look right based on the observed length of the service outage.
2) Is this a bug in avail.cgi or in the actual logging of this data? Should there have been a second, HARD OK entry in the log, or was the first entry supposed to be HARD?

Also, I am on an older version of Nagios (3.2.3). If I've wasted my time working around something that is fixed in a more recent version, please let me know. I'd prefer not to upgrade unless necessary, as the version we're on has been very stable except for this one bug.
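
For anyone who wants to try the same workaround, here is a minimal sketch of the kind of SOFT-to-HARD conversion described above. It is not the actual cron script. It assumes archived service alert lines use the layout `[timestamp] SERVICE ALERT: host;service;STATE;TYPE;attempt;output`, and the names `harden_soft_ok` and `harden_log` as well as the sample host and service are made up for illustration.

```python
import re

# Assumed line layout (fields separated by semicolons):
#   [1325376000] SERVICE ALERT: web01;HTTP;OK;SOFT;2;HTTP OK - 200
# Only service alerts whose state is OK and whose state type is SOFT match.
SOFT_OK = re.compile(r"^(\[\d+\] SERVICE ALERT: [^;]*;[^;]*;OK;)SOFT(;)")


def harden_soft_ok(line: str) -> str:
    """Rewrite a SOFT OK service alert as HARD OK; return other lines unchanged."""
    return SOFT_OK.sub(r"\1HARD\2", line, count=1)


def harden_log(lines):
    """Apply the conversion to every line of an archived log (hypothetical helper)."""
    return [harden_soft_ok(line) for line in lines]
```

Timestamps and all other fields are left untouched, so only the state type of OK recoveries changes. CRITICAL and WARNING entries, and host alerts, pass through as they are.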