The system date is correct and Nagios is showing the correct day and time on the status page. If I stop Nagios, I don't see any other Nagios processes running. The two templates here have not been touched. I looked at both, and both have 24x7 for all time periods, which is the default. I just modified the time definitions as you have them listed and I'll see what happens tonight. There are also no custom time settings in nagios.cfg; all the settings related to time zone, etc. are commented out.
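As a sanity check on the definitions in the message quoted below, here is a minimal Python sketch of the intended arithmetic. It is not Nagios code and says nothing about how the Nagios scheduler evaluates `exclude`; the helper names `parse_ranges`, `to_minutes` and `subtract` are made up for illustration. It only confirms that "24x7 minus the 04:00-05:30 recycle window" and the explicit "00:00-04:00,05:30-24:00" ranges describe the same part of the day.

```python
# Minimal sketch (not Nagios code): model one day's time ranges as
# (start_minute, end_minute) pairs and subtract an excluded window.
# All function names here are hypothetical helpers for illustration.

def to_minutes(hhmm):
    """Convert 'HH:MM' to minutes since midnight ('24:00' -> 1440)."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)

def parse_ranges(spec):
    """Parse '00:00-04:00,05:30-24:00' into [(0, 240), (330, 1440)]."""
    ranges = []
    for part in spec.split(","):
        start, end = part.strip().split("-")
        ranges.append((to_minutes(start), to_minutes(end)))
    return ranges

def subtract(base, excluded):
    """Return the parts of `base` not covered by any range in `excluded`."""
    result = []
    for start, end in base:
        cursor = start
        for ex_start, ex_end in sorted(excluded):
            if ex_end <= cursor or ex_start >= end:
                continue  # no overlap with what is left of this range
            if ex_start > cursor:
                result.append((cursor, ex_start))
            cursor = max(cursor, ex_end)
        if cursor < end:
            result.append((cursor, end))
    return result

# 24x7 day with the recycle window excluded ...
effective = subtract(parse_ranges("00:00-24:00"), parse_ranges("04:00-05:30"))
# ... compared with the explicit ranges from the simple test definition.
explicit = parse_ranges("00:00-04:00,05:30-24:00")
print(effective == explicit)  # prints True
```

In the same model, a base range of 00:00-23:59 leaves the last minute of the day uncovered, which is one reason to prefer 00:00-24:00 in the base definition.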
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mark Young
Sent: Monday, August 25, 2008 11:44 AM
To: Nagios Users Mailinglist
Subject: Re: [Nagios-users] checks,notifications don't work after time period exception

Hi Seth,

On Aug 25, 2008, at 8:05 AM, Seth Simmons wrote:

> We have a qa group overseas that will work on our customer sites
> during the US overnight. To avoid false alerts, I added a time
> exception so notifications are not sent out between 4am and 5:30am.
> The problem is, after the exception, Nagios (3.0.3) won't send
> notifications, neither are checks performed for any sites with an
> exception. If a site is in a critical state either shortly after 4
> or (if they start early) right before 4, checks do not continue
> after 5:30. When I look at Nagios later, it shows it in critical
> and the last check was done at 3:58am with the next check at
> midnight the next day.

When I start dealing with time problems in Nagios, I have a small list of things that I try first, just to rule them out.

* Check the date/time of the monitoring server and that it is in the right
  timezone (UTC or whatever you want it as).
* Check that the Nagios web interface is displaying the time you expect
  (top left corner in most cgis). You may have set additional time
  information in nagios.cfg.
* Stop the nagios process, checking that there are no other running
  instances left.
  'service nagios stop'
  'ps aux |grep nagios'
* Restart the nagios process.

Sometimes you can get duplicate Nagios daemons running and they can cause many odd problems like this.

Also I hope we are not dealing with any time translations with the "overseas" group.

>
> Let me give some more specific examples:
> Server-A is running abc.customer.com for us and our qa group takes
> the site down at 3:55am, before the 4am exception. Nagios will show
> as critical until either midnight the next day, or you force a check
> on the service.
> So, say at 8am I look at it, the service is
> critical with last check at 3:55am and next scheduled check at 12am
> tomorrow. When I force a check, it will continue on normal check
> schedule and send notice that the service is ok.

So you are saying that "Server-A" is supposed to be checked in the time range 24x7 minus 4:00am-5:30am each day, but when checking stops at 4:00am it will not start again until the next day, unless you force it through an external command?

It is possible that there could be a bug, but you seem to have a really common timeperiod definition type. I normally suggest that users always run the checks 24x7 and then just modify the notification periods (like you did with 'Server-B'). But I would try it with a simple time definition first.

# Test timeperiod for the recycle service.
define timeperiod{
        timeperiod_name recycle
        alias           recycle
        sunday          00:00-04:00,05:30-24:00
        monday          00:00-04:00,05:30-24:00
        tuesday         00:00-04:00,05:30-24:00
        wednesday       00:00-04:00,05:30-24:00
        thursday        00:00-04:00,05:30-24:00
        friday          00:00-04:00,05:30-24:00
        saturday        00:00-04:00,05:30-24:00
        }

Also, what do your "generic-service" and "local-service" templates look like? There could be some settings that are following you through those templates. You may also have modified some settings in nagios.cfg that change how Nagios deals with time.

>
> Server-B is also running a site and tomcat is stopped at 4:10am.
> This service has notification period with the same time period with
> exceptions from 4am - 5:30am. After that it will not send
> notifications. At 8am it is still doing checks and saying is
> critical, but when looking at the details it says it has not sent
> any notifications. When I force a check it still won't do it. If I
> restart Nagios then it does a check it will send first notice. I
> don't see anything wrong with my time period so not sure where the
> issue is. Not sure if anyone else has noticed this before.

The difference between those is that they are using different service templates. Server-B is using 'local-service'.

>
> Here is what I have for that time period and checks for the above
> examples:
>
> define timeperiod{
>         timeperiod_name url-monitor
>         alias           url-monitor
>         sunday          00:00-23:59
>         monday          00:00-23:59
>         tuesday         00:00-23:59
>         wednesday       00:00-23:59
>         thursday        00:00-23:59
>         friday          00:00-23:59
>         saturday        00:00-23:59
>         exclude         recycle
>         }

This is how I would have written the timeperiod definitions to make them clearer. I've used the exclude method many times, so I am sure that it works as you are expecting.

define timeperiod{
        timeperiod_name 24x7
        alias           24 Hours A Day, 7 Days A Week
        sunday          00:00-24:00
        monday          00:00-24:00
        tuesday         00:00-24:00
        wednesday       00:00-24:00
        thursday        00:00-24:00
        friday          00:00-24:00
        saturday        00:00-24:00
        }

#down timeperiod for Server-A
define timeperiod{
        timeperiod_name recycle
        alias           recycle
        sunday          04:00-05:30
        monday          04:00-05:30
        tuesday         04:00-05:30
        wednesday       04:00-05:30
        thursday        04:00-05:30
        friday          04:00-05:30
        saturday        04:00-05:30
        }

define timeperiod{
        timeperiod_name url-monitor
        alias           url-monitor
        use             24x7
        exclude         recycle
        }

Good luck with your plight! I hope someone else can give you a simpler solution.

Mark Young
___
Nagios Enterprises, LLC
Web: www.nagios.com

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null