I guess I'm another "me too". We use Nagios 3.0.6, but I had just set up an
upgrade to 3.2.0. From what I could see, my distributed nodes had been sending
data just fine for 45 minutes or so. When I double-checked the performance
graphs just before retiring for the night, I saw no data coming in. When I
traced this back to the distributed nodes, their scheduling queue showed no
checks scheduled until well into the next day. We have some checks that run as
frequently as every minute.

I assumed this was a weird bug in 3.2.0, panicked, and went back to 3.0.6 a
little after midnight; things have been fine ever since. I was going to
spend more time observing 3.2.0 in a more contained environment to see whether
this was normal behavior. My timing (checks stopping around 11pm Sunday night)
matches yours, so perhaps it's not just my imagination.

One thing that bothered me a bit was that I didn't see messages on the central
server indicating that it was marking service checks as stale and checking
them automatically. I saw no stale messages in the log, and it should have been
well past the freshness thresholds of most checks. As I say, it was late, and I
decided to roll back before I investigated.

I've got thousands of service checks, so forcing a reschedule of each one by
hand wouldn't work for me.

Mark

From: Les Fenison [mailto:l...@deltatechnicalservices.com]
Sent: Monday, November 02, 2009 9:47 PM
To: Andy Howell
Cc: nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] Nagios stopped checking most of my services!

Well, so far three of us have had the same problem on the same day. I have to
believe it is daylight saving time related.

My fix is to click on each service one by one and reschedule it. Then they
start checking normally again.

I wonder if there is any way to force an automatic reschedule of all services
and hosts next year, when this happens again?
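Rather than clicking through each service (or hitting Mark's problem with thousands of them), the reschedule can be scripted through the external command file. A minimal sketch in Python, assuming external commands are enabled (check_external_commands=1) and that the command pipe is at the hypothetical path below; substitute the command_file value from your nagios.cfg. It writes the SCHEDULE_FORCED_HOST_CHECK and SCHEDULE_FORCED_HOST_SVC_CHECKS external commands, so one line per host covers all of that host's services. Host names are passed on the command line; the script does not read your object configuration.

```python
#!/usr/bin/env python3
"""Force an immediate check of each given host and all of its services.

Sketch only: assumes check_external_commands=1 and that COMMAND_FILE
matches the command_file setting in nagios.cfg.
"""
import sys
import time

# Assumption: hypothetical default path; use your nagios.cfg command_file value.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def build_commands(hosts, check_time=None):
    """Return a forced host check and a forced all-services check per host."""
    if check_time is None:
        check_time = int(time.time())
    lines = []
    for host in hosts:
        # External command format: [timestamp] COMMAND;arg1;arg2
        lines.append(f"[{check_time}] SCHEDULE_FORCED_HOST_CHECK;{host};{check_time}\n")
        lines.append(f"[{check_time}] SCHEDULE_FORCED_HOST_SVC_CHECKS;{host};{check_time}\n")
    return lines


def submit(lines, command_file=COMMAND_FILE):
    """Write the commands to the Nagios command pipe (a FIFO)."""
    with open(command_file, "w") as pipe:
        pipe.writelines(lines)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: force_recheck.py host1 host2 ...
    submit(build_commands(sys.argv[1:]))
```

For example, `python3 force_recheck.py web01 db01` (script and host names are hypothetical). Forced checks run regardless of the time of day and of whether checks are enabled, which suits a recovery like this but is worth keeping in mind.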

Andy Howell wrote:
Les Fenison wrote:

I had Nagios working great, checking 6 hosts and about 85 services. Then
suddenly all services on all hosts except one stopped checking. The next
scheduled check is about 24 hours after the last check. I had been checking
every 5 minutes.

Restarting Nagios didn't help. I am using a GUI, NagioSQL, to edit my
configuration files, so I suspect it did something to me, but I have no clue
where to look beyond where I have already looked.

What can cause Nagios to just stop checking everything like that, or to
randomly switch to every 24 hours rather than the configured every 5 minutes?

I am having to force checks manually to get it to check.

Here are some things I have checked...

Hosts: check_interval is 5, retry_interval is 1
Services: check_interval is 10, retry_interval is 2

So where could Nagios be getting the idea that it is supposed to be every 24
hours?

I had the same experience yesterday. Maybe daylight saving time related? At
about 11pm, all the services were scheduled for 11pm the following day. I
figured it was something I did wrong. I noticed that the "next_check" time in
/var/log/nagios/retention.dat was wrong. I renamed the file and restarted
Nagios. It worked fine after that.
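Before renaming retention.dat, a rough scan can show which entries were saved with a next_check far in the future. A minimal sketch in Python, assuming the file uses the plain "type {" / "key=value" / "}" text layout and using an arbitrary 6-hour threshold; it is a quick diagnostic, not a full parser (a value that happens to end in "{" would confuse it).

```python
#!/usr/bin/env python3
"""List retention.dat entries whose next_check is suspiciously far ahead."""
import sys
import time


def parse_blocks(text):
    """Yield (block_type, fields) for each 'name {' ... '}' block."""
    block_type, fields = None, {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("{"):
            # Start of a block such as "host {" or "service {".
            block_type, fields = line[:-1].strip(), {}
        elif line == "}" and block_type is not None:
            yield block_type, fields
            block_type = None
        elif block_type is not None and "=" in line:
            # Split on the first '=' only; values may contain '='.
            key, _, value = line.partition("=")
            fields[key] = value


def far_future_checks(text, now, max_ahead):
    """Return (host, service, next_check) for checks more than max_ahead s away."""
    late = []
    for _, fields in parse_blocks(text):
        next_check = int(fields.get("next_check", "0") or 0)
        if next_check - now > max_ahead:
            late.append((fields.get("host_name", "?"),
                         fields.get("service_description", ""),
                         next_check))
    return late


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: find_late_checks.py /path/to/retention.dat
    with open(sys.argv[1]) as f:
        data = f.read()
    # Hypothetical threshold: flag anything scheduled more than 6 hours ahead.
    for host, svc, when in far_future_checks(data, time.time(), 6 * 3600):
        print(host, svc, time.ctime(when))
```

The retention file location varies by install (the state_retention_file setting in nagios.cfg); /var/log/nagios/retention.dat is just where Andy's lives.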

I'm using version 3.2.

Regards,

    Andy

--
________________________________
Les Fenison
Delta Technical Services
www.DeltaTechnicalServices.com
l...@deltatechnicalservices.com
503-766-0076
_______________________________________________
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null
