So you're seeing the scenario where nagios stops _all_ checks
altogether? I've had this happen when the nagios parent process dies
and logs something like "[1139362901] Caught SIGSEGV, shutting down..."
to nagios.log. I was getting these very frequently once I went
above some apparent host/service threshold (they went away when I removed
about 128 nodes at one point recently). In these cases the CGIs still
respond for some reason, which seems inappropriate...
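
For anyone who wants to check whether they are hitting the same crash, here is a minimal sketch that scans log lines for the "Caught SIGSEGV" entry quoted above and converts the leading epoch timestamp into a readable time. The function name and the sample lines are just for illustration; only the "[epoch] Caught SIGSEGV" shape comes from the log entry above.

```python
import re
from datetime import datetime, timezone

# Matches entries shaped like "[1139362901] Caught SIGSEGV, shutting down..."
SEGV_RE = re.compile(r"^\[(\d+)\]\s+Caught SIGSEGV")


def find_segv_entries(lines):
    """Return a UTC datetime for every 'Caught SIGSEGV' entry in the lines."""
    hits = []
    for line in lines:
        match = SEGV_RE.match(line)
        if match:
            epoch = int(match.group(1))
            hits.append(datetime.fromtimestamp(epoch, tz=timezone.utc))
    return hits


# Illustrative input: the first line is the entry quoted above, the second
# is a made-up filler line that should be ignored.
sample = [
    "[1139362901] Caught SIGSEGV, shutting down... ",
    "[1139362950] some other log entry",
]
for when in find_segv_entries(sample):
    print(when.isoformat())
```

To run it against a real log, pass an open file object instead of the sample list, e.g. find_segv_entries(open(path)) with whatever path your nagios.log lives at.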
I've also seen the same symptom without any logged nagios failure,
where the process is still present in memory but checks aren't
executed and the CGIs are functional.
The third related (and my current bane...) issue is where almost all
checks occur, but some (sometimes large) groups of unrelated actions no
longer occur. Host/service checks as a whole seem to be working, but
I'll notice that I haven't gotten an alert for something that failed,
and then see that a whole class of service checks on one hostgroup isn't
running anymore... and then start to see the same issue with other
checks/actions as well.
I'd sure love to just have nagios start working again, as I'm strongly
against having to write an external framework for checking various parts
of Nagios and alerting me when it's broken! On the other hand, I've always
kept up to date on other OS monitor/alert frameworks and still nothing is
as extensible as Nagios is (yet).
/eli
Terry wrote:
Just looking at the logs, status.log is being continuously updated
as normal, but when checks stop, nagios.log stops gathering
entries as well.
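
The pattern described here (status.log still fresh, nagios.log gone quiet) can be tested by comparing file modification times. This is only a sketch: the function names and the max_age_seconds threshold are assumptions, and a quiet nagios.log can also simply mean nothing noteworthy happened, so treat a hit as a hint rather than proof.

```python
import os
import time


def seconds_since_modified(path, now=None):
    """Seconds elapsed since the file at 'path' was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path)


def checks_look_stalled(status_path, log_path, max_age_seconds, now=None):
    """True when the status file is fresh but the main log has gone stale.

    max_age_seconds is a hypothetical threshold; pick one that is longer
    than the normal gap between log entries on your install.
    """
    status_fresh = seconds_since_modified(status_path, now) <= max_age_seconds
    log_stale = seconds_since_modified(log_path, now) > max_age_seconds
    return status_fresh and log_stale
```

Called as checks_look_stalled(path_to_status_log, path_to_nagios_log, 600), it would return True if status.log changed within the last ten minutes while nagios.log did not.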
On 3/17/06, Eli Stair <[EMAIL PROTECTED]> wrote:
I've been seeing this continuously in 2.0beta/rc/releases. For details
on my situation/posts, check the devel/users archives; I'm curious whether
any similarities exist. I haven't gotten acknowledgement/resolution on this
either. The only thing I've determined is that (in my case) stopping
nagios and restarting with the retention file zeroed resolves the issue
100%.
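
A rough sketch of the "zero the retention file" step, for anyone who wants to try the same workaround. It keeps a backup copy first and then truncates the file in place. The function name and the ".bak" suffix are assumptions; the file to pass in is whatever state_retention_file points to in your nagios.cfg. Stopping and restarting nagios is left to you and must happen around this call, not inside it.

```python
import shutil


def zero_retention_file(path):
    """Back up the retention file, then truncate it to zero bytes.

    Intended to be run only while Nagios is stopped.
    Returns the path of the backup copy.
    """
    backup = path + ".bak"  # hypothetical backup name
    shutil.copy2(path, backup)  # keep contents and timestamps for later review
    with open(path, "w"):
        pass  # opening for writing truncates the existing file in place
    return backup
```

Keeping the backup means the old retention data can still be inspected afterwards if someone wants to work out what in it was triggering the problem.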
Having an extra nagios process running can definitely cause this and
other issues. In my case that's never been present and thus isn't
the cause...
/eli
Terry wrote:
I am seeing this as well. I have services that do not get checked
when they are scheduled:
Last Check Type: ACTIVE
Last Check Time: 03-17-2006 08:50:47
Status Data Age: 0d 1h 37m 51s
Next Scheduled Active Check: 03-17-2006 10:09:01
Latency: 342.408 seconds
Check Duration: 10.015 seconds
Last State Change: 03-16-2006 11:55:02
Current State Duration: 0d 22h 33m 36s
It is currently 10:29 and it still hasn't been checked. This is one of
many examples.
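
To put a number on "still hasn't been checked", here is a small sketch that computes how far past its scheduled time a check is, using the date format shown in the status output above. The function name is made up, and the seconds part of the current time (10:29:00) is an assumption since only 10:29 was given.

```python
from datetime import datetime

# Date format as it appears in the status output above: 03-17-2006 10:09:01
CGI_TIME_FORMAT = "%m-%d-%Y %H:%M:%S"


def overdue_seconds(next_scheduled, now):
    """Seconds by which 'now' is past 'next_scheduled' (0.0 if not yet due)."""
    scheduled = datetime.strptime(next_scheduled, CGI_TIME_FORMAT)
    current = datetime.strptime(now, CGI_TIME_FORMAT)
    return max(0.0, (current - scheduled).total_seconds())


# Next Scheduled Active Check from the output above, compared to 10:29:00.
late = overdue_seconds("03-17-2006 10:09:01", "03-17-2006 10:29:00")
print(f"overdue by {late:.0f} seconds")  # 1199 seconds, just under 20 minutes
```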
On 3/15/06, Matthias Eble
<[EMAIL PROTECTED]> wrote:
hi all!
we are experiencing occasional problems with nagios 2.0 stable. The
main process was reloaded due to configuration changes yesterday (Mar
14th). Since then ps -ef looks like this:
nagios 1078 1 12 Mar09 ? 16:49:43 /opt/nagios/bin/nagios -d /opt/nagios/etc/nagios.cfg
nagios 9431 1078 0 Mar14 ? 00:00:00 [nagios] <defunct>
and nagios stopped running checks. Does anyone have an idea what could
have happened? The nagios.log and status.dat files have not been updated
since then.
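
For spotting the same state automatically, here is a minimal sketch that picks defunct entries out of ps -ef output. It assumes the usual eight ps -ef columns (UID PID PPID C STIME TTY TIME CMD); the function name is made up, and the sample lines are the two from the listing above.

```python
def find_defunct(ps_lines):
    """Return (pid, ppid, cmd) for every <defunct> entry in 'ps -ef' output."""
    found = []
    for line in ps_lines:
        # Split into at most 8 fields so the command keeps its spaces.
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        pid, ppid, cmd = fields[1], fields[2], fields[7]
        if not (pid.isdigit() and ppid.isdigit()):
            continue  # skips the header row, if present
        if "<defunct>" in cmd:
            found.append((int(pid), int(ppid), cmd.strip()))
    return found


sample = [
    "nagios 1078 1 12 Mar09 ? 16:49:43 /opt/nagios/bin/nagios -d /opt/nagios/etc/nagios.cfg",
    "nagios 9431 1078 0 Mar14 ? 00:00:00 [nagios] <defunct>",
]
for pid, ppid, cmd in find_defunct(sample):
    print(f"defunct pid {pid}, parent {ppid}: {cmd}")
```

The parent pid is the useful part: a defunct child only goes away once its parent reaps it, so a long-lived entry like this suggests the parent (1078 here) is not getting around to that.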
thanks
matthias
-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
_______________________________________________
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting
any issue.
::: Messages without supporting info will risk being sent to /dev/null
-------------------------------------------------------