Martin,

I actually see this happen a lot as well on my systems, where we restart
a large number of apps on a daily code drop (sometimes 100s of systems x 6
apps per box).  Some apps go to an unmonitored state even though the
application is still up and running and the pid file has a matching pid.
The only way I have been able to resolve it is to restart monit altogether
and manually monitor the app again. This causes a lot of grief for my ops guys.


Here is another error string I saw the other night, where the PPID
magically changed from 507 to 0. The only way to resolve it has been to fully
restart monit with the same procedure as above.

I am using monit version 5.2.5.

<27> Jan 11 17:55:15.547617 -05:00 prod005 monit[5484]: 'WEB01' process
PPID changed from 507 to 0

-Chris

On Fri, Jan 13, 2012 at 9:01 AM, Martin Pala <[email protected]> wrote:

>
> On Jan 13, 2012, at 2:45 PM, Johannes Bauer wrote:
>
> > Hi Martin,
> >
> > On 13.01.2012 14:16, Martin Pala wrote:
> >
> >> you should check the monit logs - it will show why the service
> >> monitoring was disabled (whether it was some manual action, etc.).
> >
> > Well, monit is configured to log to syslog:
> >
> > set logfile syslog facility log_daemon
> >
> > And I can see that there are messages when monit starts, that the
> > control file syntax is okay, but that's it. There's no indication
> > whatsoever why the processes are in the unmonitored state -- this is
> > actually why I'm asking: because the logs do not show anything out of
> > the ordinary yet monit put all processes in the "unmonitored" state.
> >
> > Is there any automatic action which would cause monit to put a monitored
> > child into the "unmonitored" state automatically? If so, how can this
> > mechanism be disabled?
>
>
> There are two ways a service can become unmonitored automatically:
>
> 1.) when the "if <x> restarts within <y> cycles then timeout" statement is
> used, Monit will unmonitor the service if this condition matches
>
> 2.) when you use a dependency ("depends on <service>") and the parent
> service is stopped/unmonitored (either via the timeout statement or
> manually by an admin) - then the stop/unmonitor action cascades to the
> child services too.
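>
> As a rough illustration of both cases, a control-file fragment might look
> like the sketch below. The parent service name DB01, the pidfile paths,
> the init scripts and the restart/cycle counts are hypothetical
> placeholders, not taken from anyone's actual configuration:
>
> ```
> # Hypothetical parent service
> check process DB01 with pidfile /var/run/db01.pid
>   start program = "/etc/init.d/db01 start"
>   stop program  = "/etc/init.d/db01 stop"
>
> # Hypothetical child service
> check process WEB01 with pidfile /var/run/web01.pid
>   start program = "/etc/init.d/web01 start"
>   stop program  = "/etc/init.d/web01 stop"
>   # case 1: unmonitor WEB01 after 3 restarts within 5 cycles
>   if 3 restarts within 5 cycles then timeout
>   # case 2: stop/unmonitor of DB01 cascades to WEB01
>   depends on DB01
> ```
>
> In a setup like this, WEB01 would become unmonitored either through its
> own timeout statement or whenever DB01 is stopped or unmonitored.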
>
>
> Also, Monit <= 5.2.5 *temporarily* displayed "Not monitored" while a
> service restart was pending - the monitoring state returned to
> "Monitored" when the restart finished. This was confusing, so it was
> fixed in Monit 5.3, which displays "Monitored" during the restart too.
>
> If none of the above cases matches your configuration, the most probable
> cause is that somebody manually unmonitored/stopped the service via Monit.
>
> Regards,
> Martin
> --
> To unsubscribe:
> https://lists.nongnu.org/mailman/listinfo/monit-general
>
