Martin, I actually see this happen a lot on my systems as well, where we restart a large number of apps on a daily code drop (sometimes 100s of systems x 6 apps per box). Some apps go into an unmonitored state even though the application is still up and running and the pid file contains the matching pid. The only way I have been able to resolve it is to restart monit altogether and manually monitor the app again. This causes a lot of grief for my ops guys.
Here is another error string I saw the other night, where the process's PPID (parent PID) apparently changed from 507 to 0. Again, the only way to resolve it has been to fully restart monit, using the same procedure as above. I am using monit version 5.2.5.

<27> Jan 11 17:55:15.547617 -05:00 prod005 monit[5484]: 'WEB01' process PPID changed from 507 to 0

-Chris

On Fri, Jan 13, 2012 at 9:01 AM, Martin Pala wrote:
>
> On Jan 13, 2012, at 2:45 PM, Johannes Bauer wrote:
>
> > Hi Martin,
> >
> > On 13.01.2012 14:16, Martin Pala wrote:
> >
> >> you should check the monit logs - it will show why the service
> >> monitoring was disabled (whether it was some manual action, etc.).
> >
> > Well, monit is configured to log to syslog:
> >
> > set logfile syslog facility log_daemon
> >
> > And I can see that there are messages when monit starts, that the
> > control file syntax is okay, but that's it. There's no indication
> > whatsoever why the processes are in the unmonitored state -- this is
> > actually why I'm asking: because the logs do not show anything out of
> > the ordinary yet monit put all processes in the "unmonitored" state.
> >
> > Is there any automatic action which would cause monit to put a monitored
> > child into "unmonitored" autonomically? If so, how can this mechanism be
> > disabled?
>
> There are two possible ways how the service can get unmonitored
> automatically:
>
> 1.) when the "if <x> restarts within <y> cycles then timeout" statement is
> used, the monit will unmonitor the service if this condition matches
>
> 2.) when you use dependency ("depends on <service>") and the parent
> service is stopped/unmonitored (either via the timeout statement or
> manually by admin) - then the stop/unmonitor action cascades to the child
> services too.
>
> Also Monit <= 5.2.5 *temporarily* displayed "Not monitored" while the
> service restart was pending - the monitoring state returned back to
> "Monitored" when the restart finished … this was fixed in Monit 5.3 as it
> was confusing and it displays "Monitored" during restart too.
>
> If none of the above cases matches your configuration, the most probable
> cause is, that somebody manually unmonitored/stopped the service via Monit.
>
> Regards,
> Martin
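To make Martin's two cases concrete, here is a minimal monitrc sketch, assuming Monit 5.2.x control-file syntax. Only the name WEB01 comes from the log above. The DB01 service, the pidfile paths, the init scripts and the restart limits are hypothetical placeholders, not anyone's real configuration.

    # Hypothetical parent service (name and paths are placeholders).
    check process DB01 with pidfile /var/run/db01.pid
        start program = "/etc/init.d/db01 start"
        stop program  = "/etc/init.d/db01 stop"
        # Case 1: if DB01 is restarted 3 times within 5 cycles,
        # monit stops monitoring it (arbitrary example limits).
        if 3 restarts within 5 cycles then timeout

    # Hypothetical child service; WEB01 is the name seen in the log.
    check process WEB01 with pidfile /var/run/web01.pid
        start program = "/etc/init.d/web01 start"
        stop program  = "/etc/init.d/web01 stop"
        # Case 2: stopping or unmonitoring DB01 (by the timeout
        # above or by hand) cascades to WEB01 as well.
        depends on DB01

With a layout like this, WEB01 could show "Not monitored" while its process is still running simply because DB01 hit its restart limit. It may therefore be worth checking the parent's status as well as the child's, for example with "monit summary".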
--
To unsubscribe: https://lists.nongnu.org/mailman/listinfo/monit-general
