I can upgrade one of my dev environments to 5.3.2 tonight. I read the release notes and saw a fix, I think in 5.3.1, that speeds up finding monitored apps. I should have some feedback early next week from our ops guys after they do a few restarts.
On Fri, Jan 13, 2012 at 9:24 AM, Martin Pala <[email protected]> wrote:
> Christian, can you please provide full monit logs for the timeframe when
> some of these problems occurred, along with the monit configuration?
>
> Can you please try upgrading some of your systems to monit 5.3.2 and
> running it in verbose mode (add the -v option)? The mentioned fix of the
> monitoring-mode-while-restart-is-pending may be related to the problem.
>
> Regarding the PPID error - it was probably generated because monit had
> a problem collecting the process data. The monit logs should provide more
> information.
>
> Regards,
> Martin
>
>
> On Jan 13, 2012, at 3:11 PM, Christopher Johnston wrote:
>
> Martin,
>
> I actually see this happen a lot as well on my systems where we restart
> a large number of apps on a daily code drop (sometimes 100s of systems X 6
> apps per box). Some apps will go to an unmonitored state yet the
> application is still up and running and the pid file has a matching pid.
> The only way I have been able to resolve it is to restart monit altogether
> and manually monitor the app again. It causes a lot of grief for my ops guys.
>
> Here is another error string I also saw the other night, where the PPID
> magically changed from 507 to 0. The only way to resolve it has been to fully
> restart monit with the same procedure as above.
>
> I am using monit version 5.2.5.
>
> <27> Jan 11 17:55:15.547617 -05:00 prod005 monit[5484]: 'WEB01' process PPID changed from 507 to 0
>
> -Chris
>
> On Fri, Jan 13, 2012 at 9:01 AM, Martin Pala <[email protected]> wrote:
>
>>
>> On Jan 13, 2012, at 2:45 PM, Johannes Bauer wrote:
>>
>> > Hi Martin,
>> >
>> > On 13.01.2012 14:16, Martin Pala wrote:
>> >
>> >> you should check the monit logs - it will show why the service
>> >> monitoring was disabled (whether it was some manual action, etc.).
>> >
>> > Well, monit is configured to log to syslog:
>> >
>> > set logfile syslog facility log_daemon
>> >
>> > And I can see that there are messages when monit starts, that the
>> > control file syntax is okay, but that's it. There's no indication
>> > whatsoever why the processes are in the unmonitored state -- this is
>> > actually why I'm asking: because the logs do not show anything out of
>> > the ordinary yet monit put all processes in the "unmonitored" state.
>> >
>> > Is there any automatic action which would cause monit to put a monitored
>> > child into "unmonitored" automatically? If so, how can this mechanism be
>> > disabled?
>>
>>
>> There are two possible ways a service can get unmonitored
>> automatically:
>>
>> 1.) when the "if <x> restarts within <y> cycles then timeout" statement
>> is used, monit will unmonitor the service if this condition matches
>>
>> 2.) when you use a dependency ("depends on <service>") and the parent
>> service is stopped/unmonitored (either via the timeout statement or
>> manually by the admin) - then the stop/unmonitor action cascades to the
>> child services too.
>>
>>
>> Also, Monit <= 5.2.5 *temporarily* displayed "Not monitored" while the
>> service restart was pending - the monitoring state returned to
>> "Monitored" when the restart finished … this was fixed in Monit 5.3 as it
>> was confusing, and it now displays "Monitored" during restart too.
>>
>> If none of the above cases matches your configuration, the most probable
>> cause is that somebody manually unmonitored/stopped the service via Monit.
>>
>> Regards,
>> Martin
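
For reference, the two automatic cases Martin describes would look roughly like this in a control file. This is only a sketch: the service names, pidfile paths, init scripts and the 3-in-5 threshold are made-up placeholders, not taken from anyone's configuration in this thread.

```
# Hypothetical services; names, paths and thresholds are placeholders.
check process web01 with pidfile /var/run/web01.pid
  start program = "/etc/init.d/web01 start"
  stop program  = "/etc/init.d/web01 stop"
  # Case 1: after 3 restarts within 5 cycles, monit gives up and
  # unmonitors web01.
  if 3 restarts within 5 cycles then timeout

check process worker01 with pidfile /var/run/worker01.pid
  start program = "/etc/init.d/worker01 start"
  stop program  = "/etc/init.d/worker01 stop"
  # Case 2: if web01 is stopped or unmonitored, the action cascades
  # to worker01 because of this dependency.
  depends on web01
```

If a service ends up unmonitored through either path, "monit monitor web01" should turn monitoring back on for it without restarting monit itself.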
-- To unsubscribe: https://lists.nongnu.org/mailman/listinfo/monit-general
