I can upgrade one of my dev environments to 5.3.2 tonight. I read the
release notes and saw a fix, I think in 5.3.1, that speeds up finding
monitored apps. I should have some feedback early next week from our ops
guys after they do a few restarts.

On Fri, Jan 13, 2012 at 9:24 AM, Martin Pala <[email protected]> wrote:

> Christian, can you please provide full monit logs for the timeframe when
> some of these problems occurred and monit configuration?
>
> Can you please try upgrading some of your systems to monit 5.3.2 and
> running it in verbose mode (add the -v option)? The fix for the monitoring
> mode displayed while a restart is pending may be related to the problem.
>
> Regarding the PPID error - it was probably generated because monit had
> a problem collecting the process data. The monit logs should provide more
> information.
>
> Regards,
> Martin
>
>
> On Jan 13, 2012, at 3:11 PM, Christopher Johnston wrote:
>
> Martin,
>
> I actually see this happen a lot as well on my systems, where we restart
> a large number of apps on a daily code drop (sometimes 100s of systems X 6
> apps per box). Some apps will go to an unmonitored state even though the
> application is still up and running and the pid file has a matching pid.
> The only way I have been able to resolve it is to restart monit altogether
> and manually monitor the app again. It causes a lot of grief for my ops guys.
>
>
> Here is another error string I saw the other night, where the PPID
> magically changed from 507 to 0. The only way to resolve it has been to
> fully restart monit with the same procedure as above.
>
> I am using monit version 5.2.5.
>
> <27> Jan 11 17:55:15.547617 -05:00 prod005 monit[5484]: 'WEB01' process
> PPID changed from 507 to 0
>
> -Chris
>
> On Fri, Jan 13, 2012 at 9:01 AM, Martin Pala <[email protected]> wrote:
>
>>
>> On Jan 13, 2012, at 2:45 PM, Johannes Bauer wrote:
>>
>> > Hi Martin,
>> >
>> > On 13.01.2012 14:16, Martin Pala wrote:
>> >
>> >> you should check the monit logs - it will show why the service
>> >> monitoring was disabled (whether it was some manual action, etc.).
>> >
>> > Well, monit is configured to log to syslog:
>> >
>> > set logfile syslog facility log_daemon
>> >
>> > And I can see that there are messages when monit starts, that the
>> > control file syntax is okay, but that's it. There's no indication
>> > whatsoever why the processes are in the unmonitored state -- this is
>> > actually why I'm asking: because the logs do not show anything out of
>> > the ordinary yet monit put all processes in the "unmonitored" state.
>> >
>> > Is there any automatic action which would cause monit to put a monitored
>> > child into "unmonitored" on its own? If so, how can this mechanism be
>> > disabled?
>>
>>
>> There are two possible ways a service can become unmonitored
>> automatically:
>>
>> 1.) when the "if <x> restarts within <y> cycles then timeout" statement
>> is used, monit will unmonitor the service if this condition matches
>>
>> 2.) when you use a dependency ("depends on <service>") and the parent
>> service is stopped/unmonitored (either via the timeout statement or
>> manually by an admin) - then the stop/unmonitor action cascades to the
>> child services too.
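>>
>> As a rough sketch of how the two cases can look in a control file (the
>> service names, paths and thresholds below are hypothetical and not taken
>> from any configuration in this thread):
>>
>>   check process db01 with pidfile /var/run/db01.pid
>>     start program = "/etc/init.d/db01 start"
>>     stop program  = "/etc/init.d/db01 stop"
>>     # case 1: too many restarts -> monit unmonitors db01
>>     if 3 restarts within 5 cycles then timeout
>>
>>   check process web01 with pidfile /var/run/web01.pid
>>     start program = "/etc/init.d/web01 start"
>>     stop program  = "/etc/init.d/web01 stop"
>>     # case 2: if db01 is stopped/unmonitored, web01 follows
>>     depends on db01
>>
>> Leaving out the timeout statement should avoid the first case, at the
>> cost of monit continuing to retry a service that keeps failing.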
>>
>>
>> Also, Monit <= 5.2.5 *temporarily* displayed "Not monitored" while the
>> service restart was pending - the monitoring state returned to
>> "Monitored" when the restart finished. This was confusing, so it was
>> fixed in Monit 5.3, which displays "Monitored" during a restart too.
>>
>> If none of the above cases matches your configuration, the most probable
>> cause is that somebody manually unmonitored/stopped the service via Monit.
>>
>> Regards,
>> Martin
>> --
>> To unsubscribe:
>> https://lists.nongnu.org/mailman/listinfo/monit-general
>>
>
