Just to follow up, I figured out what was causing the shutdown issue. The
process giving me shutdown issues (foo) has a dependency on a different
process (bar) whose startup/shutdown I do not control. So the
config looks as follows:

   check process foo
       ...
       depends bar

Before "bar" is shut down by a method outside of my control, I issue a "monit
unmonitor bar". What I was unaware of is that issuing this command on the
"bar" process results in it being issued internally for all other processes
that depend on it. A minute later, when I issue the "monit stop foo"
command, it does nothing, as it no longer believes the "foo" process is
running.

I would argue that in this situation monit should perform the "stop program"
action for safety instead of the "unmonitor" action, as "foo" shouldn't be
running if "bar" isn't, and monit no longer knows whether "bar" is running.

So my options are to either flip the steps (stop foo, then unmonitor bar)
or just remove the dependency. I'll probably go with the first option, as
the second could give us some bad outcomes.
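
For reference, a rough and untested sketch of the first option as a small
shell wrapper. The function name pre_bar_shutdown and the MONIT override
variable are made up for illustration; only the two monit commands come from
the steps described above. It also assumes "monit stop foo" has taken effect
by the time the next command runs, which may not hold, so a pause or a status
check between the two calls might be needed.

```shell
#!/bin/sh
# Sketch only: stop the dependent process first, then unmonitor its dependency.
# MONIT is a hypothetical override (e.g. MONIT="echo monit") for dry runs.

pre_bar_shutdown() {
    # Stop foo while monit is still monitoring it. Unmonitoring bar first
    # would also unmonitor foo, and the later stop would do nothing.
    ${MONIT:-monit} stop foo || return 1

    # Only now take bar out of monitoring, before it is shut down externally.
    ${MONIT:-monit} unmonitor bar
}
```

The idea would be to call pre_bar_shutdown from wherever "monit unmonitor bar"
is issued today, just before "bar" goes down.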

Marc

On Thu, Mar 7, 2019 at 8:00 AM Marc Rossi <mross...@gmail.com> wrote:

> Agree on the pidfile stuff (and we have run into those "somewhat unwanted
> incidents" by not using them). We usually do, but sometimes it is out of my
> control and you can't fight 'em all.
>
> On Tue, Mar 5, 2019 at 3:22 PM SZÉPE Viktor <vik...@szepe.net> wrote:
>
>> Quoting Marc Rossi <mross...@gmail.com>:
>>
>> > Yeah, I was looking through the code and saw the call to check if the
>> > process is running before issuing stop (ProcessTree_findProcess), so
>> > that was the only thought I had as well.
>> >
>> > check process foo matching /usr/local/bin/foo.py
>> >       start program = "/bin/bash -l -c 'nohup /usr/local/bin/foo.py &'" as uid "nobody"
>> >       stop program = "/usr/bin/pkill -u nobody -f /usr/local/bin/foo.py" as uid "nobody"
>> >       if uptime > 11 hours then alert
>> >       if uptime > 12 hours then exec "/usr/bin/pkill -u nobody -f -9 /usr/local/bin/foo.py" as uid "nobody"
>> >       if 2 restarts within 3 cycles then timeout
>> >       group apps
>> >       depends foo.py
>> >
>> > check process bar matching ^/usr/local/bin/bar
>> >       start program = "/bin/bash -lc 'HOME=/home/someuser nohup /usr/local/bin/bar.sh > /tmp/bar-startup.out 2>&1 &'"
>> >       stop program = "/bin/bash -c '/usr/bin/pkill -f ^/usr/local/bin/bar; sleep 1; /usr/bin/pkill -f ^/usr/local/bin/bar'"
>> >       onreboot nostart
>> >       if uptime > 12 hours then exec "/usr/bin/pkill -9 -f ^/usr/local/bin/bar"
>> >       group apps
>> >       mode passive
>>
>> BTW, it is highly dangerous to run interpreted software without a pid
>> file under Monit, as you may run into some unwanted incidents.
>>
>> Try implementing a pid file in your scripts.
>>
>> All the best!
>>
>>
>> SZÉPE Viktor, web application operations / Running your application
>> https://github.com/szepeviktor/debian-server-tools/blob/master/CV.md
>> ~~~
>> hotline: +36-20-4242498  s...@szepe.net  skype: szepe.viktor
>> Budapest, District III
>>
>> --
>> To unsubscribe:
>> https://lists.nongnu.org/mailman/listinfo/monit-general
>
>