Yeah, I was looking through the code and saw the call that checks whether the process is running before issuing stop (ProcessTree_findProcess), so that was the only thought I had as well.
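To spell out what I think is happening, here is a rough Python model of that check. It is only my reading of the behavior, not monit's actual C code: `stop_service` and its parameters are made-up names, and the only thing taken from the source is the idea that a process lookup (ProcessTree_findProcess) happens before the stop program runs.

```python
def stop_service(name, process_found, stop_command):
    """Hypothetical model of the stop path; NOT monit's real implementation.

    process_found stands in for the result of the process lookup
    (ProcessTree_findProcess in the monit source).
    Returns the log lines this model would produce.
    """
    log = [f"'{name}' stop on user request"]
    if process_found:
        # Only in this branch is the stop program executed and logged.
        log.append(f"'{name}' stop: '{stop_command}'")
    # Assumed fast path: with no process found, the stop program is skipped,
    # but the action is still reported as done.
    log.append(f"'{name}' stop action done")
    return log


if __name__ == "__main__":
    for line in stop_service("foo", True, "/usr/bin/pkill -u nobody -f /usr/local/bin/foo.py"):
        print(line)
    for line in stop_service("bar", False, "/usr/bin/pkill -f ^/usr/local/bin/bar"):
        print(line)
```

If this model is right, a lookup that misses the process would give exactly the shorter log sequence we see for "bar": the request and "stop action done", with no "stop:" line in between.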
check process foo matching /usr/local/bin/foo.py
  start program = "/bin/bash -l -c 'nohup /usr/local/bin/foo.py &'" as uid "nobody"
  stop program = "/usr/bin/pkill -u nobody -f /usr/local/bin/foo.py" as uid "nobody"
  if uptime > 11 hours then alert
  if uptime > 12 hours then exec "/usr/bin/pkill -u nobody -f -9 /usr/local/bin/foo.py" as uid "nobody"
  if 2 restarts within 3 cycles then timeout
  group apps
  depends foo.py

check process bar matching ^/usr/local/bin/bar
  start program = "/bin/bash -lc 'HOME=/home/someuser nohup /usr/local/bin/bar.sh > /tmp/bar-startup.out 2>&1 &'"
  stop program = "/bin/bash -c '/usr/bin/pkill -f ^/usr/local/bin/bar; sleep 1; /usr/bin/pkill -f ^/usr/local/bin/bar'"
  onreboot nostart
  if uptime > 12 hours then exec "/usr/bin/pkill -9 -f ^/usr/local/bin/bar"
  group apps
  mode passive

Here are logs from yesterday and today wrt to "bar"

[CST Mar 1 15:15:01] info : 'bar' stop action done
[CST Mar 4 07:02:01] info : 'bar' start on user request
[CST Mar 4 07:02:01] info : 'bar' start action done
[CST Mar 4 07:02:01] error : 'bar' uptime test failed for /usr/local/bin/bar-- current uptime is 259177 seconds
<we get above since it failed to shutdown on 3/1>
[CST Mar 4 07:02:01] info : 'bar' exec: '/usr/bin/pkill -9 -f /usr/local/bin/bar'
[CST Mar 4 07:02:21] error : 'bar' process is not running
<above line repeats every 20 seconds until we manually start it via monit>
[CST Mar 4 07:51:11] info : 'bar' start: '/bin/bash -lc HOME=/home/someuser nohup /usr/local/bin/bar.sh > /tmp/bar-startup.out 2>&1 &'
[CST Mar 4 07:51:11] info : 'bar' start action done
[CST Mar 4 07:51:11] info : 'bar' process is running with pid 4897
[CST Mar 4 07:51:11] info : 'bar' uptime test succeeded [current uptime = 1 seconds]
[CST Mar 4 15:15:01] info : 'bar' stop on user request
[CST Mar 4 15:15:01] info : 'bar' stop action done
<below same thing repeats itself the following morning>
[CST Mar 5 07:02:01] info : 'bar' start on user request
[CST Mar 5 07:02:01] info : 'bar' start action done
[CST Mar 5 07:02:01] error : 'bar' uptime test failed for /usr/local/bin/bar-- current uptime is 83451 seconds
[CST Mar 5 07:02:01] info : 'bar' exec: '/usr/bin/pkill -9 -f /usr/local/bin/bar'

Thanks again for looking. Worst case I'll just build a debug version of monit with some extra logging to see what is going on.

On Tue, Mar 5, 2019 at 2:40 PM mart...@tildeslash.com <mart...@tildeslash.com> wrote:

> Hi,
>
> please can you add the configuration of "foo" and "bar" services?
>
> There are for example these possible reasons:
>
> 1.) the "bar" service is a process and monit detected that the process is
> not running - in this case it gets a fast path and stop is skipped (the
> process is not running)
>
> 2.) there was a problem if you used "check program" in combination with
> the "every" statement ... fixed in monit 5.25.3:
> https://bitbucket.org/tildeslash/monit/issues/759
>
> Best regards,
> Martin
>
>
> On 5 Mar 2019, at 16:24, Marc Rossi <mross...@gmail.com> wrote:
>
> Looking through source right now but figured I'd throw it out to list to
> see if this is something obvious I'm doing wrong.
>
> Long time monit user but on a few of our apps we have recently been having
> problems with the shutdown action possibly not running.
>
> For the app that DOES shut down properly logs show the following:
>
> [CST Mar 4 17:00:02] info : 'foo' stop on user request
> [CST Mar 4 17:00:02] info : Monit daemon with PID 17733 awakened
> [CST Mar 4 17:00:02] info : Awakened by User defined signal 1
> [CST Mar 4 17:00:02] info : 'foo' stop: '/usr/bin/pkill -u nobody -f /usr/local/bin/foo.py'
> [CST Mar 4 17:00:02] info : 'foo' stop action done
>
> For the app that is not stopping properly logs show the following:
>
> [CST Mar 4 15:15:01] info : 'bar' stop on user request
> [CST Mar 4 15:15:01] info : Monit daemon with PID 17733 awakened
> [CST Mar 4 15:15:01] info : Awakened by User defined signal 1
> [CST Mar 4 15:15:01] info : 'bar' stop action done
>
> Could be a red herring but where is the stop action line in the second log
> excerpt? Now the shutdown commands are indeed different between foo & bar
> but still would expect to see the stop action listed.
>
> TIA
> Marc
>
> --
> To unsubscribe:
> https://lists.nongnu.org/mailman/listinfo/monit-general
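
One sanity check on the uptime numbers in the "bar" logs: subtracting the reported uptime from the time of the failed uptime test gives the moment the process must have started. A small Python sketch, assuming the year is 2019 (taken from the mail date) and that all log timestamps are in the same timezone; `process_start` is just a helper name I made up.

```python
from datetime import datetime, timedelta


def process_start(observed_at: datetime, uptime_seconds: int) -> datetime:
    """Return the start time implied by an uptime reading taken at observed_at."""
    return observed_at - timedelta(seconds=uptime_seconds)


# (time of the failed uptime test, uptime in seconds) from the 'bar' log lines.
# The year 2019 is an assumption; the logs themselves carry no year.
readings = [
    (datetime(2019, 3, 4, 7, 2, 1), 259177),
    (datetime(2019, 3, 5, 7, 2, 1), 83451),
]

if __name__ == "__main__":
    for observed_at, uptime in readings:
        print(observed_at, "->", process_start(observed_at, uptime))
```

The first reading works out to Mar 1 at 07:02:24 and the second to Mar 4 at 07:51:10, one second before the "process is running with pid 4897" line. Both are consistent with the process started in the morning still being alive after the 15:15 stop request.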