Yeah, I was looking through the code and saw the call that checks whether the
process is running before issuing the stop (ProcessTree_findProcess), so that
was the only thought I had as well.

check process foo matching /usr/local/bin/foo.py
      start program = "/bin/bash -l -c 'nohup /usr/local/bin/foo.py &'" as uid "nobody"
      stop program = "/usr/bin/pkill -u nobody -f /usr/local/bin/foo.py" as uid "nobody"
      if uptime > 11 hours then alert
      if uptime > 12 hours then exec "/usr/bin/pkill -u nobody -f -9 /usr/local/bin/foo.py" as uid "nobody"
      if 2 restarts within 3 cycles then timeout
      group apps
      depends foo.py

check process bar matching ^/usr/local/bin/bar
      start program = "/bin/bash -lc 'HOME=/home/someuser nohup /usr/local/bin/bar.sh > /tmp/bar-startup.out 2>&1 &'"
      stop program = "/bin/bash -c '/usr/bin/pkill -f ^/usr/local/bin/bar; sleep 1; /usr/bin/pkill -f ^/usr/local/bin/bar'"
      onreboot nostart
      if uptime > 12 hours then exec "/usr/bin/pkill -9 -f ^/usr/local/bin/bar"
      group apps
      mode passive
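
One thing worth checking with the "bar" check: both the `matching` pattern and the pattern given to `pkill -f` are regular expressions tested against the full command line, so the leading `^` only matches a process whose command line begins with /usr/local/bin/bar. The sketch below uses Python's `re` module as a stand-in for that matching, and the command lines in it are hypothetical; the real ones would have to come from `ps` or `monit procmatch`. A shell script launched through its shebang typically shows the interpreter first, which is the case the second entry illustrates.

```python
import re

# Same anchored pattern as in the "bar" check and its pkill commands.
PATTERN = re.compile(r"^/usr/local/bin/bar")

# Hypothetical command lines, NOT taken from the real process table.
cmdlines = [
    "/usr/local/bin/bar --daemon",       # binary started directly
    "/bin/bash /usr/local/bin/bar.sh",   # script run via its interpreter
    "/usr/local/bin/bar.sh",             # script path as the first word
]


def matches(cmdline: str) -> bool:
    """Return True if the anchored pattern matches this command line."""
    return PATTERN.search(cmdline) is not None


for cmdline in cmdlines:
    print(f"{str(matches(cmdline)):5}  {cmdline}")
```

Only the second entry fails to match, because the path is no longer at the start of the line. Whether that applies here depends on what the real command line of "bar" looks like.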

Here are the logs from yesterday and today for "bar":

[CST Mar  1 15:15:01] info     : 'bar' stop action done
[CST Mar  4 07:02:01] info     : 'bar' start on user request
[CST Mar  4 07:02:01] info     : 'bar' start action done
[CST Mar  4 07:02:01] error    : 'bar' uptime test failed for /usr/local/bin/bar-- current uptime is 259177 seconds
<we get the above because it failed to shut down on 3/1>
[CST Mar  4 07:02:01] info     : 'bar' exec: '/usr/bin/pkill -9 -f /usr/local/bin/bar'
[CST Mar  4 07:02:21] error    : 'bar' process is not running
<above line repeats every 20 seconds until we manually start it via monit>
[CST Mar  4 07:51:11] info     : 'bar' start: '/bin/bash -lc HOME=/home/someuser nohup /usr/local/bin/bar.sh > /tmp/bar-startup.out 2>&1 &'
[CST Mar  4 07:51:11] info     : 'bar' start action done
[CST Mar  4 07:51:11] info     : 'bar' process is running with pid 4897
[CST Mar  4 07:51:11] info     : 'bar' uptime test succeeded [current uptime = 1 seconds]
[CST Mar  4 15:15:01] info     : 'bar' stop on user request
[CST Mar  4 15:15:01] info     : 'bar' stop action done
<the same thing repeats the following morning, below>
[CST Mar  5 07:02:01] info     : 'bar' start on user request
[CST Mar  5 07:02:01] info     : 'bar' start action done
[CST Mar  5 07:02:01] error    : 'bar' uptime test failed for /usr/local/bin/bar-- current uptime is 83451 seconds
[CST Mar  5 07:02:01] info     : 'bar' exec: '/usr/bin/pkill -9 -f /usr/local/bin/bar'
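
The two uptime values in the log can be turned back into start times, which is how the note about 3/1 follows from the numbers. This is a small sketch that assumes the year is 2019 (the log lines carry no year) and ignores the CST label, since only differences matter. `started_at` is just an illustrative helper name.

```python
from datetime import datetime, timedelta


def started_at(log_time: str, uptime_seconds: int, year: int = 2019) -> datetime:
    """Subtract the reported uptime from the log timestamp.

    log_time is the part of the monit log stamp after the zone label,
    e.g. "Mar  4 07:02:01". The year is an assumption, not in the log.
    """
    normalized = " ".join(log_time.split())  # collapse the padded day field
    observed = datetime.strptime(f"{year} {normalized}", "%Y %b %d %H:%M:%S")
    return observed - timedelta(seconds=uptime_seconds)


print(started_at("Mar  4 07:02:01", 259177))  # 2019-03-01 07:02:24
print(started_at("Mar  5 07:02:01", 83451))   # 2019-03-04 07:51:10
```

The first result falls on the morning of Mar 1, before that day's 15:15 stop, and the second is within a second of the manual start logged at 07:51:11 on Mar 4. In both cases the process seen at 07:02 is one that survived the previous afternoon's stop request.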

Thanks again for looking. Worst case I'll just build a debug version of
monit with some extra logging to see what is going on.
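
Short of a debug build, the pattern from the two excerpts (a "stop on user request" that reaches "stop action done" with no "stop:" command line in between) can be searched for in the existing log. This is a rough sketch that assumes the line layout shown in this thread. The regex and the `stops_without_command` helper are illustrative, not part of monit.

```python
import re

# Matches lines like: [CST Mar  4 15:15:01] info     : 'bar' stop action done
LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>\w+)\s*:\s+'(?P<svc>[^']+)'\s+(?P<msg>.*)$"
)


def stops_without_command(lines):
    """Return (timestamp, service) for each stop request that completed
    without a "stop:" command line being logged for that service."""
    pending = {}   # service -> [request timestamp, command line seen?]
    flagged = []
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue  # daemon-level lines and wrapped continuations
        svc, msg = m["svc"], m["msg"]
        if msg.startswith("stop on user request"):
            pending[svc] = [m["ts"], False]
        elif msg.startswith("stop:") and svc in pending:
            pending[svc][1] = True
        elif msg.startswith("stop action done") and svc in pending:
            ts, seen = pending.pop(svc)
            if not seen:
                flagged.append((ts, svc))
    return flagged


sample = [
    "[CST Mar  4 17:00:02] info     : 'foo' stop on user request",
    "[CST Mar  4 17:00:02] info     : 'foo' stop: '/usr/bin/pkill -u nobody -f /usr/local/bin/foo.py'",
    "[CST Mar  4 17:00:02] info     : 'foo' stop action done",
    "[CST Mar  4 15:15:01] info     : 'bar' stop on user request",
    "[CST Mar  4 15:15:01] info     : 'bar' stop action done",
]
print(stops_without_command(sample))  # [('CST Mar  4 15:15:01', 'bar')]
```

Run over the full log file, this would show whether the missing "stop:" line happens on every stop of "bar" or only some of them.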



On Tue, Mar 5, 2019 at 2:40 PM mart...@tildeslash.com <mart...@tildeslash.com> wrote:

> Hi,
>
> please can you add the configuration of "foo" and "bar" services?
>
> There are for example these possible reasons:
>
> 1.) the "bar" service is a process and monit detected that the process is
> not running - in this case it gets a fast path and stop is skipped (the
> process is not running)
>
> 2.) there was a problem if you used "check program" in combination with
> the "every" statement ... fixed in monit 5.25.3:
> https://bitbucket.org/tildeslash/monit/issues/759
>
> Best regards,
> Martin
>
>
> On 5 Mar 2019, at 16:24, Marc Rossi <mross...@gmail.com> wrote:
>
> Looking through source right now but figured I'd throw it out to list to
> see if this is something obvious I'm doing wrong.
>
> Long time monit user but on a few of our apps we have recently been having
> problems with the shutdown action possibly not running.
>
> For the app that DOES shut down properly logs show the following:
>
> [CST Mar  4 17:00:02] info     : 'foo' stop on user request
> [CST Mar  4 17:00:02] info     : Monit daemon with PID 17733 awakened
> [CST Mar  4 17:00:02] info     : Awakened by User defined signal 1
> [CST Mar  4 17:00:02] info     : 'foo' stop: '/usr/bin/pkill -u nobody -f /usr/local/bin/foo.py'
> [CST Mar  4 17:00:02] info     : 'foo' stop action done
>
> For the app that is not stopping properly logs show the following:
>
> [CST Mar  4 15:15:01] info     : 'bar' stop on user request
> [CST Mar  4 15:15:01] info     : Monit daemon with PID 17733 awakened
> [CST Mar  4 15:15:01] info     : Awakened by User defined signal 1
> [CST Mar  4 15:15:01] info     : 'bar' stop action done
>
> Could be a red herring but where is the stop action line in the second log
> excerpt? Now the shutdown commands are indeed different between foo & bar
> but still would expect to see the stop action listed.
>
> TIA
> Marc
>
> --
> To unsubscribe:
> https://lists.nongnu.org/mailman/listinfo/monit-general
