That gets me close, but when a process dies monit goes into
something of a tailspin. Here is what I see in the logs when I break
apache: 

Dec 7 09:04:16 tecate monit[27339]: 'apache' process is not running
Dec 7 09:04:16 tecate monit[27339]: 'apache' exec: /usr/bin/monit
Dec 7 09:04:16 tecate monit[27339]: 'ospfd' stop on user request
Dec 7 09:04:16 tecate monit[27339]: monit daemon at 27339 awakened
Dec 7 09:04:16 tecate monit[27339]: Awakened by User defined signal 1
Dec 7 09:04:16 tecate monit[27339]: 'ospfd' stop: /etc/init.d/ospfd
Dec 7 09:04:16 tecate monit[27339]: 'ospfd' stop action done
Dec 7 09:04:16 tecate monit[27339]: 'apache' process is not running
Dec 7 09:04:16 tecate monit[27339]: 'apache' exec: /usr/bin/monit
Dec 7 09:04:16 tecate monit[27339]: 'ospfd' stop on user request
Dec 7 09:04:16 tecate monit[27339]: monit daemon at 27339 awakened
Dec 7 09:04:16 tecate monit[27339]: Awakened by User defined signal 1
Dec 7 09:04:16 tecate monit[27339]: 'ospfd' stop action done
Dec 7 09:04:16 tecate monit[612]: monit: action failed -- Other action already in progress -- please try again later
Dec 7 09:04:16 tecate monit[27339]: 'apache' process is not running
Dec 7 09:04:16 tecate monit[27339]: 'apache' exec: /usr/bin/monit
Dec 7 09:04:16 tecate monit[27339]: 'ospfd' stop on user request
Dec 7 09:04:16 tecate monit[27339]: monit daemon at 27339 awakened
Dec 7 09:04:16 tecate monit[27339]: Awakened by User defined signal 1
Dec 7 09:04:16 tecate monit[27339]: Awakened by User defined signal 1
Dec 7 09:04:16 tecate monit[27339]: 'ospfd' stop action done
Dec 7 09:04:16 tecate monit[27339]: 'apache' process is not running
Dec 7 09:04:16 tecate monit[27339]: 'apache' exec: /usr/bin/monit
Dec 7 09:04:16 tecate monit[27339]: 'ospfd' stop on user request
Dec 7 09:04:16 tecate monit[27339]: monit daemon at 27339 awakened

...

And those messages are repeating as fast as I can tail the log. Any ideas how to make it behave a bit better?

Here is what I have in my apache and ospf configs now:

check process apache with pidfile /var/run/httpd.pid
  start program = "/etc/init.d/httpd start"
  stop program = "/etc/init.d/httpd stop"
  if does not exist
    then exec "/usr/bin/monit stop ospfd && /usr/bin/monit restart apache"
    else if recovered then exec "/usr/bin/monit monitor ospfd"
  if failed host localhost port 80 protocol http
    and request "/" then restart
  if children > 50 then restart
  if 2 restarts within 2 cycles then timeout
  group server
  depends on tomcat

check process ospfd with pidfile /var/run/quagga/ospfd.pid
  start program = "/etc/init.d/ospfd start"
  stop program = "/etc/init.d/ospfd stop"
  depends on apache
  depends on fcserver
  depends on mysql
  depends on tomcat
  group network
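
A possible variant of the apache check, offered as an untested sketch. The workaround quoted below wraps the two monit commands in /bin/bash -c, while the exec line above does not. Assuming monit runs an exec program directly rather than through a shell, the "&&" and everything after it would be passed to the first /usr/bin/monit as extra arguments, which would be consistent with the log showing the ospfd stop but never an apache restart. Whether restoring the wrapper also ends the wake-up loop is an assumption, not something confirmed here; the other rules from the config above are left out for brevity.

```
check process apache with pidfile /var/run/httpd.pid
  start program = "/etc/init.d/httpd start"
  stop program = "/etc/init.d/httpd stop"
  # Assumption: exec does not go through a shell, so "&&" only works
  # when the command line is handed to one explicitly.
  if does not exist
    then exec "/bin/bash -c '/usr/bin/monit stop ospfd && /usr/bin/monit restart apache'"
    else if recovered then exec "/usr/bin/monit monitor ospfd"
```

After editing the control file, "monit -t" checks its syntax and "monit reload" makes the running daemon re-read it.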

On 07.12.2011 01:15, Martin Pala wrote:

> The dependency in monit is currently "soft" … it defines the start/stop order and action cascading, but it doesn't wait for the parent service to recover before starting the dependent service. We'll address this in the future and will support "hard" dependencies.
>
> As a workaround you can do something like this:
>
> --8<--
>
> check process apache with pidfile /var/run/httpd.pid
>   start program = "/etc/init.d/httpd start"
>   stop program = "/etc/init.d/httpd stop"
>   if does not exist then exec "/bin/bash -c '/usr/bin/monit stop ospfd && /usr/bin/monit restart apache'" else if recovered then exec "/usr/bin/monit start ospfd"
> check process ospfd with pidfile /var/run/quagga/ospfd.pid
>   start program = "/etc/init.d/ospfd start"
>   stop program = "/etc/init.d/ospfd stop"
> --8<--
>
> => if apache crashes, the monit ospfd service is stopped and will be started only once apache recovers.
>
> Regards,
> Martin
> 
> On Dec 6, 2011, at 11:39 PM, drich wrote:
> 
>> Changing the cycle time changes the frequency but not what happens. I still see essentially the following:
>>
>> * apache stops and monit detects it
>> * monit attempts a restart
>> * monit stops ospfd
>> * monit starts apache
>> * monit unmonitors ospfd
>> * ... 30 seconds later ...
>> * apache fails to start
>> * monit starts ospfd
>> * ... repeat the above cycle ...
>> * after 2 cycles, it triggers my "2 restarts" rule and stops ospfd
>>
>> I don't think it should be starting ospfd at all since the service it depends on is failing to restart.
>> 
>> On 06.12.2011 10:34, Rory Toma wrote:
>>
>>> What is your cycle time? Is it 30 sec? If it is, try increasing it to 1 minute.
>>>
>>> On 12/6/11 9:12 AM, drich wrote:
>>> 
>>>> As I mentioned in an earlier e-mail, I'm trying to get monit to watch a group of processes so it can start/stop ospfd for an anycast high availability application. However, in doing this I'm seeing some odd behaviour that doesn't match what I expect -- is this a bug?
>>>>
>>>> In the scenario below, why is it ever trying to start ospfd? If apache is down, shouldn't ospfd stay down until apache comes back up or is monitored again after being unmonitored? It does end up in the correct state at the end, but not without restarting and stopping ospfd twice in the meantime.
>>>>
>>>> As an example, I have the following configured:
>>>> 
>>>> check process apache with pidfile /var/run/httpd.pid
>>>>   start program = "/etc/init.d/httpd start"
>>>>   stop program = "/etc/init.d/httpd stop"
>>>>   if failed host localhost port 80 protocol http
>>>>     and request "/" then restart
>>>>   if 2 restarts within 2 cycles then stop
>>>>
>>>> check process ospfd with pidfile /var/run/quagga/ospfd.pid
>>>>   start program = "/etc/init.d/ospfd start"
>>>>   stop program = "/etc/init.d/ospfd stop"
>>>>   depends on apache
>>>> 
>>>> If I make it so that apache cannot run (by removing execute permissions on /usr/sbin/httpd) and then kill it, I see the following in the monit logs:
>>>> 
>>>> Dec 6 08:47:39 tecate monit[9988]: 'apache' process is not running
>>>> Dec 6 08:47:39 tecate monit[9988]: 'apache' trying to restart
>>>> Dec 6 08:47:39 tecate monit[9988]: 'ospfd' stop: /etc/init.d/ospfd
>>>> Dec 6 08:47:39 tecate monit[9988]: 'apache' start: /etc/init.d/httpd
>>>> Dec 6 08:47:40 tecate monit[9988]: 'ospfd' unmonitor on user request
>>>> Dec 6 08:47:40 tecate monit[9988]: monit daemon at 9988 awakened
>>>> Dec 6 08:48:09 tecate monit[9988]: 'apache' failed to start
>>>> Dec 6 08:48:09 tecate monit[9988]: 'ospfd' start: /etc/init.d/ospfd
>>>> Dec 6 08:48:09 tecate monit[9988]: 'ospfd' unmonitor action done
>>>> Dec 6 08:48:09 tecate monit[9988]: Awakened by User defined signal 1
>>>> Dec 6 08:48:09 tecate monit[9988]: 'apache' process is not running
>>>> Dec 6 08:48:09 tecate monit[9988]: 'apache' trying to restart
>>>> Dec 6 08:48:09 tecate monit[9988]: 'ospfd' stop: /etc/init.d/ospfd
>>>> Dec 6 08:48:09 tecate monit[9988]: 'apache' start: /etc/init.d/httpd
>>>> Dec 6 08:48:09 tecate monit[9988]: 'ospfd' unmonitor on user request
>>>> Dec 6 08:48:09 tecate monit[9988]: monit daemon at 9988 awakened
>>>> Dec 6 08:48:39 tecate monit[9988]: 'apache' failed to start
>>>> Dec 6 08:48:39 tecate monit[9988]: 'ospfd' start: /etc/init.d/ospfd
>>>> Dec 6 08:48:39 tecate monit[9988]: 'ospfd' unmonitor action done
>>>> Dec 6 08:48:39 tecate monit[9988]: Awakened by User defined signal 1
>>>> Dec 6 08:48:39 tecate monit[9988]: 'apache' service restarted 2 times within 2 cycles(s) - stop
>>>> Dec 6 08:48:39 tecate monit[9988]: 'ospfd' stop: /etc/init.d/ospfd
>>>> Dec 6 08:48:39 tecate monit[9988]: 'ospfd' unmonitor on user request
>>>> Dec 6 08:48:39 tecate monit[9988]: monit daemon at 9988 awakened
>>>> Dec 6 08:48:39 tecate monit[9988]: Awakened by User defined signal 1
>>>> Dec 6 08:48:39 tecate monit[9988]: 'ospfd' unmonitor action done
>>>> 
>>>> --
>>>> Dan Rich
>>>> http://www.employees.org/~drich/
>>>> "Step up to red alert!" "Are you sure, sir?
>>>> It means changing the bulb in the sign..."
>>>> - Red Dwarf (BBC)
>>>>
>>>> --
>>>> To unsubscribe:
>>>> https://lists.nongnu.org/mailman/listinfo/monit-general