On 08/24/2015 04:52 AM, Andrei Borzenkov wrote:
> 24.08.2015 12:35, Tom Yates wrote:
>> I've got a failover firewall pair where the external interface is ADSL;
>> that is, PPPoE. I've defined the service thus:
>>
>> primitive ExternalIP lsb:hb-adsl-helper \
>>         op monitor interval="60s"
>>
>> and in addition written a noddy script /etc/init.d/hb-adsl-helper, thus:
>>
>> #!/bin/bash
>> RETVAL=0
>> start() {
>>     /sbin/pppoe-start
>> }
>> stop() {
>>     /sbin/pppoe-stop
>> }
>> case "$1" in
>>   start)
>>     start
>>     ;;
>>   stop)
>>     stop
>>     ;;
>>   status)
>>     /sbin/ifconfig ppp0 >& /dev/null && exit 0
>>     exit 1
>>     ;;
>>   *)
>>     echo $"Usage: $0 {start|stop|status}"
>>     exit 3
>> esac
>> exit $?
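
As a point of comparison with the script quoted above, here is a minimal sketch of how the same wrapper might look with LSB-style return codes: status returns 3 (rather than 1) when the link is down, start succeeds if the link is already up, and stop succeeds if the link is already down. The PPP_IF, PPPOE_START and PPPOE_STOP variables and the link_is_up helper are illustrative names, not part of the original script, and the five-second wait in stop is an arbitrary choice.

```shell
#!/bin/bash
# Sketch only. Assumes the ppp0 interface and the /sbin/pppoe-start and
# /sbin/pppoe-stop commands from the quoted script; the variable names
# below are hypothetical overrides added for illustration.
PPP_IF="${PPP_IF:-ppp0}"
PPPOE_START="${PPPOE_START:-/sbin/pppoe-start}"
PPPOE_STOP="${PPPOE_STOP:-/sbin/pppoe-stop}"

# Same liveness test as the original: does the interface exist?
link_is_up() {
    /sbin/ifconfig "$PPP_IF" >/dev/null 2>&1
}

# LSB status: 0 = running, 3 = not running.
status() {
    link_is_up && return 0
    return 3
}

# Starting an already-running service counts as success.
start() {
    status && return 0
    "$PPPOE_START" || return 1
    return 0
}

# Stopping an already-stopped service counts as success. The exit status
# of the stop command is ignored; the link state decides the result.
stop() {
    "$PPPOE_STOP" >/dev/null 2>&1
    local i
    for i in 1 2 3 4 5; do
        status || return 0      # link is gone: report success
        sleep 1
    done
    return 1                    # link still up after the wait
}

main() {
    case "$1" in
        start)  start ;;
        stop)   stop ;;
        status) status ;;
        *)
            echo "Usage: $0 {start|stop|status}" >&2
            return 3            # same code the original script used here
            ;;
    esac
}

# Dispatch only when an action was given on the command line.
if [ "$#" -gt 0 ]; then
    main "$@"
    exit $?
fi
```

The point of basing stop on the link state is that a connection which has already dropped no longer turns a clean-up attempt into a failed stop.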

Pacemaker expects that LSB agents follow the LSB spec for return codes,
and won't be able to behave properly if they don't:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#ap-lsb

However, it's just as easy to write an OCF agent, which gives you more
flexibility (accepting parameters, etc.):

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#ap-ocf

>> The problem is that sometimes the ADSL connection falls over, as they
>> do, eg:
>>
>> Aug 20 11:42:10 positron pppd[2469]: LCP terminated by peer
>> Aug 20 11:42:10 positron pppd[2469]: Connect time 8619.4 minutes.
>> Aug 20 11:42:10 positron pppd[2469]: Sent 1342528799 bytes, received 164420300 bytes.
>> Aug 20 11:42:13 positron pppd[2469]: Connection terminated.
>> Aug 20 11:42:13 positron pppd[2469]: Modem hangup
>> Aug 20 11:42:13 positron pppoe[2470]: read (asyncReadFromPPP): Session 1735: Input/output error
>> Aug 20 11:42:13 positron pppoe[2470]: Sent PADT
>> Aug 20 11:42:13 positron pppd[2469]: Exit.
>> Aug 20 11:42:13 positron pppoe-connect: PPPoE connection lost; attempting re-connection.
>>
>> CRMd then logs a bunch of stuff, followed by
>>
>> Aug 20 11:42:18 positron lrmd: [1760]: info: rsc:ExternalIP:8: stop
>> Aug 20 11:42:18 positron lrmd: [28357]: WARN: For LSB init script, no additional parameters are needed.
>> [...]
>> Aug 20 11:42:18 positron pppoe-stop: Killing pppd
>> Aug 20 11:42:18 positron pppoe-stop: Killing pppoe-connect
>> Aug 20 11:42:18 positron lrmd: [1760]: WARN: Managed ExternalIP:stop process 28357 exited with return code 1.
>>
>> At this point, the PPPoE connection is down, and stays down. CRMd
>> doesn't fail the group which contains both internal and external
>> interfaces over to the other node, but nor does it try to restart the
>> service. I'm fairly sure this is because I've done something
>> boneheaded, but I can't get my bone head around what it might be.
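
For the OCF route mentioned above, a rough skeleton might look like the following. It is a sketch under assumptions, not a shipped agent: the agent name pppoe-link, the interface parameter, and the timeouts in the metadata are all made up for illustration, and the exit codes are defined inline (real agents normally pick them up by sourcing ocf-shellfuncs).

```shell
#!/bin/sh
# Hypothetical OCF agent skeleton for a PPPoE link (a sketch, not a shipped agent).

# Standard OCF exit codes, defined inline so the sketch is self-contained.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_ERR_UNIMPLEMENTED=3
OCF_NOT_RUNNING=7

# Pacemaker hands instance parameters to the agent as OCF_RESKEY_<name>.
# "interface" is an assumed parameter name.
PPP_IF="${OCF_RESKEY_interface:-ppp0}"

link_is_up() {
    /sbin/ifconfig "$PPP_IF" >/dev/null 2>&1
}

# monitor: a cleanly stopped resource reports OCF_NOT_RUNNING (7).
pppoe_monitor() {
    link_is_up && return $OCF_SUCCESS
    return $OCF_NOT_RUNNING
}

pppoe_start() {
    pppoe_monitor && return $OCF_SUCCESS
    /sbin/pppoe-start || return $OCF_ERR_GENERIC
    pppoe_monitor || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

pppoe_stop() {
    # Always call pppoe-stop (it also kills pppoe-connect), but judge
    # success by the link state rather than by its exit status.
    /sbin/pppoe-stop >/dev/null 2>&1
    i=0
    while [ "$i" -lt 5 ]; do
        pppoe_monitor || return $OCF_SUCCESS
        sleep 1
        i=$((i + 1))
    done
    return $OCF_ERR_GENERIC
}

meta_data() {
    cat <<EOF
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="pppoe-link" version="0.1">
  <version>1.0</version>
  <longdesc lang="en">Sketch of an agent that manages a PPPoE link.</longdesc>
  <shortdesc lang="en">PPPoE link (sketch)</shortdesc>
  <parameters>
    <parameter name="interface" unique="0" required="0">
      <longdesc lang="en">PPP interface to check.</longdesc>
      <shortdesc lang="en">PPP interface</shortdesc>
      <content type="string" default="ppp0"/>
    </parameter>
  </parameters>
  <actions>
    <action name="start" timeout="60s"/>
    <action name="stop" timeout="60s"/>
    <action name="monitor" timeout="20s" interval="60s"/>
    <action name="meta-data" timeout="5s"/>
  </actions>
</resource-agent>
EOF
}

# Dispatch only when an action was given on the command line.
if [ "$#" -gt 0 ]; then
    case "$1" in
        start)     pppoe_start ;;
        stop)      pppoe_stop ;;
        monitor)   pppoe_monitor ;;
        meta-data) meta_data ;;
        *)         exit $OCF_ERR_UNIMPLEMENTED ;;
    esac
    exit $?
fi
```

An agent like this would typically be installed under a provider directory such as /usr/lib/ocf/resource.d/<provider>/ and referenced in the primitive as ocf:<provider>:pppoe-link instead of lsb:hb-adsl-helper.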
>>
>> Any light anyone can shed is much appreciated.
>
> If the stop operation fails, the resource state is undefined, and Pacemaker
> won't do anything further with this resource. Either make sure the script
> returns success when appropriate, or the only option is to have it fence
> the node where the resource was active.
>
> _______________________________________________
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org