Re: [ClusterLabs] CRM managing ADSL connection; failure not handled
On 08/27/2015 03:04 AM, Tom Yates wrote:
> On Mon, 24 Aug 2015, Andrei Borzenkov wrote:
>> 24.08.2015 13:32, Tom Yates wrote:
>>> If I understand you aright, my problem is that the stop script didn't
>>> return a 0 (OK) exit status, so CRM didn't know where to go. Is the
>>> exit status of the stop script how CRM determines the status of the
>>> stop operation?
>>
>> Correct.
>>
>>> Does CRM also use the output of /etc/init.d/script status to
>>> determine continuing successful operation?
>>
>> It definitely does not use the *output* of the script - only the
>> return code. If the question is whether it probes the resource in
>> addition to checking the stop exit code - I do not think so (I know it
>> does in some cases for systemd resources).
>
> I just thought I'd come back and follow up. In testing this morning, I
> can confirm that the pppoe-stop command returns status 1 if pppd isn't
> running. That makes a standard init.d script, which passes on the
> return code of the stop command, unhelpful to CRM. I changed the
> script so that on stop, having run pppoe-stop, it checks for the
> existence of a working ppp0 interface, and returns 0 iff there is
> none.

Nice.

>> If the resource was previously active and stop was attempted as
>> cleanup after a resource failure - yes, it should attempt to start it
>> again.
>
> That is now what happens. It seems to try three times to bring up
> pppd, then kicks the service over to the other node. In the case of
> extended outages (i.e., the ISP goes away for more than about 10
> minutes), where both nodes have time to fail, we end up back in the
> bad old state (service failed on both nodes):
>
> [root@positron ~]# crm status
> [...]
> Online: [ electron positron ]
>
>  Resource Group: BothIPs
>      InternalIP (ocf::heartbeat:IPaddr):  Started electron
>      ExternalIP (lsb:hb-adsl-helper):     Stopped
>
> Failed actions:
>     ExternalIP_monitor_6 (node=positron, call=15, rc=7,
>         status=complete): not running
>     ExternalIP_start_0 (node=positron, call=17, rc=-2,
>         status=Timed Out): unknown exec error
>     ExternalIP_start_0 (node=electron, call=6, rc=-2,
>         status=Timed Out): unknown exec error
>
> Is there any way to configure CRM to keep kicking the service between
> the two nodes forever (i.e., try three times on positron, kick the
> service group to electron, try three times on electron, kick back to
> positron, lather, rinse, repeat)? For a service like DSL, which can go
> away for extended periods through no local fault and then suddenly,
> with no announcement, come back, this would be most useful behaviour.

Yes, see migration-threshold and failure-timeout:
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-resource-options

> Thanks to all for help with this. Thanks also to those who have
> suggested I rewrite this as an OCF agent (especially to Ken Gaillot,
> who was kind enough to point me to documentation); I will look at
> that if time permits.
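For reference, here is a minimal sketch of the stop-action change Tom
describes, assuming an LSB-style helper script and that pppoe-stop lives
in /usr/sbin; the function body and paths are illustrative, not Tom's
actual script:

    #!/bin/sh
    # Sketch of the LSB "stop" action described above. pppoe-stop exits 1
    # when pppd is not running, but LSB requires "stop" on an
    # already-stopped service to exit 0, so success is judged by whether
    # a ppp0 interface remains.

    stop() {
        /usr/sbin/pppoe-stop            # may exit 1 if pppd wasn't running

        # Report success iff no ppp0 interface remains.
        if ip link show ppp0 >/dev/null 2>&1; then
            echo "ppp0 still present; stop failed" >&2
            return 1
        fi
        return 0
    }

And a sketch of the migration-threshold/failure-timeout tuning the reply
points to, in the crm shell syntax used elsewhere in these threads; the
10-minute failure-timeout is an illustrative value, not a recommendation:

    # After 3 failures, move the group away; after 10 minutes, expire the
    # recorded failures so the other node becomes eligible again.
    rsc_defaults rsc-options: \
            migration-threshold=3 \
            failure-timeout=10min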
Re: [ClusterLabs] resource-stickiness
On 08/27/2015 02:42 AM, Rakovec Jost wrote:
> Hi,
>
> It doesn't work as I expected. I changed the name to:
>
>     location loc-aapche-sles1 aapche role=Started 10: sles1
>
> but after I manually move the resource via HAWK to the other node, it
> automatically adds this line:
>
>     location cli-prefer-aapche aapche role=Started inf: sles1
>
> so now I have both lines:
>
>     location cli-prefer-aapche aapche role=Started inf: sles1
>     location loc-aapche-sles1 aapche role=Started 10: sles1

When you manually move a resource using a command-line tool, the tool
accomplishes the move by adding a constraint, like the one you see added
above. Such tools generally provide another option to clear any
constraints they added, which you can run manually once you are
satisfied with the state of things. Until you do so, the added
constraint will remain, and will affect resource placement.

> and resource-stickiness doesn't work, since after fencing node1 the
> resource moves back to node1 once node1 comes back, and this is what I
> don't like. I know that I can remove the line that was added by the
> cluster, but this is not the proper solution. Please tell me what is
> wrong. Thanks.
>
> My config:

Resource placement depends on many factors. Scores affect the outcome:
stickiness has a score, each constraint has a score, and the active node
with the highest score wins.

In your config, resource-stickiness has a score of 1000, but
cli-prefer-aapche has a score of inf (infinity), so sles1 wins when it
comes back online (infinity > 1000). By contrast, loc-aapche-sles1 has a
score of 10, so by itself it would not cause the resource to move back
(10 < 1000).

To achieve what you want, clear the temporary constraint added by HAWK
before sles1 comes back.

> node sles1
> node sles2
> primitive filesystem Filesystem \
>     params fstype=ext3 directory=/srv/www/vhosts device=/dev/xvdd1 \
>     op start interval=0 timeout=60 \
>     op stop interval=0 timeout=60 \
>     op monitor interval=20 timeout=40
> primitive myip IPaddr2 \
>     params ip=10.9.131.86 \
>     op start interval=0 timeout=20s \
>     op stop interval=0 timeout=20s \
>     op monitor interval=10s timeout=20s
> primitive stonith_sbd stonith:external/sbd \
>     params pcmk_delay_max=30
> primitive web apache \
>     params configfile=/etc/apache2/httpd.conf \
>     op start interval=0 timeout=40s \
>     op stop interval=0 timeout=60s \
>     op monitor interval=10 timeout=20s
> group aapche filesystem myip web \
>     meta target-role=Started is-managed=true resource-stickiness=1000
> location cli-prefer-aapche aapche role=Started inf: sles1
> location loc-aapche-sles1 aapche role=Started 10: sles1
> property cib-bootstrap-options: \
>     stonith-enabled=true \
>     no-quorum-policy=ignore \
>     placement-strategy=balanced \
>     expected-quorum-votes=2 \
>     dc-version=1.1.12-f47ea56 \
>     cluster-infrastructure="classic openais (with plugin)" \
>     last-lrm-refresh=1440502955 \
>     stonith-timeout=40s
> rsc_defaults rsc-options: \
>     resource-stickiness=1000 \
>     migration-threshold=3
> op_defaults op-options: \
>     timeout=600 \
>     record-pending=true
>
> BR
> Jost
>
> From: Andrew Beekhof <and...@beekhof.net>
> Sent: Thursday, August 27, 2015 12:20 AM
> To: Cluster Labs - All topics related to open-source clustering welcomed
> Subject: Re: [ClusterLabs] resource-stickiness
>
>> On 26 Aug 2015, at 10:09 pm, Rakovec Jost <jost.rako...@snt.si> wrote:
>>
>> Sorry, one typo: the problem is the same.
>>
>>     location cli-prefer-aapche aapche role=Started 10: sles2
>
> Change the name of your constraint. The 'cli-prefer-' prefix is
> reserved for "temporary" constraints created by the command-line tools
> (which therefore feel entitled to delete them as necessary).
>> to:
>>
>>     location cli-prefer-aapche aapche role=Started inf: sles2
>>
>> It keeps changing to infinity.
>>
>> My configuration is:
>>
>> node sles1
>> node sles2
>> primitive filesystem Filesystem \
>>     params fstype=ext3 directory=/srv/www/vhosts device=/dev/xvdd1 \
>>     op start interval=0 timeout=60 \
>>     op stop interval=0 timeout=60 \
>>     op monitor interval=20 timeout=40
>> primitive myip IPaddr2 \
>>     params ip=x.x.x.x \
>>     op start interval=0 timeout=20s \
>>     op stop interval=0 timeout=20s \
>>     op monitor interval=10s timeout=20s
>> primitive stonith_sbd stonith:external/sbd \
>>     params pcmk_delay_max=30
>> primitive web apache \
>>     params configfile=/etc/apache2/httpd.conf \
>>     op start interval=0 timeout=40s \
>>     op stop interval=0 timeout=60s \
>>     op monitor interval=10 timeout=20s
>> group aapche filesystem myip web \
>>     meta target-role=Started is-managed=true resource-stickiness=1000
>> location cli-prefer-aapche aapche role=Started 10: sles2
>> property cib-bootstrap-options: \
>>     stonith-enabled=true \
>>     no-quorum-policy=ignore \
>>     placement-strategy=balanced
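As a concrete form of "clear the temporary constraint added by HAWK", a
sketch in crmsh syntax; subcommand names vary slightly between crmsh
releases, and the resource and constraint names are taken from the
config above:

    # Delete the tool-added constraint by name...
    crm configure delete cli-prefer-aapche

    # ...or have crmsh undo its own move (aliased "unmove" in some
    # versions):
    crm resource unmigrate aapche

And to watch the score arithmetic described above (stickiness 1000 vs.
inf vs. 10), Pacemaker's crm_simulate can print the allocation scores
from the live cluster; output format varies by version:

    # -L: use the live CIB; -s: show the allocation score for each
    # resource/node pair, making stickiness-vs-constraint contests
    # visible.
    crm_simulate -sL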
Re: [ClusterLabs] resource-stickiness
Hi,

It doesn't work as I expected. I changed the name to:

    location loc-aapche-sles1 aapche role=Started 10: sles1

but after I manually move the resource via HAWK to the other node, it
automatically adds this line:

    location cli-prefer-aapche aapche role=Started inf: sles1

so now I have both lines:

    location cli-prefer-aapche aapche role=Started inf: sles1
    location loc-aapche-sles1 aapche role=Started 10: sles1

and resource-stickiness doesn't work, since after fencing node1 the
resource moves back to node1 once node1 comes back, and this is what I
don't like. I know that I can remove the line that was added by the
cluster, but this is not the proper solution. Please tell me what is
wrong. Thanks.

My config:

node sles1
node sles2
primitive filesystem Filesystem \
    params fstype=ext3 directory=/srv/www/vhosts device=/dev/xvdd1 \
    op start interval=0 timeout=60 \
    op stop interval=0 timeout=60 \
    op monitor interval=20 timeout=40
primitive myip IPaddr2 \
    params ip=10.9.131.86 \
    op start interval=0 timeout=20s \
    op stop interval=0 timeout=20s \
    op monitor interval=10s timeout=20s
primitive stonith_sbd stonith:external/sbd \
    params pcmk_delay_max=30
primitive web apache \
    params configfile=/etc/apache2/httpd.conf \
    op start interval=0 timeout=40s \
    op stop interval=0 timeout=60s \
    op monitor interval=10 timeout=20s
group aapche filesystem myip web \
    meta target-role=Started is-managed=true resource-stickiness=1000
location cli-prefer-aapche aapche role=Started inf: sles1
location loc-aapche-sles1 aapche role=Started 10: sles1
property cib-bootstrap-options: \
    stonith-enabled=true \
    no-quorum-policy=ignore \
    placement-strategy=balanced \
    expected-quorum-votes=2 \
    dc-version=1.1.12-f47ea56 \
    cluster-infrastructure="classic openais (with plugin)" \
    last-lrm-refresh=1440502955 \
    stonith-timeout=40s
rsc_defaults rsc-options: \
    resource-stickiness=1000 \
    migration-threshold=3
op_defaults op-options: \
    timeout=600 \
    record-pending=true

BR
Jost

From: Andrew Beekhof <and...@beekhof.net>
Sent: Thursday, August 27, 2015 12:20 AM
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] resource-stickiness

> On 26 Aug 2015, at 10:09 pm, Rakovec Jost <jost.rako...@snt.si> wrote:
>
> Sorry, one typo: the problem is the same.
>
>     location cli-prefer-aapche aapche role=Started 10: sles2

Change the name of your constraint. The 'cli-prefer-' prefix is reserved
for "temporary" constraints created by the command-line tools (which
therefore feel entitled to delete them as necessary).
> to:
>
>     location cli-prefer-aapche aapche role=Started inf: sles2
>
> It keeps changing to infinity.
>
> My configuration is:
>
> node sles1
> node sles2
> primitive filesystem Filesystem \
>     params fstype=ext3 directory=/srv/www/vhosts device=/dev/xvdd1 \
>     op start interval=0 timeout=60 \
>     op stop interval=0 timeout=60 \
>     op monitor interval=20 timeout=40
> primitive myip IPaddr2 \
>     params ip=x.x.x.x \
>     op start interval=0 timeout=20s \
>     op stop interval=0 timeout=20s \
>     op monitor interval=10s timeout=20s
> primitive stonith_sbd stonith:external/sbd \
>     params pcmk_delay_max=30
> primitive web apache \
>     params configfile=/etc/apache2/httpd.conf \
>     op start interval=0 timeout=40s \
>     op stop interval=0 timeout=60s \
>     op monitor interval=10 timeout=20s
> group aapche filesystem myip web \
>     meta target-role=Started is-managed=true resource-stickiness=1000
> location cli-prefer-aapche aapche role=Started 10: sles2
> property cib-bootstrap-options: \
>     stonith-enabled=true \
>     no-quorum-policy=ignore \
>     placement-strategy=balanced \
>     expected-quorum-votes=2 \
>     dc-version=1.1.12-f47ea56 \
>     cluster-infrastructure="classic openais (with plugin)" \
>     last-lrm-refresh=1440502955 \
>     stonith-timeout=40s
> rsc_defaults rsc-options: \
>     resource-stickiness=1000 \
>     migration-threshold=3
> op_defaults op-options: \
>     timeout=600 \
>     record-pending=true
>
> and after migration:
>
> node sles1
> node sles2
> primitive filesystem Filesystem \
>     params fstype=ext3 directory=/srv/www/vhosts device=/dev/xvdd1 \
>     op start interval=0 timeout=60 \
>     op stop interval=0 timeout=60 \
>     op monitor interval=20 timeout=40
> primitive myip IPaddr2 \
>     params ip=10.9.131.86 \
>     op start interval=0 timeout=20s \
>     op stop interval=0 timeout=20s \
>     op monitor interval=10s timeout=20s
> primitive stonith_sbd stonith:external/sbd \
>     params pcmk_delay_max=30
> primitive web apache \
>     params configfile=/etc/apache2/httpd.conf \
>     op start interval=0 timeout=40s \
>     op stop interval=0 timeout=60s \
>     op
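A possible way to avoid the lingering cli-prefer constraint in the first
place is to give the move a lifetime, after which Pacemaker expires the
constraint automatically; a sketch in crmsh syntax, where PT10M (ISO
8601 for ten minutes) is an illustrative value and lifetime support may
vary by crmsh version:

    # Move the group to sles1; the resulting cli-prefer-aapche
    # constraint expires on its own after 10 minutes instead of pinning
    # the resource to sles1 forever.
    crm resource migrate aapche sles1 PT10M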