Re: [ClusterLabs] CRM managing ADSL connection; failure not handled

2015-08-27 Thread Ken Gaillot
On 08/27/2015 03:04 AM, Tom Yates wrote:
 On Mon, 24 Aug 2015, Andrei Borzenkov wrote:
 
 On 24.08.2015 13:32, Tom Yates wrote:
  if i understand you aright, my problem is that the stop script didn't
  return a 0 (OK) exit status, so CRM didn't know where to go.  is the
  exit status of the stop script how CRM determines the status of the
  stop operation?

 correct

  does CRM also use the output of /etc/init.d/script status to
  determine continuing successful operation?

 It definitely does not use the *output* of the script - only the return
 code. If the question is whether it probes the resource in addition to
 checking the stop exit code - I do not think so (I know it does so in
 some cases for systemd resources).
 
 i just thought i'd come back and follow up.  in testing this morning, i
 can confirm that the pppoe-stop command returns status 1 if pppd isn't
 running.  that makes a standard init.d script, which passes on the
 return code of the stop command, unhelpful to CRM.
 
 i changed the script so that on stop, having run pppoe-stop, it checks
 for the existence of a working ppp0 interface, and returns 0 iff there
 is none.

Nice
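
For what it's worth, Pacemaker treats an LSB script as compliant only if it
follows the standard LSB exit codes: stop must return 0 even when the daemon
is already stopped, and status must return 0 when it is running and 3 when it
is not. A rough sketch of what the relevant functions could look like (the
pppoe-stop path and the ppp0 check are assumptions, not taken from your
script):

    stop() {
        /usr/sbin/pppoe-stop
        # report success as long as no ppp0 interface is left,
        # even if pppoe-stop complained because pppd was not running
        if ip link show ppp0 >/dev/null 2>&1; then
            return 1
        fi
        return 0
    }

    status() {
        # LSB convention: 0 = running, 3 = stopped
        if ip link show ppp0 >/dev/null 2>&1; then
            return 0
        fi
        return 3
    }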

 If the resource was previously active and the stop was attempted as
 cleanup after a resource failure - yes, it should attempt to start it
 again.
 
 that is now what happens.  it seems to try three times to bring up pppd,
 then kicks the service over to the other node.
 
 in the case of extended outages (i.e., the ISP goes away for more than
 about 10 minutes), where both nodes have time to fail, we end up back in
 the bad old state (service failed on both nodes):
 
 [root@positron ~]# crm status
 [...]
 Online: [ electron positron ]
 
  Resource Group: BothIPs
 InternalIP (ocf::heartbeat:IPaddr):    Started electron
  ExternalIP (lsb:hb-adsl-helper):   Stopped
 
 Failed actions:
 ExternalIP_monitor_6 (node=positron, call=15, rc=7, status=complete): not running
 ExternalIP_start_0 (node=positron, call=17, rc=-2, status=Timed Out): unknown exec error
 ExternalIP_start_0 (node=electron, call=6, rc=-2, status=Timed Out): unknown exec error
 
 is there any way to configure CRM to keep kicking the service between
 the two nodes forever (i.e., try three times on positron, kick the
 service group to electron, try three times on electron, kick back to
 positron, lather, rinse, repeat...)?
 
 for a service like DSL, which can go away for extended periods through
 no local fault and then suddenly come back with no announcement, this
 would be the most useful behaviour.

Yes, see migration-threshold and failure-timeout.

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-resource-options
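
For example, something along these lines in crm shell syntax (a sketch; the
values are placeholders, and you can set them per-resource instead of as
defaults):

    rsc_defaults rsc-options: \
            migration-threshold=3 \
            failure-timeout=10min

With migration-threshold=3, three failures ban the resource from that node;
once failure-timeout expires, the failures are forgotten and the banned node
becomes eligible again, so the group can keep moving back and forth until the
line comes back.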

 thanks to all for help with this.  thanks also to those who have
 suggested i rewrite this as an OCF agent (especially to ken gaillot who
 was kind enough to point me to documentation); i will look at that if
 time permits.



Re: [ClusterLabs] resource-stickiness

2015-08-27 Thread Ken Gaillot
On 08/27/2015 02:42 AM, Rakovec Jost wrote:
 Hi
 
 
 it doesn't work as I expected. I changed the name to:
 
 location loc-aapche-sles1 aapche role=Started 10: sles1
 
 
 but after I manually move the resource via HAWK to the other node, it automatically adds this line:
 
 location cli-prefer-aapche aapche role=Started inf: sles1
 
 
 so now I have both lines:
 
 location cli-prefer-aapche aapche role=Started inf: sles1
 location loc-aapche-sles1 aapche role=Started 10: sles1

When you manually move a resource using a command-line tool (or Hawk), the
tool accomplishes the move by adding a constraint, like the one you see added
above.

Such tools generally provide another option to clear any constraints
they added, which you can manually run after you are satisfied with the
state of things. Until you do so, the added constraint will remain, and
will affect resource placement.
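
With crmsh, for example, clearing can look like this (a sketch; the exact
subcommand name varies between crmsh versions, and you can always delete the
constraint by its id instead):

    # remove the cli-prefer-* constraint that the move created
    crm resource unmigrate aapche

    # or delete it explicitly by id
    crm configure delete cli-prefer-aapche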

 
 and resource-stickiness doesn't work, since after node1 is fenced the resource
 moves back to node1 once node1 comes back, and this is what I don't like. I
 know that I can remove the line that was added by the cluster, but this is not
 the proper solution. Please tell me what is wrong. Thanks. My config:

Resource placement depends on many factors. Scores decide the outcome:
stickiness has a score, each constraint has a score, and the node with the
highest total score wins.

In your config, resource-stickiness has a score of 1000, but
cli-prefer-aapche has a score of inf (infinity), so sles1 wins when it
comes back online (infinity > 1000). By contrast, loc-aapche-sles1 has a
score of 10, so by itself, it would not cause the resource to move back
(10 < 1000).
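
If you want to see the scores the policy engine actually computes,
crm_simulate can show them against the live cluster:

    crm_simulate -sL

(-s shows allocation scores, -L uses the live CIB.)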

To achieve what you want, clear the temporary constraint added by Hawk
(for example with one of the commands above) before sles1 comes back.

 node sles1
 node sles2
 primitive filesystem Filesystem \
 params fstype=ext3 directory=/srv/www/vhosts device=/dev/xvdd1 \
 op start interval=0 timeout=60 \
 op stop interval=0 timeout=60 \
 op monitor interval=20 timeout=40
 primitive myip IPaddr2 \
 params ip=10.9.131.86 \
 op start interval=0 timeout=20s \
 op stop interval=0 timeout=20s \
 op monitor interval=10s timeout=20s
 primitive stonith_sbd stonith:external/sbd \
 params pcmk_delay_max=30
 primitive web apache \
 params configfile=/etc/apache2/httpd.conf \
 op start interval=0 timeout=40s \
 op stop interval=0 timeout=60s \
 op monitor interval=10 timeout=20s
 group aapche filesystem myip web \
 meta target-role=Started is-managed=true resource-stickiness=1000
 location cli-prefer-aapche aapche role=Started inf: sles1
 location loc-aapche-sles1 aapche role=Started 10: sles1
 property cib-bootstrap-options: \
 stonith-enabled=true \
 no-quorum-policy=ignore \
 placement-strategy=balanced \
 expected-quorum-votes=2 \
 dc-version=1.1.12-f47ea56 \
 cluster-infrastructure=classic openais (with plugin) \
 last-lrm-refresh=1440502955 \
 stonith-timeout=40s
 rsc_defaults rsc-options: \
 resource-stickiness=1000 \
 migration-threshold=3
 op_defaults op-options: \
 timeout=600 \
 record-pending=true
 
 
 BR
 
 Jost
 
 
 
 
 From: Andrew Beekhof and...@beekhof.net
 Sent: Thursday, August 27, 2015 12:20 AM
 To: Cluster Labs - All topics related to open-source clustering welcomed
 Subject: Re: [ClusterLabs] resource-stickiness
 
 On 26 Aug 2015, at 10:09 pm, Rakovec Jost jost.rako...@snt.si wrote:

 Sorry, one typo: problem is the same


 location cli-prefer-aapche aapche role=Started 10: sles2
 
 Change the name of your constraint.
 The 'cli-prefer-' prefix is reserved for "temporary" constraints created by
 the command line tools (which therefore feel entitled to delete them as
 necessary).
 

 to:

 location cli-prefer-aapche aapche role=Started inf: sles2


 It keeps changing to infinity.



 my configuration is:

 node sles1
 node sles2
 primitive filesystem Filesystem \
params fstype=ext3 directory=/srv/www/vhosts device=/dev/xvdd1 \
op start interval=0 timeout=60 \
op stop interval=0 timeout=60 \
op monitor interval=20 timeout=40
 primitive myip IPaddr2 \
params ip=x.x.x.x \
op start interval=0 timeout=20s \
op stop interval=0 timeout=20s \
op monitor interval=10s timeout=20s
 primitive stonith_sbd stonith:external/sbd \
params pcmk_delay_max=30
 primitive web apache \
params configfile=/etc/apache2/httpd.conf \
op start interval=0 timeout=40s \
op stop interval=0 timeout=60s \
op monitor interval=10 timeout=20s
 group aapche filesystem myip web \
meta target-role=Started is-managed=true resource-stickiness=1000
 location cli-prefer-aapche aapche role=Started 10: sles2
 property cib-bootstrap-options: \
stonith-enabled=true \
no-quorum-policy=ignore \
placement-strategy=balanced 

Re: [ClusterLabs] resource-stickiness

2015-08-27 Thread Rakovec Jost
Hi


it doesn't work as I expected. I changed the name to:

location loc-aapche-sles1 aapche role=Started 10: sles1


but after I manually move the resource via HAWK to the other node, it automatically adds this line:

location cli-prefer-aapche aapche role=Started inf: sles1


so now I have both lines:

location cli-prefer-aapche aapche role=Started inf: sles1
location loc-aapche-sles1 aapche role=Started 10: sles1


and resource-stickiness doesn't work, since after node1 is fenced the resource
moves back to node1 once node1 comes back, and this is what I don't like. I know
that I can remove the line that was added by the cluster, but this is not the
proper solution. Please tell me what is wrong. Thanks. My config:

node sles1
node sles2
primitive filesystem Filesystem \
params fstype=ext3 directory=/srv/www/vhosts device=/dev/xvdd1 \
op start interval=0 timeout=60 \
op stop interval=0 timeout=60 \
op monitor interval=20 timeout=40
primitive myip IPaddr2 \
params ip=10.9.131.86 \
op start interval=0 timeout=20s \
op stop interval=0 timeout=20s \
op monitor interval=10s timeout=20s
primitive stonith_sbd stonith:external/sbd \
params pcmk_delay_max=30
primitive web apache \
params configfile=/etc/apache2/httpd.conf \
op start interval=0 timeout=40s \
op stop interval=0 timeout=60s \
op monitor interval=10 timeout=20s
group aapche filesystem myip web \
meta target-role=Started is-managed=true resource-stickiness=1000
location cli-prefer-aapche aapche role=Started inf: sles1
location loc-aapche-sles1 aapche role=Started 10: sles1
property cib-bootstrap-options: \
stonith-enabled=true \
no-quorum-policy=ignore \
placement-strategy=balanced \
expected-quorum-votes=2 \
dc-version=1.1.12-f47ea56 \
cluster-infrastructure=classic openais (with plugin) \
last-lrm-refresh=1440502955 \
stonith-timeout=40s
rsc_defaults rsc-options: \
resource-stickiness=1000 \
migration-threshold=3
op_defaults op-options: \
timeout=600 \
record-pending=true


BR

Jost




From: Andrew Beekhof and...@beekhof.net
Sent: Thursday, August 27, 2015 12:20 AM
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] resource-stickiness

 On 26 Aug 2015, at 10:09 pm, Rakovec Jost jost.rako...@snt.si wrote:

 Sorry, one typo: problem is the same


 location cli-prefer-aapche aapche role=Started 10: sles2

Change the name of your constraint.
The 'cli-prefer-' prefix is reserved for "temporary" constraints created by the
command line tools (which therefore feel entitled to delete them as necessary).


 to:

 location cli-prefer-aapche aapche role=Started inf: sles2


 It keeps changing to infinity.



 my configuration is:

 node sles1
 node sles2
 primitive filesystem Filesystem \
params fstype=ext3 directory=/srv/www/vhosts device=/dev/xvdd1 \
op start interval=0 timeout=60 \
op stop interval=0 timeout=60 \
op monitor interval=20 timeout=40
 primitive myip IPaddr2 \
params ip=x.x.x.x \
op start interval=0 timeout=20s \
op stop interval=0 timeout=20s \
op monitor interval=10s timeout=20s
 primitive stonith_sbd stonith:external/sbd \
params pcmk_delay_max=30
 primitive web apache \
params configfile=/etc/apache2/httpd.conf \
op start interval=0 timeout=40s \
op stop interval=0 timeout=60s \
op monitor interval=10 timeout=20s
 group aapche filesystem myip web \
meta target-role=Started is-managed=true resource-stickiness=1000
 location cli-prefer-aapche aapche role=Started 10: sles2
 property cib-bootstrap-options: \
stonith-enabled=true \
no-quorum-policy=ignore \
placement-strategy=balanced \
expected-quorum-votes=2 \
dc-version=1.1.12-f47ea56 \
cluster-infrastructure=classic openais (with plugin) \
last-lrm-refresh=1440502955 \
stonith-timeout=40s
 rsc_defaults rsc-options: \
resource-stickiness=1000 \
migration-threshold=3
 op_defaults op-options: \
timeout=600 \
record-pending=true



 and after migration:


 node sles1
 node sles2
 primitive filesystem Filesystem \
params fstype=ext3 directory=/srv/www/vhosts device=/dev/xvdd1 \
op start interval=0 timeout=60 \
op stop interval=0 timeout=60 \
op monitor interval=20 timeout=40
 primitive myip IPaddr2 \
params ip=10.9.131.86 \
op start interval=0 timeout=20s \
op stop interval=0 timeout=20s \
op monitor interval=10s timeout=20s
 primitive stonith_sbd stonith:external/sbd \
params pcmk_delay_max=30
 primitive web apache \
params configfile=/etc/apache2/httpd.conf \
op start interval=0 timeout=40s \
op stop interval=0 timeout=60s \
op