[ClusterLabs] Reply: Re: Reply: Re: clone resource not get restarted on fail

2017-02-15 Thread he.hailong5
I just tried using a colocation constraint; it doesn't work.




I failed the node paas-controller-3, but sdclient_vip didn't get moved:




Online: [ paas-controller-1 paas-controller-2 paas-controller-3 ]




 router_vip (ocf::heartbeat:IPaddr2):   Started paas-controller-1 

 sdclient_vip   (ocf::heartbeat:IPaddr2):   Started paas-controller-3 

 apigateway_vip (ocf::heartbeat:IPaddr2):   Started paas-controller-2 

 Clone Set: sdclient_rep [sdclient]

 Started: [ paas-controller-1 paas-controller-2 ]

 Stopped: [ paas-controller-3 ]

 Clone Set: router_rep [router]

 router (ocf::heartbeat:router):   Started paas-controller-3 FAILED

 Started: [ paas-controller-1 paas-controller-2 ]

 Clone Set: apigateway_rep [apigateway]

 apigateway (ocf::heartbeat:apigateway):   Started paas-controller-3 FAILED

 Started: [ paas-controller-1 paas-controller-2 ]




here is the configuration:

>crm configure show

node $id="336855579" paas-controller-1

node $id="336855580" paas-controller-2

node $id="336855581" paas-controller-3

primitive apigateway ocf:heartbeat:apigateway \

op monitor interval="2s" timeout="20s" on-fail="restart" \

op stop interval="0" timeout="200s" on-fail="restart" \

op start interval="0" timeout="h" on-fail="restart"

primitive apigateway_vip ocf:heartbeat:IPaddr2 \

params ip="20.20.2.7" cidr_netmask="24" \

op start interval="0" timeout="20" \

op stop interval="0" timeout="20" \

op monitor timeout="20s" interval="2s" depth="0"

primitive router ocf:heartbeat:router \

op monitor interval="2s" timeout="20s" on-fail="restart" \

op stop interval="0" timeout="200s" on-fail="restart" \

op start interval="0" timeout="h" on-fail="restart"

primitive router_vip ocf:heartbeat:IPaddr2 \

params ip="10.10.1.7" cidr_netmask="24" \

op start interval="0" timeout="20" \

op stop interval="0" timeout="20" \

op monitor timeout="20s" interval="2s" depth="0"

primitive sdclient ocf:heartbeat:sdclient \

op monitor interval="2s" timeout="20s" on-fail="restart" \

op stop interval="0" timeout="200s" on-fail="restart" \

op start interval="0" timeout="h" on-fail="restart"

primitive sdclient_vip ocf:heartbeat:IPaddr2 \

params ip="10.10.1.8" cidr_netmask="24" \

op start interval="0" timeout="20" \

op stop interval="0" timeout="20" \

op monitor timeout="20s" interval="2s" depth="0"

clone apigateway_rep apigateway

clone router_rep router

clone sdclient_rep sdclient

colocation apigateway_colo +inf: apigateway_vip apigateway_rep:Started

colocation router_colo +inf: router_vip router_rep:Started

colocation sdclient_colo +inf: sdclient_vip sdclient_rep:Started

property $id="cib-bootstrap-options" \

dc-version="1.1.10-42f2063" \

cluster-infrastructure="corosync" \

stonith-enabled="false" \

no-quorum-policy="stop" \

start-failure-is-fatal="false" \

last-lrm-refresh="1486981647"

op_defaults $id="op_defaults-options" \

on-fail="restart"
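
(One way to check why sdclient_vip stays put, assuming the pacemaker 1.1/crmsh
tools used above, would be to dump the allocation scores from the live CIB:

> crm_simulate -sL

The scores should show whether the colocation actually pulls sdclient_vip toward
a node running an sdclient_rep instance, or whether something else, such as
resource stickiness or an unrecovered failure, keeps it on paas-controller-3.)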













Original Mail



From: 何海龙10164561
To: <kgail...@redhat.com>
Cc: <users@clusterlabs.org>
Date: 2017-02-15 10:54
Subject: Reply: Re: Reply: Re: [ClusterLabs] clone resource not get restarted on fail






> Is there a reason not to use a colocation constraint instead? If X_vip is
> colocated with X, it will be moved if X fails.

[hhl]: the movement should also take place if X is stopped (i.e., while the start
is still in progress). I don't know whether a colocation would satisfy this
requirement.
> I don't see any reason in your configuration why the services wouldn't be
> restarted. It's possible the cluster tried to restart the service, but the stop
> action failed. Since you have stonith disabled, the cluster can't recover from a
> failed stop action.




[hhl]: the ocf logs showed that pacemaker never entered the stop function in
this case.

> Is there a reason you disabled quorum? With 3 nodes, if they get split
> into groups of 1 node and 2 nodes, quorum is what keeps the groups from
> both starting all resources.




[hhl]: I enabled quorum and retried; the same thing happens.

b.t.w, I repeated this several times today, and found that when I trigger the
condition on one node that would fail all the clone resources, only one gets
restarted; the other two fail to restart.

[ClusterLabs] Reply: Re: Reply: Re: clone resource not get restarted on fail

2017-02-14 Thread he.hailong5
> Is there a reason not to use a colocation constraint instead? If X_vip is
> colocated with X, it will be moved if X fails.

[hhl]: the movement should also take place if X is stopped (i.e., while the start
is still in progress). I don't know whether a colocation would satisfy this
requirement.
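
(For reference, a minimal sketch of the colocation/order pair that is usually
suggested for a "VIP follows the service" setup, reusing the resource names from
this thread; I have not verified that it covers the window where the start is
still in progress:

colocation sdclient_colo inf: sdclient_vip sdclient_rep
order sdclient_order inf: sdclient_rep sdclient_vip

The mandatory order keeps sdclient_vip from running until sdclient_rep has
started, and the colocation ties its placement to a node with a running
instance; together they are meant to move the VIP away when the local service
is down.)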
> I don't see any reason in your configuration why the services wouldn't be
> restarted. It's possible the cluster tried to restart the service, but the stop
> action failed. Since you have stonith disabled, the cluster can't recover from a
> failed stop action.
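
(A minimal fencing sketch for reference, assuming IPMI-capable nodes; the agent,
address and credentials below are placeholders, not taken from this cluster:

> crm configure primitive st-controller-1 stonith:external/ipmi \
    params hostname="paas-controller-1" ipaddr="192.0.2.11" \
    userid="admin" passwd="secret" interface="lan"
> crm configure property stonith-enabled="true"

With fencing enabled, a failed stop results in the node being fenced, after
which the cluster can safely recover the resource elsewhere.)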




[hhl]: the ocf logs showed that pacemaker never entered the stop function in
this case.

> Is there a reason you disabled quorum? With 3 nodes, if they get split
> into groups of 1 node and 2 nodes, quorum is what keeps the groups from
> both starting all resources.
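
(For reference, on a corosync 2.x stack quorum for a 3-node cluster normally
comes from votequorum in corosync.conf, and no-quorum-policy decides what a
minority partition does; a minimal sketch, assuming defaults otherwise:

quorum {
    provider: corosync_votequorum
}

> crm configure property no-quorum-policy="stop"

With this in place, the single-node side of a 1/2 split loses quorum and stops
its resources instead of starting everything.)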




[hhl]: I enabled quorum and retried; the same thing happens.

b.t.w, I repeated this several times today, and found that when I trigger the
condition on one node that would fail all the clone resources, only one gets
restarted; the other two fail to restart.
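
(When only one clone instance comes back and the others stay FAILED, the fail
counts are worth checking first; with the tools used in this thread, something
like:

> crm_mon -1 -f

shows per-resource fail counts, and

> crm resource cleanup apigateway_rep

clears the failure history for a clone so the policy engine will attempt to
start it again. The resource names are the ones from the status output below.)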




> trigger the failure condition on paas-controller-1




Online: [ paas-controller-1 paas-controller-2 paas-controller-3 ]




 router_vip (ocf::heartbeat:IPaddr2):   Started paas-controller-2 

 sdclient_vip   (ocf::heartbeat:IPaddr2):   Started paas-controller-3 

 apigateway_vip (ocf::heartbeat:IPaddr2):   Started paas-controller-3 

 Clone Set: sdclient_rep [sdclient]

 Started: [ paas-controller-2 paas-controller-3 ]

 Stopped: [ paas-controller-1 ]

 Clone Set: router_rep [router]

 router (ocf::heartbeat:router):   Started paas-controller-1 FAILED

 Started: [ paas-controller-2 paas-controller-3 ]

 Clone Set: apigateway_rep [apigateway]

 apigateway (ocf::heartbeat:apigateway):   Started paas-controller-1 FAILED

 Started: [ paas-controller-2 paas-controller-3 ]

 


> trigger the failure condition on paas-controller-3



Online: [ paas-controller-1 paas-controller-2 paas-controller-3 ]




 router_vip (ocf::heartbeat:IPaddr2):   Started paas-controller-2 

 sdclient_vip   (ocf::heartbeat:IPaddr2):   Started paas-controller-3 

 apigateway_vip (ocf::heartbeat:IPaddr2):   Started paas-controller-3 

 Clone Set: sdclient_rep [sdclient]

 sdclient   (ocf::heartbeat:sdclient):   Started paas-controller-3 FAILED

 Started: [ paas-controller-1 paas-controller-2 ]

 Clone Set: router_rep [router]

 Started: [ paas-controller-1 paas-controller-2 ]

 Stopped: [ paas-controller-3 ]

 Clone Set: apigateway_rep [apigateway]

 apigateway (ocf::heartbeat:apigateway):   Started paas-controller-3 FAILED

 Started: [ paas-controller-1 paas-controller-2 ]










Original Mail



From: <kgail...@redhat.com>
To: 何海龙10164561
Cc: <users@clusterlabs.org>
Date: 2017-02-15 06:14
Subject: Re: Reply: Re: [ClusterLabs] clone resource not get restarted on fail





On 02/13/2017 07:08 PM, he.hailo...@zte.com.cn wrote:
> Hi,
> 
> 
> > crm configure show
> 
> + crm configure show
> 
> node $id="336855579" paas-controller-1
> 
> node $id="336855580" paas-controller-2
> 
> node $id="336855581" paas-controller-3
> 
> primitive apigateway ocf:heartbeat:apigateway \
> 
> op monitor interval="2s" timeout="20s" on-fail="restart" \
> 
> op stop interval="0" timeout="200s" on-fail="restart" \
> 
> op start interval="0" timeout="h" on-fail="restart"
> 
> primitive apigateway_vip ocf:heartbeat:IPaddr2 \
> 
> params ip="20.20.2.7" cidr_netmask="24" \
> 
> op start interval="0" timeout="20" \
> 
> op stop interval="0" timeout="20" \
> 
> op monitor timeout="20s" interval="2s" depth="0"
> 
> primitive router ocf:heartbeat:router \
> 
> op monitor interval="2s" timeout="20s" on-fail="restart" \
> 
> op stop interval="0" timeout="200s" on-fail="restart" \
> 
> op start interval="0" timeout="h" on-fail="restart"
> 
> primitive router_vip ocf:heartbeat:IPaddr2 \
> 
> params ip="10.10.1.7" cidr_netmask="24" \
> 
> op start interval="0" timeout="20" \
> 
> op stop interval="0" timeout="20" \
> 
> op monitor timeout="20s" interval="2s" depth="0"
> 
> primitive sdclient ocf:heartbeat:sdclient \
> 
> op monitor interval="2s" timeout="20s" on-fail="restart" \
> 
> op stop interval="0" timeout="200s" on-fail="restart" \
> 
> op start interval="0" timeout="h" on-fail="restart"
> 
> primitive sdclient_vip ocf: