Is there a reason not to use a colocation constraint instead? If X_vip is
colocated with X, it will be moved if X fails.
[hhl]: the movement should also take place if X is stopped (while the start is
ongoing). I don't know whether the colocation would satisfy this requirement.
I don't see any reason in your configuration why the services wouldn't be
restarted. It's possible the cluster tried to restart the service, but the stop
action failed. Since you have stonith disabled, the cluster can't recover from a
failed stop action.
[hhl]: the OCF logs showed that Pacemaker never entered the stop function in
this case.

Is there a reason you disabled quorum? With 3 nodes, if they get split
into groups of 1 node and 2 nodes, quorum is what keeps the groups from
both starting all resources.
[hhl]: I enabled quorum and retried; the same thing happens.
BTW, I repeated this several times today, and found that when I trigger the
condition on one node that fails all the clone resources, only one gets
restarted; the other two fail to restart.
> trigger the failure condition on paas-controller-1
Online: [ paas-controller-1 paas-controller-2 paas-controller-3 ]
router_vip (ocf::heartbeat:IPaddr2): Started paas-controller-2
sdclient_vip (ocf::heartbeat:IPaddr2): Started paas-controller-3
apigateway_vip (ocf::heartbeat:IPaddr2): Started paas-controller-3
Clone Set: sdclient_rep [sdclient]
Started: [ paas-controller-2 paas-controller-3 ]
Stopped: [ paas-controller-1 ]
Clone Set: router_rep [router]
router (ocf::heartbeat:router): Started paas-controller-1 FAILED
Started: [ paas-controller-2 paas-controller-3 ]
Clone Set: apigateway_rep [apigateway]
apigateway (ocf::heartbeat:apigateway): Started paas-controller-1 FAILED
Started: [ paas-controller-2 paas-controller-3 ]
> trigger the failure condition on paas-controller-3
Online: [ paas-controller-1 paas-controller-2 paas-controller-3 ]
router_vip (ocf::heartbeat:IPaddr2): Started paas-controller-2
sdclient_vip (ocf::heartbeat:IPaddr2): Started paas-controller-3
apigateway_vip (ocf::heartbeat:IPaddr2): Started paas-controller-3
Clone Set: sdclient_rep [sdclient]
sdclient (ocf::heartbeat:sdclient): Started paas-controller-3 FAILED
Started: [ paas-controller-1 paas-controller-2 ]
Clone Set: router_rep [router]
Started: [ paas-controller-1 paas-controller-2 ]
Stopped: [ paas-controller-3 ]
Clone Set: apigateway_rep [apigateway]
apigateway (ocf::heartbeat:apigateway): Started paas-controller-3 FAILED
Started: [ paas-controller-1 paas-controller-2 ]
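As a side note on the stonith and quorum points raised above: both are controlled by standard Pacemaker cluster properties. A sketch of enabling them with the crm shell (the property names are standard; the chosen values are assumptions for this setup):

```
# Enable fencing so the cluster can recover from a failed stop action.
crm configure property stonith-enabled=true
# Stop all resources in a partition that loses quorum.
crm configure property no-quorum-policy=stop
```

Note that stonith-enabled=true also requires a configured and working fencing device before the cluster will act on it.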
Original Mail
From: <kgail...@redhat.com>
To: He Hailong 10164561
Cc: <users@clusterlabs.org>
Date: 2017-02-15 06:14
Subject: Re: Reply: Re: [ClusterLabs] clone resource not get restarted on fail
On 02/13/2017 07:08 PM, he.hailo...@zte.com.cn wrote:
> Hi,
>
> > crm configure show
>
> + crm configure show
>
> node $id="336855579" paas-controller-1
> node $id="336855580" paas-controller-2
> node $id="336855581" paas-controller-3
> primitive apigateway ocf:heartbeat:apigateway \
>     op monitor interval="2s" timeout="20s" on-fail="restart" \
>     op stop interval="0" timeout="200s" on-fail="restart" \
>     op start interval="0" timeout="h" on-fail="restart"
> primitive apigateway_vip ocf:heartbeat:IPaddr2 \
>     params ip="20.20.2.7" cidr_netmask="24" \
>     op start interval="0" timeout="20" \
>     op stop interval="0" timeout="20" \
>     op monitor timeout="20s" interval="2s" depth="0"
> primitive router ocf:heartbeat:router \
>     op monitor interval="2s" timeout="20s" on-fail="restart" \
>     op stop interval="0" timeout="200s" on-fail="restart" \
>     op start interval="0" timeout="h" on-fail="restart"
> primitive router_vip ocf:heartbeat:IPaddr2 \
>     params ip="10.10.1.7" cidr_netmask="24" \
>     op start interval="0" timeout="20" \
>     op stop interval="0" timeout="20" \
>     op monitor timeout="20s" interval="2s" depth="0"
> primitive sdclient ocf:heartbeat:sdclient \
>     op monitor interval="2s" timeout="20s" on-fail="restart" \
>     op stop interval="0" timeout="200s" on-fail="restart" \
>     op start interval="0" timeout="h" on-fail="restart"
> primitive sdclient_vip ocf: