from:"he.hailong5"

[ClusterLabs] are there equivelent restful apis for crm commands

2017-11-07 Thread he.hailong5

Hi list,

For some purpose, I need to obtain, from inside a Docker container, some
information that is usually obtained by running crm commands on the host.
Importing the tool into the container may involve fixing a lot of dependency
issues, and I would still need to figure out how to connect to the cluster
from within the container. So I am wondering whether there are already
equivalent RESTful APIs that I can use via network tools in the container to
achieve this. Or are there other good ideas?
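
One possible direction, offered only as a rough sketch and not as an existing Pacemaker or crmsh feature: run a very small HTTP service on the host that executes a read-only status command and returns its text, so the container only needs a plain HTTP client. The command (`crm_mon -1`, a one-shot status dump), the `/status` path, the bind address, the port, and all function names below are assumptions made for illustration.

```python
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumption: a read-only, one-shot status command. Replace it with whatever
# crm/crm_mon invocation you would normally run on the host.
STATUS_CMD = ["crm_mon", "-1"]


def run_status(cmd=STATUS_CMD, timeout=10):
    """Run the status command and return (exit_code, combined output)."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        timeout=timeout,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout


class StatusHandler(BaseHTTPRequestHandler):
    """Serve the command output as plain text on GET /status."""

    def do_GET(self):
        if self.path != "/status":
            self.send_error(404)
            return
        code, text = run_status()
        body = text.encode("utf-8")
        # A non-zero exit code from the command is reported as HTTP 500.
        self.send_response(200 if code == 0 else 500)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(host="127.0.0.1", port=8080):
    """Create (but do not start) the HTTP server; host and port are placeholders."""
    return HTTPServer((host, port), StatusHandler)
```

On the host, `make_server(...).serve_forever()` would start it, and the container could then fetch `/status` with any HTTP client, provided the chosen address is reachable from the container. The sketch has no authentication, so it should only be exposed on a trusted interface.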






Thanks

Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org

[ClusterLabs] question about ocf metadata actions

2017-03-30 Thread he.hailong5

Hi,

Does the timeout configured in the OCF metadata actually take effect?

<actions>
<action name="start" timeout="300s" />
<action name="stop" timeout="200s" />
<action name="status" timeout="20s" />
<action name="monitor" depth="0" timeout="20s" interval="2s" />
<action name="meta-data" timeout="120s" />
<action name="validate-all" timeout="20s" />
</actions>

What is the relationship between these values and the ones configured using
"crm configure primitive"?
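
As a side note on how the two sets of values relate: as far as the Pacemaker documentation describes it, the timeouts in the agent's metadata are only advertised suggestions, and the values that are enforced are the ones on the `op` entries of the configured primitive (falling back to the operation defaults). The sketch below, with hypothetical helper names, parses an `<actions>` block like the one above and reports which configured timeouts are shorter than what the agent advertises. It assumes durations are plain integers with an optional `s`, `m` or `h` suffix.

```python
import re
import xml.etree.ElementTree as ET

# Excerpt of the metadata quoted in the question.
METADATA_ACTIONS = """
<actions>
<action name="start" timeout="300s" />
<action name="stop" timeout="200s" />
<action name="monitor" depth="0" timeout="20s" interval="2s" />
</actions>
"""


def to_seconds(value):
    """Convert '20', '20s', '5m' or '1h' to seconds (simplified parser)."""
    match = re.fullmatch(r"(\d+)(s|m|h)?", value.strip())
    if match is None:
        raise ValueError("unrecognised duration: %r" % value)
    number, unit = int(match.group(1)), match.group(2) or "s"
    return number * {"s": 1, "m": 60, "h": 3600}[unit]


def advertised_timeouts(actions_xml):
    """Map action name -> timeout in seconds, as advertised by the agent."""
    root = ET.fromstring(actions_xml)
    return {
        action.get("name"): to_seconds(action.get("timeout"))
        for action in root.findall("action")
        if action.get("timeout")
    }


def below_advertised(configured, advertised):
    """Return the names of operations configured with a shorter timeout."""
    return sorted(
        name
        for name, seconds in configured.items()
        if name in advertised and seconds < advertised[name]
    )
```

For example, `below_advertised({"start": 20, "stop": 200}, advertised_timeouts(METADATA_ACTIONS))` flags only `start`, because 20 seconds is less than the advertised 300 seconds.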

Br,

Allen

[ClusterLabs] Reply: Re: Reply: Re: Reply: Re: clone resource not get restarted on fail

2017-02-16 Thread he.hailong5

Adding "sleep 5" before the return in the stop function fixed the issue, so I
suspect there must be a concurrency bug somewhere in the code. Just FYI.

Original Message

From: <kgail...@redhat.com>
To: 何海龙10164561
Cc: <users@clusterlabs.org>
Date: 2017-02-15 23:22
Subject: Re: Reply: Re: Reply: Re: [ClusterLabs] clone resource not get restarted on fail

On 02/15/2017 03:57 AM, he.hailo...@zte.com.cn wrote:
> I just tried using colocation, it doesn't work.
>
> I failed the node paas-controller-3, but sdclient_vip didn't get moved:

The colocation would work, but the problem you're having with router and
apigateway is preventing it from getting that far. In other words,
router and apigateway are still running on the node (they have not been
successfully stopped), so the colocation is still valid.

I suspect that the return codes from your custom resource agents may be
the issue. Make sure that your agents conform to these guidelines:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#ap-ocf

In particular, "start" should not return until a monitor operation would
return success, "stop" should not return until a monitor would return
"not running", and "monitor" should return "not running" if called on a
host where the service hasn't started yet. Be sure you are returning the
proper OCF_* codes according to the table in the link above.

If the documentation is unclear, please ask here about anything you are
unsure of.


[ClusterLabs] Reply: Re: Reply: Re: Reply: Re: clone resource not get restarted on fail

2017-02-15 Thread he.hailong5

Please note that everything works fine when there is only one clone resource
configured: the resource gets restarted and the VIP gets moved.

Anyway, I will check my OCF agents again.


[ClusterLabs] Reply: Reply: Re: Reply: Re: clone resource not get restarted on fail

2017-02-15 Thread he.hailong5

I switched back to "location" tonight to continue with the testing; at least
sometimes the VIP is moving.

As I mentioned earlier, with "location" only one clone resource would get
restarted and the other two would not. But just now, all 3 clone resources got
restarted and the VIPs got moved as expected:

Online: [ paas-controller-1 paas-controller-2 paas-controller-3 ]

 router_vip (ocf::heartbeat:IPaddr2):   Started paas-controller-2
 sdclient_vip   (ocf::heartbeat:IPaddr2):   Started paas-controller-3
 apigateway_vip (ocf::heartbeat:IPaddr2):   Started paas-controller-3
 Clone Set: sdclient_rep [sdclient]
     Started: [ paas-controller-2 paas-controller-3 ]
     Stopped: [ paas-controller-1 ]
 Clone Set: router_rep [router]
     Started: [ paas-controller-2 paas-controller-3 ]
     Stopped: [ paas-controller-1 ]
 Clone Set: apigateway_rep [apigateway]
     Started: [ paas-controller-2 paas-controller-3 ]
     Stopped: [ paas-controller-1 ]

I am really lost. Please tell me whether I may have hit some known or unknown
bug.

Again, I am using:

Pacemaker 1.1.10
Corosync Cluster Engine ('2.3.3')

Please let me know what the latest stable version is.


[ClusterLabs] Reply: Re: Reply: Re: clone resource not get restarted on fail

2017-02-15 Thread he.hailong5

I just tried using colocation, it doesn't work.

I failed the node paas-controller-3, but sdclient_vip didn't get moved:

Online: [ paas-controller-1 paas-controller-2 paas-controller-3 ]

 router_vip (ocf::heartbeat:IPaddr2):   Started paas-controller-1
 sdclient_vip   (ocf::heartbeat:IPaddr2):   Started paas-controller-3
 apigateway_vip (ocf::heartbeat:IPaddr2):   Started paas-controller-2
 Clone Set: sdclient_rep [sdclient]
     Started: [ paas-controller-1 paas-controller-2 ]
     Stopped: [ paas-controller-3 ]
 Clone Set: router_rep [router]
     router (ocf::heartbeat:router):    Started paas-controller-3 FAILED
     Started: [ paas-controller-1 paas-controller-2 ]
 Clone Set: apigateway_rep [apigateway]
     apigateway (ocf::heartbeat:apigateway):    Started paas-controller-3 FAILED
     Started: [ paas-controller-1 paas-controller-2 ]

Here is the configuration:

> crm configure show
node $id="336855579" paas-controller-1
node $id="336855580" paas-controller-2
node $id="336855581" paas-controller-3
primitive apigateway ocf:heartbeat:apigateway \
    op monitor interval="2s" timeout="20s" on-fail="restart" \
    op stop interval="0" timeout="200s" on-fail="restart" \
    op start interval="0" timeout="h" on-fail="restart"
primitive apigateway_vip ocf:heartbeat:IPaddr2 \
    params ip="20.20.2.7" cidr_netmask="24" \
    op start interval="0" timeout="20" \
    op stop interval="0" timeout="20" \
    op monitor timeout="20s" interval="2s" depth="0"
primitive router ocf:heartbeat:router \
    op monitor interval="2s" timeout="20s" on-fail="restart" \
    op stop interval="0" timeout="200s" on-fail="restart" \
    op start interval="0" timeout="h" on-fail="restart"
primitive router_vip ocf:heartbeat:IPaddr2 \
    params ip="10.10.1.7" cidr_netmask="24" \
    op start interval="0" timeout="20" \
    op stop interval="0" timeout="20" \
    op monitor timeout="20s" interval="2s" depth="0"
primitive sdclient ocf:heartbeat:sdclient \
    op monitor interval="2s" timeout="20s" on-fail="restart" \
    op stop interval="0" timeout="200s" on-fail="restart" \
    op start interval="0" timeout="h" on-fail="restart"
primitive sdclient_vip ocf:heartbeat:IPaddr2 \
    params ip="10.10.1.8" cidr_netmask="24" \
    op start interval="0" timeout="20" \
    op stop interval="0" timeout="20" \
    op monitor timeout="20s" interval="2s" depth="0"
clone apigateway_rep apigateway
clone router_rep router
clone sdclient_rep sdclient
colocation apigateway_colo +inf: apigateway_vip apigateway_rep:Started
colocation router_colo +inf: router_vip router_rep:Started
colocation sdclient_colo +inf: sdclient_vip sdclient_rep:Started
property $id="cib-bootstrap-options" \
    dc-version="1.1.10-42f2063" \
    cluster-infrastructure="corosync" \
    stonith-enabled="false" \
    no-quorum-policy="stop" \
    start-failure-is-fatal="false" \
    last-lrm-refresh="1486981647"
op_defaults $id="op_defaults-options" \
    on-fail="restart"


[ClusterLabs] Reply: Re: Reply: Re: clone resource not get restarted on fail

2017-02-14 Thread he.hailong5

> Is there a reason not to use a colocation constraint instead? If X_vip is
> colocated with X, it will be moved if X fails.

[hhl]: The movement should also take place if X is stopped (while the start is
ongoing). I don't know whether the colocation would satisfy this requirement.

> I don't see any reason in your configuration why the services wouldn't be
> restarted. It's possible the cluster tried to restart the service, but the
> stop action failed. Since you have stonith disabled, the cluster can't
> recover from a failed stop action.

[hhl]: The OCF logs showed that Pacemaker never entered the stop function in
this case.

> Is there a reason you disabled quorum? With 3 nodes, if they get split into
> groups of 1 node and 2 nodes, quorum is what keeps the groups from both
> starting all resources.

[hhl]: I enabled quorum and retried; the same thing happens.
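
To make the quorum point above concrete, here is a tiny worked example of the usual strict-majority rule, assuming one vote per node; the function name is hypothetical. With 3 nodes split into groups of 2 and 1, only the group of 2 keeps quorum, so only that side may run resources.

```python
def has_quorum(partition_votes, total_votes):
    """A partition is quorate only if it holds a strict majority of all votes."""
    return partition_votes > total_votes // 2


# 3-node cluster split into groups of 2 nodes and 1 node.
for size in (2, 1):
    print(size, "node(s):", "quorate" if has_quorum(size, 3) else "not quorate")
```

An even split (for example 2 of 4 votes) leaves neither side with a strict majority under this rule.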

By the way, I repeated the test several times today and found that when I
trigger, on one node, the condition that fails all the clone resources, only
one of them gets restarted; the other two fail to restart.

Triggering the failure condition on paas-controller-1:

Online: [ paas-controller-1 paas-controller-2 paas-controller-3 ]

 router_vip (ocf::heartbeat:IPaddr2):   Started paas-controller-2
 sdclient_vip   (ocf::heartbeat:IPaddr2):   Started paas-controller-3
 apigateway_vip (ocf::heartbeat:IPaddr2):   Started paas-controller-3
 Clone Set: sdclient_rep [sdclient]
     Started: [ paas-controller-2 paas-controller-3 ]
     Stopped: [ paas-controller-1 ]
 Clone Set: router_rep [router]
     router (ocf::heartbeat:router):    Started paas-controller-1 FAILED
     Started: [ paas-controller-2 paas-controller-3 ]
 Clone Set: apigateway_rep [apigateway]
     apigateway (ocf::heartbeat:apigateway):    Started paas-controller-1 FAILED
     Started: [ paas-controller-2 paas-controller-3 ]

Triggering the failure condition on paas-controller-3:

Online: [ paas-controller-1 paas-controller-2 paas-controller-3 ]

 router_vip (ocf::heartbeat:IPaddr2):   Started paas-controller-2
 sdclient_vip   (ocf::heartbeat:IPaddr2):   Started paas-controller-3
 apigateway_vip (ocf::heartbeat:IPaddr2):   Started paas-controller-3
 Clone Set: sdclient_rep [sdclient]
     sdclient   (ocf::heartbeat:sdclient):  Started paas-controller-3 FAILED
     Started: [ paas-controller-1 paas-controller-2 ]
 Clone Set: router_rep [router]
     Started: [ paas-controller-1 paas-controller-2 ]
     Stopped: [ paas-controller-3 ]
 Clone Set: apigateway_rep [apigateway]
     apigateway (ocf::heartbeat:apigateway):    Started paas-controller-3 FAILED
     Started: [ paas-controller-1 paas-controller-2 ]


[ClusterLabs] Reply: Re: clone resource not get restarted on fail

2017-02-13 Thread he.hailong5

Hi,

> crm configure show
+ crm configure show
node $id="336855579" paas-controller-1
node $id="336855580" paas-controller-2
node $id="336855581" paas-controller-3
primitive apigateway ocf:heartbeat:apigateway \
    op monitor interval="2s" timeout="20s" on-fail="restart" \
    op stop interval="0" timeout="200s" on-fail="restart" \
    op start interval="0" timeout="h" on-fail="restart"
primitive apigateway_vip ocf:heartbeat:IPaddr2 \
    params ip="20.20.2.7" cidr_netmask="24" \
    op start interval="0" timeout="20" \
    op stop interval="0" timeout="20" \
    op monitor timeout="20s" interval="2s" depth="0"
primitive router ocf:heartbeat:router \
    op monitor interval="2s" timeout="20s" on-fail="restart" \
    op stop interval="0" timeout="200s" on-fail="restart" \
    op start interval="0" timeout="h" on-fail="restart"
primitive router_vip ocf:heartbeat:IPaddr2 \
    params ip="10.10.1.7" cidr_netmask="24" \
    op start interval="0" timeout="20" \
    op stop interval="0" timeout="20" \
    op monitor timeout="20s" interval="2s" depth="0"
primitive sdclient ocf:heartbeat:sdclient \
    op monitor interval="2s" timeout="20s" on-fail="restart" \
    op stop interval="0" timeout="200s" on-fail="restart" \
    op start interval="0" timeout="h" on-fail="restart"
primitive sdclient_vip ocf:heartbeat:IPaddr2 \
    params ip="10.10.1.8" cidr_netmask="24" \
    op start interval="0" timeout="20" \
    op stop interval="0" timeout="20" \
    op monitor timeout="20s" interval="2s" depth="0"
clone apigateway_rep apigateway
clone router_rep router
clone sdclient_rep sdclient
location apigateway_loc apigateway_vip \
    rule $id="apigateway_loc-rule" +inf: apigateway_workable eq 1
location router_loc router_vip \
    rule $id="router_loc-rule" +inf: router_workable eq 1
location sdclient_loc sdclient_vip \
    rule $id="sdclient_loc-rule" +inf: sdclient_workable eq 1
property $id="cib-bootstrap-options" \
    dc-version="1.1.10-42f2063" \
    cluster-infrastructure="corosync" \
    stonith-enabled="false" \
    no-quorum-policy="ignore" \
    start-failure-is-fatal="false" \
    last-lrm-refresh="1486981647"
op_defaults $id="op_defaults-options" \
    on-fail="restart"

By the way, I am using "crm_attribute -N $HOSTNAME -q -l reboot --name
<prefix>_workable -v <1 or 0>" in the monitor to update the transient
attributes, which control the VIP location.

I also found that the VIP resource won't get moved if the related clone
resource fails to restart.
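
For readers following along, the attribute update described above can be pictured with the small sketch below. It only builds the argument list for the crm_attribute call quoted in this message; the function name and the idea of passing the health result as a boolean are assumptions for illustration, and the actual monitor script is not shown in this thread.

```python
def workable_command(prefix, healthy, hostname):
    """Build the crm_attribute argv that records one monitor result for a node."""
    return [
        "crm_attribute",
        "-N", hostname,                    # node whose attribute is updated
        "-q",
        "-l", "reboot",                    # transient attribute, as in the message
        "--name", "%s_workable" % prefix,  # e.g. router_workable
        "-v", "1" if healthy else "0",     # 1 = healthy, 0 = failed
    ]


print(" ".join(workable_command("router", False, "paas-controller-3")))
```

With the location rules in the configuration above, a value of 1 for `<prefix>_workable` on a node is what makes that node eligible for the corresponding VIP.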








[ClusterLabs] clone resource not get restarted on fail

2017-02-13 Thread he.hailong5

Pacemaker 1.1.10
Corosync 2.3.3

This is a 3-node cluster configured with 3 clone resources, each attached to
a VIP resource of type IPaddr2:

> crm status

Online: [ paas-controller-1 paas-controller-2 paas-controller-3 ]

 router_vip (ocf::heartbeat:IPaddr2):   Started paas-controller-1
 sdclient_vip   (ocf::heartbeat:IPaddr2):   Started paas-controller-3
 apigateway_vip (ocf::heartbeat:IPaddr2):   Started paas-controller-2
 Clone Set: sdclient_rep [sdclient]
     Started: [ paas-controller-1 paas-controller-2 paas-controller-3 ]
 Clone Set: router_rep [router]
     Started: [ paas-controller-1 paas-controller-2 paas-controller-3 ]
 Clone Set: apigateway_rep [apigateway]
     Started: [ paas-controller-1 paas-controller-2 paas-controller-3 ]

It is observed that sometimes the clone resource gets stuck in monitoring when
the service fails:

 router_vip (ocf::heartbeat:IPaddr2):   Started paas-controller-1
 sdclient_vip   (ocf::heartbeat:IPaddr2):   Started paas-controller-2
 apigateway_vip (ocf::heartbeat:IPaddr2):   Started paas-controller-3
 Clone Set: sdclient_rep [sdclient]
     Started: [ paas-controller-1 paas-controller-2 ]
     Stopped: [ paas-controller-3 ]
 Clone Set: router_rep [router]
     router (ocf::heartbeat:router):    Started paas-controller-3 FAILED
     Started: [ paas-controller-1 paas-controller-2 ]
 Clone Set: apigateway_rep [apigateway]
     apigateway (ocf::heartbeat:apigateway):    Started paas-controller-3 FAILED
     Started: [ paas-controller-1 paas-controller-2 ]

In the example above, sdclient_rep gets restarted on node 3, while the other
two hang at monitoring on node 3. Here are the OCF logs:

abnormal (apigateway_rep):

2017-02-13 18:27:53 [23586]===print_log test_monitor run_func main=== Starting health check.
2017-02-13 18:27:53 [23586]===print_log test_monitor run_func main=== health check succeed.
2017-02-13 18:27:55 [24010]===print_log test_monitor run_func main=== Starting health check.
2017-02-13 18:27:55 [24010]===print_log test_monitor run_func main=== Failed: docker daemon is not running.
2017-02-13 18:27:57 [24095]===print_log test_monitor run_func main=== Starting health check.
2017-02-13 18:27:57 [24095]===print_log test_monitor run_func main=== Failed: docker daemon is not running.
2017-02-13 18:27:59 [24159]===print_log test_monitor run_func main=== Starting health check.
2017-02-13 18:27:59 [24159]===print_log test_monitor run_func main=== Failed: docker daemon is not running.

normal (sdclient_rep):

2017-02-13 18:27:52 [23507]===print_log sdclient_monitor run_func main=== health check succeed.
2017-02-13 18:27:54 [23630]===print_log sdclient_monitor run_func main=== Starting health check.
2017-02-13 18:27:54 [23630]===print_log sdclient_monitor run_func main=== Failed: docker daemon is not running.
2017-02-13 18:27:55 [23710]===print_log sdclient_stop run_func main=== Starting stop the container.
2017-02-13 18:27:55 [23710]===print_log sdclient_stop run_func main=== docker daemon lost, pretend stop succeed.
2017-02-13 18:27:55 [23763]===print_log sdclient_start run_func main=== Starting run the container.
2017-02-13 18:27:55 [23763]===print_log sdclient_start run_func main=== docker daemon lost, try again in 5 secs.
2017-02-13 18:28:00 [23763]===print_log sdclient_start run_func main=== docker daemon lost, try again in 5 secs.
2017-02-13 18:28:05 [23763]===print_log sdclient_start run_func main=== docker daemon lost, try again in 5 secs.

If I disable two of the clone resources, the switchover test for the remaining
clone resource works as expected: fail the service -> monitor fails -> stop ->
start

Online: [ paas-controller-1 paas-controller-2 paas-controller-3 ]

 sdclient_vip   (ocf::heartbeat:IPaddr2):   Started paas-controller-2
 Clone Set: sdclient_rep [sdclient]
     Started: [ paas-controller-1 paas-controller-2 ]
     Stopped: [ paas-controller-3 ]

What is the reason behind this?

[ClusterLabs] pacemaker monitor fails didn't trigger stop and start

2017-02-12 Thread he.hailong5

Hi,

I am using Pacemaker 1.1.10 and observe that a failure of the monitor
operation does not always trigger stop and start; instead, it keeps
monitoring.

Is there any exception or condition that would make Pacemaker behave like
this?

BR,

Allen