Re: [Linux-HA] How do I clear the Failed actions section?
On 07.03.2012 at 18:01, Florian Haas wrote:
> On Wed, Mar 7, 2012 at 5:51 PM, William Seligman selig...@nevis.columbia.edu wrote:
>> Again, a disclaimer: I am not an expert.
>
> Your advice was spot on. :)

But what do I do if cleanup does not work? Everything is running:

# crm status
Last updated: Thu Mar 8 12:27:00 2012
Stack: Heartbeat
Current DC: xen10 (5ab5ba3d-3be5-4763-83e7-90aaa49361a6) - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, unknown expected votes
12 Resources configured.

Online: [ xen10 xen11 ]

 xen_www        (ocf::heartbeat:Xen):  Started xen11
 Master/Slave Set: DrbdClone1
     Masters: [ xen11 ]
     Slaves: [ xen10 ]
 xen_typo3      (ocf::heartbeat:Xen):  Started xen11
 xen_shopdb     (ocf::heartbeat:Xen):  Started xen10
 xen_admintool  (ocf::heartbeat:Xen):  Started xen11
 xen_cmsdb      (ocf::heartbeat:Xen):  Started xen11
 Master/Slave Set: DrbdClone2
     Resource Group: group_drbd2:0
         xen_drbd2_1:0  (ocf::linbit:drbd):  Slave xen10 (unmanaged) FAILED
         xen_drbd2_2:0  (ocf::linbit:drbd):  Stopped
     Masters: [ xen11 ]
 Master/Slave Set: DrbdClone3
     Masters: [ xen10 ]
     Slaves: [ xen11 ]
 Master/Slave Set: DrbdClone5
     Masters: [ xen11 ]
     Slaves: [ xen10 ]
 Master/Slave Set: DrbdClone6
     Slaves: [ xen11 xen10 ]
 Master/Slave Set: DrbdClone4
     Masters: [ xen11 ]
     Slaves: [ xen10 ]

Failed actions:
    xen_cmsdb_monitor_3000 (node=xen10, call=571, rc=7, status=complete): not running
    xen_drbd1_2:1_promote_0 (node=xen10, call=5205, rc=1, status=complete): unknown error
    xen_drbd2_1:1_promote_0 (node=xen10, call=790, rc=1, status=complete): unknown error
    xen_ns2_monitor_3000 (node=xen10, call=601, rc=7, status=complete): not running
    xen_drbd3_1:1_promote_0 (node=xen10, call=383, rc=-2, status=Timed Out): unknown exec error
    xen_drbd2_1:0_promote_0 (node=xen10, call=1326, rc=-2, status=Timed Out): unknown exec error
    xen_drbd2_1:0_stop_0 (node=xen10, call=1348, rc=-2, status=Timed Out): unknown exec error

xen11:# crm resource cleanup xen_drbd2_1
Error performing operation: The object/attribute does not exist
Error performing operation: The object/attribute does not exist

# xm list
Name ID Mem VCPUs State Time(s)
Domain-0 0 100516 r- 40648.5
admintool 5 4096 2 -b 7455.4
cmsdb 3 2048 2 -b 2106.5
typo3 2 1024 2 -b 2890.9
www 1 1024 1 -b 855.0

xen11:# drbdadm status
<drbd-status version="8.3.7" api="88">
<resources config_file="/etc/drbd.conf">
<resource minor="1" name="drbd1_1" cs="Connected" ro1="Primary" ro2="Secondary" ds1="UpToDate" ds2="UpToDate" />
<resource minor="2" name="drbd1_2" cs="Connected" ro1="Primary" ro2="Secondary" ds1="UpToDate" ds2="UpToDate" />
<resource minor="3" name="drbd2_1" cs="Connected" ro1="Primary" ro2="Secondary" ds1="UpToDate" ds2="UpToDate" />
<resource minor="4" name="drbd2_2" cs="Connected" ro1="Primary" ro2="Secondary" ds1="UpToDate" ds2="UpToDate" />
<resource minor="5" name="drbd3_1" cs="Connected" ro1="Secondary" ro2="Primary" ds1="UpToDate" ds2="UpToDate" />
<resource minor="6" name="drbd3_2" cs="Connected" ro1="Secondary" ro2="Primary" ds1="UpToDate" ds2="UpToDate" />
<resource minor="7" name="drbd4_1" cs="Connected" ro1="Primary" ro2="Secondary" ds1="UpToDate" ds2="UpToDate" />
<resource minor="8" name="drbd4_2" cs="Connected" ro1="Primary" ro2="Secondary" ds1="UpToDate" ds2="UpToDate" />
<resource minor="9" name="drbd5_1" cs="Connected" ro1="Primary" ro2="Secondary" ds1="UpToDate" ds2="UpToDate" />
<resource minor="10" name="drbd5_2" cs="Connected" ro1="Primary" ro2="Secondary" ds1="UpToDate" ds2="UpToDate" />
<resource minor="11" name="drbd6_1" cs="StandAlone" ro1="Secondary" ro2="Unknown" ds1="Outdated" ds2="DUnknown" />
<resource minor="12" name="drbd6_2" cs="StandAlone" ro1="Secondary" ro2="Unknown" ds1="Outdated" ds2="DUnknown" />
<!-- resource minor="13" name="drbd7_1" not available or not yet created -->
<!-- resource minor="14" name="drbd7_2" not available or not yet created -->
<!-- resource minor="15" name="drbd8_1" not available or not yet created -->
<!-- resource minor="16" name="drbd8_2" not available or not yet created -->
</resources>
</drbd-status>

Helmut Wollmersdorfer
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
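Each entry under "Failed actions:" starts with an operation key that already tells you which resource, clone instance, operation and interval failed. The sketch below splits such a line apart. It assumes Pacemaker's `<resource>[:<instance>]_<operation>_<interval-in-ms>` key layout and the standard OCF exit codes 0 to 9; `decode_failed_action` is a hypothetical helper name, and the list of operation names is an assumption covering the common actions.

```shell
# Hypothetical helper: split one crm_mon "Failed actions:" line into fields.
# Assumes the first word is an operation key of the form
#   <resource>[:<clone-instance>]_<operation>_<interval-in-ms>
# and that <operation> is one of the names in the list below.
decode_failed_action() {
    printf '%s\n' "$1" | awk '
    {
        key = $1

        # Interval: the text after the last underscore (0 = not recurring).
        interval = key
        sub(/.*_/, "", interval)
        rest = substr(key, 1, length(key) - length(interval) - 1)

        # Operation: try each known name as a suffix of what is left.
        op = "?"
        rsc = rest
        n = split("monitor start stop promote demote notify migrate_to migrate_from", ops, " ")
        for (i = 1; i <= n; i++) {
            suf = "_" ops[i]
            if (length(rest) > length(suf) &&
                substr(rest, length(rest) - length(suf) + 1) == suf) {
                op = ops[i]
                rsc = substr(rest, 1, length(rest) - length(suf))
            }
        }

        # Clone or master/slave instance number, if present (":0", ":1", ...).
        inst = "-"
        c = index(rsc, ":")
        if (c > 0) {
            inst = substr(rsc, c + 1)
            rsc = substr(rsc, 1, c - 1)
        }

        node = rc = status = "?"
        if (match($0, /node=[^,]+/))   node = substr($0, RSTART + 5, RLENGTH - 5)
        if (match($0, /rc=-?[0-9]+/))  rc = substr($0, RSTART + 3, RLENGTH - 3)
        if (match($0, /status=[^)]+/)) status = substr($0, RSTART + 7, RLENGTH - 7)

        # Standard OCF resource agent exit codes 0-9; anything else is "n/a".
        codes = "OCF_SUCCESS OCF_ERR_GENERIC OCF_ERR_ARGS OCF_ERR_UNIMPLEMENTED OCF_ERR_PERM"
        codes = codes " OCF_ERR_INSTALLED OCF_ERR_CONFIGURED OCF_NOT_RUNNING"
        codes = codes " OCF_RUNNING_MASTER OCF_FAILED_MASTER"
        split(codes, names, " ")
        meaning = (rc ~ /^[0-9]$/) ? names[rc + 1] : "n/a"

        printf "resource=%s instance=%s op=%s interval_ms=%s node=%s rc=%s meaning=%s status=%s\n",
            rsc, inst, op, interval, node, rc, meaning, status
    }'
}
```

For `xen_drbd2_1:0_promote_0` this reports resource `xen_drbd2_1` with instance `0`. That instance suffix is the hint that the primitive lives inside a clone or master/slave set, which matters for the cleanup command discussed below.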
Re: [Linux-HA] How do I clear the Failed actions section?
On 3/8/12 6:53 AM, Helmut Wollmersdorfer wrote:
> On 07.03.2012 at 18:01, Florian Haas wrote:
>> On Wed, Mar 7, 2012 at 5:51 PM, William Seligman selig...@nevis.columbia.edu wrote:
>>> Again, a disclaimer: I am not an expert.
>>
>> Your advice was spot on. :)
>
> But what do I do if cleanup does not work? Everything is running:
>
> # crm status
> [...]
>  Master/Slave Set: DrbdClone2
>      Resource Group: group_drbd2:0
>          xen_drbd2_1:0  (ocf::linbit:drbd):  Slave xen10 (unmanaged) FAILED
>          xen_drbd2_2:0  (ocf::linbit:drbd):  Stopped
>      Masters: [ xen11 ]
> [...]
> xen11:# crm resource cleanup xen_drbd2_1
> Error performing operation: The object/attribute does not exist
> Error performing operation: The object/attribute does not exist

Given the list of resources displayed by crm_mon, the command you need is

crm resource cleanup DrbdClone2

I can't say whether that will fix your problems, but you won't get the "does not exist" message. Somewhere in either "Pacemaker Explained" or "Clusters From Scratch", it says that once you clone or ms a resource, you can't refer to that resource as an individual anymore; you have to use the clone/ms name.

What I did when faced with a problem like yours was to cat /proc/drbd, look at the lines for the failed DRBD resource, and fix it on my own. Then I'd type the cleanup command so that Pacemaker picks up the current state of the resource.

> # xm list
> [...]
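To find the clone/ms name without reading crm_mon by eye, one option is to walk the configuration upwards from the primitive. The sketch below assumes crm shell `configure show` syntax in which the member IDs of a `group`, `clone` or `ms` definition all sit on the first line of that definition; `cleanup_target` is a hypothetical helper, and the configuration used to test it is a guessed reconstruction of this cluster (crm_mon shows `xen_drbd2_1` inside `group_drbd2` inside `DrbdClone2`), not the poster's real config.

```shell
# Hypothetical helper: read "crm configure show" output on stdin and print the
# outermost group/clone/ms ID that contains the given resource ID (or the ID
# itself if nothing contains it). Assumes the member IDs of a group, clone or
# ms definition all sit on the first line of that definition.
cleanup_target() {
    awk -v want="$1" '
    # Record child -> parent links from the first line of each container.
    $1 == "group" || $1 == "clone" || $1 == "ms" {
        for (i = 3; i <= NF; i++) {
            # Stop at a line continuation or where meta/params begin.
            if ($i == "\\" || $i == "meta" || $i == "params" || index($i, "=") > 0)
                break
            parent[$i] = $2
        }
    }
    END {
        # Walk upwards; the hop limit only guards against a malformed config.
        id = want
        hops = 0
        while ((id in parent) && hops < 100) {
            id = parent[id]
            hops++
        }
        print id
    }'
}
```

Usage would then be along the lines of `crm resource cleanup "$(crm configure show | cleanup_target xen_drbd2_1)"`, which, for the layout above, expands to `crm resource cleanup DrbdClone2`.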
Re: [Linux-HA] How do I clear the Failed actions section?
On 08.03.2012 at 13:33, William Seligman wrote:
> On 3/8/12 6:53 AM, Helmut Wollmersdorfer wrote:
>> [...]
>>  Master/Slave Set: DrbdClone2
>>      Resource Group: group_drbd2:0
>>          xen_drbd2_1:0  (ocf::linbit:drbd):  Slave xen10 (unmanaged) FAILED
>>          xen_drbd2_2:0  (ocf::linbit:drbd):  Stopped
>>      Masters: [ xen11 ]
>> [...]
>>     xen_drbd2_1:1_promote_0 (node=xen10, call=790, rc=1, status=complete): unknown error
>> [...]
>>     xen_drbd2_1:0_promote_0 (node=xen10, call=1326, rc=-2, status=Timed Out): unknown exec error
>>     xen_drbd2_1:0_stop_0 (node=xen10, call=1348, rc=-2, status=Timed Out): unknown exec error
>>
>> xen11:# crm resource cleanup xen_drbd2_1
>> Error performing operation: The object/attribute does not exist
>> Error performing operation: The object/attribute does not exist
>
> Given the list of resources displayed by crm_mon, the command you need is
>
> crm resource cleanup DrbdClone2

Thanks. Works fine.

> I can't say whether that will fix your problems, but you won't get the "does not exist" message. Somewhere in either "Pacemaker Explained" or "Clusters From Scratch", it says that once you clone or ms a resource, you can't refer to that resource as an individual anymore; you have to use the clone/ms name.
>
> What I did when faced with a problem like yours was to cat /proc/drbd, look at the lines for the failed DRBD resource, and fix it on my own. Then I'd type the cleanup command so that Pacemaker picks up the current state of the resource.

The DRBD resources are fine (see below). For some reason, the failed-action messages in the CRM sometimes do not get cleared.

<resource minor="3" name="drbd2_1" cs="Connected" ro1="Primary" ro2="Secondary" ds1="UpToDate" ds2="UpToDate" />
<resource minor="4" name="drbd2_2" cs="Connected" ro1="Primary" ro2="Secondary" ds1="UpToDate" ds2="UpToDate" />

Thanks again,
Helmut Wollmersdorfer
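William's order of operations (check DRBD first, then clean up) can be scripted. The sketch below reads DRBD 8.3 `drbdadm status` XML on stdin and succeeds only if the named resource is Connected with both disk states UpToDate (`ds1` is the local disk, `ds2` the peer's). It assumes one `<resource .../>` element per line with double-quoted attributes; `drbd_healthy` is a hypothetical helper name.

```shell
# Hypothetical helper: read DRBD 8.3 "drbdadm status" XML on stdin and succeed
# only if the named resource is Connected and UpToDate on both sides.
# Assumes one <resource .../> element per line with double-quoted attributes.
drbd_healthy() {
    awk -v name="$1" '
    /<resource / && index($0, "name=\"" name "\"") > 0 {
        found = 1
        ok = (index($0, "cs=\"Connected\"") > 0 &&
              index($0, "ds1=\"UpToDate\"") > 0 &&
              index($0, "ds2=\"UpToDate\"") > 0)
    }
    # Exit status 0 means "found and healthy"; anything else is non-zero.
    END { exit !(found && ok) }'
}
```

On xen11, `drbdadm status | drbd_healthy drbd2_1 && crm resource cleanup DrbdClone2` would clear the failure history only when the underlying DRBD resource looks sound. Against the full output posted in this thread it would accept drbd2_1 but reject drbd6_1 and drbd6_2, which are StandAlone and Outdated.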
Re: [Linux-HA] How do I clear the Failed actions section?
I just wanted to share that the recommended command did NOT move the resource to another node. It simply clears the Failed actions section.

Thanks again, Bill.

Regards,
j

On Tue, Mar 6, 2012 at 11:46 AM, William Seligman selig...@nevis.columbia.edu wrote:
> On 3/6/12 2:38 PM, Jerome Yanga wrote:
>> Do you know by chance whether the command you provided bounces the resource?
>
> I don't know what you mean by "bounce the resource". According to http://www.clusterlabs.org/doc/crm_cli.html the command refreshes the resource status. Depending on your configuration, it might shift a resource to another node.
>
> But I am not an expert! I merely knew how to clear up the error message.
> [...]
Re: [Linux-HA] How do I clear the Failed actions section?
On 3/7/12 10:50 AM, Jerome Yanga wrote:
> I just wanted to share that the recommended command did NOT move the resource to another node. It simply clears the Failed actions section.

This is why I was conditional in my response. Suppose you had something like the following:

primitive MyResource ocf:heartbeat:Dummy
location MyResourcePreferredNode MyResource 10: my-node-a.example.com

with no resource-stickiness set. Assume MyResource fails on my-node-a and is moved to my-node-b. Then if you were to run

crm resource cleanup MyResource

Pacemaker might move MyResource back to my-node-a. It might even move it back without that example MyResourcePreferredNode constraint. If you want to avoid that, consider per-resource or global resource-stickiness:

http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/ch05s03s02.html
http://www.gossamer-threads.com/lists/linuxha/pacemaker/64076

Again, a disclaimer: I am not an expert.

> On Tue, Mar 6, 2012 at 11:46 AM, William Seligman selig...@nevis.columbia.edu wrote:
> [...]

--
Bill Seligman             | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://selig...@nevis.columbia.edu
PO Box 137                |
Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/
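Both stickiness options can be set from the crm shell. This is a configuration sketch assuming Pacemaker 1.0-era crm shell syntax; the value 100 is an arbitrary illustration, not something taken from the thread.

```shell
# Cluster-wide default stickiness for every resource (crm shell, one-shot mode):
crm configure rsc_defaults resource-stickiness=100

# ...or stickiness for the example primitive only:
crm resource meta MyResource set resource-stickiness 100
```

With either setting and the example location constraint, after a failover my-node-b would score 100 from stickiness against 10 from the location preference for my-node-a, so a later `crm resource cleanup MyResource` should leave the resource where it is. Any stickiness larger than the location score would have the same effect.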
Re: [Linux-HA] How do I clear the Failed actions section?
On Wed, Mar 7, 2012 at 5:51 PM, William Seligman selig...@nevis.columbia.edu wrote:
> Again, a disclaimer: I am not an expert.

Your advice was spot on. :)

Cheers,
Florian

--
Need help with High Availability?
http://www.hastexo.com/now
Re: [Linux-HA] How do I clear the Failed actions section?
On 3/6/12 1:04 PM, Jerome Yanga wrote:
> crm_mon shows the error below.
>
> Failed actions:
>     drbd0:1_monitor_59000 (node=testserver1.example.com, call=132, rc=-2, status=Timed Out): unknown exec error
>
> I have checked DRBD, and the mirror is connected and UpToDate on both nodes. The error above caused the resources to fail over, and everything seems to be working OK. However, the Failed actions section has not disappeared. How do I clear this error?

crm resource cleanup drbd0

--
Bill Seligman             | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://selig...@nevis.columbia.edu
PO Box 137                |
Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/
Re: [Linux-HA] How do I clear the Failed actions section?
Thanks, Bill.

Do you know by chance whether the command you provided bounces the resource?

Regards,
j

On Tue, Mar 6, 2012 at 10:28 AM, William Seligman selig...@nevis.columbia.edu wrote:
> On 3/6/12 1:04 PM, Jerome Yanga wrote:
>> [...]
>
> crm resource cleanup drbd0
Re: [Linux-HA] How do I clear the Failed actions section?
Understood. Thanks again, Bill.

Regards,
j

On Tue, Mar 6, 2012 at 11:46 AM, William Seligman selig...@nevis.columbia.edu wrote:
> On 3/6/12 2:38 PM, Jerome Yanga wrote:
>> Do you know by chance whether the command you provided bounces the resource?
>
> I don't know what you mean by "bounce the resource". According to http://www.clusterlabs.org/doc/crm_cli.html the command refreshes the resource status. Depending on your configuration, it might shift a resource to another node.
>
> But I am not an expert! I merely knew how to clear up the error message.
> [...]