Re: [Linux-HA] How do I clear the Failed actions section?

2012-03-08 Thread Helmut Wollmersdorfer

On 07.03.2012 at 18:01, Florian Haas wrote:

 On Wed, Mar 7, 2012 at 5:51 PM, William Seligman
 selig...@nevis.columbia.edu wrote:
 Again, a disclaimer: I am not an expert.

 Your advice was spot on. :)

But what can be done if cleanup does not work? Everything is running:

# crm status

Last updated: Thu Mar  8 12:27:00 2012
Stack: Heartbeat
Current DC: xen10 (5ab5ba3d-3be5-4763-83e7-90aaa49361a6) - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, unknown expected votes
12 Resources configured.


Online: [ xen10 xen11 ]

 xen_www        (ocf::heartbeat:Xen):   Started xen11
 Master/Slave Set: DrbdClone1
     Masters: [ xen11 ]
     Slaves: [ xen10 ]
 xen_typo3      (ocf::heartbeat:Xen):   Started xen11
 xen_shopdb     (ocf::heartbeat:Xen):   Started xen10
 xen_admintool  (ocf::heartbeat:Xen):   Started xen11
 xen_cmsdb      (ocf::heartbeat:Xen):   Started xen11
 Master/Slave Set: DrbdClone2
     Resource Group: group_drbd2:0
         xen_drbd2_1:0  (ocf::linbit:drbd):     Slave xen10 (unmanaged) FAILED
         xen_drbd2_2:0  (ocf::linbit:drbd):     Stopped
     Masters: [ xen11 ]
 Master/Slave Set: DrbdClone3
     Masters: [ xen10 ]
     Slaves: [ xen11 ]
 Master/Slave Set: DrbdClone5
     Masters: [ xen11 ]
     Slaves: [ xen10 ]
 Master/Slave Set: DrbdClone6
     Slaves: [ xen11 xen10 ]
 Master/Slave Set: DrbdClone4
     Masters: [ xen11 ]
     Slaves: [ xen10 ]

Failed actions:
 xen_cmsdb_monitor_3000 (node=xen10, call=571, rc=7, status=complete): not running
 xen_drbd1_2:1_promote_0 (node=xen10, call=5205, rc=1, status=complete): unknown error
 xen_drbd2_1:1_promote_0 (node=xen10, call=790, rc=1, status=complete): unknown error
 xen_ns2_monitor_3000 (node=xen10, call=601, rc=7, status=complete): not running
 xen_drbd3_1:1_promote_0 (node=xen10, call=383, rc=-2, status=Timed Out): unknown exec error
 xen_drbd2_1:0_promote_0 (node=xen10, call=1326, rc=-2, status=Timed Out): unknown exec error
 xen_drbd2_1:0_stop_0 (node=xen10, call=1348, rc=-2, status=Timed Out): unknown exec error

xen11:# crm resource cleanup xen_drbd2_1
Error performing operation: The object/attribute does not exist
Error performing operation: The object/attribute does not exist

# xm list
Name        ID   Mem VCPUs  State  Time(s)
Domain-0     0  100516      r-     40648.5
admintool    5  4096     2  -b      7455.4
cmsdb        3  2048     2  -b      2106.5
typo3        2  1024     2  -b      2890.9
www          1  1024     1  -b       855.0


xen11:# drbdadm status
<drbd-status version="8.3.7" api="88">
<resources config_file="/etc/drbd.conf">
<resource minor="1" name="drbd1_1" cs="Connected" ro1="Primary" ro2="Secondary" ds1="UpToDate" ds2="UpToDate" />
<resource minor="2" name="drbd1_2" cs="Connected" ro1="Primary" ro2="Secondary" ds1="UpToDate" ds2="UpToDate" />
<resource minor="3" name="drbd2_1" cs="Connected" ro1="Primary" ro2="Secondary" ds1="UpToDate" ds2="UpToDate" />
<resource minor="4" name="drbd2_2" cs="Connected" ro1="Primary" ro2="Secondary" ds1="UpToDate" ds2="UpToDate" />
<resource minor="5" name="drbd3_1" cs="Connected" ro1="Secondary" ro2="Primary" ds1="UpToDate" ds2="UpToDate" />
<resource minor="6" name="drbd3_2" cs="Connected" ro1="Secondary" ro2="Primary" ds1="UpToDate" ds2="UpToDate" />
<resource minor="7" name="drbd4_1" cs="Connected" ro1="Primary" ro2="Secondary" ds1="UpToDate" ds2="UpToDate" />
<resource minor="8" name="drbd4_2" cs="Connected" ro1="Primary" ro2="Secondary" ds1="UpToDate" ds2="UpToDate" />
<resource minor="9" name="drbd5_1" cs="Connected" ro1="Primary" ro2="Secondary" ds1="UpToDate" ds2="UpToDate" />
<resource minor="10" name="drbd5_2" cs="Connected" ro1="Primary" ro2="Secondary" ds1="UpToDate" ds2="UpToDate" />
<resource minor="11" name="drbd6_1" cs="StandAlone" ro1="Secondary" ro2="Unknown" ds1="Outdated" ds2="DUnknown" />
<resource minor="12" name="drbd6_2" cs="StandAlone" ro1="Secondary" ro2="Unknown" ds1="Outdated" ds2="DUnknown" />
<!-- resource minor="13" name="drbd7_1" not available or not yet created -->
<!-- resource minor="14" name="drbd7_2" not available or not yet created -->
<!-- resource minor="15" name="drbd8_1" not available or not yet created -->
<!-- resource minor="16" name="drbd8_2" not available or not yet created -->
</resources>
</drbd-status>

Helmut Wollmersdorfer


___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] How do I clear the Failed actions section?

2012-03-08 Thread William Seligman

On 3/8/12 6:53 AM, Helmut Wollmersdorfer wrote:


xen11:# crm resource cleanup xen_drbd2_1
Error performing operation: The object/attribute does not exist
Error performing operation: The object/attribute does not exist


Given the list of resources displayed by crm_mon, the command you need is

crm resource cleanup DrbdClone2

I can't say whether that will fix your problems, but you won't get the
"does not exist" message.


Somewhere in either "Pacemaker Explained" or "Clusters From Scratch", it
says that once you clone or ms a resource, you can't refer to that
resource as an individual anymore; you have to use the clone/ms name.
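
As a rough illustration of that rule, the sketch below walks from a failed
primitive up to its outermost enclosing set and prints the matching cleanup
command. The PARENTS mapping is typed in by hand from the crm status output
earlier in this thread (xen_drbd2_1 sits in group_drbd2, which sits in
DrbdClone2). Both the mapping and the function name cleanup_target are
assumptions made for illustration; they are not part of crm or Pacemaker.

```python
# Hand-written from the "crm status" output in this thread; a real cluster
# would need its own mapping (hypothetical helper data, not a Pacemaker API).
PARENTS = {
    "xen_drbd2_1": "group_drbd2",
    "xen_drbd2_2": "group_drbd2",
    "group_drbd2": "DrbdClone2",
}


def cleanup_target(resource_id, parents=PARENTS):
    """Return the outermost enclosing resource name for a primitive.

    Strips a clone instance suffix such as ":0" first, then follows the
    parent links until no further parent is known.
    """
    name = resource_id.split(":", 1)[0]
    seen = set()  # guards against a mistyped mapping that loops
    while name in parents and name not in seen:
        seen.add(name)
        name = parents[name]
    return name


if __name__ == "__main__":
    print("crm resource cleanup " + cleanup_target("xen_drbd2_1:0"))
```

A resource that is not listed in the mapping, such as xen_www, is returned
unchanged, which matches the case where a plain primitive is cleaned up
under its own name.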


What I did when faced with a problem like yours was to run "cat /proc/drbd",
look at the lines for the failed DRBD resource, and fix it by hand. Then I'd
run the cleanup command so that pacemaker picks up the current state of
the resource.




Re: [Linux-HA] How do I clear the Failed actions section?

2012-03-08 Thread Helmut Wollmersdorfer

On 08.03.2012 at 13:33, William Seligman wrote:

 On 3/8/12 6:53 AM, Helmut Wollmersdorfer wrote:

 [...]

   Master/Slave Set: DrbdClone2
   Resource Group: group_drbd2:0
   xen_drbd2_1:0  (ocf::linbit:drbd): Slave xen10 (unmanaged) FAILED
   xen_drbd2_2:0  (ocf::linbit:drbd): Stopped
   Masters: [ xen11 ]
 [...]
  xen_drbd2_1:1_promote_0 (node=xen10, call=790, rc=1, status=complete): unknown error
 [...]
  xen_drbd2_1:0_promote_0 (node=xen10, call=1326, rc=-2, status=Timed Out): unknown exec error
  xen_drbd2_1:0_stop_0 (node=xen10, call=1348, rc=-2, status=Timed Out): unknown exec error

 xen11:# crm resource cleanup xen_drbd2_1
 Error performing operation: The object/attribute does not exist
 Error performing operation: The object/attribute does not exist

 Given the list of resources displayed by crm_mon, the command you need is

 crm resource cleanup DrbdClone2

Thx. Works fine.


 I can't say whether that will fix your problems, but you won't get
 the "does not exist" message.

 Somewhere in either "Pacemaker Explained" or "Clusters From
 Scratch", it says that once you clone or ms a resource, you can't
 refer to that resource as an individual anymore; you have to use the
 clone/ms name.

 What I did when faced with a problem like yours is "cat /proc/drbd",
 look at the lines for the failed drbd, and fix it on my own. Then
 I'd type the cleanup command for pacemaker to pick up the current
 state of the resource.

The DRBD resources are fine (see below). For some reason, the failed-action
messages in the CRM sometimes do not get cleared.

 <resource minor="3" name="drbd2_1" cs="Connected" ro1="Primary" ro2="Secondary" ds1="UpToDate" ds2="UpToDate" />
 <resource minor="4" name="drbd2_2" cs="Connected" ro1="Primary" ro2="Secondary" ds1="UpToDate" ds2="UpToDate" />

Thx again

Helmut Wollmersdorfer





Re: [Linux-HA] How do I clear the Failed actions section?

2012-03-07 Thread Jerome Yanga
I just want to share that the recommended command did NOT move
the resource to another node.  It simply cleared the Failed Actions
section.

Thanks again, Bill.

Regards,
j

On Tue, Mar 6, 2012 at 11:46 AM, William Seligman
selig...@nevis.columbia.edu wrote:
 On 3/6/12 2:38 PM, Jerome Yanga wrote:

 Do you know by chance if that command you have provided bounces the resource?

 I don't know what you mean by "bounce the resource". According to:

 http://www.clusterlabs.org/doc/crm_cli.html

 the command refreshes the resource status. Depending on your configuration, it
 might shift a resource to another node.

 But I am not an expert! I merely knew how to clear up the error message.




Re: [Linux-HA] How do I clear the Failed actions section?

2012-03-07 Thread William Seligman
On 3/7/12 10:50 AM, Jerome Yanga wrote:
 I would just want to share that the command recommended did NOT move
 the resource to another node.  It basically clears the Failed Actions
 section.

This is why I was conditional in my response. Suppose you had something like the
following:

primitive MyResource ocf:heartbeat:Dummy
location MyResourcePreferredNode MyResource 10: my-node-a.example.com

with no resource-stickiness set. Assume MyResource fails on my-node-a, and is
moved to my-node-b. Then if you were to do:

crm resource cleanup MyResource

pacemaker might move MyResource back to my-node-a. It might even move it back
without that example MyResourcePreferredNode constraint. If you want to avoid
that, consider per-resource or global resource-stickiness:

http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/ch05s03s02.html
http://www.gossamer-threads.com/lists/linuxha/pacemaker/64076
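
To make the score arithmetic concrete, here is a deliberately simplified
model: each node's score is its location preference, plus resource-stickiness
on the node where the resource currently runs, and the highest score wins.
This is a toy sketch only. Pacemaker's real placement logic considers much
more (failcounts, colocation, ordering), and the function name place is
hypothetical. The score 10 and the node names come from the example above;
the stickiness value 100 is an arbitrary illustration.

```python
def place(location_scores, current_node, stickiness=0):
    """Pick the node with the highest score in this toy model.

    location_scores: dict of node name -> location constraint score
    current_node:    node where the resource runs right now
    stickiness:      bonus added to the current node's score
    """
    scores = dict(location_scores)
    scores[current_node] = scores.get(current_node, 0) + stickiness
    # On a tie this sketch stays put; the real cluster may decide otherwise.
    return max(scores, key=lambda node: (scores[node], node == current_node))


prefs = {"my-node-a.example.com": 10, "my-node-b.example.com": 0}

# After a cleanup, with no stickiness, the preferred node wins again.
print(place(prefs, "my-node-b.example.com"))
# With stickiness above the location score, the resource stays where it is.
print(place(prefs, "my-node-b.example.com", stickiness=100))
```

In this model the resource only moves back when the location score on the
preferred node exceeds the stickiness bonus on the current one.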

Again, a disclaimer: I am not an expert.


-- 
Bill Seligman             | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://selig...@nevis.columbia.edu
PO Box 137                |
Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/




Re: [Linux-HA] How do I clear the Failed actions section?

2012-03-07 Thread Florian Haas
On Wed, Mar 7, 2012 at 5:51 PM, William Seligman
selig...@nevis.columbia.edu wrote:
 Again, a disclaimer: I am not an expert.

Your advice was spot on. :)

Cheers,
Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now


Re: [Linux-HA] How do I clear the Failed actions section?

2012-03-06 Thread William Seligman
On 3/6/12 1:04 PM, Jerome Yanga wrote:
 crm_mon shows the error below.
 
 Failed actions:
 drbd0:1_monitor_59000 (node=testserver1.example.com, call=132, rc=-2, status=Timed Out): unknown exec error
 
 I have checked DRBD and the mirror is connected and up to date on both nodes.
 
 The error above caused the resources to fail over, and it seems to be
 working OK.  However, the failed actions section has not disappeared.
 How do I clear this error?

crm resource cleanup drbd0
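
To see which name to pass to cleanup, it can help to pick the Failed actions
entry apart. The sketch below assumes the entry has the shape shown in the
crm_mon output quoted above, that is <resource>[:<instance>]_<operation>_<interval>
followed by key=value pairs in parentheses, and that the wrapped entry has
been joined into one line first. The regular expression and the function
name parse_failed_action are illustrative assumptions, not a Pacemaker
interface.

```python
import re

# Assumed shape, taken from the crm_mon entry quoted in this thread:
#   drbd0:1_monitor_59000 (node=..., call=132, rc=-2, status=Timed Out): unknown exec error
FAILED_ACTION = re.compile(
    r"^\s*(?P<resource>.+?)(?::(?P<instance>\d+))?"
    r"_(?P<operation>[a-z]+)_(?P<interval>\d+)\s+"
    r"\((?P<details>[^)]*)\):\s*(?P<reason>.*)$"
)


def parse_failed_action(line):
    """Split one Failed actions entry into a dict, or return None."""
    match = FAILED_ACTION.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # "node=x, call=132, rc=-2, status=Timed Out" -> {"node": "x", ...}
    for item in fields.pop("details").split(","):
        if "=" in item:
            key, value = item.strip().split("=", 1)
            fields[key] = value
    return fields


if __name__ == "__main__":
    entry = parse_failed_action(
        "drbd0:1_monitor_59000 (node=testserver1.example.com, call=132, "
        "rc=-2, status=Timed Out): unknown exec error"
    )
    # The resource name without its ":1" instance suffix is what cleanup takes.
    print("crm resource cleanup " + entry["resource"])
```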

-- 
Bill Seligman             | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://selig...@nevis.columbia.edu
PO Box 137                |
Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/




Re: [Linux-HA] How do I clear the Failed actions section?

2012-03-06 Thread Jerome Yanga
Thanks, Bill.

Do you happen to know whether the command you provided bounces the resource?

Regards,
j

On Tue, Mar 6, 2012 at 10:28 AM, William Seligman
selig...@nevis.columbia.edu wrote:
 crm resource cleanup drbd0



Re: [Linux-HA] How do I clear the Failed actions section?

2012-03-06 Thread Jerome Yanga
Understood.  Thanks again, Bill.

Regards,
j

On Tue, Mar 6, 2012 at 11:46 AM, William Seligman
selig...@nevis.columbia.edu wrote:
 On 3/6/12 2:38 PM, Jerome Yanga wrote:

 Do you know by chance if that command you have provided bounces the resource?

 I don't know what you mean by "bounce the resource". According to:

 http://www.clusterlabs.org/doc/crm_cli.html

 the command refreshes the resource status. Depending on your configuration, it
 might shift a resource to another node.

 But I am not an expert! I merely knew how to clear up the error message.

