On 18 Feb 2014, at 5:52 am, Asgaroth <li...@blueface.com> wrote:

>> -----Original Message-----
>> From: Andrew Beekhof [mailto:and...@beekhof.net]
>> Sent: 17 February 2014 00:55
>> To: li...@blueface.com; The Pacemaker cluster resource manager
>> Subject: Re: [Pacemaker] node1 fencing itself after node2 being fenced
>>
>>
>> If you have configured cman to use fence_pcmk, then all cman/dlm/clvmd
>> fencing operations are sent to Pacemaker.
>> If you aren't running pacemaker, then you have a big problem as no-one can
>> perform fencing.
>
> I have configured pacemaker as the resource manager and I have it enabled to
> start on boot-up too as follows:
>
> chkconfig cman on
> chkconfig clvmd on
> chkconfig pacemaker on
>
>>
>> I don't know if you are testing without pacemaker running, but if so you
>> would need to configure cman with real fencing devices.
>>
>
> I have been testing with pacemaker running and the fencing appears to be
> operating fine, the issue I seem to have is that clvmd is unable re-acquire
> its locks when attempting to rejoin the cluster after a fence operation, so
> it looks like clvmd just hangs when the startup script fires it off on
> boot-up. When the 3rd node is in this state (hung clvmd), then the other 2
> nodes are unable to obtain locks from the third node as clvmd has hung, as
> an example, this is what happens when the 3rd node is hung at the clvmd
> startup phase after pacemaker has issued a fence operation (running pvs on
> node1)

The 3rd node should (and needs to be) fenced at this point to allow the cluster to continue.
Is this not happening?
Did you specify on-fail=fence for the clvmd agent?

>
> [root@test01 ~]# pvs
>   Error locking on node test03: Command timed out
>   Unable to obtain global lock.
>
> The dlm elements look fine to me here too:
>
> [root@test01 ~]# dlm_tool ls
> dlm lockspaces
> name          cdr
> id            0xa8054052
> flags         0x00000008 fs_reg
> change        member 2 joined 0 remove 1 failed 1 seq 2,2
> members       1 2
>
> name          clvmd
> id            0x4104eefa
> flags         0x00000000
> change        member 3 joined 1 remove 0 failed 0 seq 3,3
> members       1 2 3
>
> So it looks like cman/dlm are operating properly, however, clvmd hangs and
> never exits so pacemaker never starts on the 3rd node. So the 3rd node is in
> "pending" state while clvmd is hung:
>
> [root@test02 ~]# crm_mon -Afr -1
> Last updated: Mon Feb 17 15:52:28 2014
> Last change: Mon Feb 17 15:43:16 2014 via cibadmin on test01
> Stack: cman
> Current DC: test02 - partition with quorum
> Version: 1.1.10-14.el6_5.2-368c726
> 3 Nodes configured
> 15 Resources configured
>
>
> Node test03: pending
> Online: [ test01 test02 ]
>
> Full list of resources:
>
>  fence_test01 (stonith:fence_vmware_soap): Started test01
>  fence_test02 (stonith:fence_vmware_soap): Started test02
>  fence_test03 (stonith:fence_vmware_soap): Started test01
>  Clone Set: fs_cdr-clone [fs_cdr]
>      Started: [ test01 test02 ]
>      Stopped: [ test03 ]
>  Resource Group: sftp01-vip
>      vip-001 (ocf::heartbeat:IPaddr2): Started test01
>      vip-002 (ocf::heartbeat:IPaddr2): Started test01
>  Resource Group: sftp02-vip
>      vip-003 (ocf::heartbeat:IPaddr2): Started test02
>      vip-004 (ocf::heartbeat:IPaddr2): Started test02
>  Resource Group: sftp03-vip
>      vip-005 (ocf::heartbeat:IPaddr2): Started test02
>      vip-006 (ocf::heartbeat:IPaddr2): Started test02
>  sftp01 (lsb:sftp01): Started test01
>  sftp02 (lsb:sftp02): Started test02
>  sftp03 (lsb:sftp03): Started test02
>
> Node Attributes:
> * Node test01:
> * Node test02:
> * Node test03:
>
> Migration summary:
> * Node test03:
> * Node test02:
> * Node test01:
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org