Digimer,

Yes, for the configuration that includes DRBD, both the 'crm-fence-peer.sh' handler and the 'resource-and-stonith' fencing policy are in place.
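For anyone following along, this is roughly what those two settings look like in a DRBD resource definition. It is a minimal sketch, not the actual configuration from this cluster: it assumes DRBD 8.4 syntax, a hypothetical resource name "r0", and the handler scripts in their usual /usr/lib/drbd location. The crm-unfence-peer.sh handler is not mentioned in this thread; it is shown only because it is normally paired with crm-fence-peer.sh to clear the constraint after resync.

```
# Sketch only. "r0" is a hypothetical resource name; script paths are assumed.
resource r0 {
  disk {
    # Freeze I/O and call the fence-peer handler when the peer is lost.
    fencing resource-and-stonith;
  }
  handlers {
    # Places a Pacemaker constraint so the outdated peer cannot be promoted.
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    # Removes that constraint once the peer has resynced (assumed pairing).
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}
```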
Thanks,
Bob Haxo

On Wed, 2014-01-01 at 01:04 -0500, Digimer wrote:
> Did you hook DRBD into pacemaker's fencing using 'crm-fence-peer.sh' and
> set the fencing policy to 'resource-and-stonith;'? If not, do so! It
> will protect against split-brains.
>
> digimer
>
> On 01/01/14 01:03 AM, Bob Haxo wrote:
> > Digimer,
> >
> > Ok, sounds reasonable and I will investigate this further on Jan 2. WRT
> > DRBD ... geeee, I don't recall multiple fencings. I'll check that also
> > on Jan 2.
> >
> > Emmanuel,
> >
> > I have not seen pending fencing operations with "dlm_tool ls" ... but I
> > have seen the word "pending" elsewhere (crm_mon?) without considering
> > that it might be fencing that is pending. Interesting.
> >
> > Thanks & my best wishes for a healthy new year.
> > Bob Haxo
> >
> > On Wed, 2014-01-01 at 00:19 -0500, Digimer wrote:
> >> This is probably because cman (which is its own cluster stack and used
> >> to provide DLM and quorum to pacemaker on EL6) detected the node failed
> >> after the initial fence and called its own fence. You see a similar
> >> behaviour when using DRBD. It will also call a fence when the peer dies
> >> (even when it died because of a controlled fence call). In theory,
> >> pacemaker using cman's dlm with DRBD would trigger three fences per
> >> failure. :)
> >>
> >> digimer
> >>
> >> On 01/01/14 12:04 AM, emmanuel segura wrote:
> >> > Maybe you are missing the log from when you fenced the node? I think
> >> > clvmd hung because your node is in an unclean state. Use dlm_tool ls
> >> > to see if you have any pending fencing operations.
> >> >
> >> > 2014/1/1 Bob Haxo <bh...@sgi.com>
> >> >
> >> >     Greetings ... Happy New Year!
> >> >
> >> >     I am testing a configuration that is created from the example in
> >> >     "Chapter 6. Configuring a GFS2 File System in a Cluster" of the "Red
> >> >     Hat Enterprise Linux 7.0 Beta Global File System 2" document. The only
> >> >     addition is stonith:fence_ipmilan. After encountering this issue
> >> >     when I configured with "crm", I re-configured using "pcs". I've
> >> >     included the configuration below.
> >> >
> >> >     I'm thinking that, in a 2-node cluster, if I run "stonith_admin -F
> >> >     <peer-node>", then <peer-node> should reboot and cleanly rejoin the
> >> >     cluster. This is not happening.
> >> >
> >> >     What ultimately happens is that after the initially fenced node
> >> >     reboots, the system from which the stonith_admin -F command was run
> >> >     is fenced and reboots. The fencing stops there, leaving the cluster
> >> >     in an appropriate state.
> >> >
> >> >     The issue seems to reside with clvmd/lvm. With the reboot of the
> >> >     initially fenced node, the clvmd resource fails on the surviving
> >> >     node, with a maximum of errors. I hypothesize there is an issue
> >> >     with locks, but have insufficient knowledge of clvmd/lvm locks to
> >> >     prove or disprove this hypothesis.
> >> >
> >> >     Have I missed something ...
> >> >
> >> >     1) Is this expected behavior, and does the reboot of the fencing
> >> >     node always happen?
> >> >
> >> >     2) Or, maybe I didn't correctly duplicate the Chapter 6 example?
> >> >
> >> >     3) Or, perhaps something is wrong or omitted from the Chapter 6
> >> >     example?
> >> >
> >> >     Suggestions will be much appreciated.
> >> >
> >> >     Thanks,
> >> >     Bob Haxo
> >> >
> >> >     RHEL6.5
> >> >     pacemaker-cli-1.1.10-14.el6_5.1.x86_64
> >> >     crmsh-1.2.5-55.1sgi709r3.rhel6.x86_64
> >> >     pacemaker-libs-1.1.10-14.el6_5.1.x86_64
> >> >     cman-3.0.12.1-59.el6_5.1.x86_64
> >> >     pacemaker-1.1.10-14.el6_5.1.x86_64
> >> >     corosynclib-1.4.1-17.el6.x86_64
> >> >     corosync-1.4.1-17.el6.x86_64
> >> >     pacemaker-cluster-libs-1.1.10-14.el6_5.1.x86_64
> >> >
> >> >     Cluster Name: mici
> >> >     Corosync Nodes:
> >> >
> >> >     Pacemaker Nodes:
> >> >      mici-admin mici-admin2
> >> >
> >> >     Resources:
> >> >      Clone: clusterfs-clone
> >> >       Meta Attrs: interleave=true target-role=Started
> >> >       Resource: clusterfs (class=ocf provider=heartbeat type=Filesystem)
> >> >        Attributes: device=/dev/vgha2/lv_clust2 directory=/images
> >> >                    fstype=gfs2 options=defaults,noatime,nodiratime
> >> >        Operations: monitor on-fail=fence interval=30s
> >> >                    (clusterfs-monitor-interval-30s)
> >> >      Clone: clvmd-clone
> >> >       Meta Attrs: interleave=true ordered=true target-role=Started
> >> >       Resource: clvmd (class=lsb type=clvmd)
> >> >        Operations: monitor on-fail=fence interval=30s
> >> >                    (clvmd-monitor-interval-30s)
> >> >      Clone: dlm-clone
> >> >       Meta Attrs: interleave=true ordered=true
> >> >       Resource: dlm (class=ocf provider=pacemaker type=controld)
> >> >        Operations: monitor on-fail=fence interval=30s
> >> >                    (dlm-monitor-interval-30s)
> >> >
> >> >     Stonith Devices:
> >> >      Resource: p_ipmi_fencing_1 (class=stonith type=fence_ipmilan)
> >> >       Attributes: ipaddr=128.##.##.78 login=XXXXX passwd=XXXXX
> >> >                   lanplus=1 action=reboot pcmk_host_check=static-list
> >> >                   pcmk_host_list=mici-admin
> >> >       Meta Attrs: target-role=Started
> >> >       Operations: monitor start-delay=30 interval=60s timeout=30
> >> >                   (p_ipmi_fencing_1-monitor-60s)
> >> >      Resource: p_ipmi_fencing_2 (class=stonith type=fence_ipmilan)
> >> >       Attributes: ipaddr=128.##.##.220 login=XXXXX passwd=XXXXX
> >> >                   lanplus=1 action=reboot pcmk_host_check=static-list
> >> >                   pcmk_host_list=mici-admin2
> >> >       Meta Attrs: target-role=Started
> >> >       Operations: monitor start-delay=30 interval=60s timeout=30
> >> >                   (p_ipmi_fencing_2-monitor-60s)
> >> >     Fencing Levels:
> >> >
> >> >     Location Constraints:
> >> >       Resource: p_ipmi_fencing_1
> >> >         Disabled on: mici-admin (score:-INFINITY)
> >> >                      (id:location-p_ipmi_fencing_1-mici-admin--INFINITY)
> >> >       Resource: p_ipmi_fencing_2
> >> >         Disabled on: mici-admin2 (score:-INFINITY)
> >> >                      (id:location-p_ipmi_fencing_2-mici-admin2--INFINITY)
> >> >     Ordering Constraints:
> >> >       start dlm-clone then start clvmd-clone (Mandatory)
> >> >         (id:order-dlm-clone-clvmd-clone-mandatory)
> >> >       start clvmd-clone then start clusterfs-clone (Mandatory)
> >> >         (id:order-clvmd-clone-clusterfs-clone-mandatory)
> >> >     Colocation Constraints:
> >> >       clusterfs-clone with clvmd-clone (INFINITY)
> >> >         (id:colocation-clusterfs-clone-clvmd-clone-INFINITY)
> >> >       clvmd-clone with dlm-clone (INFINITY)
> >> >         (id:colocation-clvmd-clone-dlm-clone-INFINITY)
> >> >
> >> >     Cluster Properties:
> >> >      cluster-infrastructure: cman
> >> >      dc-version: 1.1.10-14.el6_5.1-368c726
> >> >      last-lrm-refresh: 1388530552
> >> >      no-quorum-policy: ignore
> >> >      stonith-enabled: true
> >> >     Node Attributes:
> >> >      mici-admin: standby=off
> >> >      mici-admin2: standby=off
> >> >
> >> >
> >> >     Last updated: Tue Dec 31 17:15:55 2013
> >> >     Last change: Tue Dec 31 16:57:37 2013 via cibadmin on mici-admin
> >> >     Stack: cman
> >> >     Current DC: mici-admin2 - partition with quorum
> >> >     Version: 1.1.10-14.el6_5.1-368c726
> >> >     2 Nodes configured
> >> >     8 Resources configured
> >> >
> >> >     Online: [ mici-admin mici-admin2 ]
> >> >
> >> >     Full list of resources:
> >> >
> >> >     p_ipmi_fencing_1 (stonith:fence_ipmilan): Started mici-admin2
> >> >     p_ipmi_fencing_2 (stonith:fence_ipmilan): Started mici-admin
> >> >      Clone Set: clusterfs-clone [clusterfs]
> >> >          Started: [ mici-admin mici-admin2 ]
> >> >      Clone Set: clvmd-clone [clvmd]
> >> >          Started: [ mici-admin mici-admin2 ]
> >> >      Clone Set: dlm-clone [dlm]
> >> >          Started: [ mici-admin mici-admin2 ]
> >> >
> >> >     Migration summary:
> >> >     * Node mici-admin:
> >> >     * Node mici-admin2:
> >> >
> >> >     =====================================================
> >> >     crm_mon after the fenced node reboots. Shows the failure of clvmd
> >> >     that then occurs, which in turn triggers a fencing of that node.
> >> >
> >> >     Last updated: Tue Dec 31 17:06:55 2013
> >> >     Last change: Tue Dec 31 16:57:37 2013 via cibadmin on mici-admin
> >> >     Stack: cman
> >> >     Current DC: mici-admin - partition with quorum
> >> >     Version: 1.1.10-14.el6_5.1-368c726
> >> >     2 Nodes configured
> >> >     8 Resources configured
> >> >
> >> >     Node mici-admin: UNCLEAN (online)
> >> >     Online: [ mici-admin2 ]
> >> >
> >> >     Full list of resources:
> >> >
> >> >     p_ipmi_fencing_1 (stonith:fence_ipmilan): Stopped
> >> >     p_ipmi_fencing_2 (stonith:fence_ipmilan): Started mici-admin
> >> >      Clone Set: clusterfs-clone [clusterfs]
> >> >          Started: [ mici-admin ]
> >> >          Stopped: [ mici-admin2 ]
> >> >      Clone Set: clvmd-clone [clvmd]
> >> >          clvmd (lsb:clvmd): FAILED mici-admin
> >> >          Stopped: [ mici-admin2 ]
> >> >      Clone Set: dlm-clone [dlm]
> >> >          Started: [ mici-admin mici-admin2 ]
> >> >
> >> >     Migration summary:
> >> >     * Node mici-admin:
> >> >        clvmd: migration-threshold=1000000 fail-count=1
> >> >               last-failure='Tue Dec 31 17:04:29 2013'
> >> >     * Node mici-admin2:
> >> >
> >> >     Failed actions:
> >> >         clvmd_monitor_30000 on mici-admin 'unknown error' (1): call=60,
> >> >         status=Timed Out, last-rc-change='Tue Dec 31 17:04:29 2013',
> >> >         queued=0ms, exec=0ms
> >> >
> >> > --
> >> > this is my life and I will live it as long as God wills

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org