Greetings ... Happy New Year!

I am testing a configuration created from the example in "Chapter 6. Configuring a GFS2 File System in a Cluster" of the "Red Hat Enterprise Linux 7.0 Beta Global File System 2" document. The only addition is stonith:fence_ipmilan fencing. After encountering this issue when I configured with "crm", I re-configured using "pcs". I've included the configuration below.
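For reference, here is (approximately) how I created the configuration with pcs. This is reconstructed from the "pcs config" output below rather than from my shell history, so treat the exact command syntax as a best-effort approximation:

  # Cloned dlm, clvmd and GFS2 filesystem resources, per the Chapter 6 example:
  pcs resource create dlm ocf:pacemaker:controld \
      op monitor interval=30s on-fail=fence clone interleave=true ordered=true
  pcs resource create clvmd lsb:clvmd \
      op monitor interval=30s on-fail=fence clone interleave=true ordered=true
  pcs resource create clusterfs Filesystem device="/dev/vgha2/lv_clust2" \
      directory="/images" fstype="gfs2" options="defaults,noatime,nodiratime" \
      op monitor interval=30s on-fail=fence clone interleave=true

  # dlm -> clvmd -> filesystem: started in order and kept together:
  pcs constraint order start dlm-clone then clvmd-clone
  pcs constraint colocation add clvmd-clone with dlm-clone
  pcs constraint order start clvmd-clone then clusterfs-clone
  pcs constraint colocation add clusterfs-clone with clvmd-clone

  # My only addition: one fence_ipmilan device per node, each banned from
  # running on the node it fences:
  pcs stonith create p_ipmi_fencing_1 fence_ipmilan ipaddr="128.##.##.78" \
      login=XXXXX passwd=XXXXX lanplus=1 action=reboot \
      pcmk_host_check=static-list pcmk_host_list=mici-admin \
      op monitor interval=60s timeout=30 start-delay=30
  pcs constraint location p_ipmi_fencing_1 avoids mici-admin
  pcs stonith create p_ipmi_fencing_2 fence_ipmilan ipaddr="128.##.##.220" \
      login=XXXXX passwd=XXXXX lanplus=1 action=reboot \
      pcmk_host_check=static-list pcmk_host_list=mici-admin2 \
      op monitor interval=60s timeout=30 start-delay=30
  pcs constraint location p_ipmi_fencing_2 avoids mici-admin2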
I'm thinking that, in a two-node cluster, if I run "stonith_admin -F <peer-node>", then <peer-node> should reboot and cleanly rejoin the cluster. This is not happening. What ultimately happens is that after the initially fenced node reboots, the node from which the stonith_admin -F command was run is itself fenced and rebooted (the exact test sequence is spelled out after the crm_mon output below). The fencing stops there, leaving the cluster in an appropriate state.

The issue seems to reside with clvmd/lvm. When the initially fenced node reboots, the clvmd resource fails on the surviving node (the monitor operation times out; see the second crm_mon output below). I hypothesize that there is an issue with locks, but I have insufficient knowledge of clvmd/lvm locking to prove or disprove this hypothesis; the checks I plan to run are also sketched after the crm_mon output.

Have I missed something?
1) Is this expected behavior, with the node that initiated the fencing always being fenced and rebooted in turn?
2) Or did I fail to duplicate the Chapter 6 example correctly?
3) Or is something wrong with, or omitted from, the Chapter 6 example?

Suggestions will be much appreciated.

Thanks,
Bob Haxo

RHEL6.5
pacemaker-cli-1.1.10-14.el6_5.1.x86_64
crmsh-1.2.5-55.1sgi709r3.rhel6.x86_64
pacemaker-libs-1.1.10-14.el6_5.1.x86_64
cman-3.0.12.1-59.el6_5.1.x86_64
pacemaker-1.1.10-14.el6_5.1.x86_64
corosynclib-1.4.1-17.el6.x86_64
corosync-1.4.1-17.el6.x86_64
pacemaker-cluster-libs-1.1.10-14.el6_5.1.x86_64

Cluster Name: mici
Corosync Nodes:

Pacemaker Nodes:
 mici-admin mici-admin2

Resources:
 Clone: clusterfs-clone
  Meta Attrs: interleave=true target-role=Started
  Resource: clusterfs (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/vgha2/lv_clust2 directory=/images fstype=gfs2 options=defaults,noatime,nodiratime
   Operations: monitor on-fail=fence interval=30s (clusterfs-monitor-interval-30s)
 Clone: clvmd-clone
  Meta Attrs: interleave=true ordered=true target-role=Started
  Resource: clvmd (class=lsb type=clvmd)
   Operations: monitor on-fail=fence interval=30s (clvmd-monitor-interval-30s)
 Clone: dlm-clone
  Meta Attrs: interleave=true ordered=true
  Resource: dlm (class=ocf provider=pacemaker type=controld)
   Operations: monitor on-fail=fence interval=30s (dlm-monitor-interval-30s)

Stonith Devices:
 Resource: p_ipmi_fencing_1 (class=stonith type=fence_ipmilan)
  Attributes: ipaddr=128.##.##.78 login=XXXXX passwd=XXXXX lanplus=1 action=reboot pcmk_host_check=static-list pcmk_host_list=mici-admin
  Meta Attrs: target-role=Started
  Operations: monitor start-delay=30 interval=60s timeout=30 (p_ipmi_fencing_1-monitor-60s)
 Resource: p_ipmi_fencing_2 (class=stonith type=fence_ipmilan)
  Attributes: ipaddr=128.##.##.220 login=XXXXX passwd=XXXXX lanplus=1 action=reboot pcmk_host_check=static-list pcmk_host_list=mici-admin2
  Meta Attrs: target-role=Started
  Operations: monitor start-delay=30 interval=60s timeout=30 (p_ipmi_fencing_2-monitor-60s)
Fencing Levels:

Location Constraints:
  Resource: p_ipmi_fencing_1
    Disabled on: mici-admin (score:-INFINITY) (id:location-p_ipmi_fencing_1-mici-admin--INFINITY)
  Resource: p_ipmi_fencing_2
    Disabled on: mici-admin2 (score:-INFINITY) (id:location-p_ipmi_fencing_2-mici-admin2--INFINITY)
Ordering Constraints:
  start dlm-clone then start clvmd-clone (Mandatory) (id:order-dlm-clone-clvmd-clone-mandatory)
  start clvmd-clone then start clusterfs-clone (Mandatory) (id:order-clvmd-clone-clusterfs-clone-mandatory)
Colocation Constraints:
  clusterfs-clone with clvmd-clone (INFINITY) (id:colocation-clusterfs-clone-clvmd-clone-INFINITY)
  clvmd-clone with dlm-clone (INFINITY) (id:colocation-clvmd-clone-dlm-clone-INFINITY)

Cluster Properties:
 cluster-infrastructure: cman
 dc-version: 1.1.10-14.el6_5.1-368c726
 last-lrm-refresh: 1388530552
 no-quorum-policy: ignore
 stonith-enabled: true

Node Attributes:
 mici-admin: standby=off
 mici-admin2: standby=off

=====================================================
crm_mon with the cluster up and healthy:

Last updated: Tue Dec 31 17:15:55 2013
Last change: Tue Dec 31 16:57:37 2013 via cibadmin on mici-admin
Stack: cman
Current DC: mici-admin2 - partition with quorum
Version: 1.1.10-14.el6_5.1-368c726
2 Nodes configured
8 Resources configured

Online: [ mici-admin mici-admin2 ]

Full list of resources:

 p_ipmi_fencing_1 (stonith:fence_ipmilan): Started mici-admin2
 p_ipmi_fencing_2 (stonith:fence_ipmilan): Started mici-admin
 Clone Set: clusterfs-clone [clusterfs]
     Started: [ mici-admin mici-admin2 ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ mici-admin mici-admin2 ]
 Clone Set: dlm-clone [dlm]
     Started: [ mici-admin mici-admin2 ]

Migration summary:
* Node mici-admin:
* Node mici-admin2:

=====================================================
crm_mon after the fenced node reboots, showing the clvmd failure that then
occurs, which in turn triggers fencing of that node:

Last updated: Tue Dec 31 17:06:55 2013
Last change: Tue Dec 31 16:57:37 2013 via cibadmin on mici-admin
Stack: cman
Current DC: mici-admin - partition with quorum
Version: 1.1.10-14.el6_5.1-368c726
2 Nodes configured
8 Resources configured

Node mici-admin: UNCLEAN (online)
Online: [ mici-admin2 ]

Full list of resources:

 p_ipmi_fencing_1 (stonith:fence_ipmilan): Stopped
 p_ipmi_fencing_2 (stonith:fence_ipmilan): Started mici-admin
 Clone Set: clusterfs-clone [clusterfs]
     Started: [ mici-admin ]
     Stopped: [ mici-admin2 ]
 Clone Set: clvmd-clone [clvmd]
     clvmd (lsb:clvmd): FAILED mici-admin
     Stopped: [ mici-admin2 ]
 Clone Set: dlm-clone [dlm]
     Started: [ mici-admin mici-admin2 ]

Migration summary:
* Node mici-admin:
   clvmd: migration-threshold=1000000 fail-count=1
    last-failure='Tue Dec 31 17:04:29 2013'
* Node mici-admin2:

Failed actions:
    clvmd_monitor_30000 on mici-admin 'unknown error' (1): call=60,
    status=Timed Out, last-rc-change='Tue Dec 31 17:04:29 2013',
    queued=0ms, exec=0ms
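To make the reproduction concrete, this is the test sequence. The node roles here are my reading of the crm_mon output above (I ran the fence from mici-admin, with mici-admin2 as the target):

  # Run on mici-admin; power-cycles the peer via fence_ipmilan:
  stonith_admin -F mici-admin2

  # Expected: mici-admin2 reboots and cleanly rejoins the cluster.
  # Observed: once mici-admin2 rejoins, the clvmd monitor on mici-admin
  # times out and mici-admin is fenced in turn.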
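On the clvmd/lvm lock hypothesis: these are the checks I plan to run on the surviving node while clvmd is failing. This is a sketch assuming the stock dlm/lvm tooling of the RHEL 6.5 cman stack; pointers to better diagnostics would be welcome.

  # List DLM lockspaces; with clvmd and the gfs2 mount active I expect
  # to see a clvmd lockspace plus one for the filesystem:
  dlm_tool ls

  # Dump lock state for the clvmd lockspace (needs debugfs mounted):
  dlm_tool lockdebug clvmd

  # Confirm LVM is configured for cluster-wide locking (locking_type = 3):
  grep locking_type /etc/lvm/lvm.conf

  # Confirm the volume group is marked clustered:
  vgdisplay vgha2 | grep -i clustered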