On 1/6/2014 6:24 PM, Bob Haxo wrote:
> Hi Fabio,
>
>>> There is an example on how to configure gfs2 also in the rhel6.5
>>> pacemaker documentation, using pcs.
>
> Super! Please share the link to this documentation. I only discovered
> the gfs2+pcs example with the rhel7 beta docs.
You are right, the gfs2 example was not published in Rev 1 of the
pacemaker documentation for RHEL6.5. It's entirely possible I missed it
during doc review, sorry about that!

https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Configuring_the_Red_Hat_High_Availability_Add-On_with_Pacemaker/index.html

Short version is:

chkconfig cman on
chkconfig clvmd on
chkconfig pacemaker on

Use the above doc to set up / start the cluster (stop after the stonith
config). Then set up your clvmd storage (note that neither dlm nor clvmd
is managed by pacemaker in RHEL6.5, unlike RHEL7 where it's all managed
by pacemaker). Start adding your resources/services from there, etc.

Also, make absolutely sure you have all the latest updates from the 6.5
errata installed.

Fabio
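P.S. As a rough sketch only: the device, VG/LV, mount point and cluster
names below are placeholders, and the exact pcs syntax should be
double-checked against the RHEL6.5 doc linked above, but the flow boils
down to roughly this:

# RHEL6.5: cman brings up dlm_controld and clvmd runs from its own init
# script; pacemaker only manages the gfs2 filesystem resource.
lvmconf --enable-cluster        # switch LVM to cluster-wide locking
chkconfig cman on
chkconfig clvmd on
chkconfig pacemaker on
service cman start
service clvmd start
service pacemaker start
# follow the doc up to and including the stonith configuration, then:

# clustered storage is created outside of pacemaker (placeholder names)
pvcreate /dev/sdb
vgcreate -cy vg_cluster /dev/sdb
lvcreate -n lv_gfs2 -l 100%FREE vg_cluster
# -t is <cluster name from cluster.conf>:<fs name>; -j = number of nodes
mkfs.gfs2 -p lock_dlm -t mycluster:gfs2fs -j 2 /dev/vg_cluster/lv_gfs2

# only the filesystem itself becomes a (cloned) pacemaker resource
pcs resource create clusterfs ocf:heartbeat:Filesystem \
    device=/dev/vg_cluster/lv_gfs2 directory=/mnt/gfs2 fstype=gfs2 \
    options=noatime op monitor interval=30s on-fail=fence \
    clone interleave=true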
>
> Bob Haxo
>
> On Sat, 2014-01-04 at 16:56 +0100, Fabio M. Di Nitto wrote:
>> On 01/01/2014 01:57 AM, Bob Haxo wrote:
>> > Greetings ... Happy New Year!
>> >
>> > I am testing a configuration that is created from the example in "Chapter 6.
>> > Configuring a GFS2 File System in a Cluster" of the "Red Hat Enterprise
>> > Linux 7.0 Beta Global File System 2" document. The only addition is
>> > stonith:fence_ipmilan. After encountering this issue when I configured
>> > with "crm", I re-configured using "pcs". I've included the configuration
>> > below.
>>
>> Hold on a second here... why are you using RHEL7 documentation to
>> configure RHEL6.5? Please don't mix :) there are some differences and we
>> definitely never tested mixing those up.
>>
>> There is an example on how to configure gfs2 also in the rhel6.5
>> pacemaker documentation, using pcs.
>>
>> I personally never saw this behaviour, so it's entirely possible that
>> mixing things up will result in an unpredictable status.
>>
>> Fabio
>>
>> >
>> > I'm thinking that, in a 2-node cluster, if I run "stonith_admin -F
>> > <peer-node>", then <peer-node> should reboot and cleanly rejoin the
>> > cluster. This is not happening.
>> >
>> > What ultimately happens is that after the initially fenced node reboots,
>> > the system from which the stonith_admin -F command was run is fenced and
>> > reboots. The fencing stops there, leaving the cluster in an appropriate
>> > state.
>> >
>> > The issue seems to reside with clvmd/lvm. With the reboot of the
>> > initially fenced node, the clvmd resource fails on the surviving node,
>> > with a maximum of errors. I hypothesize there is an issue with locks,
>> > but have insufficient knowledge of clvmd/lvm locks to prove or disprove
>> > this hypothesis.
>> >
>> > Have I missed something ...
>> >
>> > 1) Is this expected behavior, and does the reboot of the fencing node
>> > always happen?
>> >
>> > 2) Or, maybe I didn't correctly duplicate the Chapter 6 example?
>> >
>> > 3) Or, perhaps something is wrong or omitted from the Chapter 6 example?
>> >
>> > Suggestions will be much appreciated.
>> >
>> > Thanks,
>> > Bob Haxo
>> >
>> > RHEL6.5
>> > pacemaker-cli-1.1.10-14.el6_5.1.x86_64
>> > crmsh-1.2.5-55.1sgi709r3.rhel6.x86_64
>> > pacemaker-libs-1.1.10-14.el6_5.1.x86_64
>> > cman-3.0.12.1-59.el6_5.1.x86_64
>> > pacemaker-1.1.10-14.el6_5.1.x86_64
>> > corosynclib-1.4.1-17.el6.x86_64
>> > corosync-1.4.1-17.el6.x86_64
>> > pacemaker-cluster-libs-1.1.10-14.el6_5.1.x86_64
>> >
>> > Cluster Name: mici
>> > Corosync Nodes:
>> >
>> > Pacemaker Nodes:
>> >  mici-admin mici-admin2
>> >
>> > Resources:
>> >  Clone: clusterfs-clone
>> >   Meta Attrs: interleave=true target-role=Started
>> >   Resource: clusterfs (class=ocf provider=heartbeat type=Filesystem)
>> >    Attributes: device=/dev/vgha2/lv_clust2 directory=/images fstype=gfs2 options=defaults,noatime,nodiratime
>> >    Operations: monitor on-fail=fence interval=30s (clusterfs-monitor-interval-30s)
>> >  Clone: clvmd-clone
>> >   Meta Attrs: interleave=true ordered=true target-role=Started
>> >   Resource: clvmd (class=lsb type=clvmd)
>> >    Operations: monitor on-fail=fence interval=30s (clvmd-monitor-interval-30s)
>> >  Clone: dlm-clone
>> >   Meta Attrs: interleave=true ordered=true
>> >   Resource: dlm (class=ocf provider=pacemaker type=controld)
>> >    Operations: monitor on-fail=fence interval=30s (dlm-monitor-interval-30s)
>> >
>> > Stonith Devices:
>> >  Resource: p_ipmi_fencing_1 (class=stonith type=fence_ipmilan)
>> >   Attributes: ipaddr=128.##.##.78 login=XXXXX passwd=XXXXX lanplus=1 action=reboot pcmk_host_check=static-list pcmk_host_list=mici-admin
>> >   Meta Attrs: target-role=Started
>> >   Operations: monitor start-delay=30 interval=60s timeout=30 (p_ipmi_fencing_1-monitor-60s)
>> >  Resource: p_ipmi_fencing_2 (class=stonith type=fence_ipmilan)
>> >   Attributes: ipaddr=128.##.##.220 login=XXXXX passwd=XXXXX lanplus=1 action=reboot pcmk_host_check=static-list pcmk_host_list=mici-admin2
>> >   Meta Attrs: target-role=Started
>> >   Operations: monitor start-delay=30 interval=60s timeout=30 (p_ipmi_fencing_2-monitor-60s)
>> > Fencing Levels:
>> >
>> > Location Constraints:
>> >   Resource: p_ipmi_fencing_1
>> >     Disabled on: mici-admin (score:-INFINITY) (id:location-p_ipmi_fencing_1-mici-admin--INFINITY)
>> >   Resource: p_ipmi_fencing_2
>> >     Disabled on: mici-admin2 (score:-INFINITY) (id:location-p_ipmi_fencing_2-mici-admin2--INFINITY)
>> > Ordering Constraints:
>> >   start dlm-clone then start clvmd-clone (Mandatory) (id:order-dlm-clone-clvmd-clone-mandatory)
>> >   start clvmd-clone then start clusterfs-clone (Mandatory) (id:order-clvmd-clone-clusterfs-clone-mandatory)
>> > Colocation Constraints:
>> >   clusterfs-clone with clvmd-clone (INFINITY) (id:colocation-clusterfs-clone-clvmd-clone-INFINITY)
>> >   clvmd-clone with dlm-clone (INFINITY) (id:colocation-clvmd-clone-dlm-clone-INFINITY)
>> >
>> > Cluster Properties:
>> >  cluster-infrastructure: cman
>> >  dc-version: 1.1.10-14.el6_5.1-368c726
>> >  last-lrm-refresh: 1388530552
>> >  no-quorum-policy: ignore
>> >  stonith-enabled: true
>> > Node Attributes:
>> >  mici-admin: standby=off
>> >  mici-admin2: standby=off
>> >
>> >
>> > Last updated: Tue Dec 31 17:15:55 2013
>> > Last change: Tue Dec 31 16:57:37 2013 via cibadmin on mici-admin
>> > Stack: cman
>> > Current DC: mici-admin2 - partition with quorum
>> > Version: 1.1.10-14.el6_5.1-368c726
>> > 2 Nodes configured
>> > 8 Resources configured
>> >
>> > Online: [ mici-admin mici-admin2 ]
>> >
>> > Full list of resources:
>> >
>> > p_ipmi_fencing_1 (stonith:fence_ipmilan): Started mici-admin2
>> > p_ipmi_fencing_2 (stonith:fence_ipmilan): Started mici-admin
>> > Clone Set: clusterfs-clone [clusterfs]
>> >     Started: [ mici-admin mici-admin2 ]
>> > Clone Set: clvmd-clone [clvmd]
>> >     Started: [ mici-admin mici-admin2 ]
>> > Clone Set: dlm-clone [dlm]
>> >     Started: [ mici-admin mici-admin2 ]
>> >
>> > Migration summary:
>> > * Node mici-admin:
>> > * Node mici-admin2:
>> >
>> > =====================================================
>> > crm_mon after the fenced node reboots. Shows the failure of clvmd that
>> > then occurs, which in turn triggers a fencing of that node
>> >
>> > Last updated: Tue Dec 31 17:06:55 2013
>> > Last change: Tue Dec 31 16:57:37 2013 via cibadmin on mici-admin
>> > Stack: cman
>> > Current DC: mici-admin - partition with quorum
>> > Version: 1.1.10-14.el6_5.1-368c726
>> > 2 Nodes configured
>> > 8 Resources configured
>> >
>> > Node mici-admin: UNCLEAN (online)
>> > Online: [ mici-admin2 ]
>> >
>> > Full list of resources:
>> >
>> > p_ipmi_fencing_1 (stonith:fence_ipmilan): Stopped
>> > p_ipmi_fencing_2 (stonith:fence_ipmilan): Started mici-admin
>> > Clone Set: clusterfs-clone [clusterfs]
>> >     Started: [ mici-admin ]
>> >     Stopped: [ mici-admin2 ]
>> > Clone Set: clvmd-clone [clvmd]
>> >     clvmd (lsb:clvmd): FAILED mici-admin
>> >     Stopped: [ mici-admin2 ]
>> > Clone Set: dlm-clone [dlm]
>> >     Started: [ mici-admin mici-admin2 ]
>> >
>> > Migration summary:
>> > * Node mici-admin:
>> >    clvmd: migration-threshold=1000000 fail-count=1 last-failure='Tue Dec 31 17:04:29 2013'
>> > * Node mici-admin2:
>> >
>> > Failed actions:
>> >     clvmd_monitor_30000 on mici-admin 'unknown error' (1): call=60,
>> >     status=Timed Out, last-rc-change='Tue Dec 31 17:04:29 2013',
>> >     queued=0ms, exec=0ms
>> >

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org