[ClusterLabs] fence_sanlock and pacemaker
Gents,

I'm trying to configure an HA cluster with RHEL 6.5. Everything goes well except the fencing. The cluster's nodes are not connected to the management LAN (where all the iLO/UPS/APC devices live), and connecting them to that LAN is not planned. Given these constraints, I figured out that one way to get fencing working is to use *fence_sanlock*. I followed this tutorial: https://alteeve.ca/w/Watchdog_Recovery and it worked (I ran into an SELinux problem, so I disabled SELinux as described in the RHEL 6.5 release notes: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html-single/6.5_Technical_Notes/ ). So far so good.

The problem is that fence_sanlock relies on cman, not pacemaker. So with stonith disabled, pacemaker restarts the resources without waiting for the victim to be fenced, and with stonith enabled, pacemaker complains about the lack of stonith resources and blocks the whole cluster. I tried to configure fence_sanlock as a stonith resource at the pacemaker level, but as explained at http://oss.clusterlabs.org/pipermail/pacemaker/2013-May/017980.html it does not work, and as explained at https://bugzilla.redhat.com/show_bug.cgi?id=962088 there are no plans to make it work.

My question: is there a workaround?

Thank you,
Laurent
[ClusterLabs] multiple drives looks like balancing but why and causing troubles
I have a two node cluster. Both nodes are virtual and have five shared drives attached via a SAS controller. For some reason, the cluster shows half the drives started on each node. Not sure if this is called split brain or not. It definitely looks like load balancing, but I did not set up load balancing. On my client, I only see the data for the shares that are on the active cluster node, but they should all be on the active cluster node. Any suggestions as to why this is happening? Is there a setting so that everything runs on only one node at a time?

pcs cluster status:

Cluster name: CNAS
Last updated: Wed Aug 26 13:35:47 2015
Last change: Wed Aug 26 13:28:55 2015
Stack: classic openais (with plugin)
Current DC: nas02 - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
11 Resources configured

Online: [ nas01 nas02 ]

Full list of resources:

 NAS            (ocf::heartbeat:IPaddr2):       Started nas01
 Resource Group: datag
     datashare          (ocf::heartbeat:Filesystem):    Started nas02
     dataserver         (ocf::heartbeat:nfsserver):     Started nas02
 Resource Group: oomtlg
     oomtlshare         (ocf::heartbeat:Filesystem):    Started nas01
     oomtlserver        (ocf::heartbeat:nfsserver):     Started nas01
 Resource Group: oomtrg
     oomtrshare         (ocf::heartbeat:Filesystem):    Started nas02
     oomtrserver        (ocf::heartbeat:nfsserver):     Started nas02
 Resource Group: oomblg
     oomblshare         (ocf::heartbeat:Filesystem):    Started nas01
     oomblserver        (ocf::heartbeat:nfsserver):     Started nas01
 Resource Group: oombrg
     oombrshare         (ocf::heartbeat:Filesystem):    Started nas02
     oombrserver        (ocf::heartbeat:nfsserver):     Started nas02

pcs config show:

Cluster Name: CNAS
Corosync Nodes:
 nas01 nas02
Pacemaker Nodes:
 nas01 nas02

Resources:
 Resource: NAS (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=192.168.56.110 cidr_netmask=24
  Operations: start interval=0s timeout=20s (NAS-start-timeout-20s)
              stop interval=0s timeout=20s (NAS-stop-timeout-20s)
              monitor interval=10s timeout=20s (NAS-monitor-interval-10s)
 Group: datag
  Resource: datashare (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/sdb1 directory=/data fstype=ext4
   Operations: start interval=0s timeout=60 (datashare-start-timeout-60)
               stop interval=0s timeout=60 (datashare-stop-timeout-60)
               monitor interval=20 timeout=40 (datashare-monitor-interval-20)
  Resource: dataserver (class=ocf provider=heartbeat type=nfsserver)
   Attributes: nfs_shared_infodir=/data/nfsinfo nfs_no_notify=true
   Operations: start interval=0s timeout=40 (dataserver-start-timeout-40)
               stop interval=0s timeout=20s (dataserver-stop-timeout-20s)
               monitor interval=10 timeout=20s (dataserver-monitor-interval-10)
 Group: oomtlg
  Resource: oomtlshare (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/sdc1 directory=/oomtl fstype=ext4
   Operations: start interval=0s timeout=60 (oomtlshare-start-timeout-60)
               stop interval=0s timeout=60 (oomtlshare-stop-timeout-60)
               monitor interval=20 timeout=40 (oomtlshare-monitor-interval-20)
  Resource: oomtlserver (class=ocf provider=heartbeat type=nfsserver)
   Attributes: nfs_shared_infodir=/oomtl/nfsinfo nfs_no_notify=true
   Operations: start interval=0s timeout=40 (oomtlserver-start-timeout-40)
               stop interval=0s timeout=20s (oomtlserver-stop-timeout-20s)
               monitor interval=10 timeout=20s (oomtlserver-monitor-interval-10)
 Group: oomtrg
  Resource: oomtrshare (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/sdd1 directory=/oomtr fstype=ext4
   Operations: start interval=0s timeout=60 (oomtrshare-start-timeout-60)
               stop interval=0s timeout=60 (oomtrshare-stop-timeout-60)
               monitor interval=20 timeout=40 (oomtrshare-monitor-interval-20)
  Resource: oomtrserver (class=ocf provider=heartbeat type=nfsserver)
   Attributes: nfs_shared_infodir=/oomtr/nfsinfo nfs_no_notify=true
   Operations: start interval=0s timeout=40 (oomtrserver-start-timeout-40)
               stop interval=0s timeout=20s (oomtrserver-stop-timeout-20s)
               monitor interval=10 timeout=20s (oomtrserver-monitor-interval-10)
 Group: oomblg
  Resource: oomblshare (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/sde1 directory=/oombl fstype=ext4
   Operations: start interval=0s timeout=60 (oomblshare-start-timeout-60)
               stop interval=0s timeout=60 (oomblshare-stop-timeout-60)
               monitor interval=20 timeout=40 (oomblshare-monitor-interval-20)
  Resource: oomblserver (class=ocf provider=heartbeat type=nfsserver)
   Attributes: nfs_shared_infodir=/oombl/nfsinfo nfs_no_notify=true
   Operations: start interval=0s timeout=40 (oomblserver-start-timeout-40)
               stop interval=0s timeout=20s
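As for the last question above, keeping every group on one node is normally done with colocation constraints rather than a single global setting. A rough sketch using the resource names from the posted config (the choice of the NAS address as the anchor and the INFINITY scores are assumptions, not something taken from this thread):

    pcs constraint colocation add datag with NAS INFINITY
    pcs constraint colocation add oomtlg with NAS INFINITY
    pcs constraint colocation add oomtrg with NAS INFINITY
    pcs constraint colocation add oomblg with NAS INFINITY
    pcs constraint colocation add oombrg with NAS INFINITY

With constraints like these, pacemaker places all five groups on whichever node currently runs the NAS IP instead of spreading them across both nodes.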
Re: [ClusterLabs] multiple drives looks like balancing but why and causing troubles
On 26/08/15 02:46 PM, Streeter, Michelle N wrote:
> I have a two node cluster. Both nodes are virtual and have five shared drives attached via a SAS controller. For some reason, the cluster shows half the drives started on each node. Not sure if this is called split brain or not. It definitely looks like load balancing, but I did not set up load balancing. On my client, I only see the data for the shares that are on the active cluster node, but they should all be on the active cluster node. Any suggestions as to why this is happening? Is there a setting so that everything runs on only one node at a time?

Can you explain what you mean by shared drives? Are these iSCSI LUNs or direct connections to either port on SAS drives?

A split-brain is when either node thinks the other is dead and operates without coordinating with its peer. It is a disastrous situation with shared storage, and it is what fencing (stonith) prevents, which you don't have configured. If you are using KVM, use fence_virsh or fence_virt. If you're using VMware, use fence_vmware. Please make this a priority before solving your storage issue.

> [pcs cluster status and pcs config show output snipped; see the post above]
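Following up on the fencing advice above: if the nodes are KVM guests, a pair of stonith resources along these lines would give pacemaker something to actually fence with. The hypervisor address, credentials and libvirt domain names below are placeholders, not details from this thread:

    pcs stonith create fence_nas01 fence_virsh \
        ipaddr=kvm-host.example.com login=root passwd=secret \
        port=nas01 pcmk_host_list=nas01
    pcs stonith create fence_nas02 fence_virsh \
        ipaddr=kvm-host.example.com login=root passwd=secret \
        port=nas02 pcmk_host_list=nas02
    pcs property set stonith-enabled=true

Here port= is the libvirt domain name of the guest as seen by virsh, and pcmk_host_list tells pacemaker which cluster node each device is able to fence.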
Re: [ClusterLabs] fence_sanlock and pacemaker
On 27 Aug 2015, at 4:11 am, Laurent B. laure...@qmail.re wrote:
> Gents, I'm trying to configure an HA cluster with RHEL 6.5. Everything goes well except the fencing. The cluster's nodes are not connected to the management LAN (where all the iLO/UPS/APC devices live), and connecting them to that LAN is not planned. Given these constraints, I figured out that one way to get fencing working is to use *fence_sanlock*. I followed this tutorial: https://alteeve.ca/w/Watchdog_Recovery and it worked (I ran into an SELinux problem, so I disabled SELinux as described in the RHEL 6.5 release notes: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html-single/6.5_Technical_Notes/ ). So far so good.
>
> The problem is that fence_sanlock relies on cman, not pacemaker. So with stonith disabled, pacemaker restarts the resources without waiting for the victim to be fenced, and with stonith enabled, pacemaker complains about the lack of stonith resources and blocks the whole cluster. I tried to configure fence_sanlock as a stonith resource at the pacemaker level, but as explained at http://oss.clusterlabs.org/pipermail/pacemaker/2013-May/017980.html it does not work, and as explained at https://bugzilla.redhat.com/show_bug.cgi?id=962088 there are no plans to make it work.
>
> My question: is there a workaround?

You'd have to build it yourself, but sbd could be an option.

> Thank you,
> Laurent
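For reference, on a distribution that ships sbd (on RHEL 6 you would indeed have to build it yourself), the shared-disk watchdog setup looks roughly like the following. The device path is a placeholder and the sysconfig variable names are the ones used on SLES, so treat this as a sketch rather than a recipe:

    # initialise a small shared LUN as the sbd "poison pill" device (run once)
    sbd -d /dev/disk/by-id/my-shared-lun create

    # /etc/sysconfig/sbd on every node: which device to watch, -W = use the hardware watchdog
    SBD_DEVICE="/dev/disk/by-id/my-shared-lun"
    SBD_OPTS="-W"

    # then declare it to pacemaker, as in the SLES example later in this digest
    crm configure primitive stonith_sbd stonith:external/sbd params pcmk_delay_max=30

sbd gives pacemaker a stonith resource to call while the actual kill is carried out by the node's own watchdog, which fits the "no access to the management LAN" constraint.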
[ClusterLabs] resource-stickiness
Hi list,

I have configured a simple cluster on SLES 11 SP4 and have a problem with "auto_failover off". The problem is that whenever I migrate the resource group via HAWK, my configuration changes from:

location cli-prefer-aapche aapche role=Started 10: sles2

to:

location cli-ban-aapche-on-sles1 aapche role=Started -inf: sles1

It keeps changing to inf, and then after the node is fenced the resource moves back to the original node, which I don't want. How can I avoid this situation?

my configuration is:

node sles1
node sles2
primitive filesystem Filesystem \
        params fstype=ext3 directory=/srv/www/vhosts device=/dev/xvdd1 \
        op start interval=0 timeout=60 \
        op stop interval=0 timeout=60 \
        op monitor interval=20 timeout=40
primitive myip IPaddr2 \
        params ip=x.x.x.x \
        op start interval=0 timeout=20s \
        op stop interval=0 timeout=20s \
        op monitor interval=10s timeout=20s
primitive stonith_sbd stonith:external/sbd \
        params pcmk_delay_max=30
primitive web apache \
        params configfile=/etc/apache2/httpd.conf \
        op start interval=0 timeout=40s \
        op stop interval=0 timeout=60s \
        op monitor interval=10 timeout=20s
group aapche filesystem myip web \
        meta target-role=Started is-managed=true resource-stickiness=1000
location cli-prefer-aapche aapche role=Started 10: sles2
property cib-bootstrap-options: \
        stonith-enabled=true \
        no-quorum-policy=ignore \
        placement-strategy=balanced \
        expected-quorum-votes=2 \
        dc-version=1.1.12-f47ea56 \
        cluster-infrastructure="classic openais (with plugin)" \
        last-lrm-refresh=1440502955 \
        stonith-timeout=40s
rsc_defaults rsc-options: \
        resource-stickiness=1000 \
        migration-threshold=3
op_defaults op-options: \
        timeout=600 \
        record-pending=true

and after migration:

node sles1
node sles2
primitive filesystem Filesystem \
        params fstype=ext3 directory=/srv/www/vhosts device=/dev/xvdd1 \
        op start interval=0 timeout=60 \
        op stop interval=0 timeout=60 \
        op monitor interval=20 timeout=40
primitive myip IPaddr2 \
        params ip=10.9.131.86 \
        op start interval=0 timeout=20s \
        op stop interval=0 timeout=20s \
        op monitor interval=10s timeout=20s
primitive stonith_sbd stonith:external/sbd \
        params pcmk_delay_max=30
primitive web apache \
        params configfile=/etc/apache2/httpd.conf \
        op start interval=0 timeout=40s \
        op stop interval=0 timeout=60s \
        op monitor interval=10 timeout=20s
group aapche filesystem myip web \
        meta target-role=Started is-managed=true resource-stickiness=1000
location cli-ban-aapche-on-sles1 aapche role=Started -inf: sles1
location cli-prefer-aapche aapche role=Started 10: sles2
property cib-bootstrap-options: \
        stonith-enabled=true \
        no-quorum-policy=ignore \
        placement-strategy=balanced \
        expected-quorum-votes=2 \
        dc-version=1.1.12-f47ea56 \
        cluster-infrastructure="classic openais (with plugin)" \
        last-lrm-refresh=1440502955 \
        stonith-timeout=40s
rsc_defaults rsc-options: \
        resource-stickiness=1000 \
        migration-threshold=3
op_defaults op-options: \
        timeout=600 \
        record-pending=true

thanks

Best Regards
Jost
Re: [ClusterLabs] resource-stickiness
Sorry, one typo; the problem is the same:

location cli-prefer-aapche aapche role=Started 10: sles2

changes to:

location cli-prefer-aapche aapche role=Started inf: sles2

It keeps changing to infinity.

my configuration is:

node sles1
node sles2
primitive filesystem Filesystem \
        params fstype=ext3 directory=/srv/www/vhosts device=/dev/xvdd1 \
        op start interval=0 timeout=60 \
        op stop interval=0 timeout=60 \
        op monitor interval=20 timeout=40
primitive myip IPaddr2 \
        params ip=x.x.x.x \
        op start interval=0 timeout=20s \
        op stop interval=0 timeout=20s \
        op monitor interval=10s timeout=20s
primitive stonith_sbd stonith:external/sbd \
        params pcmk_delay_max=30
primitive web apache \
        params configfile=/etc/apache2/httpd.conf \
        op start interval=0 timeout=40s \
        op stop interval=0 timeout=60s \
        op monitor interval=10 timeout=20s
group aapche filesystem myip web \
        meta target-role=Started is-managed=true resource-stickiness=1000
location cli-prefer-aapche aapche role=Started 10: sles2
property cib-bootstrap-options: \
        stonith-enabled=true \
        no-quorum-policy=ignore \
        placement-strategy=balanced \
        expected-quorum-votes=2 \
        dc-version=1.1.12-f47ea56 \
        cluster-infrastructure="classic openais (with plugin)" \
        last-lrm-refresh=1440502955 \
        stonith-timeout=40s
rsc_defaults rsc-options: \
        resource-stickiness=1000 \
        migration-threshold=3
op_defaults op-options: \
        timeout=600 \
        record-pending=true

and after migration:

node sles1
node sles2
primitive filesystem Filesystem \
        params fstype=ext3 directory=/srv/www/vhosts device=/dev/xvdd1 \
        op start interval=0 timeout=60 \
        op stop interval=0 timeout=60 \
        op monitor interval=20 timeout=40
primitive myip IPaddr2 \
        params ip=10.9.131.86 \
        op start interval=0 timeout=20s \
        op stop interval=0 timeout=20s \
        op monitor interval=10s timeout=20s
primitive stonith_sbd stonith:external/sbd \
        params pcmk_delay_max=30
primitive web apache \
        params configfile=/etc/apache2/httpd.conf \
        op start interval=0 timeout=40s \
        op stop interval=0 timeout=60s \
        op monitor interval=10 timeout=20s
group aapche filesystem myip web \
        meta target-role=Started is-managed=true resource-stickiness=1000
location cli-prefer-aapche aapche role=Started inf: sles2
property cib-bootstrap-options: \
        stonith-enabled=true \
        no-quorum-policy=ignore \
        placement-strategy=balanced \
        expected-quorum-votes=2 \
        dc-version=1.1.12-f47ea56 \
        cluster-infrastructure="classic openais (with plugin)" \
        last-lrm-refresh=1440502955 \
        stonith-timeout=40s
rsc_defaults rsc-options: \
        resource-stickiness=1000 \
        migration-threshold=3
op_defaults op-options: \
        timeout=600 \
        record-pending=true

From: Rakovec Jost
Sent: Wednesday, August 26, 2015 1:33 PM
To: users@clusterlabs.org
Subject: resource-stickiness

Hi list,

I have configured a simple cluster on SLES 11 SP4 and have a problem with "auto_failover off". The problem is that whenever I migrate the resource group via HAWK, my configuration changes from:

location cli-prefer-aapche aapche role=Started 10: sles2

to:

location cli-ban-aapche-on-sles1 aapche role=Started -inf: sles1

It keeps changing to inf, and then after the node is fenced the resource moves back to the original node, which I don't want. How can I avoid this situation?

[quoted configuration snipped; it is the same as the configuration shown above]
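The cli-prefer-*/cli-ban-* constraints shown above are the ones a move/migrate request creates (HAWK uses the same mechanism as crm resource migrate) and they stay in the CIB until removed, which is why the group keeps being pulled back to the preferred node after fencing. A sketch of the usual cleanup once the move is done, using the resource and constraint names from the config above:

    # drop the constraint that the migrate created
    crm resource unmigrate aapche

    # or remove it explicitly
    crm configure delete cli-prefer-aapche

With the constraint gone, the resource-stickiness=1000 that is already configured is what keeps the group where it is after a failover.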
[ClusterLabs] Antw: NFS exports
>>> Streeter, Michelle N michelle.n.stree...@boeing.com wrote on 26.08.2015 at 15:42 in message 9a18847a77a9a14da7e0fd240efcafc2504...@xch-phx-501.sw.nos.boeing.com:
> I have been using the Linux /etc/exports file for my cluster's exports, and it works fine that way as long as every node has it set up. I tried to add the exportfs resource instead, but it keeps failing.

Did you use fully qualified names?

> Is it preferred that we use /etc/exports or the exportfs resource with pacemaker?
>
> Michelle Streeter
> ASC2 MCS - SDE/ACL/SDL/EDL OKC Software Engineer
> The Boeing Company
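For what it's worth, a minimal exportfs resource appended to the datag group from the earlier thread would look roughly like this; the resource name, client network, export options and fsid are assumptions, not values from the post:

    pcs resource create data_export ocf:heartbeat:exportfs \
        clientspec=192.168.56.0/24 options=rw,sync,no_root_squash \
        directory=/data fsid=1 --group datag

Appending it to the group means it only starts after the Filesystem and nfsserver resources, and each exportfs resource needs its own non-zero fsid; the directory not being mounted yet and a missing or duplicate fsid are two easy things to check when the agent fails to start.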