[ClusterLabs] fence_sanlock and pacemaker

2015-08-26 Thread Laurent B.
Gents,

I'm trying to configure an HA cluster on RHEL 6.5. Everything goes well
except the fencing. The cluster's nodes are not connected to the
management LAN (where all the iLO/UPS/APC devices live) and there is no
plan to connect them to that LAN.

With these constraints, I figured out that one way to get fencing working
is to use *fence_sanlock*. I followed this tutorial:
https://alteeve.ca/w/Watchdog_Recovery and it worked (I ran into a problem
with SELinux, which I finally disabled as described in the following
RHEL 6.5 release notes:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html-single/6.5_Technical_Notes/
).

So far so good. The problem is that fence_sanlock relies on cman, not
pacemaker. With stonith disabled, pacemaker restarts the resources
without waiting for the victim to be fenced, and with stonith enabled,
pacemaker complains about the lack of stonith resources and blocks the
whole cluster.
I tried to configure fence_sanlock as a stonith resource at the pacemaker
level, but as explained at
http://oss.clusterlabs.org/pipermail/pacemaker/2013-May/017980.html it
does not work, and as explained at
https://bugzilla.redhat.com/show_bug.cgi?id=962088 there are no plans to
make it work.

My question: is there a workaround?

Thank you,

Laurent

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] multiple drives looks like balancing but why and causing troubles

2015-08-26 Thread Streeter, Michelle N
I have a two-node cluster. Both nodes are virtual and have five shared drives
attached via a SAS controller. For some reason, the cluster shows half the
drives started on each node. I am not sure whether this is called split brain
or not. It definitely looks like load balancing, but I did not set up load
balancing. On my client, I only see the data for the shares that are on the
active cluster node, but they should all be on the active cluster node. Any
suggestions as to why this is happening? Is there a setting so that everything
runs on only one node at a time?
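
For reference, a minimal sketch of one way to pin all of the groups to
whichever node holds the NAS address is a colocation constraint per group
(the resource names come from the configuration below; the exact pcs
syntax may vary between pcs versions):

pcs constraint colocation add datag with NAS INFINITY
pcs constraint colocation add oomtlg with NAS INFINITY
pcs constraint colocation add oomtrg with NAS INFINITY
pcs constraint colocation add oomblg with NAS INFINITY
pcs constraint colocation add oombrg with NAS INFINITY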

pcs cluster status:
Cluster name: CNAS
Last updated: Wed Aug 26 13:35:47 2015
Last change: Wed Aug 26 13:28:55 2015
Stack: classic openais (with plugin)
Current DC: nas02 - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
11 Resources configured


Online: [ nas01 nas02 ]

Full list of resources:

NAS(ocf::heartbeat:IPaddr2):   Started nas01
Resource Group: datag
 datashare  (ocf::heartbeat:Filesystem):Started nas02
 dataserver (ocf::heartbeat:nfsserver): Started nas02
Resource Group: oomtlg
 oomtlshare (ocf::heartbeat:Filesystem):Started nas01
 oomtlserver(ocf::heartbeat:nfsserver): Started nas01
Resource Group: oomtrg
 oomtrshare (ocf::heartbeat:Filesystem):Started nas02
 oomtrserver(ocf::heartbeat:nfsserver): Started nas02
Resource Group: oomblg
 oomblshare (ocf::heartbeat:Filesystem):Started nas01
 oomblserver(ocf::heartbeat:nfsserver): Started nas01
Resource Group: oombrg
 oombrshare (ocf::heartbeat:Filesystem):Started nas02
 oombrserver(ocf::heartbeat:nfsserver): Started nas02

pcs config show:
Cluster Name: CNAS
Corosync Nodes:
nas01 nas02
Pacemaker Nodes:
nas01 nas02

Resources:
Resource: NAS (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=192.168.56.110 cidr_netmask=24
  Operations: start interval=0s timeout=20s (NAS-start-timeout-20s)
  stop interval=0s timeout=20s (NAS-stop-timeout-20s)
  monitor interval=10s timeout=20s (NAS-monitor-interval-10s)
Group: datag
  Resource: datashare (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/sdb1 directory=/data fstype=ext4
   Operations: start interval=0s timeout=60 (datashare-start-timeout-60)
   stop interval=0s timeout=60 (datashare-stop-timeout-60)
   monitor interval=20 timeout=40 (datashare-monitor-interval-20)
  Resource: dataserver (class=ocf provider=heartbeat type=nfsserver)
   Attributes: nfs_shared_infodir=/data/nfsinfo nfs_no_notify=true
   Operations: start interval=0s timeout=40 (dataserver-start-timeout-40)
   stop interval=0s timeout=20s (dataserver-stop-timeout-20s)
   monitor interval=10 timeout=20s (dataserver-monitor-interval-10)
Group: oomtlg
  Resource: oomtlshare (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/sdc1 directory=/oomtl fstype=ext4
   Operations: start interval=0s timeout=60 (oomtlshare-start-timeout-60)
   stop interval=0s timeout=60 (oomtlshare-stop-timeout-60)
   monitor interval=20 timeout=40 (oomtlshare-monitor-interval-20)
  Resource: oomtlserver (class=ocf provider=heartbeat type=nfsserver)
   Attributes: nfs_shared_infodir=/oomtl/nfsinfo nfs_no_notify=true
   Operations: start interval=0s timeout=40 (oomtlserver-start-timeout-40)
   stop interval=0s timeout=20s (oomtlserver-stop-timeout-20s)
   monitor interval=10 timeout=20s (oomtlserver-monitor-interval-10)
Group: oomtrg
  Resource: oomtrshare (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/sdd1 directory=/oomtr fstype=ext4
   Operations: start interval=0s timeout=60 (oomtrshare-start-timeout-60)
   stop interval=0s timeout=60 (oomtrshare-stop-timeout-60)
   monitor interval=20 timeout=40 (oomtrshare-monitor-interval-20)
  Resource: oomtrserver (class=ocf provider=heartbeat type=nfsserver)
   Attributes: nfs_shared_infodir=/oomtr/nfsinfo nfs_no_notify=true
   Operations: start interval=0s timeout=40 (oomtrserver-start-timeout-40)
   stop interval=0s timeout=20s (oomtrserver-stop-timeout-20s)
   monitor interval=10 timeout=20s (oomtrserver-monitor-interval-10)
Group: oomblg
  Resource: oomblshare (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/sde1 directory=/oombl fstype=ext4
   Operations: start interval=0s timeout=60 (oomblshare-start-timeout-60)
   stop interval=0s timeout=60 (oomblshare-stop-timeout-60)
   monitor interval=20 timeout=40 (oomblshare-monitor-interval-20)
  Resource: oomblserver (class=ocf provider=heartbeat type=nfsserver)
   Attributes: nfs_shared_infodir=/oombl/nfsinfo nfs_no_notify=true
   Operations: start interval=0s timeout=40 (oomblserver-start-timeout-40)
   stop interval=0s timeout=20s 

Re: [ClusterLabs] multiple drives looks like balancing but why and causing troubles

2015-08-26 Thread Digimer
On 26/08/15 02:46 PM, Streeter, Michelle N wrote:
 I have a two-node cluster. Both nodes are virtual and have five shared
 drives attached via a SAS controller. For some reason, the cluster shows
 half the drives started on each node. I am not sure whether this is
 called split brain or not. It definitely looks like load balancing, but I
 did not set up load balancing. On my client, I only see the data for the
 shares that are on the active cluster node, but they should all be on the
 active cluster node. Any suggestions as to why this is happening? Is
 there a setting so that everything runs on only one node at a time?

Can you explain what you mean by shared drives? Are these iSCSI LUNs
or direct connections to either port on SAS drives?

A split-brain is when either node thinks the other is dead and keeps
operating without coordinating with its peer. It is a disastrous
situation with shared storage, and it is exactly what fencing (stonith)
prevents, which you don't have configured.

If you are using KVM, use fence_virsh or fence_virt. If you're using
VMware, use fence_vmware. Please make this a priority before solving
your storage issue.
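
For the KVM case, a minimal fence_virsh sketch could look like the
following, one stonith resource per node; the hypervisor address, SSH key
and libvirt domain names below are placeholders:

pcs stonith create fence_nas01 fence_virsh \
    ipaddr=192.168.122.1 login=root identity_file=/root/.ssh/id_rsa \
    port=nas01 pcmk_host_list=nas01 op monitor interval=60s

pcs stonith create fence_nas02 fence_virsh \
    ipaddr=192.168.122.1 login=root identity_file=/root/.ssh/id_rsa \
    port=nas02 pcmk_host_list=nas02 op monitor interval=60s

Here port is the name of the libvirt domain that gets fenced, and
pcmk_host_list tells pacemaker which cluster node the device can kill.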


Re: [ClusterLabs] fence_sanlock and pacemaker

2015-08-26 Thread Andrew Beekhof

 On 27 Aug 2015, at 4:11 am, Laurent B. laure...@qmail.re wrote:
 
 My question: is there a workaround?

You’d have to build it yourself, but sbd could be an option
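
A rough sketch of what that could look like once sbd is built, assuming a
small shared LUN is available as the poison-pill device (the device path
below is a placeholder):

# initialise the sbd slots on the shared device, once, from one node
sbd -d /dev/disk/by-id/shared-sbd-lun create

# /etc/sysconfig/sbd on every node (the sbd daemon must then be started
# on each node before the cluster stack)
SBD_DEVICE="/dev/disk/by-id/shared-sbd-lun"
SBD_OPTS="-W"

# stonith resource, crm shell syntax
crm configure primitive stonith_sbd stonith:external/sbd \
    params pcmk_delay_max=30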

 




[ClusterLabs] resource-stickiness

2015-08-26 Thread Rakovec Jost
Hi list,



I have configured a simple cluster on SLES 11 SP4 and have a problem with
auto_failover off. The problem is that whenever I migrate the resource group
via HAWK, my configuration changes from:


location cli-prefer-aapche aapche role=Started 10: sles2

to:

location cli-ban-aapche-on-sles1 aapche role=Started -inf: sles1


It keeps changing to inf.


and then, after the node is fenced, the resource moves back to the original
node, which I don't want. How can I avoid this situation?
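
A minimal sketch of one way to clean this up with the crm shell on SLES 11
SP4 (assuming the aapche group and the constraint ids from the configuration
below): once the migration is finished, drop the constraint that the
migrate/move action created, so that only the score-10 preference and the
stickiness decide placement:

# clear the constraint left behind by the migration
crm resource unmigrate aapche

# or delete it explicitly by its id
crm configure delete cli-ban-aapche-on-sles1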

my configuration is:

node sles1
node sles2
primitive filesystem Filesystem \
   params fstype=ext3 directory=/srv/www/vhosts device=/dev/xvdd1 \
   op start interval=0 timeout=60 \
   op stop interval=0 timeout=60 \
   op monitor interval=20 timeout=40
primitive myip IPaddr2 \
   params ip=x.x.x.x \
   op start interval=0 timeout=20s \
   op stop interval=0 timeout=20s \
   op monitor interval=10s timeout=20s
primitive stonith_sbd stonith:external/sbd \
   params pcmk_delay_max=30
primitive web apache \
   params configfile=/etc/apache2/httpd.conf \
   op start interval=0 timeout=40s \
   op stop interval=0 timeout=60s \
   op monitor interval=10 timeout=20s
group aapche filesystem myip web \
   meta target-role=Started is-managed=true resource-stickiness=1000
location cli-prefer-aapche aapche role=Started 10: sles2
property cib-bootstrap-options: \
   stonith-enabled=true \
   no-quorum-policy=ignore \
   placement-strategy=balanced \
   expected-quorum-votes=2 \
   dc-version=1.1.12-f47ea56 \
   cluster-infrastructure=classic openais (with plugin) \
   last-lrm-refresh=1440502955 \
   stonith-timeout=40s
rsc_defaults rsc-options: \
   resource-stickiness=1000 \
   migration-threshold=3
op_defaults op-options: \
   timeout=600 \
   record-pending=true



and after migration:

node sles1
node sles2
primitive filesystem Filesystem \
   params fstype=ext3 directory=/srv/www/vhosts device=/dev/xvdd1 \
   op start interval=0 timeout=60 \
   op stop interval=0 timeout=60 \
   op monitor interval=20 timeout=40
primitive myip IPaddr2 \
   params ip=10.9.131.86 \
   op start interval=0 timeout=20s \
   op stop interval=0 timeout=20s \
   op monitor interval=10s timeout=20s
primitive stonith_sbd stonith:external/sbd \
   params pcmk_delay_max=30
primitive web apache \
   params configfile=/etc/apache2/httpd.conf \
   op start interval=0 timeout=40s \
   op stop interval=0 timeout=60s \
   op monitor interval=10 timeout=20s
group aapche filesystem myip web \
   meta target-role=Started is-managed=true resource-stickiness=1000
location cli-ban-aapche-on-sles1 aapche role=Started -inf: sles1
location cli-prefer-aapche aapche role=Started 10: sles2
property cib-bootstrap-options: \
   stonith-enabled=true \
   no-quorum-policy=ignore \
   placement-strategy=balanced \
   expected-quorum-votes=2 \
   dc-version=1.1.12-f47ea56 \
   cluster-infrastructure=classic openais (with plugin) \
   last-lrm-refresh=1440502955 \
   stonith-timeout=40s
rsc_defaults rsc-options: \
   resource-stickiness=1000 \
   migration-threshold=3
op_defaults op-options: \
   timeout=600 \
   record-pending=true




thanks

Best Regards

Jost









Re: [ClusterLabs] resource-stickiness

2015-08-26 Thread Rakovec Jost
Sorry, one typo; the problem is the same:



location cli-prefer-aapche aapche role=Started 10: sles2

to:

location cli-prefer-aapche aapche role=Started inf: sles2



It keeps changing to infinity.




my configuration is:

node sles1
node sles2
primitive filesystem Filesystem \
   params fstype=ext3 directory=/srv/www/vhosts device=/dev/xvdd1 \
   op start interval=0 timeout=60 \
   op stop interval=0 timeout=60 \
   op monitor interval=20 timeout=40
primitive myip IPaddr2 \
   params ip=x.x.x.x \
   op start interval=0 timeout=20s \
   op stop interval=0 timeout=20s \
   op monitor interval=10s timeout=20s
primitive stonith_sbd stonith:external/sbd \
   params pcmk_delay_max=30
primitive web apache \
   params configfile=/etc/apache2/httpd.conf \
   op start interval=0 timeout=40s \
   op stop interval=0 timeout=60s \
   op monitor interval=10 timeout=20s
group aapche filesystem myip web \
   meta target-role=Started is-managed=true resource-stickiness=1000
location cli-prefer-aapche aapche role=Started 10: sles2
property cib-bootstrap-options: \
   stonith-enabled=true \
   no-quorum-policy=ignore \
   placement-strategy=balanced \
   expected-quorum-votes=2 \
   dc-version=1.1.12-f47ea56 \
   cluster-infrastructure=classic openais (with plugin) \
   last-lrm-refresh=1440502955 \
   stonith-timeout=40s
rsc_defaults rsc-options: \
   resource-stickiness=1000 \
   migration-threshold=3
op_defaults op-options: \
   timeout=600 \
   record-pending=true



and after migration:


node sles1
node sles2
primitive filesystem Filesystem \
   params fstype=ext3 directory=/srv/www/vhosts device=/dev/xvdd1 \
   op start interval=0 timeout=60 \
   op stop interval=0 timeout=60 \
   op monitor interval=20 timeout=40
primitive myip IPaddr2 \
   params ip=10.9.131.86 \
   op start interval=0 timeout=20s \
   op stop interval=0 timeout=20s \
   op monitor interval=10s timeout=20s
primitive stonith_sbd stonith:external/sbd \
   params pcmk_delay_max=30
primitive web apache \
   params configfile=/etc/apache2/httpd.conf \
   op start interval=0 timeout=40s \
   op stop interval=0 timeout=60s \
   op monitor interval=10 timeout=20s
group aapche filesystem myip web \
   meta target-role=Started is-managed=true resource-stickiness=1000
location cli-prefer-aapche aapche role=Started inf: sles2
property cib-bootstrap-options: \
   stonith-enabled=true \
   no-quorum-policy=ignore \
   placement-strategy=balanced \
   expected-quorum-votes=2 \
   dc-version=1.1.12-f47ea56 \
   cluster-infrastructure=classic openais (with plugin) \
   last-lrm-refresh=1440502955 \
   stonith-timeout=40s
rsc_defaults rsc-options: \
   resource-stickiness=1000 \
   migration-threshold=3
op_defaults op-options: \
   timeout=600 \
   record-pending=true




From: Rakovec Jost
Sent: Wednesday, August 26, 2015 1:33 PM
To: users@clusterlabs.org
Subject: resource-stickiness



Re: [ClusterLabs] NFS exports

2015-08-26 Thread Ulrich Windl
 Streeter, Michelle N michelle.n.stree...@boeing.com wrote on
 26.08.2015 at 15:42 in message
9a18847a77a9a14da7e0fd240efcafc2504...@xch-phx-501.sw.nos.boeing.com:
 I have been using the Linux /etc/exports file to define the exports for my
 cluster, and it works fine this way as long as every node has it set up.
 
 I tried to add the exportfs resource but this keeps failing.

Did you use fully qualified names?
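
For comparison, a minimal exportfs primitive could look like the sketch
below; the client network, export options and fsid are placeholders, and
the clientspec in particular needs a fully qualified host name or a
network:

pcs resource create nfsexport-data ocf:heartbeat:exportfs \
    clientspec="192.168.56.0/255.255.255.0" options="rw,sync,no_root_squash" \
    directory="/data" fsid=1 op monitor interval=30s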

 
 Is it preferred that we use /etc/exports or the exportfs resource with
 pacemaker?
 
 Michelle Streeter
 ASC2 MCS - SDE/ACL/SDL/EDL OKC Software Engineer
 The Boeing Company




