Re: [ClusterLabs] DRBD failover in Pacemaker

2016-09-08 Thread Dimitri Maziuk
On 09/08/2016 06:33 PM, Digimer wrote:

> With 'fencing resource-and-stonith;' and a {un,}fence-handler set, DRBD
> will block when the peer is lost until the fence handler script returns
> indicating the peer was fenced/stonithed. In this way, the secondary
> WON'T promote to Primary while the peer is still Primary. It will only
> promote AFTER confirmation that the old Primary is gone. Thus, no
> split-brain.
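
For reference, the drbd.conf side of what is described above looks roughly
like the following (the resource name r0 and the stock LINBIT handler paths
are placeholders; in DRBD 8.4 the fencing option lives in the disk section,
so check drbd.conf(5) for your version):

  resource r0 {
    disk {
      fencing resource-and-stonith;
    }
    handlers {
      fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
      after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
  }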

In 7 or 8 years of running several DRBD pairs I've had split brain about 5
times, and at least 2 of those were because I tugged on the crosslink
cable while mucking around the back of the rack. Maybe if you run a
zillion stacked active-active resources on a 100-node cluster DRBD
split brain becomes a real problem; from where I'm sitting, stonith'ing
DRBD nodes is a solution in search of a problem.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu



___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] DRBD failover in Pacemaker

2016-09-08 Thread Dmitri Maziuk

On 2016-09-08 02:03, Digimer wrote:


You need to solve the problem with fencing in DRBD. Leaving it off WILL
result in a split-brain eventually, full stop. With working fencing, you
will NOT get a split-brain, full stop.


"Split brain is a situation where, due to temporary failure of all 
network links between cluster nodes, and possibly due to intervention by 
a cluster management software or human error, both nodes switched to the 
primary role while disconnected."

 -- DRBD Users Guide 8.4 # 2.9 Split brain notification.

About the only practical problem with *DRBD* split brain under pacemaker 
is that pacemaker won't let you run "drbdadm secondary && drbdadm 
connect --discard-my-data" as easily as the busted ancient code did.
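
With DRBD 8.4 that by-hand recovery amounts to roughly the following, run on
the node whose changes you are willing to throw away (r0 stands in for the
real resource name):

  drbdadm secondary r0
  drbdadm connect --discard-my-data r0

  # and on the surviving node, if it dropped to StandAlone:
  drbdadm connect r0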


Dima




Re: [ClusterLabs] DRBD failover in Pacemaker

2016-09-08 Thread Digimer
> Thank you for the responses, I followed Digimer's instructions along with 
> some information I had read on the DRBD site and configured fencing on the 
> DRBD resource. I also configured STONITH using IPMI in Pacemaker. I set up 
> Pacemaker first and verified that it kills the other node. 
> 
> After configuring DRBD fencing, though, I ran into a problem where failover 
> stopped working. If I disable fencing in DRBD, when one node is taken offline 
> pacemaker kills it and everything fails over to the other as I would expect, 
> but with fencing enabled the second node doesn't become master in DRBD until 
> the first node completely finishes rebooting. This makes for a lot of 
> downtime, and if one of the nodes has a hardware failure it would never fail 
> over. I think it's something to do with the fencing scripts. 
> 
> I am looking for complete redundancy including in the event of hardware 
> failure. Is there a way I can prevent Split-Brain while still allowing for 
> DRBD to fail over to the other node? Right now I have only STONITH configured 
> in pacemaker and fencing turned OFF in DRBD. So far it works as I want it to 
> but sometimes when communication is lost between the two nodes the wrong one 
> ends up getting killed, and when that happens it results in Split-Brain on 
> recovery. I hope I described the situation well enough for someone to offer a 
> little help. I'm currently experimenting with the delays before STONITH to 
> see if I can figure something out.
> 
> Thank you,
> Devin

You need to solve the problem with fencing in DRBD. Leaving it off WILL
result in a split-brain eventually, full stop. With working fencing, you
will NOT get a split-brain, full stop.

With working fencing, nodes will block if fencing fails. So as an
example, if the IPMI fencing fails because the IPMI BMC died with the
host, then the surviving node(s) will hang. The logic is that it is
better to hang than risk a split-brain/corruption.

If fencing via IPMI works, then pacemaker should be told as much by
fence_ipmilan and recover as soon as the fence agent exits. If it
doesn't recover until the node returns, fencing is NOT configured
properly (or otherwise not working).
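
As a rough sketch of what that looks like with pcs (the node names, BMC
addresses, and credentials below are made up; check 'pcs stonith describe
fence_ipmilan' for the exact parameter names your fence-agents version
wants):

  pcs stonith create fence_node1 fence_ipmilan pcmk_host_list="node1" \
      ipaddr="10.0.0.11" login="admin" passwd="secret" lanplus=1 delay=15 \
      op monitor interval=60s
  pcs stonith create fence_node2 fence_ipmilan pcmk_host_list="node2" \
      ipaddr="10.0.0.12" login="admin" passwd="secret" lanplus=1 \
      op monitor interval=60s

  # keep each device off the node it is meant to kill, and turn stonith on
  pcs constraint location fence_node1 avoids node1
  pcs constraint location fence_node2 avoids node2
  pcs property set stonith-enabled=true

The delay on one device gives that node a head start in a two-node split, so
both sides don't shoot each other at the same time.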

If you want to make sure that the cluster will recover no matter what,
then you will need a backup fence method. We do this by using IPMI as
the primary fence method and a pair of switched PDUs as a backup. So
with this setup, if a node fails, first pacemaker will try to shoot the
peer using IPMI. If IPMI fails (say because the host lost all power),
pacemaker gives up and moves on to PDU fencing. In this case, both PDUs
are called to open the circuits feeding the lost node, thus ensuring it
is off.

If for some reason both methods fail, pacemaker goes back to IPMI and
tries that again, then on to PDUs, ... and will loop until one of the
methods succeeds, leaving the cluster (intentionally) hung in the meantime.
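
With pcs, that layering is expressed as fencing topology levels, something
along these lines (the device names are placeholders for whatever IPMI and
PDU stonith resources you actually created):

  # level 1: try IPMI first
  pcs stonith level add 1 node1 fence_node1_ipmi
  pcs stonith level add 1 node2 fence_node2_ipmi

  # level 2: fall back to the switched PDUs; every device in a level
  # must succeed for that level to count as a successful fence
  pcs stonith level add 2 node1 fence_pdu1_node1,fence_pdu2_node1
  pcs stonith level add 2 node2 fence_pdu1_node2,fence_pdu2_node2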

digimer

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?



Re: [ClusterLabs] DRBD failover in Pacemaker

2016-09-08 Thread Devin Ortner

Message: 1
Date: Wed, 7 Sep 2016 19:23:04 +0900
From: Digimer <li...@alteeve.ca>
To: Cluster Labs - All topics related to open-source clustering
welcomed <users@clusterlabs.org>
Subject: Re: [ClusterLabs] DRBD failover in Pacemaker
Message-ID: <b1e95242-1b0d-ed28-2ba8-d6b58d152...@alteeve.ca>
Content-Type: text/plain; charset=windows-1252

> no-quorum-policy: ignore
> stonith-enabled: false

You must have fencing configured.

CentOS 6 uses pacemaker with the cman plugin. So set up cman
(cluster.conf) to use the fence_pcmk passthrough agent, then set up proper 
stonith in pacemaker (and test that it works). Finally, tell DRBD to use 
'fencing resource-and-stonith;' and configure the 'crm-{un,}fence-peer.sh' 
{un,}fence handlers.
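
The cluster.conf part of that is just the fence_pcmk redirection, roughly
(node names here are placeholders):

  <clusternodes>
    <clusternode name="node1" nodeid="1">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="node1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" nodeid="2">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="node2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="pcmk" agent="fence_pcmk"/>
  </fencedevices>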

See if that gets things working.

On 07/09/16 04:04 AM, Devin Ortner wrote:
> I have a 2-node cluster running CentOS 6.8 and Pacemaker with DRBD. I have 
> been using the "Clusters from Scratch" documentation to create my cluster and 
> I am running into a problem where DRBD is not failing over to the other node 
> when one goes down. Here is my "pcs status" prior to when it is supposed to 
> fail over:
> 
> --
> 
> 
> [root@node1 ~]# pcs status
> Cluster name: webcluster
> Last updated: Tue Sep  6 14:50:21 2016    Last change: Tue Sep  6 14:50:17 2016 by root via crm_attribute on node1
> Stack: cman
> Current DC: node2 (version 1.1.14-8.el6_8.1-70404b0) - partition with 
> quorum
> 2 nodes and 5 resources configured
> 
> Online: [ node1 node2 ]
> 
> Full list of resources:
> 
>  Cluster_VIP  (ocf::heartbeat:IPaddr2):    Started node1
>  Master/Slave Set: ClusterDBclone [ClusterDB]
>      Masters: [ node1 ]
>      Slaves: [ node2 ]
>  ClusterFS    (ocf::heartbeat:Filesystem): Started node1
>  WebSite      (ocf::heartbeat:apache):     Started node1
> 
> Failed Actions:
> * ClusterFS_start_0 on node2 'unknown error' (1): call=61, status=complete, 
> exitreason='none',
> last-rc-change='Tue Sep  6 13:15:00 2016', queued=0ms, exec=40ms
> 
> 
> PCSD Status:
>   node1: Online
>   node2: Online
> 
> [root@node1 ~]#
> 
> When I put node1 in standby everything fails over except DRBD:
> --
> 
> 
> [root@node1 ~]# pcs cluster standby node1
> [root@node1 ~]# pcs status
> Cluster name: webcluster
> Last updated: Tue Sep  6 14:53:45 2016    Last change: Tue Sep  6 14:53:37 2016 by root via cibadmin on node2
> Stack: cman
> Current DC: node2 (version 1.1.14-8.el6_8.1-70404b0) - partition with 
> quorum
> 2 nodes and 5 resources configured
> 
> Node node1: standby
> Online: [ node2 ]
> 
> Full list of resources:
> 
>  Cluster_VIP  (ocf::heartbeat:IPaddr2):    Started node2
>  Master/Slave Set: ClusterDBclone [ClusterDB]
>      Slaves: [ node2 ]
>      Stopped: [ node1 ]
>  ClusterFS    (ocf::heartbeat:Filesystem): Stopped
>  WebSite      (ocf::heartbeat:apache):     Started node2
> 
> Failed Actions:
> * ClusterFS_start_0 on node2 'unknown error' (1): call=61, status=complete, 
> exitreason='none',
> last-rc-change='Tue Sep  6 13:15:00 2016', queued=0ms, exec=40ms
> 
> 
> PCSD Status:
>   node1: Online
>   node2: Online
> 
> [root@node1 ~]#
> 
> I have pasted the contents of "/var/log/messages" here: 
> http://pastebin.com/0i0FMzGZ Here is my Configuration: 
> http://pastebin.com/HqqBV90p
> 
> When I unstandby node1, it comes back as the master for the DRBD and 
> everything else stays running on node2 (Which is fine because I haven't setup 
> colocation constraints for that) Here is what I have after node1 is back:
> -
> 
> [root@node1 ~]# pcs cluster unstandby node1
> [root@node1 ~]# pcs status
> Cluster name: webcluster
> Last updated: Tue Sep  6 14:57:46 2016    Last change: Tue Sep  6 14:57:42 2016 by root via cibadmin on node1
> Stack: cman
> Current DC: node2 (version 1.1.14-8.el6_8.1-70404b0) - partition with 
> quorum
> 2 nodes and 5 resources configured
> 
> Online: [ node1 node2 ]
> 
> Full list of resources:
> 
>  Cluster_VIP  (ocf::heartbeat:IPaddr2):    Started node2
>  Master/Slave Set: ClusterDBclone [ClusterDB]
>      Masters: [ node1 ]
>      Slaves: [ node2 ]
>  ClusterFS    (ocf::heartbeat:Filesystem): Started node1
>  WebSite      (ocf::heartbeat:apache):     Started node2
> 
> Failed Actions:
> * ClusterFS_start_0 on node2 'unknown error' (1): call=61, st

Re: [ClusterLabs] DRBD failover in Pacemaker

2016-09-07 Thread Ken Gaillot
On 09/06/2016 02:04 PM, Devin Ortner wrote:
> I have a 2-node cluster running CentOS 6.8 and Pacemaker with DRBD. I have 
> been using the "Clusters from Scratch" documentation to create my cluster and 
> I am running into a problem where DRBD is not failing over to the other node 
> when one goes down. Here is my "pcs status" prior to when it is supposed to 
> fail over:

The most up-to-date version of Clusters From Scratch targets CentOS 7.1,
which has corosync 2, pcs, and a recent pacemaker:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Clusters_from_Scratch/index.html


There is an older version targeting Fedora 13, which has CMAN, corosync
1, the crm shell, and an older pacemaker:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html-single/Clusters_from_Scratch/index.html


Your system is in between, with CMAN, corosync 1, pcs, and a newer
pacemaker, so you might want to compare the two guides as you go.

> --
> 
> [root@node1 ~]# pcs status
> Cluster name: webcluster
> Last updated: Tue Sep  6 14:50:21 2016    Last change: Tue Sep  6 14:50:17 2016 by root via crm_attribute on node1
> Stack: cman
> Current DC: node2 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
> 2 nodes and 5 resources configured
> 
> Online: [ node1 node2 ]
> 
> Full list of resources:
> 
>  Cluster_VIP  (ocf::heartbeat:IPaddr2):    Started node1
>  Master/Slave Set: ClusterDBclone [ClusterDB]
>      Masters: [ node1 ]
>      Slaves: [ node2 ]
>  ClusterFS    (ocf::heartbeat:Filesystem): Started node1
>  WebSite      (ocf::heartbeat:apache):     Started node1
> 
> Failed Actions:
> * ClusterFS_start_0 on node2 'unknown error' (1): call=61, status=complete, 
> exitreason='none',
> last-rc-change='Tue Sep  6 13:15:00 2016', queued=0ms, exec=40ms

'unknown error' means the Filesystem resource agent returned an error
status. Check the system log for messages from the resource agent to see
what the error actually was.

> 
> PCSD Status:
>   node1: Online
>   node2: Online
> 
> [root@node1 ~]#
> 
> When I put node1 in standby everything fails over except DRBD:
> --
> 
> [root@node1 ~]# pcs cluster standby node1
> [root@node1 ~]# pcs status
> Cluster name: webcluster
> Last updated: Tue Sep  6 14:53:45 2016    Last change: Tue Sep  6 14:53:37 2016 by root via cibadmin on node2
> Stack: cman
> Current DC: node2 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
> 2 nodes and 5 resources configured
> 
> Node node1: standby
> Online: [ node2 ]
> 
> Full list of resources:
> 
>  Cluster_VIP  (ocf::heartbeat:IPaddr2):    Started node2
>  Master/Slave Set: ClusterDBclone [ClusterDB]
>      Slaves: [ node2 ]
>      Stopped: [ node1 ]
>  ClusterFS    (ocf::heartbeat:Filesystem): Stopped
>  WebSite      (ocf::heartbeat:apache):     Started node2
> 
> Failed Actions:
> * ClusterFS_start_0 on node2 'unknown error' (1): call=61, status=complete, 
> exitreason='none',
> last-rc-change='Tue Sep  6 13:15:00 2016', queued=0ms, exec=40ms
> 
> 
> PCSD Status:
>   node1: Online
>   node2: Online
> 
> [root@node1 ~]#
> 
> I have pasted the contents of "/var/log/messages" here: 
> http://pastebin.com/0i0FMzGZ 
> Here is my Configuration: http://pastebin.com/HqqBV90p 

One thing lacking in Clusters From Scratch is that master/slave
resources such as ClusterDB should have two monitor operations, one for
the master role and one for the slave role. Something like:

op monitor interval=59s role=Master op monitor interval=60s role=Slave

Not sure if that will help your issue, but it's a good idea.
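
Spelled out with the pcs version that ships with CentOS 6, creating the DRBD
resource from scratch would look roughly like this (drbd_resource=r0 is a
placeholder for your actual DRBD resource name):

  pcs resource create ClusterDB ocf:linbit:drbd drbd_resource=r0 \
      op monitor interval=59s role=Master \
      op monitor interval=60s role=Slave
  pcs resource master ClusterDBclone ClusterDB \
      master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true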

Another thing the guide should do differently is configure stonith
before drbd.

Once you have fencing working in pacemaker, take a look at LINBIT's DRBD
User Guide for whatever version you installed (
https://www.drbd.org/en/doc ) and look for the Pacemaker chapter. It
will describe how to connect the fencing between DRBD and Pacemaker's CIB.

Your constraints need a few tweaks: you have two "ClusterFS with
ClusterDBclone", one with "with-rsc-role:Master" and one without. You
want the one with Master. Your "Cluster_VIP with ClusterDBclone" should
also be with Master. When you colocate with a clone without specifying
the role, it means the resource can run anywhere any instance of the
clone is running (whether slave or master). In this case, you only want
the resources to run with the master instance, so you need to specify
that. That could be the main source of your issue.
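
In pcs terms that would be roughly the following, after removing the
role-less duplicate ('pcs constraint --full' shows the id to pass to
'pcs constraint remove'):

  pcs constraint colocation add Cluster_VIP with master ClusterDBclone INFINITY
  pcs constraint colocation add ClusterFS with master ClusterDBclone INFINITY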

> When I unstandby node1, it comes back as the master for the DRBD and 
> everything else stays running on node2 (Which is fine because I haven't setup 
> colocation constraints for that)
> Here is what I have after node1 is back: 
> -
> 
> [root@node1 

Re: [ClusterLabs] DRBD failover in Pacemaker

2016-09-07 Thread Greg Woods
On Tue, Sep 6, 2016 at 1:04 PM, Devin Ortner <
devin.ort...@gtshq.onmicrosoft.com> wrote:

> Master/Slave Set: ClusterDBclone [ClusterDB]
>  Masters: [ node1 ]
>  Slaves: [ node2 ]
>  ClusterFS  (ocf::heartbeat:Filesystem): Started node1
>

As Digimer said, you really need fencing when you are using DRBD. Otherwise
it's only a matter of time before your shared filesystem gets corrupted.

You also need an order constraint to be sure that the ClusterFS Filesystem
does not start until after the Master DRBD resource, and a colocation
constraint to ensure these are on the same node.
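
With pcs that pair looks something like this (resource names taken from the
status output above):

  pcs constraint order promote ClusterDBclone then start ClusterFS
  pcs constraint colocation add ClusterFS with master ClusterDBclone INFINITY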

--Greg


Re: [ClusterLabs] DRBD failover in Pacemaker

2016-09-07 Thread Dmitri Maziuk

On 2016-09-06 14:04, Devin Ortner wrote:

I have a 2-node cluster running CentOS 6.8 and Pacemaker with DRBD. I
have been using the "Clusters from Scratch" documentation to create my
cluster and I am running into a problem where DRBD is not failing over
to the other node when one goes down.

I forget if Clusters From Scratch spells this out: you have to create the 
DRBD volume and let it finish the initial sync before you let pacemaker 
near it. Was 'cat /proc/drbd' showing UpToDate/UpToDate 
Primary/Secondary when you tried the failover?
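
On a fresh volume that means something along these lines before pacemaker is
told about it (r0 is a placeholder), then waiting for the sync to finish:

  drbdadm create-md r0        # both nodes
  drbdadm up r0               # both nodes
  drbdadm primary --force r0  # one node only; kicks off the initial sync
  cat /proc/drbd              # wait until it reports UpToDate/UpToDate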


Ignore the "stonith is optional; you *must* use stonith" mantra du jour.

Dima




Re: [ClusterLabs] DRBD failover in Pacemaker

2016-09-07 Thread Digimer
> no-quorum-policy: ignore
> stonith-enabled: false

You must have fencing configured.

CentOS 6 uses pacemaker with the cman plugin. So set up cman
(cluster.conf) to use the fence_pcmk passthrough agent, then set up
proper stonith in pacemaker (and test that it works). Finally, tell DRBD
to use 'fencing resource-and-stonith;' and configure the
'crm-{un,}fence-peer.sh' {un,}fence handlers.

See if that gets things working.

On 07/09/16 04:04 AM, Devin Ortner wrote:
> I have a 2-node cluster running CentOS 6.8 and Pacemaker with DRBD. I have 
> been using the "Clusters from Scratch" documentation to create my cluster and 
> I am running into a problem where DRBD is not failing over to the other node 
> when one goes down. Here is my "pcs status" prior to when it is supposed to 
> fail over:
> 
> --
> 
> [root@node1 ~]# pcs status
> Cluster name: webcluster
> Last updated: Tue Sep  6 14:50:21 2016    Last change: Tue Sep  6 14:50:17 2016 by root via crm_attribute on node1
> Stack: cman
> Current DC: node2 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
> 2 nodes and 5 resources configured
> 
> Online: [ node1 node2 ]
> 
> Full list of resources:
> 
>  Cluster_VIP  (ocf::heartbeat:IPaddr2):    Started node1
>  Master/Slave Set: ClusterDBclone [ClusterDB]
>      Masters: [ node1 ]
>      Slaves: [ node2 ]
>  ClusterFS    (ocf::heartbeat:Filesystem): Started node1
>  WebSite      (ocf::heartbeat:apache):     Started node1
> 
> Failed Actions:
> * ClusterFS_start_0 on node2 'unknown error' (1): call=61, status=complete, 
> exitreason='none',
> last-rc-change='Tue Sep  6 13:15:00 2016', queued=0ms, exec=40ms
> 
> 
> PCSD Status:
>   node1: Online
>   node2: Online
> 
> [root@node1 ~]#
> 
> When I put node1 in standby everything fails over except DRBD:
> --
> 
> [root@node1 ~]# pcs cluster standby node1
> [root@node1 ~]# pcs status
> Cluster name: webcluster
> Last updated: Tue Sep  6 14:53:45 2016    Last change: Tue Sep  6 14:53:37 2016 by root via cibadmin on node2
> Stack: cman
> Current DC: node2 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
> 2 nodes and 5 resources configured
> 
> Node node1: standby
> Online: [ node2 ]
> 
> Full list of resources:
> 
>  Cluster_VIP  (ocf::heartbeat:IPaddr2):    Started node2
>  Master/Slave Set: ClusterDBclone [ClusterDB]
>      Slaves: [ node2 ]
>      Stopped: [ node1 ]
>  ClusterFS    (ocf::heartbeat:Filesystem): Stopped
>  WebSite      (ocf::heartbeat:apache):     Started node2
> 
> Failed Actions:
> * ClusterFS_start_0 on node2 'unknown error' (1): call=61, status=complete, 
> exitreason='none',
> last-rc-change='Tue Sep  6 13:15:00 2016', queued=0ms, exec=40ms
> 
> 
> PCSD Status:
>   node1: Online
>   node2: Online
> 
> [root@node1 ~]#
> 
> I have pasted the contents of "/var/log/messages" here: 
> http://pastebin.com/0i0FMzGZ 
> Here is my Configuration: http://pastebin.com/HqqBV90p 
> 
> When I unstandby node1, it comes back as the master for the DRBD and 
> everything else stays running on node2 (Which is fine because I haven't setup 
> colocation constraints for that)
> Here is what I have after node1 is back: 
> -
> 
> [root@node1 ~]# pcs cluster unstandby node1
> [root@node1 ~]# pcs status
> Cluster name: webcluster
> Last updated: Tue Sep  6 14:57:46 2016    Last change: Tue Sep  6 14:57:42 2016 by root via cibadmin on node1
> Stack: cman
> Current DC: node2 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
> 2 nodes and 5 resources configured
> 
> Online: [ node1 node2 ]
> 
> Full list of resources:
> 
>  Cluster_VIP  (ocf::heartbeat:IPaddr2):    Started node2
>  Master/Slave Set: ClusterDBclone [ClusterDB]
>      Masters: [ node1 ]
>      Slaves: [ node2 ]
>  ClusterFS    (ocf::heartbeat:Filesystem): Started node1
>  WebSite      (ocf::heartbeat:apache):     Started node2
> 
> Failed Actions:
> * ClusterFS_start_0 on node2 'unknown error' (1): call=61, status=complete, 
> exitreason='none',
> last-rc-change='Tue Sep  6 13:15:00 2016', queued=0ms, exec=40ms
> 
> 
> PCSD Status:
>   node1: Online
>   node2: Online
> 
> [root@node1 ~]#
> 
> Any help would be appreciated, I think there is something dumb that I'm 
> missing.
> 
> Thank you.
> 


-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?


[ClusterLabs] DRBD failover in Pacemaker

2016-09-07 Thread Devin Ortner
I have a 2-node cluster running CentOS 6.8 and Pacemaker with DRBD. I have been 
using the "Clusters from Scratch" documentation to create my cluster and I am 
running into a problem where DRBD is not failing over to the other node when 
one goes down. Here is my "pcs status" prior to when it is supposed to fail 
over:

--

[root@node1 ~]# pcs status
Cluster name: webcluster
Last updated: Tue Sep  6 14:50:21 2016  Last change: Tue Sep  6 
14:50:17 2016 by root via crm_attribute on node1
Stack: cman
Current DC: node2 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
2 nodes and 5 resources configured

Online: [ node1 node2 ]

Full list of resources:

 Cluster_VIP  (ocf::heartbeat:IPaddr2):    Started node1
 Master/Slave Set: ClusterDBclone [ClusterDB]
     Masters: [ node1 ]
     Slaves: [ node2 ]
 ClusterFS    (ocf::heartbeat:Filesystem): Started node1
 WebSite      (ocf::heartbeat:apache):     Started node1

Failed Actions:
* ClusterFS_start_0 on node2 'unknown error' (1): call=61, status=complete, 
exitreason='none',
last-rc-change='Tue Sep  6 13:15:00 2016', queued=0ms, exec=40ms


PCSD Status:
  node1: Online
  node2: Online

[root@node1 ~]#

When I put node1 in standby everything fails over except DRBD:
--

[root@node1 ~]# pcs cluster standby node1
[root@node1 ~]# pcs status
Cluster name: webcluster
Last updated: Tue Sep  6 14:53:45 2016  Last change: Tue Sep  6 
14:53:37 2016 by root via cibadmin on node2
Stack: cman
Current DC: node2 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
2 nodes and 5 resources configured

Node node1: standby
Online: [ node2 ]

Full list of resources:

 Cluster_VIP  (ocf::heartbeat:IPaddr2):    Started node2
 Master/Slave Set: ClusterDBclone [ClusterDB]
     Slaves: [ node2 ]
     Stopped: [ node1 ]
 ClusterFS    (ocf::heartbeat:Filesystem): Stopped
 WebSite      (ocf::heartbeat:apache):     Started node2

Failed Actions:
* ClusterFS_start_0 on node2 'unknown error' (1): call=61, status=complete, 
exitreason='none',
last-rc-change='Tue Sep  6 13:15:00 2016', queued=0ms, exec=40ms


PCSD Status:
  node1: Online
  node2: Online

[root@node1 ~]#

I have pasted the contents of "/var/log/messages" here: 
http://pastebin.com/0i0FMzGZ 
Here is my Configuration: http://pastebin.com/HqqBV90p 

When I unstandby node1, it comes back as the master for the DRBD and everything 
else stays running on node2 (which is fine because I haven't set up colocation 
constraints for that).
Here is what I have after node1 is back: 
-

[root@node1 ~]# pcs cluster unstandby node1
[root@node1 ~]# pcs status
Cluster name: webcluster
Last updated: Tue Sep  6 14:57:46 2016  Last change: Tue Sep  6 
14:57:42 2016 by root via cibadmin on node1
Stack: cman
Current DC: node2 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
2 nodes and 5 resources configured

Online: [ node1 node2 ]

Full list of resources:

 Cluster_VIP  (ocf::heartbeat:IPaddr2):    Started node2
 Master/Slave Set: ClusterDBclone [ClusterDB]
     Masters: [ node1 ]
     Slaves: [ node2 ]
 ClusterFS    (ocf::heartbeat:Filesystem): Started node1
 WebSite      (ocf::heartbeat:apache):     Started node2

Failed Actions:
* ClusterFS_start_0 on node2 'unknown error' (1): call=61, status=complete, 
exitreason='none',
last-rc-change='Tue Sep  6 13:15:00 2016', queued=0ms, exec=40ms


PCSD Status:
  node1: Online
  node2: Online

[root@node1 ~]#

Any help would be appreciated; I think there is something dumb that I'm missing.

Thank you.
