> no-quorum-policy: ignore
> stonith-enabled: false

You must have fencing configured. Without it, the cluster (and DRBD in
particular) has no safe way to recover when a peer drops out, and broken
failover like this is the usual result.
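
Once real stonith devices exist (sketched below), flip that property back
on; with stock pcs it's a one-liner:

    pcs property set stonith-enabled=true

('no-quorum-policy: ignore' is the usual setting for a two-node cman
cluster, so that one can stay.)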

CentOS 6 uses pacemaker with the cman plugin. So set up cman
(cluster.conf) to use the fence_pcmk passthrough agent, then set up
proper stonith in pacemaker (and test that it works). Finally, tell DRBD
to use 'fencing resource-and-stonith;' and configure the
'crm-{un,}fence-peer.sh' {un,}fence handlers. Rough sketches of each
step follow.
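
For the cman side, a minimal cluster.conf sketch (node names taken from
your 'pcs status' output; the rest is the standard fence_pcmk redirect
boilerplate, so adjust to taste):

    <?xml version="1.0"?>
    <cluster config_version="2" name="webcluster">
      <cman two_node="1" expected_votes="1"/>
      <clusternodes>
        <clusternode name="node1" nodeid="1">
          <fence>
            <method name="pcmk-redirect">
              <device name="pcmk" port="node1"/>
            </method>
          </fence>
        </clusternode>
        <clusternode name="node2" nodeid="2">
          <fence>
            <method name="pcmk-redirect">
              <device name="pcmk" port="node2"/>
            </method>
          </fence>
        </clusternode>
      </clusternodes>
      <fencedevices>
        <fencedevice name="pcmk" agent="fence_pcmk"/>
      </fencedevices>
    </cluster>

Then define real stonith devices in pacemaker. Which agent you need
depends on your hardware; assuming IPMI here purely for illustration
(addresses and credentials are made up):

    pcs stonith create fence_node1 fence_ipmilan \
        ipaddr=10.0.0.1 login=admin passwd=secret \
        pcmk_host_list=node1 op monitor interval=60s
    pcs stonith create fence_node2 fence_ipmilan \
        ipaddr=10.0.0.2 login=admin passwd=secret \
        pcmk_host_list=node2 op monitor interval=60s

And on the DRBD side, add to your existing resource definition (drbd 8.4
syntax; the handler scripts ship with the drbd utils, normally under
/usr/lib/drbd/):

    disk {
        fencing resource-and-stonith;
    }
    handlers {
        fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }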

See if that gets things working.
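
To prove the stonith config before trusting it, fence each node on
purpose and watch it actually power-cycle, e.g.:

    pcs stonith fence node2

(or 'stonith_admin --reboot node2' if you prefer). A fence that only
produces a log entry is not a fence.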

On 07/09/16 04:04 AM, Devin Ortner wrote:
> I have a 2-node cluster running CentOS 6.8 and Pacemaker with DRBD. I have 
> been using the "Clusters from Scratch" documentation to create my cluster and 
> I am running into a problem where DRBD is not failing over to the other node 
> when one goes down. Here is my "pcs status" prior to when it is supposed to 
> fail over:
> 
> ----------------------------------------------------------------------------------------------------------------------
> 
> [root@node1 ~]# pcs status
> Cluster name: webcluster
> Last updated: Tue Sep  6 14:50:21 2016                Last change: Tue Sep  6 
> 14:50:17 2016 by root via crm_attribute on node1
> Stack: cman
> Current DC: node2 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
> 2 nodes and 5 resources configured
> 
> Online: [ node1 node2 ]
> 
> Full list of resources:
> 
>  Cluster_VIP  (ocf::heartbeat:IPaddr2):       Started node1
>  Master/Slave Set: ClusterDBclone [ClusterDB]
>      Masters: [ node1 ]
>      Slaves: [ node2 ]
>  ClusterFS    (ocf::heartbeat:Filesystem):    Started node1
>  WebSite      (ocf::heartbeat:apache):        Started node1
> 
> Failed Actions:
> * ClusterFS_start_0 on node2 'unknown error' (1): call=61, status=complete, 
> exitreason='none',
>     last-rc-change='Tue Sep  6 13:15:00 2016', queued=0ms, exec=40ms
> 
> 
> PCSD Status:
>   node1: Online
>   node2: Online
> 
> [root@node1 ~]#
> 
> When I put node1 in standby, everything fails over except DRBD:
> --------------------------------------------------------------------------------------
> 
> [root@node1 ~]# pcs cluster standby node1
> [root@node1 ~]# pcs status
> Cluster name: webcluster
> Last updated: Tue Sep  6 14:53:45 2016                Last change: Tue Sep  6 
> 14:53:37 2016 by root via cibadmin on node2
> Stack: cman
> Current DC: node2 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
> 2 nodes and 5 resources configured
> 
> Node node1: standby
> Online: [ node2 ]
> 
> Full list of resources:
> 
>  Cluster_VIP  (ocf::heartbeat:IPaddr2):       Started node2
>  Master/Slave Set: ClusterDBclone [ClusterDB]
>      Slaves: [ node2 ]
>      Stopped: [ node1 ]
>  ClusterFS    (ocf::heartbeat:Filesystem):    Stopped
>  WebSite      (ocf::heartbeat:apache):        Started node2
> 
> Failed Actions:
> * ClusterFS_start_0 on node2 'unknown error' (1): call=61, status=complete, 
> exitreason='none',
>     last-rc-change='Tue Sep  6 13:15:00 2016', queued=0ms, exec=40ms
> 
> 
> PCSD Status:
>   node1: Online
>   node2: Online
> 
> [root@node1 ~]#
> 
> I have pasted the contents of "/var/log/messages" here: 
> http://pastebin.com/0i0FMzGZ 
> Here is my Configuration: http://pastebin.com/HqqBV90p 
> 
> When I unstandby node1, it comes back as the master for the DRBD and 
> everything else stays running on node2 (which is fine because I haven't 
> set up colocation constraints for that).
> Here is what I have after node1 is back: 
> -----------------------------------------------------
> 
> [root@node1 ~]# pcs cluster unstandby node1
> [root@node1 ~]# pcs status
> Cluster name: webcluster
> Last updated: Tue Sep  6 14:57:46 2016                Last change: Tue Sep  6 
> 14:57:42 2016 by root via cibadmin on node1
> Stack: cman
> Current DC: node2 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
> 2 nodes and 5 resources configured
> 
> Online: [ node1 node2 ]
> 
> Full list of resources:
> 
>  Cluster_VIP  (ocf::heartbeat:IPaddr2):       Started node2
>  Master/Slave Set: ClusterDBclone [ClusterDB]
>      Masters: [ node1 ]
>      Slaves: [ node2 ]
>  ClusterFS    (ocf::heartbeat:Filesystem):    Started node1
>  WebSite      (ocf::heartbeat:apache):        Started node2
> 
> Failed Actions:
> * ClusterFS_start_0 on node2 'unknown error' (1): call=61, status=complete, 
> exitreason='none',
>     last-rc-change='Tue Sep  6 13:15:00 2016', queued=0ms, exec=40ms
> 
> 
> PCSD Status:
>   node1: Online
>   node2: Online
> 
> [root@node1 ~]#
> 
> Any help would be appreciated; I think there is something dumb that I'm 
> missing.
> 
> Thank you.
> 


-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
