Re: [ClusterLabs] Help needed getting DRBD cluster working

2015-11-30 Thread Lars Ellenberg
On Tue, Oct 06, 2015 at 10:13:00AM -0500, Ken Gaillot wrote:
> > ms ms_drbd0 drbd_disc0 \
> > meta master-max="1" master-node-max="1" clone-max="2" 
> > clone-node-max="1" notify="true" target-role="Started"
> 
> You want to omit target-role, or set it to "Master". Otherwise both
> nodes will start as slaves.

That is incorrect.  "Started" != "Slave"

target-role "Started" actually means "default for the resource being
handled" (the same as if you just removed that target-role attribute),
which in this case means "start up to clone-max instances,
then of those promote up to master-max instances"

target-role Slave would in fact prohibit promotion.

and target-role Master would, back in the day, trigger a Pacemaker bug
where it would try to fulfill target-role, and happened to ignore
master-max, trying to promote all instances everywhere ;-)

not set: default behaviour
started: same as not set
slave:   do not promote
master:  nowadays for ms resources same as "Started" or not set,
 but used to trigger some nasty "promote everywhere" bug
 (a few years back)

-- 
: Lars Ellenberg
: http://www.LINBIT.com | Your Way to High Availability
: DRBD, Linux-HA  and  Pacemaker support and consulting

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Help needed getting DRBD cluster working

2015-10-06 Thread Gordon Ross
On 5 Oct 2015, at 15:05, Ken Gaillot  wrote:
> 
> The "rc=6" in the failed actions means the resource's Pacemaker
> configuration is invalid. (For OCF return codes, see
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-ocf-return-codes
> )
> 
> The "_monitor_0" means that this was the initial probe that Pacemaker
> does before trying to start the resource, to make sure it's not already
> running. As an aside, you probably want to add recurring monitors as
> well, otherwise Pacemaker won't notice if the resource fails. For
> example: op monitor interval="29s" role="Master" op monitor
> interval="31s" role="Slave"
> 
> As to why the probe is failing, it's hard to tell. Double-check your
> configuration to make sure disc0 is the exact DRBD name, Pacemaker can
> read the DRBD configuration file, etc. You can also try running the DRBD
> resource agent's "status" command manually to see if it prints a more
> detailed error message.

I cleared the CIB and re-created most of it with your suggested parameters. It 
now looks like:

node $id="739377522" ct1
node $id="739377523" ct2
node $id="739377524" ct3 \
attributes standby="on"
primitive drbd_disc0 ocf:linbit:drbd \
params drbd_resource="disc0" \
meta target-role="Started" \
op monitor interval="19s" on-fail="restart" role="Master" 
start-delay="10s" timeout="20s" \
op monitor interval="20s" on-fail="restart" role="Slave" 
start-delay="10s" timeout="20s"
ms ms_drbd0 drbd_disc0 \
meta master-max="1" master-node-max="1" clone-max="2" 
clone-node-max="1" notify="true" target-role="Started"
location cli-prefer-drbd_disc0 ms_drbd0 inf: ct2
location cli-prefer-ms_drbd0 ms_drbd0 inf: ct2
property $id="cib-bootstrap-options" \
dc-version="1.1.10-42f2063" \
cluster-infrastructure="corosync" \
stonith-enabled="false" \
no-quorum-policy="stop" \
symmetric-cluster="false"


I think I’m missing something basic between the DRBD/Pacemaker hook-up.

As soon as Pacemaker/Corosync starts, DRBD on both nodes stops. A “cat 
/proc/drbd” then just returns:

version: 8.4.3 (api:1/proto:86-101)
srcversion: 6551AD2C98F533733BE558C 

with no details on the replicated disc, and the drbd block device disappears.

GTG
-- 
Gordon Ross,


Re: [ClusterLabs] Help needed getting DRBD cluster working

2015-10-06 Thread Ken Gaillot
On 10/06/2015 09:38 AM, Gordon Ross wrote:
> On 5 Oct 2015, at 15:05, Ken Gaillot  wrote:
>>
>> The "rc=6" in the failed actions means the resource's Pacemaker
>> configuration is invalid. (For OCF return codes, see
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-ocf-return-codes
>> )
>>
>> The "_monitor_0" means that this was the initial probe that Pacemaker
>> does before trying to start the resource, to make sure it's not already
>> running. As an aside, you probably want to add recurring monitors as
>> well, otherwise Pacemaker won't notice if the resource fails. For
>> example: op monitor interval="29s" role="Master" op monitor
>> interval="31s" role="Slave"
>>
>> As to why the probe is failing, it's hard to tell. Double-check your
>> configuration to make sure disc0 is the exact DRBD name, Pacemaker can
>> read the DRBD configuration file, etc. You can also try running the DRBD
>> resource agent's "status" command manually to see if it prints a more
>> detailed error message.
> 
> I cleared the CIB and re-created most of it with your suggested parameters. 
> It now looks like:
> 
> node $id="739377522" ct1
> node $id="739377523" ct2
> node $id="739377524" ct3 \
>   attributes standby="on"
> primitive drbd_disc0 ocf:linbit:drbd \
>   params drbd_resource="disc0" \
>   meta target-role="Started" \
>   op monitor interval="19s" on-fail="restart" role="Master" 
> start-delay="10s" timeout="20s" \
>   op monitor interval="20s" on-fail="restart" role="Slave" 
> start-delay="10s" timeout="20s"
> ms ms_drbd0 drbd_disc0 \
>   meta master-max="1" master-node-max="1" clone-max="2" 
> clone-node-max="1" notify="true" target-role="Started"

You want to omit target-role, or set it to "Master". Otherwise both
nodes will start as slaves.

> location cli-prefer-drbd_disc0 ms_drbd0 inf: ct2
> location cli-prefer-ms_drbd0 ms_drbd0 inf: ct2

You've given the above constraints different names, but they are
identical: they both say ms_drbd0 can run on ct2 only.

When you're using clone/ms resources, you generally only ever need to
refer to the clone's name, not the resource being cloned. So you don't
need any constraints for drbd_disc0.

You've set symmetric-cluster=false in the cluster options, which means
that Pacemaker will not start resources on any node unless a location
constraint enables it. Here, you're only enabling ct2. Duplicate the
constraint for ct1 (or set symmetric-cluster=true and use a -INF
location constraint for the third node instead).
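Spelled out in crm shell syntax, duplicating the constraint could look
something like this (hypothetical constraint names, following the quoted
configuration):

```
location ms_drbd0-on-ct1 ms_drbd0 inf: ct1
location ms_drbd0-on-ct2 ms_drbd0 inf: ct2
```

The two cli-prefer-* constraints in the quoted CIB can then be removed,
since both say the same thing about ct2.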

> property $id="cib-bootstrap-options" \
>   dc-version="1.1.10-42f2063" \
>   cluster-infrastructure="corosync" \
>   stonith-enabled="false" \

I'm sure you've heard this before, but stonith is the only way to avoid
data corruption in a split-brain situation. It's usually best to make
fencing the first priority rather than save it for last, because some
problems can become more difficult to troubleshoot without fencing. DRBD
in particular needs special configuration to coordinate fencing with
Pacemaker: https://drbd.linbit.com/users-guide/s-pacemaker-fencing.html
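The special configuration referred to is DRBD's fencing policy together
with the crm-fence-peer handler scripts. A sketch for DRBD 8.4, matching
the resource name used in this thread (script paths may differ by
distribution):

```
resource disc0 {
    disk {
        # Refuse to become primary with outdated data.
        fencing resource-only;
    }
    handlers {
        # Place/remove a Pacemaker location constraint on the peer.
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
}
```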

>   no-quorum-policy="stop" \
>   symmetric-cluster="false"
> 
> 
> I think I’m missing something basic between the DRBD/Pacemaker hook-up.
> 
> As soon as Pacemaker/Corosync starts, DRBD on both nodes stops. A “cat 
> /proc/drbd” then just returns:
> 
> version: 8.4.3 (api:1/proto:86-101)
> srcversion: 6551AD2C98F533733BE558C 
> 
> with no details on the replicated disc, and the drbd block device disappears.
> 
> GTG
> 




Re: [ClusterLabs] Help needed getting DRBD cluster working

2015-10-05 Thread Ken Gaillot
On 10/05/2015 08:09 AM, Gordon Ross wrote:
> I’m trying to setup a simple DRBD cluster using Ubuntu 14.04 LTS using 
> Pacemaker & Corosync. My problem is getting the resource to startup.
> 
> I’ve setup the DRBD aspect fine. Checking /proc/drbd I can see that my test 
> DRBD device is all synced and OK.
> 
> Following the examples from the “Clusters From Scratch” document, I built the 
> following cluster configuration:
> 
> property \
>   stonith-enabled="false" \
>   no-quorum-policy="stop" \
>   symmetric-cluster="false"
> node ct1
> node ct2
> node ct3 attributes standby="on"
> primitive drbd_disc0 ocf:linbit:drbd \
>   params drbd_resource="disc0"
> primitive drbd_disc0_fs ocf:heartbeat:Filesystem \
>   params fstype="ext4" device="/dev/drbd0" directory="/replicated/disc0"
> ms ms_drbd0 drbd_disc0 \
>   meta master-max="1" master-node-max="1" clone-max="2" 
> clone-node-max="1" \
>notify="true" target-role="Master"
> colocation filesystem_with_disc inf: drbd_disc0_fs ms_drbd0:Master
> 
> ct1 & ct2 are the main DRBD servers, with ct3 being a witness server to avoid 
> split-brain problems.
> 
> When I look at the cluster status, I get:
> 
> crm(live)# status
> Last updated: Mon Oct  5 14:04:12 2015
> Last change: Thu Oct  1 17:31:35 2015 via cibadmin on ct2
> Current DC: ct2 (739377523) - partition with quorum
> 3 Nodes configured
> 3 Resources configured
> 
> 
> Node ct3 (739377524): standby
> Online: [ ct1 ct2 ]
> 
> 
> Failed actions:
> drbd_disc0_monitor_0 (node=ct1, call=5, rc=6, status=complete, 
> last-rc-change=Thu Oct  1 16:42:11 2015
> , queued=60ms, exec=0ms
> ): not configured
> drbd_disc0_monitor_0 (node=ct2, call=5, rc=6, status=complete, 
> last-rc-change=Thu Oct  1 16:17:17 2015
> , queued=67ms, exec=0ms
> ): not configured
> drbd_disc0_monitor_0 (node=ct3, call=5, rc=6, status=complete, 
> last-rc-change=Thu Oct  1 16:42:10 2015
> , queued=54ms, exec=0ms
> ): not configured
> 
> What have I done wrong?

The "rc=6" in the failed actions means the resource's Pacemaker
configuration is invalid. (For OCF return codes, see
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-ocf-return-codes
)
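For quick reference, the standard OCF return codes can be sketched as a
small lookup table (illustrative Python, not part of Pacemaker):

```python
# OCF resource agent return codes, as documented in Pacemaker Explained.
OCF_RETURN_CODES = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    2: "OCF_ERR_ARGS",
    3: "OCF_ERR_UNIMPLEMENTED",
    4: "OCF_ERR_PERM",
    5: "OCF_ERR_INSTALLED",
    6: "OCF_ERR_CONFIGURED",
    7: "OCF_NOT_RUNNING",
    8: "OCF_RUNNING_MASTER",
    9: "OCF_FAILED_MASTER",
}

def describe_rc(rc):
    """Map a numeric rc from a 'Failed actions' entry to its OCF name."""
    return OCF_RETURN_CODES.get(rc, "unknown")

# The rc=6 seen in the failed actions above:
print(describe_rc(6))  # OCF_ERR_CONFIGURED, i.e. "not configured"
```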

The "_monitor_0" means that this was the initial probe that Pacemaker
does before trying to start the resource, to make sure it's not already
running. As an aside, you probably want to add recurring monitors as
well, otherwise Pacemaker won't notice if the resource fails. For
example: op monitor interval="29s" role="Master" op monitor
interval="31s" role="Slave"

As to why the probe is failing, it's hard to tell. Double-check your
configuration to make sure disc0 is the exact DRBD name, Pacemaker can
read the DRBD configuration file, etc. You can also try running the DRBD
resource agent's "status" command manually to see if it prints a more
detailed error message.
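Running the agent by hand means invoking it with the OCF environment
variables set. A sketch of what that might look like (the OCF root path
is the usual default; adjust to your installation, and note that recent
linbit agents implement "monitor" rather than "status"):

```
# Probe the DRBD resource agent directly and show its return code:
OCF_ROOT=/usr/lib/ocf \
OCF_RESKEY_drbd_resource=disc0 \
/usr/lib/ocf/resource.d/linbit/drbd monitor; echo "rc=$?"

# DRBD's own view of the resource:
drbdadm role disc0
drbdadm dstate disc0
```

An rc of 6 here, outside Pacemaker, confirms the agent itself considers
the configuration invalid rather than Pacemaker mangling it.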
