Jake,
Thanks for your help!
Answers to questions:
1.(Q) Why do you have LVM defined in the configuration?
(A) I wanted to make sure the LVM volumes were started before I start DRBD (I
have DRBD configured on top of LVM). I assume that this should be okay.
2.(Q) Can you clarify what you mean by "DRBD is not started"?
(A) If I do a "cat /proc/drbd", I see "unconfigured". The DRBD agent START
routine is never called. I believe this problem will be fixed once I work
through my other problems.
3.(Q) Colocation appears to be backwards per the documentation.
(A) Thanks! I changed it per your suggestion. However, the Filesystem
agent START routine is now called before the DRBD resource enters the MASTER
state.
I made the changes you suggested. (I had assumed I should not need to specify
the stop/demote sequences, but it was the only way I could get it to work.)
After these changes, the timeline I observe, based on the agent entry-point
calls that are logged, is:
1. LVM start is called, and before it finishes,
2. Filesystem start is called - this fails since the DRBD volume is read-only
3. LVM start completes
4. Filesystem stop is called (because Filesystem start failed)
5. DRBD start is called
6. DRBD promote is called
My expectation was that the Filesystem start routine would not be called until
DRBD was MASTER.
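For reference, looking back at your earlier reply, the only order constraint left in my config below is the LVM-before-DRBD one. If I still need the promote-before-Filesystem order you kept in your edits, I believe it would be this statement (restated from your earlier reply):

```
order order-glance-drbd-promote-before-fs-group inf: ms-glance-drbd:promote group-glance-fs:start
```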
My configuration is:
node cnode-1-3-5
node cnode-1-3-6
primitive glance-drbd-p ocf:linbit:drbd \
params drbd_resource="glance-repos-drbd" \
op start interval="0" timeout="240" \
op stop interval="0" timeout="100" \
op monitor interval="59s" role="Master" timeout="30s" \
op monitor interval="61s" role="Slave" timeout="30s"
primitive glance-fs-p ocf:heartbeat:Filesystem \
params device="/dev/drbd1" directory="/glance-mount" fstype="ext4" \
op start interval="0" timeout="60" \
op monitor interval="60" timeout="60" OCF_CHECK_LEVEL="20" \
op stop interval="0" timeout="120"
primitive glance-ip-p ocf:heartbeat:IPaddr2 \
params ip="10.4.0.25" nic="br100" \
op monitor interval="5s"
primitive glance-lvm-p ocf:heartbeat:LVM \
params volgrpname="glance-repos" exclusive="true" \
op start interval="0" timeout="30" \
op stop interval="0" timeout="30"
primitive node-stonith-5-p stonith:external/ipmi \
op monitor interval="10m" timeout="1m" target_role="Started" \
params hostname="cnode-1-3-5 cnode-1-3-6" ipaddr="172.23.8.99" userid="ADMIN" passwd="foo" interface="lan"
primitive node-stonith-6-p stonith:external/ipmi \
op monitor interval="10m" timeout="1m" target_role="Started" \
params hostname="cnode-1-3-5 cnode-1-3-6" ipaddr="172.23.8.100" userid="ADMIN" passwd="foo" interface="lan"
group group-glance-fs glance-fs-p glance-ip-p \
meta target-role="Started"
ms ms-glance-drbd glance-drbd-p \
meta master-node-max="1" clone-max="2" clone-node-max="1" globally-unique="false" notify="true" target-role="Master"
clone cloneLvm glance-lvm-p
location loc-node-stonith-5 node-stonith-5-p \
rule $id="loc-node-stonith-5-rule" -inf: #uname eq cnode-1-3-5
location loc-node-stonith-6 node-stonith-6-p \
rule $id="loc-node-stonith-6-rule" -inf: #uname eq cnode-1-3-6
colocation coloc-fs-group-and-drbd inf: group-glance-fs ms-glance-drbd:Master
order order-glance-lvm-before-drbd inf: cloneLvm:start ms-glance-drbd:start
property $id="cib-bootstrap-options" \
dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="true" \
no-quorum-policy="ignore" \
last-lrm-refresh="1313440611"
rsc_defaults $id="rsc-options" \
resource-stickiness="100"
Thanks again for your help,
Bob
________________________________
From: Jake Smith <[email protected]>
To: Bob Schatz <[email protected]>
Cc: [email protected]
Sent: Thursday, August 11, 2011 11:04 AM
Subject: Re: [DRBD-user] Fw: DRBD STONITH - how is Pacemaker constraint cleared?
Comments in-line. Also in-line with Pacemaker config at the bottom.
HTH
Jake
----- Original Message -----
> From: "Bob Schatz" <[email protected]>
> To: [email protected]
> Sent: Thursday, August 11, 2011 1:09:56 PM
> Subject: [DRBD-user] Fw: DRBD STONITH - how is Pacemaker constraint
> cleared?
> Hi,
> Does anyone know the answer to the question below about DRBD STONITH
> setting Pacemaker location constraints?
> Thanks!
> Bob
> ----- Forwarded Message -----
> From: Bob Schatz <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Tuesday, August 2, 2011 12:21 PM
> Subject: [DRBD-user] DRBD STONITH - how is Pacemaker constraint
> cleared?
> Hi,
> I setup DRBD and Pacemaker using STONITH for DRBD and for Pacemaker.
> (Configs at bottom of email)
> When I reboot the PRIMARY DRBD node (cnode-1-3-6), Pacemaker shows
> this location constraint:
> location drbd-fence-by-handler-ms-glance-drbd ms-glance-drbd \
> rule $id="drbd-fence-by-handler-rule-ms-glance-drbd" $role="Master"
> -inf: #uname ne cnode-1-3-5
> and transitions the SECONDARY to PRIMARY. This makes sense to me.
> However, when I restart cnode-1-3-6 (cnode-1-3-5 still up as PRIMARY)
> the location constraint is not cleared as I would have expected.
> Also, DRBD is not started (I assume because of the location
> constraint). I would expect that since cnode-1-3-5 is still up the
> constraint would be moved and DRBD would change to SECONDARY.
The location constraint would only prevent glance-drbd from being promoted to
Master on cnode-1-3-6. Basically it says that the ms-glance-drbd:Master role
can only run on the node named cnode-1-3-5. It doesn't care about
ms-glance-drbd:Secondary. It would not prevent DRBD from starting either
(though your ordering could cause it not to start...). Could you clarify what
you mean by "DRBD is not started"?
> Am I correct that this location constraint should be cleared?
> I assumed this would be cleared by the DRBD handler
> after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh" script but I
> do not believe it is called.
That is the handler that would clear the location constraint. You should see it
cleared after the resync is complete. If DRBD isn't running it will never
resync, which means it will never run the after-resync-target commands. Have
you checked that cnode-1-3-6 is UpToDate (cat /proc/drbd)? Here's an excerpt
of how it should look in the logs as the constraint is removed (this should be
logged on cnode-1-3-6):
kernel: [ 77.131564] block drbd4: Resync done (total 1 sec; paused 0 sec; 0
K/sec)
kernel: [ 77.131573] block drbd4: conn( SyncTarget -> Connected ) disk(
Inconsistent -> UpToDate )
kernel: [ 77.131585] block drbd4: helper command: /sbin/drbdadm
after-resync-target minor-4
crm-unfence-peer.sh[3024]: invoked for bind <-- drbd4
kernel: [ 77.261360] block drbd4: helper command: /sbin/drbdadm
after-resync-target minor-4 exit code 0 (0x0)
> BTW, I am pretty sure I have ordering duplications in my Pacemaker
> configuration (pointed out by Andrew on the Pacemaker mailing list)
> but I am not sure if that is the problem.
> Thanks,
> Bob
> drbd.conf file:
> global {
> usage-count yes;
> }
> common {
> protocol C;
> }
> resource glance-repos-drbd {
> disk {
> fencing resource-and-stonith;
> }
> handlers {
> fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
> after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
> }
> on cnode-1-3-5 {
> device /dev/drbd1;
> disk /dev/glance-repos/glance-repos-vol;
> address 10.4.1.29:7789;
> flexible-meta-disk /dev/glance-repos/glance-repos-drbd-meta-vol;
> }
> on cnode-1-3-6 {
> device /dev/drbd1;
> disk /dev/glance-repos/glance-repos-vol;
> address 10.4.1.30:7789;
> flexible-meta-disk /dev/glance-repos/glance-repos-drbd-meta-vol;
> }
> syncer {
> rate 40M;
> }
> }
> Pacemaker configuration:
> node cnode-1-3-5
> node cnode-1-3-6
> primitive glance-drbd-p ocf:linbit:drbd \
> params drbd_resource="glance-repos-drbd" \
> op start interval="0" timeout="240" \
> op stop interval="0" timeout="100" \
> op monitor interval="59s" role="Master" timeout="30s" \
> op monitor interval="61s" role="Slave" timeout="30s"
> primitive glance-fs-p ocf:heartbeat:Filesystem \
> params device="/dev/drbd1" directory="/glance-mount" fstype="ext4" \
> op start interval="0" timeout="60" \
> op monitor interval="60" timeout="60" OCF_CHECK_LEVEL="20" \
> op stop interval="0" timeout="120"
> primitive glance-ip-p ocf:heartbeat:IPaddr2 \
> params ip="10.4.0.25" nic="br100" \
> op monitor interval="5s"
> primitive glance-lvm-p ocf:heartbeat:LVM \
> params volgrpname="glance-repos" exclusive="true" \
> op start interval="0" timeout="30" \
> op stop interval="0" timeout="30" \
> meta target-role="Started"
I don't understand why you have this primitive. What is it for?
> primitive node-stonith-5-p stonith:external/ipmi \
> op monitor interval="10m" timeout="1m" target_role="Started" \
> params hostname="cnode-1-3-5 cnode-1-3-6" ipaddr="172.23.8.99" userid="ADMIN" passwd="foo" interface="lan"
> primitive node-stonith-6-p stonith:external/ipmi \
> op monitor interval="10m" timeout="1m" target_role="Started" \
> params hostname="cnode-1-3-5 cnode-1-3-6" ipaddr="172.23.8.100" userid="ADMIN" passwd="foo" interface="lan"
> group group-glance-fs glance-fs-p glance-ip-p \
> meta target-role="Started"
> ms ms-glance-drbd glance-drbd-p \
> meta master-node-max="1" clone-max="2" clone-node-max="1" globally-unique="false" notify="true" target-role="Master"
> clone cloneLvm glance-lvm-p
> location drbd-fence-by-handler-ms-glance-drbd ms-glance-drbd \
> rule $id="drbd-fence-by-handler-rule-ms-glance-drbd" $role="Master" -inf: #uname ne cnode-1-3-5
> location loc-node-stonith-5 node-stonith-5-p \
> rule $id="loc-node-stonith-5-rule" -inf: #uname eq cnode-1-3-5
> location loc-node-stonith-6 node-stonith-6-p \
> rule $id="loc-node-stonith-6-rule" -inf: #uname eq cnode-1-3-6
> colocation coloc-drbd-and-fs-group inf: ms-glance-drbd:Master group-glance-fs
This is backwards I believe... group-glance-fs runs on the
ms-glance-drbd:Master correct?
Colocation reads "x on y", so this says that ms-glance-drbd:Master has to run
wherever group-glance-fs is running. That means if group-glance-fs isn't
running, then ms-glance-drbd:Master can never run on that node.
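With the operands swapped, the constraint should read like this (sketch using your resource names):

```
colocation coloc-fs-group-and-drbd inf: group-glance-fs ms-glance-drbd:Master
```

i.e. group-glance-fs runs on whichever node holds ms-glance-drbd:Master.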
Quote from Pacemaker Docs:
<rsc_colocation id="colocate" rsc="resource1" with-rsc="resource2"
score="INFINITY"/>
Remember, because INFINITY was used, if resource2 can't run on any of the
cluster nodes (for whatever reason) then resource1 will not be allowed to run.
> order order-glance-drbd-demote-before-stop-drbd inf: ms-glance-drbd:demote ms-glance-drbd:stop
Not needed
> order order-glance-drbd-promote-before-fs-group inf: ms-glance-drbd:promote group-glance-fs:start
Ordering statements are applied in reverse when stopping, so the statement
above also handles the demote/stop case, making the ordering statements with
demote unneeded.
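Put differently, after dropping the demote/stop variants, the ordering section would only need these two statements (sketch built from your existing constraints):

```
order order-glance-lvm-before-drbd 0: cloneLvm ms-glance-drbd:start
order order-glance-drbd-promote-before-fs-group inf: ms-glance-drbd:promote group-glance-fs:start
```

The second statement implicitly gives you group-glance-fs:stop before ms-glance-drbd:demote when stopping.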
> order order-glance-drbd-start-before-drbd-promote inf: ms-glance-drbd:start ms-glance-drbd:promote
Not needed - ms resources are started normally on their own before being promoted.
> order order-glance-fs-stop-before-demote-drbd inf: group-glance-fs:stop ms-glance-drbd:demote
Not needed
> order order-glance-lvm-before-drbd 0: cloneLvm ms-glance-drbd:start
> property $id="cib-bootstrap-options" \
> dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
> cluster-infrastructure="openais" \
> expected-quorum-votes="2" \
> stonith-enabled="true" \
> no-quorum-policy="ignore" \
> last-lrm-refresh="1311899021"
> rsc_defaults $id="rsc-options" \
> resource-stickiness="100"
_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user