Jake,
Thanks for your help!
Answers to questions:
1.(Q) Why do you have LVM defined in the configuration?
(A) I wanted to make sure the LVM volumes were started before I start DRBD (I
have DRBD configured on top of LVM). I assume that this should be okay.
2.(Q) Can you clarify what you mean by "DRBD is not started"?
(A) If I do a "cat /proc/drbd", I see "unconfigured". The DRBD agent START
routine is never called. I believe this problem will be fixed once I work
through my other problems.
3.(Q) Colocation appears to be backwards per the documentation.
(A) Thanks! I changed it per your suggestion. However, the Filesystem
agent START routine is now called before the DRBD resource enters the MASTER
state.
I made the changes you suggested. (I had assumed I should not need to specify
the stop/demote sequences, but it was the only way I could get it to work.)
After these changes, the timeline I observe, based on the agent entry-point
calls that are logged, is:
1. LVM start is called, and before it finishes,
2. Filesystem start is called - this fails since the DRBD volume is read-only
3. LVM start completes
4. Filesystem stop is called (because Filesystem start failed)
5. DRBD start is called
6. DRBD promote is called
My expectation was that the Filesystem start routine would not be called until
DRBD was MASTER.
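For reference, looking back at your earlier reply, the only order constraint left in my config below is the LVM-before-DRBD one. If I still need the promote-before-Filesystem order you kept in your edits, I believe it would be this statement (restated from your earlier reply):

```
order order-glance-drbd-promote-before-fs-group inf: ms-glance-drbd:promote group-glance-fs:start
```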
My configuration is:
node cnode-1-3-5
node cnode-1-3-6
primitive glance-drbd-p ocf:linbit:drbd \
params drbd_resource="glance-repos-drbd" \
op start interval="0" timeout="240" \
op stop interval="0" timeout="100" \
op monitor interval="59s" role="Master" timeout="30s" \
op monitor interval="61s" role="Slave" timeout="30s"
primitive glance-fs-p ocf:heartbeat:Filesystem \
params device="/dev/drbd1" directory="/glance-mount" fstype="ext4" \
op start interval="0" timeout="60" \
op monitor interval="60" timeout="60" OCF_CHECK_LEVEL="20" \
op stop interval="0" timeout="120"
primitive glance-ip-p ocf:heartbeat:IPaddr2 \
params ip="10.4.0.25" nic="br100" \
op monitor interval="5s"
primitive glance-lvm-p ocf:heartbeat:LVM \
params volgrpname="glance-repos" exclusive="true" \
op start interval="0" timeout="30" \
op stop interval="0" timeout="30"
primitive node-stonith-5-p stonith:external/ipmi \
op monitor interval="10m" timeout="1m" target_role="Started" \
params hostname="cnode-1-3-5 cnode-1-3-6" ipaddr="172.23.8.99" userid="ADMIN" passwd="foo" interface="lan"
primitive node-stonith-6-p stonith:external/ipmi \
op monitor interval="10m" timeout="1m" target_role="Started" \
params hostname="cnode-1-3-5 cnode-1-3-6" ipaddr="172.23.8.100" userid="ADMIN" passwd="foo" interface="lan"
group group-glance-fs glance-fs-p glance-ip-p \
meta target-role="Started"
ms ms-glance-drbd glance-drbd-p \
meta master-node-max="1" clone-max="2" clone-node-max="1" globally-unique="false" notify="true" target-role="Master"
clone cloneLvm glance-lvm-p
location loc-node-stonith-5 node-stonith-5-p \
rule $id="loc-node-stonith-5-rule" -inf: #uname eq cnode-1-3-5
location loc-node-stonith-6 node-stonith-6-p \
rule $id="loc-node-stonith-6-rule" -inf: #uname eq cnode-1-3-6
colocation coloc-fs-group-and-drbd inf: group-glance-fs ms-glance-drbd:Master
order order-glance-lvm-before-drbd inf: cloneLvm:start ms-glance-drbd:start
property $id="cib-bootstrap-options" \
dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="true" \
no-quorum-policy="ignore" \
last-lrm-refresh="1313440611"
rsc_defaults $id="rsc-options" \
resource-stickiness="100"
Thanks again for your help,
Bob
________________________________
From: Jake Smith <[email protected]>
To: Bob Schatz <[email protected]>
Cc: [email protected]
Sent: Thursday, August 11, 2011 11:04 AM
Subject: Re: [DRBD-user] Fw: DRBD STONITH - how is Pacemaker constraint cleared?
Comments in-line. Also in-line with Pacemaker config at the bottom.
HTH
Jake
----- Original Message -----
> From: "Bob Schatz" <[email protected]>
> To: [email protected]
> Sent: Thursday, August 11, 2011 1:09:56 PM
> Subject: [DRBD-user] Fw: DRBD STONITH - how is Pacemaker constraint
> cleared?
> Hi,
> Does anyone know the answer to the question below about DRBD STONITH
> setting Pacemaker location constraints?
> Thanks!
> Bob
> ----- Forwarded Message -----
> From: Bob Schatz <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Tuesday, August 2, 2011 12:21 PM
> Subject: [DRBD-user] DRBD STONITH - how is Pacemaker constraint
> cleared?
> Hi,
> I setup DRBD and Pacemaker using STONITH for DRBD and for Pacemaker.
> (Configs at bottom of email)
> When I reboot the PRIMARY DRBD node (cnode-1-3-6), Pacemaker shows
> this location constraint:
> location drbd-fence-by-handler-ms-glance-drbd ms-glance-drbd \
> rule $id="drbd-fence-by-handler-rule-ms-glance-drbd" $role="Master"
> -inf: #uname ne cnode-1-3-5
> and transitions the SECONDARY to PRIMARY. This makes sense to me.
> However, when I restart cnode-1-3-6 (cnode-1-3-5 still up as PRIMARY)
> the location constraint is not cleared as I would have expected.
> Also, DRBD is not started (I assume because of the location
> constraint). I would expect that since cnode-1-3-5 is still up the
> constraint would be moved and DRBD would change to SECONDARY.
The location constraint would only prevent glance-drbd from being promoted to
Master on cnode-1-3-6. Basically it says that the ms-glance-drbd:Master role
can only run on the node named cnode-1-3-5. It doesn't care about
ms-glance-drbd:Secondary. It would not prevent DRBD from starting either
(though your ordering could cause it not to start...). Could you clarify what
you mean by "DRBD is not started"?
> Am I correct that this location constraint should be cleared?
> I assumed this would be cleared by the DRBD handler
> after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh" script but I
> do not believe it is called.
That is the handler that would clear the location constraint. You should see it
cleared after the resync is complete. If DRBD isn't running it will never
resync, which means it will never run the after-resync-target commands. Have
you checked that cnode-1-3-6 is UpToDate (cat /proc/drbd)? Here's an excerpt
of how it should look in the logs as the constraint is removed (this should be
logged on cnode-1-3-6):
kernel: [ 77.131564] block drbd4: Resync done (total 1 sec; paused 0 sec; 0
K/sec)
kernel: [ 77.131573] block drbd4: conn( SyncTarget -> Connected ) disk(
Inconsistent -> UpToDate )
kernel: [ 77.131585] block drbd4: helper command: /sbin/drbdadm
after-resync-target minor-4
crm-unfence-peer.sh[3024]: invoked for bind <-- drbd4
kernel: [ 77.261360] block drbd4: helper command: /sbin/drbdadm
after-resync-target minor-4 exit code 0 (0x0)
> BTW, I am pretty sure I have ordering duplications in my Pacemaker
> configuration (pointed out by Andrew on the Pacemaker mailing list)
> but I am not sure if that is the problem.
> Thanks,
> Bob
> drbd.conf file:
> global {
> usage-count yes;
> }
> common {
> protocol C;
> }
> resource glance-repos-drbd {
> disk {
> fencing resource-and-stonith;
> }
> handlers {
> fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
> after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
> }
> on cnode-1-3-5 {
> device /dev/drbd1;
> disk /dev/glance-repos/glance-repos-vol;
> address 10.4.1.29:7789;
> flexible-meta-disk /dev/glance-repos/glance-repos-drbd-meta-vol;
> }
> on cnode-1-3-6 {
> device /dev/drbd1;
> disk /dev/glance-repos/glance-repos-vol;
> address 10.4.1.30:7789;
> flexible-meta-disk /dev/glance-repos/glance-repos-drbd-meta-vol;
> }
> syncer {
> rate 40M;
> }
> }
> Pacemaker configuration:
> node cnode-1-3-5
> node cnode-1-3-6
> primitive glance-drbd-p ocf:linbit:drbd \
> params drbd_resource="glance-repos-drbd" \
> op start interval="0" timeout="240" \
> op stop interval="0" timeout="100" \
> op monitor interval="59s" role="Master" timeout="30s" \
> op monitor interval="61s" role="Slave" timeout="30s"
> primitive glance-fs-p ocf:heartbeat:Filesystem \
> params device="/dev/drbd1" directory="/glance-mount" fstype="ext4" \
> op start interval="0" timeout="60" \
> op monitor interval="60" timeout="60" OCF_CHECK_LEVEL="20" \
> op stop interval="0" timeout="120"
> primitive glance-ip-p ocf:heartbeat:IPaddr2 \
> params ip="10.4.0.25" nic="br100" \
> op monitor interval="5s"
> primitive glance-lvm-p ocf:heartbeat:LVM \
> params volgrpname="glance-repos" exclusive="true" \
> op start interval="0" timeout="30" \
> op stop interval="0" timeout="30" \
> meta target-role="Started"
I don't understand why you have this primitive. What is it for?
> primitive node-stonith-5-p stonith:external/ipmi \
> op monitor interval="10m" timeout="1m" target_role="Started" \
> params hostname="cnode-1-3-5 cnode-1-3-6" ipaddr="172.23.8.99" userid="ADMIN" passwd="foo" interface="lan"
> primitive node-stonith-6-p stonith:external/ipmi \
> op monitor interval="10m" timeout="1m" target_role="Started" \
> params hostname="cnode-1-3-5 cnode-1-3-6" ipaddr="172.23.8.100" userid="ADMIN" passwd="foo" interface="lan"
> group group-glance-fs glance-fs-p glance-ip-p \
> meta target-role="Started"
> ms ms-glance-drbd glance-drbd-p \
> meta master-node-max="1" clone-max="2" clone-node-max="1" globally-unique="false" notify="true" target-role="Master"
> clone cloneLvm glance-lvm-p
> location drbd-fence-by-handler-ms-glance-drbd ms-glance-drbd \
> rule $id="drbd-fence-by-handler-rule-ms-glance-drbd" $role="Master" -inf: #uname ne cnode-1-3-5
> location loc-node-stonith-5 node-stonith-5-p \
> rule $id="loc-node-stonith-5-rule" -inf: #uname eq cnode-1-3-5
> location loc-node-stonith-6 node-stonith-6-p \
> rule $id="loc-node-stonith-6-rule" -inf: #uname eq cnode-1-3-6
> colocation coloc-drbd-and-fs-group inf: ms-glance-drbd:Master group-glance-fs
This is backwards I believe... group-glance-fs runs on the
ms-glance-drbd:Master correct?
Colocation reads "x on y", so this says that ms-glance-drbd:Master has to run
wherever group-glance-fs is running. That means if group-glance-fs isn't
running, then ms-glance-drbd:Master can never run on that node.
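With the operands swapped, the constraint should read like this (sketch using your resource names):

```
colocation coloc-fs-group-and-drbd inf: group-glance-fs ms-glance-drbd:Master
```

i.e. group-glance-fs runs on whichever node holds ms-glance-drbd:Master.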
Quote from Pacemaker Docs:
<rsc_colocation id="colocate" rsc="resource1" with-rsc="resource2"
score="INFINITY"/>
Remember, because INFINITY was used, if resource2 can't run on any of the
cluster nodes (for whatever reason) then resource1 will not be allowed to run.
> order order-glance-drbd-demote-before-stop-drbd inf: ms-glance-drbd:demote ms-glance-drbd:stop
Not needed
> order order-glance-drbd-promote-before-fs-group inf: ms-glance-drbd:promote group-glance-fs:start
Ordering statements are applied in reverse when stopping, so the statement
above also handles the demote/stop case, making the ordering statements with
demote unneeded.
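Put differently, after dropping the demote/stop variants, the ordering section would only need these two statements (sketch built from your existing constraints):

```
order order-glance-lvm-before-drbd 0: cloneLvm ms-glance-drbd:start
order order-glance-drbd-promote-before-fs-group inf: ms-glance-drbd:promote group-glance-fs:start
```

The second statement implicitly gives you group-glance-fs:stop before ms-glance-drbd:demote when stopping.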
> order order-glance-drbd-start-before-drbd-promote inf: ms-glance-drbd:start ms-glance-drbd:promote
Not needed - ms resources are started normally on their own before being promoted.
> order order-glance-fs-stop-before-demote-drbd inf: group-glance-fs:stop ms-glance-drbd:demote
Not needed
> order order-glance-lvm-before-drbd 0: cloneLvm ms-glance-drbd:start
> property $id="cib-bootstrap-options" \
> dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
> cluster-infrastructure="openais" \
> expected-quorum-votes="2" \
> stonith-enabled="true" \
> no-quorum-policy="ignore" \
> last-lrm-refresh="1311899021"
> rsc_defaults $id="rsc-options" \
> resource-stickiness="100"
_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user