On Tue, Feb 28, 2012 at 01:21:51PM -0500, William Seligman wrote:
> On 2/27/12 8:40 PM, Andrew Beekhof wrote:
> 
> > Oh, what does the fence_pcmk file look like?
> 
> This is a standard part of the pacemaker-1.1.6 package. According to
> 
> <http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/_configuring_cman_fencing.html>
> 
> it causes any fencing requests from cman to be redirected to pacemaker.
> 
> Since you asked, I've attached a copy of the file. I note that if this
> script is used to fence a system it writes to /var/log/messages using
> logger, and there is no such log message in my logs. So I guess cman is
> off the hook.
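
(For reference, that redirection is wired up in cluster.conf; following
the Clusters from Scratch layout it looks roughly like this, with your
node name filled in and everything else illustrative:

  <clusternode name="hypatia-tb" nodeid="1">
    <fence>
      <method name="pcmk-redirect">
        <device name="pcmk" port="hypatia-tb"/>
      </method>
    </fence>
  </clusternode>
  ...
  <fencedevices>
    <fencedevice name="pcmk" agent="fence_pcmk"/>
  </fencedevices>

so cman hands every fence request to fence_pcmk, which forwards it to
pacemaker.)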

You say "fencing resource-only" in drbd.conf.
But you did not show which fence-peer handler you use.
Did you specify one at all?

Besides, for a dual-primary DRBD setup, you must have "fencing
resource-and-stonith;", and you should use a DRBD fencing handler
that really fences off the peer. It may additionally set constraints.
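
Roughly, in drbd.conf (the resource name matches your logs; the handler
paths are where the shipped scripts usually live; note that
crm-fence-peer.sh only sets a constraint, so for real shooting you want
a handler that escalates to stonith):

  resource admin {
    disk {
      fencing resource-and-stonith;
    }
    handlers {
      fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
      after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
  }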

Also, maybe this post helps illustrate some of the problems involved:
http://www.gossamer-threads.com/lists/linuxha/pacemaker/62927#62927

Especially the part about
 But just because you can shoot someone
 does not mean you have the bi^Wbetter data. 

Because of the increased complexity, I strongly recommend against
dual-primary DRBD, unless you have a very good reason to want it.

"Because it can be done" does not count as good reason in that context

;-)

More comments below.

> > On Tue, Feb 28, 2012 at 11:49 AM, William Seligman
> > <selig...@nevis.columbia.edu> wrote:
> >> I'm trying to set up an active/active HA cluster as explained in
> >> Clusters From Scratch (which I just re-read after my last problem).
> >>
> >> I'll give versions and config files below, but I'll start with what
> >> happens. I start with an active/active cman+pacemaker+drbd+gfs2
> >> cluster, with fencing enabled. My fencing mechanism cuts power to a
> >> node by turning the load off in its UPS. The two nodes are hypatia-tb
> >> and orestes-tb.
> >>
> >> I want to test fencing and recovery. I start with both nodes running,
> >> and resources properly running on both nodes. Then I simulate failure
> >> on one node, e.g., orestes-tb. I've done this with "crm node standby",
> >> "service pacemaker off", or by pulling the plug. As expected, all the
> >> resources move to hypatia-tb, with the drbd resource as Primary.
> >>
> >> When I try to bring orestes-tb back into the cluster with "crm node
> >> online" or "service pacemaker on" (the inverse of how I removed it),
> >> orestes-tb is fenced. OK, that makes sense, I guess; there's a
> >> potential split-brain situation.
> > 
> > Not really, that should only happen if the two nodes can't see each
> > other.  Which should not be the case.
> > Only when you pull the plug should orestes-tb be fenced.
> > 
> > Or if you're using a fencing device that requires the node to have
> > power, then I can imagine that turning it on again might result in
> > fencing.
> > But not for the other cases.
> 
> I ran a test: I turned off pacemaker (and so DRBD) on orestes-tb. I
> "touch"ed a file on the hypatia-tb DRBD partition, to make it the
> "newer" one. Then I turned off pacemaker on hypatia-tb. Finally I
> turned on just drbd on hypatia-tb, then on orestes-tb.
> 
> From /var/log/messages on hypatia-tb:
> 
> Feb 28 11:39:19 hypatia-tb kernel: d-con admin: Starting worker thread (from drbdsetup [21822])
> Feb 28 11:39:19 hypatia-tb kernel: block drbd0: disk( Diskless -> Attaching )
> Feb 28 11:39:19 hypatia-tb kernel: d-con admin: Method to ensure write ordering: barrier
> Feb 28 11:39:19 hypatia-tb kernel: block drbd0: max BIO size = 130560
> Feb 28 11:39:19 hypatia-tb kernel: block drbd0: Adjusting my ra_pages to backing device's (32 -> 768)
> Feb 28 11:39:19 hypatia-tb kernel: block drbd0: drbd_bm_resize called with capacity == 5611549368
> Feb 28 11:39:19 hypatia-tb kernel: block drbd0: resync bitmap: bits=701443671 words=10960058 pages=21407
> Feb 28 11:39:19 hypatia-tb kernel: block drbd0: size = 2676 GB (2805774684 KB)
> Feb 28 11:39:19 hypatia-tb kernel: block drbd0: bitmap READ of 21407 pages took 576 jiffies
> Feb 28 11:39:20 hypatia-tb kernel: block drbd0: recounting of set bits took additional 87 jiffies
> Feb 28 11:39:20 hypatia-tb kernel: block drbd0: 55 MB (14114 bits) marked out-of-sync by on disk bit-map.
> Feb 28 11:39:20 hypatia-tb kernel: block drbd0: disk( Attaching -> UpToDate ) pdsk( DUnknown -> Outdated )
> Feb 28 11:39:20 hypatia-tb kernel: block drbd0: attached to UUIDs 862A336609FD27CD:BFFB722D5E3E15D7:6E63EC4258C86AF2:6E62EC4258C86AF2
> Feb 28 11:39:20 hypatia-tb kernel: d-con admin: conn( StandAlone -> Unconnected )
> Feb 28 11:39:20 hypatia-tb kernel: d-con admin: Starting receiver thread (from drbd_w_admin [21824])
> Feb 28 11:39:20 hypatia-tb kernel: d-con admin: receiver (re)started
> Feb 28 11:39:20 hypatia-tb kernel: d-con admin: conn( Unconnected -> WFConnection )
> 
> 
> From /var/log/messages on orestes-tb:
> 
> Feb 28 11:39:51 orestes-tb kernel: d-con admin: Starting worker thread (from drbdsetup [17827])
> Feb 28 11:39:51 orestes-tb kernel: block drbd0: disk( Diskless -> Attaching )
> Feb 28 11:39:51 orestes-tb kernel: d-con admin: Method to ensure write ordering: barrier
> Feb 28 11:39:51 orestes-tb kernel: block drbd0: max BIO size = 130560
> Feb 28 11:39:51 orestes-tb kernel: block drbd0: Adjusting my ra_pages to backing device's (32 -> 768)
> Feb 28 11:39:51 orestes-tb kernel: block drbd0: drbd_bm_resize called with capacity == 5611549368
> Feb 28 11:39:51 orestes-tb kernel: block drbd0: resync bitmap: bits=701443671 words=10960058 pages=21407
> Feb 28 11:39:51 orestes-tb kernel: block drbd0: size = 2676 GB (2805774684 KB)
> Feb 28 11:39:52 orestes-tb kernel: block drbd0: bitmap READ of 21407 pages took 735 jiffies
> Feb 28 11:39:52 orestes-tb kernel: block drbd0: recounting of set bits took additional 93 jiffies
> Feb 28 11:39:52 orestes-tb kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> Feb 28 11:39:52 orestes-tb kernel: block drbd0: disk( Attaching -> Outdated )
> Feb 28 11:39:52 orestes-tb kernel: block drbd0: attached to UUIDs BFFB722D5E3E15D6:0000000000000000:6E63EC4258C86AF2:6E62EC4258C86AF2
> Feb 28 11:39:52 orestes-tb kernel: d-con admin: conn( StandAlone -> Unconnected )
> Feb 28 11:39:52 orestes-tb kernel: d-con admin: Starting receiver thread (from drbd_w_admin [17829])
> Feb 28 11:39:52 orestes-tb kernel: d-con admin: receiver (re)started
> Feb 28 11:39:52 orestes-tb kernel: d-con admin: conn( Unconnected -> WFConnection )
> Feb 28 11:39:52 orestes-tb kernel: d-con admin: Handshake successful: Agreed network protocol version 100
> Feb 28 11:39:52 orestes-tb kernel: d-con admin: conn( WFConnection -> WFReportParams )
> Feb 28 11:39:52 orestes-tb kernel: d-con admin: Starting asender thread (from drbd_r_admin [17835])
> Feb 28 11:39:52 orestes-tb kernel: block drbd0: drbd_sync_handshake:
> Feb 28 11:39:52 orestes-tb kernel: block drbd0: self BFFB722D5E3E15D6:0000000000000000:6E63EC4258C86AF2:6E62EC4258C86AF2 bits:0 flags:0
> Feb 28 11:39:52 orestes-tb kernel: block drbd0: peer 862A336609FD27CC:BFFB722D5E3E15D7:6E63EC4258C86AF2:6E62EC4258C86AF2 bits:14114 flags:0
> Feb 28 11:39:52 orestes-tb kernel: block drbd0: uuid_compare()=-1 by rule 50
> Feb 28 11:39:52 orestes-tb kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
> Feb 28 11:39:52 orestes-tb kernel: block drbd0: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 176(1), total 176; compression: 100.0%
> Feb 28 11:39:52 orestes-tb kernel: block drbd0: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 176(1), total 176; compression: 100.0%
> Feb 28 11:39:52 orestes-tb kernel: block drbd0: conn( WFBitMapT -> WFSyncUUID )
> Feb 28 11:40:01 orestes-tb corosync[2193]:   [TOTEM ] A processor failed, forming new configuration.
> Feb 28 11:40:03 orestes-tb corosync[2193]:   [QUORUM] Members[1]: 2
> Feb 28 11:40:03 orestes-tb corosync[2193]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Feb 28 11:40:03 orestes-tb kernel: dlm: closing connection to node 1
> Feb 28 11:40:03 orestes-tb corosync[2193]:   [CPG   ] chosen downlist: sender r(0) ip(129.236.252.14) r(1) ip(192.168.100.6) ; members(old:2 left:1)
> Feb 28 11:40:03 orestes-tb corosync[2193]:   [MAIN  ] Completed service synchronization, ready to provide service.
> Feb 28 11:40:03 orestes-tb fenced[2247]: fencing node hypatia-tb.nevis.columbia.edu
> 
> 
> As far as I can tell, hypatia-tb's drbd comes up, says "I'm UpToDate"
> and waits for a connection from orestes-tb. orestes-tb's drbd comes up,
> says "I'm UpToDate"

No, it clearly says "I'm Outdated" from the logs above:
 | Feb 28 11:39:52 orestes-tb kernel: block drbd0: disk( Attaching -> Outdated )

It outdated itself voluntarily when it was told to disconnect from a
still-running primary, because of your "fencing resource-only" configuration.

Don't rely on that: in a real incident, the replication link will just
fail, in which case you really need the "fencing resource-and-stonith",
and a suitable fence-peer handler.
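
The contract is simple: DRBD suspends I/O, calls the fence-peer handler,
and acts on its exit code (see the drbd.conf man page): 4 means the peer
was outdated, 5 means the peer was unreachable and is assumed dead, 7
means the peer got stonithed. A minimal sketch of a stonith-backed
handler; how the peer name is derived here is an assumption, the shipped
scripts do this properly:

  #!/bin/sh
  # drbdadm calls this with DRBD_* variables in the environment.
  peer="$DRBD_PEER"      # assumption: peer node name provided or derived here
  if stonith_admin --fence "$peer"; then
      exit 7             # peer was stonithed: guaranteed dead, resume I/O
  fi
  exit 1                 # could not shoot: DRBD keeps I/O frozen
                         # rather than risk diverging data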

> and starts the sync process with hypatia-tb. Then cman+corosync steps
> in on orestes-tb and fences hypatia-tb, before the sync can proceed.
> 
> I ran another test. I did the same thing as the previous paragraph,
> except that I made sure both cman and pacemaker were off (I had to
> reboot to make sure) and just started drbd on both nodes. Sure enough,
> drbd was able to sync without split-brain or fencing. So this is a
> cman/corosync issue, not a drbd issue.

You may still want to retry the whole thing with drbd 8.3.12,
just to make sure there is no hidden DRBD 8.4.1 instability.
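
Whatever version you end up on, /proc/drbd and drbdadm tell you what is
actually happening during the reconnect (resource name taken from your
logs):

  cat /proc/drbd        # module version, connection state, sync progress
  drbdadm cstate admin  # connection state of the "admin" resource
  drbdadm dstate admin  # local/peer disk state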

> While I was setting up the test for the previous paragraph, there was
> a problem with another resource (ocf:heartbeat:exportfs) that couldn't
> be properly monitored on either node. This led to a cycle of fencing
> where each node would successively fence the other because the exportfs
> resource couldn't run on either node. I had to quickly change my
> configuration to turn off monitoring on the resource.
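
As an aside: pacemaker escalates to fencing when a stop fails, not
merely because a monitor fails, so a fence loop like that usually means
the stop action was failing too. Instead of dropping the monitor
entirely, you can tell pacemaker not to shoot while you debug; a hedged
crm snippet, resource name, params and intervals all illustrative:

  primitive ExportFS ocf:heartbeat:exportfs \
      params fsid="1" directory="/exports/data" \
             clientspec="192.168.100.0/24" options="rw" \
      op monitor interval="30s" timeout="40s" \
      op stop interval="0" timeout="60s" on-fail="block"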
> 
> So it seems like cman+corosync is the issue. It's as if I'm "over-fencing."
> 
> Any ideas?


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
