On Tue, Feb 28, 2012 at 01:21:51PM -0500, William Seligman wrote:
> On 2/27/12 8:40 PM, Andrew Beekhof wrote:
>
> > Oh, what does the fence_pcmk file look like?
>
> This is a standard part of the pacemaker-1.1.6 package. According to
> <http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/_configuring_cman_fencing.html>
> it causes any fencing requests from cman to be redirected to pacemaker.
>
> Since you asked, I've attached a copy of the file. I note that if this
> script is used to fence a system it writes to /var/log/messages using
> logger, and there is no such log message in my logs. So I guess cman is
> off the hook.
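For the archives: the cman-to-pacemaker redirection described on that
Clusters from Scratch page is done in cluster.conf, roughly along these
lines. This is a sketch only; I'm reusing your node names, and your
actual cluster.conf attributes may differ:

```xml
<!-- Sketch of the cman fencing redirection from Clusters from Scratch.
     Every cman fencing request for these nodes is handed to fence_pcmk,
     which forwards it to pacemaker/stonithd. -->
<clusternodes>
  <clusternode name="hypatia-tb" nodeid="1">
    <fence>
      <method name="pcmk-redirect">
        <device name="pcmk" port="hypatia-tb"/>
      </method>
    </fence>
  </clusternode>
  <clusternode name="orestes-tb" nodeid="2">
    <fence>
      <method name="pcmk-redirect">
        <device name="pcmk" port="orestes-tb"/>
      </method>
    </fence>
  </clusternode>
</clusternodes>
<fencedevices>
  <fencedevice name="pcmk" agent="fence_pcmk"/>
</fencedevices>
```

If that redirection is in place, a fencing action initiated by cman's
fenced should indeed leave a logger trace from fence_pcmk in
/var/log/messages.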
You say "fencing resource-only" in drbd.conf, but you did not show the
fencing handler used. Did you specify one at all?

Besides, for a dual-primary DRBD setup, you must have
"fencing resource-and-stonith;", and you should use a DRBD fencing
handler that really fences off the peer. It may additionally set
constraints.

Also, maybe this post helps you realize some of the problems involved:
http://www.gossamer-threads.com/lists/linuxha/pacemaker/62927#62927
Especially the part about

  But just because you can shoot someone
  does not mean you have the bi^Wbetter data.

Because of the increased complexity, I strongly recommend against
dual-primary DRBD, unless you have a very good reason to want it.
"Because it can be done" does not count as a good reason in that
context ;-)

More comments below.

> > On Tue, Feb 28, 2012 at 11:49 AM, William Seligman
> > <selig...@nevis.columbia.edu> wrote:
> >> I'm trying to set up an active/active HA cluster as explained in
> >> Clusters From Scratch (which I just re-read after my last problem).
> >>
> >> I'll give versions and config files below, but I'll start with what
> >> happens. I start with an active/active cman+pacemaker+drbd+gfs2
> >> cluster, with fencing enabled. My fencing mechanism cuts power to a
> >> node by turning the load off in its UPS. The two nodes are
> >> hypatia-tb and orestes-tb.
> >>
> >> I want to test fencing and recovery. I start with both nodes
> >> running, and resources properly running on both nodes. Then I
> >> simulate failure on one node, e.g., orestes-tb. I've done this with
> >> "crm node standby", "service pacemaker off", or by pulling the plug.
> >> As expected, all the resources move to hypatia-tb, with the drbd
> >> resource as Primary.
> >>
> >> When I try to bring orestes-tb back into the cluster with "crm node
> >> online" or "service pacemaker on" (the inverse of how I removed it),
> >> orestes-tb is fenced.
> >> OK, that makes sense, I guess; there's a potential split-brain
> >> situation.
> >
> > Not really, that should only happen if the two nodes can't see each
> > other. Which should not be the case.
> > Only when you pull the plug should orestes-tb be fenced.
> >
> > Or if you're using a fencing device that requires the node to have
> > power, then I can imagine that turning it on again might result in
> > fencing.
> > But not for the other cases.
>
> I ran a test: I turned off pacemaker (and so DRBD) on orestes-tb. I
> "touch"ed a file on the hypatia-tb DRBD partition, to make it the
> "newer" one. Then I turned off pacemaker on hypatia-tb. Finally I
> turned on just drbd on hypatia-tb, then on orestes-tb.
>
> From /var/log/messages on hypatia-tb:
>
> Feb 28 11:39:19 hypatia-tb kernel: d-con admin: Starting worker thread (from drbdsetup [21822])
> Feb 28 11:39:19 hypatia-tb kernel: block drbd0: disk( Diskless -> Attaching )
> Feb 28 11:39:19 hypatia-tb kernel: d-con admin: Method to ensure write ordering: barrier
> Feb 28 11:39:19 hypatia-tb kernel: block drbd0: max BIO size = 130560
> Feb 28 11:39:19 hypatia-tb kernel: block drbd0: Adjusting my ra_pages to backing device's (32 -> 768)
> Feb 28 11:39:19 hypatia-tb kernel: block drbd0: drbd_bm_resize called with capacity == 5611549368
> Feb 28 11:39:19 hypatia-tb kernel: block drbd0: resync bitmap: bits=701443671 words=10960058 pages=21407
> Feb 28 11:39:19 hypatia-tb kernel: block drbd0: size = 2676 GB (2805774684 KB)
> Feb 28 11:39:19 hypatia-tb kernel: block drbd0: bitmap READ of 21407 pages took 576 jiffies
> Feb 28 11:39:20 hypatia-tb kernel: block drbd0: recounting of set bits took additional 87 jiffies
> Feb 28 11:39:20 hypatia-tb kernel: block drbd0: 55 MB (14114 bits) marked out-of-sync by on disk bit-map.
> Feb 28 11:39:20 hypatia-tb kernel: block drbd0: disk( Attaching -> UpToDate ) pdsk( DUnknown -> Outdated )
> Feb 28 11:39:20 hypatia-tb kernel: block drbd0: attached to UUIDs 862A336609FD27CD:BFFB722D5E3E15D7:6E63EC4258C86AF2:6E62EC4258C86AF2
> Feb 28 11:39:20 hypatia-tb kernel: d-con admin: conn( StandAlone -> Unconnected )
> Feb 28 11:39:20 hypatia-tb kernel: d-con admin: Starting receiver thread (from drbd_w_admin [21824])
> Feb 28 11:39:20 hypatia-tb kernel: d-con admin: receiver (re)started
> Feb 28 11:39:20 hypatia-tb kernel: d-con admin: conn( Unconnected -> WFConnection )
>
> From /var/log/messages on orestes-tb:
>
> Feb 28 11:39:51 orestes-tb kernel: d-con admin: Starting worker thread (from drbdsetup [17827])
> Feb 28 11:39:51 orestes-tb kernel: block drbd0: disk( Diskless -> Attaching )
> Feb 28 11:39:51 orestes-tb kernel: d-con admin: Method to ensure write ordering: barrier
> Feb 28 11:39:51 orestes-tb kernel: block drbd0: max BIO size = 130560
> Feb 28 11:39:51 orestes-tb kernel: block drbd0: Adjusting my ra_pages to backing device's (32 -> 768)
> Feb 28 11:39:51 orestes-tb kernel: block drbd0: drbd_bm_resize called with capacity == 5611549368
> Feb 28 11:39:51 orestes-tb kernel: block drbd0: resync bitmap: bits=701443671 words=10960058 pages=21407
> Feb 28 11:39:51 orestes-tb kernel: block drbd0: size = 2676 GB (2805774684 KB)
> Feb 28 11:39:52 orestes-tb kernel: block drbd0: bitmap READ of 21407 pages took 735 jiffies
> Feb 28 11:39:52 orestes-tb kernel: block drbd0: recounting of set bits took additional 93 jiffies
> Feb 28 11:39:52 orestes-tb kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> Feb 28 11:39:52 orestes-tb kernel: block drbd0: disk( Attaching -> Outdated )
> Feb 28 11:39:52 orestes-tb kernel: block drbd0: attached to UUIDs BFFB722D5E3E15D6:0000000000000000:6E63EC4258C86AF2:6E62EC4258C86AF2
> Feb 28 11:39:52 orestes-tb kernel: d-con admin: conn( StandAlone -> Unconnected )
> Feb 28 11:39:52 orestes-tb kernel: d-con admin: Starting receiver thread (from drbd_w_admin [17829])
> Feb 28 11:39:52 orestes-tb kernel: d-con admin: receiver (re)started
> Feb 28 11:39:52 orestes-tb kernel: d-con admin: conn( Unconnected -> WFConnection )
> Feb 28 11:39:52 orestes-tb kernel: d-con admin: Handshake successful: Agreed network protocol version 100
> Feb 28 11:39:52 orestes-tb kernel: d-con admin: conn( WFConnection -> WFReportParams )
> Feb 28 11:39:52 orestes-tb kernel: d-con admin: Starting asender thread (from drbd_r_admin [17835])
> Feb 28 11:39:52 orestes-tb kernel: block drbd0: drbd_sync_handshake:
> Feb 28 11:39:52 orestes-tb kernel: block drbd0: self BFFB722D5E3E15D6:0000000000000000:6E63EC4258C86AF2:6E62EC4258C86AF2 bits:0 flags:0
> Feb 28 11:39:52 orestes-tb kernel: block drbd0: peer 862A336609FD27CC:BFFB722D5E3E15D7:6E63EC4258C86AF2:6E62EC4258C86AF2 bits:14114 flags:0
> Feb 28 11:39:52 orestes-tb kernel: block drbd0: uuid_compare()=-1 by rule 50
> Feb 28 11:39:52 orestes-tb kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
> Feb 28 11:39:52 orestes-tb kernel: block drbd0: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 176(1), total 176; compression: 100.0%
> Feb 28 11:39:52 orestes-tb kernel: block drbd0: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 176(1), total 176; compression: 100.0%
> Feb 28 11:39:52 orestes-tb kernel: block drbd0: conn( WFBitMapT -> WFSyncUUID )
> Feb 28 11:40:01 orestes-tb corosync[2193]: [TOTEM ] A processor failed, forming new configuration.
> Feb 28 11:40:03 orestes-tb corosync[2193]: [QUORUM] Members[1]: 2
> Feb 28 11:40:03 orestes-tb corosync[2193]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Feb 28 11:40:03 orestes-tb kernel: dlm: closing connection to node 1
> Feb 28 11:40:03 orestes-tb corosync[2193]: [CPG ] chosen downlist: sender r(0) ip(129.236.252.14) r(1) ip(192.168.100.6) ; members(old:2 left:1)
> Feb 28 11:40:03 orestes-tb corosync[2193]: [MAIN ] Completed service synchronization, ready to provide service.
> Feb 28 11:40:03 orestes-tb fenced[2247]: fencing node hypatia-tb.nevis.columbia.edu
>
> As far as I can tell, hypatia-tb's drbd comes up, says "I'm UpToDate"
> and waits for a connection from orestes-tb. orestes-tb's drbd comes up,
> says "I'm UpToDate"

No, it clearly says "I'm Outdated" in the logs above:

| Feb 28 11:39:52 orestes-tb kernel: block drbd0: disk( Attaching -> Outdated )

It outdated itself voluntarily when it was told to disconnect from a
still-running primary, because of your "fencing resource-only"
configuration.

Don't rely on that: in a real incident, the replication link will just
fail, in which case you really need "fencing resource-and-stonith;" and a
suitable fence-peer handler.

> and starts the sync process with hypatia-tb. Then cman+corosync steps
> in on orestes-tb and fences hypatia-tb, before the sync can proceed.
>
> I ran another test. I did the same thing as the previous paragraph,
> except that I made sure both cman and pacemaker were off (I had to
> reboot to make sure) and just started drbd on both nodes. Sure enough,
> drbd was able to sync without split-brain or fencing. So this is a
> cman/corosync issue, not a drbd issue.

You still may retry the whole thing with drbd 8.3.12, just to make sure
there is no hidden DRBD 8.4.1 instability.
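To make the fencing advice concrete: a minimal sketch of the relevant
drbd.conf pieces for a pacemaker-integrated setup, assuming the
crm-fence-peer.sh / crm-unfence-peer.sh handlers shipped with drbd (the
install path and your resource's other options may differ; "admin" is
taken from your logs):

```
resource admin {
  disk {
    # On replication link loss, freeze I/O and call the fence-peer
    # handler until the peer's disk state is known or the peer is fenced.
    fencing resource-and-stonith;
  }
  handlers {
    # Handler sets a pacemaker location constraint against the peer;
    # adjust the path to wherever your package installs these scripts.
    fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}
```

With "resource-only" instead, DRBD only outdates the peer's data; it
never blocks I/O while the peer's fate is unknown, which is not good
enough for dual-primary.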
> While I was setting up the test for the previous paragraph, there was a
> problem with another resource (ocf:heartbeat:exportfs) that couldn't be
> properly monitored on either node. This led to a cycle of fencing where
> each node would successively fence the other because the exportfs
> resource couldn't run on either node. I had to quickly change my
> configuration to turn off monitoring on the resource.
>
> So it seems like cman+corosync is the issue. It's as if I'm
> "over-fencing."
>
> Any ideas?

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems