On 3/22/12 2:43 PM, William Seligman wrote:
> On 3/20/12 4:55 PM, Lars Ellenberg wrote:
>> On Fri, Mar 16, 2012 at 05:06:04PM -0400, William Seligman wrote:
>>> On 3/16/12 12:12 PM, William Seligman wrote:
>>>> On 3/16/12 7:02 AM, Andreas Kurz wrote:
>>>>> On 03/15/2012 11:50 PM, William Seligman wrote:
>>>>>> On 3/15/12 6:07 PM, William Seligman wrote:
>>>>>>> On 3/15/12 6:05 PM, William Seligman wrote:
>>>>>>>> On 3/15/12 4:57 PM, emmanuel segura wrote:
>>>>>>>>
>>>>>>>>> we can try to understand what happens when clvmd hangs.
>>>>>>>>>
>>>>>>>>> edit /etc/lvm/lvm.conf, change level = 7 in the log section, and
>>>>>>>>> uncomment this line:
>>>>>>>>>
>>>>>>>>> file = "/var/log/lvm2.log"
>>>>>>>>
>>>>>>>> Here's the tail end of the file (the original is 1.6M). Because there
>>>>>>>> are no timestamps in the log, it's hard for me to point you to the
>>>>>>>> moment when I crashed the other system. I think (though I'm not sure)
>>>>>>>> that the crash happened after the last occurrence of
>>>>>>>>
>>>>>>>> cache/lvmcache.c:1484 Wiping internal VG cache
>>>>>>>>
>>>>>>>> Honestly, it looks like a wall of text to me. Does it suggest anything
>>>>>>>> to you?
>>>>>>>
>>>>>>> Maybe it would help if I included the link to the pastebin where I put
>>>>>>> the output: <http://pastebin.com/8pgW3Muw>
>>>>>>
>>>>>> Could the problem be with lvm+drbd?
>>>>>>
>>>>>> In lvm2.log, I see this sequence of lines pre-crash:
>>>>>>
>>>>>> device/dev-io.c:535 Opened /dev/md0 RO O_DIRECT
>>>>>> device/dev-io.c:271 /dev/md0: size is 1027968 sectors
>>>>>> device/dev-io.c:137 /dev/md0: block size is 1024 bytes
>>>>>> device/dev-io.c:588 Closed /dev/md0
>>>>>> device/dev-io.c:271 /dev/md0: size is 1027968 sectors
>>>>>> device/dev-io.c:535 Opened /dev/md0 RO O_DIRECT
>>>>>> device/dev-io.c:137 /dev/md0: block size is 1024 bytes
>>>>>> device/dev-io.c:588 Closed /dev/md0
>>>>>> filters/filter-composite.c:31 Using /dev/md0
>>>>>> device/dev-io.c:535 Opened /dev/md0 RO O_DIRECT
>>>>>> device/dev-io.c:137 /dev/md0: block size is 1024 bytes
>>>>>> label/label.c:186 /dev/md0: No label detected
>>>>>> device/dev-io.c:588 Closed /dev/md0
>>>>>> device/dev-io.c:535 Opened /dev/drbd0 RO O_DIRECT
>>>>>> device/dev-io.c:271 /dev/drbd0: size is 5611549368 sectors
>>>>>> device/dev-io.c:137 /dev/drbd0: block size is 4096 bytes
>>>>>> device/dev-io.c:588 Closed /dev/drbd0
>>>>>> device/dev-io.c:271 /dev/drbd0: size is 5611549368 sectors
>>>>>> device/dev-io.c:535 Opened /dev/drbd0 RO O_DIRECT
>>>>>> device/dev-io.c:137 /dev/drbd0: block size is 4096 bytes
>>>>>> device/dev-io.c:588 Closed /dev/drbd0
>>>>>>
>>>>>> I interpret this as: look at /dev/md0, get some info, close; look at
>>>>>> /dev/drbd0, get some info, close.
>>>>>>
>>>>>> Post-crash, I see:
>>>>>>
>>>>>> device/dev-io.c:535 Opened /dev/md0 RO O_DIRECT
>>>>>> device/dev-io.c:271 /dev/md0: size is 1027968 sectors
>>>>>> device/dev-io.c:137 /dev/md0: block size is 1024 bytes
>>>>>> device/dev-io.c:588 Closed /dev/md0
>>>>>> device/dev-io.c:271 /dev/md0: size is 1027968 sectors
>>>>>> device/dev-io.c:535 Opened /dev/md0 RO O_DIRECT
>>>>>> device/dev-io.c:137 /dev/md0: block size is 1024 bytes
>>>>>> device/dev-io.c:588 Closed /dev/md0
>>>>>> filters/filter-composite.c:31 Using /dev/md0
>>>>>> device/dev-io.c:535 Opened /dev/md0 RO O_DIRECT
>>>>>> device/dev-io.c:137 /dev/md0: block size is 1024 bytes
>>>>>> label/label.c:186 /dev/md0: No label detected
>>>>>> device/dev-io.c:588 Closed /dev/md0
>>>>>> device/dev-io.c:535 Opened /dev/drbd0 RO O_DIRECT
>>>>>> device/dev-io.c:271 /dev/drbd0: size is 5611549368 sectors
>>>>>> device/dev-io.c:137 /dev/drbd0: block size is 4096 bytes
>>>>>>
>>>>>> ... and then it hangs. Comparing the two, it looks like it can't close
>>>>>> /dev/drbd0.
>>>>>>
>>>>>> If I look at /proc/drbd when I crash one node, I see this:
>>>>>>
>>>>>> # cat /proc/drbd
>>>>>> version: 8.3.12 (api:88/proto:86-96)
>>>>>> GIT-hash: e2a8ef4656be026bbae540305fcb998a5991090f build by
>>>>>> r...@hypatia-tb.nevis.columbia.edu, 2012-02-28 18:01:34
>>>>>> 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C s-----
>>>>>> ns:7000064 nr:0 dw:0 dr:7049728 al:0 bm:516 lo:0 pe:0 ua:0 ap:0 ep:1
>>>>>> wo:b oos:0
>>>>>
>>>>> s----- ... DRBD suspended I/O, most likely because of its fencing
>>>>> policy. For valid dual-primary setups you have to use the
>>>>> "resource-and-stonith" policy and a working "fence-peer" handler. In
>>>>> this mode I/O is suspended until fencing of the peer was successful.
>>>>> The question is why the peer does _not_ also suspend its I/O, because
>>>>> obviously fencing was not successful ...
>>>>>
>>>>> So with a correct DRBD configuration one of your nodes should already
>>>>> have been fenced because of the connection loss between the nodes (on
>>>>> the drbd replication link).
>>>>>
>>>>> You can use e.g. that nice fencing script:
>>>>>
>>>>> http://goo.gl/O4N8f
>>>>
>>>> This is the output of "drbdadm dump admin": <http://pastebin.com/kTxvHCtx>
>>>>
>>>> So I've got resource-and-stonith. I gather from an earlier thread that
>>>> obliterate-peer.sh is more-or-less equivalent in functionality to
>>>> stonith_admin-fence-peer.sh:
>>>>
>>>> <http://www.gossamer-threads.com/lists/linuxha/users/78504#78504>
>>>>
>>>> At the moment I'm pursuing the possibility that I'm returning the wrong
>>>> return codes from my fencing agent:
>>>>
>>>> <http://www.gossamer-threads.com/lists/linuxha/users/78572>
>>>
>>> I cleaned up my fencing agent, making sure its return codes matched those
>>> returned by the other agents in /usr/sbin/fence_, and allowing for some
>>> delay issues in reading the UPS status. But...
>>>
>>>> After that, I'll look at another suggestion with lvm.conf:
>>>>
>>>> <http://www.gossamer-threads.com/lists/linuxha/users/78796#78796>
>>>>
>>>> Then I'll try DRBD 8.4.1. Hopefully one of these is the source of the
>>>> issue.
>>>
>>> Failure on all three counts.
>>
>> May I suggest you double-check the permissions on your fence-peer script?
>> I suspect you may simply have forgotten the "chmod +x".
>>
>> Test with "drbdadm fence-peer minor-0" from the command line.
>
> I still haven't solved the problem, but this advice has gotten me further
> than before.
>
> First, Lars was correct: I did not have execute permissions set on my
> fence-peer scripts. (D'oh!) I turned them on, but that did not change
> anything: cman+clvmd still hung on the vgdisplay command if I crashed the
> peer node.
>
> I started up both nodes again (cman+pacemaker+drbd+clvmd) and tried Lars'
> suggested command. I didn't save the response for this message (d'oh
> again!) but it said that the fence-peer script had failed.
>
> Hmm. The peer was definitely shutting down, so my fencing script is
> working. I went over it, comparing the return codes to those of the
> existing scripts, and made some changes. Here's my current script:
> <http://pastebin.com/nUnYVcBK>.
>
> Up until now my fence-peer scripts had been either Lon Hohberger's
> obliterate-peer.sh or Digimer's rhcs_fence. I decided to try the
> stonith_admin-fence-peer.sh script that Andreas Kurz recommended; unlike
> the first two scripts, which fence using fence_node, the latter script
> just calls stonith_admin.
>
> When I tried the stonith_admin-fence-peer.sh script, it worked:
>
> # drbdadm fence-peer minor-0
> stonith_admin-fence-peer.sh[10886]: stonith_admin successfully fenced peer
> orestes-corosync.nevis.columbia.edu.
>
> Power was cut on the peer, and the remaining node stayed up. Then I
> brought up the peer with:
>
> stonith_admin -U orestes-corosync.nevis.columbia.edu
>
> BUT: when the restored peer came up and started to run cman, clvmd hung
> on the main node again.
>
> After cycling through some more tests, I found that if I brought down the
> peer with drbdadm, then brought the peer back up with no HA services, then
> started drbd and then cman, the cluster remained intact.
>
> If I crashed the peer, the scheme in the previous paragraph didn't work. I
> bring up drbd, check that the disks are both UpToDate, then bring up cman.
> At that point the vgdisplay on the main node takes so long to run that
> clvmd times out:
>
> vgdisplay
> Error locking on node orestes-corosync.nevis.columbia.edu: Command timed out
>
> I timed how long it took vgdisplay to run. I might be able to work around
> this by setting the timeout on my clvmd resource to 300s, but that seems
> to be a band-aid for an underlying problem. Any suggestions on what else I
> could check?
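Before I get to the new tests, a few configuration recaps for anyone
joining the thread. The lvm2 debug log emmanuel asked for comes from the
log section of /etc/lvm/lvm.conf; the relevant lines (the rest of the
stock log section can stay as it is) are:

    log {
        # debug level runs from 0 (off) to 7 (most verbose)
        level = 7
        # uncomment this in the stock lvm.conf to also write the debug
        # output to a file
        file = "/var/log/lvm2.log"
    }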
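On the DRBD side, the resource-and-stonith setup Andreas described
translates into drbd.conf roughly as sketched below. This is only an
outline, not a verbatim copy of my "admin" resource (the full dump is in
the pastebin above), and the handler path is just an example location for
the stonith_admin-fence-peer.sh script -- point it wherever you installed
your copy:

    resource admin {
      net {
        allow-two-primaries;            # dual-primary operation
      }
      disk {
        fencing resource-and-stonith;   # suspend I/O until the peer is fenced
      }
      handlers {
        # example path -- adjust to wherever the script actually lives
        fence-peer "/usr/local/sbin/stonith_admin-fence-peer.sh";
      }
      # devices, addresses, syncer options, etc. unchanged
    }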
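And the 300s band-aid I mention at the end of the quoted text would look
something like this in the crm shell. The agent name (ocf:lvm2:clvmd) is
only an example, since clvmd packaging differs between distros; the point
is simply raising the operation timeouts on whatever clvmd resource you
already have:

    primitive p_clvmd ocf:lvm2:clvmd \
            op start interval="0" timeout="300s" \
            op stop interval="0" timeout="300s" \
            op monitor interval="30s" timeout="300s"
    clone cl_clvmd p_clvmd meta interleave="true"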
I've done some more tests. Still no solution, just an observation.

The "death mode" appears to be:

- Two nodes running cman+pacemaker+drbd+clvmd
- Take one node down = one remaining node w/cman+pacemaker+drbd+clvmd
- Start up the dead node. If it ever gets into a state in which it's
  running cman but not clvmd, clvmd on the uncrashed node hangs.
- Conversely, if I bring up drbd, make it primary, then start cman+clvmd,
  there's no problem on the uncrashed node.

My guess is that clvmd is getting the number of nodes it expects from
cman. When the formerly-dead node starts running cman, the number of
cluster nodes goes to 2 (I checked with 'cman_tool status') but the number
of nodes running clvmd is still 1, hence the hang. (The commands I'm
comparing are in the P.S. below.)

Does this guess make sense?
--
Bill Seligman             | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://selig...@nevis.columbia.edu
PO Box 137                |
Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/
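P.S. If anyone wants to check the guess above, these are the sorts of
commands I'm comparing on the surviving node once the peer starts cman
again (dlm_tool is from the cluster utilities; substitute whatever your
distro ships for inspecting DLM lockspaces):

    # membership as cman sees it -- "Nodes:" should go back to 2
    cman_tool status | grep -i nodes
    cman_tool nodes

    # lockspaces the DLM knows about; clvmd's lockspace and its members
    # should show up here on both nodes
    dlm_tool ls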