Hello William,

If you are using cman, why are you using lsb::clvmd? I think you are very confused.
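For what it's worth, on RHEL 6 the usual arrangement is to let the cluster stack itself bring clvmd up after cman, rather than wrapping it as an lsb resource inside pacemaker. A minimal sketch, assuming the stock init scripts shipped with the cman and lvm2-cluster packages:

    # On each node: have init start the stack in order instead of
    # managing clvmd from pacemaker.
    chkconfig cman on
    chkconfig clvmd on

    # Start order matters: clvmd needs cman (and the DLM) up first.
    service cman start
    service clvmd start

Pacemaker then only has to manage the resources that sit on top of the clustered volume group.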
On 13 March 2012 at 22:42, William Seligman <selig...@nevis.columbia.edu> wrote:

> On 3/13/12 12:29 PM, William Seligman wrote:
> > I'm not sure if this is a "Linux-HA" question; please direct me to the
> > appropriate list if it's not.
> >
> > I'm setting up a two-node cman+pacemaker+gfs2 cluster as described in
> > "Clusters From Scratch." Fencing is done by forcibly rebooting a node,
> > cutting and restoring its power via a UPS.
> >
> > My fencing/failover tests have revealed a problem. If I gracefully turn
> > off one node ("crm node standby"; "service pacemaker stop"; "shutdown -r
> > now"), all the resources transfer to the other node with no problems. If
> > I cut power to one node (as would happen if it were fenced), the
> > lsb::clvmd resource on the remaining node eventually fails. Since all
> > the other resources depend on clvmd, all the resources on the remaining
> > node stop and the cluster is left with nothing running.
> >
> > I've traced why lsb::clvmd fails: the monitor/status command includes
> > "vgdisplay", which hangs indefinitely, so the monitor will always time
> > out.
> >
> > So this isn't a problem with pacemaker, but with clvmd/dlm: if a node is
> > cut off, the cluster isn't handling it properly. Has anyone on this list
> > seen this before? Any ideas?
> >
> > Details:
> >
> > Versions:
> > Red Hat Linux 6.2 (kernel 2.6.32)
> > cman-3.0.12.1
> > corosync-1.4.1
> > pacemaker-1.1.6
> > lvm2-2.02.87
> > lvm2-cluster-2.02.87
>
> This may be a Linux-HA question after all!
>
> I ran a few more tests. Here's the output from a typical test of
>
>     grep -E "(dlm|gfs2|clvmd|fenc|syslogd)" /var/log/messages
>
> <http://pastebin.com/uqC6bc1b>
>
> It looks like what's happening is that the fence agent (one I wrote) is
> not returning the proper error code when a node crashes. According to
> this page, if a fencing agent fails, GFS2 will freeze to protect the
> data:
>
> <http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Global_File_System_2/s1-gfs2hand-allnodes.html>
>
> As a test, I tried to fence my test node via standard means:
>
>     stonith_admin -F orestes-corosync.nevis.columbia.edu
>
> These were the log messages, which show that stonith_admin did its job
> and CMAN was notified of the fencing: <http://pastebin.com/jaH820Bv>.
>
> Unfortunately, I still got the gfs2 freeze, so this is not the complete
> story.
>
> First things first. I vaguely recall a web page that went over the
> STONITH return codes, but I can't locate it again. Is there any reference
> to the return codes expected from a fencing agent, perhaps as a function
> of the state of the fencing device?
>
> --
> Bill Seligman             | Phone: (914) 591-2823
> Nevis Labs, Columbia Univ | mailto://selig...@nevis.columbia.edu
> PO Box 137                |
> Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/

--
this is my life and I live it as long as God wills
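P.S. On the return-code question at the end of the quoted message: as far as I know, the convention cman's fence agents follow (the FenceAgentAPI) is that the agent reads its options as key=value pairs on stdin and exits 0 on success, non-zero on failure, and GFS2 stays frozen until the fence daemon sees a successful exit. A rough sketch of that shape for a UPS-driven agent (power_cycle and report_status are hypothetical stand-ins for your real UPS commands):

    #!/bin/sh
    # Skeleton fence agent in the FenceAgentAPI style (illustrative only).
    # Options arrive as key=value pairs on stdin; exit 0 only when the
    # requested action is known to have succeeded, non-zero otherwise.

    action="reboot"
    port=""

    while read line; do
        case "$line" in
            action=*) action="${line#action=}" ;;
            port=*)   port="${line#port=}" ;;
        esac
    done

    case "$action" in
        on|off|reboot)
            # power_cycle is a hypothetical helper wrapping the real UPS
            # commands. It must not report success until the outlet state
            # is confirmed: a false 0 here would let GFS2 resume while the
            # "fenced" node is still running.
            power_cycle "$port" "$action" || exit 1
            exit 0
            ;;
        status|monitor)
            # report_status is likewise hypothetical: exit 0 if the UPS is
            # reachable and the outlet state can be read.
            report_status "$port" || exit 1
            exit 0
            ;;
        *)
            echo "unsupported action: $action" >&2
            exit 1
            ;;
    esac

The critical part is the exit status of the off/reboot path: if the agent returns 0 before the node is really dead, or non-zero when the fence actually succeeded, you get exactly the freeze you are describing.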