Yes William, now try clvmd -d and see what happens.
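For example, something along these lines on the node that stays up, so the debug output goes straight to your terminal (the cman_tool/dlm_tool checks are only a suggestion for seeing whether the clvmd lockspace is stuck, and the exact behaviour of -d can differ between lvm2-cluster builds):

    # stop the init-script copy first so only one clvmd instance runs
    service clvmd stop

    # start clvmd with debugging enabled (output goes to stderr or syslog,
    # depending on how this lvm2-cluster build interprets -d)
    clvmd -d

    # in another terminal: check cluster membership and the DLM lockspaces
    cman_tool nodes
    dlm_tool ls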
locking_type = 3 is the LVM cluster locking type.

On 15 March 2012 at 16:15, William Seligman <selig...@nevis.columbia.edu> wrote:

> On 3/15/12 5:18 AM, emmanuel segura wrote:
>
>> The first thing I see in your clvmd log is this:
>>
>> =============================================
>> WARNING: Locking disabled. Be careful! This could corrupt your metadata.
>> =============================================
>
> I saw that too, and thought the same as you did. I did some checks (see
> below), but some web searches suggest that this message is a normal
> consequence of clvmd initialization; e.g.,
>
> <http://markmail.org/message/vmy53pcv52wu7ghx>
>
>> Use this command:
>>
>> lvmconf --enable-cluster
>>
>> and remember that for cman+pacemaker you don't need qdisk.
>
> Before I tried your lvmconf suggestion, here was my /etc/lvm/lvm.conf:
> <http://pastebin.com/841VZRzW> and the output of "lvm dumpconfig":
> <http://pastebin.com/rtw8c3Pf>.
>
> Then I did as you suggested, but with a check to see if anything changed:
>
> # cd /etc/lvm/
> # cp lvm.conf lvm.conf.cluster
> # lvmconf --enable-cluster
> # diff lvm.conf lvm.conf.cluster
> #
>
> So the key lines have been there all along:
>
> locking_type = 3
> fallback_to_local_locking = 0
>
>> On 14 March 2012 at 23:17, William Seligman <selig...@nevis.columbia.edu> wrote:
>>
>>> On 3/14/12 9:20 AM, emmanuel segura wrote:
>>>
>>>> Hello William,
>>>>
>>>> I didn't know you were using DRBD, and I don't know what type of
>>>> configuration you are using.
>>>>
>>>> But it's better if you try to start clvmd with clvmd -d.
>>>>
>>>> That way we can see what the problem is.
>>>
>>> For what it's worth, here's the output of running clvmd -d on the node
>>> that stays up: <http://pastebin.com/sWjaxAEF>
>>>
>>> What's probably important in that big mass of output are the last two
>>> lines. Up to that point, I have both nodes up and running cman + clvmd;
>>> cluster.conf is here: <http://pastebin.com/w5XNYyAX>
>>>
>>> At the time of the next-to-the-last line, I cut power to the other node.
>>>
>>> At the time of the last line, I ran "vgdisplay" on the remaining node,
>>> which hangs forever.
>>>
>>> After a lot of web searching, I found that I'm not the only one with this
>>> problem. Here's one case that doesn't seem relevant to me, since I don't
>>> use qdisk:
>>> <http://www.redhat.com/archives/linux-cluster/2007-October/msg00212.html>.
>>> Here's one with the same problem on the same OS:
>>> <http://bugs.centos.org/view.php?id=5229>, but with no resolution.
>>>
>>> Out of curiosity, has anyone on this list made a two-node cman+clvmd
>>> cluster work for them?
>>>
>>>> On 14 March 2012 at 14:02, William Seligman <selig...@nevis.columbia.edu> wrote:
>>>>
>>>>> On 3/14/12 6:02 AM, emmanuel segura wrote:
>>>>>
>>>>>> I think it's better if you make clvmd start at boot:
>>>>>>
>>>>>> chkconfig cman on ; chkconfig clvmd on
>>>>>
>>>>> I've already tried it. It doesn't work. The problem is that my LVM
>>>>> information is on the drbd. If I start up clvmd before drbd, it won't
>>>>> find the logical volumes.
>>>>>
>>>>> I also don't see why that would make a difference (although this could
>>>>> be part of the confusion): a service is a service. I've tried starting
>>>>> up clvmd inside and outside pacemaker control, with the same problem.
>>>>> Why would starting clvmd at boot make a difference?
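Starting clvmd from the init scripts only helps if DRBD is already primary by the time clvmd starts, which it isn't when the physical volume lives on the DRBD device. Under Pacemaker the usual way to express that dependency is an ordering plus colocation constraint on a clvmd clone, so clvmd only starts after the DRBD master has been promoted. A rough crm configure sketch of that idea (the resource names, the DRBD resource "r0", and the timeouts are placeholders, not taken from your configuration):

    primitive p_drbd_r0 ocf:linbit:drbd \
            params drbd_resource="r0" \
            op monitor interval="30s" role="Master" \
            op monitor interval="31s" role="Slave"
    ms ms_drbd_r0 p_drbd_r0 \
            meta master-max="2" clone-max="2" notify="true" interleave="true"
    primitive p_clvmd lsb:clvmd \
            op monitor interval="30s" timeout="90s"
    clone cl_clvmd p_clvmd meta interleave="true"
    # clvmd may only start once DRBD has been promoted, and only on a node
    # where DRBD is running as Master (dual-primary, hence master-max=2)
    order o_drbd_before_clvmd inf: ms_drbd_r0:promote cl_clvmd:start
    colocation col_clvmd_on_drbd inf: cl_clvmd ms_drbd_r0:Master

With constraints like these, boot order stops mattering: clvmd (and anything ordered after it, such as the gfs2 Filesystem resources) is only ever started on a node where the DRBD volume is already primary.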
>>>>>> On 13 March 2012 at 23:29, William Seligman <selig...@nevis.columbia.edu> wrote:
>>>>>>
>>>>>>> On 3/13/12 5:50 PM, emmanuel segura wrote:
>>>>>>>
>>>>>>>> So if you are using cman, why do you use lsb::clvmd?
>>>>>>>>
>>>>>>>> I think you are very confused.
>>>>>>>
>>>>>>> I don't dispute that I may be very confused!
>>>>>>>
>>>>>>> However, from what I can tell, I still need to run clvmd even if I'm
>>>>>>> running cman (I'm not using rgmanager). If I just run cman, gfs2 and
>>>>>>> any other form of mount fails. If I run cman, then clvmd, then gfs2,
>>>>>>> everything behaves normally.
>>>>>>>
>>>>>>> Going by these instructions:
>>>>>>>
>>>>>>> <https://alteeve.com/w/2-Node_Red_Hat_KVM_Cluster_Tutorial>
>>>>>>>
>>>>>>> the resources he puts under "cluster control" (rgmanager) I have to
>>>>>>> put under pacemaker control. Those include drbd, clvmd, and gfs2.
>>>>>>>
>>>>>>> The difference between what I've got and what's in "Clusters From
>>>>>>> Scratch" is that in CFS they assign one DRBD volume to a single
>>>>>>> filesystem. I create an LVM physical volume on my DRBD resource, as in
>>>>>>> the above tutorial, and so I have to start clvmd or the logical
>>>>>>> volumes in the DRBD partition won't be recognized. Is there some way
>>>>>>> to get logical volumes recognized automatically by cman without
>>>>>>> rgmanager that I've missed?
>>>>>>>
>>>>>>>> On 13 March 2012 at 22:42, William Seligman <selig...@nevis.columbia.edu> wrote:
>>>>>>>>
>>>>>>>>> On 3/13/12 12:29 PM, William Seligman wrote:
>>>>>>>>>
>>>>>>>>>> I'm not sure if this is a "Linux-HA" question; please direct me to
>>>>>>>>>> the appropriate list if it's not.
>>>>>>>>>>
>>>>>>>>>> I'm setting up a two-node cman+pacemaker+gfs2 cluster as described
>>>>>>>>>> in "Clusters From Scratch." Fencing is through forcibly rebooting a
>>>>>>>>>> node by cutting and restoring its power via UPS.
>>>>>>>>>>
>>>>>>>>>> My fencing/failover tests have revealed a problem. If I gracefully
>>>>>>>>>> turn off one node ("crm node standby"; "service pacemaker stop";
>>>>>>>>>> "shutdown -r now"), all the resources transfer to the other node
>>>>>>>>>> with no problems. If I cut power to one node (as would happen if it
>>>>>>>>>> were fenced), the lsb::clvmd resource on the remaining node
>>>>>>>>>> eventually fails. Since all the other resources depend on clvmd,
>>>>>>>>>> all the resources on the remaining node stop and the cluster is
>>>>>>>>>> left with nothing running.
>>>>>>>>>>
>>>>>>>>>> I've traced why the lsb::clvmd fails: the monitor/status command
>>>>>>>>>> includes "vgdisplay", which hangs indefinitely. Therefore the
>>>>>>>>>> monitor will always time out.
>>>>>>>>>>
>>>>>>>>>> So this isn't a problem with pacemaker, but with clvmd/dlm: if a
>>>>>>>>>> node is cut off, the cluster isn't handling it properly. Has anyone
>>>>>>>>>> on this list seen this before? Any ideas?
>>>>>>>>>>
>>>>>>>>>> Details:
>>>>>>>>>>
>>>>>>>>>> Versions:
>>>>>>>>>> Redhat Linux 6.2 (kernel 2.6.32)
>>>>>>>>>> cman-3.0.12.1
>>>>>>>>>> corosync-1.4.1
>>>>>>>>>> pacemaker-1.1.6
>>>>>>>>>> lvm2-2.02.87
>>>>>>>>>> lvm2-cluster-2.02.87
>>>>>>>>>
>>>>>>>>> This may be a Linux-HA question after all!
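On the vgdisplay hang quoted above: when that happens, the DLM is typically blocked because the dead node has not been successfully fenced, so recovery of the clvmd lockspace never completes. It may be worth running something like the following on the surviving node while vgdisplay is hung (standard cman/dlm tools on RHEL 6; this is only a suggested next step, not output from your cluster):

    # is the fence domain still waiting for the dead node to be fenced?
    fence_tool ls

    # list the DLM lockspaces; "clvmd" is the lockspace that vgdisplay
    # ends up waiting on via clvmd
    dlm_tool ls

    # cluster membership as cman sees it
    cman_tool nodes

If fence_tool shows the domain stuck in a wait state, that points back at the fencing agent rather than at clvmd itself.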
>>>>>>>>> I ran a few more tests. Here's the output from a typical test of
>>>>>>>>>
>>>>>>>>> grep -E "(dlm|gfs2|clvmd|fenc|syslogd)" /var/log/messages
>>>>>>>>>
>>>>>>>>> <http://pastebin.com/uqC6bc1b>
>>>>>>>>>
>>>>>>>>> It looks like what's happening is that the fence agent (one I wrote)
>>>>>>>>> is not returning the proper error code when a node crashes.
>>>>>>>>> According to this page, if a fencing agent fails, GFS2 will freeze
>>>>>>>>> to protect the data:
>>>>>>>>>
>>>>>>>>> <http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Global_File_System_2/s1-gfs2hand-allnodes.html>
>>>>>>>>>
>>>>>>>>> As a test, I tried to fence my test node via standard means:
>>>>>>>>>
>>>>>>>>> stonith_admin -F orestes-corosync.nevis.columbia.edu
>>>>>>>>>
>>>>>>>>> These were the log messages, which show that stonith_admin did its
>>>>>>>>> job and CMAN was notified of the fencing:
>>>>>>>>> <http://pastebin.com/jaH820Bv>.
>>>>>>>>>
>>>>>>>>> Unfortunately, I still got the gfs2 freeze, so this is not the
>>>>>>>>> complete story.
>>>>>>>>>
>>>>>>>>> First things first. I vaguely recall a web page that went over the
>>>>>>>>> STONITH return codes, but I can't locate it again. Is there any
>>>>>>>>> reference to the return codes expected from a fencing agent, perhaps
>>>>>>>>> as a function of the state of the fencing device?
>
> --
> Bill Seligman             | Phone: (914) 591-2823
> Nevis Labs, Columbia Univ | mailto://selig...@nevis.columbia.edu
> PO Box 137                |
> Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/

--
esta es mi vida e me la vivo hasta que dios quiera

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems