On 3/15/12 12:15 PM, emmanuel segura wrote:
> How did you create your volume group?

pvcreate /dev/drbd0
vgcreate -c y ADMIN /dev/drbd0
lvcreate -L 200G -n usr ADMIN
# ... and so on
# "Nevis_HA" is the cluster name I used in cluster.conf
mkfs.gfs2 -p lock_dlm -j 2 -t Nevis_HA:usr /dev/ADMIN/usr
# ... and so on

> Give me the output of the vgs command when the cluster is up.

Here it is:

    Logging initialised at Thu Mar 15 12:40:39 2012
    Set umask from 0022 to 0077
    Finding all volume groups
    Finding volume group "ROOT"
    Finding volume group "ADMIN"
      VG    #PV #LV #SN Attr   VSize   VFree
      ADMIN   1   5   0 wz--nc   2.61t 765.79g
      ROOT    1   2   0 wz--n- 117.16g      0
    Wiping internal VG cache

I assume the "c" in the ADMIN attributes means that clustering is turned on?
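
(One way to double-check that assumption, if I'm reading the LVM report
format right: the sixth character of the VG attribute string is the
clustered flag, and vgdisplay spells it out in words:

    vgs -o vg_name,vg_attr ADMIN          # "wz--nc": the trailing "c" is the clustered flag
    vgdisplay ADMIN | grep -i clustered   # should report "Clustered  yes"

Both are read-only, so they're safe to run on a live cluster.)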

> On 15 March 2012 17:06, William Seligman <selig...@nevis.columbia.edu> wrote:
>
>> On 3/15/12 11:50 AM, emmanuel segura wrote:
>>> Yes, william.
>>>
>>> Now try clvmd -d and see what happens.
>>>
>>> locking_type = 3 is the LVM cluster lock type.
>>
>> Since you asked for confirmation, here it is: the output of 'clvmd -d'
>> just now: <http://pastebin.com/bne8piEw>. I crashed the other node at
>> Mar 15 12:02:35, which is when you see the only additional line of
>> output.
>>
>> I don't see any particular difference between this and the previous
>> result <http://pastebin.com/sWjaxAEF>, which suggests that I had
>> cluster locking enabled before, and still do now.
>>
>>> On 15 March 2012 16:15, William Seligman <selig...@nevis.columbia.edu> wrote:
>>>
>>>> On 3/15/12 5:18 AM, emmanuel segura wrote:
>>>>
>>>>> The first thing I saw in your clvmd log is this:
>>>>>
>>>>> =============================================
>>>>> WARNING: Locking disabled. Be careful! This could corrupt your metadata.
>>>>> =============================================
>>>>
>>>> I saw that too, and thought the same as you did. I did some checks
>>>> (see below), but some web searches suggest that this message is a
>>>> normal consequence of clvmd initialization; e.g.,
>>>>
>>>> <http://markmail.org/message/vmy53pcv52wu7ghx>
>>>>
>>>>> Use this command:
>>>>>
>>>>> lvmconf --enable-cluster
>>>>>
>>>>> And remember, for cman+pacemaker you don't need qdisk.
>>>>
>>>> Before I tried your lvmconf suggestion, here was my /etc/lvm/lvm.conf:
>>>> <http://pastebin.com/841VZRzW> and the output of "lvm dumpconfig":
>>>> <http://pastebin.com/rtw8c3Pf>.
>>>>
>>>> Then I did as you suggested, but with a check to see if anything
>>>> changed:
>>>>
>>>> # cd /etc/lvm/
>>>> # cp lvm.conf lvm.conf.cluster
>>>> # lvmconf --enable-cluster
>>>> # diff lvm.conf lvm.conf.cluster
>>>> #
>>>>
>>>> So the key lines have been there all along:
>>>>
>>>> locking_type = 3
>>>> fallback_to_local_locking = 0
>>>>
>>>>> On 14 March 2012 23:17, William Seligman <selig...@nevis.columbia.edu> wrote:
>>>>>
>>>>>> On 3/14/12 9:20 AM, emmanuel segura wrote:
>>>>>>> Hello William
>>>>>>>
>>>>>>> I didn't know you are using drbd, and I don't know what type of
>>>>>>> configuration you are using.
>>>>>>>
>>>>>>> But it's better if you start clvmd with clvmd -d, so that we can
>>>>>>> see what the problem is.
>>>>>>
>>>>>> For what it's worth, here's the output of running clvmd -d on the
>>>>>> node that stays up: <http://pastebin.com/sWjaxAEF>
>>>>>>
>>>>>> What's probably important in that big mass of output are the last
>>>>>> two lines. Up to that point, I have both nodes up and running
>>>>>> cman + clvmd; cluster.conf is here: <http://pastebin.com/w5XNYyAX>
>>>>>>
>>>>>> At the time of the next-to-last line, I cut power to the other node.
>>>>>>
>>>>>> At the time of the last line, I run "vgdisplay" on the remaining
>>>>>> node, which hangs forever.
>>>>>>
>>>>>> After a lot of web searching, I found that I'm not the only one
>>>>>> with this problem. Here's one case that doesn't seem relevant to
>>>>>> me, since I don't use qdisk:
>>>>>> <http://www.redhat.com/archives/linux-cluster/2007-October/msg00212.html>.
>>>>>> Here's one with the same problem on the same OS:
>>>>>> <http://bugs.centos.org/view.php?id=5229>, but with no resolution.
>>>>>>
>>>>>> Out of curiosity, has anyone on this list made a two-node
>>>>>> cman+clvmd cluster work for them?
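
(A note on that vgdisplay hang, with the benefit of hindsight: it looks
like clvmd waiting on the dlm, which in turn is waiting for fencing to
complete. If I'm remembering the cman tools right, you can confirm that
from the surviving node:

    fence_tool ls   # a lingering victim / wait state means a fence is still pending
    dlm_tool ls     # lockspaces stuck in recovery show up here

If fence_tool shows a victim that never clears, the hang is a fencing
problem rather than an LVM one, which is where the rest of this thread
ends up.)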

>>>>>>> On 14 March 2012 14:02, William Seligman <selig...@nevis.columbia.edu> wrote:
>>>>>>>
>>>>>>>> On 3/14/12 6:02 AM, emmanuel segura wrote:
>>>>>>>>
>>>>>>>>> I think it's better if you make clvmd start at boot:
>>>>>>>>>
>>>>>>>>> chkconfig cman on ; chkconfig clvmd on
>>>>>>>>
>>>>>>>> I've already tried it. It doesn't work. The problem is that my LVM
>>>>>>>> information is on the drbd. If I start up clvmd before drbd, it
>>>>>>>> won't find the logical volumes.
>>>>>>>>
>>>>>>>> I also don't see why that would make a difference (although this
>>>>>>>> could be part of the confusion): a service is a service. I've
>>>>>>>> tried starting clvmd both inside and outside pacemaker control,
>>>>>>>> with the same problem. Why would starting clvmd at boot make a
>>>>>>>> difference?
>>>>>>>>
>>>>>>>>> On 13 March 2012 23:29, William Seligman <selig...@nevis.columbia.edu> wrote:
>>>>>>>>>
>>>>>>>>>> On 3/13/12 5:50 PM, emmanuel segura wrote:
>>>>>>>>>>
>>>>>>>>>>> So if you're using cman, why do you use lsb::clvmd?
>>>>>>>>>>>
>>>>>>>>>>> I think you are very confused.
>>>>>>>>>>
>>>>>>>>>> I don't dispute that I may be very confused!
>>>>>>>>>>
>>>>>>>>>> However, from what I can tell, I still need to run clvmd even if
>>>>>>>>>> I'm running cman (I'm not using rgmanager). If I just run cman,
>>>>>>>>>> gfs2 and any other form of mount fails. If I run cman, then
>>>>>>>>>> clvmd, then gfs2, everything behaves normally.
>>>>>>>>>>
>>>>>>>>>> Going by these instructions:
>>>>>>>>>>
>>>>>>>>>> <https://alteeve.com/w/2-Node_Red_Hat_KVM_Cluster_Tutorial>
>>>>>>>>>>
>>>>>>>>>> the resources he puts under "cluster control" (rgmanager) I have
>>>>>>>>>> to put under pacemaker control. Those include drbd, clvmd, and
>>>>>>>>>> gfs2.
>>>>>>>>>>
>>>>>>>>>> The difference between what I've got and what's in "Clusters
>>>>>>>>>> From Scratch" is that in CFS they assign one DRBD volume to a
>>>>>>>>>> single filesystem. I create an LVM physical volume on my DRBD
>>>>>>>>>> resource, as in the above tutorial, and so I have to start clvmd
>>>>>>>>>> or the logical volumes in the DRBD partition won't be recognized.
>>>>>>>>>>
>>>>>>>>>> Is there some way to get logical volumes recognized automatically
>>>>>>>>>> by cman without rgmanager that I've missed?
>>>>>>>>>>
>>>>>>>>>>> On 13 March 2012 22:42, William Seligman <selig...@nevis.columbia.edu> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On 3/13/12 12:29 PM, William Seligman wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I'm not sure if this is a "Linux-HA" question; please direct
>>>>>>>>>>>>> me to the appropriate list if it's not.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm setting up a two-node cman+pacemaker+gfs2 cluster as
>>>>>>>>>>>>> described in "Clusters From Scratch." Fencing is through
>>>>>>>>>>>>> forcibly rebooting a node by cutting and restoring its power
>>>>>>>>>>>>> via UPS.
>>>>>>>>>>>>>
>>>>>>>>>>>>> My fencing/failover tests have revealed a problem. If I
>>>>>>>>>>>>> gracefully turn off one node ("crm node standby"; "service
>>>>>>>>>>>>> pacemaker stop"; "shutdown -r now"), all the resources
>>>>>>>>>>>>> transfer to the other node with no problems. If I cut power
>>>>>>>>>>>>> to one node (as would happen if it were fenced), the
>>>>>>>>>>>>> lsb::clvmd resource on the remaining node eventually fails.
>>>>>>>>>>>>> Since all the other resources depend on clvmd, all the
>>>>>>>>>>>>> resources on the remaining node stop and the cluster is left
>>>>>>>>>>>>> with nothing running.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've traced why the lsb::clvmd fails: the monitor/status
>>>>>>>>>>>>> command includes "vgdisplay", which hangs indefinitely.
>>>>>>>>>>>>> Therefore the monitor will always time out.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So this isn't a problem with pacemaker, but with clvmd/dlm:
>>>>>>>>>>>>> if a node is cut off, the cluster isn't handling it properly.
>>>>>>>>>>>>> Has anyone on this list seen this before? Any ideas?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Details:
>>>>>>>>>>>>>
>>>>>>>>>>>>> versions:
>>>>>>>>>>>>> Redhat Linux 6.2 (kernel 2.6.32)
>>>>>>>>>>>>> cman-3.0.12.1
>>>>>>>>>>>>> corosync-1.4.1
>>>>>>>>>>>>> pacemaker-1.1.6
>>>>>>>>>>>>> lvm2-2.02.87
>>>>>>>>>>>>> lvm2-cluster-2.02.87
>>>>>>>>>>>>
>>>>>>>>>>>> This may be a Linux-HA question after all!
>>>>>>>>>>>>
>>>>>>>>>>>> I ran a few more tests. Here's the output from a typical test of
>>>>>>>>>>>>
>>>>>>>>>>>> grep -E "(dlm|gfs2|clvmd|fenc|syslogd)" /var/log/messages
>>>>>>>>>>>>
>>>>>>>>>>>> <http://pastebin.com/uqC6bc1b>
>>>>>>>>>>>>
>>>>>>>>>>>> It looks like what's happening is that the fence agent (one I
>>>>>>>>>>>> wrote) is not returning the proper error code when a node
>>>>>>>>>>>> crashes. According to this page, if a fencing agent fails,
>>>>>>>>>>>> GFS2 will freeze to protect the data:
>>>>>>>>>>>>
>>>>>>>>>>>> <http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Global_File_System_2/s1-gfs2hand-allnodes.html>
>>>>>>>>>>>>
>>>>>>>>>>>> As a test, I tried to fence my test node via standard means:
>>>>>>>>>>>>
>>>>>>>>>>>> stonith_admin -F orestes-corosync.nevis.columbia.edu
>>>>>>>>>>>>
>>>>>>>>>>>> These were the log messages, which show that stonith_admin did
>>>>>>>>>>>> its job and CMAN was notified of the fencing:
>>>>>>>>>>>> <http://pastebin.com/jaH820Bv>.
>>>>>>>>>>>>
>>>>>>>>>>>> Unfortunately, I still got the gfs2 freeze, so this is not the
>>>>>>>>>>>> complete story.
>>>>>>>>>>>>
>>>>>>>>>>>> First things first. I vaguely recall a web page that went over
>>>>>>>>>>>> the STONITH return codes, but I can't locate it again. Is there
>>>>>>>>>>>> any reference to the return codes expected from a fencing
>>>>>>>>>>>> agent, perhaps as a function of the state of the fencing device?
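
(On the return-code question: my reading of the FenceAgentAPI convention,
so treat this as a sketch and not gospel, is that an agent must exit 0
only after the fence action is confirmed complete, and non-zero otherwise.
A hypothetical skeleton, with the device-specific work stubbed out:

    #!/bin/bash
    # fenced feeds the agent name=value pairs (action=..., port=...) on stdin
    while read line; do
        case "$line" in
            action=*) action="${line#action=}" ;;
            port=*)   port="${line#port=}"     ;;
        esac
    done

    power_off_and_verify() {
        # placeholder: cut UPS power to "$1" and poll until the outlet
        # is confirmed off
        return 1
    }

    if power_off_and_verify "$port"; then
        exit 0    # success only once the victim is confirmed dead;
                  # fenced then lets dlm/gfs2 recover
    else
        exit 1    # anything else: fenced retries and gfs2 stays frozen
    fi

If the agent exits 0 before the victim is really down, or exits with a
code fenced doesn't expect, dlm/gfs2 can stay blocked, which would match
the freeze described above.)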

--
Bill Seligman             | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://selig...@nevis.columbia.edu
PO Box 137                |
Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/