gfs2 hangs if a node crashes

emmanuel segura Wed, 14 Mar 2012 03:02:45 -0700

Hello William

I think it would be better to have clvmd start at boot:


chkconfig cman on ; chkconfig clvmd on
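
To check that both services really are enabled afterwards, something like the following could be used. This is only a sketch: missing_at_boot is a made-up helper name, and it assumes the usual "name 0:off 1:off ... 6:off" layout printed by chkconfig --list and that runlevel 3 is the one you boot into.

```shell
# Hypothetical helper: reads `chkconfig --list` style output on stdin and
# prints every service named in the arguments that is NOT on in runlevel 3.
missing_at_boot() {
    awk -v want="$*" '
        BEGIN {
            n = split(want, names, " ")
            for (i = 1; i <= n; i++) need[names[i]] = 1
        }
        # A wanted service with "3:on" on its line is enabled: drop it.
        ($1 in need) && /3:on/ { delete need[$1] }
        END { for (s in need) print s }
    ' | sort
}
```

Usage would be: chkconfig --list | missing_at_boot cman clvmd
Empty output means both are set to start at boot.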



On 13 March 2012 23:29, William Seligman <selig...@nevis.columbia.edu> wrote:

> On 3/13/12 5:50 PM, emmanuel segura wrote:
>
> > So if you are using cman, why do you use lsb::clvmd?
> >
> > I think you are very confused.
>
> I don't dispute that I may be very confused!
>
> However, from what I can tell, I still need to run clvmd even if I'm running
> cman (I'm not using rgmanager). If I just run cman, gfs2 and any other form of
> mount fail. If I run cman, then clvmd, then gfs2, everything behaves normally.
>
> Going by these instructions:
>
> <https://alteeve.com/w/2-Node_Red_Hat_KVM_Cluster_Tutorial>
>
> the resources he puts under "cluster control" (rgmanager) I have to put under
> pacemaker control. Those include drbd, clvmd, and gfs2.
>
> The difference between what I've got and what's in "Clusters From Scratch" is
> that in CFS they assign one DRBD volume to a single filesystem. I create an LVM
> physical volume on my DRBD resource, as in the above tutorial, and so I have to
> start clvmd or the logical volumes in the DRBD partition won't be recognized.
>
> Is there some way to get logical volumes recognized automatically by cman
> without rgmanager that I've missed?
>
> > On 13 March 2012 22:42, William Seligman <selig...@nevis.columbia.edu> wrote:
> >
> >> On 3/13/12 12:29 PM, William Seligman wrote:
> >>> I'm not sure if this is a "Linux-HA" question; please direct me to the
> >>> appropriate list if it's not.
> >>>
> >>> I'm setting up a two-node cman+pacemaker+gfs2 cluster as described in
> >>> "Clusters From Scratch." Fencing is through forcibly rebooting a node by
> >>> cutting and restoring its power via UPS.
> >>>
> >>> My fencing/failover tests have revealed a problem. If I gracefully turn
> >>> off one node ("crm node standby"; "service pacemaker stop"; "shutdown -r
> >>> now") all the resources transfer to the other node with no problems. If I
> >>> cut power to one node (as would happen if it were fenced), the lsb::clvmd
> >>> resource on the remaining node eventually fails. Since all the other
> >>> resources depend on clvmd, all the resources on the remaining node stop
> >>> and the cluster is left with nothing running.
> >>>
> >>> I've traced why the lsb::clvmd resource fails: the monitor/status command
> >>> includes "vgdisplay", which hangs indefinitely, so the monitor always
> >>> times out.
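
One way to confirm the hang outside of pacemaker is to run the same command under a time limit. A minimal sketch, assuming GNU coreutils timeout is installed (it exits with status 124 when the limit expires); probe is a made-up helper name, not part of any cluster package:

```shell
# Run a command with a time limit and report what happened.
# Prints "ok" (exit 0), "hung" (time limit hit) or "failed" (other error).
probe() {
    probe_limit=$1
    shift
    probe_rc=0
    # timeout(1) from GNU coreutils returns 124 if the command ran too long.
    timeout "$probe_limit" "$@" >/dev/null 2>&1 || probe_rc=$?
    case "$probe_rc" in
        0)   echo ok ;;
        124) echo hung ;;
        *)   echo failed ;;
    esac
}
```

For example, "probe 10 vgdisplay" on the surviving node. One caveat: if vgdisplay is blocked in uninterruptible sleep inside the kernel, it may ignore the signal and the probe itself may not return.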
> >>>
> >>> So this isn't a problem with pacemaker, but with clvmd/dlm: If a node is
> >>> cut off, the cluster isn't handling it properly. Has anyone on this list
> >>> seen this before? Any ideas?
> >>>
> >>> Details:
> >>>
> >>> versions:
> >>> Redhat Linux 6.2 (kernel 2.6.32)
> >>> cman-3.0.12.1
> >>> corosync-1.4.1
> >>> pacemaker-1.1.6
> >>> lvm2-2.02.87
> >>> lvm2-cluster-2.02.87
> >>
> >> This may be a Linux-HA question after all!
> >>
> >> I ran a few more tests. Here's the output from a typical test of
> >>
> >> grep -E "(dlm|gfs2|clvmd|fenc|syslogd)" /var/log/messages
> >>
> >> <http://pastebin.com/uqC6bc1b>
> >>
> >> It looks like what's happening is that the fence agent (one I wrote) is
> >> not returning the proper error code when a node crashes. According to this
> >> page, if a fencing agent fails GFS2 will freeze to protect the data:
> >>
> >> <http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Global_File_System_2/s1-gfs2hand-allnodes.html>
> >>
> >> As a test, I tried to fence my test node via standard means:
> >>
> >> stonith_admin -F orestes-corosync.nevis.columbia.edu
> >>
> >> These were the log messages, which show that stonith_admin did its job and
> >> CMAN was notified of the fencing: <http://pastebin.com/jaH820Bv>.
> >>
> >> Unfortunately, I still got the gfs2 freeze, so this is not the complete
> >> story.
> >>
> >> First things first. I vaguely recall a web page that went over the STONITH
> >> return codes, but I can't locate it again. Is there any reference to the
> >> return codes expected from a fencing agent, perhaps as a function of the
> >> state of the fencing device?
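
On the return-code question: the convention I know of is the one in the FenceAgentAPI document used by the Red Hat fence agents. There, on/off/reboot exit 0 on success and 1 on failure, while status exits 0 if the port is on, 2 if it is off, and 1 if the device cannot be contacted. I am assuming that convention applies here; an agent written against a different STONITH plugin interface may be expected to behave differently, so please check the documentation for your stack. A minimal sketch of the status mapping (fence_status_code is a made-up name, and how the power state is obtained from the UPS is left out):

```shell
# Sketch only: exit codes assumed from the FenceAgentAPI "status" convention.
# $1 is the power state as reported by the fencing device.
fence_status_code() {
    case "$1" in
        on)  return 0 ;;   # device reachable, outlet is on
        off) return 2 ;;   # device reachable, outlet is off
        *)   return 1 ;;   # device unreachable or state unknown
    esac
}
```

The point of the last branch is that anything the agent cannot positively confirm is reported as a failure rather than as success.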
>
> --
> Bill Seligman             | Phone: (914) 591-2823
> Nevis Labs, Columbia Univ | mailto://selig...@nevis.columbia.edu
> PO Box 137                |
> Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/
>
>
> _______________________________________________
> Linux-HA mailing list
> Linux-HA@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>



-- 
this is my life and I live it for as long as God wills

Re: [Linux-HA] clvm/dlm/gfs2 hangs if a node crashes
