Hello William,

If you are using cman, why are you using lsb::clvmd? I think you are very confused.
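For what it's worth, on RHEL 6 the usual arrangement is to let the cluster stack itself bring clvmd up after cman, rather than wrapping it as an lsb resource inside pacemaker. A minimal sketch, assuming the stock init scripts shipped with the cman and lvm2-cluster packages:

    # On each node: have init start the stack in order instead of
    # managing clvmd from pacemaker.
    chkconfig cman on
    chkconfig clvmd on

    # Start order matters: clvmd needs cman (and the DLM) up first.
    service cman start
    service clvmd start

Pacemaker then only has to manage the resources that sit on top of the clustered volume group.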
On 13 March 2012 at 22:42, William Seligman <selig...@nevis.columbia.edu> wrote:

> On 3/13/12 12:29 PM, William Seligman wrote:
> > I'm not sure if this is a "Linux-HA" question; please direct me to the
> > appropriate list if it's not.
> >
> > I'm setting up a two-node cman+pacemaker+gfs2 cluster as described in
> > "Clusters From Scratch." Fencing is done by forcibly rebooting a node,
> > cutting and restoring its power via a UPS.
> >
> > My fencing/failover tests have revealed a problem. If I gracefully turn
> > off one node ("crm node standby"; "service pacemaker stop"; "shutdown -r
> > now"), all the resources transfer to the other node with no problems. If
> > I cut power to one node (as would happen if it were fenced), the
> > lsb::clvmd resource on the remaining node eventually fails. Since all
> > the other resources depend on clvmd, all the resources on the remaining
> > node stop and the cluster is left with nothing running.
> >
> > I've traced why lsb::clvmd fails: the monitor/status command includes
> > "vgdisplay", which hangs indefinitely, so the monitor will always time
> > out.
> >
> > So this isn't a problem with pacemaker, but with clvmd/dlm: if a node is
> > cut off, the cluster isn't handling it properly. Has anyone on this list
> > seen this before? Any ideas?
> >
> > Details:
> >
> > Versions:
> > Red Hat Linux 6.2 (kernel 2.6.32)
> > cman-3.0.12.1
> > corosync-1.4.1
> > pacemaker-1.1.6
> > lvm2-2.02.87
> > lvm2-cluster-2.02.87
>
> This may be a Linux-HA question after all!
>
> I ran a few more tests. Here's the output from a typical test of
>
>     grep -E "(dlm|gfs2|clvmd|fenc|syslogd)" /var/log/messages
>
> <http://pastebin.com/uqC6bc1b>
>
> It looks like what's happening is that the fence agent (one I wrote) is
> not returning the proper error code when a node crashes. According to
> this page, if a fencing agent fails, GFS2 will freeze to protect the
> data:
>
> <http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Global_File_System_2/s1-gfs2hand-allnodes.html>
>
> As a test, I tried to fence my test node via standard means:
>
>     stonith_admin -F orestes-corosync.nevis.columbia.edu
>
> These were the log messages, which show that stonith_admin did its job
> and CMAN was notified of the fencing: <http://pastebin.com/jaH820Bv>.
>
> Unfortunately, I still got the gfs2 freeze, so this is not the complete
> story.
>
> First things first. I vaguely recall a web page that went over the
> STONITH return codes, but I can't locate it again. Is there any reference
> to the return codes expected from a fencing agent, perhaps as a function
> of the state of the fencing device?
>
> --
> Bill Seligman             | Phone: (914) 591-2823
> Nevis Labs, Columbia Univ | mailto://selig...@nevis.columbia.edu
> PO Box 137                |
> Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/

--
this is my life and I live it as long as God wills
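P.S. On the return-code question at the end of the quoted message: as far as I know, the convention cman's fence agents follow (the FenceAgentAPI) is that the agent reads its options as key=value pairs on stdin and exits 0 on success, non-zero on failure, and GFS2 stays frozen until the fence daemon sees a successful exit. A rough sketch of that shape for a UPS-driven agent (power_cycle and report_status are hypothetical stand-ins for your real UPS commands):

    #!/bin/sh
    # Skeleton fence agent in the FenceAgentAPI style (illustrative only).
    # Options arrive as key=value pairs on stdin; exit 0 only when the
    # requested action is known to have succeeded, non-zero otherwise.

    action="reboot"
    port=""

    while read line; do
        case "$line" in
            action=*) action="${line#action=}" ;;
            port=*)   port="${line#port=}" ;;
        esac
    done

    case "$action" in
        on|off|reboot)
            # power_cycle is a hypothetical helper wrapping the real UPS
            # commands. It must not report success until the outlet state
            # is confirmed: a false 0 here would let GFS2 resume while the
            # "fenced" node is still running.
            power_cycle "$port" "$action" || exit 1
            exit 0
            ;;
        status|monitor)
            # report_status is likewise hypothetical: exit 0 if the UPS is
            # reachable and the outlet state can be read.
            report_status "$port" || exit 1
            exit 0
            ;;
        *)
            echo "unsupported action: $action" >&2
            exit 1
            ;;
    esac

The critical part is the exit status of the off/reboot path: if the agent returns 0 before the node is really dead, or non-zero when the fence actually succeeded, you get exactly the freeze you are describing.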