Hello William, I didn't know you were using DRBD, and I don't know what type of configuration you are using.
But it's better if you try to start clvmd with "clvmd -d", so that we can see what the problem is.
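To see where it blocks, you can stop the daemon and run it in the foreground, something like this (a sketch only; on RHEL 6, "-d 1" should send the debug output to stderr and keep clvmd from daemonizing):

  service clvmd stop    # stop the copy started by the init script
  clvmd -d 1            # run in the foreground, debug messages on stderr

Then repeat the test where you cut power to the other node, and watch where clvmd hangs.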
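About the question at the end of your mail, on the return codes expected from a fencing agent: I don't have the page you remember, but if I recall the FenceAgentAPI convention correctly, an agent should exit 0 on success and non-zero (usually 1) on failure, and for the "status" action it should exit 0 when the port is on and 2 when it is off. You can check what your agent really returns by driving it by hand; for example (the agent name "fence_nevis_ups" and the port value are only placeholders for your own script):

  # hypothetical agent name and port -- substitute your own
  echo -e "action=status\nport=orestes" | /usr/sbin/fence_nevis_ups; echo $?
  echo -e "action=off\nport=orestes"    | /usr/sbin/fence_nevis_ups; echo $?

If "off" does not exit 0 when the node is really down, the cluster will treat the fencing as failed and GFS2 will stay frozen, which matches what you are seeing.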
On 14 March 2012 14:02, William Seligman <selig...@nevis.columbia.edu> wrote:

> On 3/14/12 6:02 AM, emmanuel segura wrote:
>
>> I think it's better if you make clvmd start at boot:
>>
>>   chkconfig cman on ; chkconfig clvmd on
>
> I've already tried it. It doesn't work. The problem is that my LVM
> information is on the drbd. If I start up clvmd before drbd, it won't find
> the logical volumes.
>
> I also don't see why that would make a difference (although this could be
> part of the confusion): a service is a service. I've tried starting up
> clvmd inside and outside pacemaker control, with the same problem. Why
> would starting clvmd at boot make a difference?
>
>> On 13 March 2012 23:29, William Seligman <seligman@nevis.columbia.edu> wrote:
>>
>>> On 3/13/12 5:50 PM, emmanuel segura wrote:
>>>
>>>> So if you are using cman, why do you use lsb::clvmd?
>>>>
>>>> I think you are very confused.
>>>
>>> I don't dispute that I may be very confused!
>>>
>>> However, from what I can tell, I still need to run clvmd even if
>>> I'm running cman (I'm not using rgmanager). If I just run cman,
>>> gfs2 and any other form of mount fails. If I run cman, then clvmd,
>>> then gfs2, everything behaves normally.
>>>
>>> Going by these instructions:
>>>
>>> <https://alteeve.com/w/2-Node_Red_Hat_KVM_Cluster_Tutorial>
>>>
>>> the resources he puts under "cluster control" (rgmanager) I have to
>>> put under pacemaker control. Those include drbd, clvmd, and gfs2.
>>>
>>> The difference between what I've got and what's in "Clusters From
>>> Scratch" is that in CFS they assign one DRBD volume to a single
>>> filesystem. I create an LVM physical volume on my DRBD resource,
>>> as in the above tutorial, and so I have to start clvmd or the
>>> logical volumes in the DRBD partition won't be recognized.
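>>>
>>> To illustrate the kind of ordering I mean, in the crm shell it would
>>> be something like this (a sketch only; "ms_drbd" and "p_gfs2" are
>>> placeholder names for the DRBD master/slave resource and the gfs2
>>> Filesystem resource):
>>>
>>>   # clvmd must start after DRBD is promoted, and gfs2 after clvmd
>>>   primitive p_clvmd lsb:clvmd op monitor interval="30s"
>>>   clone cl_clvmd p_clvmd
>>>   order o_drbd_before_clvmd inf: ms_drbd:promote cl_clvmd:start
>>>   order o_clvmd_before_gfs2 inf: cl_clvmd p_gfs2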
>>>
>>> Is there some way to get logical volumes recognized automatically
>>> by cman without rgmanager that I've missed?
>>>
>>>> On 13 March 2012 22:42, William Seligman <selig...@nevis.columbia.edu> wrote:
>>>>
>>>>> On 3/13/12 12:29 PM, William Seligman wrote:
>>>>>
>>>>>> I'm not sure if this is a "Linux-HA" question; please direct
>>>>>> me to the appropriate list if it's not.
>>>>>>
>>>>>> I'm setting up a two-node cman+pacemaker+gfs2 cluster as
>>>>>> described in "Clusters From Scratch." Fencing is through
>>>>>> forcibly rebooting a node by cutting and restoring its power
>>>>>> via UPS.
>>>>>>
>>>>>> My fencing/failover tests have revealed a problem. If I
>>>>>> gracefully turn off one node ("crm node standby"; "service
>>>>>> pacemaker stop"; "shutdown -r now"), all the resources
>>>>>> transfer to the other node with no problems. If I cut power
>>>>>> to one node (as would happen if it were fenced), the
>>>>>> lsb::clvmd resource on the remaining node eventually fails.
>>>>>> Since all the other resources depend on clvmd, all the
>>>>>> resources on the remaining node stop and the cluster is left
>>>>>> with nothing running.
>>>>>>
>>>>>> I've traced why the lsb::clvmd fails: the monitor/status
>>>>>> command includes "vgdisplay", which hangs indefinitely.
>>>>>> Therefore the monitor will always time out.
>>>>>>
>>>>>> So this isn't a problem with pacemaker, but with clvmd/dlm:
>>>>>> if a node is cut off, the cluster isn't handling it properly.
>>>>>> Has anyone on this list seen this before? Any ideas?
>>>>>>
>>>>>> Details:
>>>>>>
>>>>>> versions:
>>>>>> Red Hat Linux 6.2 (kernel 2.6.32)
>>>>>> cman-3.0.12.1
>>>>>> corosync-1.4.1
>>>>>> pacemaker-1.1.6
>>>>>> lvm2-2.02.87
>>>>>> lvm2-cluster-2.02.87
>>>>>
>>>>> This may be a Linux-HA question after all!
>>>>>
>>>>> I ran a few more tests. Here's the output from a typical test of
>>>>>
>>>>>   grep -E "(dlm|gfs2|clvmd|fenc|syslogd)" /var/log/messages
>>>>>
>>>>> <http://pastebin.com/uqC6bc1b>
>>>>>
>>>>> It looks like what's happening is that the fence agent (one I
>>>>> wrote) is not returning the proper error code when a node
>>>>> crashes. According to this page, if a fencing agent fails, GFS2
>>>>> will freeze to protect the data:
>>>>>
>>>>> <http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Global_File_System_2/s1-gfs2hand-allnodes.html>
>>>>>
>>>>> As a test, I tried to fence my test node via standard means:
>>>>>
>>>>>   stonith_admin -F orestes-corosync.nevis.columbia.edu
>>>>>
>>>>> These were the log messages, which show that stonith_admin did
>>>>> its job and CMAN was notified of the fencing:
>>>>> <http://pastebin.com/jaH820Bv>.
>>>>>
>>>>> Unfortunately, I still got the gfs2 freeze, so this is not the
>>>>> complete story.
>>>>>
>>>>> First things first. I vaguely recall a web page that went over
>>>>> the STONITH return codes, but I can't locate it again. Is there
>>>>> any reference to the return codes expected from a fencing
>>>>> agent, perhaps as a function of the state of the fencing device?
>
> --
> Bill Seligman             | mailto:seligman@nevis.columbia.edu
> Nevis Labs, Columbia Univ | http://www.nevis.columbia.edu/~seligman/
> PO Box 137                |
> Irvington NY 10533 USA    | Phone: (914) 591-2823

--
this is my life, and I live it as long as God wills

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems