Re: [ClusterLabs] Antw: Re: Antw: Re: OCFS2 on cLVM with node waiting for fencing timeout
On 10/13/2016 03:36 AM, Ulrich Windl wrote:
> That's what I'm talking about: If 1 of 3 nodes is rebooting (or the
> cluster is split-brain 1:2), the single node CANNOT continue due to lack
> of quorum, while the remaining two nodes can. Is it still necessary to
> wait for completion of stonith?

If the 2 nodes have working communication with the 1 node, then the 1 node
will leave the cluster in an orderly way, and fencing will not be involved.
In that case, yes, quorum is used to prevent the 1 node from starting
services until it rejoins the cluster.

However, if the 2 nodes lose communication with the 1 node, they cannot be
sure it is functioning well enough to respect quorum. In this case, they
have to fence it. DLM has to wait for the fencing to succeed to be sure the
1 node is not messing with shared resources.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
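The two conditions described above can be sketched as follows. This is a hypothetical illustration of the decision logic, not the real dlm_controld code; the function and parameter names (`may_resume_lockspace`, `pending_fence_targets`, `fenced_nodes`) are invented for the example:

```python
# Hypothetical sketch of the two independent conditions discussed above:
# a partition may resume lockspace operations only when it has quorum AND
# every node that dropped out uncleanly has been confirmed fenced.

def may_resume_lockspace(have_quorum, pending_fence_targets, fenced_nodes):
    """Return True only when this partition has quorum and all nodes
    that left uncleanly have been confirmed fenced."""
    if not have_quorum:
        return False
    # Fencing completion is checked independently of quorum: even a
    # 2-of-3 majority cannot trust shared storage until the lost node
    # is confirmed dead.
    return all(node in fenced_nodes for node in pending_fence_targets)

# 2-of-3 partition, fencing of node h01 not yet confirmed -> must wait
print(may_resume_lockspace(True, {"h01"}, set()))    # False
# Fencing of h01 confirmed -> safe to resume
print(may_resume_lockspace(True, {"h01"}, {"h01"}))  # True
# The single node lacks quorum and never resumes on its own
print(may_resume_lockspace(False, set(), set()))     # False
```

This mirrors the point above: quorum alone answers "am I in the majority?", while fencing completion answers "is the lost node provably unable to touch shared storage?" Both must hold.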
Re: [ClusterLabs] Antw: Re: Antw: Re: OCFS2 on cLVM with node waiting for fencing timeout
Hi,

On 10/13/2016 04:36 PM, Ulrich Windl wrote:
>>>> Eric Ren schrieb am 13.10.2016 um 09:48 in Nachricht
>>>> <73f764d0-75e7-122f-ff4e-d0b27dbdd...@suse.com>:
> [...]
>>> When assuming node h01 still lived when communication failed, wouldn't
>>> quorum prevent h01 from doing anything with DLM and OCFS2 anyway?
>> Not sure I understand you correctly. By default, losing quorum will make
>> DLM stop service.
> That's what I'm talking about: If 1 of 3 nodes is rebooting (or the
> cluster is split-brain 1:2), the single node CANNOT continue due to lack
> of quorum, while the remaining two nodes can. Is it still necessary to
> wait for completion of stonith?

Quorum and fencing completion are different conditions, and both are
checked before service is provided again. FYI:
https://github.com/renzhengeek/libdlm/blob/master/dlm_controld/cpg.c#L603

>> See `man dlm_controld`:
>> ```
>> --enable_quorum_lockspace 0|1
>>     enable/disable quorum requirement for lockspace operations
>> ```
> Does not exist in SLES11 SP4...

Well, I think it's better to keep the default behavior. Otherwise, it's
dangerous when split-brain happens.

Eric

> Ulrich
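The danger Eric describes can be illustrated with a small sketch. This is not dlm_controld's actual implementation; it only models what the man-page excerpt says the option does (skip the quorum requirement for lockspace operations), with the fencing-completion check assumed to remain in force, and all names invented for the example:

```python
# Hypothetical model of the --enable_quorum_lockspace tunable: when the
# option is disabled, the quorum check is skipped for lockspace
# operations (fencing completion is assumed to still be required).

def lockspace_allowed(enable_quorum_lockspace, have_quorum, fencing_complete):
    """Return True when lockspace operations may proceed under this model."""
    quorum_ok = have_quorum or not enable_quorum_lockspace
    return quorum_ok and fencing_complete

# Default (option enabled): the inquorate single node is blocked
print(lockspace_allowed(True, False, True))   # False
# Option disabled: the same inquorate node would proceed -- the
# split-brain danger Eric points out
print(lockspace_allowed(False, False, True))  # True
# Either way, incomplete fencing still blocks everything in this model
print(lockspace_allowed(True, True, False))   # False
```

The second case shows why keeping the default is safer: with the quorum requirement disabled, a minority partition could keep operating on shared storage during a split.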
[ClusterLabs] Antw: Re: Antw: Re: OCFS2 on cLVM with node waiting for fencing timeout
>>> Eric Ren schrieb am 13.10.2016 um 09:48 in Nachricht
>>> <73f764d0-75e7-122f-ff4e-d0b27dbdd...@suse.com>:
[...]
>> When assuming node h01 still lived when communication failed, wouldn't
>> quorum prevent h01 from doing anything with DLM and OCFS2 anyway?
> Not sure I understand you correctly. By default, losing quorum will make
> DLM stop service.

That's what I'm talking about: If 1 of 3 nodes is rebooting (or the cluster
is split-brain 1:2), the single node CANNOT continue due to lack of quorum,
while the remaining two nodes can. Is it still necessary to wait for
completion of stonith?

> See `man dlm_controld`:
> ```
> --enable_quorum_lockspace 0|1
>     enable/disable quorum requirement for lockspace operations
> ```

Does not exist in SLES11 SP4...

Ulrich