Re: [Openais] cpg behavior on transitional membership change
Hi Jiaju,

03.09.2011 19:52, Jiaju Zhang wrote:
> On Fri, Sep 02, 2011 at 10:12:11PM +0300, Vladislav Bogdanov wrote:
>> 02.09.2011 20:55, David Teigland wrote:
>> [snip]
>>>
>>> I really can't make any sense of the report, sorry. Maybe reproduce it
>>> without pacemaker, and then describe the specific steps to create the
>>> issue and resulting symptoms. After that we can determine what logs, if
>>> any, would be useful.
>>>
>>
>> I just tried to ask a question about the cluster components' logic, based
>> on information I discovered from both logs and code analysis. I'm sorry if
>> I was unclear in that; probably some language barrier still exists.
>>
>> Please see my previous mail, where I tried to add some explanations of why
>> I think the current logic is not complete.
>
> Hi Vladislav, I think I understand the problem you described ;)
> I'd like to give an example to make things clearer.
>
> 3-node cluster. For whatever reason, especially under heavy workload (my case),
> corosync may detect one node disappear and reappear again. So the

BTW, could this be prevented (at least for the majority of cases) by some
corosync timeout params? Steve?
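(If it helps the discussion: I assume the relevant knobs would be the totem
timings in corosync.conf. A rough sketch, with purely illustrative values
rather than a recommendation:

    totem {
        version: 2

        # Time (ms) to wait for the token before a token loss is declared.
        token: 10000

        # How many token retransmits are attempted before the processor
        # is considered lost.
        token_retransmits_before_loss_const: 10

        # Time (ms) to wait for consensus on a new membership before
        # starting a new round; must be larger than token.
        consensus: 12000
    }

Raising token/consensus should make corosync slower to declare a node dead
under heavy load, at the cost of slower failover. Parameter names are from
corosync.conf(5); exact semantics and defaults depend on the corosync
version in use.)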
> membership information changes are as follows:
> membership 1: nodeA, nodeB, nodeC
> membership 2: nodeB, nodeC
> membership 3: nodeA, nodeB, nodeC

Exactly.

> From the membership change 1 -> 2, dlm_controld knows nodeA is down
> and has many things to do, like check_fs_done, check_fencing_done ...
> The key point here is that DLM needs to wait until the fencing is really
> done before it proceeds. If we employ a cluster filesystem here, like
> ocfs2, it also needs the fencing to be really done. I believe that in the
> normal case, pacemaker will fence nodeA and then everything should be OK.
>
> However, there is a possibility here that pacemaker won't fence nodeA.
> Say nodeA is the original DC of the cluster; when nodeA is down, the
> cluster should elect a new DC. But if the time window of the membership
> change 2 -> 3 is too small, nodeA is up again and attends the election
> too; then nodeA is elected to be the DC again and it won't fence itself.

Ah, I think you are exactly right about what happens. This matches my case,
I think, because the node which left the cluster for a short moment is
usually:

1. Started earlier on a cold boot (it is a VM; the others are bare-metal
   nodes which boot diskless via PXE, and there is a 1-minute delay before
   those bare-metal systems get their boot image, just because of Cisco's
   implementation of EtherChannel).
2. Upgraded (and rebooted if needed) first, before the bare-metal nodes are
   booted with the new image.

So that node is usually the DC.

> Andrew, correct me if my understanding of pacemaker is wrong ;)
>
> So I think the membership change should be like a transaction in the
> database or filesystem field. That is, for the membership change 1 -> 2,
> everything should be done (e.g. fencing nodeA), no matter whether the
> following change 2 -> 3 happens or not. For the situation where a node
> magically disappears and reappears, and the situation where a node goes
> down normally and then comes up, ocfs2 and dlm should not be able to see
> any difference between them; what they can do is just wait for the
> fencing to be done.
>
> Any comments? thoughts?

I'd have protection at as many layers as possible. So, if you are right
about pacemaker, then it would be great if it could be fixed in pacemaker.
But I'd prefer to be safe and have DLM et al. schedule fencing as well if
they notice an unrecoverable problem, i.e. one which would freeze the
cluster subsystem if fencing is not done for whatever reason. I suppose the
current behavior is just an artifact from cluster2, where groupd was on
duty for that event (I may be wrong, because I didn't look at the groupd
code closely, but comments in the other daemons' code make me think so).

Thank you for your comments,

Vladislav
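P.S. To make the above a bit more concrete, here is a rough sketch of the
kind of bookkeeping I mean for a dlm_controld-style daemon: treat a
membership change as incomplete until every node it removed has been
fenced, and do not act on a removed node's rejoin (escalate fencing
instead) before that. This is not real dlm_controld code; every name in it
is made up for illustration:

    #include <stdbool.h>
    #include <stdio.h>

    /* Illustrative sketch only.  The rule it shows: a membership change
     * is not "complete" until every node lost in it has been fenced, and
     * a lost node's rejoin must not be acted on before that, no matter
     * how quickly the node reappears. */

    #define MAX_LOST 16

    struct change {
        int lost[MAX_LOST];   /* nodeids removed by this change; 0 = empty slot */
    };

    static struct change pending;   /* change 1 -> 2 in Jiaju's example */

    /* made-up stubs so the sketch stands alone */
    static void request_fencing(int nodeid) { printf("fence node %d\n", nodeid); }
    static void defer_join(int nodeid)      { printf("defer join of node %d\n", nodeid); }
    static void admit_node(int nodeid)      { printf("admit node %d\n", nodeid); }

    /* hypothetical hook: the confchg callback reports nodeid left */
    void on_node_lost(int nodeid)
    {
        for (int i = 0; i < MAX_LOST; i++) {
            if (pending.lost[i] == 0) {
                pending.lost[i] = nodeid;   /* fencing for this node is now owed */
                break;
            }
        }
    }

    /* hypothetical hook: fencing of nodeid has been confirmed */
    void on_fence_confirmed(int nodeid)
    {
        for (int i = 0; i < MAX_LOST; i++)
            if (pending.lost[i] == nodeid)
                pending.lost[i] = 0;        /* this node's debt is paid */
    }

    /* hypothetical hook: the confchg callback reports nodeid joined */
    void on_node_joined(int nodeid)
    {
        for (int i = 0; i < MAX_LOST; i++) {
            if (pending.lost[i] == nodeid) {
                /* Change 2 -> 3 arrived before fencing for 1 -> 2 finished:
                 * keep the node out of lockspaces and escalate fencing from
                 * here instead of relying on pacemaker alone. */
                request_fencing(nodeid);
                defer_join(nodeid);
                return;
            }
        }
        admit_node(nodeid);
    }

The point is only the ordering rule (finish fencing for change N before
acting on change N+1), not how dlm_controld actually structures its change
queue.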
[Openais] Fwd: cpg behavior on transitional membership change
It seems this mail got truncated while sending, so I'm posting it again ;)
Also, this time I CCed the pacemaker mailing list as well.

---------- Forwarded message ----------
From: Jiaju Zhang
Date: Sun, Sep 4, 2011 at 12:52 AM
Subject: Re: [Openais] cpg behavior on transitional membership change
To: Vladislav Bogdanov
Cc: David Teigland, "Openais@lists.linux-foundation.org"

On Fri, Sep 02, 2011 at 10:12:11PM +0300, Vladislav Bogdanov wrote:
> 02.09.2011 20:55, David Teigland wrote:
> [snip]
> >
> > I really can't make any sense of the report, sorry. Maybe reproduce it
> > without pacemaker, and then describe the specific steps to create the
> > issue and resulting symptoms. After that we can determine what logs, if
> > any, would be useful.
> >
>
> I just tried to ask a question about the cluster components' logic, based
> on information I discovered from both logs and code analysis. I'm sorry if
> I was unclear in that; probably some language barrier still exists.
>
> Please see my previous mail, where I tried to add some explanations of why
> I think the current logic is not complete.

Hi Vladislav, I think I understand the problem you described ;)
I'd like to give an example to make things clearer.

3-node cluster. For whatever reason, especially under heavy workload,
corosync may detect one node disappear and reappear again. So the
membership information changes are as follows:

membership 1: nodeA, nodeB, nodeC
membership 2: nodeB, nodeC
membership 3: nodeA, nodeB, nodeC

From the membership change 1 -> 2, dlm_controld knows nodeA is down
and has many things to do, like check_fs_done, check_fencing_done ...
The key point here is that DLM needs to wait until the fencing is really
done before it proceeds. If we employ a cluster filesystem here, like
ocfs2, it also needs the fencing to be really done. I believe that in the
normal case, pacemaker will fence nodeA and then everything should be OK.

However, there is a possibility here that pacemaker won't fence nodeA.
Say nodeA is the original DC of the cluster; when nodeA is down, the
cluster should elect a new DC. But if the time window of the membership
change 2 -> 3 is too small, nodeA is up again and attends the election
too; then nodeA is elected to be the DC again and it won't fence itself.

Andrew, correct me if my understanding of pacemaker is wrong ;)

So I think the membership change should be like a transaction in the
database or filesystem field. That is, for the membership change 1 -> 2,
everything should be done (e.g. fencing nodeA), no matter whether the
following change 2 -> 3 happens or not. For the situation where a node
magically disappears and reappears, and the situation where a node goes
down normally and then comes up, ocfs2 and dlm should not be able to see
any difference between them; what they can do is just wait for the
fencing to be done.

Any comments? thoughts?

Thanks,
Jiaju
Re: [Openais] cpg behavior on transitional membership change
On Fri, Sep 02, 2011 at 10:12:11PM +0300, Vladislav Bogdanov wrote:
> 02.09.2011 20:55, David Teigland wrote:
> [snip]
> >
> > I really can't make any sense of the report, sorry. Maybe reproduce it
> > without pacemaker, and then describe the specific steps to create the
> > issue and resulting symptoms. After that we can determine what logs, if
> > any, would be useful.
> >
>
> I just tried to ask a question about the cluster components' logic, based
> on information I discovered from both logs and code analysis. I'm sorry if
> I was unclear in that; probably some language barrier still exists.
>
> Please see my previous mail, where I tried to add some explanations of why
> I think the current logic is not complete.

Hi Vladislav, I think I understand the problem you described ;)
I'd like to give an example to make things clearer.

3-node cluster. For whatever reason, especially under heavy workload,
corosync may detect one node disappear and reappear again. So the
membership information changes are as follows:

membership 1: nodeA, nodeB, nodeC
membership 2: nodeB, nodeC
membership 3: nodeA, nodeB, nodeC

From the membership change 1 -> 2, dlm_controld knows nodeA is down
and has many things to do, like check_fs_done, check_fencing_done ...
The key point here is that DLM needs to wait until the fencing is really
done before it proceeds. If we employ a cluster filesystem here, like
ocfs2, it also needs the fencing to be really done. I believe that in the
normal case, pacemaker will fence nodeA and then everything should be OK.

However, there is a possibility here that pacemaker won't fence nodeA.
Say nodeA is the original DC of the cluster; when nodeA is down, the
cluster should elect a new DC. But if the time window of the membership
change 2 -> 3 is too small, nodeA is up again and attends the election
too; then nodeA is elected to be the DC again and it won't fence itself.

Andrew, correct me if my understanding of pacemaker is wrong ;)

So I think the membership change should be like a transaction in the
database or filesystem field. That is, for the membership change 1 -> 2,
everything should be done (e.g. fencing nodeA), no matter whether the
following change 2 -> 3 happens or not. For the situation where a node
magically disappears and reappears, and the situation where a node goes
down normally and then comes up, ocfs2 and dlm should not be able to see
any difference between them; what they can do is just wait for the
fencing to be done.

Any comments? thoughts?

Thanks,
Jiaju
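P.S. To illustrate what "just wait for the fencing to be done" means on the
DLM/ocfs2 side, here is a rough sketch of a check_fencing-style test. It is
illustrative only and does not claim to be the actual dlm_controld
implementation; the struct and field names are made up:

    #include <stdbool.h>
    #include <time.h>

    /* Per-node bookkeeping kept by a dlm_controld-style daemon.
     * All names here are illustrative. */
    struct node_state {
        time_t failed_at;    /* when the node was seen leaving (change 1 -> 2) */
        time_t last_fenced;  /* when fencing of this node last completed       */
    };

    /* Called for a node that is known to have failed at n->failed_at.
     * Fencing only counts if it completed AFTER the failure we are
     * recovering from; an older fencing result must not be reused.
     * Whether the node has meanwhile reappeared (change 2 -> 3) plays no
     * role at all: a node that blips away and back is treated exactly
     * like a node that went down for good. */
    bool check_fencing_done(const struct node_state *n)
    {
        return n->last_fenced >= n->failed_at;
    }

Until this returns true for nodeA, lockspace recovery and ocfs2 journal
replay on behalf of nodeA must not proceed, no matter what the newer
membership looks like.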