Hi Steve,

Please comment on the below.
Regards,
Ranjith

On Fri, Oct 1, 2010 at 12:04 AM, Ranjith <ranjith.nath...@gmail.com> wrote:
> Hi Steve,
>
> The network is like this:
> A (blocks all packets from src C)
> B
> C (blocks all packets from src A)
>
> Nodes: A, B, C
> A sends join (multicast).
> Only B receives it (C drops it because of the ACL).
> B sends join (multicast) (with A,B).
> A, C receive the join.
> C sends join (with A,B,C).
> Only B receives the above.
> B sends join (with A,B,C).
> A, C send join (with A,B,C).
> B gets consensus, but suppose A has the smallest ID.
> A never gets consensus, as A cannot receive a join from C.
>
> Am I correct up to this point?
>
> Regards,
> Ranjith
>
> On Thu, Sep 30, 2010 at 11:49 PM, Steven Dake <sd...@redhat.com> wrote:
>> On 09/30/2010 10:40 AM, Ranjith wrote:
>>> Hi Steve,
>>>
>>> I believe you mean that the same ACL rules should be applied on the
>>> outgoing side as well. But since the nodes here are not receiving any
>>> packets (multicast or unicast) from each other, I believe they will
>>> also not send to each other. Is that right?
>>
>> That assumption is incorrect. Example:
>>
>> Nodes: A, B, C
>> A sends join (multicast).
>> B, C receive the join.
>> B sends join (multicast).
>> A, C receive the join.
>> C sends join (with A,B,C).
>> Now A rejects that message.
>>
>> As a result, the nodes can never come to consensus.
>>
>> Regards
>> -steve
>>
>>> Regards,
>>> Ranjith
>>>
>>> On Thu, Sep 30, 2010 at 10:41 PM, Steven Dake <sd...@redhat.com> wrote:
>>>> On 09/30/2010 03:47 AM, Ranjith wrote:
>>>>> Hi all,
>>>>>
>>>>> Kindly let me know whether corosync considers the network below a
>>>>> byzantine failure, i.e. the case where N1 and N3 do not have
>>>>> connectivity. I am testing such scenarios because I believe this
>>>>> behaviour can happen due to switch misbehaviour (stale ARP entries).
>>>>
>>>> What makes the fault byzantine is that only incoming packets are
>>>> blocked.
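[Editor's note: Ranjith's asymmetric-ACL walkthrough above can be sketched as a toy model. This is illustrative only, not corosync code; the node names and block sets come straight from his example.]

```python
# Toy model of the asymmetric-ACL join exchange described above.
# Not corosync code - it only tracks which joins each node can hear.
BLOCKS = {"A": {"C"}, "C": {"A"}}  # receiver -> senders it drops

def deliver(sender, receivers):
    """Multicast: each receiver drops packets from senders on its ACL."""
    return {r for r in receivers if sender not in BLOCKS.get(r, set())}

nodes = {"A", "B", "C"}
heard = {n: {n} for n in nodes}  # joins each node has seen (incl. itself)

for _ in range(3):  # a few rounds of everyone multicasting a join
    for sender in sorted(nodes):
        for r in deliver(sender, nodes - {sender}):
            heard[r].add(sender)

print(sorted(heard["B"]))  # ['A', 'B', 'C'] - B hears everyone
print(sorted(heard["A"]))  # ['A', 'B']      - A never hears C's join
```

However many rounds run, A's set never includes C, matching the observation that A cannot reach consensus.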
>>>> If you block both incoming and outgoing packets on the nodes, the
>>>> fault is not byzantine and totem will behave properly.
>>>>
>>>> Regards
>>>> -steve
>>>>
>>>>> Regards,
>>>>> Ranjith
>>>>>
>>>>> Untitled.png
>>>>>
>>>>> On Sat, Sep 25, 2010 at 9:47 AM, Ranjith <ranjith.nath...@gmail.com> wrote:
>>>>>> Hi Steve,
>>>>>> Just to make it clear: do you mean that in the above case, if N3
>>>>>> is part of the network, it should have connectivity to both N2 and
>>>>>> N1, and if it happens that N3 has connectivity to N2 only, corosync
>>>>>> does not take care of that?
>>>>>> Regards,
>>>>>> Ranjith
>>>>>>
>>>>>> On Sat, Sep 25, 2010 at 9:39 AM, Steven Dake <sd...@redhat.com> wrote:
>>>>>>> On 09/24/2010 08:20 PM, Ranjith wrote:
>>>>>>>> Hi,
>>>>>>>>> It is hard to tell what is happening without logs from all 3
>>>>>>>>> nodes. Does this only happen at system start, or can you
>>>>>>>>> duplicate it 5 minutes after the systems have started?
>>>>>>>>
>>>>>>>> The cluster is never stabilizing. It keeps switching between the
>>>>>>>> membership and operational states.
>>>>>>>> Below is the test network I am using:
>>>>>>>>
>>>>>>>> Untitled.png
>>>>>>>>
>>>>>>>> N1 and N3 do not receive any packets from each other. What I
>>>>>>>> expected was that either (N1,N2) or (N2,N3) forms a two-node
>>>>>>>> cluster and stabilizes. But the cluster never stabilizes: even
>>>>>>>> though 2-node clusters are forming, it goes back to membership.
>>>>>>>> [I checked the logs, and this seems to be happening because of
>>>>>>>> the steps I mentioned in the previous mail.]
>>>>>>>
>>>>>>> ...... Where did you say you were testing a byzantine fault in
>>>>>>> your original bug report? Please be more forthcoming in the
>>>>>>> future. Corosync does not protect against byzantine faults.
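[Editor's note: Steve's distinction, that one-way blocking is byzantine while two-way blocking is a clean partition, can be illustrated with a toy reachability model. Again a sketch, not totem itself.]

```python
# Toy reachability model: who can hear whom under a set of dropped links.
# Illustrative only - totem's actual membership protocol is far richer.
def views(drop):
    """drop: set of (sender, receiver) pairs whose packets are lost."""
    nodes = {"N1", "N2", "N3"}
    return {r: sorted(s for s in nodes if s == r or (s, r) not in drop)
            for r in nodes}

one_way = views({("N1", "N3")})                 # N3 drops N1's packets only
two_way = views({("N1", "N3"), ("N3", "N1")})   # both directions blocked

print(one_way["N1"], one_way["N3"])  # ['N1', 'N2', 'N3'] ['N2', 'N3']
print(two_way["N1"], two_way["N3"])  # ['N1', 'N2'] ['N2', 'N3']
```

In the one-way case N1 still hears N3 while N3 never hears N1, so the two nodes hold contradictory views of the membership; in the two-way case both sides agree the link is down and can partition cleanly.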
>>>>>>> Allowing one-way connectivity in a network connection = this
>>>>>>> fault scenario. You can try coro-netctl (the attached script),
>>>>>>> which will atomically block a network IP in the network, to test
>>>>>>> split-brain scenarios without actually pulling network cables.
>>>>>>>
>>>>>>> Regards
>>>>>>> -steve
>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Ranjith
>>>>>>>>
>>>>>>>> On Fri, Sep 24, 2010 at 11:36 PM, Steven Dake <sd...@redhat.com> wrote:
>>>>>>>>> It is hard to tell what is happening without logs from all 3
>>>>>>>>> nodes. Does this only happen at system start, or can you
>>>>>>>>> duplicate it 5 minutes after the systems have started?
>>>>>>>>>
>>>>>>>>> If it is at system start, you may need to enable "fast STP" on
>>>>>>>>> your switch. It looks to me like node 3 gets some messages
>>>>>>>>> through but is then blocked. STP will do this in its default
>>>>>>>>> state on most switches.
>>>>>>>>>
>>>>>>>>> Another option, if you can't enable STP, is to use broadcast
>>>>>>>>> mode (man openais.conf for details).
>>>>>>>>>
>>>>>>>>> Also verify that firewalls are properly configured on all
>>>>>>>>> nodes. You can join us on the irc server freenode on
>>>>>>>>> #linux-cluster for real-time assistance.
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> -steve
>>>>>>>>>
>>>>>>>>> On 09/22/2010 11:33 PM, Ranjith wrote:
>>>>>>>>>> Hi Steve,
>>>>>>>>>> I am running corosync 1.2.8.
>>>>>>>>>> I didn't get what you meant by blackbox; I suppose it is
>>>>>>>>>> logs/debugs.
>>>>>>>>>> I just checked the logs/debugs, and I can make out the
>>>>>>>>>> following:
>>>>>>>>>>
>>>>>>>>>> 1--------------2--------------3
>>>>>>>>>>
>>>>>>>>>> 1) Node1 and Node2 are already in a 2-node cluster.
>>>>>>>>>> 2) Now Node3 sends a join with ({1}, {})
>>>>>>>>>>    (proc_list/fail_list).
>>>>>>>>>> 3) Node2 sends join ({1,2,3}, {}) and Nodes 1/3 update to
>>>>>>>>>>    ({1,2,3}, {}).
>>>>>>>>>> 4) Now Node2 gets consensus after some messages [but 1 is the
>>>>>>>>>>    rep].
>>>>>>>>>> 5) The consensus timeout fires at Node1 for Node3; Node1 sends
>>>>>>>>>>    a join as ({1,2}, {3}).
>>>>>>>>>> 6) Node2 updates to ({1,2}, {3}) because of the above message
>>>>>>>>>>    and sends out a join. This join, received by Node3, causes
>>>>>>>>>>    it to update to ({1,3}, {2}).
>>>>>>>>>> 7) Node1 and Node2 enter operational (fail list cleared by
>>>>>>>>>>    Node2), but Node3's join timeout fires and it is back to
>>>>>>>>>>    the membership state.
>>>>>>>>>> 8) This continues until the consensus timeout fires at Node3
>>>>>>>>>>    for Node1 and it moves to ({3}, {1,2}).
>>>>>>>>>> 9) Now Node1 and Node2 form a 2-node cluster and Node3 forms
>>>>>>>>>>    a single-node cluster.
>>>>>>>>>> 10) Now Node2 broadcasts a normal message.
>>>>>>>>>> 11) This message is received by Node3 as a foreign message,
>>>>>>>>>>     which forces it to go to the gather state.
>>>>>>>>>> 12) The above steps repeat ....
>>>>>>>>>>
>>>>>>>>>> The cluster is never stabilizing.
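[Editor's note: the engine of the loop in steps 10-11 can be sketched as a toy model. This is a hypothetical illustration of the "foreign message forces gather" step, not totem's implementation.]

```python
# Toy sketch of steps 10-11 above: n3 has formed a singleton ring, but the
# n2<->n3 link is still up, so n2's operational traffic reaches n3 as a
# "foreign" message and throws it back into the gather state. Not totem code.
links = {("n1", "n2"), ("n2", "n1"), ("n2", "n3"), ("n3", "n2")}  # n1-n3 down

state = {"n1": "OPERATIONAL", "n2": "OPERATIONAL", "n3": "OPERATIONAL"}
ring_of = {"n1": {"n1", "n2"}, "n2": {"n1", "n2"}, "n3": {"n3"}}

# Step 10: n2 broadcasts a regular (non-membership) message.
sender = "n2"
for r in state:
    if r != sender and (sender, r) in links and r not in ring_of[sender]:
        state[r] = "GATHER"  # step 11: foreign message restarts membership

print(state)  # n3 ends up back in GATHER, so the cycle repeats
```

Because the n2-n3 link never goes down, n3 keeps receiving traffic from a ring it is not part of, which is why the cluster oscillates instead of settling.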
>>>>>>>>>> I have attached the debugs for Node2.
>>>>>>>>>> (1 - 10.102.33.115, 2 - 10.102.33.150, 3 - 10.102.33.180)
>>>>>>>>>> Regards,
>>>>>>>>>> Ranjith
>>>>>>>>>>
>>>>>>>>>> On Wed, Sep 22, 2010 at 10:53 PM, Steven Dake <sd...@redhat.com> wrote:
>>>>>>>>>>> On 09/21/2010 11:15 PM, Ranjith wrote:
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>> Kindly comment on the above behaviour.
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Ranjith
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Sep 21, 2010 at 9:52 PM, Ranjith <ranjith.nath...@gmail.com> wrote:
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>> I was testing the corosync cluster engine using the testcpg
>>>>>>>>>>>>> exec provided with the release. I am seeing the behaviour
>>>>>>>>>>>>> below while testing some specific scenarios. Kindly comment
>>>>>>>>>>>>> on the expected behaviour.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1) 3-node cluster
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1---------2---------3
>>>>>>>>>>>>>
>>>>>>>>>>>>> a) Suppose I bring nodes 1 & 2 up; they form a ring (1,2).
>>>>>>>>>>>>> b) Now bring up 3.
>>>>>>>>>>>>> c) 3 sends a join, which restarts the membership process.
>>>>>>>>>>>>> d) (1,2) forms the ring again; 3 forms a cluster by itself.
>>>>>>>>>>>>> e) Now 3 sends a join (due to the join or another timeout).
>>>>>>>>>>>>> f) The membership protocol starts again, as 2 responds to
>>>>>>>>>>>>>    this by going to the gather state. (I believe 2 should
>>>>>>>>>>>>>    not accept this, as 2 would have earlier decided that 3
>>>>>>>>>>>>>    is failed.)
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am seeing a continuous loop of the above behaviour
>>>>>>>>>>>>> (operational -> membership -> operational -> ...) due to
>>>>>>>>>>>>> which the cluster is not stabilizing.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2) 3-node cluster
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1---------2-----------3
>>>>>>>>>>>>>
>>>>>>>>>>>>> a) Bring up all three nodes at the same time (none of the
>>>>>>>>>>>>>    nodes have seen each other before this).
>>>>>>>>>>>>> b) Now each node forms a cluster by itself. (Here I think
>>>>>>>>>>>>>    it should form either a (1,2) or a (2,3) ring.)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Ranjith
>>>>>>>>>>>
>>>>>>>>>>> Ranjith,
>>>>>>>>>>>
>>>>>>>>>>> Which version of corosync are you running?
>>>>>>>>>>>
>>>>>>>>>>> Can you run corosync-blackbox and attach the output?
>>>>>>>>>>> Thanks
>>>>>>>>>>> -steve
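[Editor's note: coro-netctl itself was attached to Steve's mail and is not reproduced in this thread. As a rough illustration of the idea it implements — blocking a peer's traffic in both directions, so the failure is a clean partition rather than the one-way connectivity Steve calls byzantine — a script might emit iptables rules like the following. The exact rules coro-netctl uses are an assumption here.]

```python
# Hypothetical sketch only: the real coro-netctl script was an attachment,
# and these exact iptables rules are an assumption. The point is that a
# proper split-brain test blocks BOTH directions (cf. Steve's remarks on
# byzantine faults), not just incoming packets.
def block_cmds(peer_ip):
    return [
        f"iptables -A INPUT -s {peer_ip} -j DROP",   # drop packets from peer
        f"iptables -A OUTPUT -d {peer_ip} -j DROP",  # drop packets to peer
    ]

for cmd in block_cmds("10.102.33.180"):  # node 3 in Ranjith's test network
    print(cmd)
```

Applying only the INPUT rule would reproduce the one-way fault scenario from this thread rather than a clean partition.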
_______________________________________________
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais