Hi Asitha/Asanka, I think it is clear that the issue we have here is mostly related to Hazelcast.
Now to solve that problem I think it would be better to go ahead with a generic leader election system for the entire platform rather than writing one specific to MB. This requirement is there in several other products and for some a database driven approach might not work. Therefore it would be better if we can decouple this from the product and use an interface to talk to a leader election module. This module can either be implemented as a separate component or utilize an existing system such as etcd. To start with I think it would be better to evaluate what etcd and consul has to offer and check whether they fit to our requirement. Thanks On Fri, Aug 5, 2016 at 10:12 AM, Asanka Abeyweera <asank...@wso2.com> wrote: > Hi Imesh, > > On Fri, Aug 5, 2016 at 7:33 AM, Imesh Gunaratne <im...@wso2.com> wrote: > >> >> >> On Fri, Aug 5, 2016 at 7:31 AM, Imesh Gunaratne <im...@wso2.com> wrote: >>> >>> >>> You can see here [3] how K8S has implemented leader election feature for >>> the products deployed on top of that to utilize. >>> >> >> Correction: Please refer [4]. >> >> >>> >>> >>>> On Thu, Aug 4, 2016 at 7:27 PM, Asanka Abeyweera <asank...@wso2.com> >>>> wrote: >>>> >>>>> Hi Imesh, >>>>> >>>>> We are not implementing this to overcome a limitation in the >>>>> coordination algorithm available in the Hazlecast. We are implementing >>>>> this >>>>> since we need an RDBMS based coordination algorithm (not a network based >>>>> algorithm). >>>>> >>>> >>> Are you saying that database connections do not use the same network >>> used by Hazelcast? >>> >> > Yes, This is most problematic when two interfaces are used for Hazelcast > communication and RDBMS communication. Additionally there is an edge case > even when a single interface is used for both Hazelcast and RDBMS > communication. When a cluster merge after a network segmentation, there can > be a delay in Hazelcast detecting the cluster merge. If a database is > accessed by multiple coordinators during this time, there can be message > delivery issues like message duplication. Therefore we cannot ignore this > issue even when the same network is used for Hazelcast and database > connections. > > > >> The reason is, a network based election algorithm will always elect >>>>> multiple leaders when the network is partitioned. But if we use a RDBMS >>>>> based algorithm this will not happen. >>>>> >>>> >>> I do not think your argument is correct. If there is a problem with the >>> network, it may apply to both Hazelcast based solution and database based >>> solution. >>> >>> [4] http://blog.kubernetes.io/2016/01/simple-leader-election >>> -with-Kubernetes.html >>> >>> Thanks >>> >>>> >>>>> >>>>> On Thu, Aug 4, 2016 at 7:16 PM, Imesh Gunaratne <im...@wso2.com> >>>>> wrote: >>>>> >>>>>> Hi Asanka, >>>>>> >>>>>> Do we really need to implement a leader election algorithm on our >>>>>> own? AFAIU this is a complex problem which has been already solved by >>>>>> several algorithms [1]. IMO it would be better to go ahead with an >>>>>> existing >>>>>> well established implementation on etcd [1] or Consul [2]. >>>>>> >>>>>> Those provide HTTP APIs for clients to make leader election calls. >>>>>> [3] is a client library written in Node.js for etcd based leader >>>>>> election. >>>>>> >>>>>> [1] https://www.projectcalico.org/using-etcd-for-elections >>>>>> [2] https://www.consul.io/docs/guides/leader-election.html >>>>>> [3] https://www.npmjs.com/package/etcd-leader >>>>>> >>>>>> Thanks >>>>>> >>>>>> On Wed, Aug 3, 2016 at 5:12 PM, Asanka Abeyweera <asank...@wso2.com> >>>>>> wrote: >>>>>> >>>>>>> Hi Maninda, >>>>>>> >>>>>>> Since we are using RDBMS to poll the node status, the cluster will >>>>>>> not end up in situation 1,2 or 3. With this approach we consider a node >>>>>>> unreachable when it cannot access the database. Therefore an unreachable >>>>>>> node can never be the leader. >>>>>>> >>>>>>> As you have mentioned, we are currently using the RDBMS as an atomic >>>>>>> global variable to create the coordinator entry. >>>>>>> >>>>>>> On Tue, Aug 2, 2016 at 5:22 PM, Maninda Edirisooriya < >>>>>>> mani...@wso2.com> wrote: >>>>>>> >>>>>>>> Hi Asanka, >>>>>>>> >>>>>>>> As I understand the accuracy of electing the leader correctly is >>>>>>>> dependent on the election mechanism with RDBMS because there can be >>>>>>>> edge >>>>>>>> cases like, >>>>>>>> >>>>>>>> 1. Unreachable leader activates during the election process: Then >>>>>>>> who becomes the leader? >>>>>>>> 2. The elected leader becomes unreachable before the election is >>>>>>>> completed: Then will there be a situation where there is no leader? >>>>>>>> 3. A leader and a set of nodes are disconnected from the other part >>>>>>>> of the cluster and while the leader is trying to remove unreachable >>>>>>>> members >>>>>>>> other part is calling an election to make a leader: Who will win? >>>>>>>> >>>>>>>> RDBMS based election algorithm should handle such cases without >>>>>>>> bringing the cluster to an inconsistent state or dead lock in all >>>>>>>> concurrent cases. If all these kind of cases cannot be handled isn't it >>>>>>>> better to keep the current hazelcast clustering and use the RDBMS only >>>>>>>> to >>>>>>>> handle the split brain scenario? In other words when a new hazelcast >>>>>>>> leader >>>>>>>> is elected it should be updated in the RDBMS. If another split party >>>>>>>> has >>>>>>>> already elected a leader, the node who is going to write it to RDBMS >>>>>>>> should >>>>>>>> avoid updating it. Simply, the RDBMS can be used as an atomic global >>>>>>>> variable to keep the leader name by modifying the hazelcast clustering. >>>>>>>> WDYT? >>>>>>>> >>>>>>>> Thanks. >>>>>>>> >>>>>>>> >>>>>>>> *Maninda Edirisooriya* >>>>>>>> Senior Software Engineer >>>>>>>> >>>>>>>> *WSO2, Inc.*lean.enterprise.middleware. >>>>>>>> >>>>>>>> *Blog* : http://maninda.blogspot.com/ >>>>>>>> *E-mail* : mani...@wso2.com >>>>>>>> *Skype* : @manindae >>>>>>>> *Twitter* : @maninda >>>>>>>> >>>>>>>> On Thu, Jul 28, 2016 at 4:38 PM, Asanka Abeyweera < >>>>>>>> asank...@wso2.com> wrote: >>>>>>>> >>>>>>>>> Hi Akila, >>>>>>>>> >>>>>>>>> Let me explain the issue in a different way. Let's assume the MB >>>>>>>>> nodes are using two different network interfaces for Hazelcast >>>>>>>>> communication and database communication. With such a configuration, >>>>>>>>> there >>>>>>>>> can be failures only in the network interface used for Hazelcast >>>>>>>>> communication in some nodes. When this happens, there will be two or >>>>>>>>> more >>>>>>>>> Hazelcast clusters due to the network segmentation, and as a result >>>>>>>>> there >>>>>>>>> will be multiple coordinators. Since every node still have access to >>>>>>>>> the >>>>>>>>> database, multiple coordinators can affect the correctness of the data >>>>>>>>> stored in the DB. But if we used a RDBMS based approach we won't have >>>>>>>>> multiple coordinators due to a network partition in Hazelcast. This >>>>>>>>> is one >>>>>>>>> advantage we get from this approach. >>>>>>>>> >>>>>>>>> Even when we use Zookeeper or RAFT the same issue will be there >>>>>>>>> since we are using different interfaces for Hazelcast communication >>>>>>>>> and DB >>>>>>>>> communication. >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Jul 28, 2016 at 2:56 PM, Akila Ravihansa Perera < >>>>>>>>> raviha...@wso2.com> wrote: >>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> What's the advantage of using RDBMS (even as an alternative) to >>>>>>>>>> implement a leader/coordinator election? If the network connection >>>>>>>>>> to DB >>>>>>>>>> fails then this will be a single point of failure. I don't think we >>>>>>>>>> can >>>>>>>>>> scale RDBMS instances and expect the election algorithm to work. >>>>>>>>>> That would >>>>>>>>>> be reducing this problem to another problem (electing coordinator >>>>>>>>>> RDBMS >>>>>>>>>> instance). >>>>>>>>>> >>>>>>>>>> IMHO it would be better to look at Zookeeper Atomic Broadcast >>>>>>>>>> (ZAB) [1] or RAFT leader election [2] algorithms which have already >>>>>>>>>> proven >>>>>>>>>> results. >>>>>>>>>> >>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab1.0 >>>>>>>>>> [2] http://libraft.io/ >>>>>>>>>> >>>>>>>>>> Thanks. >>>>>>>>>> >>>>>>>>>> On Thu, Jul 28, 2016 at 1:42 PM, Nandika Jayawardana < >>>>>>>>>> nand...@wso2.com> wrote: >>>>>>>>>> >>>>>>>>>>> +1 to make it a common component . We have the clustering >>>>>>>>>>> implementation for BPEL component based on hazelcast. If the >>>>>>>>>>> coordination >>>>>>>>>>> is available at RDBMS level, we can remove hazelcast dependancy. >>>>>>>>>>> >>>>>>>>>>> Regards >>>>>>>>>>> Nandika >>>>>>>>>>> >>>>>>>>>>> On Thu, Jul 28, 2016 at 1:28 PM, Hasitha Aravinda < >>>>>>>>>>> hasi...@wso2.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Can we make it a common component, which is not hard coupled >>>>>>>>>>>> with MB. BPS has the same requirement. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Hasitha. >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Jul 28, 2016 at 9:47 AM, Asanka Abeyweera < >>>>>>>>>>>> asank...@wso2.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi All, >>>>>>>>>>>>> >>>>>>>>>>>>> In MB, we have used a coordinator based approach to manage >>>>>>>>>>>>> distributed messaging algorithm in the cluster. Currently >>>>>>>>>>>>> Hazelcast is used >>>>>>>>>>>>> to elect the coordinator. But one issue we faced with Hazelcast >>>>>>>>>>>>> is, during >>>>>>>>>>>>> a network segmentation (split brain), Hazelcast can elect two or >>>>>>>>>>>>> more >>>>>>>>>>>>> coordinators in the cluster. This affects the correctness of the >>>>>>>>>>>>> distributed messaging algorithm since there are some tables in >>>>>>>>>>>>> the database >>>>>>>>>>>>> that should only be edited by a single node (i.e. coordinator). >>>>>>>>>>>>> >>>>>>>>>>>>> As a solution to this problem we have implemented minimum node >>>>>>>>>>>>> count based approach [1] to deactivate set of partitioned nodes >>>>>>>>>>>>> to stop >>>>>>>>>>>>> multiple nodes becoming coordinators until the network >>>>>>>>>>>>> segmentation issue >>>>>>>>>>>>> is fixed. >>>>>>>>>>>>> >>>>>>>>>>>>> As an alternative solution, we are thinking of implementing an >>>>>>>>>>>>> RDBMS based approach to elect the coordinator node in the >>>>>>>>>>>>> cluster. By doing >>>>>>>>>>>>> this we can make sure that even during a network segmentation >>>>>>>>>>>>> only one node >>>>>>>>>>>>> will be elected as the coordinator node since the election is >>>>>>>>>>>>> happening >>>>>>>>>>>>> through the database. >>>>>>>>>>>>> >>>>>>>>>>>>> The algorithm will use a polling mechanism to check the >>>>>>>>>>>>> validity of the nodes. To make the election algorithm scalable, >>>>>>>>>>>>> only the >>>>>>>>>>>>> coordinator node will be checking status of all the nodes in the >>>>>>>>>>>>> cluster >>>>>>>>>>>>> and it will inform other nodes through database when a member is >>>>>>>>>>>>> added/left. The nodes will be only checking for the status of the >>>>>>>>>>>>> coordinator node. When a node detect that coordinator is invalid >>>>>>>>>>>>> it will go >>>>>>>>>>>>> for a election to elect a new coordinator. >>>>>>>>>>>>> >>>>>>>>>>>>> We are currently working on a POC to test how this works with >>>>>>>>>>>>> MB's slot based messaging algorithm. >>>>>>>>>>>>> >>>>>>>>>>>>> thoughts? >>>>>>>>>>>>> >>>>>>>>>>>>> [1] https://wso2.org/jira/browse/MB-1664 >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Asanka Abeyweera >>>>>>>>>>>>> Senior Software Engineer >>>>>>>>>>>>> WSO2 Inc. >>>>>>>>>>>>> >>>>>>>>>>>>> Phone: +94 712228648 >>>>>>>>>>>>> Blog: a5anka.github.io >>>>>>>>>>>>> >>>>>>>>>>>>> <https://wso2.com/signature> >>>>>>>>>>>>> >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>> Architecture mailing list >>>>>>>>>>>>> Architecture@wso2.org >>>>>>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> -- >>>>>>>>>>>> Hasitha Aravinda, >>>>>>>>>>>> Associate Technical Lead, >>>>>>>>>>>> WSO2 Inc. >>>>>>>>>>>> Email: hasi...@wso2.com >>>>>>>>>>>> Mobile : +94 718 210 200 >>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> Architecture mailing list >>>>>>>>>>>> Architecture@wso2.org >>>>>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Nandika Jayawardana >>>>>>>>>>> WSO2 Inc ; http://wso2.com >>>>>>>>>>> lean.enterprise.middleware >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> Architecture mailing list >>>>>>>>>>> Architecture@wso2.org >>>>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Akila Ravihansa Perera >>>>>>>>>> WSO2 Inc.; http://wso2.com/ >>>>>>>>>> >>>>>>>>>> Blog: http://ravihansa3000.blogspot.com >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> Architecture mailing list >>>>>>>>>> Architecture@wso2.org >>>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Asanka Abeyweera >>>>>>>>> Senior Software Engineer >>>>>>>>> WSO2 Inc. >>>>>>>>> >>>>>>>>> Phone: +94 712228648 >>>>>>>>> Blog: a5anka.github.io >>>>>>>>> >>>>>>>>> <https://wso2.com/signature> >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> Architecture mailing list >>>>>>>>> Architecture@wso2.org >>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Asanka Abeyweera >>>>>>> Senior Software Engineer >>>>>>> WSO2 Inc. >>>>>>> >>>>>>> Phone: +94 712228648 >>>>>>> Blog: a5anka.github.io >>>>>>> >>>>>>> <https://wso2.com/signature> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Architecture mailing list >>>>>>> Architecture@wso2.org >>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> *Imesh Gunaratne* >>>>>> Software Architect >>>>>> WSO2 Inc: http://wso2.com >>>>>> T: +94 11 214 5345 M: +94 77 374 2057 >>>>>> W: https://medium.com/@imesh TW: @imesh >>>>>> lean. enterprise. middleware >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Architecture mailing list >>>>>> Architecture@wso2.org >>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> Asanka Abeyweera >>>>> Senior Software Engineer >>>>> WSO2 Inc. >>>>> >>>>> Phone: +94 712228648 >>>>> Blog: a5anka.github.io >>>>> >>>>> <https://wso2.com/signature> >>>>> >>>>> _______________________________________________ >>>>> Architecture mailing list >>>>> Architecture@wso2.org >>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>>>> >>>>> >>>> >>>> >>>> -- >>>> Ramith Jayasinghe >>>> Technical Lead >>>> WSO2 Inc., http://wso2.com >>>> lean.enterprise.middleware >>>> >>>> E: ram...@wso2.com >>>> P: +94 772534930 >>>> >>>> _______________________________________________ >>>> Architecture mailing list >>>> Architecture@wso2.org >>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>>> >>>> >>> >>> >>> -- >>> *Imesh Gunaratne* >>> Software Architect >>> WSO2 Inc: http://wso2.com >>> T: +94 11 214 5345 M: +94 77 374 2057 >>> W: https://medium.com/@imesh TW: @imesh >>> lean. enterprise. middleware >>> >>> >> >> >> -- >> *Imesh Gunaratne* >> Software Architect >> WSO2 Inc: http://wso2.com >> T: +94 11 214 5345 M: +94 77 374 2057 >> W: https://medium.com/@imesh TW: @imesh >> lean. enterprise. middleware >> >> >> _______________________________________________ >> Architecture mailing list >> Architecture@wso2.org >> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >> >> > > > -- > Asanka Abeyweera > Senior Software Engineer > WSO2 Inc. > > Phone: +94 712228648 > Blog: a5anka.github.io > > <https://wso2.com/signature> > > _______________________________________________ > Architecture mailing list > Architecture@wso2.org > https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture > > -- *Imesh Gunaratne* Software Architect WSO2 Inc: http://wso2.com T: +94 11 214 5345 M: +94 77 374 2057 W: https://medium.com/@imesh TW: @imesh lean. enterprise. middleware
_______________________________________________ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture