Hi, I think the original problem here is that MB needs to absolutely guarantee the integrity of the data written to the database. And if I understood correctly, only the coordinator can write specific entries to the database which is a unique scenario for MB. Any network based leader election algorithm (even database based one) can cause split brain due to network/latency issues, but using a database based leader election algorithm is safe here because if it cannot connect to the database then no harm can be done to the data integrity. Am I correct to say that?
However, on a separate node, it would be better to look into a new design where MB doesn't depend on a coordinator to write data into the db. This is a major bottleneck to scale. Thanks. On Fri, Aug 5, 2016 at 11:21 AM, Imesh Gunaratne <im...@wso2.com> wrote: > Hi Asitha/Asanka, > > I think it is clear that the issue we have here is mostly related to > Hazelcast. > > Now to solve that problem I think it would be better to go ahead with a > generic leader election system for the entire platform rather than writing > one specific to MB. This requirement is there in several other products and > for some a database driven approach might not work. > > Therefore it would be better if we can decouple this from the product and > use an interface to talk to a leader election module. This module can > either be implemented as a separate component or utilize an existing system > such as etcd. > > To start with I think it would be better to evaluate what etcd and consul > has to offer and check whether they fit to our requirement. > > Thanks > > On Fri, Aug 5, 2016 at 10:12 AM, Asanka Abeyweera <asank...@wso2.com> > wrote: > >> Hi Imesh, >> >> On Fri, Aug 5, 2016 at 7:33 AM, Imesh Gunaratne <im...@wso2.com> wrote: >> >>> >>> >>> On Fri, Aug 5, 2016 at 7:31 AM, Imesh Gunaratne <im...@wso2.com> wrote: >>>> >>>> >>>> You can see here [3] how K8S has implemented leader election feature >>>> for the products deployed on top of that to utilize. >>>> >>> >>> Correction: Please refer [4]. >>> >>> >>>> >>>> >>>>> On Thu, Aug 4, 2016 at 7:27 PM, Asanka Abeyweera <asank...@wso2.com> >>>>> wrote: >>>>> >>>>>> Hi Imesh, >>>>>> >>>>>> We are not implementing this to overcome a limitation in the >>>>>> coordination algorithm available in the Hazlecast. We are implementing >>>>>> this >>>>>> since we need an RDBMS based coordination algorithm (not a network based >>>>>> algorithm). >>>>>> >>>>> >>>> Are you saying that database connections do not use the same network >>>> used by Hazelcast? >>>> >>> >> Yes, This is most problematic when two interfaces are used for Hazelcast >> communication and RDBMS communication. Additionally there is an edge case >> even when a single interface is used for both Hazelcast and RDBMS >> communication. When a cluster merge after a network segmentation, there can >> be a delay in Hazelcast detecting the cluster merge. If a database is >> accessed by multiple coordinators during this time, there can be message >> delivery issues like message duplication. Therefore we cannot ignore this >> issue even when the same network is used for Hazelcast and database >> connections. >> >> >> >>> The reason is, a network based election algorithm will always elect >>>>>> multiple leaders when the network is partitioned. But if we use a RDBMS >>>>>> based algorithm this will not happen. >>>>>> >>>>> >>>> I do not think your argument is correct. If there is a problem with >>>> the network, it may apply to both Hazelcast based solution and database >>>> based solution. >>>> >>>> [4] http://blog.kubernetes.io/2016/01/simple-leader-election >>>> -with-Kubernetes.html >>>> >>>> Thanks >>>> >>>>> >>>>>> >>>>>> On Thu, Aug 4, 2016 at 7:16 PM, Imesh Gunaratne <im...@wso2.com> >>>>>> wrote: >>>>>> >>>>>>> Hi Asanka, >>>>>>> >>>>>>> Do we really need to implement a leader election algorithm on our >>>>>>> own? AFAIU this is a complex problem which has been already solved by >>>>>>> several algorithms [1]. IMO it would be better to go ahead with an >>>>>>> existing >>>>>>> well established implementation on etcd [1] or Consul [2]. >>>>>>> >>>>>>> Those provide HTTP APIs for clients to make leader election calls. >>>>>>> [3] is a client library written in Node.js for etcd based leader >>>>>>> election. >>>>>>> >>>>>>> [1] https://www.projectcalico.org/using-etcd-for-elections >>>>>>> [2] https://www.consul.io/docs/guides/leader-election.html >>>>>>> [3] https://www.npmjs.com/package/etcd-leader >>>>>>> >>>>>>> Thanks >>>>>>> >>>>>>> On Wed, Aug 3, 2016 at 5:12 PM, Asanka Abeyweera <asank...@wso2.com> >>>>>>> wrote: >>>>>>> >>>>>>>> Hi Maninda, >>>>>>>> >>>>>>>> Since we are using RDBMS to poll the node status, the cluster will >>>>>>>> not end up in situation 1,2 or 3. With this approach we consider a node >>>>>>>> unreachable when it cannot access the database. Therefore an >>>>>>>> unreachable >>>>>>>> node can never be the leader. >>>>>>>> >>>>>>>> As you have mentioned, we are currently using the RDBMS as an >>>>>>>> atomic global variable to create the coordinator entry. >>>>>>>> >>>>>>>> On Tue, Aug 2, 2016 at 5:22 PM, Maninda Edirisooriya < >>>>>>>> mani...@wso2.com> wrote: >>>>>>>> >>>>>>>>> Hi Asanka, >>>>>>>>> >>>>>>>>> As I understand the accuracy of electing the leader correctly is >>>>>>>>> dependent on the election mechanism with RDBMS because there can be >>>>>>>>> edge >>>>>>>>> cases like, >>>>>>>>> >>>>>>>>> 1. Unreachable leader activates during the election process: Then >>>>>>>>> who becomes the leader? >>>>>>>>> 2. The elected leader becomes unreachable before the election is >>>>>>>>> completed: Then will there be a situation where there is no leader? >>>>>>>>> 3. A leader and a set of nodes are disconnected from the other >>>>>>>>> part of the cluster and while the leader is trying to remove >>>>>>>>> unreachable >>>>>>>>> members other part is calling an election to make a leader: Who will >>>>>>>>> win? >>>>>>>>> >>>>>>>>> RDBMS based election algorithm should handle such cases without >>>>>>>>> bringing the cluster to an inconsistent state or dead lock in all >>>>>>>>> concurrent cases. If all these kind of cases cannot be handled isn't >>>>>>>>> it >>>>>>>>> better to keep the current hazelcast clustering and use the RDBMS >>>>>>>>> only to >>>>>>>>> handle the split brain scenario? In other words when a new hazelcast >>>>>>>>> leader >>>>>>>>> is elected it should be updated in the RDBMS. If another split party >>>>>>>>> has >>>>>>>>> already elected a leader, the node who is going to write it to RDBMS >>>>>>>>> should >>>>>>>>> avoid updating it. Simply, the RDBMS can be used as an atomic global >>>>>>>>> variable to keep the leader name by modifying the hazelcast >>>>>>>>> clustering. >>>>>>>>> WDYT? >>>>>>>>> >>>>>>>>> Thanks. >>>>>>>>> >>>>>>>>> >>>>>>>>> *Maninda Edirisooriya* >>>>>>>>> Senior Software Engineer >>>>>>>>> >>>>>>>>> *WSO2, Inc.*lean.enterprise.middleware. >>>>>>>>> >>>>>>>>> *Blog* : http://maninda.blogspot.com/ >>>>>>>>> *E-mail* : mani...@wso2.com >>>>>>>>> *Skype* : @manindae >>>>>>>>> *Twitter* : @maninda >>>>>>>>> >>>>>>>>> On Thu, Jul 28, 2016 at 4:38 PM, Asanka Abeyweera < >>>>>>>>> asank...@wso2.com> wrote: >>>>>>>>> >>>>>>>>>> Hi Akila, >>>>>>>>>> >>>>>>>>>> Let me explain the issue in a different way. Let's assume the MB >>>>>>>>>> nodes are using two different network interfaces for Hazelcast >>>>>>>>>> communication and database communication. With such a configuration, >>>>>>>>>> there >>>>>>>>>> can be failures only in the network interface used for Hazelcast >>>>>>>>>> communication in some nodes. When this happens, there will be two or >>>>>>>>>> more >>>>>>>>>> Hazelcast clusters due to the network segmentation, and as a result >>>>>>>>>> there >>>>>>>>>> will be multiple coordinators. Since every node still have access to >>>>>>>>>> the >>>>>>>>>> database, multiple coordinators can affect the correctness of the >>>>>>>>>> data >>>>>>>>>> stored in the DB. But if we used a RDBMS based approach we won't have >>>>>>>>>> multiple coordinators due to a network partition in Hazelcast. This >>>>>>>>>> is one >>>>>>>>>> advantage we get from this approach. >>>>>>>>>> >>>>>>>>>> Even when we use Zookeeper or RAFT the same issue will be there >>>>>>>>>> since we are using different interfaces for Hazelcast communication >>>>>>>>>> and DB >>>>>>>>>> communication. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Thu, Jul 28, 2016 at 2:56 PM, Akila Ravihansa Perera < >>>>>>>>>> raviha...@wso2.com> wrote: >>>>>>>>>> >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> What's the advantage of using RDBMS (even as an alternative) to >>>>>>>>>>> implement a leader/coordinator election? If the network connection >>>>>>>>>>> to DB >>>>>>>>>>> fails then this will be a single point of failure. I don't think we >>>>>>>>>>> can >>>>>>>>>>> scale RDBMS instances and expect the election algorithm to work. >>>>>>>>>>> That would >>>>>>>>>>> be reducing this problem to another problem (electing coordinator >>>>>>>>>>> RDBMS >>>>>>>>>>> instance). >>>>>>>>>>> >>>>>>>>>>> IMHO it would be better to look at Zookeeper Atomic Broadcast >>>>>>>>>>> (ZAB) [1] or RAFT leader election [2] algorithms which have already >>>>>>>>>>> proven >>>>>>>>>>> results. >>>>>>>>>>> >>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab1.0 >>>>>>>>>>> [2] http://libraft.io/ >>>>>>>>>>> >>>>>>>>>>> Thanks. >>>>>>>>>>> >>>>>>>>>>> On Thu, Jul 28, 2016 at 1:42 PM, Nandika Jayawardana < >>>>>>>>>>> nand...@wso2.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> +1 to make it a common component . We have the clustering >>>>>>>>>>>> implementation for BPEL component based on hazelcast. If the >>>>>>>>>>>> coordination >>>>>>>>>>>> is available at RDBMS level, we can remove hazelcast dependancy. >>>>>>>>>>>> >>>>>>>>>>>> Regards >>>>>>>>>>>> Nandika >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Jul 28, 2016 at 1:28 PM, Hasitha Aravinda < >>>>>>>>>>>> hasi...@wso2.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Can we make it a common component, which is not hard coupled >>>>>>>>>>>>> with MB. BPS has the same requirement. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Hasitha. >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Jul 28, 2016 at 9:47 AM, Asanka Abeyweera < >>>>>>>>>>>>> asank...@wso2.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi All, >>>>>>>>>>>>>> >>>>>>>>>>>>>> In MB, we have used a coordinator based approach to manage >>>>>>>>>>>>>> distributed messaging algorithm in the cluster. Currently >>>>>>>>>>>>>> Hazelcast is used >>>>>>>>>>>>>> to elect the coordinator. But one issue we faced with Hazelcast >>>>>>>>>>>>>> is, during >>>>>>>>>>>>>> a network segmentation (split brain), Hazelcast can elect two or >>>>>>>>>>>>>> more >>>>>>>>>>>>>> coordinators in the cluster. This affects the correctness of the >>>>>>>>>>>>>> distributed messaging algorithm since there are some tables in >>>>>>>>>>>>>> the database >>>>>>>>>>>>>> that should only be edited by a single node (i.e. coordinator). >>>>>>>>>>>>>> >>>>>>>>>>>>>> As a solution to this problem we have implemented minimum >>>>>>>>>>>>>> node count based approach [1] to deactivate set of partitioned >>>>>>>>>>>>>> nodes to >>>>>>>>>>>>>> stop multiple nodes becoming coordinators until the network >>>>>>>>>>>>>> segmentation >>>>>>>>>>>>>> issue is fixed. >>>>>>>>>>>>>> >>>>>>>>>>>>>> As an alternative solution, we are thinking of implementing >>>>>>>>>>>>>> an RDBMS based approach to elect the coordinator node in the >>>>>>>>>>>>>> cluster. By >>>>>>>>>>>>>> doing this we can make sure that even during a network >>>>>>>>>>>>>> segmentation only >>>>>>>>>>>>>> one node will be elected as the coordinator node since the >>>>>>>>>>>>>> election is >>>>>>>>>>>>>> happening through the database. >>>>>>>>>>>>>> >>>>>>>>>>>>>> The algorithm will use a polling mechanism to check the >>>>>>>>>>>>>> validity of the nodes. To make the election algorithm scalable, >>>>>>>>>>>>>> only the >>>>>>>>>>>>>> coordinator node will be checking status of all the nodes in the >>>>>>>>>>>>>> cluster >>>>>>>>>>>>>> and it will inform other nodes through database when a member is >>>>>>>>>>>>>> added/left. The nodes will be only checking for the status of the >>>>>>>>>>>>>> coordinator node. When a node detect that coordinator is invalid >>>>>>>>>>>>>> it will go >>>>>>>>>>>>>> for a election to elect a new coordinator. >>>>>>>>>>>>>> >>>>>>>>>>>>>> We are currently working on a POC to test how this works with >>>>>>>>>>>>>> MB's slot based messaging algorithm. >>>>>>>>>>>>>> >>>>>>>>>>>>>> thoughts? >>>>>>>>>>>>>> >>>>>>>>>>>>>> [1] https://wso2.org/jira/browse/MB-1664 >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> Asanka Abeyweera >>>>>>>>>>>>>> Senior Software Engineer >>>>>>>>>>>>>> WSO2 Inc. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Phone: +94 712228648 >>>>>>>>>>>>>> Blog: a5anka.github.io >>>>>>>>>>>>>> >>>>>>>>>>>>>> <https://wso2.com/signature> >>>>>>>>>>>>>> >>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>> Architecture mailing list >>>>>>>>>>>>>> Architecture@wso2.org >>>>>>>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> -- >>>>>>>>>>>>> Hasitha Aravinda, >>>>>>>>>>>>> Associate Technical Lead, >>>>>>>>>>>>> WSO2 Inc. >>>>>>>>>>>>> Email: hasi...@wso2.com >>>>>>>>>>>>> Mobile : +94 718 210 200 >>>>>>>>>>>>> >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>> Architecture mailing list >>>>>>>>>>>>> Architecture@wso2.org >>>>>>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Nandika Jayawardana >>>>>>>>>>>> WSO2 Inc ; http://wso2.com >>>>>>>>>>>> lean.enterprise.middleware >>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> Architecture mailing list >>>>>>>>>>>> Architecture@wso2.org >>>>>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Akila Ravihansa Perera >>>>>>>>>>> WSO2 Inc.; http://wso2.com/ >>>>>>>>>>> >>>>>>>>>>> Blog: http://ravihansa3000.blogspot.com >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> Architecture mailing list >>>>>>>>>>> Architecture@wso2.org >>>>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Asanka Abeyweera >>>>>>>>>> Senior Software Engineer >>>>>>>>>> WSO2 Inc. >>>>>>>>>> >>>>>>>>>> Phone: +94 712228648 >>>>>>>>>> Blog: a5anka.github.io >>>>>>>>>> >>>>>>>>>> <https://wso2.com/signature> >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> Architecture mailing list >>>>>>>>>> Architecture@wso2.org >>>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Asanka Abeyweera >>>>>>>> Senior Software Engineer >>>>>>>> WSO2 Inc. >>>>>>>> >>>>>>>> Phone: +94 712228648 >>>>>>>> Blog: a5anka.github.io >>>>>>>> >>>>>>>> <https://wso2.com/signature> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Architecture mailing list >>>>>>>> Architecture@wso2.org >>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> *Imesh Gunaratne* >>>>>>> Software Architect >>>>>>> WSO2 Inc: http://wso2.com >>>>>>> T: +94 11 214 5345 M: +94 77 374 2057 >>>>>>> W: https://medium.com/@imesh TW: @imesh >>>>>>> lean. enterprise. middleware >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Architecture mailing list >>>>>>> Architecture@wso2.org >>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Asanka Abeyweera >>>>>> Senior Software Engineer >>>>>> WSO2 Inc. >>>>>> >>>>>> Phone: +94 712228648 >>>>>> Blog: a5anka.github.io >>>>>> >>>>>> <https://wso2.com/signature> >>>>>> >>>>>> _______________________________________________ >>>>>> Architecture mailing list >>>>>> Architecture@wso2.org >>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> Ramith Jayasinghe >>>>> Technical Lead >>>>> WSO2 Inc., http://wso2.com >>>>> lean.enterprise.middleware >>>>> >>>>> E: ram...@wso2.com >>>>> P: +94 772534930 >>>>> >>>>> _______________________________________________ >>>>> Architecture mailing list >>>>> Architecture@wso2.org >>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>>>> >>>>> >>>> >>>> >>>> -- >>>> *Imesh Gunaratne* >>>> Software Architect >>>> WSO2 Inc: http://wso2.com >>>> T: +94 11 214 5345 M: +94 77 374 2057 >>>> W: https://medium.com/@imesh TW: @imesh >>>> lean. enterprise. middleware >>>> >>>> >>> >>> >>> -- >>> *Imesh Gunaratne* >>> Software Architect >>> WSO2 Inc: http://wso2.com >>> T: +94 11 214 5345 M: +94 77 374 2057 >>> W: https://medium.com/@imesh TW: @imesh >>> lean. enterprise. middleware >>> >>> >>> _______________________________________________ >>> Architecture mailing list >>> Architecture@wso2.org >>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>> >>> >> >> >> -- >> Asanka Abeyweera >> Senior Software Engineer >> WSO2 Inc. >> >> Phone: +94 712228648 >> Blog: a5anka.github.io >> >> <https://wso2.com/signature> >> >> _______________________________________________ >> Architecture mailing list >> Architecture@wso2.org >> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >> >> > > > -- > *Imesh Gunaratne* > Software Architect > WSO2 Inc: http://wso2.com > T: +94 11 214 5345 M: +94 77 374 2057 > W: https://medium.com/@imesh TW: @imesh > lean. enterprise. middleware > > > _______________________________________________ > Architecture mailing list > Architecture@wso2.org > https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture > > -- Akila Ravihansa Perera WSO2 Inc.; http://wso2.com/ Blog: http://ravihansa3000.blogspot.com
_______________________________________________ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture