Hi,
On Fri, Aug 5, 2016 at 11:31 AM, Akila Ravihansa Perera <raviha...@wso2.com> wrote: > Hi, > > I think the original problem here is that MB needs to absolutely guarantee > the integrity of the data written to the database. And if I understood > correctly, only the coordinator can write specific entries to the database > which is a unique scenario for MB. Any network based leader election > algorithm (even database based one) can cause split brain due to > network/latency issues, but using a database based leader election > algorithm is safe here because if it cannot connect to the database then no > harm can be done to the data integrity. Am I correct to say that? > > However, on a separate node, it would be better to look into a new design > where MB doesn't depend on a coordinator to write data into the db. This is > a major bottleneck to scale. > True. But MB is naturally stateful. There are tasks only one node should perform on behalf of the cluster for integrity of data. I think CAP theorem is in action here. We can only achieve only two out of three factors (Consistency/Availability and Partition tolerance). [1] [1]. https://dzone.com/articles/better-explaining-cap-theorem > > Thanks. > > On Fri, Aug 5, 2016 at 11:21 AM, Imesh Gunaratne <im...@wso2.com> wrote: > >> Hi Asitha/Asanka, >> >> I think it is clear that the issue we have here is mostly related to >> Hazelcast. >> >> Now to solve that problem I think it would be better to go ahead with a >> generic leader election system for the entire platform rather than writing >> one specific to MB. This requirement is there in several other products and >> for some a database driven approach might not work. >> >> Therefore it would be better if we can decouple this from the product and >> use an interface to talk to a leader election module. This module can >> either be implemented as a separate component or utilize an existing system >> such as etcd. >> >> To start with I think it would be better to evaluate what etcd and consul >> has to offer and check whether they fit to our requirement. >> >> Thanks >> >> On Fri, Aug 5, 2016 at 10:12 AM, Asanka Abeyweera <asank...@wso2.com> >> wrote: >> >>> Hi Imesh, >>> >>> On Fri, Aug 5, 2016 at 7:33 AM, Imesh Gunaratne <im...@wso2.com> wrote: >>> >>>> >>>> >>>> On Fri, Aug 5, 2016 at 7:31 AM, Imesh Gunaratne <im...@wso2.com> wrote: >>>>> >>>>> >>>>> You can see here [3] how K8S has implemented leader election feature >>>>> for the products deployed on top of that to utilize. >>>>> >>>> >>>> Correction: Please refer [4]. >>>> >>>> >>>>> >>>>> >>>>>> On Thu, Aug 4, 2016 at 7:27 PM, Asanka Abeyweera <asank...@wso2.com> >>>>>> wrote: >>>>>> >>>>>>> Hi Imesh, >>>>>>> >>>>>>> We are not implementing this to overcome a limitation in the >>>>>>> coordination algorithm available in the Hazlecast. We are implementing >>>>>>> this >>>>>>> since we need an RDBMS based coordination algorithm (not a network based >>>>>>> algorithm). >>>>>>> >>>>>> >>>>> Are you saying that database connections do not use the same network >>>>> used by Hazelcast? >>>>> >>>> >>> Yes, This is most problematic when two interfaces are used for Hazelcast >>> communication and RDBMS communication. Additionally there is an edge case >>> even when a single interface is used for both Hazelcast and RDBMS >>> communication. When a cluster merge after a network segmentation, there can >>> be a delay in Hazelcast detecting the cluster merge. If a database is >>> accessed by multiple coordinators during this time, there can be message >>> delivery issues like message duplication. Therefore we cannot ignore this >>> issue even when the same network is used for Hazelcast and database >>> connections. >>> >>> >>> >>>> The reason is, a network based election algorithm will always elect >>>>>>> multiple leaders when the network is partitioned. But if we use a RDBMS >>>>>>> based algorithm this will not happen. >>>>>>> >>>>>> >>>>> I do not think your argument is correct. If there is a problem with >>>>> the network, it may apply to both Hazelcast based solution and database >>>>> based solution. >>>>> >>>>> [4] http://blog.kubernetes.io/2016/01/simple-leader-election >>>>> -with-Kubernetes.html >>>>> >>>>> Thanks >>>>> >>>>>> >>>>>>> >>>>>>> On Thu, Aug 4, 2016 at 7:16 PM, Imesh Gunaratne <im...@wso2.com> >>>>>>> wrote: >>>>>>> >>>>>>>> Hi Asanka, >>>>>>>> >>>>>>>> Do we really need to implement a leader election algorithm on our >>>>>>>> own? AFAIU this is a complex problem which has been already solved by >>>>>>>> several algorithms [1]. IMO it would be better to go ahead with an >>>>>>>> existing >>>>>>>> well established implementation on etcd [1] or Consul [2]. >>>>>>>> >>>>>>>> Those provide HTTP APIs for clients to make leader election calls. >>>>>>>> [3] is a client library written in Node.js for etcd based leader >>>>>>>> election. >>>>>>>> >>>>>>>> [1] https://www.projectcalico.org/using-etcd-for-elections >>>>>>>> [2] https://www.consul.io/docs/guides/leader-election.html >>>>>>>> [3] https://www.npmjs.com/package/etcd-leader >>>>>>>> >>>>>>>> Thanks >>>>>>>> >>>>>>>> On Wed, Aug 3, 2016 at 5:12 PM, Asanka Abeyweera <asank...@wso2.com >>>>>>>> > wrote: >>>>>>>> >>>>>>>>> Hi Maninda, >>>>>>>>> >>>>>>>>> Since we are using RDBMS to poll the node status, the cluster will >>>>>>>>> not end up in situation 1,2 or 3. With this approach we consider a >>>>>>>>> node >>>>>>>>> unreachable when it cannot access the database. Therefore an >>>>>>>>> unreachable >>>>>>>>> node can never be the leader. >>>>>>>>> >>>>>>>>> As you have mentioned, we are currently using the RDBMS as an >>>>>>>>> atomic global variable to create the coordinator entry. >>>>>>>>> >>>>>>>>> On Tue, Aug 2, 2016 at 5:22 PM, Maninda Edirisooriya < >>>>>>>>> mani...@wso2.com> wrote: >>>>>>>>> >>>>>>>>>> Hi Asanka, >>>>>>>>>> >>>>>>>>>> As I understand the accuracy of electing the leader correctly is >>>>>>>>>> dependent on the election mechanism with RDBMS because there can be >>>>>>>>>> edge >>>>>>>>>> cases like, >>>>>>>>>> >>>>>>>>>> 1. Unreachable leader activates during the election process: Then >>>>>>>>>> who becomes the leader? >>>>>>>>>> 2. The elected leader becomes unreachable before the election is >>>>>>>>>> completed: Then will there be a situation where there is no leader? >>>>>>>>>> 3. A leader and a set of nodes are disconnected from the other >>>>>>>>>> part of the cluster and while the leader is trying to remove >>>>>>>>>> unreachable >>>>>>>>>> members other part is calling an election to make a leader: Who will >>>>>>>>>> win? >>>>>>>>>> >>>>>>>>>> RDBMS based election algorithm should handle such cases without >>>>>>>>>> bringing the cluster to an inconsistent state or dead lock in all >>>>>>>>>> concurrent cases. If all these kind of cases cannot be handled isn't >>>>>>>>>> it >>>>>>>>>> better to keep the current hazelcast clustering and use the RDBMS >>>>>>>>>> only to >>>>>>>>>> handle the split brain scenario? In other words when a new hazelcast >>>>>>>>>> leader >>>>>>>>>> is elected it should be updated in the RDBMS. If another split party >>>>>>>>>> has >>>>>>>>>> already elected a leader, the node who is going to write it to RDBMS >>>>>>>>>> should >>>>>>>>>> avoid updating it. Simply, the RDBMS can be used as an atomic global >>>>>>>>>> variable to keep the leader name by modifying the hazelcast >>>>>>>>>> clustering. >>>>>>>>>> WDYT? >>>>>>>>>> >>>>>>>>>> Thanks. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> *Maninda Edirisooriya* >>>>>>>>>> Senior Software Engineer >>>>>>>>>> >>>>>>>>>> *WSO2, Inc.*lean.enterprise.middleware. >>>>>>>>>> >>>>>>>>>> *Blog* : http://maninda.blogspot.com/ >>>>>>>>>> *E-mail* : mani...@wso2.com >>>>>>>>>> *Skype* : @manindae >>>>>>>>>> *Twitter* : @maninda >>>>>>>>>> >>>>>>>>>> On Thu, Jul 28, 2016 at 4:38 PM, Asanka Abeyweera < >>>>>>>>>> asank...@wso2.com> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Akila, >>>>>>>>>>> >>>>>>>>>>> Let me explain the issue in a different way. Let's assume the MB >>>>>>>>>>> nodes are using two different network interfaces for Hazelcast >>>>>>>>>>> communication and database communication. With such a >>>>>>>>>>> configuration, there >>>>>>>>>>> can be failures only in the network interface used for Hazelcast >>>>>>>>>>> communication in some nodes. When this happens, there will be two >>>>>>>>>>> or more >>>>>>>>>>> Hazelcast clusters due to the network segmentation, and as a result >>>>>>>>>>> there >>>>>>>>>>> will be multiple coordinators. Since every node still have access >>>>>>>>>>> to the >>>>>>>>>>> database, multiple coordinators can affect the correctness of the >>>>>>>>>>> data >>>>>>>>>>> stored in the DB. But if we used a RDBMS based approach we won't >>>>>>>>>>> have >>>>>>>>>>> multiple coordinators due to a network partition in Hazelcast. This >>>>>>>>>>> is one >>>>>>>>>>> advantage we get from this approach. >>>>>>>>>>> >>>>>>>>>>> Even when we use Zookeeper or RAFT the same issue will be there >>>>>>>>>>> since we are using different interfaces for Hazelcast communication >>>>>>>>>>> and DB >>>>>>>>>>> communication. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Thu, Jul 28, 2016 at 2:56 PM, Akila Ravihansa Perera < >>>>>>>>>>> raviha...@wso2.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> What's the advantage of using RDBMS (even as an alternative) to >>>>>>>>>>>> implement a leader/coordinator election? If the network connection >>>>>>>>>>>> to DB >>>>>>>>>>>> fails then this will be a single point of failure. I don't think >>>>>>>>>>>> we can >>>>>>>>>>>> scale RDBMS instances and expect the election algorithm to work. >>>>>>>>>>>> That would >>>>>>>>>>>> be reducing this problem to another problem (electing coordinator >>>>>>>>>>>> RDBMS >>>>>>>>>>>> instance). >>>>>>>>>>>> >>>>>>>>>>>> IMHO it would be better to look at Zookeeper Atomic Broadcast >>>>>>>>>>>> (ZAB) [1] or RAFT leader election [2] algorithms which have >>>>>>>>>>>> already proven >>>>>>>>>>>> results. >>>>>>>>>>>> >>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/ZOOKEEPER/Za >>>>>>>>>>>> b1.0 >>>>>>>>>>>> [2] http://libraft.io/ >>>>>>>>>>>> >>>>>>>>>>>> Thanks. >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Jul 28, 2016 at 1:42 PM, Nandika Jayawardana < >>>>>>>>>>>> nand...@wso2.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> +1 to make it a common component . We have the clustering >>>>>>>>>>>>> implementation for BPEL component based on hazelcast. If the >>>>>>>>>>>>> coordination >>>>>>>>>>>>> is available at RDBMS level, we can remove hazelcast dependancy. >>>>>>>>>>>>> >>>>>>>>>>>>> Regards >>>>>>>>>>>>> Nandika >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Jul 28, 2016 at 1:28 PM, Hasitha Aravinda < >>>>>>>>>>>>> hasi...@wso2.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Can we make it a common component, which is not hard coupled >>>>>>>>>>>>>> with MB. BPS has the same requirement. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> Hasitha. >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Thu, Jul 28, 2016 at 9:47 AM, Asanka Abeyweera < >>>>>>>>>>>>>> asank...@wso2.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi All, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> In MB, we have used a coordinator based approach to manage >>>>>>>>>>>>>>> distributed messaging algorithm in the cluster. Currently >>>>>>>>>>>>>>> Hazelcast is used >>>>>>>>>>>>>>> to elect the coordinator. But one issue we faced with Hazelcast >>>>>>>>>>>>>>> is, during >>>>>>>>>>>>>>> a network segmentation (split brain), Hazelcast can elect two >>>>>>>>>>>>>>> or more >>>>>>>>>>>>>>> coordinators in the cluster. This affects the correctness of the >>>>>>>>>>>>>>> distributed messaging algorithm since there are some tables in >>>>>>>>>>>>>>> the database >>>>>>>>>>>>>>> that should only be edited by a single node (i.e. coordinator). >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> As a solution to this problem we have implemented minimum >>>>>>>>>>>>>>> node count based approach [1] to deactivate set of partitioned >>>>>>>>>>>>>>> nodes to >>>>>>>>>>>>>>> stop multiple nodes becoming coordinators until the network >>>>>>>>>>>>>>> segmentation >>>>>>>>>>>>>>> issue is fixed. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> As an alternative solution, we are thinking of implementing >>>>>>>>>>>>>>> an RDBMS based approach to elect the coordinator node in the >>>>>>>>>>>>>>> cluster. By >>>>>>>>>>>>>>> doing this we can make sure that even during a network >>>>>>>>>>>>>>> segmentation only >>>>>>>>>>>>>>> one node will be elected as the coordinator node since the >>>>>>>>>>>>>>> election is >>>>>>>>>>>>>>> happening through the database. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The algorithm will use a polling mechanism to check the >>>>>>>>>>>>>>> validity of the nodes. To make the election algorithm scalable, >>>>>>>>>>>>>>> only the >>>>>>>>>>>>>>> coordinator node will be checking status of all the nodes in >>>>>>>>>>>>>>> the cluster >>>>>>>>>>>>>>> and it will inform other nodes through database when a member is >>>>>>>>>>>>>>> added/left. The nodes will be only checking for the status of >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> coordinator node. When a node detect that coordinator is >>>>>>>>>>>>>>> invalid it will go >>>>>>>>>>>>>>> for a election to elect a new coordinator. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> We are currently working on a POC to test how this works >>>>>>>>>>>>>>> with MB's slot based messaging algorithm. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> thoughts? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> [1] https://wso2.org/jira/browse/MB-1664 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> Asanka Abeyweera >>>>>>>>>>>>>>> Senior Software Engineer >>>>>>>>>>>>>>> WSO2 Inc. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Phone: +94 712228648 >>>>>>>>>>>>>>> Blog: a5anka.github.io >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> <https://wso2.com/signature> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>> Architecture mailing list >>>>>>>>>>>>>>> Architecture@wso2.org >>>>>>>>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> Hasitha Aravinda, >>>>>>>>>>>>>> Associate Technical Lead, >>>>>>>>>>>>>> WSO2 Inc. >>>>>>>>>>>>>> Email: hasi...@wso2.com >>>>>>>>>>>>>> Mobile : +94 718 210 200 >>>>>>>>>>>>>> >>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>> Architecture mailing list >>>>>>>>>>>>>> Architecture@wso2.org >>>>>>>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Nandika Jayawardana >>>>>>>>>>>>> WSO2 Inc ; http://wso2.com >>>>>>>>>>>>> lean.enterprise.middleware >>>>>>>>>>>>> >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>> Architecture mailing list >>>>>>>>>>>>> Architecture@wso2.org >>>>>>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Akila Ravihansa Perera >>>>>>>>>>>> WSO2 Inc.; http://wso2.com/ >>>>>>>>>>>> >>>>>>>>>>>> Blog: http://ravihansa3000.blogspot.com >>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> Architecture mailing list >>>>>>>>>>>> Architecture@wso2.org >>>>>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Asanka Abeyweera >>>>>>>>>>> Senior Software Engineer >>>>>>>>>>> WSO2 Inc. >>>>>>>>>>> >>>>>>>>>>> Phone: +94 712228648 >>>>>>>>>>> Blog: a5anka.github.io >>>>>>>>>>> >>>>>>>>>>> <https://wso2.com/signature> >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> Architecture mailing list >>>>>>>>>>> Architecture@wso2.org >>>>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Asanka Abeyweera >>>>>>>>> Senior Software Engineer >>>>>>>>> WSO2 Inc. >>>>>>>>> >>>>>>>>> Phone: +94 712228648 >>>>>>>>> Blog: a5anka.github.io >>>>>>>>> >>>>>>>>> <https://wso2.com/signature> >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> Architecture mailing list >>>>>>>>> Architecture@wso2.org >>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> *Imesh Gunaratne* >>>>>>>> Software Architect >>>>>>>> WSO2 Inc: http://wso2.com >>>>>>>> T: +94 11 214 5345 M: +94 77 374 2057 >>>>>>>> W: https://medium.com/@imesh TW: @imesh >>>>>>>> lean. enterprise. middleware >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Architecture mailing list >>>>>>>> Architecture@wso2.org >>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Asanka Abeyweera >>>>>>> Senior Software Engineer >>>>>>> WSO2 Inc. >>>>>>> >>>>>>> Phone: +94 712228648 >>>>>>> Blog: a5anka.github.io >>>>>>> >>>>>>> <https://wso2.com/signature> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Architecture mailing list >>>>>>> Architecture@wso2.org >>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Ramith Jayasinghe >>>>>> Technical Lead >>>>>> WSO2 Inc., http://wso2.com >>>>>> lean.enterprise.middleware >>>>>> >>>>>> E: ram...@wso2.com >>>>>> P: +94 772534930 >>>>>> >>>>>> _______________________________________________ >>>>>> Architecture mailing list >>>>>> Architecture@wso2.org >>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> *Imesh Gunaratne* >>>>> Software Architect >>>>> WSO2 Inc: http://wso2.com >>>>> T: +94 11 214 5345 M: +94 77 374 2057 >>>>> W: https://medium.com/@imesh TW: @imesh >>>>> lean. enterprise. middleware >>>>> >>>>> >>>> >>>> >>>> -- >>>> *Imesh Gunaratne* >>>> Software Architect >>>> WSO2 Inc: http://wso2.com >>>> T: +94 11 214 5345 M: +94 77 374 2057 >>>> W: https://medium.com/@imesh TW: @imesh >>>> lean. enterprise. middleware >>>> >>>> >>>> _______________________________________________ >>>> Architecture mailing list >>>> Architecture@wso2.org >>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>>> >>>> >>> >>> >>> -- >>> Asanka Abeyweera >>> Senior Software Engineer >>> WSO2 Inc. >>> >>> Phone: +94 712228648 >>> Blog: a5anka.github.io >>> >>> <https://wso2.com/signature> >>> >>> _______________________________________________ >>> Architecture mailing list >>> Architecture@wso2.org >>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >>> >>> >> >> >> -- >> *Imesh Gunaratne* >> Software Architect >> WSO2 Inc: http://wso2.com >> T: +94 11 214 5345 M: +94 77 374 2057 >> W: https://medium.com/@imesh TW: @imesh >> lean. enterprise. middleware >> >> >> _______________________________________________ >> Architecture mailing list >> Architecture@wso2.org >> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture >> >> > > > -- > Akila Ravihansa Perera > WSO2 Inc.; http://wso2.com/ > > Blog: http://ravihansa3000.blogspot.com > > _______________________________________________ > Architecture mailing list > Architecture@wso2.org > https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture > > -- *Hasitha Abeykoon* Senior Software Engineer; WSO2, Inc.; http://wso2.com *cell:* *+94 719363063* *blog: **abeykoon.blogspot.com* <http://abeykoon.blogspot.com>
_______________________________________________ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture