Re: [Architecture] RDBMS based coordinator election algorithm for MB

Akila Ravihansa Perera Thu, 04 Aug 2016 23:03:06 -0700

Hi,

I think the original problem here is that MB needs to absolutely guarantee
the integrity of the data written to the database. And if I understood
correctly, only the coordinator can write specific entries to the database
which is a unique scenario for MB. Any network based leader election
algorithm (even database based one) can cause split brain due to
network/latency issues, but using a database based leader election
algorithm is safe here because if it cannot connect to the database then no
harm can be done to the data integrity. Am I correct to say that?


However, on a separate node, it would be better to look into a new design
where MB doesn't depend on a coordinator to write data into the db. This is
a major bottleneck to scale.

Thanks.

On Fri, Aug 5, 2016 at 11:21 AM, Imesh Gunaratne <im...@wso2.com> wrote:

> Hi Asitha/Asanka,
>
> I think it is clear that the issue we have here is mostly related to
> Hazelcast.
>
> Now to solve that problem I think it would be better to go ahead with a
> generic leader election system for the entire platform rather than writing
> one specific to MB. This requirement is there in several other products and
> for some a database driven approach might not work.
>
> Therefore it would be better if we can decouple this from the product and
> use an interface to talk to a leader election module. This module can
> either be implemented as a separate component or utilize an existing system
> such as etcd.
>
> To start with I think it would be better to evaluate what etcd and consul
> has to offer and check whether they fit to our requirement.
>
> Thanks
>
> On Fri, Aug 5, 2016 at 10:12 AM, Asanka Abeyweera <asank...@wso2.com>
> wrote:
>
>> Hi Imesh,
>>
>> On Fri, Aug 5, 2016 at 7:33 AM, Imesh Gunaratne <im...@wso2.com> wrote:
>>
>>>
>>>
>>> On Fri, Aug 5, 2016 at 7:31 AM, Imesh Gunaratne <im...@wso2.com> wrote:
>>>>
>>>>
>>>> You can see here [3] how K8S has implemented leader election feature
>>>> for the products deployed on top of that to utilize.
>>>>
>>>
>>> Correction: Please refer [4].
>>>
>>>
>>>>
>>>>
>>>>> On Thu, Aug 4, 2016 at 7:27 PM, Asanka Abeyweera <asank...@wso2.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Imesh,
>>>>>>
>>>>>> We are not implementing this to overcome a limitation in the
>>>>>> coordination algorithm available in the Hazlecast. We are implementing 
>>>>>> this
>>>>>> since we need an RDBMS based coordination algorithm (not a network based
>>>>>> algorithm).
>>>>>>
>>>>>
>>>> Are you saying that database connections do not use the same network
>>>> used by Hazelcast?
>>>>
>>>
>> Yes, This is most problematic when two interfaces are used for Hazelcast
>> communication and RDBMS communication. Additionally there is an edge case
>> even when a single interface is used for both Hazelcast and RDBMS
>> communication. When a cluster merge after a network segmentation, there can
>> be a delay in Hazelcast detecting the cluster merge. If a database is
>> accessed by multiple coordinators during this time, there can be message
>> delivery issues like message duplication. Therefore we cannot ignore this
>> issue even when the same network is used for Hazelcast and database
>> connections.
>> 
>>
>>
>>> The reason is, a network based election algorithm will always elect
>>>>>> multiple leaders when the network is partitioned. But if we use a RDBMS
>>>>>> based algorithm this will not happen.
>>>>>>
>>>>>
>>>> I do not think your argument is correct. If there is a problem with
>>>> the network, it may apply to both Hazelcast based solution and database
>>>> based solution.
>>>>
>>>> [4] http://blog.kubernetes.io/2016/01/simple-leader-election
>>>> -with-Kubernetes.html
>>>>
>>>> Thanks
>>>>
>>>>>
>>>>>>
>>>>>> On Thu, Aug 4, 2016 at 7:16 PM, Imesh Gunaratne <im...@wso2.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Asanka,
>>>>>>>
>>>>>>> Do we really need to implement a leader election algorithm on our
>>>>>>> own? AFAIU this is a complex problem which has been already solved by
>>>>>>> several algorithms [1]. IMO it would be better to go ahead with an 
>>>>>>> existing
>>>>>>> well established implementation on etcd [1] or Consul [2].
>>>>>>>
>>>>>>> Those provide HTTP APIs for clients to make leader election calls.
>>>>>>> [3] is a client library written in Node.js for etcd based leader 
>>>>>>> election.
>>>>>>>
>>>>>>> [1] https://www.projectcalico.org/using-etcd-for-elections
>>>>>>> [2] https://www.consul.io/docs/guides/leader-election.html
>>>>>>> [3] https://www.npmjs.com/package/etcd-leader
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> On Wed, Aug 3, 2016 at 5:12 PM, Asanka Abeyweera <asank...@wso2.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Maninda,
>>>>>>>>
>>>>>>>> Since we are using RDBMS to poll the node status, the cluster will
>>>>>>>> not end up in situation 1,2 or 3. With this approach we consider a node
>>>>>>>> unreachable when it cannot access the database. Therefore an 
>>>>>>>> unreachable
>>>>>>>> node can never be the leader.
>>>>>>>>
>>>>>>>> As you have mentioned, we are currently using the RDBMS as an
>>>>>>>> atomic global variable to create the coordinator entry.
>>>>>>>>
>>>>>>>> On Tue, Aug 2, 2016 at 5:22 PM, Maninda Edirisooriya <
>>>>>>>> mani...@wso2.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Asanka,
>>>>>>>>>
>>>>>>>>> As I understand the accuracy of electing the leader correctly is
>>>>>>>>> dependent on the election mechanism with RDBMS because there can be 
>>>>>>>>> edge
>>>>>>>>> cases like,
>>>>>>>>>
>>>>>>>>> 1. Unreachable leader activates during the election process: Then
>>>>>>>>> who becomes the leader?
>>>>>>>>> 2. The elected leader becomes unreachable before the election is
>>>>>>>>> completed: Then will there be a situation where there is no leader?
>>>>>>>>> 3. A leader and a set of nodes are disconnected from the other
>>>>>>>>> part of the cluster and while the leader is trying to remove 
>>>>>>>>> unreachable
>>>>>>>>> members other part is calling an election to make a leader: Who will 
>>>>>>>>> win?
>>>>>>>>>
>>>>>>>>> RDBMS based election algorithm should handle such cases without
>>>>>>>>> bringing the cluster to an inconsistent state or dead lock in all
>>>>>>>>> concurrent cases. If all these kind of cases cannot be handled isn't 
>>>>>>>>> it
>>>>>>>>> better to keep the current hazelcast clustering and use the RDBMS 
>>>>>>>>> only to
>>>>>>>>> handle the split brain scenario? In other words when a new hazelcast 
>>>>>>>>> leader
>>>>>>>>> is elected it should be updated in the RDBMS. If another split party 
>>>>>>>>> has
>>>>>>>>> already elected a leader, the node who is going to write it to RDBMS 
>>>>>>>>> should
>>>>>>>>> avoid updating it. Simply, the RDBMS can be used as an atomic global
>>>>>>>>> variable to keep the leader name by modifying the hazelcast 
>>>>>>>>> clustering.
>>>>>>>>> WDYT?
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *Maninda Edirisooriya*
>>>>>>>>> Senior Software Engineer
>>>>>>>>>
>>>>>>>>> *WSO2, Inc.*lean.enterprise.middleware.
>>>>>>>>>
>>>>>>>>> *Blog* : http://maninda.blogspot.com/
>>>>>>>>> *E-mail* : mani...@wso2.com
>>>>>>>>> *Skype* : @manindae
>>>>>>>>> *Twitter* : @maninda
>>>>>>>>>
>>>>>>>>> On Thu, Jul 28, 2016 at 4:38 PM, Asanka Abeyweera <
>>>>>>>>> asank...@wso2.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Akila,
>>>>>>>>>>
>>>>>>>>>> Let me explain the issue in a different way. Let's assume the MB
>>>>>>>>>> nodes are using two different network interfaces for Hazelcast
>>>>>>>>>> communication and database communication. With such a configuration, 
>>>>>>>>>> there
>>>>>>>>>> can be failures only in the network interface used for Hazelcast
>>>>>>>>>> communication in some nodes. When this happens, there will be two or 
>>>>>>>>>> more
>>>>>>>>>> Hazelcast clusters due to the network segmentation, and as a result 
>>>>>>>>>> there
>>>>>>>>>> will be multiple coordinators. Since every node still have access to 
>>>>>>>>>> the
>>>>>>>>>> database, multiple coordinators can affect the correctness of the 
>>>>>>>>>> data
>>>>>>>>>> stored in the DB. But if we used a RDBMS based approach we won't have
>>>>>>>>>> multiple coordinators due to a network partition in Hazelcast. This 
>>>>>>>>>> is one
>>>>>>>>>> advantage we get from this approach.
>>>>>>>>>>
>>>>>>>>>> Even when we use Zookeeper or RAFT the same issue will be there
>>>>>>>>>> since we are using different interfaces for Hazelcast communication 
>>>>>>>>>> and DB
>>>>>>>>>> communication.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Jul 28, 2016 at 2:56 PM, Akila Ravihansa Perera <
>>>>>>>>>> raviha...@wso2.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> What's the advantage of using RDBMS (even as an alternative) to
>>>>>>>>>>> implement a leader/coordinator election? If the network connection 
>>>>>>>>>>> to DB
>>>>>>>>>>> fails then this will be a single point of failure. I don't think we 
>>>>>>>>>>> can
>>>>>>>>>>> scale RDBMS instances and expect the election algorithm to work. 
>>>>>>>>>>> That would
>>>>>>>>>>> be reducing this problem to another problem (electing coordinator 
>>>>>>>>>>> RDBMS
>>>>>>>>>>> instance).
>>>>>>>>>>>
>>>>>>>>>>> IMHO it would be better to look at Zookeeper Atomic Broadcast
>>>>>>>>>>> (ZAB) [1] or RAFT leader election [2] algorithms which have already 
>>>>>>>>>>> proven
>>>>>>>>>>> results.
>>>>>>>>>>>
>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab1.0
>>>>>>>>>>> [2] http://libraft.io/
>>>>>>>>>>>
>>>>>>>>>>> Thanks.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jul 28, 2016 at 1:42 PM, Nandika Jayawardana <
>>>>>>>>>>> nand...@wso2.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> +1 to make it a common component . We have the clustering
>>>>>>>>>>>> implementation for BPEL component based on hazelcast.  If the 
>>>>>>>>>>>> coordination
>>>>>>>>>>>> is available at RDBMS level, we can remove hazelcast dependancy.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards
>>>>>>>>>>>> Nandika
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jul 28, 2016 at 1:28 PM, Hasitha Aravinda <
>>>>>>>>>>>> hasi...@wso2.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Can we make it a common component, which is not hard coupled
>>>>>>>>>>>>> with MB. BPS has the same requirement.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Hasitha.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jul 28, 2016 at 9:47 AM, Asanka Abeyweera <
>>>>>>>>>>>>> asank...@wso2.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In MB, we have used a coordinator based approach to manage
>>>>>>>>>>>>>> distributed messaging algorithm in the cluster. Currently 
>>>>>>>>>>>>>> Hazelcast is used
>>>>>>>>>>>>>> to elect the coordinator. But one issue we faced with Hazelcast 
>>>>>>>>>>>>>> is, during
>>>>>>>>>>>>>> a network segmentation (split brain), Hazelcast can elect two or 
>>>>>>>>>>>>>> more
>>>>>>>>>>>>>> coordinators in the cluster. This affects the correctness of the
>>>>>>>>>>>>>> distributed messaging algorithm since there are some tables in 
>>>>>>>>>>>>>> the database
>>>>>>>>>>>>>> that should only be edited by a single node (i.e. coordinator).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As a solution to this problem we have implemented minimum
>>>>>>>>>>>>>> node count based approach [1] to deactivate set of partitioned 
>>>>>>>>>>>>>> nodes to
>>>>>>>>>>>>>> stop multiple nodes becoming coordinators until the network 
>>>>>>>>>>>>>> segmentation
>>>>>>>>>>>>>> issue is fixed.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As an alternative solution, we are thinking of implementing
>>>>>>>>>>>>>> an RDBMS based approach to elect the coordinator node in the 
>>>>>>>>>>>>>> cluster. By
>>>>>>>>>>>>>> doing this we can make sure that even during a network 
>>>>>>>>>>>>>> segmentation only
>>>>>>>>>>>>>> one node will be elected as the coordinator node since the 
>>>>>>>>>>>>>> election is
>>>>>>>>>>>>>> happening through the database.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The algorithm will use a polling mechanism to check the
>>>>>>>>>>>>>> validity of the nodes. To make the election algorithm scalable, 
>>>>>>>>>>>>>> only the
>>>>>>>>>>>>>> coordinator node will be checking status of all the nodes in the 
>>>>>>>>>>>>>> cluster
>>>>>>>>>>>>>> and it will inform other nodes through database when a member is
>>>>>>>>>>>>>> added/left. The nodes will be only checking for the status of the
>>>>>>>>>>>>>> coordinator node. When a node detect that coordinator is invalid 
>>>>>>>>>>>>>> it will go
>>>>>>>>>>>>>> for a election to elect a new coordinator.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We are currently working on a POC to test how this works with
>>>>>>>>>>>>>> MB's slot based messaging algorithm.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> thoughts?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1] https://wso2.org/jira/browse/MB-1664
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Asanka Abeyweera
>>>>>>>>>>>>>> Senior Software Engineer
>>>>>>>>>>>>>> WSO2 Inc.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Phone: +94 712228648
>>>>>>>>>>>>>> Blog: a5anka.github.io
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> <https://wso2.com/signature>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> Architecture mailing list
>>>>>>>>>>>>>> Architecture@wso2.org
>>>>>>>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Hasitha Aravinda,
>>>>>>>>>>>>> Associate Technical Lead,
>>>>>>>>>>>>> WSO2 Inc.
>>>>>>>>>>>>> Email: hasi...@wso2.com
>>>>>>>>>>>>> Mobile : +94 718 210 200
>>>>>>>>>>>>>
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> Architecture mailing list
>>>>>>>>>>>>> Architecture@wso2.org
>>>>>>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Nandika Jayawardana
>>>>>>>>>>>> WSO2 Inc ; http://wso2.com
>>>>>>>>>>>> lean.enterprise.middleware
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> Architecture mailing list
>>>>>>>>>>>> Architecture@wso2.org
>>>>>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Akila Ravihansa Perera
>>>>>>>>>>> WSO2 Inc.;  http://wso2.com/
>>>>>>>>>>>
>>>>>>>>>>> Blog: http://ravihansa3000.blogspot.com
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Architecture mailing list
>>>>>>>>>>> Architecture@wso2.org
>>>>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Asanka Abeyweera
>>>>>>>>>> Senior Software Engineer
>>>>>>>>>> WSO2 Inc.
>>>>>>>>>>
>>>>>>>>>> Phone: +94 712228648
>>>>>>>>>> Blog: a5anka.github.io
>>>>>>>>>>
>>>>>>>>>> <https://wso2.com/signature>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Architecture mailing list
>>>>>>>>>> Architecture@wso2.org
>>>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Asanka Abeyweera
>>>>>>>> Senior Software Engineer
>>>>>>>> WSO2 Inc.
>>>>>>>>
>>>>>>>> Phone: +94 712228648
>>>>>>>> Blog: a5anka.github.io
>>>>>>>>
>>>>>>>> <https://wso2.com/signature>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Architecture mailing list
>>>>>>>> Architecture@wso2.org
>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> *Imesh Gunaratne*
>>>>>>> Software Architect
>>>>>>> WSO2 Inc: http://wso2.com
>>>>>>> T: +94 11 214 5345 M: +94 77 374 2057
>>>>>>> W: https://medium.com/@imesh TW: @imesh
>>>>>>> lean. enterprise. middleware
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Architecture mailing list
>>>>>>> Architecture@wso2.org
>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Asanka Abeyweera
>>>>>> Senior Software Engineer
>>>>>> WSO2 Inc.
>>>>>>
>>>>>> Phone: +94 712228648
>>>>>> Blog: a5anka.github.io
>>>>>>
>>>>>> <https://wso2.com/signature>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Architecture mailing list
>>>>>> Architecture@wso2.org
>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Ramith Jayasinghe
>>>>> Technical Lead
>>>>> WSO2 Inc., http://wso2.com
>>>>> lean.enterprise.middleware
>>>>>
>>>>> E: ram...@wso2.com
>>>>> P: +94 772534930
>>>>>
>>>>> _______________________________________________
>>>>> Architecture mailing list
>>>>> Architecture@wso2.org
>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> *Imesh Gunaratne*
>>>> Software Architect
>>>> WSO2 Inc: http://wso2.com
>>>> T: +94 11 214 5345 M: +94 77 374 2057
>>>> W: https://medium.com/@imesh TW: @imesh
>>>> lean. enterprise. middleware
>>>>
>>>>
>>>
>>>
>>> --
>>> *Imesh Gunaratne*
>>> Software Architect
>>> WSO2 Inc: http://wso2.com
>>> T: +94 11 214 5345 M: +94 77 374 2057
>>> W: https://medium.com/@imesh TW: @imesh
>>> lean. enterprise. middleware
>>>
>>>
>>> _______________________________________________
>>> Architecture mailing list
>>> Architecture@wso2.org
>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>
>>>
>>
>>
>> --
>> Asanka Abeyweera
>> Senior Software Engineer
>> WSO2 Inc.
>>
>> Phone: +94 712228648
>> Blog: a5anka.github.io
>>
>> <https://wso2.com/signature>
>>
>> _______________________________________________
>> Architecture mailing list
>> Architecture@wso2.org
>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>
>>
>
>
> --
> *Imesh Gunaratne*
> Software Architect
> WSO2 Inc: http://wso2.com
> T: +94 11 214 5345 M: +94 77 374 2057
> W: https://medium.com/@imesh TW: @imesh
> lean. enterprise. middleware
>
>
> _______________________________________________
> Architecture mailing list
> Architecture@wso2.org
> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>
>


-- 
Akila Ravihansa Perera
WSO2 Inc.;  http://wso2.com/

Blog: http://ravihansa3000.blogspot.com

_______________________________________________
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture

Re: [Architecture] RDBMS based coordinator election algorithm for MB

Reply via email to