Re: [Architecture] RDBMS based coordinator election algorithm for MB

Malaka Silva Fri, 05 Aug 2016 01:03:07 -0700

The same issue with Hazelcast can be experienced with ESB inbounds (running
on top of NTASK) and VFS distribution locks.


The idea of only single worker works at a given time breaks if there is a
Hazelcast heart beat fails. This will make two workers to work in parallel.

Also with distributed locking there is no guarantee that file is only
process only by one worker.

So in the case of network fail
 with DB
make sense to stop processing until it's recovered.
 Also making this component generic ESB can reuse.

On Fri, Aug 5, 2016 at 9:21 AM, Asitha Nanayakkara <asi...@wso2.com> wrote:

> Hi Imesh,
>
> On Fri, Aug 5, 2016 at 7:33 AM, Imesh Gunaratne <im...@wso2.com> wrote:
>
>>
>>
>> On Fri, Aug 5, 2016 at 7:31 AM, Imesh Gunaratne <im...@wso2.com> wrote:
>>>
>>>
>>> You can see here [3] how K8S has implemented leader election feature for
>>> the products deployed on top of that to utilize.
>>>
>>
>> Correction: Please refer [4].
>>
>>
>>>
>>>
>>>> On Thu, Aug 4, 2016 at 7:27 PM, Asanka Abeyweera <asank...@wso2.com>
>>>> wrote:
>>>>
>>>>> Hi Imesh,
>>>>>
>>>>> We are not implementing this to overcome a limitation in the
>>>>> coordination algorithm available in the Hazlecast. We are implementing 
>>>>> this
>>>>> since we need an RDBMS based coordination algorithm (not a network based
>>>>> algorithm).
>>>>>
>>>>
>>> Are you saying that database connections do not use the same network
>>> used by Hazelcast?
>>> 
>>>
>>>
>>>> The reason is, a network based election algorithm will always elect
>>>>> multiple leaders when the network is partitioned. But if we use a RDBMS
>>>>> based algorithm this will not happen.
>>>>>
>>>>
>>> I do not think your argument is correct. If there is a problem with the
>>> network, it may apply to both Hazelcast based solution and database based
>>> solution.
>>>
>>
> Yes, if the same network interface is used network partion will cause all
> types of connections to be partitioned. But user can use multiple network
> interfaces for database, Hazelcast and thrift.
>
> Following is the scenario we are trying to solve in MB.
>
> In MB all the details related to messages, subscriptions, queues, topics
> etc are stored in database. And we operate depending on that information.
> If the MB node can't connect to the database that means the node is
> ineffective in the cluster until it can make a database connection.
>
> We have seen instances where Hazelcast cluster get partitioned for some
> time period in networks, Reasons were,
>
>    1. Due to heavy load Hazelcast couldn't process or send (some times
>    both) hearbeats, hence a network partition for Hazelcast cluster
>    2. An actual network partition of Hazelcast cluster
>
> In both scenarios the database connection was working. In that case we get
> two coordinators elected through Hazelcast and working on the same database
> to deliver the messages. this leads to inconsistencies in the cluster
> behavior (for instances duplicate message delivery, corrupred subscription
> states etc) .
>
> Since the point of interest for MB is the database, we decided to do the
> coordinator election through database as well. If the node can't connect to
> the database, then the MB won't operate anyway.
>
> Regards,
> Asitha
>
>
>>> [4] http://blog.kubernetes.io/2016/01/simple-leader-election
>>> -with-Kubernetes.html
>>>
>>> Thanks
>>>
>>>>
>>>>>
>>>>> On Thu, Aug 4, 2016 at 7:16 PM, Imesh Gunaratne <im...@wso2.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Asanka,
>>>>>>
>>>>>> Do we really need to implement a leader election algorithm on our
>>>>>> own? AFAIU this is a complex problem which has been already solved by
>>>>>> several algorithms [1]. IMO it would be better to go ahead with an 
>>>>>> existing
>>>>>> well established implementation on etcd [1] or Consul [2].
>>>>>>
>>>>>> Those provide HTTP APIs for clients to make leader election calls.
>>>>>> [3] is a client library written in Node.js for etcd based leader 
>>>>>> election.
>>>>>>
>>>>>> [1] https://www.projectcalico.org/using-etcd-for-elections
>>>>>> [2] https://www.consul.io/docs/guides/leader-election.html
>>>>>> [3] https://www.npmjs.com/package/etcd-leader
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Wed, Aug 3, 2016 at 5:12 PM, Asanka Abeyweera <asank...@wso2.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Maninda,
>>>>>>>
>>>>>>> Since we are using RDBMS to poll the node status, the cluster will
>>>>>>> not end up in situation 1,2 or 3. With this approach we consider a node
>>>>>>> unreachable when it cannot access the database. Therefore an unreachable
>>>>>>> node can never be the leader.
>>>>>>>
>>>>>>> As you have mentioned, we are currently using the RDBMS as an atomic
>>>>>>> global variable to create the coordinator entry.
>>>>>>>
>>>>>>> On Tue, Aug 2, 2016 at 5:22 PM, Maninda Edirisooriya <
>>>>>>> mani...@wso2.com> wrote:
>>>>>>>
>>>>>>>> Hi Asanka,
>>>>>>>>
>>>>>>>> As I understand the accuracy of electing the leader correctly is
>>>>>>>> dependent on the election mechanism with RDBMS because there can be 
>>>>>>>> edge
>>>>>>>> cases like,
>>>>>>>>
>>>>>>>> 1. Unreachable leader activates during the election process: Then
>>>>>>>> who becomes the leader?
>>>>>>>> 2. The elected leader becomes unreachable before the election is
>>>>>>>> completed: Then will there be a situation where there is no leader?
>>>>>>>> 3. A leader and a set of nodes are disconnected from the other part
>>>>>>>> of the cluster and while the leader is trying to remove unreachable 
>>>>>>>> members
>>>>>>>> other part is calling an election to make a leader: Who will win?
>>>>>>>>
>>>>>>>> RDBMS based election algorithm should handle such cases without
>>>>>>>> bringing the cluster to an inconsistent state or dead lock in all
>>>>>>>> concurrent cases. If all these kind of cases cannot be handled isn't it
>>>>>>>> better to keep the current hazelcast clustering and use the RDBMS only 
>>>>>>>> to
>>>>>>>> handle the split brain scenario? In other words when a new hazelcast 
>>>>>>>> leader
>>>>>>>> is elected it should be updated in the RDBMS. If another split party 
>>>>>>>> has
>>>>>>>> already elected a leader, the node who is going to write it to RDBMS 
>>>>>>>> should
>>>>>>>> avoid updating it. Simply, the RDBMS can be used as an atomic global
>>>>>>>> variable to keep the leader name by modifying the hazelcast clustering.
>>>>>>>> WDYT?
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>>
>>>>>>>> *Maninda Edirisooriya*
>>>>>>>> Senior Software Engineer
>>>>>>>>
>>>>>>>> *WSO2, Inc.*lean.enterprise.middleware.
>>>>>>>>
>>>>>>>> *Blog* : http://maninda.blogspot.com/
>>>>>>>> *E-mail* : mani...@wso2.com
>>>>>>>> *Skype* : @manindae
>>>>>>>> *Twitter* : @maninda
>>>>>>>>
>>>>>>>> On Thu, Jul 28, 2016 at 4:38 PM, Asanka Abeyweera <
>>>>>>>> asank...@wso2.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Akila,
>>>>>>>>>
>>>>>>>>> Let me explain the issue in a different way. Let's assume the MB
>>>>>>>>> nodes are using two different network interfaces for Hazelcast
>>>>>>>>> communication and database communication. With such a configuration, 
>>>>>>>>> there
>>>>>>>>> can be failures only in the network interface used for Hazelcast
>>>>>>>>> communication in some nodes. When this happens, there will be two or 
>>>>>>>>> more
>>>>>>>>> Hazelcast clusters due to the network segmentation, and as a result 
>>>>>>>>> there
>>>>>>>>> will be multiple coordinators. Since every node still have access to 
>>>>>>>>> the
>>>>>>>>> database, multiple coordinators can affect the correctness of the data
>>>>>>>>> stored in the DB. But if we used a RDBMS based approach we won't have
>>>>>>>>> multiple coordinators due to a network partition in Hazelcast. This 
>>>>>>>>> is one
>>>>>>>>> advantage we get from this approach.
>>>>>>>>>
>>>>>>>>> Even when we use Zookeeper or RAFT the same issue will be there
>>>>>>>>> since we are using different interfaces for Hazelcast communication 
>>>>>>>>> and DB
>>>>>>>>> communication.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Jul 28, 2016 at 2:56 PM, Akila Ravihansa Perera <
>>>>>>>>> raviha...@wso2.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> What's the advantage of using RDBMS (even as an alternative) to
>>>>>>>>>> implement a leader/coordinator election? If the network connection 
>>>>>>>>>> to DB
>>>>>>>>>> fails then this will be a single point of failure. I don't think we 
>>>>>>>>>> can
>>>>>>>>>> scale RDBMS instances and expect the election algorithm to work. 
>>>>>>>>>> That would
>>>>>>>>>> be reducing this problem to another problem (electing coordinator 
>>>>>>>>>> RDBMS
>>>>>>>>>> instance).
>>>>>>>>>>
>>>>>>>>>> IMHO it would be better to look at Zookeeper Atomic Broadcast
>>>>>>>>>> (ZAB) [1] or RAFT leader election [2] algorithms which have already 
>>>>>>>>>> proven
>>>>>>>>>> results.
>>>>>>>>>>
>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab1.0
>>>>>>>>>> [2] http://libraft.io/
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>> On Thu, Jul 28, 2016 at 1:42 PM, Nandika Jayawardana <
>>>>>>>>>> nand...@wso2.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> +1 to make it a common component . We have the clustering
>>>>>>>>>>> implementation for BPEL component based on hazelcast.  If the 
>>>>>>>>>>> coordination
>>>>>>>>>>> is available at RDBMS level, we can remove hazelcast dependancy.
>>>>>>>>>>>
>>>>>>>>>>> Regards
>>>>>>>>>>> Nandika
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jul 28, 2016 at 1:28 PM, Hasitha Aravinda <
>>>>>>>>>>> hasi...@wso2.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Can we make it a common component, which is not hard coupled
>>>>>>>>>>>> with MB. BPS has the same requirement.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Hasitha.
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jul 28, 2016 at 9:47 AM, Asanka Abeyweera <
>>>>>>>>>>>> asank...@wso2.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>
>>>>>>>>>>>>> In MB, we have used a coordinator based approach to manage
>>>>>>>>>>>>> distributed messaging algorithm in the cluster. Currently 
>>>>>>>>>>>>> Hazelcast is used
>>>>>>>>>>>>> to elect the coordinator. But one issue we faced with Hazelcast 
>>>>>>>>>>>>> is, during
>>>>>>>>>>>>> a network segmentation (split brain), Hazelcast can elect two or 
>>>>>>>>>>>>> more
>>>>>>>>>>>>> coordinators in the cluster. This affects the correctness of the
>>>>>>>>>>>>> distributed messaging algorithm since there are some tables in 
>>>>>>>>>>>>> the database
>>>>>>>>>>>>> that should only be edited by a single node (i.e. coordinator).
>>>>>>>>>>>>>
>>>>>>>>>>>>> As a solution to this problem we have implemented minimum node
>>>>>>>>>>>>> count based approach [1] to deactivate set of partitioned nodes 
>>>>>>>>>>>>> to stop
>>>>>>>>>>>>> multiple nodes becoming coordinators until the network 
>>>>>>>>>>>>> segmentation issue
>>>>>>>>>>>>> is fixed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> As an alternative solution, we are thinking of implementing an
>>>>>>>>>>>>> RDBMS based approach to elect the coordinator node in the 
>>>>>>>>>>>>> cluster. By doing
>>>>>>>>>>>>> this we can make sure that even during a network segmentation 
>>>>>>>>>>>>> only one node
>>>>>>>>>>>>> will be elected as the coordinator node since the election is 
>>>>>>>>>>>>> happening
>>>>>>>>>>>>> through the database.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The algorithm will use a polling mechanism to check the
>>>>>>>>>>>>> validity of the nodes. To make the election algorithm scalable, 
>>>>>>>>>>>>> only the
>>>>>>>>>>>>> coordinator node will be checking status of all the nodes in the 
>>>>>>>>>>>>> cluster
>>>>>>>>>>>>> and it will inform other nodes through database when a member is
>>>>>>>>>>>>> added/left. The nodes will be only checking for the status of the
>>>>>>>>>>>>> coordinator node. When a node detect that coordinator is invalid 
>>>>>>>>>>>>> it will go
>>>>>>>>>>>>> for a election to elect a new coordinator.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We are currently working on a POC to test how this works with
>>>>>>>>>>>>> MB's slot based messaging algorithm.
>>>>>>>>>>>>>
>>>>>>>>>>>>> thoughts?
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] https://wso2.org/jira/browse/MB-1664
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Asanka Abeyweera
>>>>>>>>>>>>> Senior Software Engineer
>>>>>>>>>>>>> WSO2 Inc.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Phone: +94 712228648
>>>>>>>>>>>>> Blog: a5anka.github.io
>>>>>>>>>>>>>
>>>>>>>>>>>>> <https://wso2.com/signature>
>>>>>>>>>>>>>
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> Architecture mailing list
>>>>>>>>>>>>> Architecture@wso2.org
>>>>>>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> --
>>>>>>>>>>>> Hasitha Aravinda,
>>>>>>>>>>>> Associate Technical Lead,
>>>>>>>>>>>> WSO2 Inc.
>>>>>>>>>>>> Email: hasi...@wso2.com
>>>>>>>>>>>> Mobile : +94 718 210 200
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> Architecture mailing list
>>>>>>>>>>>> Architecture@wso2.org
>>>>>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Nandika Jayawardana
>>>>>>>>>>> WSO2 Inc ; http://wso2.com
>>>>>>>>>>> lean.enterprise.middleware
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Architecture mailing list
>>>>>>>>>>> Architecture@wso2.org
>>>>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Akila Ravihansa Perera
>>>>>>>>>> WSO2 Inc.;  http://wso2.com/
>>>>>>>>>>
>>>>>>>>>> Blog: http://ravihansa3000.blogspot.com
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Architecture mailing list
>>>>>>>>>> Architecture@wso2.org
>>>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Asanka Abeyweera
>>>>>>>>> Senior Software Engineer
>>>>>>>>> WSO2 Inc.
>>>>>>>>>
>>>>>>>>> Phone: +94 712228648
>>>>>>>>> Blog: a5anka.github.io
>>>>>>>>>
>>>>>>>>> <https://wso2.com/signature>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Architecture mailing list
>>>>>>>>> Architecture@wso2.org
>>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Asanka Abeyweera
>>>>>>> Senior Software Engineer
>>>>>>> WSO2 Inc.
>>>>>>>
>>>>>>> Phone: +94 712228648
>>>>>>> Blog: a5anka.github.io
>>>>>>>
>>>>>>> <https://wso2.com/signature>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Architecture mailing list
>>>>>>> Architecture@wso2.org
>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> *Imesh Gunaratne*
>>>>>> Software Architect
>>>>>> WSO2 Inc: http://wso2.com
>>>>>> T: +94 11 214 5345 M: +94 77 374 2057
>>>>>> W: https://medium.com/@imesh TW: @imesh
>>>>>> lean. enterprise. middleware
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Architecture mailing list
>>>>>> Architecture@wso2.org
>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Asanka Abeyweera
>>>>> Senior Software Engineer
>>>>> WSO2 Inc.
>>>>>
>>>>> Phone: +94 712228648
>>>>> Blog: a5anka.github.io
>>>>>
>>>>> <https://wso2.com/signature>
>>>>>
>>>>> _______________________________________________
>>>>> Architecture mailing list
>>>>> Architecture@wso2.org
>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Ramith Jayasinghe
>>>> Technical Lead
>>>> WSO2 Inc., http://wso2.com
>>>> lean.enterprise.middleware
>>>>
>>>> E: ram...@wso2.com
>>>> P: +94 772534930
>>>>
>>>> _______________________________________________
>>>> Architecture mailing list
>>>> Architecture@wso2.org
>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>
>>>>
>>>
>>>
>>> --
>>> *Imesh Gunaratne*
>>> Software Architect
>>> WSO2 Inc: http://wso2.com
>>> T: +94 11 214 5345 M: +94 77 374 2057
>>> W: https://medium.com/@imesh TW: @imesh
>>> lean. enterprise. middleware
>>>
>>>
>>
>>
>> --
>> *Imesh Gunaratne*
>> Software Architect
>> WSO2 Inc: http://wso2.com
>> T: +94 11 214 5345 M: +94 77 374 2057
>> W: https://medium.com/@imesh TW: @imesh
>> lean. enterprise. middleware
>>
>>
>> _______________________________________________
>> Architecture mailing list
>> Architecture@wso2.org
>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>
>>
>
>
> --
> *Asitha Nanayakkara* <http://asitha.github.io/>
> Senior Software Engineer
> WSO2, Inc. <http://wso2.com/>
> Mob: +94 77 853 0682
> [image: https://wso2.com/signature] <https://wso2.com/signature>
>
>
> _______________________________________________
> Architecture mailing list
> Architecture@wso2.org
> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>
>


-- 

Best Regards,

Malaka Silva
Senior Technical Lead
M: +94 777 219 791
Tel : 94 11 214 5345
Fax :94 11 2145300
Skype : malaka.sampath.silva
LinkedIn : http://www.linkedin.com/pub/malaka-silva/6/33/77
Blog : http://mrmalakasilva.blogspot.com/

WSO2, Inc.
lean . enterprise . middleware
https://wso2.com/signature
http://www.wso2.com/about/team/malaka-silva/
<http://wso2.com/about/team/malaka-silva/>
https://store.wso2.com/store/

Don't make Trees rare, we should keep them with care

_______________________________________________
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture

Re: [Architecture] RDBMS based coordinator election algorithm for MB

Reply via email to