Re: [Architecture] RDBMS based coordinator election algorithm for MB

Ramith Jayasinghe Thu, 04 Aug 2016 07:28:57 -0700

Leader election is currently based on hazelcast and things get complicated
when a network partition happens. if the node looses access to database and
the others in the cluster that's comparatively safe ( when nodes are not
incurring moderate load).


Now the problem really is in situations where network is being partitioned
in such a way that the clustering library (whichever we choose to use)
can't see others but nodes have access to the database. the scenario could
be other way around as well.

On Thu, Aug 4, 2016 at 7:27 PM, Asanka Abeyweera <asank...@wso2.com> wrote:

> Hi Imesh,
>
> We are not implementing this to overcome a limitation in the coordination
> algorithm available in the Hazlecast. We are implementing this since we
> need an RDBMS based coordination algorithm (not a network based algorithm).
> The reason is, a network based election algorithm will always elect
> multiple leaders when the network is partitioned. But if we use a RDBMS
> based algorithm this will not happen. We could not find a opensource
> library that implement a RDBMS based coordination algorithm. That is why we
> started writing our own one.
>
>
>
> On Thu, Aug 4, 2016 at 7:16 PM, Imesh Gunaratne <im...@wso2.com> wrote:
>
>> Hi Asanka,
>>
>> Do we really need to implement a leader election algorithm on our own?
>> AFAIU this is a complex problem which has been already solved by several
>> algorithms [1]. IMO it would be better to go ahead with an existing well
>> established implementation on etcd [1] or Consul [2].
>>
>> Those provide HTTP APIs for clients to make leader election calls. [3] is
>> a client library written in Node.js for etcd based leader election.
>>
>> [1] https://www.projectcalico.org/using-etcd-for-elections
>> [2] https://www.consul.io/docs/guides/leader-election.html
>> [3] https://www.npmjs.com/package/etcd-leader
>>
>> Thanks
>>
>> On Wed, Aug 3, 2016 at 5:12 PM, Asanka Abeyweera <asank...@wso2.com>
>> wrote:
>>
>>> Hi Maninda,
>>>
>>> Since we are using RDBMS to poll the node status, the cluster will not
>>> end up in situation 1,2 or 3. With this approach we consider a node
>>> unreachable when it cannot access the database. Therefore an unreachable
>>> node can never be the leader.
>>>
>>> As you have mentioned, we are currently using the RDBMS as an atomic
>>> global variable to create the coordinator entry.
>>>
>>> On Tue, Aug 2, 2016 at 5:22 PM, Maninda Edirisooriya <mani...@wso2.com>
>>> wrote:
>>>
>>>> Hi Asanka,
>>>>
>>>> As I understand the accuracy of electing the leader correctly is
>>>> dependent on the election mechanism with RDBMS because there can be edge
>>>> cases like,
>>>>
>>>> 1. Unreachable leader activates during the election process: Then who
>>>> becomes the leader?
>>>> 2. The elected leader becomes unreachable before the election is
>>>> completed: Then will there be a situation where there is no leader?
>>>> 3. A leader and a set of nodes are disconnected from the other part of
>>>> the cluster and while the leader is trying to remove unreachable members
>>>> other part is calling an election to make a leader: Who will win?
>>>>
>>>> RDBMS based election algorithm should handle such cases without
>>>> bringing the cluster to an inconsistent state or dead lock in all
>>>> concurrent cases. If all these kind of cases cannot be handled isn't it
>>>> better to keep the current hazelcast clustering and use the RDBMS only to
>>>> handle the split brain scenario? In other words when a new hazelcast leader
>>>> is elected it should be updated in the RDBMS. If another split party has
>>>> already elected a leader, the node who is going to write it to RDBMS should
>>>> avoid updating it. Simply, the RDBMS can be used as an atomic global
>>>> variable to keep the leader name by modifying the hazelcast clustering.
>>>> WDYT?
>>>>
>>>> Thanks.
>>>>
>>>>
>>>> *Maninda Edirisooriya*
>>>> Senior Software Engineer
>>>>
>>>> *WSO2, Inc.*lean.enterprise.middleware.
>>>>
>>>> *Blog* : http://maninda.blogspot.com/
>>>> *E-mail* : mani...@wso2.com
>>>> *Skype* : @manindae
>>>> *Twitter* : @maninda
>>>>
>>>> On Thu, Jul 28, 2016 at 4:38 PM, Asanka Abeyweera <asank...@wso2.com>
>>>> wrote:
>>>>
>>>>> Hi Akila,
>>>>>
>>>>> Let me explain the issue in a different way. Let's assume the MB nodes
>>>>> are using two different network interfaces for Hazelcast communication and
>>>>> database communication. With such a configuration, there can be failures
>>>>> only in the network interface used for Hazelcast communication in some
>>>>> nodes. When this happens, there will be two or more Hazelcast clusters due
>>>>> to the network segmentation, and as a result there will be multiple
>>>>> coordinators. Since every node still have access to the database, multiple
>>>>> coordinators can affect the correctness of the data stored in the DB. But
>>>>> if we used a RDBMS based approach we won't have multiple coordinators due
>>>>> to a network partition in Hazelcast. This is one advantage we get from 
>>>>> this
>>>>> approach.
>>>>>
>>>>> Even when we use Zookeeper or RAFT the same issue will be there since
>>>>> we are using different interfaces for Hazelcast communication and DB
>>>>> communication.
>>>>>
>>>>>
>>>>> On Thu, Jul 28, 2016 at 2:56 PM, Akila Ravihansa Perera <
>>>>> raviha...@wso2.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> What's the advantage of using RDBMS (even as an alternative) to
>>>>>> implement a leader/coordinator election? If the network connection to DB
>>>>>> fails then this will be a single point of failure. I don't think we can
>>>>>> scale RDBMS instances and expect the election algorithm to work. That 
>>>>>> would
>>>>>> be reducing this problem to another problem (electing coordinator RDBMS
>>>>>> instance).
>>>>>>
>>>>>> IMHO it would be better to look at Zookeeper Atomic Broadcast (ZAB)
>>>>>> [1] or RAFT leader election [2] algorithms which have already proven
>>>>>> results.
>>>>>>
>>>>>> [1] https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab1.0
>>>>>> [2] http://libraft.io/
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> On Thu, Jul 28, 2016 at 1:42 PM, Nandika Jayawardana <
>>>>>> nand...@wso2.com> wrote:
>>>>>>
>>>>>>> +1 to make it a common component . We have the clustering
>>>>>>> implementation for BPEL component based on hazelcast.  If the 
>>>>>>> coordination
>>>>>>> is available at RDBMS level, we can remove hazelcast dependancy.
>>>>>>>
>>>>>>> Regards
>>>>>>> Nandika
>>>>>>>
>>>>>>> On Thu, Jul 28, 2016 at 1:28 PM, Hasitha Aravinda <hasi...@wso2.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Can we make it a common component, which is not hard coupled with
>>>>>>>> MB. BPS has the same requirement.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Hasitha.
>>>>>>>>
>>>>>>>> On Thu, Jul 28, 2016 at 9:47 AM, Asanka Abeyweera <
>>>>>>>> asank...@wso2.com> wrote:
>>>>>>>>
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> In MB, we have used a coordinator based approach to manage
>>>>>>>>> distributed messaging algorithm in the cluster. Currently Hazelcast 
>>>>>>>>> is used
>>>>>>>>> to elect the coordinator. But one issue we faced with Hazelcast is, 
>>>>>>>>> during
>>>>>>>>> a network segmentation (split brain), Hazelcast can elect two or more
>>>>>>>>> coordinators in the cluster. This affects the correctness of the
>>>>>>>>> distributed messaging algorithm since there are some tables in the 
>>>>>>>>> database
>>>>>>>>> that should only be edited by a single node (i.e. coordinator).
>>>>>>>>>
>>>>>>>>> As a solution to this problem we have implemented minimum node
>>>>>>>>> count based approach [1] to deactivate set of partitioned nodes to 
>>>>>>>>> stop
>>>>>>>>> multiple nodes becoming coordinators until the network segmentation 
>>>>>>>>> issue
>>>>>>>>> is fixed.
>>>>>>>>>
>>>>>>>>> As an alternative solution, we are thinking of implementing an
>>>>>>>>> RDBMS based approach to elect the coordinator node in the cluster. By 
>>>>>>>>> doing
>>>>>>>>> this we can make sure that even during a network segmentation only 
>>>>>>>>> one node
>>>>>>>>> will be elected as the coordinator node since the election is 
>>>>>>>>> happening
>>>>>>>>> through the database.
>>>>>>>>>
>>>>>>>>> The algorithm will use a polling mechanism to check the validity
>>>>>>>>> of the nodes. To make the election algorithm scalable, only the 
>>>>>>>>> coordinator
>>>>>>>>> node will be checking status of all the nodes in the cluster and it 
>>>>>>>>> will
>>>>>>>>> inform other nodes through database when a member is added/left. The 
>>>>>>>>> nodes
>>>>>>>>> will be only checking for the status of the coordinator node. When a 
>>>>>>>>> node
>>>>>>>>> detect that coordinator is invalid it will go for a election to elect 
>>>>>>>>> a new
>>>>>>>>> coordinator.
>>>>>>>>>
>>>>>>>>> We are currently working on a POC to test how this works with MB's
>>>>>>>>> slot based messaging algorithm.
>>>>>>>>>
>>>>>>>>> thoughts?
>>>>>>>>>
>>>>>>>>> [1] https://wso2.org/jira/browse/MB-1664
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Asanka Abeyweera
>>>>>>>>> Senior Software Engineer
>>>>>>>>> WSO2 Inc.
>>>>>>>>>
>>>>>>>>> Phone: +94 712228648
>>>>>>>>> Blog: a5anka.github.io
>>>>>>>>>
>>>>>>>>> <https://wso2.com/signature>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Architecture mailing list
>>>>>>>>> Architecture@wso2.org
>>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> --
>>>>>>>> Hasitha Aravinda,
>>>>>>>> Associate Technical Lead,
>>>>>>>> WSO2 Inc.
>>>>>>>> Email: hasi...@wso2.com
>>>>>>>> Mobile : +94 718 210 200
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Architecture mailing list
>>>>>>>> Architecture@wso2.org
>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Nandika Jayawardana
>>>>>>> WSO2 Inc ; http://wso2.com
>>>>>>> lean.enterprise.middleware
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Architecture mailing list
>>>>>>> Architecture@wso2.org
>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Akila Ravihansa Perera
>>>>>> WSO2 Inc.;  http://wso2.com/
>>>>>>
>>>>>> Blog: http://ravihansa3000.blogspot.com
>>>>>>
>>>>>> _______________________________________________
>>>>>> Architecture mailing list
>>>>>> Architecture@wso2.org
>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Asanka Abeyweera
>>>>> Senior Software Engineer
>>>>> WSO2 Inc.
>>>>>
>>>>> Phone: +94 712228648
>>>>> Blog: a5anka.github.io
>>>>>
>>>>> <https://wso2.com/signature>
>>>>>
>>>>> _______________________________________________
>>>>> Architecture mailing list
>>>>> Architecture@wso2.org
>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Asanka Abeyweera
>>> Senior Software Engineer
>>> WSO2 Inc.
>>>
>>> Phone: +94 712228648
>>> Blog: a5anka.github.io
>>>
>>> <https://wso2.com/signature>
>>>
>>> _______________________________________________
>>> Architecture mailing list
>>> Architecture@wso2.org
>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>
>>>
>>
>>
>> --
>> *Imesh Gunaratne*
>> Software Architect
>> WSO2 Inc: http://wso2.com
>> T: +94 11 214 5345 M: +94 77 374 2057
>> W: https://medium.com/@imesh TW: @imesh
>> lean. enterprise. middleware
>>
>>
>> _______________________________________________
>> Architecture mailing list
>> Architecture@wso2.org
>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>
>>
>
>
> --
> Asanka Abeyweera
> Senior Software Engineer
> WSO2 Inc.
>
> Phone: +94 712228648
> Blog: a5anka.github.io
>
> <https://wso2.com/signature>
>
> _______________________________________________
> Architecture mailing list
> Architecture@wso2.org
> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>
>


-- 
Ramith Jayasinghe
Technical Lead
WSO2 Inc., http://wso2.com
lean.enterprise.middleware

E: ram...@wso2.com
P: +94 772534930

_______________________________________________
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture

Re: [Architecture] RDBMS based coordinator election algorithm for MB

Reply via email to