Re: [Architecture] RDBMS based coordinator election algorithm for MB

Asanka Abeyweera Mon, 08 Aug 2016 22:41:30 -0700

Hi Anjana,



On Tue, Aug 9, 2016 at 10:13 AM, Anjana Fernando <anj...@wso2.com> wrote:

> Hi,
>
> I just noticed this thread. I have some concerns about this implementation.
> First of all, I don't think the statement made here, that an external
> service such as ZooKeeper doesn't work, is correct. If you have a ZK
> cluster (it is supposed to be used as a cluster), you will not have any
> issues. All the nodes have a list of endpoints for all the ZK nodes and
> connect to those, and ZK uses a quorum-based mechanism to keep its state.
> This makes sure that all the users see a single version of the ZK data.
>

> Also, I guess the fundamental problem in the split-brain situation is
> that we need one external entity taking the decision (e.g. a ZK cluster),
> because it should have oversight of the whole environment. I don't see how
> this RDBMS mechanism would solve that, because what it gives is a central
> location for state persistence. The decision of who is the leader is
> still taken by the users, which can be problematic. In a network
> partition scenario, two groups of users will be overriding each other in
> the centralized RDBMS data repeatedly, and it will go on forever. In the
> ZK situation there will be only one leader, and the nodes in the other
> partition simply won't be able to reach the leader until the network
> issues are sorted out.
>

Yes, if we have an external entity that uses a quorum-based approach, like
ZooKeeper, this issue does not arise. With the RDBMS approach we are trying
to avoid the requirement for an external entity for leader election.

With the implementation we are trying, multiple nodes will not repeatedly
override each other during a network partition, because we do not use
network connectivity between nodes to decide the leader.

I will explain the algorithm we have used. It is a bit similar to the Raft
algorithm [1]. In our implementation we use a dedicated table for leader
election with three main columns.

   1. Anchor (Primary key)
   2. Node ID
   3. Last heartbeat

For a node to become the leader, it has to create an entry in the table
with Anchor=1. Since Anchor is a primary key, only one node will succeed in
creating it. All the other nodes become ordinary nodes. Ordinary nodes
periodically check the validity of the leader by looking at the heartbeat
value. If the heartbeat is older than a configured value, the node deletes
the leader entry and tries to create a new one. The leader node always
tries to keep its state by updating the heartbeat value periodically. In
summary, a node can be in one of the following three states:

   - Election Node
      - Tries to create the leader entry. If it succeeds, it becomes the
      leader; otherwise it becomes an ordinary node.
   - Leader Node
      - Keeps updating the heartbeat periodically.
      - If its entry is deleted by another node, it becomes an election
      node.
   - Ordinary Node
      - Keeps checking the validity of the leader by looking at the
      heartbeat value.
      - When the leader entry becomes invalid, it removes the leader
      entry from the table (with a back-off time) and becomes an election
      node.
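
The three states above can be sketched roughly as follows. This is only a
minimal illustration and not the actual MB implementation: it assumes Python
with an in-memory SQLite database, and the table name, column names, function
names and the 5-second timeout are all hypothetical. As a further assumption,
the delete of a stale entry is made conditional on the heartbeat age, so that
a slow node cannot remove a freshly created leader entry; the back-off time
is left out for brevity.

```python
import sqlite3

ANCHOR = 1             # the single well-known primary-key value
HEARTBEAT_MAX_AGE = 5  # seconds; hypothetical "configured value"


def create_table(conn):
    # Three columns as described above; the names are illustrative only.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS leader_election ("
        " anchor INTEGER PRIMARY KEY,"
        " node_id TEXT NOT NULL,"
        " last_heartbeat REAL NOT NULL)"
    )


def election_step(conn, node_id, now):
    """Election node: try to create the leader entry."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO leader_election VALUES (?, ?, ?)",
                (ANCHOR, node_id, now),
            )
        return "LEADER"
    except sqlite3.IntegrityError:
        # Another node already holds the Anchor=1 row.
        return "ORDINARY"


def leader_step(conn, node_id, now):
    """Leader node: refresh the heartbeat, or notice the entry is gone."""
    with conn:
        cur = conn.execute(
            "UPDATE leader_election SET last_heartbeat = ?"
            " WHERE anchor = ? AND node_id = ?",
            (now, ANCHOR, node_id),
        )
    # No row updated means the entry was deleted or taken by another node.
    return "LEADER" if cur.rowcount == 1 else "ELECTION"


def ordinary_step(conn, now):
    """Ordinary node: remove a stale leader entry, then check validity."""
    with conn:
        conn.execute(
            "DELETE FROM leader_election"
            " WHERE anchor = ? AND last_heartbeat < ?",
            (ANCHOR, now - HEARTBEAT_MAX_AGE),
        )
    row = conn.execute(
        "SELECT 1 FROM leader_election WHERE anchor = ?", (ANCHOR,)
    ).fetchone()
    # With no valid leader entry left, the node goes for an election.
    return "ORDINARY" if row else "ELECTION"
```

In a real deployment each node would run its own step on a timer with
now = time.time() against the shared database. Here the time is passed in
explicitly so that the transitions are easy to follow.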


[1] https://en.wikipedia.org/wiki/Raft_(computer_science)


> So I also think, as Imesh mentioned, that creating a coordination
> algorithm from scratch may not be a wise decision, and we should use
> proven technology/libraries to do that. On a side note, the main reason
> for not using ZK for this earlier was the hassle of bringing up another
> set of servers when our products are clustered, and we knew that the
> split-brain scenario would occur in HZ. But maybe now we should provide an
> extension point to plug in an external service, for applications where
> the split-brain scenario is a show stopper.
>
> Cheers,
> Anjana.
>
> On Tue, Aug 9, 2016 at 4:45 AM, Kasun Indrasiri <ka...@wso2.com> wrote:
>
>> Hi Ramith/Asanka,
>>
>> The ESB/DSS ntask impl is also based on HZ. I guess if this model works
>> for the MB, we should make it generic for all such coordination
>> requirements. (Thinking about using this in ESB 5.1?)
>>
>> On Fri, Aug 5, 2016 at 3:58 AM, Sajini De Silva <saj...@wso2.com> wrote:
>>
>>> Hi Maninda,
>>>
>>> Locking the database is supported by some databases, but there will be
>>> a huge performance impact, so we cannot use that approach. If this
>>> approach cannot be adapted, the only thing we can do is queue-wise load
>>> balancing through the slot coordinator. But in that case we cannot
>>> guarantee that the load will be equally distributed, since some queues
>>> can be loaded while others are idle. Also, we cannot have multiple slot
>>> coordinators handling the same queue, as it may cause several
>>> complications, such as the same slot being assigned to two nodes by
>>> different subscribers, message duplication, etc. Actually, this slot
>>> architecture was discussed in a separate mail thread before it was
>>> implemented.
>>>
>>> Thanks
>>>
>>> On Fri, Aug 5, 2016 at 3:12 PM, Maninda Edirisooriya <mani...@wso2.com>
>>> wrote:
>>>
>>>> Hi Sajini,
>>>>
>>>> Yes, that is what I meant. As the number of slots is proportional to
>>>> the number of messages passing through the cluster, slot delivery should
>>>> not be handled by the coordinator when there is only one coordinator in
>>>> the cluster, as that is a bottleneck for scaling the messages passing
>>>> through the cluster. If there is only a single coordinator, it should
>>>> handle only operations that are not proportional to the message
>>>> throughput of the cluster, that is, tasks like adding / removing
>>>> subscribers. As this is not the current implementation, we can consider
>>>> a multiple-coordinator approach. Then the number of coordinators should
>>>> scale with the message throughput. I am not sure whether locking the
>>>> database per transaction would achieve this coordinator scalability in
>>>> the multiple-coordinator implementation.
>>>>
>>>> Thanks.
>>>>
>>>>
>>>> *Maninda Edirisooriya*
>>>> Senior Software Engineer
>>>>
>>>> *WSO2, Inc.*lean.enterprise.middleware.
>>>>
>>>> *Blog* : http://maninda.blogspot.com/
>>>> *E-mail* : mani...@wso2.com
>>>> *Skype* : @manindae
>>>> *Twitter* : @maninda
>>>>
>>>> On Fri, Aug 5, 2016 at 2:42 PM, Sajini De Silva <saj...@wso2.com>
>>>> wrote:
>>>>
>>>>> Hi Maninda,
>>>>>
>>>>> On Fri, Aug 5, 2016 at 2:28 PM, Maninda Edirisooriya <mani...@wso2.com
>>>>> > wrote:
>>>>>
>>>>>> @Sajini,
>>>>>>
>>>>>> But the number of slots is proportional to the number of messages
>>>>>> passing through the MB, which need to be handled by the coordinator.
>>>>>> That is what I meant by "information related to meta data of messages
>>>>>> pass through a single coordinator". Ideally, after the senders and
>>>>>> receivers are subscribed to the cluster, the coordinator should have
>>>>>> nothing to do until they are removed or changed.
>>>>>>
>>>>>
>>>>> Even though it is possible to have multiple coordinators with some
>>>>> effort (locking the database for a whole transaction, or the workload
>>>>> distribution described by Ramith), the coordinator may have work to do
>>>>> other than adding and removing subscribers. As I said earlier, our MB
>>>>> message distribution system is based on the slot architecture, and
>>>>> slots are managed by the coordinator. You can read [1] to understand
>>>>> more about the slot architecture in MB.
>>>>>
>>>>> [1] http://sajinid.blogspot.com/2015/03/wso2-message-broker-300-slot-based.html
>>>>>
>>>>> Thanks
>>>>>
>>>>>>
>>>>>> @Ramith,
>>>>>>
>>>>>> +1 for multiple coordinators by partitioning the cluster, which
>>>>>> maintains the simplicity and correctness of the algorithm, rather
>>>>>> than compromising simplicity for a less important factor like
>>>>>> "delivering a good mix of messages".
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>
>>>>>> *Maninda Edirisooriya*
>>>>>> Senior Software Engineer
>>>>>>
>>>>>> *WSO2, Inc.*lean.enterprise.middleware.
>>>>>>
>>>>>> *Blog* : http://maninda.blogspot.com/
>>>>>> *E-mail* : mani...@wso2.com
>>>>>> *Skype* : @manindae
>>>>>> *Twitter* : @maninda
>>>>>>
>>>>>> On Fri, Aug 5, 2016 at 2:05 PM, Ramith Jayasinghe <ram...@wso2.com>
>>>>>> wrote:
>>>>>>
>>>>>>> @Imesh,
>>>>>>>  We can prove that doing leader election using a lib (where we
>>>>>>> maintain cluster state in another place, a.k.a. the DB) will not solve
>>>>>>> our original problem (this also relates to our past experience with
>>>>>>> both ZooKeeper and Hazelcast).
>>>>>>>  We can make this implementation a common component if other
>>>>>>> products have a use for it. BPS might be able to use it since their
>>>>>>> data is also in the database.
>>>>>>>
>>>>>>> @Malaka:
>>>>>>>  The VFS scenario can't be solved by relying on this
>>>>>>> implementation. Why? You can have access to the DB but not to the VFS
>>>>>>> resources/files (and vice versa). This is the same point we explained
>>>>>>> before.
>>>>>>>  In the ntask implementation, if tasks are stored in the database then
>>>>>>> using this implementation makes sense.
>>>>>>>
>>>>>>>
>>>>>>> @Akila,
>>>>>>>  Implementing a (distributed) queue algorithm is non-trivial. Having
>>>>>>> one coordinator (a single source of truth) keeps things simple, hence
>>>>>>> it's a conscious design decision we agreed on during the initial
>>>>>>> stages. However, a possible extension to this scheme is to have
>>>>>>> multiple coordinators (each responsible for coordinating a subset of
>>>>>>> the queues in the cluster), which would be somewhat similar to Kafka.
>>>>>>> Even if it's preferable to have no coordinator at all (to decide how
>>>>>>> messages are disseminated in the cluster), that would make us give up
>>>>>>> desired behaviour such as delivering a good mix of messages (from
>>>>>>> different publishers) to consumers in a cluster. Having said this, we
>>>>>>> have ongoing research on how to improve the algorithm, and we would
>>>>>>> like to try out both of these approaches.
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Aug 5, 2016 at 1:31 PM, Malaka Silva <mal...@wso2.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> The same issue with Hazelcast can be experienced with ESB inbounds
>>>>>>>> (running on top of NTASK) and VFS distribution locks.
>>>>>>>>
>>>>>>>> The idea that only a single worker works at a given time breaks if
>>>>>>>> a Hazelcast heartbeat fails. This will make two workers work in
>>>>>>>> parallel.
>>>>>>>>
>>>>>>>> Also, with distributed locking there is no guarantee that a file is
>>>>>>>> processed by only one worker.
>>>>>>>>
>>>>>>>> So in the case of a network failure, with the DB approach it makes
>>>>>>>> sense to stop processing until it's recovered.
>>>>>>>> Also, if this component is made generic, the ESB can reuse it.
>>>>>>>>
>>>>>>>> On Fri, Aug 5, 2016 at 9:21 AM, Asitha Nanayakkara <asi...@wso2.com
>>>>>>>> > wrote:
>>>>>>>>
>>>>>>>>> Hi Imesh,
>>>>>>>>>
>>>>>>>>> On Fri, Aug 5, 2016 at 7:33 AM, Imesh Gunaratne <im...@wso2.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Aug 5, 2016 at 7:31 AM, Imesh Gunaratne <im...@wso2.com>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> You can see here [3] how K8S has implemented leader election
>>>>>>>>>>> feature for the products deployed on top of that to utilize.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Correction: Please refer [4].
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Aug 4, 2016 at 7:27 PM, Asanka Abeyweera <
>>>>>>>>>>>> asank...@wso2.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Imesh,
>>>>>>>>>>>>>
>>>>>>>>>>>>> We are not implementing this to overcome a limitation in the
>>>>>>>>>>>>> coordination algorithm available in Hazelcast. We are
>>>>>>>>>>>>> implementing this since we need an RDBMS-based coordination
>>>>>>>>>>>>> algorithm (not a network-based algorithm).
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> Are you saying that database connections do not use the same
>>>>>>>>>>> network used by Hazelcast?
>>>>>>>>>>> 
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>> The reason is, a network-based election algorithm will always
>>>>>>>>>>>>> elect multiple leaders when the network is partitioned. But if we
>>>>>>>>>>>>> use an RDBMS-based algorithm this will not happen.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> I do not think your argument is correct. If there is a problem
>>>>>>>>>>> with the network, it may apply to both Hazelcast based solution and
>>>>>>>>>>> database based solution.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> Yes, if the same network interface is used, a network partition
>>>>>>>>> will cause all types of connections to be partitioned. But the user
>>>>>>>>> can use multiple network interfaces for the database, Hazelcast and
>>>>>>>>> Thrift.
>>>>>>>>>
>>>>>>>>> Following is the scenario we are trying to solve in MB.
>>>>>>>>>
>>>>>>>>> In MB, all the details related to messages, subscriptions, queues,
>>>>>>>>> topics, etc. are stored in the database, and we operate depending on
>>>>>>>>> that information. If an MB node can't connect to the database, the
>>>>>>>>> node is ineffective in the cluster until it can make a database
>>>>>>>>> connection.
>>>>>>>>>
>>>>>>>>> We have seen instances where the Hazelcast cluster gets partitioned
>>>>>>>>> for some time period. The reasons were:
>>>>>>>>>
>>>>>>>>>    1. Due to heavy load, Hazelcast couldn't process or send
>>>>>>>>>    (sometimes both) heartbeats, hence a network partition of the
>>>>>>>>>    Hazelcast cluster
>>>>>>>>>    2. An actual network partition of the Hazelcast cluster
>>>>>>>>>
>>>>>>>>> In both scenarios the database connection was working. In that
>>>>>>>>> case we get two coordinators elected through Hazelcast, both working
>>>>>>>>> on the same database to deliver the messages. This leads to
>>>>>>>>> inconsistencies in the cluster behavior (for instance duplicate
>>>>>>>>> message delivery, corrupted subscription states, etc.).
>>>>>>>>>
>>>>>>>>> Since the point of interest for MB is the database, we decided to
>>>>>>>>> do the coordinator election through the database as well. If a node
>>>>>>>>> can't connect to the database, then the MB won't operate anyway.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Asitha
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>> [4] http://blog.kubernetes.io/2016/01/simple-leader-election-with-Kubernetes.html
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Aug 4, 2016 at 7:16 PM, Imesh Gunaratne <
>>>>>>>>>>>>> im...@wso2.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Asanka,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Do we really need to implement a leader election algorithm on
>>>>>>>>>>>>>> our own? AFAIU this is a complex problem which has already been
>>>>>>>>>>>>>> solved by several algorithms [1]. IMO it would be better to go
>>>>>>>>>>>>>> ahead with an existing, well-established implementation on etcd
>>>>>>>>>>>>>> [1] or Consul [2].
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Those provide HTTP APIs for clients to make leader election
>>>>>>>>>>>>>> calls. [3] is a client library written in Node.js for etcd-based
>>>>>>>>>>>>>> leader election.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1] https://www.projectcalico.org/using-etcd-for-elections
>>>>>>>>>>>>>> [2] https://www.consul.io/docs/guides/leader-election.html
>>>>>>>>>>>>>> [3] https://www.npmjs.com/package/etcd-leader
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Aug 3, 2016 at 5:12 PM, Asanka Abeyweera <
>>>>>>>>>>>>>> asank...@wso2.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Maninda,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Since we are using the RDBMS to poll the node status, the
>>>>>>>>>>>>>>> cluster will not end up in situation 1, 2 or 3. With this
>>>>>>>>>>>>>>> approach we consider a node unreachable when it cannot access
>>>>>>>>>>>>>>> the database. Therefore an unreachable node can never be the
>>>>>>>>>>>>>>> leader.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As you have mentioned, we are currently using the RDBMS as
>>>>>>>>>>>>>>> an atomic global variable to create the coordinator entry.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Aug 2, 2016 at 5:22 PM, Maninda Edirisooriya <
>>>>>>>>>>>>>>> mani...@wso2.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Asanka,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> As I understand it, the accuracy of electing the leader
>>>>>>>>>>>>>>>> correctly depends on the election mechanism with the RDBMS,
>>>>>>>>>>>>>>>> because there can be edge cases like:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1. An unreachable leader becomes active during the election
>>>>>>>>>>>>>>>> process: then who becomes the leader?
>>>>>>>>>>>>>>>> 2. The elected leader becomes unreachable before the election
>>>>>>>>>>>>>>>> is completed: will there then be a situation where there is no
>>>>>>>>>>>>>>>> leader?
>>>>>>>>>>>>>>>> 3. A leader and a set of nodes are disconnected from the other
>>>>>>>>>>>>>>>> part of the cluster, and while the leader is trying to remove
>>>>>>>>>>>>>>>> unreachable members the other part calls an election to choose
>>>>>>>>>>>>>>>> a leader: who will win?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The RDBMS-based election algorithm should handle such cases
>>>>>>>>>>>>>>>> without bringing the cluster to an inconsistent state or a
>>>>>>>>>>>>>>>> deadlock in all concurrent cases. If all these kinds of cases
>>>>>>>>>>>>>>>> cannot be handled, isn't it better to keep the current
>>>>>>>>>>>>>>>> Hazelcast clustering and use the RDBMS only to handle the
>>>>>>>>>>>>>>>> split-brain scenario? In other words, when a new Hazelcast
>>>>>>>>>>>>>>>> leader is elected, it should be recorded in the RDBMS. If
>>>>>>>>>>>>>>>> another split party has already elected a leader, the node
>>>>>>>>>>>>>>>> that is going to write it to the RDBMS should avoid updating
>>>>>>>>>>>>>>>> it. Simply put, the RDBMS can be used as an atomic global
>>>>>>>>>>>>>>>> variable to keep the leader name, by modifying the Hazelcast
>>>>>>>>>>>>>>>> clustering.
>>>>>>>>>>>>>>>> WDYT?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *Maninda Edirisooriya*
>>>>>>>>>>>>>>>> Senior Software Engineer
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *WSO2, Inc.*lean.enterprise.middleware.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *Blog* : http://maninda.blogspot.com/
>>>>>>>>>>>>>>>> *E-mail* : mani...@wso2.com
>>>>>>>>>>>>>>>> *Skype* : @manindae
>>>>>>>>>>>>>>>> *Twitter* : @maninda
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Jul 28, 2016 at 4:38 PM, Asanka Abeyweera <
>>>>>>>>>>>>>>>> asank...@wso2.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Akila,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Let me explain the issue in a different way. Let's assume
>>>>>>>>>>>>>>>>> the MB nodes are using two different network interfaces for
>>>>>>>>>>>>>>>>> Hazelcast communication and database communication. With such
>>>>>>>>>>>>>>>>> a configuration, failures can affect only the network
>>>>>>>>>>>>>>>>> interface used for Hazelcast communication on some nodes.
>>>>>>>>>>>>>>>>> When this happens, there will be two or more Hazelcast
>>>>>>>>>>>>>>>>> clusters due to the network segmentation, and as a result
>>>>>>>>>>>>>>>>> there will be multiple coordinators. Since every node still
>>>>>>>>>>>>>>>>> has access to the database, multiple coordinators can affect
>>>>>>>>>>>>>>>>> the correctness of the data stored in the DB. But if we use
>>>>>>>>>>>>>>>>> an RDBMS-based approach, we won't have multiple coordinators
>>>>>>>>>>>>>>>>> due to a network partition in Hazelcast. This is one
>>>>>>>>>>>>>>>>> advantage we get from this approach.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Even when we use ZooKeeper or Raft the same issue will be
>>>>>>>>>>>>>>>>> there, since we are using different interfaces for Hazelcast
>>>>>>>>>>>>>>>>> communication and DB communication.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Jul 28, 2016 at 2:56 PM, Akila Ravihansa Perera <
>>>>>>>>>>>>>>>>> raviha...@wso2.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> What's the advantage of using an RDBMS (even as an
>>>>>>>>>>>>>>>>>> alternative) to implement a leader/coordinator election? If
>>>>>>>>>>>>>>>>>> the network connection to the DB fails then this will be a
>>>>>>>>>>>>>>>>>> single point of failure. I don't think we can scale RDBMS
>>>>>>>>>>>>>>>>>> instances and expect the election algorithm to work. That
>>>>>>>>>>>>>>>>>> would be reducing this problem to another problem (electing
>>>>>>>>>>>>>>>>>> a coordinator RDBMS instance).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> IMHO it would be better to look at the ZooKeeper Atomic
>>>>>>>>>>>>>>>>>> Broadcast (ZAB) [1] or Raft leader election [2] algorithms,
>>>>>>>>>>>>>>>>>> which have already proven results.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab1.0
>>>>>>>>>>>>>>>>>> [2] http://libraft.io/
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Jul 28, 2016 at 1:42 PM, Nandika Jayawardana <
>>>>>>>>>>>>>>>>>> nand...@wso2.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> +1 to make it a common component. We have the clustering
>>>>>>>>>>>>>>>>>>> implementation for the BPEL component based on Hazelcast.
>>>>>>>>>>>>>>>>>>> If the coordination is available at the RDBMS level, we can
>>>>>>>>>>>>>>>>>>> remove the Hazelcast dependency.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>>>>> Nandika
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Jul 28, 2016 at 1:28 PM, Hasitha Aravinda <
>>>>>>>>>>>>>>>>>>> hasi...@wso2.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Can we make it a common component that is not tightly
>>>>>>>>>>>>>>>>>>>> coupled with MB? BPS has the same requirement.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>> Hasitha.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, Jul 28, 2016 at 9:47 AM, Asanka Abeyweera <
>>>>>>>>>>>>>>>>>>>> asank...@wso2.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> In MB, we have used a coordinator-based approach to
>>>>>>>>>>>>>>>>>>>>> manage the distributed messaging algorithm in the cluster.
>>>>>>>>>>>>>>>>>>>>> Currently Hazelcast is used to elect the coordinator. But
>>>>>>>>>>>>>>>>>>>>> one issue we faced with Hazelcast is that, during a
>>>>>>>>>>>>>>>>>>>>> network segmentation (split brain), Hazelcast can elect
>>>>>>>>>>>>>>>>>>>>> two or more coordinators in the cluster. This affects the
>>>>>>>>>>>>>>>>>>>>> correctness of the distributed messaging algorithm, since
>>>>>>>>>>>>>>>>>>>>> there are some tables in the database that should only be
>>>>>>>>>>>>>>>>>>>>> edited by a single node (i.e. the coordinator).
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> As a solution to this problem we have implemented a
>>>>>>>>>>>>>>>>>>>>> minimum node count based approach [1] to deactivate a set
>>>>>>>>>>>>>>>>>>>>> of partitioned nodes, to stop multiple nodes becoming
>>>>>>>>>>>>>>>>>>>>> coordinators until the network segmentation issue is
>>>>>>>>>>>>>>>>>>>>> fixed.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> As an alternative solution, we are thinking of
>>>>>>>>>>>>>>>>>>>>> implementing an RDBMS-based approach to elect the
>>>>>>>>>>>>>>>>>>>>> coordinator node in the cluster. By doing this we can make
>>>>>>>>>>>>>>>>>>>>> sure that even during a network segmentation only one node
>>>>>>>>>>>>>>>>>>>>> will be elected as the coordinator node, since the
>>>>>>>>>>>>>>>>>>>>> election happens through the database.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The algorithm will use a polling mechanism to check the
>>>>>>>>>>>>>>>>>>>>> validity of the nodes. To make the election algorithm
>>>>>>>>>>>>>>>>>>>>> scalable, only the coordinator node will check the status
>>>>>>>>>>>>>>>>>>>>> of all the nodes in the cluster, and it will inform the
>>>>>>>>>>>>>>>>>>>>> other nodes through the database when a member is added or
>>>>>>>>>>>>>>>>>>>>> leaves. The other nodes will only check the status of the
>>>>>>>>>>>>>>>>>>>>> coordinator node. When a node detects that the coordinator
>>>>>>>>>>>>>>>>>>>>> is invalid, it will go for an election to elect a new
>>>>>>>>>>>>>>>>>>>>> coordinator.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> We are currently working on a POC to test how this
>>>>>>>>>>>>>>>>>>>>> works with MB's slot based messaging algorithm.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> thoughts?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> [1] https://wso2.org/jira/browse/MB-1664
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>> Asanka Abeyweera
>>>>>>>>>>>>>>>>>>>>> Senior Software Engineer
>>>>>>>>>>>>>>>>>>>>> WSO2 Inc.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Phone: +94 712228648
>>>>>>>>>>>>>>>>>>>>> Blog: a5anka.github.io
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> <https://wso2.com/signature>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>>>>> Architecture mailing list
>>>>>>>>>>>>>>>>>>>>> Architecture@wso2.org
>>>>>>>>>>>>>>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> Hasitha Aravinda,
>>>>>>>>>>>>>>>>>>>> Associate Technical Lead,
>>>>>>>>>>>>>>>>>>>> WSO2 Inc.
>>>>>>>>>>>>>>>>>>>> Email: hasi...@wso2.com
>>>>>>>>>>>>>>>>>>>> Mobile : +94 718 210 200
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>> Nandika Jayawardana
>>>>>>>>>>>>>>>>>>> WSO2 Inc ; http://wso2.com
>>>>>>>>>>>>>>>>>>> lean.enterprise.middleware
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> Akila Ravihansa Perera
>>>>>>>>>>>>>>>>>> WSO2 Inc.;  http://wso2.com/
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Blog: http://ravihansa3000.blogspot.com
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> Asanka Abeyweera
>>>>>>>>>>>>>>>>> Senior Software Engineer
>>>>>>>>>>>>>>>>> WSO2 Inc.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Phone: +94 712228648
>>>>>>>>>>>>>>>>> Blog: a5anka.github.io
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> <https://wso2.com/signature>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Asanka Abeyweera
>>>>>>>>>>>>>>> Senior Software Engineer
>>>>>>>>>>>>>>> WSO2 Inc.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Phone: +94 712228648
>>>>>>>>>>>>>>> Blog: a5anka.github.io
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> <https://wso2.com/signature>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> *Imesh Gunaratne*
>>>>>>>>>>>>>> Software Architect
>>>>>>>>>>>>>> WSO2 Inc: http://wso2.com
>>>>>>>>>>>>>> T: +94 11 214 5345 M: +94 77 374 2057
>>>>>>>>>>>>>> W: https://medium.com/@imesh TW: @imesh
>>>>>>>>>>>>>> lean. enterprise. middleware
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Asanka Abeyweera
>>>>>>>>>>>>> Senior Software Engineer
>>>>>>>>>>>>> WSO2 Inc.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Phone: +94 712228648
>>>>>>>>>>>>> Blog: a5anka.github.io
>>>>>>>>>>>>>
>>>>>>>>>>>>> <https://wso2.com/signature>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Ramith Jayasinghe
>>>>>>>>>>>> Technical Lead
>>>>>>>>>>>> WSO2 Inc., http://wso2.com
>>>>>>>>>>>> lean.enterprise.middleware
>>>>>>>>>>>>
>>>>>>>>>>>> E: ram...@wso2.com
>>>>>>>>>>>> P: +94 772534930
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> *Imesh Gunaratne*
>>>>>>>>>>> Software Architect
>>>>>>>>>>> WSO2 Inc: http://wso2.com
>>>>>>>>>>> T: +94 11 214 5345 M: +94 77 374 2057
>>>>>>>>>>> W: https://medium.com/@imesh TW: @imesh
>>>>>>>>>>> lean. enterprise. middleware
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> *Imesh Gunaratne*
>>>>>>>>>> Software Architect
>>>>>>>>>> WSO2 Inc: http://wso2.com
>>>>>>>>>> T: +94 11 214 5345 M: +94 77 374 2057
>>>>>>>>>> W: https://medium.com/@imesh TW: @imesh
>>>>>>>>>> lean. enterprise. middleware
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> *Asitha Nanayakkara* <http://asitha.github.io/>
>>>>>>>>> Senior Software Engineer
>>>>>>>>> WSO2, Inc. <http://wso2.com/>
>>>>>>>>> Mob: +94 77 853 0682
>>>>>>>>> [image: https://wso2.com/signature] <https://wso2.com/signature>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>>
>>>>>>>> Malaka Silva
>>>>>>>> Senior Technical Lead
>>>>>>>> M: +94 777 219 791
>>>>>>>> Tel : 94 11 214 5345
>>>>>>>> Fax :94 11 2145300
>>>>>>>> Skype : malaka.sampath.silva
>>>>>>>> LinkedIn : http://www.linkedin.com/pub/malaka-silva/6/33/77
>>>>>>>> Blog : http://mrmalakasilva.blogspot.com/
>>>>>>>>
>>>>>>>> WSO2, Inc.
>>>>>>>> lean . enterprise . middleware
>>>>>>>> https://wso2.com/signature
>>>>>>>> http://www.wso2.com/about/team/malaka-silva/
>>>>>>>> <http://wso2.com/about/team/malaka-silva/>
>>>>>>>> https://store.wso2.com/store/
>>>>>>>>
>>>>>>>> Don't make Trees rare, we should keep them with care
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Ramith Jayasinghe
>>>>>>> Technical Lead
>>>>>>> WSO2 Inc., http://wso2.com
>>>>>>> lean.enterprise.middleware
>>>>>>>
>>>>>>> E: ram...@wso2.com
>>>>>>> P: +94 772534930
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Sajini De SIlva
>>>>> Senior Software Engineer; WSO2 Inc.; http://wso2.com ,
>>>>> Email: saj...@wso2.com
>>>>> Blog: http://sajinid.blogspot.com/
>>>>> Git hub profile: https://github.com/sajinidesilva
>>>>>
>>>>> Phone: +94 712797729
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Sajini De SIlva
>>> Senior Software Engineer; WSO2 Inc.; http://wso2.com ,
>>> Email: saj...@wso2.com
>>> Blog: http://sajinid.blogspot.com/
>>> Git hub profile: https://github.com/sajinidesilva
>>>
>>> Phone: +94 712797729
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Kasun Indrasiri
>> Director, Integration Technologies
>> WSO2, Inc.; http://wso2.com
>> lean.enterprise.middleware
>>
>> cell: +1 650 450 2293
>> Blog : http://kasunpanorama.blogspot.com/
>>
>
>
>
> --
> *Anjana Fernando*
> Associate Director / Architect
> WSO2 Inc. | http://wso2.com
> lean . enterprise . middleware
>



-- 
Asanka Abeyweera
Senior Software Engineer
WSO2 Inc.

Phone: +94 712228648
Blog: a5anka.github.io

<https://wso2.com/signature>

_______________________________________________
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
