Hi All,

We are currently working on $subject.

The RDBMS-based coordinator election approach was previously adopted
for MB (it is the default configuration in MB 3.2.0) [1, 2]. It was then
extended into a common component [3], now available at [4].

Support for coordination is available with the following in EI/ESB:

   - Inbound Endpoints (e.g. JMS)
   - Scheduled Tasks
   - Message Processors


*Current Implementation:*

In the current implementation, coordination for the above (based on ntask)
happens via the NTaskTaskManager introduced in carbon-mediation.


In ntask, Hazelcast is used for coordinator election. This happens via
the ClusterGroupCommunicator, used by the ntask.core.TaskManager, where the
oldest member is elected as the leader (coordinator).
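
To make the "oldest member wins" rule concrete, here is a minimal sketch in
plain Java. It does not use Hazelcast or the actual ClusterGroupCommunicator
code; ClusterNode, joinedAtMillis and OldestMemberElection are hypothetical
names used only for illustration.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Illustrative stand-in for a cluster member; NOT the Hazelcast Member type. */
final class ClusterNode {
    final String id;
    final long joinedAtMillis; // assumed: the time this node joined the cluster

    ClusterNode(String id, long joinedAtMillis) {
        this.id = id;
        this.joinedAtMillis = joinedAtMillis;
    }
}

final class OldestMemberElection {
    /** Returns the member with the earliest join time, or empty for an empty view. */
    static Optional<ClusterNode> electLeader(List<ClusterNode> members) {
        return members.stream()
                .min(Comparator.comparingLong((ClusterNode n) -> n.joinedAtMillis));
    }
}
```

Every node that sees the same member list picks the same leader, which is
also why nodes in different partitions can end up with different leaders.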


*Proposed Implementation:*

The proposed implementation would introduce an RDBMS-based
ClusterGroupCommunicator in ntask, which would use the common
component [4] to elect the leader/coordinator through the RDBMS-based
approach. The distributed map maintained by the original
ClusterGroupCommunicator would not be maintained here.
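
As a rough illustration of how a database-backed election can work, the
sketch below models a single "leader" row with a heartbeat timestamp. This
is not the API of the common component [4]; LeaderRow, RdbmsStyleElection
and the lease length are assumptions, and an in-memory AtomicReference
stands in for the shared table.

```java
import java.util.concurrent.atomic.AtomicReference;

/** In-memory stand-in for one "leader" row in a shared table (hypothetical schema). */
final class LeaderRow {
    final String nodeId;
    final long lastHeartbeatMillis;

    LeaderRow(String nodeId, long lastHeartbeatMillis) {
        this.nodeId = nodeId;
        this.lastHeartbeatMillis = lastHeartbeatMillis;
    }
}

final class RdbmsStyleElection {
    // In a real deployment this would be a row updated with a conditional
    // UPDATE inside a transaction; the AtomicReference only mimics that.
    private final AtomicReference<LeaderRow> row = new AtomicReference<>();
    private final long leaseMillis;

    RdbmsStyleElection(long leaseMillis) {
        this.leaseMillis = leaseMillis;
    }

    /** Called periodically by every node; returns true if nodeId leads after the call. */
    boolean heartbeat(String nodeId, long nowMillis) {
        while (true) {
            LeaderRow current = row.get();
            boolean vacant = current == null
                    || nowMillis - current.lastHeartbeatMillis > leaseMillis;
            boolean mine = current != null && current.nodeId.equals(nodeId);
            if (!vacant && !mine) {
                return false; // another node holds a valid lease
            }
            if (row.compareAndSet(current, new LeaderRow(nodeId, nowMillis))) {
                return true; // lease acquired or renewed
            }
            // Lost a race with another writer; re-read the row and retry.
        }
    }

    /** Node ID stored in the row (its lease may have expired), or null if none. */
    String currentLeader() {
        LeaderRow r = row.get();
        return r == null ? null : r.nodeId;
    }
}
```

Because all nodes read and write the same row, they agree on the leader as
long as they can reach the database, regardless of how they see each other.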


The IExecutorService (the Hazelcast distributed ExecutorService), used with
TaskCalls, will not be replaced for the time being. The current
IExecutorService-related implementation requires retrieving the Member for
a given Hazelcast node ID. Since we will not be maintaining a map,
identification by ID would have to be done by retrieving and iterating
through the members of the Hazelcast cluster when required. This would
also be a reliable way to ensure that only currently available members are
retrieved.
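
The lookup by iteration could be sketched as follows. NodeRef and
MemberLookup are hypothetical stand-ins written in plain Java; in the real
code the collection would presumably be the live member set obtained from
the Hazelcast cluster (for example via getCluster().getMembers()), and the
ID would be the member's UUID.

```java
import java.util.Collection;
import java.util.Optional;

/** Illustrative stand-in for a cluster member; NOT the Hazelcast Member type. */
final class NodeRef {
    final String uuid;
    final String address;

    NodeRef(String uuid, String address) {
        this.uuid = uuid;
        this.address = address;
    }
}

final class MemberLookup {
    /**
     * Scans the current member view for the given ID. Because the view is
     * fetched at call time, a member that has left is simply not found.
     */
    static Optional<NodeRef> findById(Collection<NodeRef> currentMembers, String uuid) {
        for (NodeRef member : currentMembers) {
            if (member.uuid.equals(uuid)) {
                return Optional.of(member);
            }
        }
        return Optional.empty();
    }
}
```

The scan is linear in the cluster size, which should be acceptable for the
small clusters this is typically used with, and it avoids keeping a
separate map in sync.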

In network partition scenarios, the Hazelcast leader could assume that some
members have left the cluster when in fact they have not, whereas the RDBMS
leader would maintain this information correctly. A mapping between RDBMS
node IDs and Hazelcast node IDs can be used to prevent the rescheduling of
tasks on members that have not "actually" left. However, it would still not
be possible to schedule tasks on all members, since members that belong to
other partitions cannot be accessed.

Since this would happen only in an error scenario, one approach would be to
reschedule tasks only on the members belonging to the same partition as the
coordinator. This approach could be adopted while ensuring that we do not
reschedule tasks that are already scheduled on an available member.
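
A minimal sketch of this first approach is given below, assuming the
coordinator knows (a) which RDBMS node currently owns each task, (b) which
nodes the RDBMS considers alive, and (c) which of those nodes it can
actually reach. ReschedulePlanner and its parameters are hypothetical, and
the round-robin placement is only one possible policy.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ReschedulePlanner {
    /**
     * @param taskOwners   task name -> RDBMS node ID the task is scheduled on
     * @param aliveInRdbms node IDs the RDBMS-based view considers alive
     * @param reachable    node IDs in the coordinator's own partition
     * @return task name -> new owner, only for tasks whose owner is not alive
     */
    static Map<String, String> plan(Map<String, String> taskOwners,
                                    Set<String> aliveInRdbms,
                                    List<String> reachable) {
        Map<String, String> moves = new LinkedHashMap<>();
        if (reachable.isEmpty()) {
            return moves; // nowhere to place tasks
        }
        int next = 0;
        for (Map.Entry<String, String> entry : taskOwners.entrySet()) {
            if (aliveInRdbms.contains(entry.getValue())) {
                continue; // owner is alive (possibly in another partition): leave it
            }
            // Round-robin over the members the coordinator can reach.
            moves.put(entry.getKey(), reachable.get(next % reachable.size()));
            next++;
        }
        return moves;
    }
}
```

For example, with tasks t1..t4 owned by n1, n2, n3, n3, where n3 is gone and
n2 is alive but in another partition, only t3 and t4 are moved, and only to
nodes the coordinator can reach.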

Another approach would be to introduce a mechanism to communicate with
members based on their RDBMS node IDs. However, this could require
significant changes, including having the communication itself happen
through the database.

Feedback would be highly appreciated.

[1] Mail: "[Architecture] RDBMS based coordinator election algorithm for MB"
[2] https://github.com/wso2/andes/pull/668
[3] Mail: "Implementing a RDBMS based leader election mechanism"
[4] https://github.com/wso2/carbon-coordination

Thank you,
Maryam

-- 
*Maryam Ziyad Mohamed*
Software Engineer | WSO2
_______________________________________________
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
