Hi All, We are currently working on $subject.
The RDBMS-based coordinator election approach was previously adopted for MB (and is the default configuration for MB 3.2.0) [1, 2]. It was then extended into a common component [3], now available at [4].

Coordination support is available for the following in EI/ESB:

- Inbound Endpoints (e.g. JMS)
- Scheduled Tasks
- Message Processors

*Current Implementation:*

In the current implementation, coordination for the above (all based on ntask) happens via the NTaskTaskManager introduced in carbon-mediation. In ntask, Hazelcast is used for coordinator election. The election happens via the ClusterGroupCommunicator, which is used by ntask.core.TaskManager, and the oldest member is elected as the leader (coordinator).

*Proposed Implementation:*

The proposed implementation would introduce an RDBMS-based ClusterGroupCommunicator in ntask, which would use the common component [4] to elect the leader/coordinator through the RDBMS-based approach. The distributed map maintained by the original ClusterGroupCommunicator would not be maintained here.

The IExecutorService (the Hazelcast distributed ExecutorService) used with TaskCalls will not be replaced for the time being. The current IExecutorService-related implementation needs to retrieve the Member for a given Hazelcast node ID. Since we will not be maintaining a map, identification by ID would have to be done by retrieving the members from the Hazelcast cluster and iterating through them when required. This has the advantage of reliably returning only the members that are currently available.

In partition scenarios, the Hazelcast leader may assume that some members have left the cluster when in fact they have not, whereas the RDBMS leader would maintain this information correctly.
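To make the lookup by node ID a bit more concrete, here is a minimal sketch of what iterating over the live members could look like. It is only an illustration: ClusterMember and MemberLookup are hypothetical names used for this mail, not existing ntask or Hazelcast classes, and the member collection is passed in so that the sketch does not depend on Hazelcast.

```java
import java.util.Collection;
import java.util.Optional;

// Hypothetical stand-in for a cluster member; only the node ID matters for the lookup.
final class ClusterMember {
    private final String nodeId;
    private final String address;

    ClusterMember(String nodeId, String address) {
        this.nodeId = nodeId;
        this.address = address;
    }

    String getNodeId() {
        return nodeId;
    }

    String getAddress() {
        return address;
    }
}

// Hypothetical helper: resolves a member by ID without a separately maintained map.
final class MemberLookup {
    private MemberLookup() {
    }

    // Scans the current member view, so only members that are present in that
    // view at call time can be returned; an unknown ID yields an empty Optional.
    static Optional<ClusterMember> findById(Collection<ClusterMember> liveMembers, String nodeId) {
        for (ClusterMember member : liveMembers) {
            if (member.getNodeId().equals(nodeId)) {
                return Optional.of(member);
            }
        }
        return Optional.empty();
    }
}
```

In ntask, the collection would presumably come from the Hazelcast cluster (for example via getCluster().getMembers() on the HazelcastInstance) and the comparison would be against each Member's UUID. The lookup is linear in the cluster size, which should be acceptable for the cluster sizes we expect, but that is an assumption on my part.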
While a mapping between RDBMS node IDs and Hazelcast node IDs can be used to prevent the rescheduling of tasks on members that have not "actually" left, there will be a limitation on scheduling tasks on all members, since members that belong to other partitions cannot be accessed.

Since this would happen only in an error scenario, one approach would be to reschedule tasks only on the members belonging to the coordinator's partition. This approach could be adopted while ensuring that we do not reschedule tasks that are already scheduled on an available member.

Another approach would be to introduce a mechanism to communicate with members based on their RDBMS node IDs. However, this could require significant changes, including having the communication itself go through the database.

Feedback would be highly appreciated.

[1] Mail: "[Architecture] RDBMS based coordinator election algorithm for MB"
[2] https://github.com/wso2/andes/pull/668
[3] Mail: "Implementing a RDBMS based leader election mechanism"
[4] https://github.com/wso2/carbon-coordination

Thank you,
Maryam

--
*Maryam Ziyad Mohamed*
Software Engineer | WSO2
_______________________________________________ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture