Hi Thang, ACK from me.
Best regards,
Hieu

-----Original Message-----
From: Thang Duc Nguyen <thang.d.ngu...@dektech.com.au>
Sent: Friday, March 4, 2022 3:37 PM
To: Minh Hon Chau <minh.c...@dektech.com.au>; Hieu Hong Hoang <hieu.h.ho...@dektech.com.au>; Thien Minh Huynh <thien.m.hu...@dektech.com.au>
Cc: opensaf-devel@lists.sourceforge.net; Thang Duc Nguyen <thang.d.ngu...@dektech.com.au>
Subject: [PATCH 1/1] amf: reboot to recovery PL in split-brain [#3309]

The connection between the standby SC and a PL was dropped
(disconnected, then reconnected), while that PL stayed connected to
the active SC. As a result, the standby SC considered the PL absent
even though the connection was re-established afterwards.
During failover, the standby SC notifies that all recorded absent
nodes have left the cluster. This causes the PL to leave the cluster
from the AMF point of view while it is still connected to the active SC.

This scenario is a kind of split-brain use case, and amfd should order
the PL to reboot to recover from the issue.
---
 src/amf/amfd/main.cc | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/src/amf/amfd/main.cc b/src/amf/amfd/main.cc
index 6487a6b54..59e9bf723 100644
--- a/src/amf/amfd/main.cc
+++ b/src/amf/amfd/main.cc
@@ -436,6 +436,13 @@ static void handle_event_in_failover_state(AVD_EVT *evt) {
       if (AVD_AVND_STATE_ABSENT == node->node_state &&
           cb->failover_list.find(node->node_info.nodeId) == cb->failover_list.end()) {
         bool fover_done = false;
+        if (amfnd_svc_db->find(node->node_info.nodeId) !=
+            amfnd_svc_db->end()) {
+          LOG_WA("Node %x reconnect before failover,"
+                 "order reboot node", node->node_info.nodeId);
+          LOG_WA("Sending node reboot order");
+          avd_d2n_reboot_snd(node);
+        }
         /* Check whether this node failover has been performed or not. */
         for (const auto &i_su : node->list_of_ncs_su) {
--
2.25.1

_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel