In some cases AMFD orders a node reboot because of an "out of sync window". AMFD then receives the CLM track callback while the node is not yet an AMF member, and deletes the node from node_id_db. Later, the AMFND down handler does nothing because it cannot find the node. When the node comes back up, AMFD keeps using the old msg_id counter when sending messages to AMFND. This causes a message ID mismatch in AMFND, and AMFND then orders a reboot of its own node.
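The mismatch described above can be illustrated with a toy model. This is only a sketch: `DirectorSide`, `NodeDirector`, `first_message_accepted` and the "expected ID is last received + 1" check are hypothetical simplifications for illustration, not the actual OpenSAF types or code. It also assumes that the normal AMFND down handling resets AMFD's send counter for the node.

```cpp
#include <cstdint>

// Hypothetical stand-in for AMFD's per-node record (not the real AVD_AVND).
struct DirectorSide {
  uint32_t snd_msg_id = 0;  // last message ID sent to the node
};

// Hypothetical stand-in for AMFND's receive-side state.
struct NodeDirector {
  uint32_t rcv_msg_id = 0;  // last message ID accepted from AMFD
  bool reboot_ordered = false;

  // Assumption: AMFND expects strictly consecutive IDs and orders a
  // reboot of its own node when the ID does not match.
  void receive(uint32_t msg_id) {
    if (msg_id != rcv_msg_id + 1) {
      reboot_ordered = true;
      return;
    }
    rcv_msg_id = msg_id;
  }
};

// Returns true if the restarted node accepts the first message from AMFD.
// down_handler_ran models whether the AMFND down handler found the node
// and (by assumption) reset the send counter.
bool first_message_accepted(bool down_handler_ran) {
  DirectorSide amfd;
  NodeDirector amfnd;

  // Normal traffic before the node goes down.
  for (int i = 0; i < 5; ++i) amfnd.receive(++amfd.snd_msg_id);

  // The node reboots, so AMFND starts with fresh state.
  amfnd = NodeDirector{};

  // Only a down handler that finds the node resets AMFD's counter.
  if (down_handler_ran) amfd.snd_msg_id = 0;

  amfnd.receive(++amfd.snd_msg_id);
  return !amfnd.reboot_ordered;
}
```

In this model, `first_message_accepted(false)` corresponds to the failure case (the handler could not find the node, so the stale counter is reused), and `first_message_accepted(true)` to the case where the node is found and handled as a normal AMFND down event.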
Solution: in the AMFND down handler, if the node is not found in node_id_db, search for it in node_name_db. If it is found, proceed as for a normal AMFND down event.
---
 src/amf/amfd/ndfsm.cc | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/src/amf/amfd/ndfsm.cc b/src/amf/amfd/ndfsm.cc
index 9d54df13d..598c57c47 100644
--- a/src/amf/amfd/ndfsm.cc
+++ b/src/amf/amfd/ndfsm.cc
@@ -775,6 +775,16 @@ void avd_mds_avnd_down_evh(AVD_CL_CB *cb, AVD_EVT *evt) {
   nds_mds_ver_db.erase(evt->info.node_id);
   amfnd_svc_db->erase(evt->info.node_id);
 
+  if (node == nullptr) {
+    for (const auto &value : *node_name_db) {
+      AVD_AVND *avnd = value.second;
+      if (avnd->node_info.nodeId == evt->info.node_id) {
+        node = avnd;
+        break;
+      }
+    }
+  }
+
   if (node != nullptr) {
     // Do nothing if the local node goes down. Most likely due to system
     // shutdown. If node director goes down due to a bug, the AMF watchdog will
-- 
2.18.0