In syslog:
**2019-02-27 06:48:20.220 SC-1 osafamfd[285]: NO Node 'PL-4' left the cluster
2019-02-27 06:48:20.241 SC-1 osafimmnd[229]: NO Implementer disconnected 6 <274, 2010f> (OpenSafImmPBE)
2019-02-27 06:48:20.244 SC-1 osafimmnd[229]: WA Postponing hard delete of admin owner with id:2 when imm is not writable state
2019-02-27 06:48:20.247 SC-1 osafimmnd[229]: WA Failed in hard remove of admin owner 2
2019-02-27 06:48:20.281 SC-1 osafamfd[285]: WA avd_msg_sanity_chk: invalid node ID (2040f)
2019-02-27 06:48:20.284 SC-1 osafamfd[285]: WA avd_msg_sanity_chk: invalid msg id 31, msg type 8, from 2040f should be 1
2019-02-27 06:48:20.285 SC-1 osafamfd[285]: WA avd_msg_sanity_chk: reboot node 2040f to recover it
2019-02-27 06:48:20.285 SC-1 osafamfd[285]: ER ncsmds_api failed 2
2019-02-27 06:48:20.287 SC-1 osafamfd[285]: WA avd_msg_sanity_chk: invalid msg id 32, msg type 8, from 2040f should be 1
2019-02-27 06:48:20.288 SC-1 osafamfd[285]: WA avd_msg_sanity_chk: reboot node 2040f to recover it
2019-02-27 06:48:20.288 SC-1 osafamfd[285]: ER ncsmds_api failed 2
2019-02-27 06:48:20.290 SC-1 osafamfd[285]: WA avd_msg_sanity_chk: invalid msg id 33, msg type 4, from 2040f should be 1
2019-02-27 06:48:20.290 SC-1 osafamfd[285]: WA avd_msg_sanity_chk: reboot node 2040f to recover it
2019-02-27 06:48:20.291 SC-1 osafamfd[285]: ER ncsmds_api failed 2
2019-02-27 06:48:20.293 SC-1 osafamfd[285]: NO Node 'PL-4' left the cluster
2019-02-27 06:48:20.325 SC-1 osafimmloadd: logtrace: trace enabled to file 'osafimmnd', mask=0x0
2019-02-27 06:48:20.326 SC-1 osafimmloadd: NO Sync starting
2019-02-27 06:48:20.603 SC-1 osafimmloadd: IN Synced 361 objects in total
2019-02-27 06:48:20.603 SC-1 osafimmnd[229]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 19197
2019-02-27 06:48:20.603 SC-1 osafimmnd[229]: WA Removing admin owner 2 safImmService (ROF==TRUE) which is in demise, AFTER generating finalize sync message
2019-02-27 06:48:20.604 SC-1 osafimmnd[229]: NO Removing zombie Admin Owner 2 safImmService
2019-02-27 06:48:20.613 SC-1 osafimmd[218]: NO ACT: New Epoch for IMMND process at node 2020f old epoch: 6  new epoch:7
2019-02-27 06:48:20.614 SC-1 osafimmd[218]: NO ACT: New Epoch for IMMND process at node 2040f old epoch: 0  new epoch:7
2019-02-27 06:48:20.616 SC-1 osafimmd[218]: NO ACT: New Epoch for IMMND process at node 2030f old epoch: 6  new epoch:7
2019-02-27 06:48:20.617 SC-1 osafimmd[218]: NO ACT: New Epoch for IMMND process at node 2050f old epoch: 6  new epoch:7
2019-02-27 06:48:20.618 SC-1 osafimmnd[229]: NO Epoch set to 7 in ImmModel
2019-02-27 06:48:20.618 SC-1 osafimmd[218]: NO ACT: New Epoch for IMMND process at node 2010f old epoch: 6  new epoch:7
2019-02-27 06:48:20.621 SC-1 osafimmloadd: NO Sync ending normally
2019-02-27 06:48:20.707 SC-1 osafimmnd[229]: NO SERVER STATE: IMM_SERVER_SYNC_SERVER --> IMM_SERVER_READY
2019-02-27 06:48:20.748 SC-1 osafamfd[285]: NO Received node_up from 2040f: msg_id 1
2019-02-27 06:48:20.748 SC-1 osafamfd[285]: WA Not a Cluster Member dropping the msg
2019-02-27 06:48:21.776 SC-1 osafamfd[285]: NO Received node_up from 2040f: msg_id 1
2019-02-27 06:48:21.776 SC-1 osafamfd[285]: WA Not a Cluster Member dropping the msg**


---

**[tickets:#3015] Amf: node can not join the cluster**

**Status:** assigned
**Milestone:** 5.19.03
**Created:** Fri Mar 01, 2019 04:34 AM UTC by Thang Duc Nguyen
**Last Updated:** Fri Mar 01, 2019 04:34 AM UTC
**Owner:** Thang Duc Nguyen


When the PBE was hung on the ACTIVE SC, a PL node started, stopped, and then started again during that window.
AMF timed out while updating a runtime object during the first start. After the timeout, AMFD on the ACTIVE SC can process the pending events in the order below, which prevents the PL from joining the cluster (a sketch of this ordering follows the log excerpt):
- clm_track_cb was called to process the first node-down event.
- clm_track_cb was called to process the node-up event.
- avd_mds_avnd_down_evh was called for the first amfnd-down event -> it marks the PL as NOT a member of the cluster.
- As a result, the PL was stuck with the messages below:
*2019-02-26 14:11:36.879 SC-1 osafamfd[285]: NO Received nodeup from 2040f: msgid 1
2019-02-26 14:11:36.880 SC-1 osafamfd[285]: WA Not a Cluster Member dropping the msg
2019-02-26 14:11:37.985 SC-1 osafamfd[285]: NO Received nodeup from 2040f: msgid 1
2019-02-26 14:11:37.986 SC-1 osafamfd[285]: WA Not a Cluster Member dropping the msg
2019-02-26 14:11:39.079 SC-1 osafamfd[285]: NO Received nodeup from 2040f: msgid 1
2019-02-26 14:11:39.081 SC-1 osafamfd[285]: WA Not a Cluster Member dropping the msg*
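
The failure is purely an ordering problem in how AMFD handles these queued events. The following minimal, self-contained sketch reproduces the effect; the names (EventType, AmfdSim) are illustrative assumptions, not OpenSAF code. When the stale amfnd-down event is handled after the CLM node-up, the node ends up marked as a non-member and every subsequent node_up message is dropped, matching the log above.

```cpp
// Minimal sketch of the event-ordering problem described above.
// All names here (EventType, AmfdSim) are illustrative; this is not OpenSAF code.
#include <initializer_list>
#include <iostream>

enum class EventType { ClmNodeDown, ClmNodeUp, AvndMdsDown, NodeUpMsg };

struct AmfdSim {
  bool node_is_member = false;

  void dispatch(EventType ev) {
    switch (ev) {
      case EventType::ClmNodeDown:   // clm_track_cb for the first node-down
        node_is_member = false;
        break;
      case EventType::ClmNodeUp:     // clm_track_cb for the node-up of the restart
        node_is_member = true;
        break;
      case EventType::AvndMdsDown:   // stale avd_mds_avnd_down_evh handled last
        node_is_member = false;      // PL ends up marked NOT a cluster member
        break;
      case EventType::NodeUpMsg:     // node_up message from amfnd on the PL
        if (!node_is_member)
          std::cout << "WA Not a Cluster Member dropping the msg\n";
        else
          std::cout << "node_up accepted\n";
        break;
    }
  }
};

int main() {
  AmfdSim amfd;
  // Order seen after the PBE hang: the old amfnd-down event is processed
  // after the CLM node-up, so every later node_up from the PL is dropped.
  for (EventType ev : {EventType::ClmNodeDown, EventType::ClmNodeUp,
                       EventType::AvndMdsDown, EventType::NodeUpMsg,
                       EventType::NodeUpMsg}) {
    amfd.dispatch(ev);
  }
  return 0;
}
```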

In this case, increasing the priority of the amfnd-down event subscription in AMFD causes the PL to be rebooted so that it can re-join the cluster.
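
Below is a conceptual sketch of why raising the priority changes the outcome, assuming the events are drained from a priority-ordered queue. The event names and priority values are assumptions for illustration only and do not represent the actual MDS subscription API: with the amfnd-down event at a higher priority, it is dequeued before the CLM node-up of the restart, so per the report AMFD reboots the PL and the node can re-join.

```cpp
// Conceptual sketch only: shows how a higher priority for the amfnd-down
// event changes the processing order. Names and priority values are
// assumptions, not the OpenSAF MDS subscription API.
#include <iostream>
#include <queue>
#include <string>
#include <vector>

struct Event {
  int priority;       // larger value = dequeued first
  int seq;            // arrival order, used as a FIFO tie-breaker
  std::string name;
};

struct Compare {
  bool operator()(const Event& a, const Event& b) const {
    if (a.priority != b.priority) return a.priority < b.priority;  // max-heap on priority
    return a.seq > b.seq;                                          // FIFO within a priority
  }
};

int main() {
  std::priority_queue<Event, std::vector<Event>, Compare> q;
  // With the amfnd-down subscription raised to a higher priority, the stale
  // down event is dequeued before the CLM node-up from the restart.
  q.push({1, 0, "clm_track_cb: node down (first stop)"});
  q.push({1, 1, "clm_track_cb: node up (restart)"});
  q.push({2, 2, "avd_mds_avnd_down_evh: amfnd down (higher priority)"});

  while (!q.empty()) {
    std::cout << q.top().name << '\n';  // amfnd down, then node down, then node up
    q.pop();
  }
  return 0;
}
```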

