- **summary**: IMMD asserted when trying to become active during failover --> Support the case when amfnd crashes
- **status**: assigned --> review
- **Milestone**: 4.4.0 --> 4.4.FC



---

**[tickets:#721] Support the case when amfnd crashes**

**Status:** review
**Created:** Thu Jan 16, 2014 07:32 AM UTC by Sirisha Alla
**Last Updated:** Mon Jan 27, 2014 02:39 PM UTC
**Owner:** Mathi Naickan

The issue is seen on changeset 4733 plus the CLM patches corresponding to the
changesets of #220. Continuous failovers are performed while some API
invocations of an IMM application are ongoing. The IMMD asserted on the new
active controller, which led to a cluster reset.

SC-1 is active, and amfnd is killed to trigger a failover:

Jan 15 18:23:03 SLES-64BIT-SLOT1 osafimmnd[2411]: NO Ccb 35 COMMITTED (exowner)
Jan 15 18:23:07 SLES-64BIT-SLOT1 osafimmnd[2411]: NO implementer for class 'testMA_verifyObjApplNoResponseModCallback_101' is released => class extent is UNSAFE
Jan 15 18:23:57 SLES-64BIT-SLOT1 sshd[3010]: Accepted keyboard-interactive/pam for root from 192.168.56.103 port 60396 ssh2
Jan 15 18:23:59 SLES-64BIT-SLOT1 root: killing osafamfnd from invoke_failover.sh
Jan 15 18:23:59 SLES-64BIT-SLOT1 osafclmd[2455]: AL AMF Node Director is down, terminate this process
Jan 15 18:23:59 SLES-64BIT-SLOT1 osafntfd[2441]: AL AMF Node Director is down, terminate this process
Jan 15 18:23:59 SLES-64BIT-SLOT1 osafevtd[2609]: AL AMF Node Director is down, terminate this process
Jan 15 18:23:59 SLES-64BIT-SLOT1 osafckptd[2600]: AL AMF Node Director is down, terminate this process
Jan 15 18:23:59 SLES-64BIT-SLOT1 osaflogd[2421]: AL AMF Node Director is down, terminate this process
Jan 15 18:23:59 SLES-64BIT-SLOT1 osafrded[2382]: AL AMF Node Director is down, terminate this process
Jan 15 18:23:59 SLES-64BIT-SLOT1 osafclmna[2465]: AL AMF Node Director is down, terminate this process
Jan 15 18:23:59 SLES-64BIT-SLOT1 osafimmd[2401]: AL AMF Node Director is down, terminate this process

SC-2 tried to become active, but the IMMD asserted, leading to a cluster reset:

Jan 15 18:24:01 SLES-64BIT-SLOT2 osaffmd[2625]: NO Peer FM down on node_id: 131343
Jan 15 18:24:01 SLES-64BIT-SLOT2 osaffmd[2625]: NO Role: STANDBY, Node Down for node id: 2010f
Jan 15 18:24:01 SLES-64BIT-SLOT2 osaffmd[2625]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Received Node Down for peer controller, OwnNodeId = 131599, SupervisionTime = 60
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafimmd[2635]: WA IMMD lost contact with peer IMMD (NCSMDS_RED_DOWN)
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafimmnd[2645]: WA DISCARD DUPLICATE FEVS message:92993
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafimmnd[2645]: WA Error code 2 returned for message type 57 - ignoring
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafimmnd[2645]: WA DISCARD DUPLICATE FEVS message:92994
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafimmnd[2645]: WA Error code 2 returned for message type 57 - ignoring
Jan 15 18:24:01 SLES-64BIT-SLOT2 opensaf_reboot: Rebooting remote node in the absence of PLM is outside the scope of OpenSAF
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafrded[2616]: NO rde_rde_set_role: role set to 1
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafimmd[2635]: NO ACTIVE request
Jan 15 18:24:01 SLES-64BIT-SLOT2 osaflogd[2654]: NO ACTIVE request
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafntfd[2667]: NO ACTIVE request
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafclmd[2681]: NO ACTIVE request
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafamfd[2700]: NO FAILOVER StandBy --> Active
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafimmd[2635]: NO ellect_coord invoke from lga_callback ACTIVE
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafimmd[2635]: ER Changing IMMND coord while old coord is still up!
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafimmd[2635]: immd_proc.c:297: immd_proc_elect_coord: Assertion 'immnd_info_node->immnd_key == cb->node_id' failed.
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafimmnd[2645]: WA Director Service in NOACTIVE state - fevs replies pending:2 fevs highest processed:92994
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafamfnd[2714]: NO 'safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafamfnd[2714]: ER safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafamfnd[2714]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131599, SupervisionTime = 60
Jan 15 18:24:01 SLES-64BIT-SLOT2 opensaf_reboot: Rebooting local node; timeout=60
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafclmd[2681]: ER clms_mds_msg_send FAILED: 2
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafclmd[2681]: ER clms_clma_api_msg_dispatcher FAILED: type 0
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafimmnd[2645]: NO No IMMD service => cluster restart


The logs with IMMD traces are attached.


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

_______________________________________________
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
