- **summary**: IMMD asserted when trying to become active during failover --> Support the case when amfnd crashes
- **status**: assigned --> review
- **Milestone**: 4.4.0 --> 4.4.FC
---
**[tickets:#721] Support the case when amfnd crashes**
**Status:** review
**Created:** Thu Jan 16, 2014 07:32 AM UTC by Sirisha Alla
**Last Updated:** Mon Jan 27, 2014 02:39 PM UTC
**Owner:** Mathi Naickan
The issue is seen on changeset 4733 plus the CLM patches corresponding to the changesets of #220. Continuous failovers are performed while IMM API invocations from an IMM application are ongoing. The IMMD asserted on the new active controller, leading to a cluster reset.
SC-1 is active, and osafamfnd is killed to trigger a failover:
Jan 15 18:23:03 SLES-64BIT-SLOT1 osafimmnd[2411]: NO Ccb 35 COMMITTED (exowner)
Jan 15 18:23:07 SLES-64BIT-SLOT1 osafimmnd[2411]: NO implementer for class 'testMA_verifyObjApplNoResponseModCallback_101' is released => class extent is UNSAFE
Jan 15 18:23:57 SLES-64BIT-SLOT1 sshd[3010]: Accepted keyboard-interactive/pam for root from 192.168.56.103 port 60396 ssh2
Jan 15 18:23:59 SLES-64BIT-SLOT1 root: killing osafamfnd from invoke_failover.sh
Jan 15 18:23:59 SLES-64BIT-SLOT1 osafclmd[2455]: AL AMF Node Director is down, terminate this process
Jan 15 18:23:59 SLES-64BIT-SLOT1 osafntfd[2441]: AL AMF Node Director is down, terminate this process
Jan 15 18:23:59 SLES-64BIT-SLOT1 osafevtd[2609]: AL AMF Node Director is down, terminate this process
Jan 15 18:23:59 SLES-64BIT-SLOT1 osafckptd[2600]: AL AMF Node Director is down, terminate this process
Jan 15 18:23:59 SLES-64BIT-SLOT1 osaflogd[2421]: AL AMF Node Director is down, terminate this process
Jan 15 18:23:59 SLES-64BIT-SLOT1 osafrded[2382]: AL AMF Node Director is down, terminate this process
Jan 15 18:23:59 SLES-64BIT-SLOT1 osafclmna[2465]: AL AMF Node Director is down, terminate this process
Jan 15 18:23:59 SLES-64BIT-SLOT1 osafimmd[2401]: AL AMF Node Director is down, terminate this process
SC-2 tried to become active, but the IMMD asserted, leading to a cluster reset:
Jan 15 18:24:01 SLES-64BIT-SLOT2 osaffmd[2625]: NO Peer FM down on node_id: 131343
Jan 15 18:24:01 SLES-64BIT-SLOT2 osaffmd[2625]: NO Role: STANDBY, Node Down for node id: 2010f
Jan 15 18:24:01 SLES-64BIT-SLOT2 osaffmd[2625]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Received Node Down for peer controller, OwnNodeId = 131599, SupervisionTime = 60
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafimmd[2635]: WA IMMD lost contact with peer IMMD (NCSMDS_RED_DOWN)
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafimmnd[2645]: WA DISCARD DUPLICATE FEVS message:92993
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafimmnd[2645]: WA Error code 2 returned for message type 57 - ignoring
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafimmnd[2645]: WA DISCARD DUPLICATE FEVS message:92994
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafimmnd[2645]: WA Error code 2 returned for message type 57 - ignoring
Jan 15 18:24:01 SLES-64BIT-SLOT2 opensaf_reboot: Rebooting remote node in the absence of PLM is outside the scope of OpenSAF
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafrded[2616]: NO rde_rde_set_role: role set to 1
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafimmd[2635]: NO ACTIVE request
Jan 15 18:24:01 SLES-64BIT-SLOT2 osaflogd[2654]: NO ACTIVE request
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafntfd[2667]: NO ACTIVE request
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafclmd[2681]: NO ACTIVE request
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafamfd[2700]: NO FAILOVER StandBy --> Active
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafimmd[2635]: NO ellect_coord invoke from lga_callback ACTIVE
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafimmd[2635]: ER Changing IMMND coord while old coord is still up!
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafimmd[2635]: immd_proc.c:297: immd_proc_elect_coord: Assertion 'immnd_info_node->immnd_key == cb->node_id' failed.
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafimmnd[2645]: WA Director Service in NOACTIVE state - fevs replies pending:2 fevs highest processed:92994
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafamfnd[2714]: NO 'safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafamfnd[2714]: ER safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafamfnd[2714]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131599, SupervisionTime = 60
Jan 15 18:24:01 SLES-64BIT-SLOT2 opensaf_reboot: Rebooting local node; timeout=60
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafclmd[2681]: ER clms_mds_msg_send FAILED: 2
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafclmd[2681]: ER clms_clma_api_msg_dispatcher FAILED: type 0
Jan 15 18:24:01 SLES-64BIT-SLOT2 osafimmnd[2645]: NO No IMMD service => cluster restart
The logs, including IMMD traces, are attached.
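When reading the SC-2 excerpt, note that the FMD prints the same peer node ID in two bases: `node_id: 131343` and `node id: 2010f` both refer to SC-1, while `OwnNodeId = 131599` (0x2020F) is SC-2. The short Python sketch below checks that arithmetic. The split into chassis, slot, and subslot fields is an assumption, chosen because it is consistent with the logged values and the SLOT1/SLOT2 hostnames; `decode_node_id` is a hypothetical helper, not an OpenSAF API.

```python
# Hypothetical helper for reading the node IDs that appear in these logs.
# ASSUMPTION: node_id = (chassis_id << 16) | (slot_id << 8) | subslot_id.
# This layout matches the logged values and the SLOT1/SLOT2 hostnames,
# but it is not taken from the OpenSAF sources.


def decode_node_id(node_id):
    """Split a node ID into (chassis, slot, subslot) under the assumed layout."""
    chassis = (node_id >> 16) & 0xFF
    slot = (node_id >> 8) & 0xFF
    subslot = node_id & 0xFF
    return chassis, slot, subslot


# Node IDs as they appear in the SC-2 log excerpt.
peer_decimal = 131343  # "Peer FM down on node_id: 131343"
peer_hex = 0x2010F     # "Node Down for node id: 2010f"
own_decimal = 131599   # "OwnNodeId = 131599"

# The decimal and hex values name the same node (SC-1).
assert peer_decimal == peer_hex

for label, nid in [("peer (SC-1)", peer_decimal), ("own (SC-2)", own_decimal)]:
    chassis, slot, subslot = decode_node_id(nid)
    print(f"{label}: {nid} = {nid:#x} -> chassis {chassis}, slot {slot}, subslot {subslot}")
```

Under this reading, both controllers share chassis 2 and subslot 15 and differ only in the slot byte, which is why the two IDs differ by exactly 256.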
---
Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is
subscribed to https://sourceforge.net/p/opensaf/tickets/