immd traces are not available for the time when the assertion happened:
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15114]: immnd_evt.c:9146: 
immnd_evt_proc_fevs_rcv: Assertion '!reply_dest || (reply_dest == 
cb->immnd_mdest_id) || isObjSync' failed.

Can you check that the system resources are not exhausted, for example the 
hard disk (check whether the disk is full)?
This also looks like a memory corruption problem.

Make sure sufficient resources are available and try to run the test again.
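
A quick way to rule out a full disk before re-running the test is sketched below. This is only an illustration: the 10% free-space threshold and the helper names `is_low_on_space` and `check_path` are assumptions made for this example, not part of OpenSAF.

```python
import shutil


def is_low_on_space(free_bytes, total_bytes, min_free_fraction=0.10):
    """Return True when less than min_free_fraction of the disk is free.

    The 10% default is an arbitrary threshold chosen for illustration.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes < min_free_fraction


def check_path(path="/"):
    """Check the filesystem that holds `path` (hypothetical helper)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return is_low_on_space(usage.free, usage.total)


if __name__ == "__main__":
    # "/" is only an example; point this at the partitions that hold the
    # logs, traces and core dumps on the node under test.
    state = "LOW" if check_path("/") else "ok"
    print("free space on /:", state)
```

This covers disk space only; memory usage would need a separate check.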



---

** [tickets:#2037] IMM: Immd asserted on active controller in backward 
compatability**

**Status:** assigned
**Milestone:** 4.7.2
**Created:** Thu Sep 15, 2016 06:51 AM UTC by Madhurika Koppula
**Last Updated:** Thu Sep 15, 2016 07:19 AM UTC
**Owner:** Neelakanta Reddy
**Attachments:**

- 
[immnd_immd_cores.rtf](https://sourceforge.net/p/opensaf/tickets/2037/attachment/immnd_immd_cores.rtf)
 (8.5 kB; application/rtf)


**Environment Details:**

OS : Suse 64bit
Setup : 4 nodes ( 2 controllers and 2 payloads with headless feature disabled & 
1PBE enabled ).

Backward Compatibility:
Opensaf versions on nodes:
SC-1 (5.0), SC-2 (5.1 FC), PL-3 (5.0), PL-4(5.1FC).

**Summary:** IMMD asserted on active controller after immnd crash.

**Steps followed & Observed behaviour:**

1) SC-1 has the standby role, SC-2 has the active role.
2) The following sequence of APIs was called:
            a) saImmOiInitialize()
            b) saImmOiImplementerSet()
            c) kill -9 `pidof osafimmnd`
            d) saImmOiRtObjectDelete()
            e) saImmOiFinalize()

Observations:

1) First, immnd asserted on the active controller in 
immnd_evt_proc_fevs_rcv.
2) Second, the active controller rebooted after an immd assertion failed.

Below is the snippet of active controller SC-2:

Sep 20 20:52:47 SCALE_SLOT-42 osafntfimcnd[15091]: NO saImmOiDispatch() Fail 
SA_AIS_ERR_BAD_HANDLE (9)
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15114]: Started
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15114]: NO Persistent Back-End 
capability configured, Pbe file:imm.db (suffix may get added)
Sep 20 20:52:47 SCALE_SLOT-42 osafimmd[12159]: NO MDS event from svc_id 25 
(change:3, dest:565216648273948)
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15114]: NO IMMD service is UP ... 
ScAbsenseAllowed?:0 introduced?:0
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15114]: NO SERVER STATE: 
IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15114]: immnd_evt.c:9146: 
immnd_evt_proc_fevs_rcv: Assertion '!reply_dest || (reply_dest == 
cb->immnd_mdest_id) || isObjSync' failed.
Sep 20 20:52:47 SCALE_SLOT-42 python2.5: WA imma_mds_svc_evt: 
mds_auth_server_connect failed
Sep 20 20:52:47 SCALE_SLOT-42 osafimmd[12159]: NO MDS event from svc_id 25 
(change:4, dest:565216648273948)
Sep 20 20:52:47 SCALE_SLOT-42 osafamfnd[12239]: NO Restarting a component of 
'safSu=SC-2,safSg=NoRed,safApp=OpenSAF' (comp restart count: 6)
Sep 20 20:52:47 SCALE_SLOT-42 osafamfnd[12239]: NO 
'safComp=IMMND,safSu=SC-2,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' 
: Recovery

Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15136]: WA Error code 2 returned for 
message type 82 - ignoring
Sep 20 20:52:47 SCALE_SLOT-42 osafimmd[12159]: NO Extended intro from node 2020f
Sep 20 20:52:47 SCALE_SLOT-42 osafimmd[12159]: immd_evt.c:816: 
immd_accept_node: Assertion 'node_info->immnd_key != cb->node_id' failed.
Sep 20 20:52:47 SCALE_SLOT-42 osafamfnd[12239]: NO 
'safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'

Sep 20 20:52:47 SCALE_SLOT-42 osafamfnd[12239]: ER 
safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast

Sep 20 20:52:47 SCALE_SLOT-42 osafamfnd[12239]: Rebooting OpenSAF NodeId = 
131599 EE Name = , Reason: Component faulted: recovery is node failfast, 
OwnNodeId = 131599, SupervisionTime = 60

The timestamps after the reboot are as below:

Sep 20 20:52:47 SCALE_SLOT-42 opensaf_reboot: Rebooting local node; timeout=60
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15136]: WA DISCARD DUPLICATE FEVS 
message:12996
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15136]: WA Error code 2 returned for 
message type 82 - ignoring
Sep 20 20:52:48 SCALE_SLOT-42 osafimmnd[15136]: NO SERVER STATE: 
IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
Sep 20 20:52:48 SCALE_SLOT-42 osafimmnd[15136]: NO SERVER STATE: 
IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
Sep 22 02:13:05 SCALE_SLOT-42 syslog-ng[1133]: syslog-ng starting up; 
version='2.0.9'
Sep 22 02:13:06 SCALE_SLOT-42 sm-notify[1650]: Version 1.2.3 starting
Sep 22 02:13:06 SCALE_SLOT-42 sm-notify[1650]: Backgrounding to notify hosts...
Sep 22 02:13:06 SCALE_SLOT-42 sm-notify[1651]: Running as root.  chown 
/var/lib/nfs to choose different user
Sep 22 02:13:06 SCALE_SLOT-42 sm-notify[1651]: DNS resolution of CONN-PC 
failed; retrying later
Sep 22 02:13:06 SCALE_SLOT-42 rpc.statd[1662]: Version 1.2.3 starting
Sep 22 02:13:06 SCALE_SLOT-42 rpc.statd[1662]: Flags: TI-RPC
Sep 22 02:13:06 SCALE_SLOT-42 rpc.statd[1662]: Running as root.  chown 
/var/lib/nfs to choose different user
Sep 22 02:13:07 SCALE_SLOT-42 opensafd: Starting OpenSAF Services(5.1.FC - ) 
(Using TIPC)
Sep 22 02:13:07 SCALE_SLOT-42 osafclmna[1745]: Started
Sep 22 02:13:07 SCALE_SLOT-42 osafclmna[1745]: NO 
safNode=SC-2,safCluster=myClmCluster Joined cluster, nodeid=2020f
Sep 22 02:13:07 SCALE_SLOT-42 osafrded[1754]: Started
Sep 22 02:13:08 SCALE_SLOT-42 osaffmd[1763]: Started
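
The two assertions in the SC-2 snippet above are easy to miss among the surrounding records, partly because the mail client wrapped each record across several lines. A minimal sketch for pulling them out of such an excerpt follows. It assumes records start with a `Sep 20 20:52:47`-style timestamp and that assertion records have the shape seen in this ticket; the function names are made up for this example.

```python
import re

# A syslog record starts with a timestamp such as "Sep 20 20:52:47".
RECORD_START = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} ")

# Shape of the assertion records quoted in this ticket, e.g.
# "osafimmd[12159]: immd_evt.c:816: immd_accept_node: Assertion '...' failed."
ASSERTION = re.compile(
    r"(?P<daemon>\w+)\[(?P<pid>\d+)\]: (?P<file>[\w.]+):(?P<line>\d+): "
    r"(?P<func>\w+): Assertion '(?P<expr>.*)' failed"
)


def join_wrapped(lines):
    """Rejoin records that were wrapped across several lines."""
    records = []
    for raw in lines:
        text = raw.strip()
        if not text:
            continue
        if RECORD_START.match(text) or not records:
            records.append(text)          # a new record begins here
        else:
            records[-1] += " " + text     # continuation of the previous one
    return records


def find_assertions(lines):
    """Return one dict per failed assertion found in the excerpt."""
    found = []
    for record in join_wrapped(lines):
        match = ASSERTION.search(record)
        if match:
            found.append(match.groupdict())
    return found
```

Feeding it the SC-2 excerpt should yield one entry for `immnd_evt_proc_fevs_rcv` (osafimmnd, pid 15114) and one for `immd_accept_node` (osafimmd, pid 12159), in log order.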


Below is the snippet of standby controller SC-1:

Sep 20 22:54:27 SCALE_SLOT-41 osafimmd[9455]: NO Skipping re-send of fevs 
message 12996 since it has recently been resent.
Sep 20 22:54:27 SCALE_SLOT-41 osafimmnd[15931]: NO Global discard node received 
for nodeId:2020f pid:15114
Sep 20 22:54:27 SCALE_SLOT-41 opensaf_reboot: Rebooting remote node in the 
absence of PLM is outside the scope of OpenSAF
Sep 20 22:54:27 SCALE_SLOT-41 osaffmd[9445]: NO Controller Failover: Setting 
role to ACTIVE
Sep 20 22:54:27 SCALE_SLOT-41 osafrded[9436]: NO RDE role set to ACTIVE
Sep 20 22:54:27 SCALE_SLOT-41 osafrded[9436]: NO Running 
'/usr/lib64/opensaf/opensaf_sc_active' with 0 argument(s)
Sep 20 22:54:27 SCALE_SLOT-41 osafimmd[9455]: NO ACTIVE request
Sep 20 22:54:27 SCALE_SLOT-41 osaflogd[9476]: NO ACTIVE request

=============================================================================
Attachments:

1) syslog and imm traces of both controllers.
2) Stacktraces of immd and immnd asserts.





