- **Milestone**: 4.5.2 --> 4.6.2
---
**[tickets:#1529] Node rebooted as saImmOiInitialize_2 failed during middleware active assignment**
**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Thu Oct 08, 2015 07:53 AM UTC by Chani Srivastava
**Last Updated:** Fri Oct 09, 2015 10:54 AM UTC
**Owner:** nobody
**Attachments:**
- [SC1_syslog.txt](https://sourceforge.net/p/opensaf/tickets/1529/attachment/SC1_syslog.txt) (436.4 kB; text/plain)
- [SC2_syslog.txt](https://sourceforge.net/p/opensaf/tickets/1529/attachment/SC2_syslog.txt) (425.6 kB; text/plain)
- [1529.tgz](https://sourceforge.net/p/opensaf/tickets/1529/attachment/1529.tgz) (586.3 kB; application/x-compressed-tar)
Setup:
Changeset-6901
Continuous failovers were invoked on a 4-node cluster with 2 controllers and 2
payloads. All nodes have 64-bit architecture.
2PBE enabled with 25K objects
Issue Observed:
Cluster reset occurred on invoking continuous failovers
Attachments:
Attaching syslogs for SC-1 and SC-2
Traces for immnd and immd can be shared separately if required
Steps:
* Initially SC-1 is active and SC-2 standby
* A test script invoked failover via killing osafclmd on SC1
* SC-2 became active
Oct 7 18:23:32 OSAF-SC1 root: killing osafclmd from invoke_failover.sh
Oct 7 19:25:20 OSAF-SC2 osafamfd[2191]: NO FAILOVER StandBy --> Active
* On the new active controller, saImmOiInitialize_2 failed
Oct 7 19:25:22 OSAF-SC2 osafntfimcnd[2735]: ER ntfimcn_imm_init
saImmOiInitialize_2 failed SA_AIS_ERR_TIMEOUT (5)
Oct 7 19:25:22 OSAF-SC2 osafntfimcnd[2735]: ER ntfimcn_imm_init() Fail
Oct 7 19:25:22 OSAF-SC2 osafimmnd[2131]: NO Implementer connected: 333
(safLckService) <299, 2020f>
Oct 7 19:25:22 OSAF-SC2 osafimmnd[2131]: NO Implementer connected: 334
(safEvtService) <298, 2020f>
Oct 7 19:25:23 OSAF-SC2 osafntfimcnd[2738]: ER ntfimcn_imm_init
saImmOiInitialize_2 failed SA_AIS_ERR_TIMEOUT (5)
Oct 7 19:25:23 OSAF-SC2 osafntfimcnd[2738]: ER ntfimcn_imm_init() Fail
Oct 7 19:25:23 OSAF-SC2 osafimmnd[2131]: WA MDS Send Failed
Oct 7 19:25:23 OSAF-SC2 osafimmnd[2131]: WA Error code 2 returned for message
type 4 - ignoring
* Other services also failed to initialize with IMM on the new active
controller, i.e. SC-2
* Finally, SMF hit a CSI set callback timeout
* SC-2 rebooted, and because SC-2 was the only active controller at the time,
the entire cluster reset
Oct 7 19:25:51 OSAF-SC2 osafamfnd[2205]: NO
'safComp=SMF,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to
'csiSetcallbackTimeout' : Recovery is 'nodeFailfast'
Oct 7 19:25:51 OSAF-SC2 osafamfnd[2205]: ER
safComp=SMF,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due
to:csiSetcallbackTimeout Recovery is:nodeFailfast
Oct 7 19:25:51 OSAF-SC2 osafamfnd[2205]: Rebooting OpenSAF NodeId = 131599 EE
Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId =
131599, SupervisionTime = 60
Oct 7 19:25:51 OSAF-SC2 opensaf_reboot: Rebooting local node; timeout=60
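To check how often the saImmOiInitialize_2 timeout occurs in the attached syslogs, a small script along these lines may help. It is only a sketch: the regular expression assumes the syslog header layout seen in the excerpts above, the function names (`join_records`, `count_oi_init_timeouts`) are made up for illustration, and the joining of continuation lines is there only because the excerpts in this mail are wrapped.

```python
import re
from collections import Counter

# Assumed header layout: "Oct 7 19:25:22 OSAF-SC2 osafntfimcnd[2735]: message"
HEADER = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<proc>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?:\s*(?P<msg>.*)$"
)


def join_records(lines):
    """Parse syslog lines; append wrapped continuation lines to the previous record."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        match = HEADER.match(line)
        if match:
            records.append(match.groupdict())
        elif records and line.strip():
            # No timestamp header: treat as the wrapped tail of the last record.
            records[-1]["msg"] += " " + line.strip()
    return records


def count_oi_init_timeouts(records):
    """Count saImmOiInitialize_2 timeouts per (host, process name)."""
    counts = Counter()
    for rec in records:
        msg = rec["msg"]
        if "saImmOiInitialize_2 failed" in msg and "SA_AIS_ERR_TIMEOUT" in msg:
            counts[(rec["host"], rec["proc"])] += 1
    return counts
```

Usage would be something like `count_oi_init_timeouts(join_records(open("SC2_syslog.txt")))`, which returns a `Counter` keyed by host and process name.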