---

**[tickets:#1110] NTF healthcheck callback timed out leading to node reboot**

**Status:** unassigned
**Milestone:** 4.3.3
**Created:** Thu Sep 18, 2014 07:41 AM UTC by Sirisha Alla
**Last Updated:** Thu Sep 18, 2014 07:41 AM UTC
**Owner:** nobody

This issue is a continuation of ticket #1109.

During failover, the node that went for reboot failed to come up due to #1109.
Just then, the NTF health check callback timed out on the then-active
controller, leading to a cluster reset.

Syslog of SC-2:

Sep 18 12:28:01 SLES-64BIT-SLOT2 osafamfd[2391]: NO FAILOVER StandBy --> Active
Sep 18 12:28:01 SLES-64BIT-SLOT2 osafimmd[2327]: NO ellect_coord invoke from rda_callback ACTIVE
Sep 18 12:28:01 SLES-64BIT-SLOT2 osafimmd[2327]: NO New coord elected, resides at 2020f
Sep 18 12:28:01 SLES-64BIT-SLOT2 osafimmnd[2337]: NO This IMMND is now the NEW Coord
Sep 18 12:28:01 SLES-64BIT-SLOT2 osafimmnd[2337]: NO PBE writing when new coord elected => force PBE to regenerate db file
Sep 18 12:28:01 SLES-64BIT-SLOT2 osafimmnd[2337]: NO STARTING PBE process.
.....
Sep 18 12:28:11 SLES-64BIT-SLOT2 osafamfnd[2401]: NO Assigned 'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Sep 18 12:28:21 SLES-64BIT-SLOT2 osafamfd[2391]: ER sendStateChangeNotificationAvd: saNtfNotificationSend Failed (5)
Sep 18 12:28:31 SLES-64BIT-SLOT2 kernel: [  111.656926] TIPC: Established link <1.1.2:eth0-1.1.1:eth0> on network plane A
Sep 18 12:28:32 SLES-64BIT-SLOT2 osafimmd[2327]: NO New IMMND process is on STANDBY Controller at 2010f
Sep 18 12:28:32 SLES-64BIT-SLOT2 osafimmd[2327]: NO Extended intro from node 2010f
.......

SC-1 then went for reboot because of #1109:

Sep 18 12:29:40 SLES-64BIT-SLOT2 osaffmd[2317]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Received Node Down for peer controller, OwnNodeId = 131599, SupervisionTime = 60
Sep 18 12:29:40 SLES-64BIT-SLOT2 kernel: [  180.896027] TIPC: Resetting link <1.1.2:eth0-1.1.1:eth0>, peer not responding
Sep 18 12:29:40 SLES-64BIT-SLOT2 kernel: [  180.896032] TIPC: Lost link <1.1.2:eth0-1.1.1:eth0> on network plane A
Sep 18 12:29:40 SLES-64BIT-SLOT2 kernel: [  180.896034] TIPC: Lost contact with <1.1.1>
Sep 18 12:29:40 SLES-64BIT-SLOT2 opensaf_reboot: Rebooting remote node in the absence of PLM is outside the scope of OpenSAF
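
The excerpts above print node IDs in two forms: decimal in the `Rebooting OpenSAF NodeId = ...` line and hexadecimal in the IMM lines (`2010f`, `2020f`). As a reading aid only, the short sketch below converts the decimal values so the two forms can be matched up. The helper name `node_id_hex` is hypothetical and is not part of OpenSAF.

```python
# Node IDs copied from the syslog excerpt above.
REBOOTED_NODE_ID = 131343  # "Rebooting OpenSAF NodeId = 131343"
OWN_NODE_ID = 131599       # "OwnNodeId = 131599"


def node_id_hex(node_id: int) -> str:
    """Return a decimal node ID as lowercase hex without the 0x prefix,
    which is how the IMM log lines above print it (hypothetical helper)."""
    return format(node_id, "x")


if __name__ == "__main__":
    print(node_id_hex(REBOOTED_NODE_ID))  # 2010f
    print(node_id_hex(OWN_NODE_ID))       # 2020f
```

Read this way, the node being rebooted (131343) is the one the IMM lines report as the STANDBY controller at `2010f`, and the node writing this syslog (131599) is the coordinator at `2020f`.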


The health check callback then timed out on NTF:

Sep 18 12:33:54 SLES-64BIT-SLOT2 osafamfnd[2401]: NO SU failover probation timer started (timeout: 1200000000000 ns)
Sep 18 12:33:54 SLES-64BIT-SLOT2 osafamfnd[2401]: NO Performing failover of 'safSu=SC-2,safSg=2N,safApp=OpenSAF' (SU failover count: 1)
Sep 18 12:33:54 SLES-64BIT-SLOT2 osafamfnd[2401]: NO 'safComp=NTF,safSu=SC-2,safSg=2N,safApp=OpenSAF' recovery action escalated from 'componentFailover' to 'suFailover'
Sep 18 12:33:54 SLES-64BIT-SLOT2 osafamfnd[2401]: NO 'safComp=NTF,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'healthCheckcallbackTimeout' : Recovery is 'suFailover'
Sep 18 12:33:54 SLES-64BIT-SLOT2 osafamfnd[2401]: ER safComp=NTF,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:healthCheckcallbackTimeout Recovery is:suFailover
Sep 18 12:33:54 SLES-64BIT-SLOT2 osafamfnd[2401]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131599, SupervisionTime = 60
Sep 18 12:33:54 SLES-64BIT-SLOT2 opensaf_reboot: Rebooting local node; timeout=60
Sep 18 12:34:17 SLES-64BIT-SLOT2 syslog-ng[1139]: syslog-ng starting up; version='2.0.9'
Sep 18 12:34:18 SLES-64BIT-SLOT2 ifup:     lo
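
To make the timeline in the excerpts above easier to follow, here is a minimal sketch that computes the gaps between the three key events and converts the probation timer from nanoseconds. It only restates timestamps already present in the logs. The year 2014 is taken from the ticket's creation date because syslog timestamps carry no year, and the helper name `parse_ts` is hypothetical.

```python
from datetime import datetime

YEAR = "2014"  # assumption: taken from the ticket creation date, not from syslog


def parse_ts(ts: str) -> datetime:
    """Parse a syslog timestamp such as 'Sep 18 12:28:01' (hypothetical helper)."""
    return datetime.strptime(f"{YEAR} {ts}", "%Y %b %d %H:%M:%S")


# Timestamps copied from the syslog excerpts above.
failover = parse_ts("Sep 18 12:28:01")   # FAILOVER StandBy --> Active
peer_down = parse_ts("Sep 18 12:29:40")  # Received Node Down for peer controller
ntf_fault = parse_ts("Sep 18 12:33:54")  # NTF healthCheckcallbackTimeout

# Seconds from failover to the NTF fault, and from peer loss to the NTF fault.
since_failover = (ntf_fault - failover).total_seconds()
since_peer_down = (ntf_fault - peer_down).total_seconds()

# SU failover probation timer as logged, converted from ns to seconds.
probation_s = 1_200_000_000_000 / 1e9

if __name__ == "__main__":
    print(since_failover)   # 353.0
    print(since_peer_down)  # 254.0
    print(probation_s)      # 1200.0
```

So the NTF fault was reported a little under six minutes after SC-2 became active and just over four minutes after contact with SC-1 was lost.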

Syslog and MDS logs for both controllers are attached, along with NTFD traces
on SC-2.



---
