---
**[tickets:#1110] NTF healthcheck callback timed out leading to node reboot**
**Status:** unassigned
**Milestone:** 4.3.3
**Created:** Thu Sep 18, 2014 07:41 AM UTC by Sirisha Alla
**Last Updated:** Thu Sep 18, 2014 07:41 AM UTC
**Owner:** nobody
This issue is a continuation of ticket #1109.
During failover, the node that was rebooted failed to come up because of #1109.
Just then, the NTF health check callback timed out on the then-active
controller, leading to a cluster reset.
Syslog of SC-2:
Sep 18 12:28:01 SLES-64BIT-SLOT2 osafamfd[2391]: NO FAILOVER StandBy --> Active
Sep 18 12:28:01 SLES-64BIT-SLOT2 osafimmd[2327]: NO ellect_coord invoke from
rda_callback ACTIVE
Sep 18 12:28:01 SLES-64BIT-SLOT2 osafimmd[2327]: NO New coord elected, resides
at 2020f
Sep 18 12:28:01 SLES-64BIT-SLOT2 osafimmnd[2337]: NO This IMMND is now the NEW
Coord
Sep 18 12:28:01 SLES-64BIT-SLOT2 osafimmnd[2337]: NO PBE writing when new coord
elected => force PBE to regenerate db file
Sep 18 12:28:01 SLES-64BIT-SLOT2 osafimmnd[2337]: NO STARTING PBE process.
.....
Sep 18 12:28:11 SLES-64BIT-SLOT2 osafamfnd[2401]: NO Assigned
'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Sep 18 12:28:21 SLES-64BIT-SLOT2 osafamfd[2391]: ER
sendStateChangeNotificationAvd: saNtfNotificationSend Failed (5)
Sep 18 12:28:31 SLES-64BIT-SLOT2 kernel: [ 111.656926] TIPC: Established link
<1.1.2:eth0-1.1.1:eth0> on network plane A
Sep 18 12:28:32 SLES-64BIT-SLOT2 osafimmd[2327]: NO New IMMND process is on
STANDBY Controller at 2010f
Sep 18 12:28:32 SLES-64BIT-SLOT2 osafimmd[2327]: NO Extended intro from node
2010f
.......
SC-1 was then rebooted because of #1109:
Sep 18 12:29:40 SLES-64BIT-SLOT2 osaffmd[2317]: Rebooting OpenSAF NodeId =
131343 EE Name = , Reason: Received Node Down for peer controller, OwnNodeId =
131599, SupervisionTime = 60
Sep 18 12:29:40 SLES-64BIT-SLOT2 kernel: [ 180.896027] TIPC: Resetting link
<1.1.2:eth0-1.1.1:eth0>, peer not responding
Sep 18 12:29:40 SLES-64BIT-SLOT2 kernel: [ 180.896032] TIPC: Lost link
<1.1.2:eth0-1.1.1:eth0> on network plane A
Sep 18 12:29:40 SLES-64BIT-SLOT2 kernel: [ 180.896034] TIPC: Lost contact with
<1.1.1>
Sep 18 12:29:40 SLES-64BIT-SLOT2 opensaf_reboot: Rebooting remote node in the
absence of PLM is outside the scope of OpenSAF
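For cross-referencing the excerpts above: the decimal NodeId values printed by osaffmd appear to be the same identifiers that osafimmd prints in hexadecimal. A minimal Python sketch of the conversion (the helper name `node_id_to_hex` is hypothetical, used only for illustration):

```python
def node_id_to_hex(node_id: int) -> str:
    """Format a decimal NodeId as lowercase hex, as seen in the osafimmd lines."""
    return format(node_id, "x")

# NodeId values taken from the osaffmd reboot line in the SC-2 syslog.
print(node_id_to_hex(131343))  # 2010f
print(node_id_to_hex(131599))  # 2020f
```

So 131343 matches the node at 2010f (the standby controller, SC-1), and 131599 matches 2020f (SC-2, where the new coord resides).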
The health check callback then timed out on NTF:
Sep 18 12:33:54 SLES-64BIT-SLOT2 osafamfnd[2401]: NO SU failover probation
timer started (timeout: 1200000000000 ns)
Sep 18 12:33:54 SLES-64BIT-SLOT2 osafamfnd[2401]: NO Performing failover of
'safSu=SC-2,safSg=2N,safApp=OpenSAF' (SU failover count: 1)
Sep 18 12:33:54 SLES-64BIT-SLOT2 osafamfnd[2401]: NO
'safComp=NTF,safSu=SC-2,safSg=2N,safApp=OpenSAF' recovery action escalated from
'componentFailover' to 'suFailover'
Sep 18 12:33:54 SLES-64BIT-SLOT2 osafamfnd[2401]: NO
'safComp=NTF,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to
'healthCheckcallbackTimeout' : Recovery is 'suFailover'
Sep 18 12:33:54 SLES-64BIT-SLOT2 osafamfnd[2401]: ER
safComp=NTF,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due
to:healthCheckcallbackTimeout Recovery is:suFailover
Sep 18 12:33:54 SLES-64BIT-SLOT2 osafamfnd[2401]: Rebooting OpenSAF NodeId =
131599 EE Name = , Reason: Component faulted: recovery is node failfast,
OwnNodeId = 131599, SupervisionTime = 60
Sep 18 12:33:54 SLES-64BIT-SLOT2 opensaf_reboot: Rebooting local node;
timeout=60
Sep 18 12:34:17 SLES-64BIT-SLOT2 syslog-ng[1139]: syslog-ng starting up;
version='2.0.9'
Sep 18 12:34:18 SLES-64BIT-SLOT2 ifup: lo
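To put the timings in the excerpts above in perspective, here is a small Python sketch (variable names are illustrative; the year 2014 is assumed from the ticket creation date). It converts the SU failover probation timeout to seconds and measures the gap between the peer-down reboot request and the NTF fault:

```python
from datetime import datetime

# Timestamps copied from the SC-2 syslog; year assumed to be 2014.
peer_down = datetime(2014, 9, 18, 12, 29, 40)  # "Received Node Down for peer controller"
ntf_fault = datetime(2014, 9, 18, 12, 33, 54)  # NTF "healthCheckcallbackTimeout"

gap_seconds = int((ntf_fault - peer_down).total_seconds())
probation_seconds = 1200000000000 // 10**9  # probation timeout is logged in ns

print(gap_seconds)        # 254
print(probation_seconds)  # 1200
```

That is, the NTF fault was logged a little over four minutes after SC-1 was lost, and the probation timer started at that point is 20 minutes long.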
Syslog and MDS logs for both controllers are attached, along with NTFD traces
from SC-2.
---