- **Milestone**: 4.6.2 --> 4.7.2
---
**[tickets:#1729] IMMD crashed on active controller because of health check
timeout**
**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Wed Apr 06, 2016 09:01 AM UTC by Ritu Raj
**Last Updated:** Wed Apr 06, 2016 11:02 AM UTC
**Owner:** nobody
Setup:
Changeset: 7436
Version: OpenSAF 5.0
4 nodes configured with single PBE and a load of 30K objects
Issue Observed:
1) Standby controller did not join the active controller.
2) IMMD on the active controller got a health check timeout.
Steps performed:
* Started OpenSAF on controller SC-1 with the PBE load; SC-1 took the
active role.
* Then started OpenSAF on controller SC-2; SC-2 failed to join the
cluster:
Apr 6 12:54:04 SLES-32BIT-SLOT2 opensafd: Starting OpenSAF Services(5.0.FC - )
(Using TIPC)
Starting OpenSAF Services (Using TIPC):Apr 6 12:54:04 SLES-32BIT-SLOT2 kernel:
[95783.514531] TIPC: Activated (version 2.0.0)
Apr 6 12:54:04 SLES-32BIT-SLOT2 kernel: [95783.514587] NET: Registered
protocol family
..........
Apr 6 12:54:04 SLES-32BIT-SLOT2 osafclmna[28264]: Started
Apr 6 12:54:04 SLES-32BIT-SLOT2 osafclmna[28264]: NO
safNode=SC-2,safCluster=myClmCluster Joined cluster, nodeid=2020f
.........
Apr 6 12:54:04 SLES-32BIT-SLOT2 osafimmnd[28303]: NO IMMD service is UP ...
ScAbsenseAllowed?:0 introduced?:0
Apr 6 12:54:04 SLES-32BIT-SLOT2 osafimmnd[28303]: NO SERVER STATE:
IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Apr 6 12:54:09 SLES-32BIT-SLOT2 osafimmnd[28303]: WA Resending introduce-me -
problems with MDS ? 5
........
Apr 6 12:54:55 SLES-32BIT-SLOT2 osafimmnd[28303]: ER Failed to load/sync.
Giving up after 51 seconds, restarting..
Apr 6 12:54:55 SLES-32BIT-SLOT2 opensafd[28232]: ER Failed DESC:IMMND
Apr 6 12:54:55 SLES-32BIT-SLOT2 opensafd[28232]: ER Going for recovery
Apr 6 12:54:55 SLES-32BIT-SLOT2 opensafd[28232]: ER Trying To RESPAWN
/usr/lib/opensaf/clc-cli/osaf-immnd attempt #1
Apr 6 12:54:55 SLES-32BIT-SLOT2 opensafd[28232]: ER Sending SIGKILL to IMMND,
pid=28297
Apr 6 12:54:55 SLES-32BIT-SLOT2 osafimmnd[28303]: ER IMMND - Periodic server
job failed
Apr 6 12:54:55 SLES-32BIT-SLOT2 osafimmnd[28303]: ER Failed, exiting...
Apr 6 12:55:10 SLES-32BIT-SLOT2 osafimmnd[28340]: Started
........
Apr 6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me -
problems with MDS ? 50
Apr 6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me -
problems with MDS ? 50
Apr 6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me -
problems with MDS ? 50
Apr 6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me -
problems with MDS ? 50
Apr 6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me -
problems with MDS ? 50
Apr 6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me -
problems with MDS ? 50
Apr 6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me -
problems with MDS ? 50
Apr 6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me -
problems with MDS ? 50
....
Apr 6 12:57:07 SLES-32BIT-SLOT2 opensafd: Starting OpenSAF failed
* After opensafd failed to come up on SC-2, SC-1 rebooted because of an IMMD
health check timeout:
Apr 6 12:57:09 SLES-64BIT-SLOT1 osafamfnd[2211]: NO Performing failover of
'safSu=SC-1,safSg=2N,safApp=OpenSAF' (SU failover count: 1)
Apr 6 12:57:09 SLES-64BIT-SLOT1 osafamfnd[2211]: NO
'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' recovery action escalated
from 'componentFailover' to 'suFailover'
Apr 6 12:57:09 SLES-64BIT-SLOT1 osafamfnd[2211]: NO
'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to
'healthCheckcallbackTimeout' : Recovery is 'suFailover'
**Apr 6 12:57:09 SLES-64BIT-SLOT1 osafamfnd[2211]: ER
safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due
to:healthCheckcallbackTimeout Recovery is:suFailover**
Apr 6 12:57:09 SLES-64BIT-SLOT1 osafamfnd[2211]: Rebooting OpenSAF NodeId =
131343 EE Name = , Reason: Component faulted: recovery is node failfast,
OwnNodeId = 131343, SupervisionTime = 60
Apr 6 12:57:09 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting local node;
timeout=60
This issue occurs randomly; it is not reproducible on every attempt.
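For triaging syslog excerpts like the ones above, a small script can rejoin the mail-wrapped lines and count the IMMND "Resending introduce-me" warnings per process, alongside any `healthCheckcallbackTimeout` faults. This is only a sketch and is not part of OpenSAF: the `unwrap` and `summarize` helpers and the record regex are assumptions based on the log format shown in this ticket.

```python
import re
from collections import Counter

# Start of a syslog record as it appears in the excerpts above, e.g.
# "Apr 6 12:54:55 SLES-32BIT-SLOT2 osafimmnd[28303]: ER Failed, exiting..."
# (assumed format; adjust if your syslog template differs)
RECORD = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2} +\d+ \d\d:\d\d:\d\d) (?P<host>\S+) "
    r"(?P<proc>[\w-]+)(?:\[(?P<pid>\d+)\])?: (?P<msg>.*)$"
)


def unwrap(lines):
    """Join mail-wrapped continuation lines back onto their record."""
    records = []
    for line in lines:
        # Drop surrounding whitespace and markdown bold markers.
        line = line.strip().strip("*")
        if RECORD.match(line):
            records.append(line)
        elif records and line and not set(line) <= {"."}:
            # Not a new record and not an elision row of dots:
            # treat it as the wrapped tail of the previous record.
            records[-1] += " " + line
    return records


def summarize(lines):
    """Return (introduce-me retries per (host, pid), health check faults)."""
    retries = Counter()
    faults = []
    for rec in unwrap(lines):
        m = RECORD.match(rec)
        if "Resending introduce-me" in m["msg"]:
            retries[(m["host"], m["pid"])] += 1
        if "healthCheckcallbackTimeout" in m["msg"]:
            faults.append((m["ts"], m["host"]))
    return retries, faults
```

Calling `summarize(open("messages").read().splitlines())` on a saved log (the file name is a placeholder) gives a quick view of how long IMMND kept retrying before the IMMD fault was reported.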