[tickets] [opensaf:tickets] #1729 Immd crashed on Active controller because of health check timeout

2016-04-06 Thread Ritu Raj



---

**[tickets:#1729] Immd crashed on Active controller because of health check timeout**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Wed Apr 06, 2016 09:01 AM UTC by Ritu Raj
**Last Updated:** Wed Apr 06, 2016 09:01 AM UTC
**Owner:** nobody


Setup:
Changeset: 7436
Version: OpenSAF 5.0
4 nodes, configured with a single PBE and loaded with 30K objects

Issue Observed:
1) The standby controller (SC-2) did not join the active controller (SC-1).
2) IMMD on the active controller hit a health check timeout.

Steps performed:
* Started OpenSAF on controller SC-1 with the PBE load; SC-1 took the active role.

* Then started OpenSAF on controller SC-2; SC-2 failed to join the cluster:

Apr  6 12:54:04 SLES-32BIT-SLOT2 opensafd: Starting OpenSAF Services(5.0.FC - ) (Using TIPC)
Starting OpenSAF Services (Using TIPC):Apr  6 12:54:04 SLES-32BIT-SLOT2 kernel: [95783.514531] TIPC: Activated (version 2.0.0)
Apr  6 12:54:04 SLES-32BIT-SLOT2 kernel: [95783.514587] NET: Registered protocol family
..
Apr  6 12:54:04 SLES-32BIT-SLOT2 osafclmna[28264]: Started
Apr  6 12:54:04 SLES-32BIT-SLOT2 osafclmna[28264]: NO safNode=SC-2,safCluster=myClmCluster Joined cluster, nodeid=2020f
.
Apr  6 12:54:04 SLES-32BIT-SLOT2 osafimmnd[28303]: NO IMMD service is UP ... ScAbsenseAllowed?:0 introduced?:0
Apr  6 12:54:04 SLES-32BIT-SLOT2 osafimmnd[28303]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Apr  6 12:54:09 SLES-32BIT-SLOT2 osafimmnd[28303]: WA Resending introduce-me - problems with MDS ? 5


Apr  6 12:54:55 SLES-32BIT-SLOT2 osafimmnd[28303]: ER Failed to load/sync. Giving up after 51 seconds, restarting..
Apr  6 12:54:55 SLES-32BIT-SLOT2 opensafd[28232]: ER Failed   DESC:IMMND
Apr  6 12:54:55 SLES-32BIT-SLOT2 opensafd[28232]: ER Going for recovery
Apr  6 12:54:55 SLES-32BIT-SLOT2 opensafd[28232]: ER Trying To RESPAWN /usr/lib/opensaf/clc-cli/osaf-immnd attempt #1
Apr  6 12:54:55 SLES-32BIT-SLOT2 opensafd[28232]: ER Sending SIGKILL to IMMND, pid=28297
Apr  6 12:54:55 SLES-32BIT-SLOT2 osafimmnd[28303]: ER IMMND - Periodic server job failed
Apr  6 12:54:55 SLES-32BIT-SLOT2 osafimmnd[28303]: ER Failed, exiting...
Apr  6 12:55:10 SLES-32BIT-SLOT2 osafimmnd[28340]: Started

Apr  6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me - problems with MDS ? 50
Apr  6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me - problems with MDS ? 50
Apr  6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me - problems with MDS ? 50
Apr  6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me - problems with MDS ? 50
Apr  6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me - problems with MDS ? 50
Apr  6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me - problems with MDS ? 50
Apr  6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me - problems with MDS ? 50
Apr  6 12:56:00 SLES-32BIT-SLOT2 osafimmnd[28340]: WA Resending introduce-me - problems with MDS ? 50

Apr  6 12:57:07 SLES-32BIT-SLOT2 opensafd: Starting OpenSAF failed

* After opensafd failed to come up on SC-2, SC-1 rebooted because of an IMMD health check timeout:

Apr  6 12:57:09 SLES-64BIT-SLOT1 osafamfnd[2211]: NO Performing failover of 'safSu=SC-1,safSg=2N,safApp=OpenSAF' (SU failover count: 1)
Apr  6 12:57:09 SLES-64BIT-SLOT1 osafamfnd[2211]: NO 'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' recovery action escalated from 'componentFailover' to 'suFailover'
Apr  6 12:57:09 SLES-64BIT-SLOT1 osafamfnd[2211]: NO 'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'healthCheckcallbackTimeout' : Recovery is 'suFailover'
**Apr  6 12:57:09 SLES-64BIT-SLOT1 osafamfnd[2211]: ER safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:healthCheckcallbackTimeout  Recovery is:suFailover**
Apr  6 12:57:09 SLES-64BIT-SLOT1 osafamfnd[2211]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131343, SupervisionTime = 60
Apr  6 12:57:09 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting local node; timeout=60

This issue is intermittent; it does not reproduce on every run.


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.--
___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #1729 Immd crashed on Active controller because of health check timeout

2016-04-06 Thread Ritu Raj
Syslogs and IMM traces from both controllers are attached.


Attachments:

- [1729.tar.bz2](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/9df5c75c/2faa/attachment/1729.tar.bz2) (5.5 MB; application/x-bzip)



[tickets] [opensaf:tickets] #1729 Immd crashed on Active controller because of health check timeout

2016-05-04 Thread Mathi Naickan
- **Milestone**: 4.6.2 --> 4.7.2



---

**[tickets:#1729] Immd crashed on Active controller because of health check timeout**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Wed Apr 06, 2016 09:01 AM UTC by Ritu Raj
**Last Updated:** Wed Apr 06, 2016 11:02 AM UTC
**Owner:** nobody

