- **status**: unassigned --> accepted
- **assigned_to**: Mathi Naickan
- **Priority**: major --> critical
---
** [tickets:#1762] CLM : Healthy payloads are marked as Non-member nodes after failover**
**Status:** accepted
**Milestone:** 5.0.RC2
**Created:** Thu Apr 14, 2016 11:08 AM UTC by Srikanth R
**Last Updated:** Fri Apr 22, 2016 12:31 PM UTC
**Owner:** Mathi Naickan
Setup :
Changeset : 7436 5.0.FC
5-node cluster with the AMF application deployed on PL-3 and PL-4.
Issue :
Healthy payloads are marked as Non-member nodes after failover
Steps performed :
* Started OpenSAF on all the nodes, i.e. SC-1 to PL-5.
* Initially brought up AMF application deployed on PL-3 and PL-4
* Ran some tests on the setup including switchovers, failovers and CLM lock
operations on PL-3 and PL-4.
* Restarted opensafd on PL-4. After the restart, the AMF applications on PL-3
got the corresponding standby assignment, as expected.
Below is the trace from osafclmd:
Apr 14 14:15:45.621396 osafclmd [6745:clms_ntf.c:0180] TR Notification for CLM
node safNode=PL-4,safCluster=myClmCluster exit
Apr 14 14:15:56.548867 osafclmd [6745:clms_ntf.c:0142] TR Notification for CLM
node safNode=PL-4,safCluster=myClmCluster Join
* Similarly restarted opensafd on PL-3 and the AMF application came up fine.
Apr 14 14:16:00.890903 osafclmd [6745:clms_ntf.c:0180] TR Notification for CLM
node safNode=PL-3,safCluster=myClmCluster exit
Apr 14 14:21:41.602270 osafclmd [6745:clms_ntf.c:0142] TR Notification for CLM
node safNode=PL-3,safCluster=myClmCluster Join
* Now induced a failover by killing ckptd on the active controller SC-1.
* SC-2 took active role.
Apr 14 14:21:44 CONTROLLER-2 osafamfd[22600]: NO FAILOVER StandBy --> Active
* However, AMF marked the two payloads PL-3 and PL-4 as out of the cluster.
PL-5 is still part of the cluster.
Apr 14 14:21:45 CONTROLLER-2 osafamfd[22600]: NO Node 'PL-4' left the cluster
Apr 14 14:21:45 CONTROLLER-2 osafamfd[22600]: NO Node 'PL-3' left the cluster
Apr 14 14:21:45 CONTROLLER-2 osafamfd[22600]: WA avd_msg_sanity_chk: invalid
node ID (2030f)
* Below is the osafclmd trace showing the exit of PL-3 and PL-4, just after the
promotion to active:
Apr 14 14:21:45.009100 osafclmd [22590:clms_ntf.c:0180] TR Notification for
CLM node safNode=PL-4,safCluster=myClmCluster exit
Apr 14 14:21:45.136368 osafclmd [22590:clms_ntf.c:0180] TR Notification for
CLM node safNode=PL-3,safCluster=myClmCluster exit
* The AMF applications on PL-3 and PL-4 did not receive any CSI removal
callback during the failover, but the AMF nodes are marked as disabled and the
attribute saClmNodeIsMember of the CLM objects PL-3 and PL-4 is set to 0. The
opensafd status does not show PL-3 and PL-4.
* CLM API calls on PL-3 and PL-4 failed with ERR_UNAVAILABLE, whereas the APIs
of other services such as CKPT and MQSV did not fail.
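
To make the sequence of membership events easier to follow, here is a minimal Python sketch that scans osafclmd trace lines of the form quoted above and reports the last join/exit notification seen per node. It is not part of OpenSAF: the function name `last_membership_events`, the regular expression and the `TRACE` list are illustrative assumptions. The sample lines are the ones from this ticket with the e-mail line wrapping undone.

```python
import re

# Assumed line layout, taken from the traces quoted in this ticket:
# "<date> osafclmd [<pid>:<file>:<line>] TR Notification for CLM node
#  safNode=<node>,safCluster=<cluster> <exit|Join>"
NOTIF_RE = re.compile(
    r"osafclmd \[(?P<pid>\d+):[^\]]*\] TR Notification for CLM node "
    r"safNode=(?P<node>[^,]+),\S+ (?P<event>exit|join)\s*$",
    re.IGNORECASE,
)


def last_membership_events(lines):
    """Return {node: (event, pid)} for the last join/exit seen per node.

    Hypothetical helper for reading the trace; lines that do not match
    the notification pattern are ignored.
    """
    last = {}
    for line in lines:
        match = NOTIF_RE.search(line)
        if match:
            last[match.group("node")] = (
                match.group("event").lower(),
                match.group("pid"),
            )
    return last


# Trace lines from this ticket, in chronological order, unwrapped.
TRACE = [
    "Apr 14 14:15:45.621396 osafclmd [6745:clms_ntf.c:0180] TR Notification for CLM node safNode=PL-4,safCluster=myClmCluster exit",
    "Apr 14 14:15:56.548867 osafclmd [6745:clms_ntf.c:0142] TR Notification for CLM node safNode=PL-4,safCluster=myClmCluster Join",
    "Apr 14 14:16:00.890903 osafclmd [6745:clms_ntf.c:0180] TR Notification for CLM node safNode=PL-3,safCluster=myClmCluster exit",
    "Apr 14 14:21:41.602270 osafclmd [6745:clms_ntf.c:0142] TR Notification for CLM node safNode=PL-3,safCluster=myClmCluster Join",
    "Apr 14 14:21:45.009100 osafclmd [22590:clms_ntf.c:0180] TR Notification for CLM node safNode=PL-4,safCluster=myClmCluster exit",
    "Apr 14 14:21:45.136368 osafclmd [22590:clms_ntf.c:0180] TR Notification for CLM node safNode=PL-3,safCluster=myClmCluster exit",
]

if __name__ == "__main__":
    for node, (event, pid) in sorted(last_membership_events(TRACE).items()):
        print(f"{node}: last event '{event}' reported by osafclmd pid {pid}")
```

With the lines above, the last event for both PL-3 and PL-4 is an exit logged by osafclmd PID 22590, the PID that appears in the traces after the failover, whereas the earlier exit/join pairs were logged by PID 6745.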