- **Milestone**: 4.3.3 --> 4.4.2
---
**[tickets:#707] Quiesced controller failed to become Active when the standby
controller rebooted in middle of switchover**
**Status:** unassigned
**Milestone:** 4.4.2
**Created:** Fri Jan 03, 2014 03:34 PM UTC by Sirisha Alla
**Last Updated:** Fri Sep 19, 2014 06:54 AM UTC
**Owner:** Nagendra Kumar
The issue is observed on changeset 4733 plus the #220 patches corresponding to
cs 4741 and cs 4742. The test setup is four SLES 64-bit VMs. The setup has
single PBE enabled and is loaded with 25k objects.
The steps followed to reproduce the issue are:
1) Trigger a middleware switchover. Make sure that the IMMND coordinator is on
the standby controller before triggering the switchover.
2) Reboot the standby controller when the active controller has just moved to
quiesced.
The test was tried multiple times, and a different error was seen each time:
1) AMFD received BAD_HANDLE from IMM. Here SLOT2(SC-2) is the active controller
at the beginning of the test
Jan 3 14:42:13 SLES-64BIT-SLOT2 osafimmpbed: NO Successfully opened
pre-existing sqlite pbe file /home/sirisha/immsv/immpbe/imm.db
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafamfd[2463]: ER Failed to stop cluster
tracking 5
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafamfd[2463]: ER ClmTrack stop failed
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafrded[2375]: NO rde_rde_set_role: role set
to 3
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafamfd[2463]: NO Node 'SC-1' left the cluster
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafamfnd[2473]: NO Assigning
'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafntfimcnd[8884]: NO exiting on signal 15
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer disconnected
74 <445, 2020f> (@OpenSafImmReplicatorB)
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafntfd[2430]: NO handle_state_ntfimcn:
osafntfimcnd process terminated. State change
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer connected: 77
(safMsgGrpService) <320, 2020f>
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer connected: 78
(safCheckPointService) <304, 2020f>
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer connected: 79
(safEvtService) <305, 2020f>
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer connected: 80
(safLckService) <303, 2020f>
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO Backup create cmd =
/usr/lib64/opensaf/smf-backup-create
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO Bundle check cmd =
/usr/lib64/opensaf/smf-bundle-check
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafamfd[2463]: NO FAILOVER Quiesced --> Active
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer connected: 81
(MsgQueueService131343) <451, 2020f>
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO Node check cmd =
/usr/lib64/opensaf/smf-node-check
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafamfd[2463]: ER ncs_mbcsv_svc
NCS_MBCSV_OP_CHG_ROLE 1 failed
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer locally
disconnected. Marking it as doomed 81 <451, 2020f> (MsgQueueService131343)
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO SMF repository check cmd =
/usr/lib64/opensaf/smf-repository-check
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer disconnected
81 <451, 2020f> (MsgQueueService131343)
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO Cluster reboot cmd =
/usr/lib64/opensaf/smf-cluster-reboot
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer (applier)
connected: 82 (@OpenSafImmReplicatorA) <453, 2020f>
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO Admin Op Timeout =
600000000000
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer disconnected
59 <11, 2020f> (safAmfService)
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO Cli Timeout = 600000000000
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafamfd[2463]: NO Re-initializing with IMM
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafntfimcnd[8918]: NO Started
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO Reboot Timeout =
600000000000
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer connected: 83
(safAmfService) <11, 2020f>
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO SMF will use the STEP
standard set of actions.
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafamfd[2463]: ER Impl Set Failed for
SaAmfCompBaseType, returned 9
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO DN for si_swap operation =
safSi=SC-2N,safApp=OpenSAF
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafamfd[2463]: ER exiting since
avd_imm_impl_set failed
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO SI si_swap operation max
retry = 200
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO Max num of campaign
restarts = 10
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO IMM persist command =
immdump /etc/opensaf/imm.xml
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO Node reboot cmd = reboot
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO Turn PBE off during upgrade
= 1
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO Verify Enable = 0
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO Verify Timeout =
100000000000
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer connected: 84
(safSmfService) <299, 2020f>
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafamfnd[2473]: NO Assigned
'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafamfnd[2473]: ER AMF director unexpectedly
crashed
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer locally
disconnected. Marking it as doomed 83 <11, 2020f> (safAmfService)
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafamfnd[2473]: Rebooting OpenSAF NodeId =
131599 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest)
received, OwnNodeId = 131599, SupervisionTime = 60
2) AMFD received ERR_LIBRARY from IMM. Here SLOT2(SC-2) is the active
controller at the beginning of the test
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafrded[2359]: NO rde_rde_set_role: role set
to 3
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafamfd[2454]: NO Node 'SC-1' left the cluster
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafamfnd[2468]: NO Assigning
'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafntfimcnd[2991]: NO exiting on signal 15
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafimmnd[2388]: NO Implementer connected: 30
(safMsgGrpService) <315, 2020f>
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafimmnd[2388]: NO Implementer connected: 31
(safCheckPointService) <332, 2020f>
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafimmnd[2388]: NO Implementer disconnected
26 <453, 2020f> (@OpenSafImmReplicatorA)
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafntfd[2418]: NO handle_state_ntfimcn:
osafntfimcnd process terminated. State change
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafimmnd[2388]: NO Implementer connected: 32
(safLckService) <316, 2020f>
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafimmnd[2388]: NO Implementer connected: 33
(safEvtService) <331, 2020f>
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO Backup create cmd =
/usr/lib64/opensaf/smf-backup-create
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO Bundle check cmd =
/usr/lib64/opensaf/smf-bundle-check
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO Node check cmd =
/usr/lib64/opensaf/smf-node-check
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO SMF repository check cmd =
/usr/lib64/opensaf/smf-repository-check
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO Cluster reboot cmd =
/usr/lib64/opensaf/smf-cluster-reboot
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO Admin Op Timeout =
600000000000
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO Cli Timeout = 600000000000
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO Reboot Timeout =
600000000000
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO SMF will use the STEP
standard set of actions.
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO DN for si_swap operation =
safSi=SC-2N,safApp=OpenSAF
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO SI si_swap operation max
retry = 200
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO Max num of campaign
restarts = 10
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO IMM persist command =
immdump /etc/opensaf/imm.xml
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO Node reboot cmd = reboot
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO Turn PBE off during upgrade
= 1
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO Verify Enable = 0
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO Verify Timeout =
100000000000
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafamfd[2454]: NO FAILOVER Quiesced --> Active
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafamfd[2454]: ER ncs_mbcsv_svc
NCS_MBCSV_OP_CHG_ROLE 1 failed
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafimmnd[2388]: NO Implementer connected: 34
(MsgQueueService131343) <456, 2020f>
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafimmnd[2388]: NO Implementer locally
disconnected. Marking it as doomed 34 <456, 2020f> (MsgQueueService131343)
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafimmnd[2388]: NO Implementer disconnected 4
<22, 2020f> (safAmfService)
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafamfd[2454]: NO Re-initializing with IMM
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafamfd[2454]: ER saImmOiImplementerSet
failed 2
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafamfd[2454]: ER exiting since
avd_imm_impl_set failed
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafamfnd[2468]: ER AMF director unexpectedly
crashed
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafamfnd[2468]: Rebooting OpenSAF NodeId =
131599 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest)
received, OwnNodeId = 131599, SupervisionTime = 60
Jan 3 15:28:28 SLES-64BIT-SLOT2 opensaf_reboot: Rebooting local node;
timeout=60
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafimmnd[2388]: NO Implementer disconnected
34 <456, 2020f> (MsgQueueService131343)
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafimmnd[2388]: WA IMMND - Client Node Get
Failed for cli_hdl 94489412111
Jan 3 15:28:29 SLES-64BIT-SLOT2 osafntfimcnd[3021]: ER ntfimcn_imm_init
Becoming an applier failed SA_AIS_ERR_TIMEOUT (5)
Jan 3 15:28:29 SLES-64BIT-SLOT2 osafntfimcnd[3021]: ER ntfimcn_imm_init() Fail
Jan 3 15:28:29 SLES-64BIT-SLOT2 osafimmnd[2388]: WA MDS Send Failed
Jan 3 15:28:29 SLES-64BIT-SLOT2 osafimmnd[2388]: WA Error code 2 returned for
message type 6 - ignoring
Jan 3 15:28:29 SLES-64BIT-SLOT2 osafimmnd[2388]: WA ERR_BAD_HANDLE: Client
1967095153167 not found in server
Jan 3 15:28:30 SLES-64BIT-SLOT2 osafntfimcnd[3042]: ER ntfimcn_imm_init
saImmOiInitialize_2 failed SA_AIS_ERR_TIMEOUT (5)
Jan 3 15:28:30 SLES-64BIT-SLOT2 osafntfimcnd[3042]: ER ntfimcn_imm_init() Fail
Jan 3 15:28:31 SLES-64BIT-SLOT2 kernel: [ 198.527931] md: stopping all md
devices.
Jan 3 15:28:31 SLES-64BIT-SLOT2 osafimmnd[2388]: WA MDS Send Failed
Jan 3 15:28:31 SLES-64BIT-SLOT2 osafimmnd[2388]: WA Error code 2 returned for
message type 40 - ignoring
Jan 3 15:28:31 SLES-64BIT-SLOT2 osafntfimcnd[3045]: ER ntfimcn_imm_init
saImmOiInitialize_2 failed SA_AIS_ERR_TIMEOUT (5)
Jan 3 15:28:31 SLES-64BIT-SLOT2 osafntfimcnd[3045]: ER ntfimcn_imm_init() Fail
3) AMFD received ERR_TIMEOUT from IMM. Here SLOT1(SC-1) is the active
controller at the beginning of the test
Jan 3 15:25:06 SLES-64BIT-SLOT1 osafimmd[3806]: NO Coord re-elected, resides
at 2010f
Jan 3 15:25:06 SLES-64BIT-SLOT1 osafimmnd[3816]: NO This IMMND re-elected
coord redundantly, failover ?
Jan 3 15:25:06 SLES-64BIT-SLOT1 osafimmnd[3816]: NO Implementer disconnected
25 <4, 2010f> (@safLogService)
Jan 3 15:25:06 SLES-64BIT-SLOT1 osafimmnd[3816]: NO Implementer connected: 28
(safClmService) <15, 2010f>
Jan 3 15:25:06 SLES-64BIT-SLOT1 osafimmnd[3816]: NO Implementer connected: 29
(safLogService) <4, 2010f>
Jan 3 15:25:06 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting remote node in the
absence of PLM is outside the scope of OpenSAF
Jan 3 15:25:06 SLES-64BIT-SLOT1 osafrded[3787]: NO rde_rde_set_role: role set
to 1
Jan 3 15:25:06 SLES-64BIT-SLOT1 osafclmd[3860]: NO ACTIVE request
Jan 3 15:25:13 SLES-64BIT-SLOT1 osafamfd[3882]: ER FAILOVER Active -->
Quiesced FAILED, ImplementerClear failed 5
Jan 3 15:25:13 SLES-64BIT-SLOT1 osafamfd[3882]: role.cc:583:
avd_mds_qsd_role_evh: Assertion '0' failed.
Jan 3 15:25:13 SLES-64BIT-SLOT1 osafamfnd[3892]: ER AMF director unexpectedly
crashed
Jan 3 15:25:13 SLES-64BIT-SLOT1 osafamfnd[3892]: Rebooting OpenSAF NodeId =
131343 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest)
received, OwnNodeId = 131343, SupervisionTime = 60
Jan 3 15:25:13 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting local node;
timeout=60
Jan 3 15:25:14 SLES-64BIT-SLOT1 osafimmnd[3816]: NO Implementer locally
disconnected. Marking it as doomed 4 <21, 2010f> (safAmfService)
Jan 3 15:25:14 SLES-64BIT-SLOT1 osafimmnd[3816]: NO Implementer disconnected 4
<21, 2010f> (safAmfService)
Jan 3 15:25:17 SLES-64BIT-SLOT1 kernel: [ 1471.089956] md: stopping all md
devices.
Jan 3 15:25:17 SLES-64BIT-SLOT1 kernel: [ 1472.120172] sd 0:0:0:0: [sda]
Synchronizing SCSI cache
Jan 3 15:25:17 SLES-64BIT-SLOT1 kernel: [ 1472.219424] ohci_hcd 0000:00:06.0:
PCI INT A disabled
Jan 3 15:25:17 SLES-64BIT-SLOT1 osafclmd[3860]: ER clms_mds_msg_send FAILED: 2
Jan 3 15:25:17 SLES-64BIT-SLOT1 osafclmd[3860]: ER
clms_clma_api_msg_dispatcher FAILED: type 0
No traces were enabled when issue (1) was observed. Issue (3) could be the
same issue as #405.
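
The three cases above differ mainly in the numeric code that AMFD logs when an
IMM call fails (9, 2 and 5). As a reading aid, here is a minimal Python sketch
that re-joins the wrapped syslog lines, picks out the osafamfd ER records that
end in a numeric code, and maps the code to its SaAisErrorT name. The helper
names (join_wrapped, decode_amfd_errors) are hypothetical and not part of
OpenSAF. Treating every trailing number as an SaAisErrorT value is an
assumption; it agrees with the BAD_HANDLE, ERR_LIBRARY and ERR_TIMEOUT labels
used for the three cases.

```python
import re

# Subset of SaAisErrorT values as defined in the SAF AIS header saAis.h.
SA_AIS_ERRORS = {
    1: "SA_AIS_OK",
    2: "SA_AIS_ERR_LIBRARY",
    5: "SA_AIS_ERR_TIMEOUT",
    6: "SA_AIS_ERR_TRY_AGAIN",
    9: "SA_AIS_ERR_BAD_HANDLE",
}

# A syslog record starts with "Mon D HH:MM:SS ", e.g. "Jan 3 14:42:14 ".
RECORD_START = re.compile(r"^[A-Z][a-z]{2} +\d+ \d{2}:\d{2}:\d{2} ")

# osafamfd error records that end in "returned <n>" or "failed <n>".
# Assumption: the trailing number is an SaAisErrorT value.
AMFD_ERROR = re.compile(
    r"osafamfd\[\d+\]: ER (?P<msg>.*?)(?:returned|failed) (?P<code>\d+)$"
)


def join_wrapped(lines):
    """Re-join records that the mail wrapped onto several lines.

    Any line that does not start with a syslog timestamp is appended to the
    previous record, so free-text lines between log excerpts are absorbed too.
    """
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line.strip()
    return records


def decode_amfd_errors(lines):
    """Return (message, code, name) for each matching osafamfd ER record."""
    decoded = []
    for record in join_wrapped(lines):
        match = AMFD_ERROR.search(record)
        if match:
            code = int(match.group("code"))
            name = SA_AIS_ERRORS.get(code, "unknown")
            decoded.append((match.group("msg").strip(" ,"), code, name))
    return decoded


if __name__ == "__main__":
    sample = [
        "Jan 3 14:42:14 SLES-64BIT-SLOT2 osafamfd[2463]: ER Impl Set Failed for",
        "SaAmfCompBaseType, returned 9",
        "Jan 3 15:28:28 SLES-64BIT-SLOT2 osafamfd[2454]: ER saImmOiImplementerSet",
        "failed 2",
    ]
    for message, code, name in decode_amfd_errors(sample):
        print(f"{code} = {name}: {message}")
```

Records such as "ncs_mbcsv_svc NCS_MBCSV_OP_CHG_ROLE 1 failed" are skipped on
purpose, because the number there is not a trailing return code.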
---
Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is
subscribed to https://sourceforge.net/p/opensaf/tickets/
To unsubscribe from further messages, a project admin can change settings at
https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a
mailing list, you can unsubscribe from the mailing list.
_______________________________________________
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets