Hi Nags,

Do you agree with the point I added to this ticket?

    The likely cause is that AMFD attempts an RT update using the
    oi-handle after it has released the implementer and before it has
    restored it. An saImmOiRtObjectUpdate on an oi-handle that has no
    implementer name, or only an applier name, results in an error.

    The AMFD should maintain a state variable describing the detailed
    state of its oi-handle: not simply initialized or not, but also
    primary-implementer-is-set, applier-is-set, no-implementer-is-set,
    or handle-not-initialized.

That is, yes, you probably do need some new state associated with the
oi-handle, to keep track of whether or not an implementer has been set on
the handle yet. I assume that the AMF thread handling BAD_HANDLE only
re-initializes the oi-handle and does not set the implementer. So the only
thing the bad-handle thread needs to do is to set your new
handle-info-state-for-impl-set-or-not to false. It should do this even if
it is the bad-handle thread itself that goes on to set the implementer.
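
Below is a minimal sketch of that bookkeeping. It assumes hypothetical names: `oi_handle_state`, `oi_handle_info` and the `oi_info_*` functions are illustrative, not AMFD identifiers. It uses the four states from the ticket comment above, stored in a single C11 atomic rather than behind a mutex, so neither thread blocks on the flag itself. Note that it does not serialize the IMM calls made on the handle; that would still need separate care.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical detailed oi-handle states, following the ticket comment;
 * the names are illustrative, not OpenSAF identifiers. */
typedef enum {
    OI_HANDLE_NOT_INITIALIZED,
    OI_NO_IMPLEMENTER_SET,
    OI_APPLIER_SET,
    OI_PRIMARY_IMPLEMENTER_SET
} oi_handle_state;

/* Shared by the AMFD main thread and the BAD_HANDLE re-init thread.
 * One atomic int keeps reads and writes of the state consistent
 * without making either thread wait on a mutex. */
typedef struct {
    atomic_int state;
} oi_handle_info;

void oi_info_init(oi_handle_info *info)
{
    atomic_init(&info->state, OI_HANDLE_NOT_INITIALIZED);
}

/* Called by whichever thread has just re-initialized the handle after
 * BAD_HANDLE. A fresh handle never carries an implementer name, so the
 * state drops to "no implementer" even if this same thread goes on to
 * set the implementer itself. */
void oi_info_mark_reinitialized(oi_handle_info *info)
{
    atomic_store(&info->state, OI_NO_IMPLEMENTER_SET);
}

/* Called only after saImmOiImplementerSet has returned SA_AIS_OK;
 * 'applier' is true when the name set was an applier name ("@..."). */
void oi_info_mark_implementer_set(oi_handle_info *info, bool applier)
{
    assert(atomic_load(&info->state) != OI_HANDLE_NOT_INITIALIZED);
    atomic_store(&info->state,
                 applier ? OI_APPLIER_SET : OI_PRIMARY_IMPLEMENTER_SET);
}

/* Called only after saImmOiImplementerClear has returned SA_AIS_OK. */
void oi_info_mark_implementer_cleared(oi_handle_info *info)
{
    atomic_store(&info->state, OI_NO_IMPLEMENTER_SET);
}

/* Guard checked before saImmOiRtObjectUpdate: only the primary
 * implementer may update RT attributes; any other state would make the
 * call an interface violation. */
bool oi_info_rt_update_allowed(oi_handle_info *info)
{
    return atomic_load(&info->state) == OI_PRIMARY_IMPLEMENTER_SET;
}
```

With this guard in place, an RT update is simply not issued in the window between releasing and restoring the implementer.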

I am not sure what you mean by "Imm should (rather must) not give Bad_handle/TO
in regular cases".
Currently the IMM returns BAD_HANDLE for:

    1 - the interface-violation case specified by SAF for an invalid handle
        (e.g. the handle was closed or never initialized);
    2 - the interface-violation case specified by SAF where the handle is
        valid but not in the correct state (the case of this ticket: the
        handle is initialized, but no implementer has been set on the
        oi-handle when an OI operation is done);
    3 - the "handle closed by server side" case needed for OpenSAF, i.e.
        the IMMND restarted.

I am not sure which of these you call "regular" and do not want to get
BAD_HANDLE for :-)

In this ticket I added a reference to ticket #1064 (enhancement), indicating
that for the state-error case (2) we should instead return an unambiguous
state error code, namely BAD_OPERATION.
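
Below is a sketch of the classification that #1064 would give. It assumes a hypothetical view of the handle (`oi_handle_view`) and local error constants whose values mirror SaAisErrorT: SA_AIS_ERR_BAD_HANDLE is 9, matching "returned 9" in the AMFD log of the ticket, and SA_AIS_ERR_BAD_OPERATION is 20. It is not the IMM implementation. It only shows which of the three cases above would map to which code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins; the values mirror SA_AIS_OK, SA_AIS_ERR_BAD_HANDLE and
 * SA_AIS_ERR_BAD_OPERATION from the SAF AIS specification. */
typedef enum {
    AIS_OK = 1,
    AIS_ERR_BAD_HANDLE = 9,
    AIS_ERR_BAD_OPERATION = 20
} ais_rc;

/* Hypothetical view of an oi-handle at the point an OI call arrives. */
typedef struct {
    bool known_to_library;  /* false: closed or never initialized (case 1) */
    bool known_to_server;   /* false: IMMND restarted and dropped it (case 3) */
    bool implementer_set;   /* false: valid handle, wrong state (case 2) */
} oi_handle_view;

/* Result an OI operation such as saImmOiRtObjectUpdate_2 would get if
 * case (2) were reported as BAD_OPERATION, as proposed in #1064.
 * Today all three cases return BAD_HANDLE. */
ais_rc oi_operation_precheck(const oi_handle_view *h)
{
    assert(h != NULL);
    if (!h->known_to_library)
        return AIS_ERR_BAD_HANDLE;     /* case 1: application bug */
    if (!h->known_to_server)
        return AIS_ERR_BAD_HANDLE;     /* case 3: re-initialize and retry */
    if (!h->implementer_set)
        return AIS_ERR_BAD_OPERATION;  /* case 2: application bug */
    return AIS_OK;
}
```

After such a change, BAD_HANDLE would only ever mean case (1) or (3), and case (1) is an application bug.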

But this ticket is not really about *handling* case (2).
It is about fixing AMFD so that case (2) never happens.
After all, both case (1) and case (2) are application-bug cases, i.e. cases
where it makes no sense to write code for "handling" them.
The interface-violation cases should be eliminated, so that the AMFD can
assume that ALL cases of BAD_HANDLE are of type (3), and not a bug in AMFD
that it tries to compensate for.
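
Under that assumption the AMFD side becomes simple. The sketch below is hypothetical: `oi_client`, `oi_rt_update` and the fake IMM are illustrative stand-ins, not OpenSAF code, so that it runs without linking against OpenSAF. Calling an OI operation without an implementer is treated as an AMFD bug via `assert`. BAD_HANDLE always triggers the same case (3) recovery: re-initialize, re-set the implementer, and retry.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for SaAisErrorT values (1 = SA_AIS_OK,
 * 9 = SA_AIS_ERR_BAD_HANDLE). */
enum { AIS_OK = 1, AIS_ERR_BAD_HANDLE = 9 };

/* Hypothetical hooks standing in for the real IMM OI calls. */
typedef struct {
    int (*rt_update)(void *ctx);            /* e.g. saImmOiRtObjectUpdate_2 */
    int (*reinit_and_set_impl)(void *ctx);  /* saImmOiInitialize_2 + saImmOiImplementerSet */
    void *ctx;
    bool impl_set;                          /* AMFD's own bookkeeping */
} oi_client;

/* With cases (1) and (2) eliminated, BAD_HANDLE can only mean case (3):
 * the IMMND dropped the handle. Recovery is always the same:
 * re-initialize, re-set the implementer and retry once. */
int oi_rt_update(oi_client *c)
{
    assert(c->impl_set);  /* an update without implementer is an AMFD bug */
    int rc = c->rt_update(c->ctx);
    if (rc != AIS_ERR_BAD_HANDLE)
        return rc;
    c->impl_set = false;  /* the old handle took its implementer with it */
    rc = c->reinit_and_set_impl(c->ctx);
    if (rc != AIS_OK)
        return rc;
    c->impl_set = true;
    return c->rt_update(c->ctx);
}

/* Fake IMM for illustration: the first update fails with BAD_HANDLE,
 * as if the IMMND had restarted. */
typedef struct { int updates; int reinits; } fake_imm;

static int fake_rt_update(void *ctx)
{
    fake_imm *f = ctx;
    return (f->updates++ == 0) ? AIS_ERR_BAD_HANDLE : AIS_OK;
}

static int fake_reinit_and_set_impl(void *ctx)
{
    fake_imm *f = ctx;
    f->reinits++;
    return AIS_OK;
}
```

With `fake_imm f = {0, 0};` and `oi_client c = {fake_rt_update, fake_reinit_and_set_impl, &f, true};`, a call to `oi_rt_update(&c)` returns AIS_OK after one re-initialization and two update attempts. The single retry is an arbitrary choice for the sketch; real code would presumably bound retries by time instead.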

Does this make more sense?

/AndersBj

________________________________
From: Nagendra Kumar [mailto:nagendr...@users.sf.net]
Sent: 18 September 2014 12:55
To: [opensaf:tickets]
Subject: [opensaf:tickets] #707 Quiesced controller failed to become Active when the standby controller rebooted in middle of switchover


Hi Anders,

This ticket needs synchronization between the Amfd thread and the thread
that is spawned to handle BAD_HANDLE from the IMM APIs.
I am not sure whether to use a mutex, as it will make the Amfd thread wait anyway.
Since most of the flows hit IMM interactions, it is bound to delay Amfd HA.
So, what is the advantage of re-initializing IMM in a separate thread for AMF?

Rather, Imm should (rather must) not give Bad_handle/TO in regular cases.

-Nagu

________________________________

[tickets:#707] <http://sourceforge.net/p/opensaf/tickets/707> Quiesced controller failed to become Active when the standby controller rebooted in middle of switchover

Status: unassigned
Milestone: 4.3.3
Created: Fri Jan 03, 2014 03:34 PM UTC by Sirisha Alla
Last Updated: Thu Sep 11, 2014 01:29 PM UTC
Owner: Nagendra Kumar

The issue is observed on changeset 4733 plus the #220 patches corresponding
to cs 4741 and cs 4742. The test setup is four SLES 64-bit VMs. The setup has
single PBE enabled and is loaded with 25k objects.

The following steps reproduce the issue:

1) Trigger a middleware switchover. Make sure that the IMMND coordinator is on
the standby controller before triggering the switchover.
2) Reboot the standby controller just after the active controller has moved to quiesced.

The test was tried multiple times, and a different error was seen each time:

1) AMFD received BAD_HANDLE from IMM. Here SLOT2 (SC-2) is the active controller
at the beginning of the test.

Jan 3 14:42:13 SLES-64BIT-SLOT2 osafimmpbed: NO Successfully opened pre-existing sqlite pbe file /home/sirisha/immsv/immpbe/imm.db
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafamfd[2463]: ER Failed to stop cluster tracking 5
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafamfd[2463]: ER ClmTrack stop failed
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafrded[2375]: NO rde_rde_set_role: role set to 3
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafamfd[2463]: NO Node 'SC-1' left the cluster
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafamfnd[2473]: NO Assigning 'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafntfimcnd[8884]: NO exiting on signal 15
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer disconnected 74 <445, 2020f> (@OpenSafImmReplicatorB)
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafntfd[2430]: NO handle_state_ntfimcn: osafntfimcnd process terminated. State change
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer connected: 77 (safMsgGrpService) <320, 2020f>
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer connected: 78 (safCheckPointService) <304, 2020f>
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer connected: 79 (safEvtService) <305, 2020f>
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer connected: 80 (safLckService) <303, 2020f>
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO Backup create cmd = /usr/lib64/opensaf/smf-backup-create
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO Bundle check cmd = /usr/lib64/opensaf/smf-bundle-check
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafamfd[2463]: NO FAILOVER Quiesced --> Active
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer connected: 81 (MsgQueueService131343) <451, 2020f>
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO Node check cmd = /usr/lib64/opensaf/smf-node-check
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafamfd[2463]: ER ncs_mbcsv_svc NCS_MBCSV_OP_CHG_ROLE 1 failed
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer locally disconnected. Marking it as doomed 81 <451, 2020f> (MsgQueueService131343)
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO SMF repository check cmd = /usr/lib64/opensaf/smf-repository-check
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer disconnected 81 <451, 2020f> (MsgQueueService131343)
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO Cluster reboot cmd = /usr/lib64/opensaf/smf-cluster-reboot
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer (applier) connected: 82 (@OpenSafImmReplicatorA) <453, 2020f>
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO Admin Op Timeout = 600000000000
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer disconnected 59 <11, 2020f> (safAmfService)
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO Cli Timeout = 600000000000
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafamfd[2463]: NO Re-initializing with IMM
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafntfimcnd[8918]: NO Started
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO Reboot Timeout = 600000000000
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer connected: 83 (safAmfService) <11, 2020f>
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO SMF will use the STEP standard set of actions.
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafamfd[2463]: ER Impl Set Failed for SaAmfCompBaseType, returned 9
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO DN for si_swap operation = safSi=SC-2N,safApp=OpenSAF
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafamfd[2463]: ER exiting since avd_imm_impl_set failed
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO SI si_swap operation max retry = 200
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO Max num of campaign restarts = 10
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO IMM persist command = immdump /etc/opensaf/imm.xml
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO Node reboot cmd = reboot
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO Turn PBE off during upgrade = 1
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO Verify Enable = 0
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO Verify Timeout = 100000000000
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer connected: 84 (safSmfService) <299, 2020f>
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafamfnd[2473]: NO Assigned 'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafamfnd[2473]: ER AMF director unexpectedly crashed
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer locally disconnected. Marking it as doomed 83 <11, 2020f> (safAmfService)
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafamfnd[2473]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, OwnNodeId = 131599, SupervisionTime = 60

2) AMFD received ERR_LIBRARY from IMM. Here SLOT2 (SC-2) is the active
controller at the beginning of the test.

Jan 3 15:28:28 SLES-64BIT-SLOT2 osafrded[2359]: NO rde_rde_set_role: role set to 3
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafamfd[2454]: NO Node 'SC-1' left the cluster
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafamfnd[2468]: NO Assigning 'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafntfimcnd[2991]: NO exiting on signal 15
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafimmnd[2388]: NO Implementer connected: 30 (safMsgGrpService) <315, 2020f>
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafimmnd[2388]: NO Implementer connected: 31 (safCheckPointService) <332, 2020f>
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafimmnd[2388]: NO Implementer disconnected 26 <453, 2020f> (@OpenSafImmReplicatorA)
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafntfd[2418]: NO handle_state_ntfimcn: osafntfimcnd process terminated. State change
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafimmnd[2388]: NO Implementer connected: 32 (safLckService) <316, 2020f>
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafimmnd[2388]: NO Implementer connected: 33 (safEvtService) <331, 2020f>
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO Backup create cmd = /usr/lib64/opensaf/smf-backup-create
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO Bundle check cmd = /usr/lib64/opensaf/smf-bundle-check
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO Node check cmd = /usr/lib64/opensaf/smf-node-check
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO SMF repository check cmd = /usr/lib64/opensaf/smf-repository-check
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO Cluster reboot cmd = /usr/lib64/opensaf/smf-cluster-reboot
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO Admin Op Timeout = 600000000000
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO Cli Timeout = 600000000000
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO Reboot Timeout = 600000000000
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO SMF will use the STEP standard set of actions.
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO DN for si_swap operation = safSi=SC-2N,safApp=OpenSAF
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO SI si_swap operation max retry = 200
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO Max num of campaign restarts = 10
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO IMM persist command = immdump /etc/opensaf/imm.xml
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO Node reboot cmd = reboot
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO Turn PBE off during upgrade = 1
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO Verify Enable = 0
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO Verify Timeout = 100000000000
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafamfd[2454]: NO FAILOVER Quiesced --> Active
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafamfd[2454]: ER ncs_mbcsv_svc NCS_MBCSV_OP_CHG_ROLE 1 failed
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafimmnd[2388]: NO Implementer connected: 34 (MsgQueueService131343) <456, 2020f>
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafimmnd[2388]: NO Implementer locally disconnected. Marking it as doomed 34 <456, 2020f> (MsgQueueService131343)
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafimmnd[2388]: NO Implementer disconnected 4 <22, 2020f> (safAmfService)
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafamfd[2454]: NO Re-initializing with IMM
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafamfd[2454]: ER saImmOiImplementerSet failed 2
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafamfd[2454]: ER exiting since avd_imm_impl_set failed
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafamfnd[2468]: ER AMF director unexpectedly crashed
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafamfnd[2468]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, OwnNodeId = 131599, SupervisionTime = 60
Jan 3 15:28:28 SLES-64BIT-SLOT2 opensaf_reboot: Rebooting local node; timeout=60
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafimmnd[2388]: NO Implementer disconnected 34 <456, 2020f> (MsgQueueService131343)
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafimmnd[2388]: WA IMMND - Client Node Get Failed for cli_hdl 94489412111
Jan 3 15:28:29 SLES-64BIT-SLOT2 osafntfimcnd[3021]: ER ntfimcn_imm_init Becoming an applier failed SA_AIS_ERR_TIMEOUT (5)
Jan 3 15:28:29 SLES-64BIT-SLOT2 osafntfimcnd[3021]: ER ntfimcn_imm_init() Fail
Jan 3 15:28:29 SLES-64BIT-SLOT2 osafimmnd[2388]: WA MDS Send Failed
Jan 3 15:28:29 SLES-64BIT-SLOT2 osafimmnd[2388]: WA Error code 2 returned for message type 6 - ignoring
Jan 3 15:28:29 SLES-64BIT-SLOT2 osafimmnd[2388]: WA ERR_BAD_HANDLE: Client 1967095153167 not found in server
Jan 3 15:28:30 SLES-64BIT-SLOT2 osafntfimcnd[3042]: ER ntfimcn_imm_init saImmOiInitialize_2 failed SA_AIS_ERR_TIMEOUT (5)
Jan 3 15:28:30 SLES-64BIT-SLOT2 osafntfimcnd[3042]: ER ntfimcn_imm_init() Fail
Jan 3 15:28:31 SLES-64BIT-SLOT2 kernel: [ 198.527931] md: stopping all md devices.
Jan 3 15:28:31 SLES-64BIT-SLOT2 osafimmnd[2388]: WA MDS Send Failed
Jan 3 15:28:31 SLES-64BIT-SLOT2 osafimmnd[2388]: WA Error code 2 returned for message type 40 - ignoring
Jan 3 15:28:31 SLES-64BIT-SLOT2 osafntfimcnd[3045]: ER ntfimcn_imm_init saImmOiInitialize_2 failed SA_AIS_ERR_TIMEOUT (5)
Jan 3 15:28:31 SLES-64BIT-SLOT2 osafntfimcnd[3045]: ER ntfimcn_imm_init() Fail

3) AMFD received ERR_TIMEOUT from IMM. Here SLOT1 (SC-1) is the active
controller at the beginning of the test.

Jan 3 15:25:06 SLES-64BIT-SLOT1 osafimmd[3806]: NO Coord re-elected, resides at 2010f
Jan 3 15:25:06 SLES-64BIT-SLOT1 osafimmnd[3816]: NO This IMMND re-elected coord redundantly, failover ?
Jan 3 15:25:06 SLES-64BIT-SLOT1 osafimmnd[3816]: NO Implementer disconnected 25 <4, 2010f> (@safLogService)
Jan 3 15:25:06 SLES-64BIT-SLOT1 osafimmnd[3816]: NO Implementer connected: 28 (safClmService) <15, 2010f>
Jan 3 15:25:06 SLES-64BIT-SLOT1 osafimmnd[3816]: NO Implementer connected: 29 (safLogService) <4, 2010f>
Jan 3 15:25:06 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting remote node in the absence of PLM is outside the scope of OpenSAF
Jan 3 15:25:06 SLES-64BIT-SLOT1 osafrded[3787]: NO rde_rde_set_role: role set to 1
Jan 3 15:25:06 SLES-64BIT-SLOT1 osafclmd[3860]: NO ACTIVE request
Jan 3 15:25:13 SLES-64BIT-SLOT1 osafamfd[3882]: ER FAILOVER Active --> Quiesced FAILED, ImplementerClear failed 5
Jan 3 15:25:13 SLES-64BIT-SLOT1 osafamfd[3882]: role.cc:583: avd_mds_qsd_role_evh: Assertion '0' failed.
Jan 3 15:25:13 SLES-64BIT-SLOT1 osafamfnd[3892]: ER AMF director unexpectedly crashed
Jan 3 15:25:13 SLES-64BIT-SLOT1 osafamfnd[3892]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, OwnNodeId = 131343, SupervisionTime = 60
Jan 3 15:25:13 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting local node; timeout=60
Jan 3 15:25:14 SLES-64BIT-SLOT1 osafimmnd[3816]: NO Implementer locally disconnected. Marking it as doomed 4 <21, 2010f> (safAmfService)
Jan 3 15:25:14 SLES-64BIT-SLOT1 osafimmnd[3816]: NO Implementer disconnected 4 <21, 2010f> (safAmfService)
Jan 3 15:25:17 SLES-64BIT-SLOT1 kernel: [ 1471.089956] md: stopping all md devices.
Jan 3 15:25:17 SLES-64BIT-SLOT1 kernel: [ 1472.120172] sd 0:0:0:0: [sda] Synchronizing SCSI cache
Jan 3 15:25:17 SLES-64BIT-SLOT1 kernel: [ 1472.219424] ohci_hcd 0000:00:06.0: PCI INT A disabled
Jan 3 15:25:17 SLES-64BIT-SLOT1 osafclmd[3860]: ER clms_mds_msg_send FAILED: 2
Jan 3 15:25:17 SLES-64BIT-SLOT1 osafclmd[3860]: ER clms_clma_api_msg_dispatcher FAILED: type 0

No traces were enabled when issue (1) was observed. Issue (3) could be the
same issue as #405.
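
The numbers at the end of the AMFD error lines above ("returned 9", "failed 2", "ImplementerClear failed 5") are SaAisErrorT values. The ntfimcnd lines confirm the mapping for one of them ("SA_AIS_ERR_TIMEOUT (5)"). Below is a hypothetical triage helper, not part of OpenSAF. It pulls the trailing number out of such a log message and names it, and it only covers the codes discussed in this thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Names for the few SaAisErrorT values discussed in this ticket; the
 * numbering follows the SAF AIS specification. */
const char *ais_rc_name(int rc)
{
    switch (rc) {
    case 1:  return "SA_AIS_OK";
    case 2:  return "SA_AIS_ERR_LIBRARY";
    case 5:  return "SA_AIS_ERR_TIMEOUT";
    case 9:  return "SA_AIS_ERR_BAD_HANDLE";
    case 20: return "SA_AIS_ERR_BAD_OPERATION";
    default: return "(not covered by this sketch)";
    }
}

/* Return the integer at the end of a log message such as
 * "ER saImmOiImplementerSet failed 2", or -1 if the last word is not
 * a number. */
int trailing_rc(const char *msg)
{
    assert(msg != NULL);
    const char *sp = strrchr(msg, ' ');
    const char *num = sp ? sp + 1 : msg;
    char *end;
    long v = strtol(num, &end, 10);
    if (end == num || *end != '\0')
        return -1;
    return (int)v;
}
```

For example, `ais_rc_name(trailing_rc("ER saImmOiImplementerSet failed 2"))` yields "SA_AIS_ERR_LIBRARY", matching the heading of issue (2).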

________________________________

Sent from sourceforge.net because you indicated interest in
https://sourceforge.net/p/opensaf/tickets/707/

To unsubscribe from further messages, please visit
https://sourceforge.net/auth/subscriptions/



---

**[tickets:#707] Quiesced controller failed to become Active when the standby controller rebooted in middle of switchover**

**Status:** unassigned
**Milestone:** 4.3.3
**Created:** Fri Jan 03, 2014 03:34 PM UTC by Sirisha Alla
**Last Updated:** Thu Sep 18, 2014 10:54 AM UTC
**Owner:** Nagendra Kumar



---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to http://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
http://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.
_______________________________________________
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
