Hi,
comments inline.
On 2017/03/16 07:33 PM, David Hoyt wrote:
> Some additional info.
>
> I found out that the users were testing in a lab that had a delay between the
> two SC nodes. The delay was added for geographical redundancy testing.
> Once the delay was reduced, the timeout error for the opensaf swap went away.
>
> In looking through the osafimmnd log file, I see the following:
> Mar 14 11:31:48.320965 osafimmnd [21104:ImmModel.cc:12042] T5 Forcing Adm Req
> continuation to expire 609885356033
> ...
> Mar 14 11:31:48.601903 osafimmnd [21104:ImmModel.cc:12437] T5 Timeout on
> AdministrativeOp continuation 609885356033 tmout:1
> Mar 14 11:31:48.601952 osafimmnd [21104:ImmModel.cc:11311] T5 REQ ADM
> CONTINUATION 5069295 FOUND FOR 609885356033
> Mar 14 11:31:48.601987 osafimmnd [21104:immnd_proc.c:1086] WA Timeout on
> syncronous admin operation 1
>
>
> The code around line 12042 of file ImmModel.cc is as follows:
>
> 12040     for(ci2=sAdmReqContinuationMap.begin();
>               ci2!=sAdmReqContinuationMap.end(); ++ci2) {
> 12041         if((ci2->second.mTimeout) && (ci2->second.mImplId == implHandle)) {
> 12042             TRACE_5("Forcing Adm Req continuation to expire %llu", ci2->first);
> 12043             ci2->second.mTimeout = 1; /* one second is minimum timeout. */
> 12044         }
> 12045     }
>
>
> Right after the log at line 12042 is generated, the timeout value is updated
> to 1 second (line 12043).
The node where the admin operation is targeted went down from OpenSAF's
perspective. The continuation timeout is then forced to the minimum of 1 second.
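To illustrate, here is a minimal, simplified sketch of that sweep (not the
actual ImmModel.cc code; the map and struct types below are my own assumptions):
every pending admin-operation continuation bound to the lost implementer has its
remaining timeout forced to the 1-second minimum, so the waiting requester gets
a timeout reply at the next poll instead of hanging for the full invocation
timeout.

/*
 * Minimal sketch, not the real ImmModel.cc code: a simplified
 * continuation map and the sweep that forces entries bound to a
 * lost implementer to expire at the minimum timeout of 1 second.
 */
#include <stdint.h>
#include <stdio.h>
#include <map>

struct ContinuationInfo {
    uint32_t mTimeout;  /* remaining timeout in seconds, 0 == no timeout */
    uint32_t mImplId;   /* implementer expected to send the reply */
};

typedef std::map<uint64_t, ContinuationInfo> ContinuationMap;

/* Called when implementer 'implHandle' is gone: make every pending
   admin request waiting on it time out at the next 1-second poll. */
static void forceExpire(ContinuationMap& sAdmReqContinuationMap,
                        uint32_t implHandle)
{
    for (ContinuationMap::iterator ci2 = sAdmReqContinuationMap.begin();
         ci2 != sAdmReqContinuationMap.end(); ++ci2) {
        if (ci2->second.mTimeout && ci2->second.mImplId == implHandle) {
            printf("Forcing Adm Req continuation to expire %llu\n",
                   (unsigned long long)ci2->first);
            ci2->second.mTimeout = 1; /* one second is minimum timeout */
        }
    }
}

int main()
{
    ContinuationMap sAdmReqContinuationMap;
    ContinuationInfo info = {5, 68};               /* waiting on impl 68 */
    sAdmReqContinuationMap[609885356033ULL] = info;
    forceExpire(sAdmReqContinuationMap, 68);       /* impl 68 went away */
    return 0;
}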
> Can I increase this to 2 seconds?
OpenSAF has already noted the other node as down; what additional benefit
would increasing this to 2 seconds achieve?
> If so, would it cause any badness?
Please explain what end result you are targeting.
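One more note: SA_AIS_ERR_TIMEOUT from the synchronous
saImmOmAdminOperationInvoke_2 only means that no reply arrived within the
invocation timeout; as your amf-state output shows, the swap itself can still
complete. The caller should therefore treat that return code as "outcome
unknown" and re-verify the HA state rather than assume the operation failed.
Below is a rough, untested sketch of invoking the SI swap directly over the
IMM OM API in that way; the admin-owner name, the 30-second timeout and the
minimal error handling are just example choices, not recommendations.

/*
 * Rough sketch only: invoke the SI swap admin operation over the IMM OM
 * API with an explicit timeout, and treat SA_AIS_ERR_TIMEOUT as
 * "outcome unknown" instead of as a failure.
 */
#include <stdio.h>
#include <string.h>
#include <saImmOm.h>
#include <saAmf.h>   /* for SA_AMF_ADMIN_SI_SWAP */

int main(void)
{
    SaVersionT version = {'A', 2, 1};
    SaImmHandleT immHandle;
    SaImmAdminOwnerHandleT ownerHandle;
    char ownerName[] = "siSwapExample";           /* example admin-owner name */
    SaNameT siName;
    const SaNameT *objectNames[] = {&siName, NULL};
    const SaImmAdminOperationParamsT_2 *params[] = {NULL}; /* si-swap takes none */
    SaAisErrorT opResult = SA_AIS_OK;
    SaAisErrorT rc;
    SaTimeT invokeTimeout = 30000000000LL;        /* 30 s; SaTimeT is nanoseconds */

    strcpy((char *)siName.value, "safSi=SC-2N,safApp=OpenSAF");
    siName.length = (SaUint16T)strlen((char *)siName.value);

    rc = saImmOmInitialize(&immHandle, NULL, &version);
    if (rc != SA_AIS_OK) return 1;

    /* Release admin ownership automatically when the handle is finalized. */
    rc = saImmOmAdminOwnerInitialize(immHandle, ownerName, SA_TRUE, &ownerHandle);
    if (rc != SA_AIS_OK) return 1;

    rc = saImmOmAdminOwnerSet(ownerHandle, objectNames, SA_IMM_ONE);
    if (rc != SA_AIS_OK) return 1;

    rc = saImmOmAdminOperationInvoke_2(ownerHandle, &siName, 0 /* continuation */,
                                       SA_AMF_ADMIN_SI_SWAP, params,
                                       &opResult, invokeTimeout);

    if (rc == SA_AIS_ERR_TIMEOUT) {
        /* No reply in time: the swap may still have gone through.
           Re-read saAmfSISUHAState before retrying or raising an alarm. */
        printf("si-swap: no reply within timeout, outcome unknown\n");
    } else if (rc != SA_AIS_OK || opResult != SA_AIS_OK) {
        printf("si-swap failed: rc=%d opResult=%d\n", (int)rc, (int)opResult);
    } else {
        printf("si-swap completed\n");
    }

    saImmOmFinalize(immHandle);
    return 0;
}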
Regards,
Neel.
>
> Regards,
> David
>
>
> From: David Hoyt
> Sent: Tuesday, March 14, 2017 2:45 PM
> To: [email protected]
> Subject: si-swap opensaf SUs results in error but the action still completes
>
> Hi all,
>
> Setup:
>
> - Opensaf 4.6 running on RHEL 6.6 VMs with TCP
>
> - 2 controllers, 4 payloads
>
> - SC-1's hostname=sb117vm0
>
> - SC-2's hostname=sb117vm1
>
>
> I'm encountering a timeout when issuing a si-swap of the opensaf SUs.
> However, the swap actually completes, so why does the timeout error occur?
>
>
> Before the swap, SC-1/SC-2 were active/standby:
> [root@sb117vm0 ~]# date ; amf-state siass | grep -A 1
> "safSi=SC-2N,safApp=OpenSAF"
> Tue Mar 14 11:30:58 EDT 2017
> safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
> saAmfSISUHAState=ACTIVE(1)
> --
> safSISU=safSu=SC-2\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
> saAmfSISUHAState=STANDBY(2)
> [root@sb117vm0 ~]#
>
>
> The following shows the error response from the si-swap as well as the
> opensaf SU status after the swap. The status indicates that the swap did
> occur.
> [root@sb117vm0 ~]# date ; amf-adm si-swap safSi=SC-2N,safApp=OpenSAF; date ;
> amf-state su | grep -A 4 "safSg=2N,safApp=OpenSAF"; amf-state siass | grep
> -A 2 "safSi=SC-2N,safApp=OpenSAF"
> Tue Mar 14 11:31:41 EDT 2017
> error - saImmOmAdminOperationInvoke_2 FAILED: SA_AIS_ERR_TIMEOUT (5)
> Tue Mar 14 11:31:48 EDT 2017
> safSu=SC-1,safSg=2N,safApp=OpenSAF
> saAmfSUAdminState=UNLOCKED(1)
> saAmfSUOperState=ENABLED(1)
> saAmfSUPresenceState=INSTANTIATED(3)
> saAmfSUReadinessState=IN-SERVICE(2)
> --
> safSu=SC-2,safSg=2N,safApp=OpenSAF
> saAmfSUAdminState=UNLOCKED(1)
> saAmfSUOperState=ENABLED(1)
> saAmfSUPresenceState=INSTANTIATED(3)
> saAmfSUReadinessState=IN-SERVICE(2)
>
>
> safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
> saAmfSISUHAState=STANDBY(2)
> saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
> --
> safSISU=safSu=SC-2\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
> saAmfSISUHAState=ACTIVE(1)
> saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
> [root@sb117vm0 ~]#
>
>
> Below is a snippet of the /var/log/messages file from both SCs.
> Again, the logs show the swap action completed as expected - with the
> exception of the osafimmnd timeout.
> Also attached are the osafamfnd log files from each SC.
>
> sb117vm0 (SC-1)
>
> Mar 14 11:31:41 sb117vm0 osafamfd[21236]: NO safSi=SC-2N,safApp=OpenSAF Swap
> initiated
> Mar 14 11:31:41 sb117vm0 osafamfnd[21267]: NO Assigning
> 'safSi=SC-2N,safApp=OpenSAF' QUIESCED to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
> Mar 14 11:31:41 sb117vm0 osafimmnd[21104]: NO Implementer disconnected 65 <3,
> 2010f> (safLogService)
> Mar 14 11:31:42 sb117vm0 osafimmnd[21104]: NO Implementer disconnected 66 <7,
> 2010f> (safClmService)
> Mar 14 11:31:43 sb117vm0 osafamfnd[21267]: NO Assigned
> 'safSi=SC-2N,safApp=OpenSAF' QUIESCED to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
> Mar 14 11:31:43 sb117vm0 osafimmnd[21104]: NO Implementer connected: 69
> (safLogService) <0, 2020f>
> Mar 14 11:31:43 sb117vm0 osafimmnd[21104]: NO Implementer connected: 70
> (safClmService) <0, 2020f>
> Mar 14 11:31:43 sb117vm0 osafimmpbed: IN Starting distributed PBE commit for
> PRTA update Ccb:10000017e/4294967678
> Mar 14 11:31:43 sb117vm0 osafimmpbed: IN Slave PBE replied with OK on attempt
> to start prepare of ccb:10000017e/4294967678
> Mar 14 11:31:43 sb117vm0 osafamfnd[21267]: NO Assigning
> 'safSi=SC-2N,safApp=OpenSAF' STANDBY to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
> Mar 14 11:31:43 sb117vm0 osafamfnd[21267]: NO Assigned
> 'safSi=SC-2N,safApp=OpenSAF' STANDBY to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
> Mar 14 11:31:44 sb117vm0 osafamfd[21236]: NO Controller switch over initiated
> Mar 14 11:31:44 sb117vm0 osafamfd[21236]: NO ROLE SWITCH Active --> Quiesced
> Mar 14 11:31:44 sb117vm0 osafimmpbed: IN Starting distributed PBE commit for
> PRTA update Ccb:10000017f/4294967679
> Mar 14 11:31:44 sb117vm0 osafimmpbed: IN Slave PBE replied with OK on attempt
> to start prepare of ccb:10000017f/4294967679
> Mar 14 11:31:44 sb117vm0 osafimmpbed: IN Starting distributed PBE commit for
> PRTA update Ccb:100000180/4294967680
> Mar 14 11:31:45 sb117vm0 osafimmpbed: IN Slave PBE replied with OK on attempt
> to start prepare of ccb:100000180/4294967680
> Mar 14 11:31:45 sb117vm0 osafimmpbed: IN Starting distributed PBE commit for
> PRTA update Ccb:100000181/4294967681
> Mar 14 11:31:45 sb117vm0 osafimmpbed: IN Slave PBE replied with OK on attempt
> to start prepare of ccb:100000181/4294967681
> Mar 14 11:31:46 sb117vm0 osafimmpbed: IN Starting distributed PBE commit for
> PRTA update Ccb:100000182/4294967682
> Mar 14 11:31:46 sb117vm0 osafimmpbed: IN Slave PBE replied with OK on attempt
> to start prepare of ccb:100000182/4294967682
> Mar 14 11:31:46 sb117vm0 osafimmpbed: IN Starting distributed PBE commit for
> PRTA update Ccb:100000183/4294967683
> Mar 14 11:31:46 sb117vm0 osafimmpbed: IN Slave PBE replied with OK on attempt
> to start prepare of ccb:100000183/4294967683
> Mar 14 11:31:47 sb117vm0 osafrded[21057]: NO RDE role set to QUIESCED
> Mar 14 11:31:48 sb117vm0 osafimmnd[21104]: NO Implementer disconnected 68
> <1161150, 2010f> (safAmfService)
> Mar 14 11:31:48 sb117vm0 osafimmnd[21104]: NO Implementer (applier)
> connected: 71 (@safAmfService2010f) <1161150, 2010f>
> Mar 14 11:31:48 sb117vm0 osafimmnd[21104]: WA Timeout on syncronous admin
> operation 1
> Mar 14 11:31:50 sb117vm0 osafimmnd[21104]: NO Implementer disconnected 67 <0,
> 2020f> (@safAmfService2020f)
> Mar 14 11:31:50 sb117vm0 osafimmnd[21104]: NO Implementer connected: 72
> (safAmfService) <0, 2020f>
> Mar 14 11:31:50 sb117vm0 osafamfd[21236]: NO Switching Quiesced --> StandBy
> Mar 14 11:31:50 sb117vm0 osafrded[21057]: NO RDE role set to STANDBY
> Mar 14 11:31:50 sb117vm0 osafamfd[21236]: NO Controller switch over done
>
>
> sb117vm1 (SC-2)
>
> Mar 14 11:31:41 sb117vm1 osafimmnd[1898]: NO Implementer disconnected 65 <0,
> 2010f> (safLogService)
> Mar 14 11:31:42 sb117vm1 osafimmnd[1898]: NO Implementer disconnected 66 <0,
> 2010f> (safClmService)
> Mar 14 11:31:43 sb117vm1 osafamfnd[1994]: NO Assigning
> 'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
> Mar 14 11:31:43 sb117vm1 osafimmd[1886]: WA IMMD not re-electing coord for
> switch-over (si-swap) coord at (2010f)
> Mar 14 11:31:43 sb117vm1 osafimmnd[1898]: NO Implementer connected: 69
> (safLogService) <3, 2020f>
> Mar 14 11:31:43 sb117vm1 osafimmnd[1898]: NO Implementer connected: 70
> (safClmService) <4, 2020f>
> Mar 14 11:31:43 sb117vm1 osaflogd[1923]: WA read_logsv_configuration(). All
> attributes could not be read
> Mar 14 11:31:43 sb117vm1 osafamfnd[1994]: NO Assigned
> 'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
> Mar 14 11:31:43 sb117vm1 osafimmpbed: IN PBE slave waiting for prepare from
> primary on PRTA update ccb:10000017e
> Mar 14 11:31:43 sb117vm1 osafimmpbed: IN ccb-prepare received at PBE slave
> ccbId:10000017e/4294967678 numOps:1
> Mar 14 11:31:44 sb117vm1 osafimmpbed: IN PBE slave waiting for prepare from
> primary on PRTA update ccb:10000017f
> Mar 14 11:31:44 sb117vm1 osafimmpbed: IN ccb-prepare received at PBE slave
> ccbId:10000017f/4294967679 numOps:1
> Mar 14 11:31:44 sb117vm1 osafimmpbed: IN PBE slave waiting for prepare from
> primary on PRTA update ccb:100000180
> Mar 14 11:31:45 sb117vm1 osafimmpbed: IN ccb-prepare received at PBE slave
> ccbId:100000180/4294967680 numOps:1
> Mar 14 11:31:45 sb117vm1 osafimmpbed: IN PBE slave waiting for prepare from
> primary on PRTA update ccb:100000181
> Mar 14 11:31:45 sb117vm1 osafimmpbed: IN ccb-prepare received at PBE slave
> ccbId:100000181/4294967681 numOps:1
> Mar 14 11:31:46 sb117vm1 osafimmpbed: IN PBE slave waiting for prepare from
> primary on PRTA update ccb:100000182
> Mar 14 11:31:46 sb117vm1 osafimmpbed: IN ccb-prepare received at PBE slave
> ccbId:100000182/4294967682 numOps:1
> Mar 14 11:31:46 sb117vm1 osafimmpbed: IN PBE slave waiting for prepare from
> primary on PRTA update ccb:100000183
> Mar 14 11:31:46 sb117vm1 osafimmpbed: IN ccb-prepare received at PBE slave
> ccbId:100000183/4294967683 numOps:1
> Mar 14 11:31:48 sb117vm1 osafimmnd[1898]: NO Implementer disconnected 68 <0,
> 2010f> (safAmfService)
> Mar 14 11:31:48 sb117vm1 osafimmnd[1898]: NO Implementer (applier) connected:
> 71 (@safAmfService2010f) <0, 2010f>
> Mar 14 11:31:50 sb117vm1 osafamfd[1973]: NO Switching StandBy --> Active State
> Mar 14 11:31:50 sb117vm1 osafimmnd[1898]: NO Implementer disconnected 67 <9,
> 2020f> (@safAmfService2020f)
> Mar 14 11:31:50 sb117vm1 osafimmnd[1898]: NO Implementer connected: 72
> (safAmfService) <9, 2020f>
> Mar 14 11:31:50 sb117vm1 osafrded[1850]: NO RDE role set to ACTIVE
> Mar 14 11:31:50 sb117vm1 osafclmd[1950]: NO ACTIVE request
> Mar 14 11:31:50 sb117vm1 osafamfd[1973]: NO Controller switch over done
>