Re: [users] si-swap opensaf SUs results in error but the action still completes

Alex Jones Mon, 20 Mar 2017 06:34:05 -0700

What version are you running?

Alex


On 03/20/2017 09:19 AM, David Hoyt wrote:
> Correction, I believe the default time-out is 60 seconds, not 10.
> 
> Regards,
> David
> 
>  
> 
>  
> 
> *From:* David Hoyt
> *Sent:* Monday, March 20, 2017 9:19 AM
> *To:* Alex Jones <[email protected]>; Neelakanta Reddy
> <[email protected]>; [email protected]
> *Subject:* RE: [users] si-swap opensaf SUs results in error but the
> action still completes
> 
>  
> 
> Alex, isn't the default time-out 10 seconds?
> 
> If so, then why did immnd time-out ~7 seconds later?
> 
>  
> 
> Mar 14 11:31:41 sb117vm0 osafamfd[21236]: NO safSi=SC-2N,safApp=OpenSAF Swap initiated
> …
> Mar 14 11:31:48 sb117vm0 osafimmnd[21104]: WA Timeout on syncronous admin operation 1
> 
> Regards,
> David
> 
>  
> 
>  
> 
> -----Original Message-----
> From: Alex Jones
> Sent: Saturday, March 18, 2017 9:41 AM
> To: David Hoyt <[email protected]>; Neelakanta Reddy <[email protected]>;
> [email protected]
> Subject: RE: [users] si-swap opensaf SUs results in error but the action
> still completes
> 
>  
> 
> David,
> 
>  
> 
>   You can pass "-t <timeout in seconds>" to "amf-adm" to set the timeout
> to whatever you want.
> 
>  
> 
> Alex
> 
>  
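For scripted tests, the "-t" suggestion above can be wrapped in a small helper. This is only a sketch: it assumes "-t" is given before the sub-command, and the helper name build_si_swap_command and the 120-second value are illustrative, not taken from OpenSAF.

```python
import shlex


def build_si_swap_command(si_dn, timeout_s=None):
    """Return the argv list for an amf-adm si-swap call.

    Assumption: "-t <seconds>" is placed before the sub-command.
    """
    argv = ["amf-adm"]
    if timeout_s is not None:
        if timeout_s <= 0:
            raise ValueError("timeout must be a positive number of seconds")
        argv += ["-t", str(timeout_s)]
    argv += ["si-swap", si_dn]
    return argv


# Illustrative value only; pick a timeout that covers the injected delay.
cmd = build_si_swap_command("safSi=SC-2N,safApp=OpenSAF", timeout_s=120)
print(shlex.join(cmd))
```

The list form can be handed to subprocess.run() on a node where amf-adm is installed; the sketch itself only builds and prints the command.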
> 
> ________________________________________
> 
> From: David Hoyt [[email protected]]
> Sent: Friday, March 17, 2017 9:35 AM
> To: Neelakanta Reddy; [email protected]
> Subject: Re: [users] si-swap opensaf SUs results in error but the
> action still completes
> 
>  
> 
> Hi Neel,
> 
>  
> 
> The purpose of the test is to see if our system can continue to run
> “normally” when in a geographical configuration.
> 
> That is, both SCs are NOT co-located, but reside thousands of km apart.
> 
> This is simulated in the lab by adding a delay between the two servers
> that host the SCs.
> 
>  
> 
> What we’re seeing is that when the delay is increased to a certain
> value, the si-swap command between the two OpenSAF SUs results in an error.
> 
> [root@sb117vm0 ~]# date ; amf-adm si-swap safSi=SC-2N,safApp=OpenSAF;
> Tue Mar 14 11:31:41 EDT 2017
> error - saImmOmAdminOperationInvoke_2 FAILED: SA_AIS_ERR_TIMEOUT (5)
> 
>  
> 
> However, the logs show that the action actually completes about 2
> seconds after the timeout.
> 
> Mar 14 11:31:48 sb117vm0 osafimmnd[21104]: WA Timeout on syncronous admin operation 1
> Mar 14 11:31:50 sb117vm0 osafimmnd[21104]: NO Implementer disconnected 67 <0, 2020f> (@safAmfService2020f)
> Mar 14 11:31:50 sb117vm0 osafimmnd[21104]: NO Implementer connected: 72 (safAmfService) <0, 2020f>
> Mar 14 11:31:50 sb117vm0 osafamfd[21236]: NO Switching Quiesced --> StandBy
> Mar 14 11:31:50 sb117vm0 osafrded[21057]: NO RDE role set to STANDBY
> Mar 14 11:31:50 sb117vm0 osafamfd[21236]: NO Controller switch over done
> 
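The timeline in the logs quoted in this thread can be checked with a few lines of Python. This sketch hard-codes three timestamps copied from those logs, assumes the year 2017 and an English locale for the month abbreviation, and uses helper names made up for illustration.

```python
from datetime import datetime

# Timestamps copied from the syslog lines quoted in this thread.
EVENTS = [
    ("Mar 14 11:31:41", "osafamfd: Swap initiated"),
    ("Mar 14 11:31:48", "osafimmnd: Timeout on syncronous admin operation 1"),
    ("Mar 14 11:31:50", "osafamfd: Controller switch over done"),
]


def parse_stamp(stamp, year=2017):
    """Parse a syslog-style timestamp; syslog omits the year, so it is assumed."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def offsets(events):
    """Return (seconds since the first event, text) for each event."""
    start = parse_stamp(events[0][0])
    return [
        (int((parse_stamp(stamp) - start).total_seconds()), text)
        for stamp, text in events
    ]


timeline = offsets(EVENTS)
for secs, text in timeline:
    print(f"+{secs:2d}s  {text}")
```

With these inputs the immnd timeout lands 7 seconds after the swap was initiated and the switch-over completes at 9 seconds, that is, about 2 seconds after the timeout.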
>  
> 
> I’m trying to determine if there’s some way to delay the immnd time-out
> so that the si-swap command returns success.
> 
> Regards,
> 
> David
> 
>  
> 
>  
> 
> From: Neelakanta Reddy [mailto:[email protected]]
> Sent: Friday, March 17, 2017 7:10 AM
> To: David Hoyt <[email protected]>; [email protected]
> Subject: Re: [users] si-swap opensaf SUs results in error but the action
> still completes
> 
>  
> 
>  
> 
> Hi,
> 
>  
> 
> comments inline.
> 
>  
> 
> On 2017/03/16 07:33 PM, David Hoyt wrote:
> 
>> Some additional info.
>>
>> I found out that the users were testing in a lab that had a delay
>> between the two SC nodes. The delay was added for geographical
>> redundancy testing.
>> Once the delay was reduced, the timeout error for the opensaf swap went
>> away.
>>
>> In looking through the osafimmnd log file, I see the following:
>>
>> Mar 14 11:31:48.320965 osafimmnd [21104:ImmModel.cc:12042] T5 Forcing Adm Req continuation to expire 609885356033 ...
>> Mar 14 11:31:48.601903 osafimmnd [21104:ImmModel.cc:12437] T5 Timeout on AdministrativeOp continuation 609885356033 tmout:1
>> Mar 14 11:31:48.601952 osafimmnd [21104:ImmModel.cc:11311] T5 REQ ADM CONTINUATION 5069295 FOUND FOR 609885356033
>> Mar 14 11:31:48.601987 osafimmnd [21104:immnd_proc.c:1086] WA Timeout on syncronous admin operation 1
>>
>> The code around line 12042 of file ImmModel.cc is as follows:
>>
>> 12040 for(ci2=sAdmReqContinuationMap.begin(); ci2!=sAdmReqContinuationMap.end(); ++ci2) {
>> 12041     if((ci2->second.mTimeout) && (ci2->second.mImplId == implHandle)) {
>> 12042         TRACE_5("Forcing Adm Req continuation to expire %llu", ci2->first);
>> 12043         ci2->second.mTimeout = 1; /* one second is minimum timeout. */
>> 12044     }
>> 12045 }
>>
>> Right after the log at line 12042 is generated, the timeout value is
>> updated to 1 second (line 12043).
> 
> The node targeted by the admin operation went down from OpenSAF's
> perspective. The timeout is then set to the minimum of 1 second.
> 
>> Can I increase this to 2 seconds?
> 
> OpenSAF has already noted the other node as down, so what additional
> benefit would increasing this to 2 seconds achieve?
> 
>> If so, would it cause any badness?
> 
> Please explain the end result you are targeting.
> 
>  
> 
> Regards,
> 
> Neel.
> 
>>
>> Regards,
>> David
