I mentioned "aware of the timing difference" but forgot to elaborate.
The typical om-client has a retrye lopp for TRY_AGAIN with a short wait time between each try and a max number of retries. This works well/predicatably if the timing for each TRY_AGAIN is stable and "short". By "short" i mean realtime-ish like milliseconds. But if that loop gets repeated ERR_TIMEOUT where each such timeout is 10 seconds (default then you could be looking at applications apparently hanging for minutes. So just treating ERR_TIMEOUT the same as TRY_AGAIN even for idempotent operations has to be done with care. The retry loop timing/number-of-retries needs to be sensitive to TIMEOUT or TRY_AGAIN. /AndersBj On 05/28/2015 02:36 PM, Anders Björnerstedt wrote: > Note that you have to be careful about generalizing the handling of > ERR_TIMEOUT. > That error code is the only error code that literally means you dont > know what happened. > The operation may have been executed or it may be still on going or it > may have failed. > You dont know which. > > For all other error codes (except SA_AIS_OK) including of course > TRY_AGAIN and NO_RESOURCES, > the user knows that the oepration was not executed. > > So handling NO_RESOURCES the same way as TRY_AGAIN is possible. > But handling ERR_TIMEOUT the same way as TRY_AGAIN is only possible if > hte operation is idempotent > and you are aware of the timing difference. > > Idempotent operations are operations where you an invoke a request and > get ack iof the request was already > fulfilled when it was received. saImmOiImplementerClear is one such > idempotent operation. > > saImmOiImplementerSet, classImplementerSet, objectImplementerSet, > classImplementerRelease, objectImplementerRelease > are all idempotent. > > But saImmOmCcbObjectCreate is not. > > Most (probably all) finalize operations are idempotent. > > /AndersBj > > On 05/28/2015 01:19 PM, Nagendra Kumar wrote: >> Hi Hans N, >> Thanks for pointing it out. >> Do you mean, we should add handling of errors like BAD_HANDLE, TIMEOUT in >> immutil function and remove handling from Amf code ? If yes, then I can >> raise an enhancement ticket on imm util and another on 'Amf/other services >> in general' to clean the code. Is that ok? >> >> Thanks >> -Nagu >>> -----Original Message----- >>> From: Hans Nordebäck [mailto:[email protected]] >>> Sent: 28 May 2015 16:32 >>> To: Anders Björnerstedt; Nagendra Kumar; Praveen Malviya >>> Cc: [email protected] >>> Subject: RE: [devel] [PATCH 1 of 1] amfd: try again if ImplementerClear >>> times >>> out >>> >>> Hi, in ticket #1290 BAD_HANDLE, one comment was that preferably should >>> immutil be updated to handle BAD_HANDLE. Probably immutil should also >>> handle other errors, e.g. ERR_TIMEOUT, as for this ticket #1360. >>> >>> /Thanks HansN >>> >>> -----Original Message----- >>> From: Anders Björnerstedt >>> Sent: den 27 maj 2015 09:52 >>> To: Nagendra Kumar; Hans Nordebäck; Praveen Malviya >>> Cc: [email protected] >>> Subject: RE: [devel] [PATCH 1 of 1] amfd: try again if ImplementerClear >>> times >>> out >>> >>> Hi again, >>> >>> I understand that this case is apparently due to message loss IMMND->IMMD. >>> >>> Since change-set: >>> >>> changeset: 3523:5228e50660b4 >>> user: Anders Bjornerstedt <[email protected]> >>> date: Fri Apr 27 15:02:01 2012 +0200 >>> summary: Immsv: IMMND subscribes (mds) in normal mode on IMMD >>> (#2446) >>> >>> IMMND subscribes to IMMD sv-id in NORMAL mode, i.e. not REDUNDANT >>> mode. >>> MDS will then supposedly take care of buffering messages that can not be >>> delivered due to absence of IMMD until the IMMD is back. The IMMD itself >>> MBCPs fevs mesasages (which ImplementerCLear is) to standby before >>> sending. >>> >>> So this message loss could only happen if the IMMD goes down after having >>> received the request, but before having MBCP'ed the request. So this should >>> be very rare indeed. >>> >>> The fix in itself is safe though. >>> >>> I worry about the followingthough. >>> Assume that instead of ImplementerClear, the AMf simply closes the OI >>> handle and re-opens it. >>> This should have exactly the same effect. >>> A close of handle will be registgered and acceptet by the local IMMND and >>> the >>> local IMMND will close the implementer by generating that message internally >>> to the IMMND. If that message gets lost, then the AMFD would here actually >>> get Ok, but we would possibly have an "orphaned" OI. >>> >>> I have not seen one such case though. >>> >>> /AndersBj >>> >>> >>> >>> -----Original Message----- >>> From: Anders Björnerstedt [mailto:[email protected]] >>> Sent: den 27 maj 2015 09:23 >>> To: Nagendra Kumar; Hans Nordebäck; Praveen Malviya >>> Cc: [email protected] >>> Subject: Re: [devel] [PATCH 1 of 1] amfd: try again if ImplementerClear >>> times >>> out >>> >>> Ok, that should be a very rare error scenario, but in this case for this >>> particular operation (ImplementerClear) which is an idempotent operation, >>> then repeating the request should work and be safe. >>> >>> /AndersBj >>> >>> -----Original Message----- >>> From: Nagendra Kumar [mailto:[email protected]] >>> Sent: den 27 maj 2015 08:35 >>> To: Hans Nordebäck; Praveen Malviya >>> Cc: [email protected] >>> Subject: Re: [devel] [PATCH 1 of 1] amfd: try again if ImplementerClear >>> times >>> out >>> >>> Hi Hans N/Anders Bj, >>> Thanks for your review. >>>>>> A log record can be written before the "goto try_again". >>> I would like to get logs record for BAD_HANDLE and other types of errors >>> also. >>> That's why the log record is on top common for all the errors. >>> >>>>>> An alternative to try_again can be to change IMMA_SYNCR_TIMEOUT >>> from the default 10 seconds to a higher value. >>> In the scenario of the ticket(specific), Immd has gone down and the packet >>> has been lost along with Immd and hence increase of wait time will increase >>> the HA unavailability.) So, my thought was it is ok for default wait time >>> for 10 >>> seconds, and try again so that local Immd will respond, hence quicker >>> recovery. >>> >>> Thanks >>> -Nagu >>> >>>> -----Original Message----- >>>> From: Hans Nordebäck [mailto:[email protected]] >>>> Sent: 26 May 2015 18:15 >>>> To: Nagendra Kumar; Praveen Malviya >>>> Cc: [email protected] >>>> Subject: Re: [PATCH 1 of 1] amfd: try again if ImplementerClear times >>>> out >>>> >>>> ack, code review only. Some comments below discussed with AndersBj. >>>> >>>> A log record can be written before the "goto try_again". An >>>> alternative to try_again can be to change IMMA_SYNCR_TIMEOUT from the >>>> default 10 seconds to a higher value. >>>> >>>> /Thanks HansN >>>> >>>> On 05/12/2015 10:44 AM, [email protected] wrote: >>>>> osaf/services/saf/amf/amfd/role.cc | 9 ++++++--- >>>>> 1 files changed, 6 insertions(+), 3 deletions(-) >>>>> >>>>> >>>>> ImplementerClear returns SA_AIS_ERR_TIMEOUT because the message >>> sent >>>>> from immnd to immd got lost because of node down. >>>>> So, Amf need to handle it as try_again. >>>>> After Amf does try_again, Imm returns SA_AIS_ERR_BAD_HANDLE. >>>>> Amf handles SA_AIS_ERR_BAD_HANDLE and reinitializes its interface >>>>> with >>>> Imm. >>>>> In handling SA_AIS_ERR_BAD_HANDLE, Amf sets the implementer in >>>> avd_imm_reinit_bg_thread, so >>>>> when Amf proceeds and calls avd_imm_applier_set, Imm returns >>>>> SA_AIS_ERR_INVALID_PARAM. Since, it is expected error from Imm, Amf >>>>> ignores it. >>>>> >>>>> diff --git a/osaf/services/saf/amf/amfd/role.cc >>>> b/osaf/services/saf/amf/amfd/role.cc >>>>> --- a/osaf/services/saf/amf/amfd/role.cc >>>>> +++ b/osaf/services/saf/amf/amfd/role.cc >>>>> @@ -659,15 +659,17 @@ void avd_mds_qsd_role_evh(AVD_CL_CB *cb, >>>>> _exit(EXIT_FAILURE); // should never get here... >>>>> } >>>>> >>>>> +try_again: >>>>> /* Take mutex here to sync with imm reinit thread.*/ >>>>> osaf_mutex_lock_ordie(&imm_reinit_mutex); >>>>> - >>>>> /* Give up IMM OI implementer role */ >>>>> if ((rc = immutil_saImmOiImplementerClear(cb->immOiHandle)) != >>>> SA_AIS_OK) { >>>>> osaf_mutex_unlock_ordie(&imm_reinit_mutex); >>>>> LOG_ER("FAILOVER Active --> Quiesced FAILED, >>>> ImplementerClear failed %u", rc); >>>>> if (rc == SA_AIS_ERR_BAD_HANDLE) { >>>>> avd_imm_reinit_bg(); >>>>> + } else if (rc == SA_AIS_ERR_TIMEOUT) { >>>>> + goto try_again; >>>>> } else >>>>> osafassert(0); >>>>> } else >>>>> @@ -685,10 +687,11 @@ void avd_mds_qsd_role_evh(AVD_CL_CB *cb, >>>>> LOG_ER("avd_imm_applier_set FAILED, %u", rc); >>>>> if (rc == SA_AIS_ERR_BAD_HANDLE) { >>>>> avd_imm_reinit_bg(); >>>>> - } else if (rc == SA_AIS_ERR_EXIST) { >>>>> + } else if ((rc == SA_AIS_ERR_EXIST) || (rc == >>>> SA_AIS_ERR_INVALID_PARAM)) { >>>>> /* This may arise if >>>> immutil_saImmOiImplementerClear >>>>> failed and amf reinitializes imm interface >>>>> and >>>>> - set applier in avd_imm_reinit_bg_thread.*/ >>>>> + set applier in avd_imm_reinit_bg_thread. Imm may >>>>> + return ERR_EXIST or INVALID_PARAM. */ >>>>> } else >>>>> osafassert(0); >>>>> } else >>> ------------------------------------------------------------------------------ >>> _______________________________________________ >>> Opensaf-devel mailing list >>> [email protected] >>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel >>> >>> ------------------------------------------------------------------------------ >>> _______________________________________________ >>> Opensaf-devel mailing list >>> [email protected] >>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel > > > ------------------------------------------------------------------------------ > _______________________________________________ > Opensaf-devel mailing list > [email protected] > https://lists.sourceforge.net/lists/listinfo/opensaf-devel ------------------------------------------------------------------------------ _______________________________________________ Opensaf-devel mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/opensaf-devel
