Hi Mathi The fix is ok but I have found something related that maybe should be handled in the same fix.
According to Anders B (maintainer of IMM) the TRY AGAIN timeout in these situation should be 60 sec because of the time I may take for IMM to reinitialize in some situations (see also #308). This is fulfilled if lgs_imm_impl_set() function is used but I have found places in the log service where immutil still is used: amf_active_state_handler() Sets nTries to 250 and errorsAreFatal = 0 which means that it is possible to get stuck here in 100 sec (or even more since there are several immutil functions used in this function with these timeout settings). lgs_imm_activate() As above. Only used during log initialization lgs_become_imm_applier() This function is used in several places. This function is also using immutil functions as above. amf_standby_state_handler() Using function lgs_become_imm_applier() log_initialize() Using function lgs_become_imm_applier() In these situations failed implementer handling does not abort (in immutil) and TRY AGAIN timeout is more than 60 sec, so maybe we can leave it for now. I think this should be fixed so that all OI and Applier handling is done the same way (by using lgs_imm_impl_set() function). The only thing that will be different is that the log service may run for up to 60 sec without an OI or Applier. Also when becoming an applier (standby lgs) settings from the IMM class "OpenSafLogConfig" should be reread just to be sure that the settings are correct if there are changes done before we have become an applier. Maybe you can just push your changes as is. The rest could be handled via a separate ticket? What is your opinion? > -----Original Message----- > From: mathi.naic...@oracle.com [mailto:mathi.naic...@oracle.com] > Sent: den 1 oktober 2013 11:15 > To: Lennart Lund > Cc: opensaf-devel@lists.sourceforge.net > Subject: [PATCH 1 of 1] logsv: improved error handling in rda callback [#567] > > osaf/services/saf/logsv/lgs/lgs_evt.c | 31 +++++-------------------------- > 1 files changed, 5 insertions(+), 26 deletions(-) > > > The patch introduces the following changes: > Exit when MDS or MBCSv role change fails > Avoid the 'goto done' as it was skipping the agent down processing > Become implementer in a separate thread > > diff --git a/osaf/services/saf/logsv/lgs/lgs_evt.c > b/osaf/services/saf/logsv/lgs/lgs_evt.c > --- a/osaf/services/saf/logsv/lgs/lgs_evt.c > +++ b/osaf/services/saf/logsv/lgs/lgs_evt.c > @@ -478,12 +478,12 @@ static uint32_t proc_rda_cb_msg(lgsv_lgs > > if ((rc = lgs_mds_change_role(lgs_cb)) != > NCSCC_RC_SUCCESS) { > LOG_ER("lgs_mds_change_role FAILED %u", rc); > - goto done; > + exit(EXIT_FAILURE); > } > > if ((rc = lgs_mbcsv_change_HA_state(lgs_cb)) != > NCSCC_RC_SUCCESS) { > LOG_ER("lgs_mbcsv_change_HA_state FAILED %u", > rc); > - goto done; > + exit(EXIT_FAILURE); > } > > /* fail over, become implementer > @@ -492,27 +492,9 @@ static uint32_t proc_rda_cb_msg(lgsv_lgs > TRACE("Give up applier role and become implementer"); > lgs_giveup_imm_applier(lgs_cb); > > - immutilWrapperProfile.nTries = 250; /* LOG will be blocked > until IMM responds */ > - immutilWrapperProfile.errorsAreFatal = 0; > - if ((rc = immutil_saImmOiImplementerSet(lgs_cb- > >immOiHandle, "safLogService")) > - != SA_AIS_OK) { > - > LOG_ER("immutil_saImmOiImplementerSet(safLogService) FAILED > %u", rc); > - goto done; > - } > - if ((rc = immutil_saImmOiClassImplementerSet(lgs_cb- > >immOiHandle, > - != SA_AIS_OK) { > - > LOG_ER("immutil_saImmOiImplementerSet(SaLogStreamConfig) > FAILED %u", rc); > - goto done; > - } > - /* Do this only if the class exists */ > - if (true == *(bool*) > lgs_imm_logconf_get(LGS_IMM_LOG_OPENSAFLOGCONFIG_CLASS_EXIST, > NULL)) { > - if ((rc = > immutil_saImmOiClassImplementerSet(lgs_cb->immOiHandle, > "OpenSafLogConfig")) > - != SA_AIS_OK) { > - > LOG_ER("immutil_saImmOiImplementerSet(OpenSafLogConfig) > FAILED %u", rc); > - goto done; > - } > - } > - > + /* Declare implementership from a separate thread */ > + lgs_imm_impl_set(lgs_cb); > + > /* Agent down list has to be processed first */ > lgs_process_lga_down_list(); > > @@ -526,9 +508,6 @@ static uint32_t proc_rda_cb_msg(lgsv_lgs > } > } > > -done: > - immutilWrapperProfile.nTries = 20; /* Reset retry time to more > normal value. */ > - immutilWrapperProfile.errorsAreFatal = 1; > TRACE_LEAVE(); > return rc; > } ------------------------------------------------------------------------------ October Webinars: Code for Performance Free Intel webinars can help you accelerate application performance. Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from the latest Intel processors and coprocessors. See abstracts and register > http://pubads.g.doubleclick.net/gampad/clk?id=60134791&iu=/4140/ostg.clktrk _______________________________________________ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel