Hi Minh,
Thanks for reviewing and testing the patch. For record purpose, I have
published V2 after merging both the patches.
Regarding refactoring, I remember the ticket raised by a user which says:
"#809 AMF: ensure IMM is updated before notification is sent came for
user list"
So without refactoring, with this patch and with #314, AMF is very close
to the fix of #809.
Thanks,
Praveen
On 22-Nov-16 12:35 PM, minh chau wrote:
> Hi Praveen,
>
> I think your additional patch can address the issue that job's removed
> due to unhandled error code.
> Just one minor thing, this Fifo::execute now is for IMM, NTF job (going
> to use it for CLM), and @avd_imm_status affects to all other jobs. I
> guess we can improve/refactor it later with conditional job execution
> for particular job types.
>
> Ack (tested only).
>
> Thanks,
> Minh
>
> On 22/11/16 17:28, praveen malviya wrote:
>> Hi Minh,
>>
>> This is happening when initialization with IMM is going on and AMFD
>> tries to update in IMM from job queue. I think (wild guess), the
>> reason of IMM returning two different written code is: there are two
>> phases a)one is initialization and b) implementer set.
>> Since we cannot handle each return code, I think, AMFD should not
>> execute IMM jobs when init is going on. A patch on top of published
>> patch will look like this:
>> diff --git a/osaf/services/saf/amf/amfd/imm.cc
>> b/osaf/services/saf/amf/amfd/imm.cc
>> --- a/osaf/services/saf/amf/amfd/imm.cc
>> +++ b/osaf/services/saf/amf/amfd/imm.cc
>> @@ -411,6 +411,10 @@ AvdJobDequeueResultT Fifo::execute(const
>> return JOB_EINVH;
>> }
>>
>> + if (avd_cb->avd_imm_status == AVD_IMM_INIT_ONGOING) {
>> + TRACE("Already IMM init is going, try again after
>> sometime");
>> + return JOB_ETRYAGAIN;
>> + }
>> if ((ajob = peek()) == nullptr)
>> return JOB_ENOTEXIST;
>>
>>
>> what do you think?
>>
>> Thanks,
>> Praveen
>> On 21-Nov-16 7:12 PM, minh chau wrote:
>>> Hi Praveen,
>>>
>>> I have rerun the test with the patch, the test is "lock si" with
>>> simulated pbe hanging
>>> I got this error
>>>
>>> Nov 21 23:52:38.732148 osafamfd [488:imm.cc:0417] >> execute
>>> Nov 21 23:52:38.732159 osafamfd [488:imm.cc:0219] >> exec: Update
>>> 'safSi=AmfDemoTwon,safApp=AmfDemoTwon' saAmfSIAdminState
>>> Nov 21 23:52:38.732171 osafamfd [488:imm.cc:0658] >>
>>> object_name_to_class_type: safSi=AmfDemoTwon,safApp=AmfDemoTwon
>>> Nov 21 23:52:38.732186 osafamfd [488:imm.cc:0704] <<
>>> object_name_to_class_type: 24
>>> Nov 21 23:52:38.732198 osafamfd [488:imma_oi_api.c:2446] >>
>>> rt_object_update_common
>>> Nov 21 23:52:38.732249 osafamfd [488:imma_oi_api.c:2519] ER
>>> ERR_BAD_OPERATION: The SaImmOiHandleT is not associated with any
>>> implementer name
>>> Nov 21 23:52:38.732259 osafamfd [488:imma_oi_api.c:2719] <<
>>> rt_object_update_common
>>> ...
>>> Nov 21 23:52:38.732692 osafamfd [488:imm.cc:0250] ER exec: update
>>> FAILED 20
>>> Nov 21 23:52:38.733654 osafamfd [488:imm.cc:0254] << exec
>>> Nov 21 23:52:38.733668 osafamfd [488:imm.cc:0421] << execute: 5
>>>
>>> Sometimes I got error code "2" instead
>>>
>>> Nov 22 0:15:59.462401 osafamfd [486:imm.cc:0417] >> execute
>>> Nov 22 0:15:59.462405 osafamfd [486:imm.cc:0219] >> exec: Update
>>> 'safSi=AmfDemoTwon,safApp=AmfDemoTwon' saAmfSIAdminState
>>> Nov 22 0:15:59.462410 osafamfd [486:imm.cc:0658] >>
>>> object_name_to_class_type: safSi=AmfDemoTwon,safApp=AmfDemoTwon
>>> Nov 22 0:15:59.462490 osafamfd [486:imm.cc:0704] <<
>>> object_name_to_class_type: 24
>>> Nov 22 0:15:59.462637 osafamfd [486:imma_oi_api.c:2446] >>
>>> rt_object_update_common
>>> Nov 22 0:15:59.462649 osafamfd [486:imma_oi_api.c:2611] T4 ERR_LIBRARY:
>>> Overlapping use of IMM OI handle by multiple threads
>>> Nov 22 0:15:59.462654 osafamfd [486:imma_oi_api.c:2719] <<
>>> rt_object_update_common
>>> Nov 22 0:15:59.462707 osafamfd [486:imm.cc:0250] ER exec: update
>>> FAILED 2
>>> Nov 22 0:15:59.462714 osafamfd [486:imm.cc:0254] << exec
>>> Nov 22 0:15:59.462718 osafamfd [486:imm.cc:0421] << execute: 5
>>>
>>> The "exec: update FAILED" is supposed to remove the update job of
>>> saAmfSIAdminState out of the queue, I think
>>>
>>> Thanks,
>>> Minh
>>>
>>> On 21/11/16 20:51, [email protected] wrote:
>>>> osaf/services/saf/amf/amfd/imm.cc | 16 +++++++++++-----
>>>> osaf/services/saf/amf/amfd/include/cb.h | 6 ++++++
>>>> osaf/services/saf/amf/amfd/main.cc | 2 ++
>>>> 3 files changed, 19 insertions(+), 5 deletions(-)
>>>>
>>>>
>>>> When AMFD receives BAD_HANDLE from IMM, it calls avd_imm_reinit_bg()
>>>> to init with IMM in a separate thread. After this if again main thread
>>>> tries to execute some job for IMM update, IMM will return BAD_HANDLE
>>>> and AMFD
>>>> will again call avd_imm_reinit_bg() to init with IMM. This can happen
>>>> when AMFD tries
>>>> to update IMM in job queues.
>>>>
>>>> Patch fixes this problem.
>>>> Also if avd_saImmOiRtObjectUpdate_sync() fails with BAD_HANDLE,
>>>> update will be pushed in jon queue to be updated later.
>>>>
>>>> diff --git a/osaf/services/saf/amf/amfd/imm.cc
>>>> b/osaf/services/saf/amf/amfd/imm.cc
>>>> --- a/osaf/services/saf/amf/amfd/imm.cc
>>>> +++ b/osaf/services/saf/amf/amfd/imm.cc
>>>> @@ -1368,6 +1368,7 @@ SaAisErrorT avd_imm_init(void *avd_cb)
>>>> immutilWrapperProfile.nTries = 180;
>>>> + cb->avd_imm_status = AVD_IMM_INIT_ONGOING;
>>>> if ((error = immutil_saImmOiInitialize_2(&cb->immOiHandle,
>>>> &avd_callbacks, &immVersion)) != SA_AIS_OK) {
>>>> LOG_ER("saImmOiInitialize failed %u", error);
>>>> goto done;
>>>> @@ -1383,6 +1384,7 @@ SaAisErrorT avd_imm_init(void *avd_cb)
>>>> goto done;
>>>> }
>>>> + cb->avd_imm_status = AVD_IMM_INIT_DONE;
>>>> TRACE("Successfully initialized IMM");
>>>> done:
>>>> @@ -1639,6 +1641,7 @@ SaAisErrorT avd_saImmOiRtObjectUpdate_sy
>>>> const SaImmAttrModificationT_2 *attrMods[] = {&attrMod, nullptr};
>>>> SaImmAttrValueT attrValues[] = {value};
>>>> + const std::string attribute_name(attributeName);
>>>> TRACE_ENTER2("'%s' %s", dn.c_str(), attributeName);
>>>> attrMod.modType = modifyType;
>>>> @@ -1651,6 +1654,9 @@ SaAisErrorT avd_saImmOiRtObjectUpdate_sy
>>>> if (rc != SA_AIS_OK) {
>>>> LOG_WA("saImmOiRtObjectUpdate of '%s' %s failed with %u",
>>>> dn.c_str(), attributeName, rc);
>>>> + //Now it will be updated through job queue.
>>>> + avd_saImmOiRtObjectUpdate(dn, attribute_name, attrValueType,
>>>> value);
>>>> +
>>>> }
>>>> return rc;
>>>> }
>>>> @@ -1954,16 +1960,16 @@ void avd_imm_reinit_bg(void)
>>>> int rc = 0;
>>>> TRACE_ENTER();
>>>> + if (avd_cb->avd_imm_status == AVD_IMM_INIT_ONGOING) {
>>>> + TRACE("Already IMM init is going in another thread");
>>>> + return;
>>>> + }
>>>> + avd_cb->avd_imm_status = AVD_IMM_INIT_ONGOING;
>>>> LOG_NO("Re-initializing with IMM");
>>>> osaf_mutex_lock_ordie(&imm_reinit_thread_startup_mutex);
>>>> - (void) saImmOiFinalize(avd_cb->immOiHandle);
>>>> -
>>>> - avd_cb->immOiHandle = 0;
>>>> - avd_cb->is_implementer = false;
>>>> -
>>>> pthread_attr_init(&attr);
>>>> pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
>>>> diff --git a/osaf/services/saf/amf/amfd/include/cb.h
>>>> b/osaf/services/saf/amf/amfd/include/cb.h
>>>> --- a/osaf/services/saf/amf/amfd/include/cb.h
>>>> +++ b/osaf/services/saf/amf/amfd/include/cb.h
>>>> @@ -59,6 +59,11 @@ typedef enum {
>>>> AVD_APP_STATE /* Cluster startup timer has expired */
>>>> } AVD_INIT_STATE;
>>>> +typedef enum {
>>>> + AVD_IMM_INIT_BASE = 1,
>>>> + AVD_IMM_INIT_ONGOING = 2,
>>>> + AVD_IMM_INIT_DONE = 3,
>>>> +} AVD_IMM_INIT_STATUS;
>>>> /*
>>>> * Sync state of the Standby.
>>>> */
>>>> @@ -237,6 +242,7 @@ typedef struct cl_cb_tag {
>>>> /* The duration that amfd should tolerate the absence of SCs */
>>>> uint32_t scs_absence_max_duration;
>>>> + AVD_IMM_INIT_STATUS avd_imm_status;
>>>> } AVD_CL_CB;
>>>> extern AVD_CL_CB *avd_cb;
>>>> diff --git a/osaf/services/saf/amf/amfd/main.cc
>>>> b/osaf/services/saf/amf/amfd/main.cc
>>>> --- a/osaf/services/saf/amf/amfd/main.cc
>>>> +++ b/osaf/services/saf/amf/amfd/main.cc
>>>> @@ -562,6 +562,7 @@ static uint32_t initialize(void)
>>>> cb->all_nodes_synced = false;
>>>> cb->node_sync_window_closed = false;
>>>> cb->scs_absence_max_duration = 0;
>>>> + cb->avd_imm_status = AVD_IMM_INIT_BASE;
>>>> if ((val = getenv("AVSV_HB_PERIOD")) != nullptr) {
>>>> cb->heartbeat_tmr_period = strtoll(val, nullptr, 0);
>>>> @@ -687,6 +688,7 @@ static void main_loop(void)
>>>> }
>>>> if (evt->rcv_evt == AVD_IMM_REINITIALIZED) {
>>>> + cb->avd_imm_status = AVD_IMM_INIT_DONE;
>>>> TRACE("Received IMM reinit msg");
>>>> polltmo = retval_to_polltmo(Fifo::execute(cb));
>>>> continue;
>>>>
>>>
>>
>
------------------------------------------------------------------------------
_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel