Hi Praveen,
The only admin operations in progress are the ones amfd starts internally
because of CLM tracking. No operator is issuing admin lock commands.
Here is where the admin lock starts:
May 26 14:27:57.501241 osafamfd [22769:node.cc:0983] TR su_cnt_admin_oper:6
May 26 14:27:57.501246 osafamfd [22769:node.cc:1007] << avd_node_admin_lock_unlock_shutdown
May 26 14:27:57.501250 osafamfd [22769:clm.cc:0175] << clm_node_exit_start
May 26 14:27:57.501254 osafamfd [22769:clm.cc:0379] << clm_track_cb
Here MDS notifies amfd that the node is down, and the node gets deleted:
May 26 14:28:00.218238 osafamfd [22769:ndfsm.cc:0324] >> avd_mds_avnd_down_evh: 2010f, 0x72d160
May 26 14:28:00.218253 osafamfd [22769:ndproc.cc:0923] >> avd_node_failover: 'safAmfNode=SC-1,safAmfCluster=Q50amfCluster'
Then, when SA_CLM_CHANGE_COMPLETED arrives, the node has already been
deleted:
May 26 14:28:00.265919 osafamfd [22769:clm.cc:0213] >> clm_track_cb: '0' '4' '1'
May 26 14:28:00.265930 osafamfd [22769:clm.cc:0273] IN clm_track_cb: CLM node 'safNode=SC-1,safCluster=Q50clmCluster' is not an AMF cluster member
May 26 14:28:00.265938 osafamfd [22769:clm.cc:0379] << clm_track_cb
Should we consider the CLM operation complete when the
CHANGE_COMPLETED callback arrives?
How should this case be handled?
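To make the ordering in the three trace excerpts concrete, here is a heavily simplified, hypothetical model of it. None of these types or functions (`Node`, `member`, `on_clm_change_start`, `on_node_down`, `on_clm_change_completed`) are the real amfd code. The model assumes the failover path leaves the node unrecognized by the CLM callback (a `member` flag here) while its pending CLM invocation and locked admin state survive, which would match the admin state still being LOCKED when the node returns.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified model; not the real amfd data structures.
enum class AdminState { kUnlocked, kLocked };

struct Node {
  bool member = true;              // assumption: known as an AMF cluster member
  AdminState admin = AdminState::kUnlocked;
  int su_cnt_admin_oper = 0;       // SUs whose lock is still in progress
  std::uint64_t clm_pend_inv = 0;  // pending CLM invocation (0 = none)
};

// SA_CLM_CHANGE_START for a node exit (cf. clm_node_exit_start):
// start locking the node's SUs and remember the CLM invocation.
void on_clm_change_start(Node& n, std::uint64_t inv, int su_count) {
  assert(n.member && inv != 0);
  n.admin = AdminState::kLocked;
  n.su_cnt_admin_oper = su_count;
  n.clm_pend_inv = inv;
}

// MDS reports the node down (cf. avd_mds_avnd_down_evh / avd_node_failover).
// Assumption: membership is dropped, but the pending invocation and the
// locked admin state are left untouched.
void on_node_down(Node& n) { n.member = false; }

// SA_CLM_CHANGE_COMPLETED: returns false when the node is no longer a
// member, mirroring the "is not an AMF cluster member" trace. In that
// case nothing is cleaned up.
bool on_clm_change_completed(const Node& n) { return n.member; }
```

Running START, node down, then COMPLETED against this model leaves the node LOCKED with a non-zero `clm_pend_inv`, which is the stranded state described in the quoted report.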
Alex
On 05/27/2015 09:09 AM, praveen malviya wrote:
>
>
> On 27-May-15 2:58 AM, Alex Jones wrote:
>> Praveen/Nagu,
>>
>> I'm seeing an issue where the node admin state differs between IMM
>> and amfd. I can reproduce this very consistently.
>>
>> If I power down the standby controller (which also hosts other
>> standby SUs), amfd still thinks the admin state is locked when it
>> comes back up, even though IMM does not. In this state, if I try to
>> force the admin change, I see:
>>
>> imm.cc:1756] >> report_admin_op_error: inv:124554051585, res:6, Error String: 'Clm lock operation going on'
>>
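The `res:6` in that trace is the SaAisErrorT value SA_AIS_ERR_TRY_AGAIN. A minimal sketch of the kind of guard that would produce it, assuming the check is keyed on the node's pending CLM invocation; `Node`, `AisError`, and `check_admin_op_allowed` are hypothetical stand-ins, not the real amfd functions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Stand-in for SaAisErrorT: SA_AIS_OK is 1 and SA_AIS_ERR_TRY_AGAIN is 6.
enum AisError { kAisOk = 1, kAisErrTryAgain = 6 };

// Hypothetical simplified node; only the pending CLM invocation matters here.
struct Node {
  std::uint64_t clm_pend_inv = 0;  // 0 means no CLM operation pending
};

// Hypothetical guard: refuse an operator admin operation while a
// CLM-initiated lock is still pending on the node.
AisError check_admin_op_allowed(const Node& n, std::string* error_string) {
  assert(error_string != nullptr);
  if (n.clm_pend_inv != 0) {
    *error_string = "Clm lock operation going on";
    return kAisErrTryAgain;
  }
  error_string->clear();
  return kAisOk;
}
```

If `clm_pend_inv` is never cleared after the node fails, every retry takes the TRY_AGAIN branch, so forcing the admin change cannot succeed.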
> Before bringing down the node, did the admin issue a lock on the CLM node?
>
> I think the node was powered down before the CLM lock completed.
>
> Thanks
> Praveen
>> After looking at the code and the traces, it appears that the
>> ClmResponse to clm_node_exit_start() is never sent.
>> node->su_cnt_admin_oper is 6, which is correct (the number of SUs),
>> so amfd holds back the CLM response.
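A minimal sketch of the counting pattern implied here, assuming amfd decrements `su_cnt_admin_oper` as each SU finishes and sends the CLM response only once the count reaches zero. `Node`, `ClmResponder`, and `on_su_admin_op_done` are hypothetical; the callback stands in for the real `saClmResponse_4()` call.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical simplified node state.
struct Node {
  int su_cnt_admin_oper = 0;       // SUs whose admin operation is outstanding
  std::uint64_t clm_pend_inv = 0;  // pending CLM invocation (0 = none)
};

// Hypothetical stand-in for saClmResponse_4(); receives the CLM invocation.
using ClmResponder = std::function<void(std::uint64_t)>;

// Hypothetical: called once per SU when its admin operation completes.
// The CLM response goes out only when the last outstanding SU reports.
void on_su_admin_op_done(Node& n, const ClmResponder& respond) {
  assert(n.su_cnt_admin_oper > 0);
  --n.su_cnt_admin_oper;
  if (n.su_cnt_admin_oper == 0 && n.clm_pend_inv != 0) {
    respond(n.clm_pend_inv);
    n.clm_pend_inv = 0;
  }
}
```

Under this assumption, if the node is powered down before all six SUs report, the counter never drains and `respond` is never called, which matches the missing ClmResponse.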
>>
>> I thought we might need to add this to the end of
>> avd_node_down_mw_susi_failover():
>>
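>> if (avnd->clm_pend_inv != 0) {
>>   // A CLM-initiated lock was still pending when the node failed:
>>   // answer CLM now so it is not left waiting, then clear the invocation.
>>   LOG_NO("sending CLM response due to node fail");
>>   saClmResponse_4(cb->clmHandle, avnd->clm_pend_inv,
>>                   SA_CLM_CALLBACK_RESPONSE_OK);
>>   avnd->clm_pend_inv = 0;
>> }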
>>
>> Adding this code does not fully clear the problem: I still have to
>> manually unlock the AMF node when it comes back up.
>>
>> How is this supposed to work?
>>
>> Alex
>>
>>
>> ------------------------------------------------------------------------------
>>
>>
>> _______________________________________________
>> Opensaf-users mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/opensaf-users
>>
>