Re: [devel] [PATCH 1/1] amfd: ignore amfnd down event if node state is absent [#3015]

2019-06-23 Thread Minh Hon Chau

Hi Thang,

Ack from me.

Thanks

Minh

On 11/6/19 4:11 pm, Minh Hon Chau wrote:

Hi Thang,

I can see it's a race in main thread that how amfd processes the mds 
down and clm callback.


Node is going down

<143>1 2019-06-11T15:16:42.157517+10:00 SC-1 osafamfd 272 osafamfd 
[meta sequenceId="38507"] 275:amf/amfd/mds.cc:398 >> avd_mds_svc_evt
<143>1 2019-06-11T15:16:42.157526+10:00 SC-1 osafamfd 272 osafamfd 
[meta sequenceId="38508"] 275:amf/amfd/mds.cc:459 TR avnd 
2030f00bd down
<143>1 2019-06-11T15:16:42.157535+10:00 SC-1 osafamfd 272 osafamfd 
[meta sequenceId="38509"] 275:amf/amfd/mds.cc:0 << avd_mds_svc_evt


<143>1 2019-06-11T15:16:48.332481+10:00 SC-1 osafamfd 272 osafamfd 
[meta sequenceId="38623"] 272:amf/amfd/clm.cc:226 >> clm_track_cb: '0' 
'4' '1'
<143>1 2019-06-11T15:16:48.33249+10:00 SC-1 osafamfd 272 osafamfd 
[meta sequenceId="38624"] 272:amf/amfd/clm.cc:242 TR 
numberOfMembers:'4', numberOfItems:'1'
<143>1 2019-06-11T15:16:48.3325+10:00 SC-1 osafamfd 272 osafamfd [meta 
sequenceId="38625"] 272:amf/amfd/clm.cc:248 TR i = 0, 
node:'safNode=PL-3,safCluster=myClmCluster', clusterChange:3
<143>1 2019-06-11T15:16:48.33251+10:00 SC-1 osafamfd 272 osafamfd 
[meta sequenceId="38626"] 272:amf/amfd/clm.cc:332 TR Node Left: 
rootCauseEntity safNode=PL-3,safCluster=myClmCluster for node 131855
<143>1 2019-06-11T15:16:48.332519+10:00 SC-1 osafamfd 272 osafamfd 
[meta sequenceId="38627"] 272:amf/amfd/clm.cc:188 >> 
clm_node_exit_complete: 2030f
<143>1 2019-06-11T15:16:48.332534+10:00 SC-1 osafamfd 272 osafamfd 
[meta sequenceId="38628"] 272:amf/amfd/ndproc.cc:1267 >> 
avd_node_failover: 'safAmfNode=PL-3,safAmfCluster=myAmfCluster'
<143>1 2019-06-11T15:16:48.332545+10:00 SC-1 osafamfd 272 osafamfd 
[meta sequenceId="38629"] 272:amf/amfd/ndfsm.cc:1153 >> 
avd_node_mark_absent


Node is coming up again

<143>1 2019-06-11T15:16:48.34867+10:00 SC-1 osafamfd 272 osafamfd 
[meta sequenceId="39826"] 272:amf/amfd/clm.cc:226 >> clm_track_cb: '0' 
'4' '1'
<143>1 2019-06-11T15:16:48.348674+10:00 SC-1 osafamfd 272 osafamfd 
[meta sequenceId="39827"] 272:amf/amfd/clm.cc:242 TR 
numberOfMembers:'5', numberOfItems:'1'
<143>1 2019-06-11T15:16:48.348678+10:00 SC-1 osafamfd 272 osafamfd 
[meta sequenceId="39828"] 272:amf/amfd/clm.cc:248 TR i = 0, 
node:'safNode=PL-3,safCluster=myClmCluster', clusterChange:2
<143>1 2019-06-11T15:16:48.348685+10:00 SC-1 osafamfd 272 osafamfd 
[meta sequenceId="39829"] 272:amf/amfd/node.cc:53 TR added node 131855
<143>1 2019-06-11T15:16:48.348689+10:00 SC-1 osafamfd 272 osafamfd 
[meta sequenceId="39830"] 272:amf/amfd/clm.cc:417 TR Node Joined 
'safNode=PL-3,safCluster=myClmCluster' '36'


Now amfd processes the mds down in main thread, its a race here then 
the @node_info.member set to FALSE


<143>1 2019-06-11T15:16:48.351948+10:00 SC-1 osafamfd 272 osafamfd 
[meta sequenceId="39971"] 272:amf/amfd/ndfsm.cc:779 >> 
avd_mds_avnd_down_evh: 2030f, 0x558e549a1650
<143>1 2019-06-11T15:16:48.351954+10:00 SC-1 osafamfd 272 osafamfd 
[meta sequenceId="39972"] 272:amf/amfd/ndproc.cc:1267 >> 
avd_node_failover: 'safAmfNode=PL-3,safAmfCluster=myAmfCluster'
<143>1 2019-06-11T15:16:48.351959+10:00 SC-1 osafamfd 272 osafamfd 
[meta sequenceId="39973"] 272:amf/amfd/ndfsm.cc:1153 >> 
avd_node_mark_absent


Now the mds up comes, node_up come, but the node is not a clm member

<143>1 2019-06-11T15:16:48.701771+10:00 SC-1 osafamfd 272 osafamfd 
[meta sequenceId="40552"] 275:amf/amfd/mds.cc:398 >> avd_mds_svc_evt
<143>1 2019-06-11T15:16:48.701791+10:00 SC-1 osafamfd 272 osafamfd 
[meta sequenceId="40553"] 275:amf/amfd/mds.cc:0 << avd_mds_svc_evt


<143>1 2019-06-11T15:16:48.706254+10:00 SC-1 osafamfd 272 osafamfd 
[meta sequenceId="40560"] 272:amf/amfd/ndfsm.cc:743 >> 
avd_mds_avnd_up_evh
<143>1 2019-06-11T15:16:48.706271+10:00 SC-1 osafamfd 272 osafamfd 
[meta sequenceId="40561"] 275:amf/amfd/ndmsg.cc:389 << avd_n2d_msg_rcv
<143>1 2019-06-11T15:16:48.706288+10:00 SC-1 osafamfd 272 osafamfd 
[meta sequenceId="40562"] 272:amf/amfd/ndfsm.cc:757 TR amfnd on 2030f 
is up
<143>1 2019-06-11T15:16:48.706298+10:00 SC-1 osafamfd 272 osafamfd 
[meta sequenceId="40563"] 272:amf/amfd/ndfsm.cc:0 << avd_mds_avnd_up_evh


<143>1 2019-06-11T15:16:48.707145+10:00 SC-1 osafamfd 272 osafamfd 
[meta sequenceId="40596"] 272:amf/amfd/ndfsm.cc:275 >> 
avd_node_up_evh: from 2030f, safAmfNode=PL-3,safAmfCluster=myAmfCluster
<143>1 2019-06-11T15:16:48.707153+10:00 SC-1 osafamfd 272 osafamfd 
[meta sequenceId="40597"] 272:amf/amfd/ndfsm.cc:292 TR leds_set 0
<143>1 2019-06-11T15:16:48.70716+10:00 SC-1 osafamfd 272 osafamfd 
[meta sequenceId="40598"] 272:amf/amfd/ndfsm.cc:308 TR node_id '2030f' 
not in failover_list.
<141>1 2019-06-11T15:16:48.707185+10:00 SC-1 osafamfd 272 osafamfd 
[meta sequenceId="40599"] 272:amf/amfd/ndfsm.cc:232 NO Received 
node_up from 2030f: msg_id 1
<140>1 2019-06-11T15:16:48.7072+10:00 SC-1 osafamfd 272 osafamfd [meta 
sequenceId="40600"] 272:amf/amfd/ndfsm.cc:387 WA Not a Cluster Member 
dropping the msg

Re: [devel] [PATCH 1/1] amfd: ignore amfnd down event if node state is absent [#3015]

2019-06-11 Thread Minh Hon Chau

Hi Thang,

I can see it's a race in main thread that how amfd processes the mds 
down and clm callback.


Node is going down

<143>1 2019-06-11T15:16:42.157517+10:00 SC-1 osafamfd 272 osafamfd [meta 
sequenceId="38507"] 275:amf/amfd/mds.cc:398 >> avd_mds_svc_evt
<143>1 2019-06-11T15:16:42.157526+10:00 SC-1 osafamfd 272 osafamfd [meta 
sequenceId="38508"] 275:amf/amfd/mds.cc:459 TR avnd 2030f00bd down
<143>1 2019-06-11T15:16:42.157535+10:00 SC-1 osafamfd 272 osafamfd [meta 
sequenceId="38509"] 275:amf/amfd/mds.cc:0 << avd_mds_svc_evt


<143>1 2019-06-11T15:16:48.332481+10:00 SC-1 osafamfd 272 osafamfd [meta 
sequenceId="38623"] 272:amf/amfd/clm.cc:226 >> clm_track_cb: '0' '4' '1'
<143>1 2019-06-11T15:16:48.33249+10:00 SC-1 osafamfd 272 osafamfd [meta 
sequenceId="38624"] 272:amf/amfd/clm.cc:242 TR numberOfMembers:'4', 
numberOfItems:'1'
<143>1 2019-06-11T15:16:48.3325+10:00 SC-1 osafamfd 272 osafamfd [meta 
sequenceId="38625"] 272:amf/amfd/clm.cc:248 TR i = 0, 
node:'safNode=PL-3,safCluster=myClmCluster', clusterChange:3
<143>1 2019-06-11T15:16:48.33251+10:00 SC-1 osafamfd 272 osafamfd [meta 
sequenceId="38626"] 272:amf/amfd/clm.cc:332 TR Node Left: 
rootCauseEntity safNode=PL-3,safCluster=myClmCluster for node 131855
<143>1 2019-06-11T15:16:48.332519+10:00 SC-1 osafamfd 272 osafamfd [meta 
sequenceId="38627"] 272:amf/amfd/clm.cc:188 >> clm_node_exit_complete: 2030f
<143>1 2019-06-11T15:16:48.332534+10:00 SC-1 osafamfd 272 osafamfd [meta 
sequenceId="38628"] 272:amf/amfd/ndproc.cc:1267 >> avd_node_failover: 
'safAmfNode=PL-3,safAmfCluster=myAmfCluster'
<143>1 2019-06-11T15:16:48.332545+10:00 SC-1 osafamfd 272 osafamfd [meta 
sequenceId="38629"] 272:amf/amfd/ndfsm.cc:1153 >> avd_node_mark_absent


Node is coming up again

<143>1 2019-06-11T15:16:48.34867+10:00 SC-1 osafamfd 272 osafamfd [meta 
sequenceId="39826"] 272:amf/amfd/clm.cc:226 >> clm_track_cb: '0' '4' '1'
<143>1 2019-06-11T15:16:48.348674+10:00 SC-1 osafamfd 272 osafamfd [meta 
sequenceId="39827"] 272:amf/amfd/clm.cc:242 TR numberOfMembers:'5', 
numberOfItems:'1'
<143>1 2019-06-11T15:16:48.348678+10:00 SC-1 osafamfd 272 osafamfd [meta 
sequenceId="39828"] 272:amf/amfd/clm.cc:248 TR i = 0, 
node:'safNode=PL-3,safCluster=myClmCluster', clusterChange:2
<143>1 2019-06-11T15:16:48.348685+10:00 SC-1 osafamfd 272 osafamfd [meta 
sequenceId="39829"] 272:amf/amfd/node.cc:53 TR added node 131855
<143>1 2019-06-11T15:16:48.348689+10:00 SC-1 osafamfd 272 osafamfd [meta 
sequenceId="39830"] 272:amf/amfd/clm.cc:417 TR Node Joined 
'safNode=PL-3,safCluster=myClmCluster' '36'


Now amfd processes the mds down in main thread, its a race here then the 
@node_info.member set to FALSE


<143>1 2019-06-11T15:16:48.351948+10:00 SC-1 osafamfd 272 osafamfd [meta 
sequenceId="39971"] 272:amf/amfd/ndfsm.cc:779 >> avd_mds_avnd_down_evh: 
2030f, 0x558e549a1650
<143>1 2019-06-11T15:16:48.351954+10:00 SC-1 osafamfd 272 osafamfd [meta 
sequenceId="39972"] 272:amf/amfd/ndproc.cc:1267 >> avd_node_failover: 
'safAmfNode=PL-3,safAmfCluster=myAmfCluster'
<143>1 2019-06-11T15:16:48.351959+10:00 SC-1 osafamfd 272 osafamfd [meta 
sequenceId="39973"] 272:amf/amfd/ndfsm.cc:1153 >> avd_node_mark_absent


Now the mds up comes, node_up come, but the node is not a clm member

<143>1 2019-06-11T15:16:48.701771+10:00 SC-1 osafamfd 272 osafamfd [meta 
sequenceId="40552"] 275:amf/amfd/mds.cc:398 >> avd_mds_svc_evt
<143>1 2019-06-11T15:16:48.701791+10:00 SC-1 osafamfd 272 osafamfd [meta 
sequenceId="40553"] 275:amf/amfd/mds.cc:0 << avd_mds_svc_evt


<143>1 2019-06-11T15:16:48.706254+10:00 SC-1 osafamfd 272 osafamfd [meta 
sequenceId="40560"] 272:amf/amfd/ndfsm.cc:743 >> avd_mds_avnd_up_evh
<143>1 2019-06-11T15:16:48.706271+10:00 SC-1 osafamfd 272 osafamfd [meta 
sequenceId="40561"] 275:amf/amfd/ndmsg.cc:389 << avd_n2d_msg_rcv
<143>1 2019-06-11T15:16:48.706288+10:00 SC-1 osafamfd 272 osafamfd [meta 
sequenceId="40562"] 272:amf/amfd/ndfsm.cc:757 TR amfnd on 2030f is up
<143>1 2019-06-11T15:16:48.706298+10:00 SC-1 osafamfd 272 osafamfd [meta 
sequenceId="40563"] 272:amf/amfd/ndfsm.cc:0 << avd_mds_avnd_up_evh


<143>1 2019-06-11T15:16:48.707145+10:00 SC-1 osafamfd 272 osafamfd [meta 
sequenceId="40596"] 272:amf/amfd/ndfsm.cc:275 >> avd_node_up_evh: from 
2030f, safAmfNode=PL-3,safAmfCluster=myAmfCluster
<143>1 2019-06-11T15:16:48.707153+10:00 SC-1 osafamfd 272 osafamfd [meta 
sequenceId="40597"] 272:amf/amfd/ndfsm.cc:292 TR leds_set 0
<143>1 2019-06-11T15:16:48.70716+10:00 SC-1 osafamfd 272 osafamfd [meta 
sequenceId="40598"] 272:amf/amfd/ndfsm.cc:308 TR node_id '2030f' not in 
failover_list.
<141>1 2019-06-11T15:16:48.707185+10:00 SC-1 osafamfd 272 osafamfd [meta 
sequenceId="40599"] 272:amf/amfd/ndfsm.cc:232 NO Received node_up from 
2030f: msg_id 1
<140>1 2019-06-11T15:16:48.7072+10:00 SC-1 osafamfd 272 osafamfd [meta 
sequenceId="40600"] 272:amf/amfd/ndfsm.cc:387 WA Not a Cluster Member 
dropping the msg
<143>1 2019-06-11T15:16:48.707206+10:00 SC-1 osafamfd 272 osafamfd [meta