- **status**: assigned --> accepted


---

** [tickets:#2941] mds: Miss to send MDSDOWN after split brain detection**

**Status:** accepted
**Milestone:** 5.18.12
**Created:** Tue Oct 16, 2018 12:42 PM UTC by Minh Hon Chau
**Last Updated:** Wed Oct 17, 2018 01:12 AM UTC
**Owner:** Minh Hon Chau
**Attachments:**

- 
[mds.pl3.osafamfnd.log](https://sourceforge.net/p/opensaf/tickets/2941/attachment/mds.pl3.osafamfnd.log)
 (1.4 MB; text/x-log)
- 
[mds.pl5.osafamfnd.log](https://sourceforge.net/p/opensaf/tickets/2941/attachment/mds.pl5.osafamfnd.log)
 (1.5 MB; text/x-log)
- 
[pl3.syslog](https://sourceforge.net/p/opensaf/tickets/2941/attachment/pl3.syslog)
 (34.5 kB; application/octet-stream)
- 
[pl5.syslog](https://sourceforge.net/p/opensaf/tickets/2941/attachment/pl5.syslog)
 (40.3 kB; application/octet-stream)


**Configuration:**
 - Cluster with 2SCs, 6 PLs with MDS/TCP
- Split cluster to two partitions: #1 {SC1, PL3, PL4}, #2 {SC2, PL5, PL6, PL7, 
PL8)
- Network merge, both SC go reboot dueto split-brain detection

**Observation:**
PL3, PL4 did not get NCSMDS_DOWN vdest, thuse PL3, PL4 did not get headless and 
resync afterward.

**Outlined Logs:**
After network merge:

<143>1 2018-10-16T22:53:42.184459+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9668"] >> mds_mcm_svc_up
<143>1 2018-10-16T22:53:42.184463+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9669"] MCM:API: LOCAL SVC INFO  : svc_id = AVND(13) | PWE id = 1 | 
VDEST id = 65535 |
<143>1 2018-10-16T22:53:42.184467+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9670"] MCM:API: REMOTE SVC INFO : svc_id = AVD(12) | PWE id = 1 | 
VDEST id = 1 | POLICY = 2 | SCOPE = 4 | ROLE = 1 | MY_PCON = 0 |
<143>1 2018-10-16T22:53:42.184471+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9671"] >> mds_svc_tbl_query
<143>1 2018-10-16T22:53:42.184475+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9672"] << mds_svc_tbl_query
<143>1 2018-10-16T22:53:42.184478+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9673"] >> mds_subtn_tbl_get_details
<143>1 2018-10-16T22:53:42.184482+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9674"] << mds_subtn_tbl_get_details
<143>1 2018-10-16T22:53:42.184485+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9675"] >> mds_mcm_validate_scope
<143>1 2018-10-16T22:53:42.184489+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9676"] << mds_mcm_validate_scope
<143>1 2018-10-16T22:53:42.184492+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9677"] >> mds_get_subtn_res_tbl_by_adest
<143>1 2018-10-16T22:53:42.184496+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9678"] MDS:DB: Subscription Result not present
<143>1 2018-10-16T22:53:42.184499+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9679"] << mds_get_subtn_res_tbl_by_adest
<143>1 2018-10-16T22:53:42.184502+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9680"] >> mds_subtn_res_tbl_get
<143>1 2018-10-16T22:53:42.184506+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9681"] << mds_subtn_res_tbl_get
<143>1 2018-10-16T22:53:42.184509+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9682"] >> mds_subtn_res_tbl_add
<143>1 2018-10-16T22:53:42.184515+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9683"] MDS:DB: adest_details: <rem_node[0x2020f]:dest_pid[253]> 
<143>1 2018-10-16T22:53:42.184519+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9684"] << get_subtn_adest_details
<143>1 2018-10-16T22:53:42.184524+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9685"] MDS:DB: sub_adest_details: <rem_node[0x2020f]:dest_pid[253]>
<143>1 2018-10-16T22:53:42.184527+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9686"] << mds_subtn_res_tbl_add
<143>1 2018-10-16T22:53:42.184531+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9687"] << mds_mcm_svc_up
...
<143>1 2018-10-16T22:53:45.3608+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9837"] >> mds_mcm_svc_down
<143>1 2018-10-16T22:53:45.360804+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9838"] MCM:API: LOCAL SVC INFO  : svc_id = AVND(13) | PWE id = 1 | 
VDEST id = 65535 |
<143>1 2018-10-16T22:53:45.360809+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9839"] MCM:API: REMOTE SVC INFO : svc_id = AVD(12) | PWE id = 1 | 
VDEST id = 1 | POLICY = 2 | SCOPE = 4 | ROLE = 1 | MY_PCON = 0 | adest = 
565213401186557
<143>1 2018-10-16T22:53:45.360814+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9840"] >> mds_svc_tbl_query
<143>1 2018-10-16T22:53:45.360818+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9841"] << mds_svc_tbl_query
<143>1 2018-10-16T22:53:45.360822+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9842"] >> mds_subtn_tbl_get_details
<143>1 2018-10-16T22:53:45.360826+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9843"] << mds_subtn_tbl_get_details
<143>1 2018-10-16T22:53:45.360829+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9844"] >> mds_mcm_validate_scope
<143>1 2018-10-16T22:53:45.360833+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9845"] << mds_mcm_validate_scope
<143>1 2018-10-16T22:53:45.360836+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9846"] >> mds_get_subtn_res_tbl_by_adest
<143>1 2018-10-16T22:53:45.360841+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9847"] << mds_get_subtn_res_tbl_by_adest
<143>1 2018-10-16T22:53:45.360844+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9848"] >> mds_subtn_res_tbl_get_by_adest
<143>1 2018-10-16T22:53:45.360847+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9849"] << mds_subtn_res_tbl_get_by_adest
<143>1 2018-10-16T22:53:45.36085+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9850"] >> mds_subtn_res_tbl_get
<143>1 2018-10-16T22:53:45.360854+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9851"] << mds_subtn_res_tbl_get
<142>1 2018-10-16T22:53:45.360857+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9852"] active adest: 2020f000000fd
<143>1 2018-10-16T22:53:45.360861+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9853"] >> mds_subtn_res_tbl_del
<143>1 2018-10-16T22:53:45.360866+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9854"] << mds_subtn_res_tbl_del
<143>1 2018-10-16T22:53:45.360872+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9855"] >> mds_subtn_res_tbl_remove_active
<143>1 2018-10-16T22:53:45.360879+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9856"] 
MCM_DB:RemoveActive:TimerStart:AwaitActiveTmri:Hdl=0xff800004:SvcHdl=562945658454029:sbscr-svcid=AVD(12),vdest=1
<143>1 2018-10-16T22:53:45.360884+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9857"] << mds_subtn_res_tbl_remove_active
<143>1 2018-10-16T22:53:45.360888+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9858"] >> mds_mcm_user_event_callback
<143>1 2018-10-16T22:53:45.360891+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9859"] >> mds_svc_tbl_get
<143>1 2018-10-16T22:53:45.360895+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9860"] << mds_svc_tbl_get
<143>1 2018-10-16T22:53:45.360899+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9861"] >> mds_subtn_tbl_get
<143>1 2018-10-16T22:53:45.360903+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9862"] << mds_subtn_tbl_get
<143>1 2018-10-16T22:53:45.360907+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9863"] << mds_mcm_user_event_callback
<142>1 2018-10-16T22:53:45.36091+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9864"] MCM:API: svc_down : svc_id = AVND(13) on DEST id = 65535 got 
NO_ACTIVE for svc_id = AVD(12) on Vdest id = 1 Adest = , rem_svc_pvt_ver=7
<143>1 2018-10-16T22:53:45.360915+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9865"] >> mds_subtn_res_tbl_getnext_any
<143>1 2018-10-16T22:53:45.36092+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9866"] << mds_subtn_res_tbl_getnext_any
<143>1 2018-10-16T22:53:45.360924+11:00 PL-3 osafamfnd 193 mds.log [meta 
sequenceId="9867"] << mds_mcm_svc_down

=> At this point,  mds-PL3 should send NCSMDS_DOWN (vdest) to amfnd, but the 
mds_subtn_res_tbl_getnext_any() returned with another existing adest, thus mds 
did not send MDS_DOWN. The existing adest was added at the time network merge 
back, because mds saw the other being up.

Looking at PL-5's log as below, mds sent MDS_DOWN (vdest) to amfnd-PL5, because 
there is no other adest existed.

<143>1 2018-10-16T22:53:45.366186+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9903"] >> mds_mcm_svc_down
<143>1 2018-10-16T22:53:45.36619+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9904"] MCM:API: LOCAL SVC INFO  : svc_id = AVND(13) | PWE id = 1 | 
VDEST id = 65535 |
<143>1 2018-10-16T22:53:45.366194+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9905"] MCM:API: REMOTE SVC INFO : svc_id = AVD(12) | PWE id = 1 | 
VDEST id = 1 | POLICY = 2 | SCOPE = 4 | ROLE = 1 | MY_PCON = 0 | adest = 
565213401186557
<143>1 2018-10-16T22:53:45.366199+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9906"] >> mds_svc_tbl_query
<143>1 2018-10-16T22:53:45.366203+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9907"] << mds_svc_tbl_query
<143>1 2018-10-16T22:53:45.366207+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9908"] >> mds_subtn_tbl_get_details
<143>1 2018-10-16T22:53:45.36621+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9909"] << mds_subtn_tbl_get_details
<143>1 2018-10-16T22:53:45.366214+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9910"] >> mds_mcm_validate_scope
<143>1 2018-10-16T22:53:45.366216+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9911"] << mds_mcm_validate_scope
<143>1 2018-10-16T22:53:45.366219+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9912"] >> mds_get_subtn_res_tbl_by_adest
<143>1 2018-10-16T22:53:45.366223+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9913"] << mds_get_subtn_res_tbl_by_adest
<143>1 2018-10-16T22:53:45.366226+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9914"] >> mds_subtn_res_tbl_get_by_adest
<143>1 2018-10-16T22:53:45.366229+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9915"] << mds_subtn_res_tbl_get_by_adest
<143>1 2018-10-16T22:53:45.366232+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9916"] >> mds_subtn_res_tbl_get
<143>1 2018-10-16T22:53:45.366237+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9917"] << mds_subtn_res_tbl_get
<142>1 2018-10-16T22:53:45.366241+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9918"] active adest: 2020f000000fd
<143>1 2018-10-16T22:53:45.366245+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9919"] >> mds_subtn_res_tbl_del
<143>1 2018-10-16T22:53:45.366248+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9920"] << mds_subtn_res_tbl_del
<143>1 2018-10-16T22:53:45.366252+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9921"] >> mds_subtn_res_tbl_remove_active
<143>1 2018-10-16T22:53:45.366258+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9922"] 
MCM_DB:RemoveActive:TimerStart:AwaitActiveTmri:Hdl=0xff900004:SvcHdl=562945658454029:sbscr-svcid=AVD(12),vdest=1
<143>1 2018-10-16T22:53:45.366262+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9923"] << mds_subtn_res_tbl_remove_active
<143>1 2018-10-16T22:53:45.366266+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9924"] >> mds_mcm_user_event_callback
<143>1 2018-10-16T22:53:45.366269+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9925"] >> mds_svc_tbl_get
<143>1 2018-10-16T22:53:45.366272+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9926"] << mds_svc_tbl_get
<143>1 2018-10-16T22:53:45.366276+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9927"] >> mds_subtn_tbl_get
<143>1 2018-10-16T22:53:45.366279+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9928"] << mds_subtn_tbl_get
<143>1 2018-10-16T22:53:45.366283+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9929"] << mds_mcm_user_event_callback
<142>1 2018-10-16T22:53:45.366287+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9930"] MCM:API: svc_down : svc_id = AVND(13) on DEST id = 65535 got 
NO_ACTIVE for svc_id = AVD(12) on Vdest id = 1 Adest = , rem_svc_pvt_ver=7
<143>1 2018-10-16T22:53:45.366291+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9931"] >> mds_subtn_res_tbl_getnext_any
<143>1 2018-10-16T22:53:45.366296+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9932"] << mds_subtn_res_tbl_getnext_any
<143>1 2018-10-16T22:53:45.366301+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9933"] >> mds_subtn_res_tbl_getnext_any
<143>1 2018-10-16T22:53:45.366313+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9934"] MDS:DB: Subscription Result not present
<143>1 2018-10-16T22:53:45.366316+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9935"] << mds_subtn_res_tbl_getnext_any
<143>1 2018-10-16T22:53:45.36632+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9936"] >> mds_mcm_user_event_callback
<143>1 2018-10-16T22:53:45.366323+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9937"] >> mds_svc_tbl_get
<143>1 2018-10-16T22:53:45.366327+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9938"] << mds_svc_tbl_get
<143>1 2018-10-16T22:53:45.36633+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9939"] >> mds_subtn_tbl_get
<143>1 2018-10-16T22:53:45.366333+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9940"] << mds_subtn_tbl_get
<143>1 2018-10-16T22:53:45.366343+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9941"] << mds_mcm_user_event_callback
<143>1 2018-10-16T22:53:45.366347+11:00 PL-5 osafamfnd 193 mds.log [meta 
sequenceId="9942"] << mds_mcm_svc_down



---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.
_______________________________________________
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets

Reply via email to