- **Priority**: minor --> major
- **Milestone**: 5.18.04 --> 5.18.12
---
** [tickets:#2456] mds: NCSMDS_DOWN is sent with adest**
**Status:** unassigned
**Milestone:** 5.18.12
**Created:** Tue May 09, 2017 04:38 AM UTC by Long H Buu Nguyen
**Last Updated:** Fri Feb 02, 2018 12:17 PM UTC
**Owner:** nobody
**Attachments:**
-
[PL-4.zip](https://sourceforge.net/p/opensaf/tickets/2456/attachment/PL-4.zip)
(64.7 kB; application/x-zip-compressed)
- Description:
When the SCs are rebooted repeatedly, there is a case where NCSMDS_DOWN is sent
to the PLs with only an adest. As a result, amfnd cannot detect that both SCs
are down and does not enter the headless state.
As observed in the logs:
At 10:48:00, SC1 was down and SC2 was still alive:
2017-03-09 10:47:58 SC-1 rsyslogd: [origin software="rsyslogd"
swVersion="7.4.4" x-pid="273" x-info="http://www.rsyslog.com"] exiting on
signal 15.
2017-03-09 10:48:06 SC-1 rsyslogd: [origin software="rsyslogd"
swVersion="7.4.4" x-pid="268" x-info="http://www.rsyslog.com"] start
SC2 went down at 10:48:02:
2017-03-09 10:48:02 SC-2 rsyslogd: [origin software="rsyslogd"
swVersion="7.4.4" x-pid="239" x-info="http://www.rsyslog.com"] exiting on
signal 15.
2017-03-09 10:48:03 SC-2 rsyslogd: [origin software="rsyslogd"
swVersion="7.4.4" x-pid="270" x-info="http://www.rsyslog.com"] start
So after 10:48:04, the cluster was actually headless. However, the NCSMDS_DOWN
events received by amfnd were for an ADEST, so they were ignored. As a result,
@is_avd_down remained FALSE:
Mar 9 10:48:00.930820 osafamfnd [422:src/amf/amfnd/di.cc:0602] >>
avnd_evt_mds_avd_dn_evh
Mar 9 10:48:00.930826 osafamfnd [422:src/amf/amfnd/di.cc:0609] <<
avnd_evt_mds_avd_dn_evh
...
Mar 9 10:48:04.517247 osafamfnd [422:src/amf/amfnd/di.cc:0602] >>
avnd_evt_mds_avd_dn_evh
Mar 9 10:48:04.517254 osafamfnd [422:src/amf/amfnd/di.cc:0609] <<
avnd_evt_mds_avd_dn_evh
After the next SC restart, amfnd received NCSMDS_NEW_ACTIVE:
Mar 9 10:48:10.187820 osafamfnd [422:src/amf/amfnd/mds.cc:0540] NO AVD
NEW_ACTIVE, adest:1
But with @is_avd_down still FALSE, amfnd did not send the sync state info
because it assumed that the cluster had NOT gone headless.
Compare this with the previous SC restart in the same test, where one of the
NCSMDS_DOWN events must have been for a vdest:
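The sequence described above can be illustrated with a small model. This is only a sketch under stated assumptions, not the actual amfnd code: the types `MdsEvent` and `NodeDirectorModel` and the function `Replay` are hypothetical names introduced here, and the only behaviour modelled is what the ticket describes (a DOWN event for an adest is ignored, a DOWN event for a vdest sets the is_avd_down flag, and NEW_ACTIVE triggers a sync only when that flag is set).

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified model of the behaviour described in this ticket.
// None of these names exist in OpenSAF; they are for illustration only.
enum class MdsChange { kDown, kNewActive };
enum class DestType { kAdest, kVdest };

struct MdsEvent {
  MdsChange change;
  DestType dest;
};

struct NodeDirectorModel {
  bool is_avd_down = false;  // stands in for the @is_avd_down flag
  int sync_msgs_sent = 0;    // counts "sync state info" transmissions

  void Handle(const MdsEvent& evt) {
    if (evt.change == MdsChange::kDown) {
      // A DOWN event for an adest is ignored; only a vdest DOWN marks
      // the director as down.
      if (evt.dest == DestType::kVdest) is_avd_down = true;
      return;
    }
    // NEW_ACTIVE: resync only if the director was previously seen as down.
    if (is_avd_down) {
      ++sync_msgs_sent;
      is_avd_down = false;
    }
  }
};

// Replays a sequence of events and returns how many sync messages were sent.
int Replay(const std::vector<MdsEvent>& events) {
  NodeDirectorModel nd;
  for (const auto& e : events) nd.Handle(e);
  return nd.sync_msgs_sent;
}
```

Replaying two adest-only DOWN events followed by NEW_ACTIVE yields no sync message in this model, which matches the failing restart. If one of the DOWN events is for a vdest, exactly one sync message is sent.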
Mar 9 10:47:50.565629 osafamfnd [422:src/amf/amfnd/di.cc:0602] >>
avnd_evt_mds_avd_dn_evh
Mar 9 10:47:50.565685 osafamfnd [422:src/amf/amfnd/di.cc:0617] WA AMF director
unexpectedly crashed
...
Mar 9 10:47:50.565874 osafamfnd [422:src/amf/amfnd/di.cc:0602] >>
avnd_evt_mds_avd_dn_evh
Mar 9 10:47:50.565879 osafamfnd [422:src/amf/amfnd/di.cc:0609] <<
avnd_evt_mds_avd_dn_evh