develop:
commit 4cb4351920a16284ac3dfb40f055bab455e760dc
Author: Minh Chau <minh.c...@dektech.com.au>
Date: Wed Apr 26 15:02:48 2017 +1000
amfnd: Ignore second NCSMDS_DOWN [#2436]
If cluster goes into headless stage and wait up to 3 mins
which is currently the timeout of MDS_AWAIT_ACTIVE_TMR_VAL,
amfnd will receive another NCSMDS_DOWN, and then delete
all buffered messages. As a result, the headless recovery
is impossible because these buffered messages are deleted.
release:
commit ee0ae69f29bfd3672a4bfa3a55154d07948962ea
Author: Minh Chau <minh.c...@dektech.com.au>
Date: Wed Apr 26 15:02:48 2017 +1000
amfnd: Ignore second NCSMDS_DOWN [#2436]
If cluster goes into headless stage and wait up to 3 mins
which is currently the timeout of MDS_AWAIT_ACTIVE_TMR_VAL,
amfnd will receive another NCSMDS_DOWN, and then delete
all buffered messages. As a result, the headless recovery
is impossible because these buffered messages are deleted.
Patch ignores the second NCSMDS_DOWN.
changeset: 8790:c95a64cc4940
user: Minh Hon Chau <minh.c...@dektech.com.au>
date: Thu May 04 15:05:26 2017 +1000
summary: amfnd: Ignore second NCSMDS_DOWN [#2436]
---
**[tickets:#2436] amfnd: Buffered messages are unexpectedly deleted during SC Absence period**
**Status:** review
**Milestone:** 5.17.06
**Created:** Mon Apr 24, 2017 10:58 AM UTC by Minh Hon Chau
**Last Updated:** Wed Apr 26, 2017 05:25 AM UTC
**Owner:** Minh Hon Chau
Stop both SCs so that the cluster goes headless. Trigger an SU failover, so
that the su_oper message is buffered and should be sent to the active amfd when
an SC comes back. However, if the cluster stays headless for 3 minutes, which
is exactly the MDS_AWAIT_ACTIVE_TMR_VAL timeout, amfnd receives another
NCSMDS_DOWN. At that point amfnd deletes all pending messages, which makes
headless recovery impossible.
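The intended behaviour (treat only the first NCSMDS_DOWN as a real director loss and ignore a repeat) can be sketched roughly as below. This is a minimal illustration and not the actual patch: `NodeDirectorSketch`, `buffer_message`, `on_director_down` and `on_director_up` are hypothetical names that do not exist in OpenSAF, and the real amfnd queue holds message records rather than strings.

```cpp
#include <deque>
#include <string>

// Hypothetical stand-in for the amfnd state; not an OpenSAF type.
struct NodeDirectorSketch {
  bool director_down = false;       // set once the first down event is handled
  std::deque<std::string> pending;  // messages buffered for the active amfd
};

// Buffer a message that cannot be sent while the director is unreachable.
void buffer_message(NodeDirectorSketch& nd, const std::string& msg) {
  nd.pending.push_back(msg);
}

// Handle a "director down" event.
// Returns true if the event was processed, false if it was ignored as a repeat.
bool on_director_down(NodeDirectorSketch& nd) {
  if (nd.director_down) {
    // Second down event while already headless: keep the buffer intact.
    return false;
  }
  nd.director_down = true;
  nd.pending.clear();  // drop messages addressed to the director that just went away
  return true;
}

// Handle "director up": hand back everything buffered while headless.
std::deque<std::string> on_director_up(NodeDirectorSketch& nd) {
  nd.director_down = false;
  std::deque<std::string> to_send;
  to_send.swap(nd.pending);
  return to_send;
}
```

With this guard, a message buffered after the first down event survives a second down event and is still available when the director returns.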
Some outline logs:
~~~
Apr 18 16:49:09.749428 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:0603] >> avnd_evt_mds_avd_dn_evh
Apr 18 16:49:09.750094 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:0618] WA AMF director unexpectedly crashed
Apr 18 16:49:09.750103 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:0662] TR Delete all pending messages to be sent to AMFD
Apr 18 16:49:09.796138 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:0756] NO avnd_di_oper_send() deferred as AMF director is offline(1), or sync is required(1)
Apr 18 16:49:09.797440 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:0756] NO avnd_di_oper_send() deferred as AMF director is offline(1), or sync is required(1)
Apr 18 16:52:09.825457 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:0603] >> avnd_evt_mds_avd_dn_evh
Apr 18 16:52:09.825489 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:0618] WA AMF director unexpectedly crashed
Apr 18 16:52:09.825495 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:0662] TR Delete all pending messages to be sent to AMFD
Apr 18 16:52:09.825498 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:1273] >> avnd_diq_rec_del
Apr 18 16:52:09.825505 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:1290] << avnd_diq_rec_del
Apr 18 16:52:09.825508 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:1273] >> avnd_diq_rec_del
Apr 18 16:52:09.825512 osafamfnd [10775:10775:../../opensaf/src/amf/amfnd/di.cc:1290] << avnd_diq_rec_del
~~~
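In the excerpt above, the two `avnd_evt_mds_avd_dn_evh` entries are roughly three minutes apart, which matches the MDS_AWAIT_ACTIVE_TMR_VAL timeout mentioned in the description. A small sketch for checking that gap from the log timestamps follows; `to_micros` is a hypothetical helper written for this illustration, and it assumes both timestamps fall on the same day and always carry six fractional digits.

```cpp
#include <cstdint>
#include <cstdio>

// Convert "HH:MM:SS.uuuuuu" into microseconds since midnight.
// Returns -1 if the string does not match that layout.
// Assumption: the fractional part always has exactly six digits, as in the logs.
std::int64_t to_micros(const char* ts) {
  int h = 0, m = 0, s = 0;
  long us = 0;
  if (std::sscanf(ts, "%d:%d:%d.%6ld", &h, &m, &s, &us) != 4) {
    return -1;
  }
  return ((h * 60LL + m) * 60LL + s) * 1000000LL + us;
}
```

For the two down events, `to_micros("16:52:09.825457") - to_micros("16:49:09.749428")` gives 180076029 microseconds, that is, about 180.08 seconds.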