Summary: split-brain: select active SC from largest network partition V3 [#2795]
Review request for Ticket(s): 2795
Peer Reviewer(s): Anders, Ravi, Hans
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2795
Base revision: 1c302a300e449e8a8527671fbd6c7f4e2b41e95d
Personal repository: git://git.code.sf.net/u/userid-2226215/review
--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                       n
 Build system               n
 RPM/packaging              n
 Configuration files        n
 Startup scripts            n
 SAF services               n
 OpenSAF services           y
 Core libraries             y
 Samples                    n
 Tests                      n
 Other                      n
Comments (indicate scope for each "y" above):
---------------------------------------------
*** Changes from V2: ***
fmd: make cluster_size atomic
fmd: wait 3 seconds before promoting to active, to allow topology
     events to be processed first
osaf: check for an existing takeover request before trying to lock
etcdv3 plugin: reliability improvements
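The two fmd changes above (an atomic cluster_size and a wait before promotion) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the fmd code: the ClusterTracker class, its methods and the SizeAfterSettling() helper are hypothetical names, and the sketch assumes node up/down events are counted on one thread while another thread reads the size.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical sketch only; these names do not come from the fmd sources.
class ClusterTracker {
 public:
  // Assumed to be called from the MDS thread on a node-up event.
  void NodeUp() { cluster_size_.fetch_add(1, std::memory_order_relaxed); }

  // Assumed to be called on a node-down event; never goes below zero.
  void NodeDown() {
    uint32_t current = cluster_size_.load(std::memory_order_relaxed);
    // Retry until the decrement succeeds or the counter is already zero;
    // a failed compare_exchange_weak reloads `current`.
    while (current > 0 &&
           !cluster_size_.compare_exchange_weak(current, current - 1,
                                                std::memory_order_relaxed)) {
    }
  }

  // Safe to call from another thread while events are still arriving.
  uint32_t Size() const {
    return cluster_size_.load(std::memory_order_relaxed);
  }

 private:
  std::atomic<uint32_t> cluster_size_{0};
};

// Sleeps for `delay` before sampling the size, so that topology events
// already in flight have a chance to be counted first.
inline uint32_t SizeAfterSettling(const ClusterTracker& tracker,
                                  std::chrono::milliseconds delay) {
  std::this_thread::sleep_for(delay);
  return tracker.Size();
}
```

With the 3-second wait described above, a call would look like SizeAfterSettling(tracker, std::chrono::seconds(3)); std::chrono::seconds converts implicitly to milliseconds.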
revision c7bc78656d5de11f6147727bd8612274fb6e438f
Author: Gary Lee <gary....@dektech.com.au>
Date: Wed, 11 Apr 2018 17:16:46 +1000
rded: adapt to new Consensus API [#2795]
- add 3 new internal messages:
    RDE_MSG_NODE_UP
    RDE_MSG_NODE_DOWN
    RDE_MSG_TAKEOVER_REQUEST_CALLBACK
- subscribe to AMFND service up events to keep track of the number
  of cluster members
- listen for takeover requests in the KV store
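The three new internal messages could be dispatched along the lines of the sketch below. Only the enumerator names come from the commit message. Everything else is an assumption made for illustration: the RdeMsg payload, the RdeState class and, in particular, the rule that the active SC rejects a takeover while its own partition is at least as large as the requester's, which is loosely based on the "largest network partition" goal in the summary.

```cpp
#include <cassert>
#include <cstdint>

// Enumerator names from the commit message; the layout is assumed.
enum class RdeMsgType {
  RDE_MSG_NODE_UP,
  RDE_MSG_NODE_DOWN,
  RDE_MSG_TAKEOVER_REQUEST_CALLBACK
};

// Hypothetical payload: the size field is only used for takeover callbacks.
struct RdeMsg {
  RdeMsgType type;
  uint32_t requester_partition_size;
};

enum class TakeoverAnswer { kNone, kAccepted, kRejected };

// Hypothetical state kept by the RDE on the active SC.
class RdeState {
 public:
  TakeoverAnswer Handle(const RdeMsg& msg) {
    switch (msg.type) {
      case RdeMsgType::RDE_MSG_NODE_UP:  // AMFND service up
        ++cluster_size_;
        break;
      case RdeMsgType::RDE_MSG_NODE_DOWN:  // AMFND service down
        if (cluster_size_ > 0) --cluster_size_;
        break;
      case RdeMsgType::RDE_MSG_TAKEOVER_REQUEST_CALLBACK:
        // Assumed rule: stay active while our partition is at least as
        // large as the partition of the SC asking to take over.
        return cluster_size_ >= msg.requester_partition_size
                   ? TakeoverAnswer::kRejected
                   : TakeoverAnswer::kAccepted;
    }
    return TakeoverAnswer::kNone;
  }

  uint32_t cluster_size() const { return cluster_size_; }

 private:
  uint32_t cluster_size_ = 0;
};
```

Whether the takeover request actually carries the requester's partition size is not stated in this log; the field exists here only to make the decision visible.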
revision 4899e5d0f5abdff8f15eca8ad17d3b13b6a00393
Author: Gary Lee <gary....@dektech.com.au>
Date: Wed, 11 Apr 2018 17:16:18 +1000
fmd: adapt to new Consensus API [#2795]
revision 812a315af21df06b2f9fdcc3d8fd5b7bbad3e550
Author: Gary Lee <gary....@dektech.com.au>
Date: Wed, 11 Apr 2018 17:15:41 +1000
amfd: adapt to new Consensus API [#2795]
revision b8a37c1b8965826e5faffbfebc44a84bdb6433a1
Author: Gary Lee <gary....@dektech.com.au>
Date: Wed, 11 Apr 2018 17:14:39 +1000
osaf: add lock takeover request function [#2795]
- add create and set (if previous value matches) functions to the
  KeyValue class
- add Consensus::MonitorTakeoverRequest() function for use by RDE to
  answer takeover requests
- add Consensus::CreateTakeoverRequest(): before an SC is promoted to
  active, it creates a takeover request in the KV store. The existing
  active SC can reject the lock takeover
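The takeover request relies on the two new KeyValue primitives: a create that fails if the key already exists, and a set that only succeeds if the previous value matches. The sketch below illustrates that handshake against an in-memory map standing in for the KV store. The MockKeyValue class, the takeover_request key and the NEW/ACCEPTED/REJECTED values are hypothetical; the real plugin talks to etcd, the actual key names and values may differ, and a real store must perform each operation atomically (the map only gets away with it because the sketch is single-threaded).

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical in-memory stand-in for the KV store behind the plugin.
class MockKeyValue {
 public:
  // Create semantics: succeeds only if the key does not exist yet.
  bool Create(const std::string& key, const std::string& value) {
    return store_.emplace(key, value).second;
  }

  // Set-if-previous-matches semantics: writes only if the current value
  // equals `prev`, so two SCs cannot both win the same update.
  bool SetIfMatch(const std::string& key, const std::string& value,
                  const std::string& prev) {
    auto it = store_.find(key);
    if (it == store_.end() || it->second != prev) return false;
    it->second = value;
    return true;
  }

  // Returns the stored value, or an empty string if the key is absent.
  std::string Get(const std::string& key) const {
    auto it = store_.find(key);
    return it == store_.end() ? std::string() : it->second;
  }

 private:
  std::map<std::string, std::string> store_;
};

const std::string kTakeoverKey = "takeover_request";  // hypothetical key

// Standby side: fails if another takeover request is already pending.
bool RequestTakeover(MockKeyValue& kv) {
  return kv.Create(kTakeoverKey, "NEW");
}

// Active side: answers a pending request exactly once.
bool AnswerTakeover(MockKeyValue& kv, bool keep_active) {
  return kv.SetIfMatch(kTakeoverKey, keep_active ? "REJECTED" : "ACCEPTED",
                       "NEW");
}
```

A second RequestTakeover() failing while a request is pending loosely mirrors the V3 change that checks for an existing takeover request before trying to lock, and a second AnswerTakeover() failing shows that only one answer can be recorded.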
revision 955be872ba5887b1b521eac9f7732dd3f6afc593
Author: Gary Lee <gary....@dektech.com.au>
Date: Wed, 11 Apr 2018 17:13:45 +1000
osaf: extend API to include a create key and an enhanced set key function [#2795]
- add create_key function (fails if key already exists)
- add setkey_match_prev function (set value if previous value matches)
- add missing quotes
- add etcd3.plugin
Added Files:
------------
src/osaf/consensus/plugins/etcd3.plugin
Complete diffstat:
------------------
src/amf/amfd/role.cc                     |   2 +-
src/fm/fmd/fm_cb.h                       |   2 +-
src/fm/fmd/fm_main.cc                    |  26 +-
src/fm/fmd/fm_mds.cc                     |   2 +
src/fm/fmd/fm_rda.cc                     |  27 +-
src/osaf/consensus/consensus.cc          | 435 ++++++++++++++++++++++++++-----
src/osaf/consensus/consensus.h           |  55 +++-
src/osaf/consensus/key_value.cc          | 105 +++++---
src/osaf/consensus/key_value.h           |  19 +-
src/osaf/consensus/plugins/etcd.plugin   |  86 +++++-
src/osaf/consensus/plugins/etcd3.plugin  | 366 ++++++++++++++++++++++++++
src/osaf/consensus/plugins/sample.plugin |  67 ++++-
src/rde/rded/rde_cb.h                    |  12 +-
src/rde/rded/rde_main.cc                 |  75 ++++--
src/rde/rded/rde_mds.cc                  |  39 ++-
src/rde/rded/rde_rda.cc                  |   2 +-
src/rde/rded/role.cc                     |  46 +++-
src/rde/rded/role.h                      |   2 +-
18 files changed, 1180 insertions(+), 188 deletions(-)
Testing Commands:
-----------------
1) SI swap of safSi=SC-2N,safApp=OpenSAF
2) Isolate the standby SC (e.g. use iptables to block port 6700 on a
   TCP system)
3) Isolate the active SC
Testing, Expected Results:
--------------------------
1) No error
2) The standby SC is not promoted to active, because its takeover
   request is rejected
3) The standby SC is promoted to active
Conditions of Submission:
-------------------------
Ack from any reviewer
Arch       Built  Started  Linux distro
-------------------------------------------
mips         n       n
mips64       n       n
x86          n       n
x86_64       y       y
powerpc      n       n
powerpc64    n       n
Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]
Your checkin has not passed review because (see checked entries):
___ Your RR template is generally incomplete; it has too many blank entries
    that need proper data filled in.
___ You have failed to nominate the proper persons for review and push.
___ Your patches do not have proper short+long header
___ You have grammar/spelling in your header that is unacceptable.
___ You have exceeded a sensible line length in your
    headers/comments/text.
___ You have failed to put in a proper Trac Ticket # into your commits.
___ You have incorrectly put/left internal data in your comments/files
    (e.g. internal bug tracking tool IDs, product names etc.)
___ You have not given any evidence of testing beyond basic build tests.
    Demonstrate some level of runtime or other sanity testing.
___ You have ^M present in some of your files. These have to be removed.
___ You have needlessly changed whitespace or added whitespace crimes
    like trailing spaces, or spaces before tabs.
___ You have mixed real technical changes with whitespace and other
    cosmetic code cleanup changes. These have to be separate commits.
___ You need to refactor your submission into logical chunks; there is
    too much content in a single commit.
___ You have extraneous garbage in your review (merge commits etc.)
___ You have giant attachments which should never have been sent;
    instead you should place your content in a public tree to be pulled.
___ You have too many commits attached to an e-mail; resend as threaded
    commits, or place in a public tree for a pull.
___ You have resent this content multiple times without a clear indication
    of what has changed between each re-send.
___ You have failed to adequately and individually address all of the
    comments and change requests that were proposed in the initial review.
___ You have a misconfigured ~/.gitconfig file (e.g. user.name, user.email etc.)
___ Your computer has a badly configured date and time, confusing the
    threaded patch review.
___ Your changes affect the IPC mechanism, and you don't present any results
    for the in-service upgradability test.
___ Your changes affect the user manual and documentation, but your patch
    series does not contain the patch that updates the Doxygen manual.