Hi,


On 04/12/2018 04:15 PM, Gary Lee wrote:
Hi


On 12/04/18 23:34, Anders Widell wrote:
Ack with comments:

* There is no need to use "const" when passing function arguments by value. E.g. the argument "const uint64_t cluster_size" should be "uint64_t cluster_size".


[GL] Sure, but it doesn't do any harm, and would stop accidental assignments (that would be lost anyway).
[HansN] perhaps these functions should be const member functions? E.g. SaAisErrorT PromoteThisNode(bool graceful_takeover, uint64_t cluster_size) const;

* You assume that all nodes in the cluster have synchronized clocks (probably using NTP). Would it be possible to use an expiration time for the etcd key instead of writing a time stamp in the value, so that etcd automatically deletes the takeover request when it expires? That way we would not require synchronized clocks.


[GL] Good idea. I did question why I hadn't use TTL/lease once I had finished the ticket. :-) Will see what I can do!

regards,
Anders Widell

On 04/11/2018 09:35 AM, Gary Lee wrote:
Summary: split-brain: select active SC from largest network partition V3 [#2795]
Review request for Ticket(s): 2795
Peer Reviewer(s): Anders, Ravi, Hans
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2795
Base revision: 1c302a300e449e8a8527671fbd6c7f4e2b41e95d
Personal repository: git://git.code.sf.net/u/userid-2226215/review

--------------------------------
Impacted area       Impact y/n
--------------------------------
  Docs                    n
  Build system            n
  RPM/packaging           n
  Configuration files     n
  Startup scripts         n
  SAF services            n
  OpenSAF services        y
  Core libraries          y
  Samples                 n
  Tests                   n
  Other                   n


Comments (indicate scope for each "y" above):
---------------------------------------------

*** Changes from V2: ***

fmd: made cluster_size atomic
fmd: wait 3 seconds before promoting to active, to allow topology events to be processed first
osaf: add check for existing takeover request, before trying to lock
etcdv3 plugin: reliablity improvements


revision c7bc78656d5de11f6147727bd8612274fb6e438f
Author:    Gary Lee <gary....@dektech.com.au>
Date:    Wed, 11 Apr 2018 17:16:46 +1000

rded: adapt to new Consensus API [#2795]

- add 3 new internal message:

RDE_MSG_NODE_UP
RDE_MSG_NODE_DOWN
RDE_MSG_TAKEOVER_REQUEST_CALLBACK

- subscribe to AMFND service up events to keep track of the number
   of cluster members

- listen for takeover requests in KV store



revision 4899e5d0f5abdff8f15eca8ad17d3b13b6a00393
Author:    Gary Lee <gary....@dektech.com.au>
Date:    Wed, 11 Apr 2018 17:16:18 +1000

fmd: adapt to new Consensus API [#2795]



revision 812a315af21df06b2f9fdcc3d8fd5b7bbad3e550
Author:    Gary Lee <gary....@dektech.com.au>
Date:    Wed, 11 Apr 2018 17:15:41 +1000

amfd: adapt to new Consensus API [#2795]



revision b8a37c1b8965826e5faffbfebc44a84bdb6433a1
Author:    Gary Lee <gary....@dektech.com.au>
Date:    Wed, 11 Apr 2018 17:14:39 +1000

osaf: add lock takeover request fuction [#2795]

- add create and set (if previous value matches) functions to KeyValue class - add Consensus::MonitorTakeoverRequest() function for use by RDE to answer takeover requests - add Consensus::CreateTakeoverRequest() - before a SC is promoted to active, it will    create a takeover request in the KV store. An existing SC can reject the lock takeover



revision 955be872ba5887b1b521eac9f7732dd3f6afc593
Author:    Gary Lee <gary....@dektech.com.au>
Date:    Wed, 11 Apr 2018 17:13:45 +1000

osaf: extend API to include a create key and an enhanced set key function [#2795]

- add create_key function (fails if key already exists)
- add setkey_match_prev function (set value if previous value matches)
- add missing quotes
- add etcd3.plugin



Added Files:
------------
  src/osaf/consensus/plugins/etcd3.plugin


Complete diffstat:
------------------
  src/amf/amfd/role.cc                     |   2 +-
  src/fm/fmd/fm_cb.h                       |   2 +-
  src/fm/fmd/fm_main.cc                    |  26 +-
  src/fm/fmd/fm_mds.cc                     |   2 +
  src/fm/fmd/fm_rda.cc                     |  27 +-
  src/osaf/consensus/consensus.cc          | 435 ++++++++++++++++++++++++++-----
  src/osaf/consensus/consensus.h           |  55 +++-
  src/osaf/consensus/key_value.cc          | 105 +++++---
  src/osaf/consensus/key_value.h           |  19 +-
  src/osaf/consensus/plugins/etcd.plugin   |  86 +++++-
  src/osaf/consensus/plugins/etcd3.plugin  | 366 ++++++++++++++++++++++++++
  src/osaf/consensus/plugins/sample.plugin |  67 ++++-
  src/rde/rded/rde_cb.h                    |  12 +-
  src/rde/rded/rde_main.cc                 |  75 ++++--
  src/rde/rded/rde_mds.cc                  |  39 ++-
  src/rde/rded/rde_rda.cc                  |   2 +-
  src/rde/rded/role.cc                     |  46 +++-
  src/rde/rded/role.h                      |   2 +-
  18 files changed, 1180 insertions(+), 188 deletions(-)


Testing Commands:
-----------------
1) SI swap of safSi=SC-2N,safApp=OpenSAF
2) Isolate standby cluster (eg. use iptables to block port 6700 on a TCP system)
3) Isolate active cluster

Testing, Expected Results:
--------------------------
1) No error
2) Standby will fail to be promoted as active as the takeover request is rejected
3) Standby will be promoted

Conditions of Submission:
-------------------------
Ack from any reviewer


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
     that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
     (i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
     Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
     like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
     cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
     too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
     Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
     commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
     of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
     comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer have a badly configured date and time; confusing the
     the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
     for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
     do not contain the patch that updates the Doxygen manual.





------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Reply via email to