Hi Gary

ACK (not tested)

Regards
Canh

-----Original Message-----
From: Gary Lee <gary....@dektech.com.au> 
Sent: Tuesday, July 9, 2019 1:21 PM
To: canh.v.tru...@dektech.com.au; minh.c...@dektech.com.au;
hans.nordeb...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net; Gary Lee <gary....@dektech.com.au>
Subject: [PATCH 0/4] Review Request for amfd: improve controller failover
behavior V2 [#3029]

Summary: amfd: improve controller failover behavior [#3029]
Review request for Ticket(s): 3029
Peer Reviewer(s): Canh, Minh, Hans 
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3029
Base revision: 71852f322b42437f074bfa4c618c021798357143
Personal repository: git://git.code.sf.net/u/userid-2226215/review

--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y 
 OpenSAF services        y
 Core libraries          y
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
---------------------------------------------

revision 4feee2b631afa3393ae9e53fd6575c3768861dca
Author: Gary Lee <gary....@dektech.com.au>
Date:   Tue, 9 Jul 2019 14:38:49 +1000

osaf: make wait time configurable [#3029]

If FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE is enabled,
make the time that we wait for MDS node events configurable.



revision 2c419ba5fffb85272f0d15118b561bcfc1de4814
Author: Gary Lee <gary....@dektech.com.au>
Date:   Tue, 9 Jul 2019 14:38:49 +1000

amfd: improve controller failover behavior [#3029]

If consensus service is enabled, only perform node failover
after peer controller has self-fenced
(after 2 * FMS_TAKEOVER_REQUEST_VALID_TIME seconds).

This also means if node failover delay is set to a large value,
we do not unnecessarily wait too long before failing over assignments
previously assigned to the peer controller.

Remove unused fmd_conf_file variable.

Change some LOG_ER calls to LOG_WA.
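
As a rough illustration of the timing rule in this commit (not code from
the patch), the wait before failing over the peer controller's assignments
could be sketched as below. The function name PeerFailoverWait and its
parameters are hypothetical; only the 2 * FMS_TAKEOVER_REQUEST_VALID_TIME
rule comes from the commit message, and the fallback to the configured
node failover delay when consensus is disabled is an assumption.

```cpp
#include <chrono>

// Hypothetical helper, not taken from the patch: returns how long amfd would
// wait before failing over assignments held by the lost peer controller.
//
// consensus_enabled           - whether the consensus service is in use
// takeover_request_valid_time - value of FMS_TAKEOVER_REQUEST_VALID_TIME
// node_failover_delay         - configured node failover delay (assumed to
//                               apply only when consensus is disabled)
constexpr std::chrono::seconds PeerFailoverWait(
    bool consensus_enabled,
    std::chrono::seconds takeover_request_valid_time,
    std::chrono::seconds node_failover_delay) {
  return consensus_enabled
             // Peer is expected to have self-fenced after twice the valid time.
             ? std::chrono::seconds(2 * takeover_request_valid_time)
             : node_failover_delay;
}
```

With a valid time of 20 seconds and a node failover delay of 600 seconds,
this sketch yields a 40 second wait when consensus is enabled, which is the
"do not wait unnecessarily long" case described above.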



revision 7c4fff483477082ca66a26f921a50b3bc1240538
Author: Gary Lee <gary....@dektech.com.au>
Date:   Tue, 9 Jul 2019 14:38:49 +1000

fmd: add active promotion supervision timer [#3029]

Add supervision timer so controller will reboot if it cannot obtain
consensus lock within the allocation period
(2 * FMS_TAKEOVER_REQUEST_VALID_TIME).

The peer controller can then safely perform a node failover
after this period of time.
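
The supervision rule above can be sketched as a single decision function.
This is only an illustration under stated assumptions, not the fmd
implementation: the enum PromotionCheck, the function CheckActivePromotion
and its parameters are hypothetical, and treating "elapsed equals the
period" as expired is an assumption.

```cpp
#include <chrono>

// Hypothetical outcome of one supervision check; names are illustrative only.
enum class PromotionCheck { kLockObtained, kKeepWaiting, kReboot };

// Hypothetical helper, not taken from the patch.
//
// lock_obtained               - whether the consensus lock has been obtained
// elapsed                     - time since active promotion was started
// takeover_request_valid_time - value of FMS_TAKEOVER_REQUEST_VALID_TIME
constexpr PromotionCheck CheckActivePromotion(
    bool lock_obtained, std::chrono::seconds elapsed,
    std::chrono::seconds takeover_request_valid_time) {
  return lock_obtained
             ? PromotionCheck::kLockObtained
             // Assumption: the period counts as expired once elapsed reaches
             // 2 * FMS_TAKEOVER_REQUEST_VALID_TIME.
             : (elapsed >= 2 * takeover_request_valid_time
                    ? PromotionCheck::kReboot
                    : PromotionCheck::kKeepWaiting);
}
```

Because the controller reboots once the period expires without the lock,
the peer can treat the same period as the point after which node failover
is safe.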



revision 8b596a228402ff99b26906138daf920c23e965e7
Author: Gary Lee <gary....@dektech.com.au>
Date:   Tue, 9 Jul 2019 14:38:49 +1000

osaf: add function to return takeover request expiry time [#3029]



Complete diffstat:
------------------
 src/amf/amfd/cb.h                  |  1 -
 src/amf/amfd/clm.cc                |  4 +-
 src/amf/amfd/main.cc               |  1 -
 src/amf/amfd/ndfsm.cc              |  8 ++--
 src/amf/amfd/ndproc.cc             | 19 ++++++++
 src/amf/amfd/node_state.cc         | 23 +++++-----
 src/amf/amfd/node_state_machine.cc | 19 ++++++++
 src/amf/amfd/node_state_machine.h  |  2 +
 src/amf/amfd/proc.h                |  1 +
 src/fm/fmd/fm_cb.h                 |  2 +
 src/fm/fmd/fm_main.cc              | 14 +++++-
 src/fm/fmd/fm_rda.cc               | 89 ++++++++++++++++++++++++++------------
 src/fm/fmd/fmd.conf                |  5 +++
 src/osaf/consensus/consensus.cc    | 13 ++++++
 src/osaf/consensus/consensus.h     |  4 ++
 src/rde/rded/role.cc               |  4 +-
 16 files changed, 160 insertions(+), 49 deletions(-)


Testing Commands:
-----------------
1) Ensure a 2N application is active on the standby controller,
   and standby on the active controller
2) Isolate the active and standby controllers


Testing, Expected Results:
--------------------------
amfd should fail over the 2N application only after
2 * FMS_TAKEOVER_REQUEST_VALID_TIME seconds

Conditions of Submission:
-------------------------
ack from any reviewer

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y 
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
    that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
    (i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
    Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
    like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
    cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
    too much content in a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
    Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
    commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
    of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
    comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email
    etc)

___ Your computer has a badly configured date and time, confusing the
    threaded patch review.

___ Your changes affect the IPC mechanism, and you don't present any results
    for an in-service upgradability test.

___ Your changes affect the user manual and documentation, but your patch
    series does not contain the patch that updates the Doxygen manual.




_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
