Summary: imm: fix amfd stuck when multi partitioned clusters rejoin [#3237]
Review request for Ticket(s): 3237
Peer Reviewer(s): *** LIST THE TECH REVIEWER(S) / MAINTAINER(S) HERE ***
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3237
Base revision: c4091499e28980c732c8ac4136e10243617ac81d
Personal repository: git://git.code.sf.net/u/thuantr/review
--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
---------------------------------------------
N/A

revision dda452c5486137f6bc9653e219d5ea16d6323def
Author:	thuan.tran <thuan.t...@dektech.com.au>
Date:	Wed, 18 Nov 2020 10:24:49 +0700

imm: fix amfd stuck when multi partitioned clusters rejoin [#3237]

- The IMMND coordinator took longer than necessary to sync because it
  incorrectly postponed the sync while waiting for the wrong number of
  down nodes.
- An IMMND should restart once its re-introduction has been accepted,
  rather than becoming a new coordinator and then having to sync again
  with the new coordinator.
- The active IMMD only updates ex-IMMD from the coordinator if that info
  exists. Update ex-IMMD to the node ID itself when a new coordinator
  announces sync.
- The IMMND on the active IMMD node starts a split-brain-detected timer,
  and reboots the node when it expires, if it sees another active IMMD.
  It no longer reboots immediately, to avoid interfering with the RDE
  split-brain detection mechanism.
- A "quick" reboot is sometimes not quick, so the active IMMD on the
  node may impact the newly promoted active node. Stop AMFND and kill
  AMFD/IMMD to avoid any such impact.
Complete diffstat:
------------------
 scripts/opensaf_reboot     |  5 +++--
 src/imm/immd/immd_evt.c    | 16 +++++++++++++---
 src/imm/immnd/immnd.h      |  1 +
 src/imm/immnd/immnd_cb.h   |  2 ++
 src/imm/immnd/immnd_evt.c  | 37 +++++++++++++++++++++++++++++--------
 src/imm/immnd/immnd_main.c |  2 ++
 6 files changed, 50 insertions(+), 13 deletions(-)

Testing Commands:
-----------------
N/A

Testing, Expected Results:
--------------------------
N/A

Conditions of Submission:
-------------------------
ACK by reviewers

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n

Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]

Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
    that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
    (i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
    Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
    like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
    cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
    too much content in a single commit.
___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
    Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
    commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
    of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
    comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing the
    threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
    for in-service upgradability test.

___ Your changes affect user manual and documentation, and your patch series
    does not contain the patch that updates the Doxygen manual.

_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel