Hi Anders,

Reviewed and tested the patch in 4.3.x.
Ack.

Neel.
On Friday 13 September 2013 06:48 PM, Anders Bjornerstedt wrote:
> Summary: IMM: New active IMMD sends redundant discard-node for payloads [#563]
> Review request for Trac Ticket(s): (#563)
> Peer Reviewer(s): Neel
> Pull request to:
> Affected branch(es): 4.2; 4.3; default(4.4)
> Development branch:
>
> --------------------------------
> Impacted area       Impact y/n
> --------------------------------
>   Docs                    n
>   Build system            n
>   RPM/packaging           n
>   Configuration files     n
>   Startup scripts         n
>   SAF services            y
>   OpenSAF services        n
>   Core libraries          n
>   Samples                 n
>   Tests                   n
>   Other                   n
>
>
> Comments (indicate scope for each "y" above):
> ---------------------------------------------
>
> changeset 213e6a6690e786364bce49242085f5151f720b36
> Author:       Anders Bjornerstedt <[email protected]>
> Date: Fri, 13 Sep 2013 14:29:42 +0200
>
>       IMM: New active IMMD sends redundant discard-node for payloads [#563]
>
>       Standby IMMD records IMMND down events for payloads including the epoch 
> it
>       occurred in. The recorded payload down events are discarded by standby 
> IMMD
>       when a new IMMND with same node-id is introduced. At failover, the new
>       active IMMD will generate a discard-node event for any recorded payload 
> down
>       events that are still in the same epoch. Normally these discard-node 
> events
>       will be redundant. In rare cases they plug the hole that this ticket
>       reported.
>
>
> Complete diffstat:
> ------------------
>   osaf/services/saf/immsv/immd/immd_cb.h    |   10 ++++++++-
>   osaf/services/saf/immsv/immd/immd_evt.c   |    1 +
>   osaf/services/saf/immsv/immd/immd_proc.c  |  104 
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>   osaf/services/saf/immsv/immd/immd_proc.h  |    2 +
>   osaf/services/saf/immsv/immd/immd_sbevt.c |   24 ++++++++++++++++++++++
>   5 files changed, 138 insertions(+), 3 deletions(-)
>
>
> Testing Commands:
> -----------------
> The patch sent out for review is applies cleanly on OpenSAF 4.2 and 4.3.
> It does not apply cleanly on default(4.4). I am sending the 4.2/4.3 patch
> because I see that one as more urgent to get tested. The default(4.4) patch
> can be provided on request.
>
> Very difficult to reproduce the problem.
> It was done by a "hacked cluster restart" where one SC was actually
> not restarted. THus the active SC and all payloads where ordered down
> at roughly the same time. Omitted discard node messages will still
> be reflected in the remaining new active SC and will get synced to
> restarted payloads, unless there is generated a new discard node for
> the departed payloads.
>
> System testing is needed to ensure that this patch does not
> introduce any problem. In particular, the main risk to watch for
> is the opposite problem of that which is fixed. The fixed problem is
> that in rare cases, the discard node message is never generated for
> some payloads that departed closely in time with the old active SC.
>
> The opposite problem would be if a redundant discard node message
> for a payload is generated late enough to arrive after that payload
> has rejoined. This should not happen due to two safe-guards.
>
> First any node that rejoins must first introduce itself. Such an
> intro message arriving to IMMD standby causes it to remove its
> record of the departed payload.
>
> Second, when standby becomes new active, it will dispense of all
> such recorded payload-down records and only generate a redundant
> discard-node message if the cluster is still in the same epoch as
> whhen the down event was recorded.
>
>
> Testing, Expected Results:
> --------------------------
> After fail-over where many payloads go down in close time proximity
> to the old active SC, there shall NOT be resources left "hanging"
> due to omitted discard-node message for some payloads. By "resources"
> is here ment allocated implemeners, admin-owners, active non critical ccbs.
> Note also that "go down" can either mean processor down or just IMMND
> process down.
>
> Conditions of Submission:
> -------------------------
> Ack from Neel.
> We will also try to rerun the test here that triggered the issue.
>
>
> Arch      Built     Started    Linux distro
> -------------------------------------------
> mips        n          n
> mips64      n          n
> x86         n          n
> x86_64      n          n
> powerpc     n          n
> powerpc64   n          n
>
>
> Reviewer Checklist:
> -------------------
> [Submitters: make sure that your review doesn't trigger any checkmarks!]
>
>
> Your checkin has not passed review because (see checked entries):
>
> ___ Your RR template is generally incomplete; it has too many blank entries
>      that need proper data filled in.
>
> ___ You have failed to nominate the proper persons for review and push.
>
> ___ Your patches do not have proper short+long header
>
> ___ You have grammar/spelling in your header that is unacceptable.
>
> ___ You have exceeded a sensible line length in your headers/comments/text.
>
> ___ You have failed to put in a proper Trac Ticket # into your commits.
>
> ___ You have incorrectly put/left internal data in your comments/files
>      (i.e. internal bug tracking tool IDs, product names etc)
>
> ___ You have not given any evidence of testing beyond basic build tests.
>      Demonstrate some level of runtime or other sanity testing.
>
> ___ You have ^M present in some of your files. These have to be removed.
>
> ___ You have needlessly changed whitespace or added whitespace crimes
>      like trailing spaces, or spaces before tabs.
>
> ___ You have mixed real technical changes with whitespace and other
>      cosmetic code cleanup changes. These have to be separate commits.
>
> ___ You need to refactor your submission into logical chunks; there is
>      too much content into a single commit.
>
> ___ You have extraneous garbage in your review (merge commits etc)
>
> ___ You have giant attachments which should never have been sent;
>      Instead you should place your content in a public tree to be pulled.
>
> ___ You have too many commits attached to an e-mail; resend as threaded
>      commits, or place in a public tree for a pull.
>
> ___ You have resent this content multiple times without a clear indication
>      of what has changed between each re-send.
>
> ___ You have failed to adequately and individually address all of the
>      comments and change requests that were proposed in the initial review.
>
> ___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)
>
> ___ Your computer have a badly configured date and time; confusing the
>      the threaded patch review.
>
> ___ Your changes affect IPC mechanism, and you don't present any results
>      for in-service upgradability test.
>
> ___ Your changes affect user manual and documentation, your patch series
>      do not contain the patch that updates the Doxygen manual.
>


------------------------------------------------------------------------------
LIMITED TIME SALE - Full Year of Microsoft Training For Just $49.99!
1,500+ hours of tutorials including VisualStudio 2012, Windows 8, SharePoint
2013, SQL 2012, MVC 4, more. BEST VALUE: New Multi-Library Power Pack includes
Mobile, Cloud, Java, and UX Design. Lowest price ever! Ends 9/20/13. 
http://pubads.g.doubleclick.net/gampad/clk?id=58041151&iu=/4140/ostg.clktrk
_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Reply via email to