Hi AndersBj,

Reviewed and tested the patch.
Ack.

/Neel.
On Thursday 21 May 2015 06:54 PM, Anders Bjornerstedt wrote:
> Summary: IMM: Detach of PBE aborts all non-critical and non-empty CCBs [#1261]
> Review request for Trac Ticket(s): 1261
> Peer Reviewer(s): Neel; Zoran
> Pull request to:
> Affected branch(es): default(4.7)
> Development branch: default(4.7)
>
> --------------------------------
> Impacted area       Impact y/n
> --------------------------------
>   Docs                    n
>   Build system            n
>   RPM/packaging           n
>   Configuration files     n
>   Startup scripts         n
>   SAF services            y
>   OpenSAF services        n
>   Core libraries          n
>   Samples                 n
>   Tests                   n
>   Other                   n
>
>
> Comments (indicate scope for each "y" above):
> ---------------------------------------------
>
> changeset 8073b1de0515c7a82cc4cdc89aee39a682a86e06
> Author:       Anders Bjornerstedt <anders.bjornerst...@ericsson.com>
> Date: Thu, 21 May 2015 14:26:58 +0200
>
>       IMM: Detach of PBE aborts all non-critical and non-empty CCBs [#1261]
>
>       If the PBE detaches and re-attaches while there are one or more open
>       non-critical (not yet committing) but non-empty CCBs, then before this
>       enhancement one would see the following in the syslog at apply of the
>       CCB:
>
>       May 20 13:25:33 SC-2 local0.notice osafimmnd[406]: NO STARTING PBE process.
>       ......
>       May 20 13:25:34 SC-2 local0.notice osafimmnd[406]: NO PBE-OI established on this SC. Dumping incrementally to file imm.db
>       May 20 13:25:49 SC-2 local0.info osafimmnd[406]: IN GOING FROM IMM_CCB_PREPARE to IMM_CCB_CRITICAL Ccb:4
>       May 20 13:25:49 SC-2 user.notice osafimmpbed: NO Record for ccb 0x4 not found or found aborted in ok_for_critical
>       May 20 13:25:49 SC-2 user.warn osafimmpbed: WA WARNING: CCB record for 4 does not have correct op-count
>       May 20 13:25:49 SC-2 local0.notice osafimmnd[406]: NO Invalid error reported implementer 'OpenSafImmPBE', Ccb 4 will be aborted
>
>       While this does catch the problem and aborts the CCB, the op-count
>       mechanism that catches this is not intended for handling regular
>       processing cases. It is an extra safety harness intended to catch bugs,
>       lost messages, or incorrect behavior of the PBE.
>
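To make the op-count check described above concrete, here is a toy Python model. It is only a sketch and not the immpbed code: the class `ToyPbe`, its methods, and the assumption that a restarted PBE starts with no record of earlier operations are all hypothetical and chosen for illustration.

```python
# Toy model only: every name here is illustrative, not an OpenSAF identifier.
class ToyPbe:
    def __init__(self):
        # ccb_id -> number of operations this PBE instance has recorded
        self.op_counts = {}

    def record_op(self, ccb_id):
        """Record one CCB operation seen by this PBE instance."""
        self.op_counts[ccb_id] = self.op_counts.get(ccb_id, 0) + 1

    def ok_for_critical(self, ccb_id, expected_ops):
        """Accept only if a record exists and its op-count matches."""
        return self.op_counts.get(ccb_id) == expected_ops


# A PBE that stays attached sees every operation of CCB 4.
steady = ToyPbe()
steady.record_op(4)
steady.record_op(4)

# Assumption: a restarted PBE has lost the record of earlier operations,
# so it only sees operations that arrive after it re-attaches.
restarted = ToyPbe()
restarted.record_op(4)
```

In this model `steady.ok_for_critical(4, 2)` accepts, while `restarted.ok_for_critical(4, 3)` rejects because the recorded count is 1. That mismatch is the situation the safety harness catches.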
>       This enhancement avoids dependence on the op-count safety harness by
>       having the restarted PBE (primary or slave) invoke the special
>       admin-operation that aborts all non-critical CCBs in the immsv. See
>       enhancement ticket #1107 or the IMMSV README for details about this
>       admin-operation.
>
>       The newly (re)started PBE invokes the admin-operation asynchronously to
>       avoid getting blocked waiting on the reply for this admin-op. The risk
>       of the admin-op failing is minimal, and if it does fail then we end up
>       in the same distributed logic as we have today. That is, we would end up
>       in the op-count safety harness. No CCB can get applied without an ack
>       from the PBE, and so the admin-operation, if it is successfully received
>       by the IMMND coord, should result in all currently non-critical CCBs
>       getting aborted before the PBE can get any completed/apply for such a
>       CCB over FEVS.
>
>       With this enhancement, if the PBE detaches and re-attaches while there
>       are one or more open non-critical (not yet committing) and non-empty
>       CCBs, then these CCBs will be aborted. The newly attached PBE may
>       possibly get an abort callback for such CCBs, but these are ignored by
>       the PBE.
>
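The selection rule in the paragraph above (abort CCBs that are open, non-critical and non-empty; leave the rest alone) can be sketched as a small Python model. This is not the IMMND implementation: `CcbState`, `Ccb`, `abort_non_critical` and `apply_ccb` are hypothetical names, and the three-state enum is a simplification assumed for the example.

```python
from dataclasses import dataclass
from enum import Enum


class CcbState(Enum):
    # Simplified, hypothetical states; the real IMMND has more.
    OPEN = "open"          # non-critical, not yet committing
    CRITICAL = "critical"  # committing, must not be aborted here
    ABORTED = "aborted"


@dataclass
class Ccb:
    ccb_id: int
    op_count: int
    state: CcbState = CcbState.OPEN


def abort_non_critical(ccbs):
    """Abort every open, non-empty CCB; return the ids that were aborted."""
    aborted = []
    for ccb in ccbs:
        if ccb.state is CcbState.OPEN and ccb.op_count > 0:
            ccb.state = CcbState.ABORTED
            aborted.append(ccb.ccb_id)
    return aborted


def apply_ccb(ccb):
    """A later apply succeeds only if the CCB is still open."""
    if ccb.state is not CcbState.OPEN:
        return False  # request ignored: CCB not in correct state
    ccb.state = CcbState.CRITICAL
    return True
```

With an empty CCB 4, a non-empty open CCB 5 and a critical CCB 6, `abort_non_critical` aborts only CCB 5, and a later `apply_ccb` on CCB 5 is refused.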
>       With this enhancement one will see something like the following in the
>       syslog at an attempt to apply a CCB that was active during detach and
>       attach of PBE:
>
>       May 21 12:41:34 SC-2 local0.notice osafimmnd[406]: NO Persistent Back End OI attached, pid: 764
>       May 21 12:41:34 SC-2 local0.notice osafimmnd[406]: NO Received: immadm -o 202 safRdn=immManagement,safApp=safImmService
>       May 21 12:41:34 SC-2 local0.info osafimmnd[406]: IN sAbortNonCriticalCcbs = true;
>       May 21 12:41:34 SC-2 local0.notice osafimmnd[406]: NO Implementer connected: 19 (OpenSafImmPBE) <332, 2020f>
>       May 21 12:41:34 SC-2 user.info osafimmpbed: IN Admop for aborting CCBs result: 1, immsv returned 1
>       May 21 12:41:34 SC-2 user.notice osafimmpbed: NO Update epoch 21 committing with ccbId:100000014/4294967316
>       May 21 12:41:34 SC-2 local0.notice osafimmd[396]: NO IMMND coord at 2020f
>       May 21 12:41:34 SC-2 local0.info osafimmnd[406]: IN Update of epoch is PERSISTENT.
>       May 21 12:41:35 SC-2 local0.notice osafimmnd[406]: NO PBE-OI established on this SC. Dumping incrementally to file imm.db
>       May 21 12:41:35 SC-2 local0.info osafimmnd[406]: IN sAbortNonCriticalCcbs is true => set max_oi_timeout to 0
>       May 21 12:41:35 SC-2 local0.notice osafimmnd[406]: NO CCB 5 aborted by: immadm -o 202 safRdn=immManagement,safApp=safImmService
>       May 21 12:41:35 SC-2 local0.info osafimmnd[406]: IN sAbortNonCriticalCcbs reset to false
>       May 21 12:41:35 SC-2 local0.warn osafimmnd[406]: WA Timeout while waiting for implementer, aborting ccb:5
>       May 21 12:41:35 SC-2 user.warn osafimmpbed: WA Failed to find CCB object for 5/5
>       May 21 12:41:45 SC-2 local0.notice osafimmnd[406]: NO Ccb <5> not in correct state (12) for Apply ignoring request
>       May 21 12:41:45 SC-2 local0.warn osafimmnd[406]: WA Spurious and redundant ccb-apply request ignored ccbId:5
>
>
> Complete diffstat:
> ------------------
>   osaf/services/saf/immsv/immpbed/immpbe.cc |  24 ++++++++++++++++++++++--
>   1 files changed, 22 insertions(+), 2 deletions(-)
>
>
> Testing Commands:
> -----------------
> I tested using immcfg in explicit commit mode and immapplier for
> having some OIs.
>
>
> Testing, Expected Results:
> --------------------------
> This enhancement should be tested on top of enhancement #1107.
> Killing the PBE (in 2PBE, killing either the primary, the slave, or both)
> results in the restarting PBE processes generating the abort-CCBs
> admin-op. This should nearly always result in any CCBs getting aborted
> before the OM client can attempt to apply the CCB. An apply by the
> OM client very close in time to the re-attachment could get processed
> before the abort admin-op is acted on by the IMMND coord, but the
> probability is low. If it happens, one could still see the old behavior,
> i.e. ending up in the op-count safety harness. An attempt by the OM client
> to apply when the PBE is absent will result in the abort of the CCB due
> to the missing PBE.
>
>
> Conditions of Submission:
> -------------------------
> Ack from Neel and Zoran.
>
>
> Arch      Built     Started    Linux distro
> -------------------------------------------
> mips        n          n
> mips64      n          n
> x86         n          n
> x86_64      n          n
> powerpc     n          n
> powerpc64   n          n
>
>
> Reviewer Checklist:
> -------------------
> [Submitters: make sure that your review doesn't trigger any checkmarks!]
>
>
> Your checkin has not passed review because (see checked entries):
>
> ___ Your RR template is generally incomplete; it has too many blank entries
>      that need proper data filled in.
>
> ___ You have failed to nominate the proper persons for review and push.
>
> ___ Your patches do not have proper short+long header
>
> ___ You have grammar/spelling in your header that is unacceptable.
>
> ___ You have exceeded a sensible line length in your headers/comments/text.
>
> ___ You have failed to put in a proper Trac Ticket # into your commits.
>
> ___ You have incorrectly put/left internal data in your comments/files
>      (i.e. internal bug tracking tool IDs, product names etc)
>
> ___ You have not given any evidence of testing beyond basic build tests.
>      Demonstrate some level of runtime or other sanity testing.
>
> ___ You have ^M present in some of your files. These have to be removed.
>
> ___ You have needlessly changed whitespace or added whitespace crimes
>      like trailing spaces, or spaces before tabs.
>
> ___ You have mixed real technical changes with whitespace and other
>      cosmetic code cleanup changes. These have to be separate commits.
>
> ___ You need to refactor your submission into logical chunks; there is
>      too much content in a single commit.
>
> ___ You have extraneous garbage in your review (merge commits etc)
>
> ___ You have giant attachments which should never have been sent;
>      Instead you should place your content in a public tree to be pulled.
>
> ___ You have too many commits attached to an e-mail; resend as threaded
>      commits, or place in a public tree for a pull.
>
> ___ You have resent this content multiple times without a clear indication
>      of what has changed between each re-send.
>
> ___ You have failed to adequately and individually address all of the
>      comments and change requests that were proposed in the initial review.
>
> ___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)
>
> ___ Your computer has a badly configured date and time, confusing the
>      threaded patch review.
>
> ___ Your changes affect IPC mechanism, and you don't present any results
>      for in-service upgradability test.
>
> ___ Your changes affect user manual and documentation, your patch series
>      do not contain the patch that updates the Doxygen manual.
>

