Hi AndersBj, Reviewed and tested the patch. Ack.
/Neel.

On Thursday 21 May 2015 06:54 PM, Anders Bjornerstedt wrote:
> Summary: IMM: Detach of PBE aborts all non-critical and non-empty CCBs [#1261]
> Review request for Trac Ticket(s): 1261
> Peer Reviewer(s): Neel; Zoran
> Pull request to:
> Affected branch(es): default(4.7)
> Development branch: default(4.7)
>
> --------------------------------
> Impacted area       Impact y/n
> --------------------------------
>  Docs                    n
>  Build system            n
>  RPM/packaging           n
>  Configuration files     n
>  Startup scripts         n
>  SAF services            y
>  OpenSAF services        n
>  Core libraries          n
>  Samples                 n
>  Tests                   n
>  Other                   n
>
>
> Comments (indicate scope for each "y" above):
> ---------------------------------------------
>
> changeset 8073b1de0515c7a82cc4cdc89aee39a682a86e06
> Author: Anders Bjornerstedt <anders.bjornerst...@ericsson.com>
> Date: Thu, 21 May 2015 14:26:58 +0200
>
> IMM: Detach of PBE aborts all non-critical and non-empty CCBs [#1261]
>
> If the PBE detaches and re-attaches while there are one or more open
> non-critical (not yet committing) but non-empty CCBs, then before this
> enhancement one would see the following in the syslog at apply of the
> CCB:
>
> May 20 13:25:33 SC-2 local0.notice osafimmnd[406]: NO STARTING PBE process.
> ......
> May 20 13:25:34 SC-2 local0.notice osafimmnd[406]: NO PBE-OI established on this SC. Dumping incrementally to file imm.db
> May 20 13:25:49 SC-2 local0.info osafimmnd[406]: IN GOING FROM IMM_CCB_PREPARE to IMM_CCB_CRITICAL Ccb:4
> May 20 13:25:49 SC-2 user.notice osafimmpbed: NO Record for ccb 0x4 not found or found aborted in ok_for_critical
> May 20 13:25:49 SC-2 user.warn osafimmpbed: WA WARNING: CCB record for 4 does not have correct op-count
> May 20 13:25:49 SC-2 local0.notice osafimmnd[406]: NO Invalid error reported implementer 'OpenSafImmPBE', Ccb 4 will be aborted
>
> While this does catch the problem and aborts the CCB, the op-count
> mechanism that catches this is not intended for handling regular
> processing cases.
> It is an extra safety harness intended to catch bugs, lost messages,
> or incorrect behavior of the PBE.
>
> This enhancement avoids dependence on the op-count safety harness by
> having the restarted PBE (primary or slave) invoke the special
> admin-operation that aborts all non-critical CCBs in the immsv. See
> enhancement ticket #1107 or the IMMSV README for details about this
> admin-operation.
>
> The newly (re)started PBE invokes the admin-operation asynchronously to
> avoid getting blocked waiting on the reply for this admin-op. The risk
> of the admin-op failing is minimal, and if it does fail then we end up
> in the same distributed logic as we have today. That is, we would end
> up in the op-count safety harness. No CCB can get applied without ack
> from the PBE, and so the admin-operation, if it is successfully received
> by the IMMND coord, should result in all currently non-critical CCBs
> getting aborted before the PBE can get any completed/apply for such a
> CCB over FEVS.
>
> With this enhancement, if the PBE detaches and re-attaches while there
> are one or more open non-critical (not yet committing) and non-empty
> CCBs, then these CCBs will be aborted. The newly attached PBE may
> possibly get an abort callback for such CCBs, but these are ignored by
> the PBE.
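
To make the selection rule described above concrete, here is a minimal C++ sketch of which CCBs the abort admin-operation is expected to hit. It is a toy model under stated assumptions, not the immsv code: the `Ccb` struct, the `CcbState` enum, and `abortNonCriticalCcbs()` are hypothetical names made up for illustration, and the real IMMND state machine has more states than the four shown.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, simplified CCB states; the real immsv state machine has more.
enum class CcbState { Open, Critical, Committed, Aborted };

// Hypothetical CCB record used only for this illustration.
struct Ccb {
    std::uint32_t id;
    CcbState state;
    unsigned opCount;  // number of operations added to the CCB so far
};

// Abort every CCB that is still open (not yet critical) and non-empty.
// Critical, committed, already aborted, and empty CCBs are left untouched.
// Returns the map keys of the CCBs that were aborted, in ascending order.
std::vector<std::uint32_t> abortNonCriticalCcbs(std::map<std::uint32_t, Ccb>& ccbs) {
    std::vector<std::uint32_t> aborted;
    for (auto& entry : ccbs) {
        Ccb& ccb = entry.second;
        if (ccb.state == CcbState::Open && ccb.opCount > 0) {
            ccb.state = CcbState::Aborted;
            aborted.push_back(entry.first);
        }
    }
    return aborted;
}
```

In this model, with CCB 4 already critical, CCB 5 open with three operations, and CCB 6 open but empty, only CCB 5 would be aborted.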
>
> With this enhancement one will see something like the following in the
> syslog at an attempt to apply a CCB that was active during detach and
> attach of PBE:
>
> May 21 12:41:34 SC-2 local0.notice osafimmnd[406]: NO Persistent Back End OI attached, pid: 764
> May 21 12:41:34 SC-2 local0.notice osafimmnd[406]: NO Received: immadm -o 202 safRdn=immManagement,safApp=safImmService
> May 21 12:41:34 SC-2 local0.info osafimmnd[406]: IN sAbortNonCriticalCcbs = true;
> May 21 12:41:34 SC-2 local0.notice osafimmnd[406]: NO Implementer connected: 19 (OpenSafImmPBE) <332, 2020f>
> May 21 12:41:34 SC-2 user.info osafimmpbed: IN Admop for aborting CCBs result: 1, immsv returned 1
> May 21 12:41:34 SC-2 user.notice osafimmpbed: NO Update epoch 21 committing with ccbId:100000014/4294967316
> May 21 12:41:34 SC-2 local0.notice osafimmd[396]: NO IMMND coord at 2020f
> May 21 12:41:34 SC-2 local0.info osafimmnd[406]: IN Update of epoch is PERSISTENT.
> May 21 12:41:35 SC-2 local0.notice osafimmnd[406]: NO PBE-OI established on this SC. Dumping incrementally to file imm.db
> May 21 12:41:35 SC-2 local0.info osafimmnd[406]: IN sAbortNonCriticalCcbs is true => set max_oi_timeout to 0
> May 21 12:41:35 SC-2 local0.notice osafimmnd[406]: NO CCB 5 aborted by: immadm -o 202 safRdn=immManagement,safApp=safImmService
> May 21 12:41:35 SC-2 local0.info osafimmnd[406]: IN sAbortNonCriticalCcbs reset to false
> May 21 12:41:35 SC-2 local0.warn osafimmnd[406]: WA Timeout while waiting for implementer, aborting ccb:5
> May 21 12:41:35 SC-2 user.warn osafimmpbed: WA Failed to find CCB object for 5/5
> May 21 12:41:45 SC-2 local0.notice osafimmnd[406]: NO Ccb <5> not in correct state (12) for Apply ignoring request
> May 21 12:41:45 SC-2 local0.warn osafimmnd[406]: WA Spurious and redundant ccb-apply request ignored ccbId:5
>
>
> Complete diffstat:
> ------------------
>  osaf/services/saf/immsv/immpbed/immpbe.cc |  24 ++++++++++++++++++++++--
>  1 files changed, 22 insertions(+), 2 deletions(-)
>
>
> Testing Commands:
> -----------------
> I tested using immcfg in explicit commit mode and immapplier for
> having some OIs.
>
>
> Testing, Expected Results:
> --------------------------
> This enhancement should be tested on top of enhancement #1107.
> Killing the PBE (in 2PBE killing either primary or slave or both)
> results in the restarting PBE processes generating the abort ccbs
> admin-op. This should nearly always result in any CCBs getting aborted
> before the OM client can attempt to apply the CCB. An apply by the
> OM client very close in time to the re-attachment could get processed
> before the abort admin-op is acted on by the IMMND coord, but the
> probability is low. If it happens then one could still see the old
> behavior, i.e. ending up in the op-count safety harness. An attempt by
> the OM client to apply when the PBE is absent will result in the abort
> of the CCB due to missing PBE.
>
>
> Conditions of Submission:
> -------------------------
> Ack from Neel and Zoran.
>
>
> Arch      Built     Started    Linux distro
> -------------------------------------------
> mips        n          n
> mips64      n          n
> x86         n          n
> x86_64      n          n
> powerpc     n          n
> powerpc64   n          n
>
>
> Reviewer Checklist:
> -------------------
> [Submitters: make sure that your review doesn't trigger any checkmarks!]
>
>
> Your checkin has not passed review because (see checked entries):
>
> ___ Your RR template is generally incomplete; it has too many blank entries
>     that need proper data filled in.
>
> ___ You have failed to nominate the proper persons for review and push.
>
> ___ Your patches do not have proper short+long header
>
> ___ You have grammar/spelling in your header that is unacceptable.
>
> ___ You have exceeded a sensible line length in your headers/comments/text.
>
> ___ You have failed to put in a proper Trac Ticket # into your commits.
>
> ___ You have incorrectly put/left internal data in your comments/files
>     (i.e. internal bug tracking tool IDs, product names etc)
>
> ___ You have not given any evidence of testing beyond basic build tests.
>     Demonstrate some level of runtime or other sanity testing.
>
> ___ You have ^M present in some of your files. These have to be removed.
>
> ___ You have needlessly changed whitespace or added whitespace crimes
>     like trailing spaces, or spaces before tabs.
>
> ___ You have mixed real technical changes with whitespace and other
>     cosmetic code cleanup changes. These have to be separate commits.
>
> ___ You need to refactor your submission into logical chunks; there is
>     too much content into a single commit.
>
> ___ You have extraneous garbage in your review (merge commits etc)
>
> ___ You have giant attachments which should never have been sent;
>     Instead you should place your content in a public tree to be pulled.
>
> ___ You have too many commits attached to an e-mail; resend as threaded
>     commits, or place in a public tree for a pull.
>
> ___ You have resent this content multiple times without a clear indication
>     of what has changed between each re-send.
>
> ___ You have failed to adequately and individually address all of the
>     comments and change requests that were proposed in the initial review.
>
> ___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)
>
> ___ Your computer has a badly configured date and time; confusing the
>     threaded patch review.
>
> ___ Your changes affect IPC mechanism, and you don't present any results
>     for in-service upgradability test.
>
> ___ Your changes affect user manual and documentation, your patch series
>     do not contain the patch that updates the Doxygen manual.

_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel