Summary: IMM: CcbApply retry logic made fevs safe [#535]
Review request for Trac Ticket(s): 535
Peer Reviewer(s): Neel
Pull request to:
Affected branch(es): 4.2, 4.3, default(4.4)
Development branch:

--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
---------------------------------------------
Sorry for the previous botched review request.
Note that the THIRD patch is only for 4.3 and default/4.4, but NOT for 4.2.

changeset b4dac755b75231c4e79bf0375ee8b93163a693e4
Author: Anders Bjornerstedt <anders.bjornerst...@ericsson.com>
Date:   Fri, 09 Aug 2013 08:42:01 +0200

    IMM: CcbApply retry logic made fevs safe [#535]

    The function ImmModel::immNotPbeWritable() is made time insensitive.
    The root cause of the problem was that this function depended on a
    timeout while waiting on ccbs in critical, but the threshold for
    deciding on this timeout was not fevs safe. That is, the timeout could
    be flagged on some nodes but not on others at the same fevs event,
    i.e. the ccb-apply. The function instead counts the *number* of ccbs
    that are in critical, i.e. waiting only on a reply from PBE.

changeset ea7cad2f1de4cbc3811e5592c98d42cd5e7731e8
Author: Anders Bjornerstedt <anders.bjornerst...@ericsson.com>
Date:   Fri, 09 Aug 2013 09:27:57 +0200

    IMM: Ignore spurious/redundant ccb-apply from client [#535]

    A spurious and redundant ccb-apply generated by faulty code elsewhere
    shall be ignored by the immnd server. Bugs in the imma library, or new
    bugs introduced in the immnd server in the future, should never be able
    to cause an error as serious as an inconsistent commit.

    The problem here was that when the second apply reached a ccb that was
    already in critical at some processor(s), the ccb got *aborted* in
    imm-ram while in critical. That must never be allowed to happen. A ccb
    in critical is waiting on the commit decision from PBE and PBE alone.
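The first two changesets rest on two ideas: decide "PBE not writable" from a count of ccbs in critical rather than from a local timeout, and treat a second apply on a ccb that is already in critical as a no-op instead of aborting it. Below is a minimal C sketch of both ideas under stated assumptions. The types ccb_t, ccb_state_t and apply_result_t, the functions count_critical(), not_pbe_writable() and ccb_apply(), and the limit MAX_CCBS_IN_CRITICAL are all hypothetical and do not reflect the actual ImmModel code or the threshold it uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified ccb states; the real ImmModel tracks more. */
typedef enum { CCB_ACTIVE, CCB_CRITICAL, CCB_COMMITTED, CCB_ABORTED } ccb_state_t;

typedef struct {
    unsigned int id;
    ccb_state_t state;
} ccb_t;

typedef enum {
    APPLY_STARTED,   /* ccb moved to critical, now waits on PBE only */
    APPLY_IGNORED,   /* redundant apply on a ccb already in critical */
    APPLY_TRY_AGAIN, /* too many ccbs waiting on PBE; client retries */
    APPLY_REJECTED   /* ccb is not in a state that can be applied */
} apply_result_t;

/* Hypothetical limit, not the value used by OpenSAF. */
#define MAX_CCBS_IN_CRITICAL 2u

/* Count ccbs that wait only on the PBE reply. The result depends only on
   replicated model state, so every node evaluates it identically at the
   same fevs message, unlike a check against the local clock. */
static size_t count_critical(const ccb_t *ccbs, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (ccbs[i].state == CCB_CRITICAL) {
            ++count;
        }
    }
    return count;
}

static bool not_pbe_writable(const ccb_t *ccbs, size_t n)
{
    return count_critical(ccbs, n) >= MAX_CCBS_IN_CRITICAL;
}

/* Handle a ccb-apply arriving for ccbs[idx]. */
static apply_result_t ccb_apply(ccb_t *ccbs, size_t n, size_t idx)
{
    assert(ccbs != NULL && idx < n);
    ccb_t *ccb = &ccbs[idx];

    if (ccb->state == CCB_CRITICAL) {
        /* Spurious second apply: never abort a ccb in critical, since
           only the PBE decides its outcome. */
        return APPLY_IGNORED;
    }
    if (ccb->state != CCB_ACTIVE) {
        return APPLY_REJECTED;
    }
    if (not_pbe_writable(ccbs, n)) {
        /* Leave the ccb active so the client's retry loop can apply it later. */
        return APPLY_TRY_AGAIN;
    }
    ccb->state = CCB_CRITICAL;
    return APPLY_STARTED;
}
```

Because not_pbe_writable() reads only the replicated ccb states, every immnd that processes the same ccb-apply in the fevs stream reaches the same verdict; a timeout measured against each node's own clock gives no such guarantee.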
    The patch also fixes a potential source of redundant applies in
    imma_om_api.c, which can be triggered by multithreaded applications
    (incorrect usage of a handle).

changeset 835508b5f0917710219948b9b79da9096ebdf5c1
Author: Anders Bjornerstedt <anders.bjornerst...@ericsson.com>
Date:   Fri, 09 Aug 2013 09:56:53 +0200

    IMM: Correction to ccb handling in immnd_fevs_local_checks() [#535]

    Patch only for 4.3 and default(4.4). NOT for the 4.2 branch.

    The logic in immnd_fevs_local_checks() in immnd_evt.c is corrected for
    CCB related messages to use immNotPbeWritable() and not immNotWritable()
    as the precheck. The former is unnecessarily strict since the intention
    is to allow ongoing ccbs a period of grace to complete before sync
    starts, but not to allow new ccb-ids to be generated.
    The logic here also did not recognize the special encoding of TRY_AGAIN
    needed towards the library for CcbApply.

Complete diffstat:
------------------
 osaf/libs/agents/saf/imma/imma_om_api.c   |  9 +++++++++
 osaf/services/saf/immsv/immnd/ImmModel.cc | 34 ++++++++++++++++++----------------
 osaf/services/saf/immsv/immnd/immnd_evt.c | 25 +++++++++++++++++++++----
 3 files changed, 48 insertions(+), 20 deletions(-)


Testing Commands:
-----------------
Very difficult to reproduce. Requires heavy ccb traffic and a PBE backlog.
The test application must have a retry loop around ccbApply.
Perform an immnd sync regularly, for example by killing an immnd.

Testing, Expected Results:
--------------------------
Watch for any immnd crashes. They should of course not happen.
See ticket #535 for details of the symptoms.

Conditions of Submission:
-------------------------
Ack from Neel or someone from Oracle.

Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      n          n
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]
Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries that need proper data filled in.
___ You have failed to nominate the proper persons for review and push.
___ Your patches do not have proper short+long header
___ You have grammar/spelling in your header that is unacceptable.
___ You have exceeded a sensible line length in your headers/comments/text.
___ You have failed to put in a proper Trac Ticket # into your commits.
___ You have incorrectly put/left internal data in your comments/files (i.e. internal bug tracking tool IDs, product names etc)
___ You have not given any evidence of testing beyond basic build tests. Demonstrate some level of runtime or other sanity testing.
___ You have ^M present in some of your files. These have to be removed.
___ You have needlessly changed whitespace or added whitespace crimes like trailing spaces, or spaces before tabs.
___ You have mixed real technical changes with whitespace and other cosmetic code cleanup changes. These have to be separate commits.
___ You need to refactor your submission into logical chunks; there is too much content in a single commit.
___ You have extraneous garbage in your review (merge commits etc)
___ You have giant attachments which should never have been sent; instead you should place your content in a public tree to be pulled.
___ You have too many commits attached to an e-mail; resend as threaded commits, or place in a public tree for a pull.
___ You have resent this content multiple times without a clear indication of what has changed between each re-send.
___ You have failed to adequately and individually address all of the comments and change requests that were proposed in the initial review.
___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)
___ Your computer has a badly configured date and time, confusing the threaded patch review.
___ Your changes affect the IPC mechanism, and you don't present any results for an in-service upgradability test.
___ Your changes affect the user manual and documentation, but your patch series does not contain the patch that updates the Doxygen manual.