Summary: IMM: CcbApply retry logic made fevs safe (#535)
Review request for Trac Ticket(s): #535
Peer Reviewer(s): Neel
Pull request to: 
Affected branch(es): 4.2, 4.3, default(4.4)
Development branch: 

--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
---------------------------------------------

Note that the third patch is only for 4.3 and default(4.4) NOT for 4.2.

changeset b4dac755b75231c4e79bf0375ee8b93163a693e4
Author: Anders Bjornerstedt <anders.bjornerst...@ericsson.com>
Date:   Fri, 09 Aug 2013 08:42:01 +0200

        IMM: CcbApply retry logic made fevs safe [#535]

        The function ImmModel::immNotPbeWritable() is made time-insensitive.
        The root cause of the problem was that this function depended on a
        timeout while waiting on ccbs in critical, but the threshold for
        deciding on this timeout was not fevs safe. That is, the timeout
        could be flagged on some nodes but not on others at the same fevs
        event, i.e. the ccb-apply. The function now instead counts the
        *number* of ccbs that are in critical, i.e. waiting only on a reply
        from PBE.
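
The idea can be illustrated with a minimal C++ sketch. This is not the actual ImmModel code: the names `Ccb`, `CcbState`, `kMaxCriticalCcbs` and `notPbeWritable` are hypothetical, and the threshold value is an assumption made for illustration only. The point is that the result depends only on replicated state, so every node evaluating it at the same fevs event should reach the same answer, whereas a comparison against a local clock may differ between nodes.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical, simplified CCB states; the real ImmModel has more.
enum class CcbState { Empty, Ready, Critical, Committed, Aborted };

struct Ccb {
    unsigned id;
    CcbState state;
};

// Assumed threshold, chosen only for illustration.
constexpr std::size_t kMaxCriticalCcbs = 2;

// Returns true when no further persistent writes should be accepted.
// The answer depends only on the replicated CCB table, never on a local
// clock, so all nodes processing the same fevs message agree on it.
bool notPbeWritable(const std::vector<Ccb>& ccbs)
{
    const auto critical = std::count_if(
        ccbs.begin(), ccbs.end(),
        [](const Ccb& c) { return c.state == CcbState::Critical; });
    return static_cast<std::size_t>(critical) >= kMaxCriticalCcbs;
}
```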

changeset ea7cad2f1de4cbc3811e5592c98d42cd5e7731e8
Author: Anders Bjornerstedt <anders.bjornerst...@ericsson.com>
Date:   Fri, 09 Aug 2013 09:27:57 +0200

        IMM: Ignore spurious/redundant ccb-apply from client [#535]

        A spurious and redundant ccb-apply generated by faulty code elsewhere
        shall be ignored by the immnd server. Bugs in the imma library, or
        new bugs introduced in the immnd server in the future, should never
        be able to cause such a serious error as an inconsistent commit. The
        problem here was that the second apply, on reaching a ccb already in
        critical at some processor(s), caused the ccb to get *aborted* in
        imm-ram while being in critical. That should never be allowed to
        happen. A ccb in critical is waiting on the commit decision from PBE
        and PBE alone. The patch also fixes a potential source of redundant
        apply in imma_om_api.c, potentially caused by multithreaded
        applications (incorrect usage of handle).
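
The intended server-side behaviour can be sketched as follows. This is a simplified illustration under assumptions, not the immnd implementation: `Phase`, `ApplyOutcome` and `handleApply` are hypothetical names, and the real ccb life cycle has more states. What it shows is that an apply arriving for a ccb that is no longer in its pre-apply state is dropped without touching the ccb, so only the PBE reply can decide between commit and abort.

```cpp
#include <map>

// Hypothetical, simplified view of a CCB's life cycle.
enum class Phase { Ready, Critical, Committed, Aborted };

// What the server did with an incoming apply request.
enum class ApplyOutcome { EnteredCritical, Ignored, UnknownCcb };

// Handles a ccb-apply for ccbId against a table of CCB phases.
// A CCB already in Critical is left untouched: only the PBE reply may
// move it to Committed or Aborted, so a second apply is dropped.
ApplyOutcome handleApply(std::map<unsigned, Phase>& ccbs, unsigned ccbId)
{
    auto it = ccbs.find(ccbId);
    if (it == ccbs.end()) {
        return ApplyOutcome::UnknownCcb;
    }
    if (it->second != Phase::Ready) {
        // Redundant or spurious apply: never abort, never re-enter.
        return ApplyOutcome::Ignored;
    }
    it->second = Phase::Critical;
    return ApplyOutcome::EnteredCritical;
}
```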

changeset 835508b5f0917710219948b9b79da9096ebdf5c1
Author: Anders Bjornerstedt <anders.bjornerst...@ericsson.com>
Date:   Fri, 09 Aug 2013 09:56:53 +0200

        IMM: Correction to ccb handling in immnd_fevs_local_checks() [#535]

        Patch only for 4.3 and default(4.4). NOT for 4.2 branch

        The logic in immnd_fevs_local_checks() in immnd_evt.c is corrected
        for CCB related messages to use immNotPbeWritable() and not
        immNotWritable() as the precheck. The former is unnecessarily strict
        since the intention is to allow ongoing ccbs a period of grace to
        complete before sync starts, but not to allow new ccb-ids to be
        generated. The logic here also did not recognize the special encoding
        of TRY_AGAIN needed towards the library for CcbApply.


Complete diffstat:
------------------
 osaf/libs/agents/saf/imma/imma_om_api.c   |   9 +++++++++
 osaf/services/saf/immsv/immnd/ImmModel.cc |  34 ++++++++++++++++++----------------
 osaf/services/saf/immsv/immnd/immnd_evt.c |  25 +++++++++++++++++++++----
 3 files changed, 48 insertions(+), 20 deletions(-)


Testing Commands:
-----------------
Very difficult to reproduce.
This has only been seen once and requires heavy ccb traffic and probably also
backlog, or something that causes severely skewed fevs messaging in the cluster.
Virtualized systems are probably more vulnerable, as are systems where
the execution load is severely skewed over the processors.

Testing, Expected Results:
--------------------------
The test application must be rigged with a retry loop around ccbApply.
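
A retry loop of this kind might look like the sketch below. It is written against a stand-in `Status` enum and a generic callable so that it compiles without the SAF headers; `Status`, `applyWithRetry`, the attempt limit and the delay are all assumptions made for illustration and are not part of any OpenSAF API.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Stand-in for the SaAisErrorT codes relevant here; a real test would
// use the values from the SAF AIS headers instead.
enum class Status { Ok, TryAgain, FailedOperation };

// Calls apply() until it returns something other than TryAgain, or until
// maxAttempts calls have been made. Sleeps for delay between attempts.
Status applyWithRetry(const std::function<Status()>& apply,
                      int maxAttempts,
                      std::chrono::milliseconds delay)
{
    Status rc = Status::TryAgain;
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        rc = apply();
        if (rc != Status::TryAgain) {
            break;
        }
        if (attempt + 1 < maxAttempts) {
            std::this_thread::sleep_for(delay);
        }
    }
    return rc;
}
```

In a real test application the callable would wrap saImmOmCcbApply() on the ccb handle and map SA_AIS_ERR_TRY_AGAIN to `Status::TryAgain`. Bounding the number of attempts keeps the test from hanging if the cluster never becomes writable.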


Conditions of Submission:
-------------------------
Ack from Neel (or someone at Oracle)


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      n          n
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
    that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
    (i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
    Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
    like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
    cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
    too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
    Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
    commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
    of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
    comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)

___ Your computer has a badly configured date and time, confusing the
    threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
    for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
    do not contain the patch that updates the Doxygen manual.

