Summary: IMM: 2PBE - PBE-A tolerates BAD_OPERATION reply on ccb-prepare from PBE-B [#830]
Review request for Trac Ticket(s): 830
Peer Reviewer(s): Neel
Pull request to: 
Affected branch(es): 4.4; default(4.5)
Development branch:

--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n


Comments (indicate scope for each "y" above):
---------------------------------------------

changeset 31109d862e0218f9da84d3bbd152916f32aed31f
Author: Anders Bjornerstedt <anders.bjornerst...@ericsson.com>
Date:   Tue, 01 Apr 2014 19:16:46 +0200

        IMM: 2PBE - PBE-A tolerates BAD_OPERATION reply on ccb-prepare from PBE-B [#830]

        An SMF campaign enables the PBE (with 2PBE) and immediately attempts
        to update a PRTA. This fails because the slave PBE (PBE-B) has not
        completed its initialization when it receives the prepare message for
        the PRTA update, so the PRTA update is rejected. It also causes the
        slave PBE to exit and restart, due to an erroneous abort of an empty
        sqlite transaction.

        Mar 31 10:33:19 SC-2-1 osafimmnd[13967]: NO ERR_BAD_OPERATION: Mismatch on administrative owner '' != 'safImmService'
        Mar 31 10:33:19 SC-2-1 osafimmpbed: WA Start prepare for ccb: 100000078/4294967416 towards slave PBE returned: '20' from Immsv
        Mar 31 10:33:19 SC-2-1 osafimmpbed: WA PBE-A failed to prepare PRTA update Ccb:100000078/4294967416 towards PBE-B
        Mar 31 10:33:19 SC-2-1 osafimmpbed: NO 2PBE Error (20) in PRTA update (ccbId:100000078)
        Mar 31 10:33:19 SC-2-1 osafimmnd[13967]: WA update of PERSISTENT runtime attributes in object 'safSmfCampaign=ERIC-TestAppInstall,safApp=safSmfService' REVERTED. PBE rc:20

        Mar 31 10:33:22 SC-2-2 osafimmpbed: IN PBE slave waiting for prepare from primary on PRTA update ccb:100000078
        Mar 31 10:33:22 SC-2-2 osafimmnd[5243]: WA update of PERSISTENT runtime attributes in object 'safSmfCampaign=ERIC-TestAppInstall,safApp=safSmfService' REVERTED. PBE rc:20
        Mar 31 10:33:24 SC-2-2 osafimmpbed: IN PBE slave waiting for prepare from primary on PRTA update ccb:100000078
        Mar 31 10:33:24 SC-2-2 osafimmpbed: NO Slave PBE time-out in waiting on porepare for PRTA update ccb:100000078 dn:safSmfCampaign=ERIC-TestAppInstall,safApp=safSmfService
        Mar 31 10:33:24 SC-2-2 osafimmpbed: ER SQL statement ('ROLLBACK') failed because: cannot rollback - no transaction is active
        Mar 31 10:33:24 SC-2-2 osafimmpbed: ER Exiting (line:2827)

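        The problem is the time gap between the slave PBE creating the RTO
        (runtime object) that represents it and the slave PBE setting
        admin-owner for that RTO. Admin-owner must be set for an
        admin-operation on the object, such as the ccb-prepare request, to
        succeed; until then the IMMND rejects the operation with
        ERR_BAD_OPERATION, as the "Mismatch on administrative owner" log
        above shows.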

        The fix is to make the primary PBE tolerate ERR_BAD_OPERATION on the
        prepare request, treating it the same way it treats ERR_NOT_EXIST
        (the slave PBE RTO does not exist) or ERR_TRY_AGAIN (the slave PBE is
        still busy with some other transaction). In addition, the
        pbeAbortTrans function is changed to do nothing if the transaction
        is empty.


Complete diffstat:
------------------
 osaf/libs/common/immsv/immpbe_dump.cc            |   9 +++++----
 osaf/services/saf/immsv/immpbed/immpbe_daemon.cc |  12 ++++++++----
 2 files changed, 13 insertions(+), 8 deletions(-)


Testing Commands:
-----------------
This race is difficult to reproduce.
I had to use elaborate fault injection (sleeps inserted in the PBE)
and a hacked version of immapplier to generate PRTO operations.


Testing, Expected Results:
--------------------------
Any PRTO create/delete/update, or any CCB apply, that arrives at the
slave PBE (PBE-B) after it has created its PBE runtime object but
before it has set admin-owner for that object (hint: insert a delay
between these two steps) results in the primary receiving
ERR_BAD_OPERATION from the slave on the admin-op request that
prepares the transaction. Without this patch, the primary PBE (PBE-A)
regards the error as a failure to process the PRTO operation and
reverts it. With this patch, the primary PBE treats ERR_BAD_OPERATION
the same way as TRY_AGAIN, with an upper time limit.
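The primary-side handling described above (ERR_BAD_OPERATION treated like TRY_AGAIN, with an upper time limit) can be sketched in C++ roughly as follows. This is a minimal sketch under stated assumptions, not the immpbe_daemon.cc code: `AisRc`, `isTransientSlaveReply` and `prepareTowardsSlave` are hypothetical names, the enum only mirrors the relevant SaAisErrorT values so the sketch builds without OpenSAF headers, `sendPrepare` stands in for the admin-op that carries the ccb-prepare to PBE-B, and all three "not ready" replies are assumed to share one bounded retry loop.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Local stand-in for the SaAisErrorT codes involved; the numeric values match
// saAis.h (SA_AIS_ERR_BAD_OPERATION is the '20' seen in the logs above).
enum AisRc {
  AIS_OK = 1,
  AIS_ERR_TRY_AGAIN = 6,
  AIS_ERR_NOT_EXIST = 12,
  AIS_ERR_BAD_OPERATION = 20,
  AIS_ERR_FAILED_OPERATION = 21,
};

// Hypothetical predicate: replies meaning "slave PBE not ready yet" rather
// than "prepare failed". ERR_BAD_OPERATION is the newly tolerated case: the
// slave PBE RTO exists, but its admin-owner is not set yet.
bool isTransientSlaveReply(AisRc rc) {
  return rc == AIS_ERR_TRY_AGAIN || rc == AIS_ERR_NOT_EXIST ||
         rc == AIS_ERR_BAD_OPERATION;
}

// Hypothetical sketch of PBE-A sending the ccb-prepare admin-op to PBE-B.
// Transient replies are retried until maxWait has elapsed; any other reply
// (success or a hard error) is returned at once.
AisRc prepareTowardsSlave(const std::function<AisRc()>& sendPrepare,
                          std::chrono::milliseconds maxWait,
                          std::chrono::milliseconds backoff) {
  assert(backoff.count() > 0 && "backoff must be positive");
  const auto deadline = std::chrono::steady_clock::now() + maxWait;
  for (;;) {
    const AisRc rc = sendPrepare();
    if (!isTransientSlaveReply(rc)) {
      return rc;
    }
    if (std::chrono::steady_clock::now() + backoff > deadline) {
      return rc;  // upper time limit reached; caller treats it as a failure
    }
    std::this_thread::sleep_for(backoff);
  }
}
```

For example, a `sendPrepare` that returns AIS_ERR_BAD_OPERATION twice and then AIS_OK makes `prepareTowardsSlave` return AIS_OK after three attempts, while a hard error such as AIS_ERR_FAILED_OPERATION is returned on the first attempt. Keeping the classification in one predicate makes it explicit which replies mean "slave not ready yet" rather than "prepare failed".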

In addition, the slave PBE no longer restarts because of a failed
abort of an empty sqlite transaction. Without this patch, the slave
attempts that abort because it has received the callback for the
operation but never receives the prepare.
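The empty-transaction guard added to pbeAbortTrans can be sketched as follows. This is a hypothetical model, not the patch itself: `PbeTxnState`, `sketchBeginTrans` and `sketchAbortTrans` are illustrative names, and a statement log replaces real sqlite calls so the sketch runs standalone. With the sqlite3 C API, `sqlite3_get_autocommit(db)` returning non-zero (no transaction active) could serve as an equivalent check, though the actual patch may detect the empty case differently.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the slave PBE's sqlite transaction state; the
// statement log stands in for real sqlite3 calls so the sketch runs alone.
struct PbeTxnState {
  bool begun = false;                 // true once "BEGIN" has been executed
  std::vector<std::string> executed;  // SQL statements that were issued
};

// Hypothetical begin: records the BEGIN and marks the transaction as open.
void sketchBeginTrans(PbeTxnState& st) {
  assert(!st.begun && "nested transactions are not modelled here");
  st.executed.push_back("BEGIN");
  st.begun = true;
}

// Sketch of the guarded abort: ROLLBACK is issued only if a transaction was
// actually started. For an empty transaction this is a no-op, avoiding the
// "cannot rollback - no transaction is active" failure that made the slave
// PBE exit in the logs above.
void sketchAbortTrans(PbeTxnState& st) {
  if (!st.begun) {
    return;  // empty transaction: nothing to roll back
  }
  st.executed.push_back("ROLLBACK");
  st.begun = false;
}
```

In the failure scenario above, the slave times out waiting for the prepare without ever having begun a transaction, so `sketchAbortTrans` returns without touching sqlite.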


Conditions of Submission:
-------------------------
Ack from Neel.


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      n          n
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
    that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling errors in your header that are unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
    (i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
    Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
    like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
    cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
    too much content in a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
    Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
    commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
    of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
    comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)

___ Your computer has a badly configured date and time, confusing
    the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
    for in-service upgradability test.

___ Your changes affect the user manual and documentation, but your patch
    series does not contain the patch that updates the Doxygen manual.


------------------------------------------------------------------------------
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
