Re: [devel] [PATCH 0 of 1] Review Request for IMM: 2PBE - PBE-A tolerates BAD_OPERATION reply on ccb-prepare from PBE-B [#830]

Neelakanta Reddy Tue, 08 Apr 2014 02:21:07 -0700

Hi AndersBj,

Reviewed and tested the patch.
Ack.


While testing one scenario (with a delay introduced in the slave's
saImmOiRtObjectCreate_2 and saImmOmAdminOwnerSet calls), ERR_NOT_EXIST is returned.

syslog:
---------
Apr  8 14:41:35 Slot-3 osafimmpbed: IN saImmRepositoryInit: SA_IMM_KEEP_REPOSITORY - attaching to repository
Apr  8 14:41:35 Slot-3 osafimmpbed: NO pbeDaemon starting with obj-count:352
Apr  8 14:41:35 Slot-3 osafimmnd[23043]: NO Persistent Back End OI attached, pid: 23440
Apr  8 14:41:35 Slot-3 osafimmnd[23043]: NO Implementer connected: 12 (OpenSafImmPBE) <438, 2010f>
Apr  8 14:41:35 Slot-3 osafimmnd[23043]: NO Persistent Back End OI attached, pid: 23440
Apr  8 14:41:35 Slot-3 osafimmnd[23043]: NO Implementer connected: 13 (OsafImmPbeRt_A) <439, 2010f>
Apr  8 14:41:35 Slot-3 osafimmnd[23043]: NO implementer for class 'OpensafImm' is OpenSafImmPBE => class extent is safe.
Apr  8 14:41:35 Slot-3 osafimmpbed: IN Primary PBE got ERR_NOT_EXIST on atempt to update epoch towards slave PBE - ignoring
Apr  8 14:41:35 Slot-3 osafimmpbed: NO Update epoch 4 committing with ccbId:100000002/4294967298
Apr  8 14:41:36 Slot-3 osafimmnd[23043]: NO PBE-OI established on this SC. Dumping incrementally to file imm.db
Apr  8 14:41:36 Slot-3 osafimmnd[23043]: NO Epoch set to 5 in ImmModel
Apr  8 14:41:36 Slot-3 osafimmd[23028]: NO Successfully announced dump at node 2020f. New Epoch:5
Apr  8 14:41:36 Slot-3 osafimmd[23028]: NO ACT: New Epoch for IMMND process at node 2010f old epoch: 4  new epoch:5
Apr  8 14:41:36 Slot-3 osafimmpbed: IN Primary PBE got ERR_NOT_EXIST on atempt to update epoch towards slave PBE - ignoring
Apr  8 14:41:36 Slot-3 osafimmpbed: NO Update epoch 5 committing with ccbId:100000003/4294967299
Apr  8 14:41:36 Slot-3 osafimmnd[23043]: NO Implementer connected: 14 (implementer_rt_app) <441, 2010f>
Apr  8 14:41:36 Slot-3 osafimmpbed: IN Starting distributed PBE commit for PRTO create Ccb:100000004/4294967300
Apr  8 14:41:36 Slot-3 osafimmpbed: WA Start prepare for ccb: 100000004/4294967300 towards slave PBE returned: '12' from Immsv
Apr  8 14:41:36 Slot-3 osafimmpbed: WA PBE-A failed to prepare PRTO create Ccb:100000004/4294967300 towards PBE-B
Apr  8 14:41:36 Slot-3 osafimmpbed: NO 2PBE Error (20) in PRTO create (ccbId:100000004)
Apr  8 14:41:36 Slot-3 osafimmnd[23043]: WA Create of PERSISTENT runtime object 'parent' REVERTED. PBE rc:20
Apr  8 14:41:36 Slot-3 osafimmnd[23043]: NO Implementer locally disconnected. Marking it as doomed 14 <441, 2010f> (implementer_rt_app)
Apr  8 14:41:36 Slot-3 osafimmnd[23043]: NO Implementer disconnected 14 <441, 2010f> (implementer_rt_app)
Apr  8 14:41:36 Slot-3 osafimmnd[23043]: NO Implementer (applier) connected: 15 (@OpenSafImmPBE) <0, 2020f>
Apr  8 14:41:36 Slot-3 osafimmnd[23043]: NO Implementer connected: 16 (OsafImmPbeRt_B) <0, 2020f>
Apr  8 14:41:37 Slot-3 osafimmnd[23043]: NO PBE slave established on other SC. Dumping incrementally to file imm.db
Apr  8 14:41:40 Slot-3 osafimmpbed: IN Slave PBE replied with OK on attempt to update epoch


ERR_NOT_EXIST is returned because the slave PBE joined late: its runtime
object did not yet exist when the primary sent the prepare admin operation,
as the following trace shows:

Apr  8 14:41:36.227575 osafimmnd [23043:ImmModel.cc:9840] T5 Admin op on objectName:osafImmPbeRt=B,opensafImm=opensafImm,safApp=safImmService
Apr  8 14:41:36.227591 osafimmnd [23043:ImmModel.cc:9891] T7 ERR_NOT_EXIST: object 'osafImmPbeRt=B,opensafImm=opensafImm,safApp=safImmService' does not exist


In this case, ERR_NOT_EXIST on the prepare could also be handled the same
way as TRY_AGAIN.
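
To illustrate the idea, here is a minimal sketch, not the actual
immpbe_daemon.cc code. The enum mirrors the numeric SaAisErrorT values seen
in the logs (12 = NOT_EXIST, 20 = BAD_OPERATION); the names
isTransientPrepareReply and prepareWithRetry, and the attempt/pause
parameters, are hypothetical and only show how all three replies could be
retried with an upper limit.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Local stand-in for the subset of SaAisErrorT values relevant here.
enum AisError {
  AIS_OK = 1,
  AIS_ERR_TRY_AGAIN = 6,
  AIS_ERR_NOT_EXIST = 12,
  AIS_ERR_BAD_OPERATION = 20
};

// Hypothetical helper: true if the reply suggests the slave PBE is simply
// not ready yet, so the prepare is worth retrying.
bool isTransientPrepareReply(AisError rc) {
  switch (rc) {
    case AIS_ERR_TRY_AGAIN:      // slave busy with another transaction
    case AIS_ERR_BAD_OPERATION:  // slave RTO exists, admin-owner not yet set
    case AIS_ERR_NOT_EXIST:      // slave RTO not created yet
      return true;
    default:
      return false;
  }
}

// Hypothetical retry loop: calls sendPrepare until it returns a
// non-transient code or maxAttempts calls have been made. Returns the last
// reply, so the caller still sees the error once the limit is reached.
AisError prepareWithRetry(const std::function<AisError()>& sendPrepare,
                          unsigned maxAttempts,
                          std::chrono::milliseconds pause) {
  assert(maxAttempts > 0);
  AisError rc = sendPrepare();
  for (unsigned attempt = 1;
       attempt < maxAttempts && isTransientPrepareReply(rc); ++attempt) {
    std::this_thread::sleep_for(pause);
    rc = sendPrepare();
  }
  return rc;
}
```

The bound on attempts keeps the primary from waiting forever if the slave
never comes up; in that case the PRTO operation is still reverted as today.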

/Neel.

On Tuesday 01 April 2014 10:57 PM, Anders Bjornerstedt wrote:
> Summary: IMM: 2PBE - PBE-A tolerates BAD_OPERATION reply on ccb-prepare from PBE-B [#830]
> Review request for Trac Ticket(s): 830
> Peer Reviewer(s): Neel
> Pull request to:
> Affected branch(es): 4.4; default(4.5)
> Development branch:
>
> --------------------------------
> Impacted area       Impact y/n
> --------------------------------
>   Docs                    n
>   Build system            n
>   RPM/packaging           n
>   Configuration files     n
>   Startup scripts         n
>   SAF services            y
>   OpenSAF services        n
>   Core libraries          n
>   Samples                 n
>   Tests                   n
>   Other                   n
>
>
> Comments (indicate scope for each "y" above):
> ---------------------------------------------
>
> changeset 31109d862e0218f9da84d3bbd152916f32aed31f
> Author:       Anders Bjornerstedt <anders.bjornerst...@ericsson.com>
> Date: Tue, 01 Apr 2014 19:16:46 +0200
>
>       IMM: 2PBE - PBE-A tolerates BAD_OPERATION reply on ccb-prepare from PBE-B [#830]
>
>       An SMF campaign enables the PBE (with 2PBE) and immediately attempts to
>       update a PRTA. This fails because the slave PBE (PBE-B) has not completed
>       its initialization when it receives the prepare message (for the PRTA
>       update). This causes the PRTA update to be rejected. It also causes the PBE
>       slave to exit and restart again due to an erroneous abort of an empty sqlite
>       transaction.
>
>       Mar 31 10:33:19 SC-2-1 osafimmnd[13967]: NO ERR_BAD_OPERATION: Mismatch on administrative owner '' != 'safImmService'
>       Mar 31 10:33:19 SC-2-1 osafimmpbed: WA Start prepare for ccb: 100000078/4294967416 towards slave PBE returned: '20' from Immsv
>       Mar 31 10:33:19 SC-2-1 osafimmpbed: WA PBE-A failed to prepare PRTA update Ccb:100000078/4294967416 towards PBE-B
>       Mar 31 10:33:19 SC-2-1 osafimmpbed: NO 2PBE Error (20) in PRTA update (ccbId:100000078)
>       Mar 31 10:33:19 SC-2-1 osafimmnd[13967]: WA update of PERSISTENT runtime attributes in object 'safSmfCampaign=ERIC-TestAppInstall,safApp=safSmfService' REVERTED. PBE rc:20
>
>       Mar 31 10:33:22 SC-2-2 osafimmpbed: IN PBE slave waiting for prepare from primary on PRTA update ccb:100000078
>       Mar 31 10:33:22 SC-2-2 osafimmnd[5243]: WA update of PERSISTENT runtime attributes in object 'safSmfCampaign=ERIC-TestAppInstall,safApp=safSmfService' REVERTED. PBE rc:20
>       Mar 31 10:33:24 SC-2-2 osafimmpbed: IN PBE slave waiting for prepare from primary on PRTA update ccb:100000078
>       Mar 31 10:33:24 SC-2-2 osafimmpbed: NO Slave PBE time-out in waiting on porepare for PRTA update ccb:100000078 dn:safSmfCampaign=ERIC-TestAppInstall,safApp=safSmfService
>       Mar 31 10:33:24 SC-2-2 osafimmpbed: ER SQL statement ('ROLLBACK') failed because: cannot rollback - no transaction is active
>       Mar 31 10:33:24 SC-2-2 osafimmpbed: ER Exiting (line:2827)
>
>       The problem is the time gap between the creation of the RTO representing the
>       slave PBE and the setting of admin-owner by the slave PBE for that RTO.
>       Admin owner must be set for an admin-operation on the object to succeed.
>
>       The fix is to have the primary PBE be tolerant of receiving
>       ERR_BAD_OPERATION on the prepare request, treating it the same way it treats
>       ERR_NOT_EXIST for the slave PBE RTO not existing, or ERR_TRY_AGAIN for the
>       slave PBE still being busy with some other transaction. A fix is also made
>       to the pbeAbortTrans function to do nothing if the transaction is empty.
>
>
> Complete diffstat:
> ------------------
>   osaf/libs/common/immsv/immpbe_dump.cc            |   9 +++++----
>   osaf/services/saf/immsv/immpbed/immpbe_daemon.cc |  12 ++++++++----
>   2 files changed, 13 insertions(+), 8 deletions(-)
>
>
> Testing Commands:
> -----------------
> Really difficult to test.
> I had to do elaborate fault injection of sleeps in the PBE and use
> a hacked version of immapplier to generate PRTO operations.
>
>
> Testing, Expected Results:
> --------------------------
> Any attempt to perform a PRTO create/delete/update or any CCB apply
> that arrives at the slave PBE (PBE-B) after it has created its PBE
> runtime object, but before it has set admin-owner for it (hint:
> insert a delay between these), will result in the primary
> receiving ERR_BAD_OPERATION from the slave on the admin-op
> request for preparing the transaction. Without this patch the
> primary PBE (PBE-A) will regard the error as failure to process
> the PRTO operation and revert it. With this patch, the primary
> PBE will treat ERR_BAD_OPERATION the same way as TRY_AGAIN,
> with an upper time limit.
>
> In addition, the slave PBE will not restart due to failure
> to abort an empty sqlite transaction. The abort occurs in the
> slave without this patch because the slave has received the
> operational callback, but never receives the prepare.
>
>
> Conditions of Submission:
> -------------------------
> Ack from Neel.
>
>
> Arch      Built     Started    Linux distro
> -------------------------------------------
> mips        n          n
> mips64      n          n
> x86         n          n
> x86_64      n          n
> powerpc     n          n
> powerpc64   n          n
>
>
> Reviewer Checklist:
> -------------------
> [Submitters: make sure that your review doesn't trigger any checkmarks!]
>
>
> Your checkin has not passed review because (see checked entries):
>
> ___ Your RR template is generally incomplete; it has too many blank entries
>      that need proper data filled in.
>
> ___ You have failed to nominate the proper persons for review and push.
>
> ___ Your patches do not have proper short+long header
>
> ___ You have grammar/spelling in your header that is unacceptable.
>
> ___ You have exceeded a sensible line length in your headers/comments/text.
>
> ___ You have failed to put in a proper Trac Ticket # into your commits.
>
> ___ You have incorrectly put/left internal data in your comments/files
>      (i.e. internal bug tracking tool IDs, product names etc)
>
> ___ You have not given any evidence of testing beyond basic build tests.
>      Demonstrate some level of runtime or other sanity testing.
>
> ___ You have ^M present in some of your files. These have to be removed.
>
> ___ You have needlessly changed whitespace or added whitespace crimes
>      like trailing spaces, or spaces before tabs.
>
> ___ You have mixed real technical changes with whitespace and other
>      cosmetic code cleanup changes. These have to be separate commits.
>
> ___ You need to refactor your submission into logical chunks; there is
>      too much content in a single commit.
>
> ___ You have extraneous garbage in your review (merge commits etc)
>
> ___ You have giant attachments which should never have been sent;
>      Instead you should place your content in a public tree to be pulled.
>
> ___ You have too many commits attached to an e-mail; resend as threaded
>      commits, or place in a public tree for a pull.
>
> ___ You have resent this content multiple times without a clear indication
>      of what has changed between each re-send.
>
> ___ You have failed to adequately and individually address all of the
>      comments and change requests that were proposed in the initial review.
>
> ___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)
>
> ___ Your computer has a badly configured date and time, confusing
>      the threaded patch review.
>
> ___ Your changes affect IPC mechanism, and you don't present any results
>      for in-service upgradability test.
>
> ___ Your changes affect user manual and documentation, your patch series
>      do not contain the patch that updates the Doxygen manual.
>

