Hi Neel,

Thanks for the comment on ERR_NOT_EXIST. I will add that error code to the ones checked in the reply from the immsv (rc2B) before pushing.
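Roughly, the check would then look like the sketch below. This is only an illustration, not the code in immpbe_daemon.cc: the `Rc` enum, `isTransient` and `prepareTowardsSlave` are hypothetical names, the numeric values are local stand-ins that mirror SaAisErrorT (12 and 20 match the return codes in the quoted syslog), and the attempt count stands in for the upper time limit.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Local stand-ins for the SaAisErrorT values relevant here (assumption:
// the real code uses the SA_AIS_* constants from the AIS headers).
enum class Rc { Ok = 1, TryAgain = 6, NotExist = 12, BadOperation = 20 };

// Replies that only mean "slave PBE is not ready yet":
//  - TryAgain:     slave busy with another transaction
//  - NotExist:     slave PBE runtime object not created yet
//  - BadOperation: runtime object exists, admin-owner not set yet
bool isTransient(Rc rc) {
  return rc == Rc::TryAgain || rc == Rc::NotExist || rc == Rc::BadOperation;
}

// Hypothetical helper: repeat the prepare request until the slave gives a
// non-transient reply or max_attempts is exhausted.
Rc prepareTowardsSlave(const std::function<Rc()>& sendPrepare,
                       int max_attempts,
                       std::chrono::milliseconds delay) {
  assert(max_attempts > 0);
  Rc rc = Rc::TryAgain;
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    rc = sendPrepare();
    if (!isTransient(rc)) return rc;  // Ok or a real error: stop retrying
    if (attempt + 1 < max_attempts) std::this_thread::sleep_for(delay);
  }
  return rc;  // still transient after the last attempt: caller reverts
}
```

A caller would pass a callable that issues the admin-op towards the slave PBE and would revert the operation if the returned code is anything other than `Rc::Ok`.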
In retrospect, it would have been nice to have had #799 (admin-op directly targeting implementer) implemented before 2PBE. The 2PBE implementation could then have communicated directly with the slave PBE implementer, avoiding these added initialization problems that arise only because an object is needed for the admin-op. I may reorganize this code after #799.

/AndersBj

Neelakanta Reddy wrote:
> Hi AndersBj,
>
> Reviewed and tested the patch.
> Ack.
>
> While testing one scenario (when a delay is introduced in
> slave saImmOiRtObjectCreate_2 and saImmOmAdminOwnerSet), ERR_NOT_EXIST
> is returned.
>
> syslog:
> ---------
> Apr 8 14:41:35 Slot-3 osafimmpbed: IN saImmRepositoryInit:
> SA_IMM_KEEP_REPOSITORY - attaching to repository
> Apr 8 14:41:35 Slot-3 osafimmpbed: NO pbeDaemon starting with
> obj-count:352
> Apr 8 14:41:35 Slot-3 osafimmnd[23043]: NO Persistent Back End OI
> attached, pid: 23440
> Apr 8 14:41:35 Slot-3 osafimmnd[23043]: NO Implementer connected: 12
> (OpenSafImmPBE) <438, 2010f>
> Apr 8 14:41:35 Slot-3 osafimmnd[23043]: NO Persistent Back End OI
> attached, pid: 23440
> Apr 8 14:41:35 Slot-3 osafimmnd[23043]: NO Implementer connected: 13
> (OsafImmPbeRt_A) <439, 2010f>
> Apr 8 14:41:35 Slot-3 osafimmnd[23043]: NO implementer for class
> 'OpensafImm' is OpenSafImmPBE => class extent is safe.
> Apr 8 14:41:35 Slot-3 osafimmpbed: IN Primary PBE got ERR_NOT_EXIST
> on atempt to update epoch towards slave PBE - ignoring
> Apr 8 14:41:35 Slot-3 osafimmpbed: NO Update epoch 4 committing with
> ccbId:100000002/4294967298
> Apr 8 14:41:36 Slot-3 osafimmnd[23043]: NO PBE-OI established on this
> SC. Dumping incrementally to file imm.db
> Apr 8 14:41:36 Slot-3 osafimmnd[23043]: NO Epoch set to 5 in ImmModel
> Apr 8 14:41:36 Slot-3 osafimmd[23028]: NO Successfully announced dump
> at node 2020f.
> New Epoch:5
> Apr 8 14:41:36 Slot-3 osafimmd[23028]: NO ACT: New Epoch for IMMND
> process at node 2010f old epoch: 4 new epoch:5
> Apr 8 14:41:36 Slot-3 osafimmpbed: IN Primary PBE got ERR_NOT_EXIST
> on atempt to update epoch towards slave PBE - ignoring
> Apr 8 14:41:36 Slot-3 osafimmpbed: NO Update epoch 5 committing with
> ccbId:100000003/4294967299
> Apr 8 14:41:36 Slot-3 osafimmnd[23043]: NO Implementer connected: 14
> (implementer_rt_app) <441, 2010f>
> Apr 8 14:41:36 Slot-3 osafimmpbed: IN Starting distributed PBE commit
> for PRTO create Ccb:100000004/4294967300
> Apr 8 14:41:36 Slot-3 osafimmpbed: WA Start prepare for ccb:
> 100000004/4294967300 towards slave PBE returned: '12' from Immsv
> Apr 8 14:41:36 Slot-3 osafimmpbed: WA PBE-A failed to prepare PRTO
> create Ccb:100000004/4294967300 towards PBE-B
> Apr 8 14:41:36 Slot-3 osafimmpbed: NO 2PBE Error (20) in PRTO create
> (ccbId:100000004)
> Apr 8 14:41:36 Slot-3 osafimmnd[23043]: WA Create of PERSISTENT
> runtime object 'parent' REVERTED. PBE rc:20
> Apr 8 14:41:36 Slot-3 osafimmnd[23043]: NO Implementer locally
> disconnected. Marking it as doomed 14 <441, 2010f> (implementer_rt_app)
> Apr 8 14:41:36 Slot-3 osafimmnd[23043]: NO Implementer disconnected
> 14 <441, 2010f> (implementer_rt_app)
> Apr 8 14:41:36 Slot-3 osafimmnd[23043]: NO Implementer (applier)
> connected: 15 (@OpenSafImmPBE) <0, 2020f>
> Apr 8 14:41:36 Slot-3 osafimmnd[23043]: NO Implementer connected: 16
> (OsafImmPbeRt_B) <0, 2020f>
> Apr 8 14:41:37 Slot-3 osafimmnd[23043]: NO PBE slave established on
> other SC.
> Dumping incrementally to file imm.db
> Apr 8 14:41:40 Slot-3 osafimmpbed: IN Slave PBE replied with OK on
> attempt to update epoch
>
>
> ERR_NOT_EXIST is returned because of the delay shown in the following
> trace (the slave PBE joined late or slowly):
>
> Apr 8 14:41:36.227575 osafimmnd [23043:ImmModel.cc:9840] T5 Admin op
> on objectName:osafImmPbeRt=B,opensafImm=opensafImm,safApp=safImmService
> Apr 8 14:41:36.227591 osafimmnd [23043:ImmModel.cc:9891] T7
> ERR_NOT_EXIST: object
> 'osafImmPbeRt=B,opensafImm=opensafImm,safApp=safImmService' does not
> exist
>
>
> In this case, the TRY_AGAIN handling could also be applied to
> ERR_NOT_EXIST.
>
> /Neel.
>
> On Tuesday 01 April 2014 10:57 PM, Anders Bjornerstedt wrote:
>> Summary: IMM: 2PBE - PBE-A tolerates BAD_OPERATION reply on
>> ccb-prepare from PBE-B [#830]
>> Review request for Trac Ticket(s): 830
>> Peer Reviewer(s): Neel
>> Pull request to:
>> Affected branch(es): 4.4; default(4.5)
>> Development branch:
>>
>> --------------------------------
>> Impacted area       Impact y/n
>> --------------------------------
>>  Docs                    n
>>  Build system            n
>>  RPM/packaging           n
>>  Configuration files     n
>>  Startup scripts         n
>>  SAF services            y
>>  OpenSAF services        n
>>  Core libraries          n
>>  Samples                 n
>>  Tests                   n
>>  Other                   n
>>
>>
>> Comments (indicate scope for each "y" above):
>> ---------------------------------------------
>>
>> changeset 31109d862e0218f9da84d3bbd152916f32aed31f
>> Author: Anders Bjornerstedt <anders.bjornerst...@ericsson.com>
>> Date: Tue, 01 Apr 2014 19:16:46 +0200
>>
>>     IMM: 2PBE - PBE-A tolerates BAD_OPERATION reply on ccb-prepare
>>     from PBE-B [#830]
>>
>>     An SMF campaign enables the PBE (with 2PBE) and immediately
>>     attempts to update a PRTA. This fails because the slave PBE
>>     (PBE-B) has not completed its initialization when it receives the
>>     prepare message (for the PRTA update). This causes the PRTA update
>>     to be rejected.
>>     It also causes the PBE slave to exit and restart again due to an
>>     erroneous abort of an empty sqlite transaction.
>>
>>     Mar 31 10:33:19 SC-2-1 osafimmnd[13967]: NO ERR_BAD_OPERATION:
>>       Mismatch on administrative owner '' != 'safImmService'
>>     Mar 31 10:33:19 SC-2-1 osafimmpbed: WA Start prepare for ccb:
>>       100000078/4294967416 towards slave PBE returned: '20' from Immsv
>>     Mar 31 10:33:19 SC-2-1 osafimmpbed: WA PBE-A failed to prepare PRTA
>>       update Ccb:100000078/4294967416 towards PBE-B
>>     Mar 31 10:33:19 SC-2-1 osafimmpbed: NO 2PBE Error (20) in PRTA
>>       update (ccbId:100000078)
>>     Mar 31 10:33:19 SC-2-1 osafimmnd[13967]: WA update of PERSISTENT
>>       runtime attributes in object
>>       'safSmfCampaign=ERIC-TestAppInstall,safApp=safSmfService'
>>       REVERTED. PBE rc:20
>>
>>     Mar 31 10:33:22 SC-2-2 osafimmpbed: IN PBE slave waiting for
>>       prepare from primary on PRTA update ccb:100000078
>>     Mar 31 10:33:22 SC-2-2 osafimmnd[5243]: WA update of PERSISTENT
>>       runtime attributes in object
>>       'safSmfCampaign=ERIC-TestAppInstall,safApp=safSmfService'
>>       REVERTED. PBE rc:20
>>     Mar 31 10:33:24 SC-2-2 osafimmpbed: IN PBE slave waiting for
>>       prepare from primary on PRTA update ccb:100000078
>>     Mar 31 10:33:24 SC-2-2 osafimmpbed: NO Slave PBE time-out in
>>       waiting on porepare for PRTA update ccb:100000078
>>       dn:safSmfCampaign=ERIC-TestAppInstall,safApp=safSmfService
>>     Mar 31 10:33:24 SC-2-2 osafimmpbed: ER SQL statement ('ROLLBACK')
>>       failed because: cannot rollback - no transaction is active
>>     Mar 31 10:33:24 SC-2-2 osafimmpbed: ER Exiting (line:2827)
>>
>>     The problem is the time gap between the creation of the RTO
>>     representing the slave PBE and the setting of admin-owner by the
>>     slave PBE for that RTO. Admin owner must be set for an
>>     admin-operation on the object to succeed.
>>
>>     The fix is to have the primary PBE be tolerant of receiving
>>     ERR_BAD_OPERATION on the prepare request, treating it the same way
>>     it treats ERR_NOT_EXIST for the slave PBE RTO not existing, or
>>     ERR_TRY_AGAIN for the slave PBE still being busy with some other
>>     transaction. A fix is also made to the pbeAbortTrans function to
>>     do nothing if the transaction is empty.
>>
>>
>> Complete diffstat:
>> ------------------
>>  osaf/libs/common/immsv/immpbe_dump.cc            |   9 +++++----
>>  osaf/services/saf/immsv/immpbed/immpbe_daemon.cc |  12 ++++++++----
>>  2 files changed, 13 insertions(+), 8 deletions(-)
>>
>>
>> Testing Commands:
>> -----------------
>> Really difficult to test.
>> I had to do elaborate fault injection of sleeps in the PBE and use
>> a hacked version of immapplier to generate PRTO operations.
>>
>>
>> Testing, Expected Results:
>> --------------------------
>> Any attempt to perform a PRTO create/delete/update or any CCB apply
>> that arrives at the slave PBE (PBE-B) after it has created its PBE
>> runtime object, but before it has set admin-owner for it (hint:
>> insert a delay between these), will result in the primary
>> receiving ERR_BAD_OPERATION from the slave on the admin-op
>> request for preparing the transaction. Without this patch the
>> primary PBE (PBE-A) will regard the error as failure to process
>> the PRTO operation and revert it. With this patch, the primary
>> PBE will treat ERR_BAD_OPERATION the same way as TRY_AGAIN,
>> with an upper time limit.
>>
>> In addition, the slave PBE will not restart due to failure
>> to abort an empty sqlite transaction. The abort occurs in the
>> slave without this patch because the slave has received the
>> operational callback, but never receives the prepare.
>>
>>
>> Conditions of Submission:
>> -------------------------
>> Ack from Neel.
>>
>>
>> Arch      Built     Started    Linux distro
>> -------------------------------------------
>> mips        n          n
>> mips64      n          n
>> x86         n          n
>> x86_64      n          n
>> powerpc     n          n
>> powerpc64   n          n
>>
>>
>> Reviewer Checklist:
>> -------------------
>> [Submitters: make sure that your review doesn't trigger any checkmarks!]
>>
>>
>> Your checkin has not passed review because (see checked entries):
>>
>> ___ Your RR template is generally incomplete; it has too many blank
>>     entries that need proper data filled in.
>>
>> ___ You have failed to nominate the proper persons for review and push.
>>
>> ___ Your patches do not have proper short+long header
>>
>> ___ You have grammar/spelling in your header that is unacceptable.
>>
>> ___ You have exceeded a sensible line length in your
>>     headers/comments/text.
>>
>> ___ You have failed to put in a proper Trac Ticket # into your commits.
>>
>> ___ You have incorrectly put/left internal data in your comments/files
>>     (i.e. internal bug tracking tool IDs, product names etc)
>>
>> ___ You have not given any evidence of testing beyond basic build tests.
>>     Demonstrate some level of runtime or other sanity testing.
>>
>> ___ You have ^M present in some of your files. These have to be removed.
>>
>> ___ You have needlessly changed whitespace or added whitespace crimes
>>     like trailing spaces, or spaces before tabs.
>>
>> ___ You have mixed real technical changes with whitespace and other
>>     cosmetic code cleanup changes. These have to be separate commits.
>>
>> ___ You need to refactor your submission into logical chunks; there is
>>     too much content into a single commit.
>>
>> ___ You have extraneous garbage in your review (merge commits etc)
>>
>> ___ You have giant attachments which should never have been sent;
>>     Instead you should place your content in a public tree to be
>>     pulled.
>>
>> ___ You have too many commits attached to an e-mail; resend as threaded
>>     commits, or place in a public tree for a pull.
>>
>> ___ You have resent this content multiple times without a clear
>>     indication of what has changed between each re-send.
>>
>> ___ You have failed to adequately and individually address all of the
>>     comments and change requests that were proposed in the initial
>>     review.
>>
>> ___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)
>>
>> ___ Your computer has a badly configured date and time, confusing
>>     the threaded patch review.
>>
>> ___ Your changes affect IPC mechanism, and you don't present any results
>>     for in-service upgradability test.
>>
>> ___ Your changes affect user manual and documentation, your patch series
>>     do not contain the patch that updates the Doxygen manual.
>>
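
For reference, the second part of the fix described in the quoted commit message (abort doing nothing when the transaction is empty) can be pictured with the sketch below. It is an assumption-laden illustration, not the pbeAbortTrans code: `FakeDb` and `abortTransSketch` are hypothetical names, and the fake handle only records the SQL it would have run instead of calling sqlite.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the PBE's database handle. It records the
// statements it would execute rather than talking to sqlite.
struct FakeDb {
  bool inTransaction = false;
  std::vector<std::string> executed;

  void begin() {
    executed.push_back("BEGIN");
    inTransaction = true;
  }

  // Mimics the failure in the quoted syslog: ROLLBACK is rejected when
  // no transaction is active.
  bool rollback() {
    if (!inTransaction) return false;
    executed.push_back("ROLLBACK");
    inTransaction = false;
    return true;
  }
};

// Sketch of an abort that tolerates an empty transaction: if nothing was
// started there is nothing to undo, so report success without issuing
// ROLLBACK (and without giving the caller a reason to exit).
bool abortTransSketch(FakeDb& db) {
  if (!db.inTransaction) return true;
  bool ok = db.rollback();
  assert(!db.inTransaction);
  return ok;
}
```

In the slave time-out case from the log, no transaction has been started when the abort is requested, so the guarded version returns without touching the database.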