Hi AndersBj, > Should the slogan of this enhancement be something like: "Avoid > BAD_HANDLE result on requests on imm handle after a timeout on a ccb- > commit request" ? Sorry, my bad. Your suggested slogan is exactly about the enhancement this ticket is trying to address. The case rarely happen as you said. It might come only when PBE is hung for a while at the time of CCB apply. If accessing the sqlite database via NFS, the case might appear frequently, I guess.
> I am assuming that in this enhancement, which seems to be driven by the > handling of failure to receive the result of a ccb-commit, > then the handle is made to survive ONLY if the state uncertainty between > client and serve has been resolved. Yes. It is my intention. When getting TIMEOUT on CCB commit, IMMA does not return that error code to caller right away but holds that code for a while, then secretly check the CCB result with the local IMMND. If the CCB is committed successfully, returning SA_AIS_OK to caller, BAD_HANDLE otherwise. From user view, I think that, SA_AIS_OK returned code means there is nothing wrong happened to the call, In other words, with this case, CCB handle should still be valid and re-usable for subsequent calls. Regards, Vu > -----Original Message----- > From: and...@acm.org <anders.bjornerst...@telia.com> > Sent: Tuesday, July 24, 2018 8:05 PM > To: gary....@dektech.com.au; lennart.l...@ericsson.com; > hans.nordeb...@ericsson.com; vu.m.ngu...@dektech.com.au > Cc: opensaf-devel@lists.sourceforge.net > Subject: Re: [devel] [PATCH 0/1] Review Request for imm: do not purge sync > request on client having critical CCB [#2889] > > Hi, > > I still have trouble to understand what problem this (now enhancement) > ticket is trying to fix. > > The slogan of the ticket still seems to describe a detailed "solution" to some > problem. > The ticket slogan also seems to have been changed, e.g. "do not purge...." to > "purge...". > > (The slogan change is not reflected in the review request subject line "do not > purge..", > but maybe the review request being discussed is the old review request ?) > > If the PROBLEM is that the IMM client gets BAD_HANDLE in some case, then > that is not a major problem, it is a minor problem, assuming it is rare. > Should the slogan of this enhancement be something like: "Avoid > BAD_HANDLE result on requests on imm handle after a timeout on a ccb- > commit request" ? > > One very basic and general rule about imm-handlles which I assume still holds > is that you are not allowed to multithread over a handle. > Now if you get ERR_TIMEOUT on a handle, then normally that does not > mean that the handle has gone bad. > > But handles may go bad (according to the interface spec) and the user then > knows that they can no longer use the handle and any > resources tied to the handle have been decoupled from it. Dont forget the > simplicity of that, it is valuable. > > One reason for a handle being flagged as bad is that the client handle has(or > could have) lost sync in its state relative to the servers state for that > connection. > You have to be very carefull when trying to "smooth over" handle state > confusion between client and server, especially in relation to CCB > requests and ESPECIALLY especially in relation to ccb-commit requests. The > risk of the client (all layers up) misunderstanding if the > CCB comitted or aborted must be zero. The risk of the user not knowing the > result of a ccb commit should be made as low as is realistically possible, > but can not be made zero. In case the result is unknown then the user needs > to unambiguously understand that they lost contact with the CCB while > it was in critical. > > I am assuming that in this enhancement, which seems to be driven by the > handling of failure to receive the result of a ccb-commit, > then the handle is made to survive ONLY if the state uncertainty between > client and serve has been resolved. > > For the case where the client handle can not obtain the result of a ccb-apply, > i.e. has to convey either ERR_TIMEOUT or ERR_BAD_HANDLE > to signal the failure, then ERR_BAD_HANDLE is probably better/simpler for > the user. > Both error codes are error codes that tell the user that they could not get > the > result of the ccb-commit. > But user code handling of BAD_HANDLE should be more general and > ubiquitous than user code handling ERR_TIMEOUT. > The risk is that if the handle is allowed to survive and be open for a > "chained > new CCB request", that the user will assume that > the CCB was comitted (or aborrted) neither of which is truly known at the > level of the client ccb handle. > A user in that situation being unsure may "decide" or program for invoking a > new ccb-commit attempt. > If the handle is open then possibly this would result in a commit of an empty > ccb... successfully... giving the user > an erroneous impression that the (first) ccb was now commited. To avoid the > posibillity of such confusion, it may be better > to make the handle bad so that the user unambiguously gets the ambiguity (I > hope that was not ambiguous). > > Thanks > AndersBj > > > > > > > > >----Ursprungligt meddelande---- > >Från : lennart.l...@ericsson.com > >Datum : 2018-07-24 - 10:05 (GMT) > >Till : vu.m.ngu...@dektech.com.au, hans.nordeb...@ericsson.com, > gary....@dektech.com.au > >Kopia : opensaf-devel@lists.sourceforge.net > >Ämne : Re: [devel] [PATCH 0/1] Review Request for imm: do not purge sync > request on client having critical CCB [#2889] > > > >Hi Vu, > > > >Ack. Has not tested. Should be reviewed but someone else as well > > > >Some notes: > >1. > >Tests and test cases are insufficiently described. This includes the new test > case. I know that this is the case for almost all tests also in other OpenSAF > services but I think this should be changed. At least for new test cases and > when test cases are changed documentation should be done. > >The following should always be documented: > >- What the test case is testing and criteria for a PASS > >- Prerequisites, e.g. in this case I assume that PBE must be activated and > that it must be executed on the active controller node? > >- Other dependencies if any > > > >2. > >In general it is not a good idea to handle states like; mState >= > IMM_CCB_CRITICAL > >This makes it possible to make errors like adding a new state after > IMM_CCB_CRITICAL where an abort is allowed. A reader must also assume > that all states after IMM_CCB_CRITICAL also are critical. > >At least add a comment in the ImmCcbState enum before the line " > IMM_CCB_CRITICAL = 9, // Unilateral abort no longer allowed (except by > PBE). " saying; > >// Unilateral abort no longer allowed (except by PBE). > >to make it clear that all states after IMM_CCB_CRITICAL also means that > the CCB is in a critical state and abort is not allowed > >IMM_CCB_CRITICAL = 9, // Unilateral abort no longer allowed (except by > PBE). > > > >/Lennart > > > >> -----Original Message----- > >> From: Vu Minh Nguyen <vu.m.ngu...@dektech.com.au> > >> Sent: den 6 juli 2018 08:19 > >> To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Lennart Lund > >> <lennart.l...@ericsson.com>; Gary Lee <gary....@dektech.com.au> > >> Cc: opensaf-devel@lists.sourceforge.net; Vu Minh Nguyen > >> <vu.m.ngu...@dektech.com.au> > >> Subject: [PATCH 0/1] Review Request for imm: do not purge sync request > on > >> client having critical CCB [#2889] > >> > >> Summary: imm: do not purge sync request on client having critical CCB > >> [#2889] > >> Review request for Ticket(s): 2889 > >> Peer Reviewer(s): Hans, Gary > >> Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE *** > >> Affected branch(es): develop > >> Development branch: ticket-2889 > >> Base revision: ca6067f31038b9d7076b5836d691591e992302ee > >> Personal repository: git://git.code.sf.net/u/winhvu/review > >> > >> -------------------------------- > >> Impacted area Impact y/n > >> -------------------------------- > >> Docs n > >> Build system n > >> RPM/packaging n > >> Configuration files n > >> Startup scripts n > >> SAF services y > >> OpenSAF services n > >> Core libraries n > >> Samples n > >> Tests n > >> Other n > >> > >> NOTE: Patch(es) contain lines longer than 80 characers > >> > >> Comments (indicate scope for each "y" above): > >> --------------------------------------------- > >> *** EXPLAIN/COMMENT THE PATCH SERIES HERE *** > >> > >> revision b61fc5f3d163b9d2d3d58662e336ed844a7be658 > >> Author: Vu Minh Nguyen <vu.m.ngu...@dektech.com.au> > >> Date: Fri, 6 Jul 2018 12:01:06 +0700 > >> > >> imm: do not purge sync request on client having critical CCB [#2889] > >> > >> When CCB apply (IMMND_EVT_A2ND_CCB_APPLY) gets TIMEOUT, > IMMND > >> will be > >> informed the timeout via msg type IMMND_EVT_A2ND_CL_TIMEOUT, > and > >> then > >> doing following things: > >> 1) Attemp to abort CCB even the CCB state is in critical. > >> 2) Remove client node from client dabase. > >> > >> Once the CCB has been comitted, saImmOmCcbApply() returns SA_AIS_OK > >> to user. > >> As the client is removed at #2 above, any IMM operations on the IMM OM > >> handle > >> such as saImmOmCcbObjectDelete() or saImmOmFinalize() will result in > >> BAD_HANDLE. > >> > >> With this fix, the client having critical CCB will not be purged when > >> timeout happens at IMM sync requests. > >> > >> > >> > >> Complete diffstat: > >> ------------------ > >> .../apitest/management/test_saImmOmCcbInitialize.c | 4 + > >> .../management/test_saImmOmCcbObjectDelete.c | 120 > >> +++++++++++++++++++++ > >> src/imm/immnd/ImmModel.cc | 18 ++-- > >> 3 files changed, 133 insertions(+), 9 deletions(-) > >> > >> > >> Testing Commands: > >> ----------------- > >> Run the newly added test case `immomtest 6 39` on the active controller. > >> > >> > >> Testing, Expected Results: > >> -------------------------- > >> The test case is PASSED. > >> > >> > >> Conditions of Submission: > >> ------------------------- > >> Ack from reviewers > >> > >> > >> Arch Built Started Linux distro > >> ------------------------------------------- > >> mips n n > >> mips64 n n > >> x86 n n > >> x86_64 n n > >> powerpc n n > >> powerpc64 n n > >> > >> > >> Reviewer Checklist: > >> ------------------- > >> [Submitters: make sure that your review doesn't trigger any checkmarks!] > >> > >> > >> Your checkin has not passed review because (see checked entries): > >> > >> ___ Your RR template is generally incomplete; it has too many blank > entries > >> that need proper data filled in. > >> > >> ___ You have failed to nominate the proper persons for review and push. > >> > >> ___ Your patches do not have proper short+long header > >> > >> ___ You have grammar/spelling in your header that is unacceptable. > >> > >> ___ You have exceeded a sensible line length in your > >> headers/comments/text. > >> > >> ___ You have failed to put in a proper Trac Ticket # into your commits. > >> > >> ___ You have incorrectly put/left internal data in your comments/files > >> (i.e. internal bug tracking tool IDs, product names etc) > >> > >> ___ You have not given any evidence of testing beyond basic build tests. > >> Demonstrate some level of runtime or other sanity testing. > >> > >> ___ You have ^M present in some of your files. These have to be removed. > >> > >> ___ You have needlessly changed whitespace or added whitespace crimes > >> like trailing spaces, or spaces before tabs. > >> > >> ___ You have mixed real technical changes with whitespace and other > >> cosmetic code cleanup changes. These have to be separate commits. > >> > >> ___ You need to refactor your submission into logical chunks; there is > >> too much content into a single commit. > >> > >> ___ You have extraneous garbage in your review (merge commits etc) > >> > >> ___ You have giant attachments which should never have been sent; > >> Instead you should place your content in a public tree to be pulled. > >> > >> ___ You have too many commits attached to an e-mail; resend as > threaded > >> commits, or place in a public tree for a pull. > >> > >> ___ You have resent this content multiple times without a clear indication > >> of what has changed between each re-send. > >> > >> ___ You have failed to adequately and individually address all of the > >> comments and change requests that were proposed in the initial review. > >> > >> ___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email > >> etc) > >> > >> ___ Your computer have a badly configured date and time; confusing the > >> the threaded patch review. > >> > >> ___ Your changes affect IPC mechanism, and you don't present any results > >> for in-service upgradability test. > >> > >> ___ Your changes affect user manual and documentation, your patch > series > >> do not contain the patch that updates the Doxygen manual. > > > > > >------------------------------------------------------------------------------ > >Check out the vibrant tech community on one of the world's most > >engaging tech sites, Slashdot.org! http://sdm.link/slashdot > >_______________________________________________ > >Opensaf-devel mailing list > >Opensaf-devel@lists.sourceforge.net > >https://lists.sourceforge.net/lists/listinfo/opensaf-devel > > ------------------------------------------------------------------------------ Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot _______________________________________________ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel