Hi Nhat Pham, We are all (Oracle) using the same repository which includes services (log, amf, clm, ntf) , for your information I only created the repository for Oracle internal use :)
-AVM On 2/15/2016 3:08 PM, Nhat Pham wrote: > Hi Mahesh, > > It's good. Thank you. :) > > [AVM] Up on rejoining of the SC`s The replica should be re-created regardless > of another application opens it on PL4. > ( Note : this comment is based on your explanation have not yet > reviewed/tested , > currently i am struggling with SC`s not rejoining > after headless state , i can provide you more on this once i complte my > review/testing) > > [Nhat] To make cloud resilience works, you need the patches from other > services (log, amf, clm, ntf). > @Minh: I heard that you created tar file which includes all patches. Could you > please send it to Mahesh? Thanks > > [AVM] I understand that , before I comment more on this please allow me to > understand > I am not still not very clear of the headless design in detail. > For example cluster membership of PL`s during headless state , > In the absence of SC`s (CLMD) dose the PLs is considered as > cluster nodes or not (cluster membership) ? > > [Nhat] I don't know much about this. > @ Anders: Could you please have comment about this? Thanks > > Best regards, > Nhat Pham > > -----Original Message----- > From: A V Mahesh [mailto:mahesh.va...@oracle.com] > Sent: Monday, February 15, 2016 11:19 AM > To: Nhat Pham <nhat.p...@dektech.com.au>; anders.wid...@ericsson.com > Cc: opensaf-devel@lists.sourceforge.net; 'Beatriz Brandao' > <beatriz.bran...@ericsson.com> > Subject: Re: [PATCH 0 of 1] Review Request for cpsv: Support preserving and > recovering checkpoint replicas during headless state V2 [#1621] > > Hi Nhat Pham, > > How is your holiday went > > Please find my comments below > > On 2/15/2016 8:43 AM, Nhat Pham wrote: >> Hi Mahesh, >> >> For the comment 1, the patch will be updated accordingly. > [AVM] Please hold , I will provide more comments in this week , so we can > have consolidated V3 >> For the comment 2, I think the CKPT service will not be backward >> compatible if the scAbsenceAllowed is true. >> The client can't create non-collocated checkpoint on SCs. >> >> Furthermore, this solution only protects the CKPT service from the >> case "The non-collocated checkpoint is created on a SC" >> there are still the cases where the replicas are completely lost. Ex: >> >> - The non-collocated checkpoint created on a PL. The PL reboots. Both >> replicas now locate on SCs. Then, headless state happens. All replicas are >> lost. >> - The non-collocated checkpoint has active replica locating on a PL >> and this PL restarts during headless state >> - The non-collocated checkpoint is created on PL3. This checkpoint is >> also opened on PL4. Then SCs and PL3 reboot. > [AVM] Up on rejoining of the SC`s The replica should be re-created regardless > of another application opens it on PL4. > ( Note : this comment is based on your explanation have not yet > reviewed/tested , > currently i am struggling with SC`s not rejoining > after headless state , i can provide you more on this once i complte my > review/testing) >> In this case, all replicas are lost and the client has to create it again. >> >> In case multiple nodes (which including SCs) reboot, losing replicas >> is unpreventable. The patch is to recover the checkpoints in possible cases. >> How do you think? > [AVM] I understand that , before I comment more on this please allow > me to understand > I am not still not very clear of the headless design in detail. > > For example cluster membership of PL`s during headless > state , > In the absence of SC`s (CLMD) dose the PLs is considered as > cluster nodes or not (cluster membership) ? > > - if not consider as NON cluster nodes Checkpoint > Service > API should leverage the SA Forum Cluster > Membership Service and API's can fail with > SA_AIS_ERR_UNAVAILABLE > > - if considers as cluster nodes we need to follow all > the > defined rules which are defined in SAI-AIS-CKPT-B.02.02 specification > > so give me some more time to review it completely , so that we > can have consolidated patch V3 > > -AVM >> Best regards, >> Nhat Pham >> >> -----Original Message----- >> From: A V Mahesh [mailto:mahesh.va...@oracle.com] >> Sent: Friday, February 12, 2016 11:10 AM >> To: Nhat Pham <nhat.p...@dektech.com.au>; anders.wid...@ericsson.com >> Cc: opensaf-devel@lists.sourceforge.net; Beatriz Brandao >> <beatriz.bran...@ericsson.com> >> Subject: Re: [PATCH 0 of 1] Review Request for cpsv: Support >> preserving and recovering checkpoint replicas during headless state V2 >> [#1621] >> >> >> Comment 2 : >> >> After incorporating the comment one all the Limitations should be >> prevented based on Hydra configuration is enabled in IMM status. >> >> Foe example : if some application is trying to create >> >> non-collocated checkpoint active replica getting generated/locating on >> SC then ,regardless of the heads (SC`s) status exist not exist should >> return SA_AIS_ERR_NOT_SUPPORTED >> >> In other words, rather that allowing to created non-collocated >> checkpoint when >> heads(SC`s) are exit , and non-collocated checkpoint getting >> unrecoverable after heads(SC`s) rejoins. >> >> ====================================================================== >> ======================= >>> Limitation: The CKPT service doesn't support recovering checkpoints in >>> following cases: >>> . The checkpoint which is unlinked before headless. >>> . The non-collocated checkpoint has active replica locating on SC. >>> . The non-collocated checkpoint has active replica locating on a PL >>> and this PL >>> restarts during headless state. In this cases, the checkpoint replica is >>> destroyed. The fault code SA_AIS_ERR_BAD_HANDLE is returned when the >>> client >>> accesses the checkpoint in these cases. The client must re-open the >>> checkpoint. >> ====================================================================== >> ======================= >> >> -AVM >> >> >> On 2/11/2016 12:52 PM, A V Mahesh wrote: >>> Hi, >>> >>> I jut starred reviewing patch , I will be giving comments as soon as >>> I crossover any , to save some time. >>> >>> Comment 1 : >>> This functionality should be under checks if Hydra configuration is >>> enabled in IMM attrName = >>> const_cast<SaImmAttrNameT>("scAbsenceAllowed") >>> >>> Please see example how LOG/AMF services implemented it. >>> >>> -AVM >>> >>> >>> On 1/29/2016 1:02 PM, Nhat Pham wrote: >>>> Hi Mahesh, >>>> >>>> As described in the README, the CKPT service returns >>>> SA_AIS_ERR_TRY_AGAIN fault code in this case. >>>> I guess it's same for other services. >>>> >>>> @Anders: Could you please confirm this? >>>> >>>> Best regards, >>>> Nhat Pham >>>> >>>> -----Original Message----- >>>> From: A V Mahesh [mailto:mahesh.va...@oracle.com] >>>> Sent: Friday, January 29, 2016 2:11 PM >>>> To: Nhat Pham <nhat.p...@dektech.com.au>; anders.wid...@ericsson.com >>>> Cc: opensaf-devel@lists.sourceforge.net >>>> Subject: Re: [PATCH 0 of 1] Review Request for cpsv: Support >>>> preserving and recovering checkpoint replicas during headless state >>>> V2 [#1621] >>>> >>>> Hi, >>>> >>>> On 1/29/2016 11:45 AM, Nhat Pham wrote: >>>>> - The behavior of application will be consistent with other >>>>> saf services like imm/amf behavior during headless state. >>>>> [Nhat] I'm not clear what you mean about "consistent"? >>>> In the obscene of Director (SC's) , what is expected return values >>>> of SAF API should ( all services ) , >>>> which are not in aposition to provide service at that moment. >>>> >>>> I think all services should return same SAF ERRS., I thinks >>>> currently we don't have it , may be Anders Widel will help us. >>>> >>>> -AVM >>>> >>>> >>>> On 1/29/2016 11:45 AM, Nhat Pham wrote: >>>>> Hi Mahesh, >>>>> >>>>> Please see the attachment for the README. Let me know if there is >>>>> any more information required. >>>>> >>>>> Regarding your comments: >>>>> - during headless state applications may behave like during >>>>> CPND restart case [Nhat] Headless state and CPND restart are >>>>> different events. Thus, the behavior is different. >>>>> Headless state is a case where both SCs go down. >>>>> >>>>> - The behavior of application will be consistent with other >>>>> saf services like imm/amf behavior during headless state. >>>>> [Nhat] I'm not clear what you mean about "consistent"? >>>>> >>>>> Best regards, >>>>> Nhat Pham >>>>> >>>>> -----Original Message----- >>>>> From: A V Mahesh [mailto:mahesh.va...@oracle.com] >>>>> Sent: Friday, January 29, 2016 11:12 AM >>>>> To: Nhat Pham <nhat.p...@dektech.com.au>; >>>>> anders.wid...@ericsson.com >>>>> Cc: opensaf-devel@lists.sourceforge.net >>>>> Subject: Re: [PATCH 0 of 1] Review Request for cpsv: Support >>>>> preserving and recovering checkpoint replicas during headless state >>>>> V2 [#1621] >>>>> >>>>> Hi Nhat Pham, >>>>> >>>>> I stared reviewing this patch , so can please provide README file >>>>> with scope and limitations , that will help to define >>>>> testing/reviewing scope . >>>>> >>>>> Following are minimum things we can keep in mind while >>>>> reviewing/accepting patch , >>>>> >>>>> - Not effecting existing functionality >>>>> - during headless state applications may behave like during >>>>> CPND restart case >>>>> - The minimum functionally of application works >>>>> - The behavior of application will be consistent with >>>>> other saf services like imm/amf behavior during headless state. >>>>> >>>>> So please do provide any additional detailed in README if any of >>>>> the above is deviated , that allow users to know about the >>>>> limitations/deviation. >>>>> >>>>> -AVM >>>>> >>>>> On 1/4/2016 3:15 PM, Nhat Pham wrote: >>>>>> Summary: cpsv: Support preserving and recovering checkpoint >>>>>> replicas during headless state [#1621] Review request for Trac >>>>>> Ticket(s): >>>>>> #1621 Peer Reviewer(s): mahesh.va...@oracle.com; >>>>>> anders.wid...@ericsson.com Pull request to: >>>>>> mahesh.va...@oracle.com Affected branch(es): default Development >>>>>> branch: default >>>>>> >>>>>> -------------------------------- >>>>>> Impacted area Impact y/n >>>>>> -------------------------------- >>>>>> Docs n >>>>>> Build system n >>>>>> RPM/packaging n >>>>>> Configuration files n >>>>>> Startup scripts n >>>>>> SAF services y >>>>>> OpenSAF services n >>>>>> Core libraries n >>>>>> Samples n >>>>>> Tests n >>>>>> Other n >>>>>> >>>>>> >>>>>> Comments (indicate scope for each "y" above): >>>>>> --------------------------------------------- >>>>>> >>>>>> changeset faec4a4445a4c23e8f630857b19aabb43b5af18d >>>>>> Author: Nhat Pham <nhat.p...@dektech.com.au> >>>>>> Date: Mon, 04 Jan 2016 16:34:33 +0700 >>>>>> >>>>>> cpsv: Support preserving and recovering checkpoint replicas >>>>>> during headless state [#1621] >>>>>> >>>>>> Background: >>>>>> ---------- This enhancement supports to preserve checkpoint >>>>>> replicas >>>>> in case >>>>>> both SCs down (headless state) and recover replicas in case >>>>>> one of >>>>> SCs up >>>>>> again. If both SCs goes down, checkpoint replicas on >>>>>> surviving nodes >>>>> still >>>>>> remain. When a SC is available again, surviving replicas are >>>>> automatically >>>>>> registered to the SC checkpoint database. Content in >>>>>> surviving >>>>> replicas are >>>>>> intacted and synchronized to new replicas. >>>>>> >>>>>> When no SC is available, client API calls changing checkpoint >>>>> configuration >>>>>> which requires SC communication, are rejected. Client API >>>>>> calls >>>>> reading and >>>>>> writing existing checkpoint replicas still work. >>>>>> >>>>>> Limitation: The CKPT service does not support recovering >>>>>> checkpoints >>>>> in >>>>>> following cases: >>>>>> - The checkpoint which is unlinked before headless. >>>>>> - The non-collocated checkpoint has active replica locating >>>>>> on SC. >>>>>> - The non-collocated checkpoint has active replica locating >>>>>> on a PL >>>>> and this >>>>>> PL restarts during headless state. In this cases, the >>>>>> checkpoint >>>>> replica is >>>>>> destroyed. The fault code SA_AIS_ERR_BAD_HANDLE is returned >>>>>> when the >>>>> client >>>>>> accesses the checkpoint in these cases. The client must >>>>>> re-open the >>>>>> checkpoint. >>>>>> >>>>>> While in headless state, accessing checkpoint replicas does >>>>>> not work >>>>> if the >>>>>> node which hosts the active replica goes down. It will back >>>>>> working >>>>> when a >>>>>> SC available again. >>>>>> >>>>>> Solution: >>>>>> --------- The solution for this enhancement includes 2 parts: >>>>>> >>>>>> 1. To destroy un-recoverable checkpoint described above when >>>>>> both >>>>> SCs are >>>>>> down: When both SCs are down, the CPND deletes un-recoverable >>>>> checkpoint >>>>>> nodes and replicas on PLs. Then it requests CPA to destroy >>>>> corresponding >>>>>> checkpoint node by using new message >>>>>> CPA_EVT_ND2A_CKPT_DESTROY >>>>>> >>>>>> 2. To update CPD with checkpoint information When an active >>>>>> SC is up >>>>> after >>>>>> headless, CPND will update CPD with checkpoint information by >>>>>> using >>>>> new >>>>>> message CPD_EVT_ND2D_CKPT_INFO_UPDATE instead of using >>>>>> CPD_EVT_ND2D_CKPT_CREATE. This is because the CPND will >>>>>> create new >>>>> ckpt_id >>>>>> for the checkpoint which might be different with the current >>>>>> ckpt id >>>>> if the >>>>>> CPD_EVT_ND2D_CKPT_CREATE is used. The CPD collects checkpoint >>>>> information >>>>>> within 6s. During this updating time, following requests is >>>>>> rejected >>>>> with >>>>>> fault code SA_AIS_ERR_TRY_AGAIN: >>>>>> - CPD_EVT_ND2D_CKPT_CREATE >>>>>> - CPD_EVT_ND2D_CKPT_UNLINK >>>>>> - CPD_EVT_ND2D_ACTIVE_SET >>>>>> - CPD_EVT_ND2D_CKPT_RDSET >>>>>> >>>>>> >>>>>> Complete diffstat: >>>>>> ------------------ >>>>>> osaf/libs/agents/saf/cpa/cpa_proc.c | 52 >>>>> +++++++++++++++++++++++++++++++++++ >>>>>> osaf/libs/common/cpsv/cpsv_edu.c | 43 >>>>> +++++++++++++++++++++++++++++ >>>>>> osaf/libs/common/cpsv/include/cpd_cb.h | 3 ++ >>>>>> osaf/libs/common/cpsv/include/cpd_imm.h | 1 + >>>>>> osaf/libs/common/cpsv/include/cpd_proc.h | 7 ++++ >>>>>> osaf/libs/common/cpsv/include/cpd_tmr.h | 3 +- >>>>>> osaf/libs/common/cpsv/include/cpnd_cb.h | 1 + >>>>>> osaf/libs/common/cpsv/include/cpnd_init.h | 2 + >>>>>> osaf/libs/common/cpsv/include/cpsv_evt.h | 20 +++++++++++++ >>>>>> osaf/services/saf/cpsv/cpd/Makefile.am | 3 +- >>>>>> osaf/services/saf/cpsv/cpd/cpd_evt.c | 229 >>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >>>>> >>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >>>>> >>>>> ++++ >>>>>> osaf/services/saf/cpsv/cpd/cpd_imm.c | 112 >>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >>>>> >>>>>> osaf/services/saf/cpsv/cpd/cpd_init.c | 20 ++++++++++++- >>>>>> osaf/services/saf/cpsv/cpd/cpd_proc.c | 309 >>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >>>>> >>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >>>>> >>>>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >>>>>> osaf/services/saf/cpsv/cpd/cpd_tmr.c | 7 ++++ >>>>>> osaf/services/saf/cpsv/cpnd/cpnd_db.c | 16 ++++++++++ >>>>>> osaf/services/saf/cpsv/cpnd/cpnd_evt.c | 22 +++++++++++++++ >>>>>> osaf/services/saf/cpsv/cpnd/cpnd_init.c | 23 ++++++++++++++- >>>>>> osaf/services/saf/cpsv/cpnd/cpnd_mds.c | 13 ++++++++ >>>>>> osaf/services/saf/cpsv/cpnd/cpnd_proc.c | 314 >>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >>>>> >>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >>>>> >>>>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--- >>>>>> 20 files changed, 1189 insertions(+), 11 deletions(-) >>>>>> >>>>>> >>>>>> Testing Commands: >>>>>> ----------------- >>>>>> - >>>>>> >>>>>> Testing, Expected Results: >>>>>> -------------------------- >>>>>> - >>>>>> >>>>>> >>>>>> Conditions of Submission: >>>>>> ------------------------- >>>>>> <<HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC>> >>>>>> >>>>>> >>>>>> Arch Built Started Linux distro >>>>>> ------------------------------------------- >>>>>> mips n n >>>>>> mips64 n n >>>>>> x86 n n >>>>>> x86_64 n n >>>>>> powerpc n n >>>>>> powerpc64 n n >>>>>> >>>>>> >>>>>> Reviewer Checklist: >>>>>> ------------------- >>>>>> [Submitters: make sure that your review doesn't trigger any >>>>>> checkmarks!] >>>>>> >>>>>> >>>>>> Your checkin has not passed review because (see checked entries): >>>>>> >>>>>> ___ Your RR template is generally incomplete; it has too many >>>>>> blank >>>>> entries >>>>>> that need proper data filled in. >>>>>> >>>>>> ___ You have failed to nominate the proper persons for review and >>>>>> push. >>>>>> >>>>>> ___ Your patches do not have proper short+long header >>>>>> >>>>>> ___ You have grammar/spelling in your header that is unacceptable. >>>>>> >>>>>> ___ You have exceeded a sensible line length in your >>>>> headers/comments/text. >>>>>> ___ You have failed to put in a proper Trac Ticket # into your >>>>>> commits. >>>>>> >>>>>> ___ You have incorrectly put/left internal data in your comments/files >>>>>> (i.e. internal bug tracking tool IDs, product names etc) >>>>>> >>>>>> ___ You have not given any evidence of testing beyond basic build >>>>>> tests. >>>>>> Demonstrate some level of runtime or other sanity testing. >>>>>> >>>>>> ___ You have ^M present in some of your files. These have to be >>>>>> removed. >>>>>> >>>>>> ___ You have needlessly changed whitespace or added whitespace crimes >>>>>> like trailing spaces, or spaces before tabs. >>>>>> >>>>>> ___ You have mixed real technical changes with whitespace and other >>>>>> cosmetic code cleanup changes. These have to be separate >>>>>> commits. >>>>>> >>>>>> ___ You need to refactor your submission into logical chunks; there is >>>>>> too much content into a single commit. >>>>>> >>>>>> ___ You have extraneous garbage in your review (merge commits etc) >>>>>> >>>>>> ___ You have giant attachments which should never have been sent; >>>>>> Instead you should place your content in a public tree to >>>>>> be pulled. >>>>>> >>>>>> ___ You have too many commits attached to an e-mail; resend as >>>>>> threaded >>>>>> commits, or place in a public tree for a pull. >>>>>> >>>>>> ___ You have resent this content multiple times without a clear >>>>>> indication >>>>>> of what has changed between each re-send. >>>>>> >>>>>> ___ You have failed to adequately and individually address all of the >>>>>> comments and change requests that were proposed in the >>>>>> initial >>>>> review. >>>>>> ___ You have a misconfigured ~/.hgrc file (i.e. username, email >>>>>> etc) >>>>>> >>>>>> ___ Your computer have a badly configured date and time; confusing the >>>>>> the threaded patch review. >>>>>> >>>>>> ___ Your changes affect IPC mechanism, and you don't present any >>>>>> results >>>>>> for in-service upgradability test. >>>>>> >>>>>> ___ Your changes affect user manual and documentation, your patch >>>>>> series >>>>>> do not contain the patch that updates the Doxygen manual. >>>>>> > ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140 _______________________________________________ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel