Hi Mahesh,

It's good. Thank you. :)

[AVM]  Up on rejoining of the SC`s The replica should be re-created regardless 
of another application opens it on PL4.
              ( Note : this comment is based on your explanation have not yet 
reviewed/tested  ,
                 currently i am struggling with  SC`s    not rejoining
after headless state , i can provide you more on this once i  complte my
review/testing)

[Nhat] To make cloud resilience works, you need the patches from other 
services (log, amf, clm, ntf).
@Minh: I heard that you created tar file which includes all patches. Could you 
please send it to Mahesh? Thanks

[AVM] I understand that , before I comment more on this   please allow me to 
understand
             I am not still not very clear of the headless design in detail.
             For example cluster membership of PL`s   during headless state ,
              In the absence of  SC`s  (CLMD) dose the PLs is considered as 
cluster nodes or not (cluster membership) ?

[Nhat] I don't know much about this.
@ Anders: Could you please have comment about this? Thanks

Best regards,
Nhat Pham

-----Original Message-----
From: A V Mahesh [mailto:mahesh.va...@oracle.com]
Sent: Monday, February 15, 2016 11:19 AM
To: Nhat Pham <nhat.p...@dektech.com.au>; anders.wid...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net; 'Beatriz Brandao' 
<beatriz.bran...@ericsson.com>
Subject: Re: [PATCH 0 of 1] Review Request for cpsv: Support preserving and 
recovering checkpoint replicas during headless state V2 [#1621]

Hi Nhat Pham,

How is your holiday went

Please find my comments below

On 2/15/2016 8:43 AM, Nhat Pham wrote:
> Hi Mahesh,
>
> For the comment 1, the patch will be updated accordingly.
[AVM]  Please hold , I will provide more comments in this week , so we can 
have consolidated V3
>
> For the comment 2, I think the CKPT service will not be backward
> compatible if the scAbsenceAllowed is true.
> The client can't create non-collocated checkpoint on SCs.
>
> Furthermore, this solution only protects the CKPT service from the
> case "The non-collocated checkpoint  is created on a SC"
> there are still the cases where the replicas are completely lost. Ex:
>
> - The non-collocated checkpoint created on a PL. The PL reboots. Both
> replicas now locate on SCs. Then, headless state happens. All replicas are 
> lost.
> - The non-collocated checkpoint has active replica locating on a PL
> and this PL restarts during headless state
> - The non-collocated checkpoint is created on PL3. This checkpoint is
> also opened on PL4. Then SCs and PL3 reboot.
[AVM]  Up on rejoining of the SC`s The replica should be re-created regardless 
of another application opens it on PL4.
              ( Note : this comment is based on your explanation have not yet 
reviewed/tested  ,
                 currently i am struggling with  SC`s    not rejoining
after headless state , i can provide you more on this once i  complte my
review/testing)
> In this case, all replicas are lost and the client has to create it again.
>
> In case multiple nodes (which including SCs) reboot, losing replicas
> is unpreventable. The patch is to recover the checkpoints in possible cases.
> How do you think?
[AVM] I understand that , before I comment more on this   please allow
me to understand
             I am not still not very clear of the headless design in detail.

             For example cluster membership of PL`s   during headless
state ,
              In the absence of  SC`s  (CLMD) dose the PLs is considered as 
cluster nodes or not (cluster membership) ?

                    - if not consider as  NON cluster nodes Checkpoint Service 
API  should  leverage the SA Forum Cluster
                      Membership Service  and API's can fail with 
SA_AIS_ERR_UNAVAILABLE

                    - if considers as cluster nodes  we need to follow all the 
defined rules which are defined in SAI-AIS-CKPT-B.02.02 specification

             so give me some more time to review it completely , so that we 
can  have consolidated patch V3

-AVM
>
> Best regards,
> Nhat Pham
>
> -----Original Message-----
> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
> Sent: Friday, February 12, 2016 11:10 AM
> To: Nhat Pham <nhat.p...@dektech.com.au>; anders.wid...@ericsson.com
> Cc: opensaf-devel@lists.sourceforge.net; Beatriz Brandao
> <beatriz.bran...@ericsson.com>
> Subject: Re: [PATCH 0 of 1] Review Request for cpsv: Support
> preserving and recovering checkpoint replicas during headless state V2
> [#1621]
>
>
> Comment 2 :
>
> After incorporating the comment one all the Limitations should be
> prevented based on Hydra configuration is enabled in IMM status.
>
> Foe example :  if some application is trying to create
>
> non-collocated checkpoint active replica getting generated/locating on
> SC then ,regardless of the heads (SC`s) status exist not exist should
> return SA_AIS_ERR_NOT_SUPPORTED
>
> In other words, rather that allowing to created non-collocated
> checkpoint when
> heads(SC`s)  are exit , and non-collocated checkpoint getting
> unrecoverable after heads(SC`s) rejoins.
>
> ======================================================================
> =======================
>>      Limitation: The CKPT service doesn't support recovering checkpoints in
>>      following cases:
>>      . The checkpoint which is unlinked before headless.
>>      . The non-collocated checkpoint has active replica locating on SC.
>>      . The non-collocated checkpoint has active replica locating on a PL
>> and this PL
>>      restarts during headless state. In this cases, the checkpoint replica is
>>      destroyed. The fault code SA_AIS_ERR_BAD_HANDLE is returned when the 
>> client
>>      accesses the checkpoint in these cases. The client must re-open the
>>      checkpoint.
> ======================================================================
> =======================
>
> -AVM
>
>
> On 2/11/2016 12:52 PM, A V Mahesh wrote:
>> Hi,
>>
>> I jut starred reviewing patch , I will be  giving comments as soon as
>> I crossover any , to save some time.
>>
>> Comment 1 :
>> This functionality should be under  checks if Hydra configuration is
>> enabled in IMM attrName =
>> const_cast<SaImmAttrNameT>("scAbsenceAllowed")
>>
>> Please see example how  LOG/AMF  services implemented it.
>>
>> -AVM
>>
>>
>> On 1/29/2016 1:02 PM, Nhat Pham wrote:
>>> Hi Mahesh,
>>>
>>> As described in the README, the CKPT service returns
>>> SA_AIS_ERR_TRY_AGAIN fault code in this case.
>>> I guess it's same for other services.
>>>
>>> @Anders: Could you please confirm this?
>>>
>>> Best regards,
>>> Nhat Pham
>>>
>>> -----Original Message-----
>>> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
>>> Sent: Friday, January 29, 2016 2:11 PM
>>> To: Nhat Pham <nhat.p...@dektech.com.au>; anders.wid...@ericsson.com
>>> Cc: opensaf-devel@lists.sourceforge.net
>>> Subject: Re: [PATCH 0 of 1] Review Request for cpsv: Support
>>> preserving and recovering checkpoint replicas during headless state
>>> V2 [#1621]
>>>
>>> Hi,
>>>
>>> On 1/29/2016 11:45 AM, Nhat Pham wrote:
>>>>      -  The behavior of application will be consistent with other
>>>> saf services like imm/amf behavior  during headless state.
>>>> [Nhat] I'm not clear what you mean about "consistent"?
>>> In the obscene of  Director (SC's) , what is expected return values
>>> of SAF API should ( all services ) ,
>>>     which are not in aposition to  provide service at that moment.
>>>
>>> I think all services should return same  SAF ERRS., I thinks
>>> currently we don't have  it , may be  Anders Widel  will help us.
>>>
>>> -AVM
>>>
>>>
>>> On 1/29/2016 11:45 AM, Nhat Pham wrote:
>>>> Hi Mahesh,
>>>>
>>>> Please see the attachment for the README. Let me know if there is
>>>> any more information required.
>>>>
>>>> Regarding your comments:
>>>>      -  during headless state  applications may behave like during
>>>> CPND restart case [Nhat] Headless state and CPND restart are
>>>> different events. Thus, the behavior is different.
>>>> Headless state is a case where both SCs go down.
>>>>
>>>>      -  The behavior of application will be consistent with other
>>>> saf services like imm/amf behavior  during headless state.
>>>> [Nhat] I'm not clear what you mean about "consistent"?
>>>>
>>>> Best regards,
>>>> Nhat Pham
>>>>
>>>> -----Original Message-----
>>>> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
>>>> Sent: Friday, January 29, 2016 11:12 AM
>>>> To: Nhat Pham <nhat.p...@dektech.com.au>;
>>>> anders.wid...@ericsson.com
>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>> Subject: Re: [PATCH 0 of 1] Review Request for cpsv: Support
>>>> preserving and recovering checkpoint replicas during headless state
>>>> V2 [#1621]
>>>>
>>>> Hi Nhat Pham,
>>>>
>>>> I stared reviewing this patch , so can please provide  README file
>>>> with scope and limitations , that will help to define
>>>> testing/reviewing scope .
>>>>
>>>> Following are minimum things we can keep in mind while
>>>> reviewing/accepting patch ,
>>>>
>>>> - Not effecting existing functionality
>>>>      -  during headless state  applications may behave like during
>>>> CPND restart case
>>>>      -  The minimum functionally of application works
>>>>      -  The behavior of application will be consistent with
>>>>         other saf services like imm/amf behavior  during headless state.
>>>>
>>>> So please do provide any additional detailed in README if any of
>>>> the above is deviated , that allow users to know about the
>>>> limitations/deviation.
>>>>
>>>> -AVM
>>>>
>>>> On 1/4/2016 3:15 PM, Nhat Pham wrote:
>>>>> Summary: cpsv: Support preserving and recovering checkpoint
>>>>> replicas during headless state [#1621] Review request for Trac 
>>>>> Ticket(s):
>>>>> #1621 Peer Reviewer(s): mahesh.va...@oracle.com;
>>>>> anders.wid...@ericsson.com Pull request to:
>>>>> mahesh.va...@oracle.com Affected branch(es): default Development
>>>>> branch: default
>>>>>
>>>>> --------------------------------
>>>>> Impacted area       Impact y/n
>>>>> --------------------------------
>>>>>      Docs                    n
>>>>>      Build system            n
>>>>>      RPM/packaging           n
>>>>>      Configuration files     n
>>>>>      Startup scripts         n
>>>>>      SAF services            y
>>>>>      OpenSAF services        n
>>>>>      Core libraries          n
>>>>>      Samples                 n
>>>>>      Tests                   n
>>>>>      Other                   n
>>>>>
>>>>>
>>>>> Comments (indicate scope for each "y" above):
>>>>> ---------------------------------------------
>>>>>
>>>>> changeset faec4a4445a4c23e8f630857b19aabb43b5af18d
>>>>> Author:    Nhat Pham <nhat.p...@dektech.com.au>
>>>>> Date:    Mon, 04 Jan 2016 16:34:33 +0700
>>>>>
>>>>>      cpsv: Support preserving and recovering checkpoint replicas
>>>>> during headless state [#1621]
>>>>>
>>>>>      Background:
>>>>>      ---------- This enhancement supports to preserve checkpoint
>>>>> replicas
>>>> in case
>>>>>      both SCs down (headless state) and recover replicas in case
>>>>> one of
>>>> SCs up
>>>>>      again. If both SCs goes down, checkpoint replicas on
>>>>> surviving nodes
>>>> still
>>>>>      remain. When a SC is available again, surviving replicas are
>>>> automatically
>>>>>      registered to the SC checkpoint database. Content in
>>>>> surviving
>>>> replicas are
>>>>>      intacted and synchronized to new replicas.
>>>>>
>>>>>      When no SC is available, client API calls changing checkpoint
>>>> configuration
>>>>>      which requires SC communication, are rejected. Client API
>>>>> calls
>>>> reading and
>>>>>      writing existing checkpoint replicas still work.
>>>>>
>>>>>      Limitation: The CKPT service does not support recovering
>>>>> checkpoints
>>>> in
>>>>>      following cases:
>>>>>       - The checkpoint which is unlinked before headless.
>>>>>       - The non-collocated checkpoint has active replica locating
>>>>> on SC.
>>>>>       - The non-collocated checkpoint has active replica locating
>>>>> on a PL
>>>> and this
>>>>>      PL restarts during headless state. In this cases, the
>>>>> checkpoint
>>>> replica is
>>>>>      destroyed. The fault code SA_AIS_ERR_BAD_HANDLE is returned
>>>>> when the
>>>> client
>>>>>      accesses the checkpoint in these cases. The client must
>>>>> re-open the
>>>>>      checkpoint.
>>>>>
>>>>>      While in headless state, accessing checkpoint replicas does
>>>>> not work
>>>> if the
>>>>>      node which hosts the active replica goes down. It will back
>>>>> working
>>>> when a
>>>>>      SC available again.
>>>>>
>>>>>      Solution:
>>>>>      --------- The solution for this enhancement includes 2 parts:
>>>>>
>>>>>      1. To destroy un-recoverable checkpoint described above when
>>>>> both
>>>> SCs are
>>>>>      down: When both SCs are down, the CPND deletes un-recoverable
>>>> checkpoint
>>>>>      nodes and replicas on PLs. Then it requests CPA to destroy
>>>> corresponding
>>>>>      checkpoint node by using new message
>>>>> CPA_EVT_ND2A_CKPT_DESTROY
>>>>>
>>>>>      2. To update CPD with checkpoint information When an active
>>>>> SC is up
>>>> after
>>>>>      headless, CPND will update CPD with checkpoint information by
>>>>> using
>>>> new
>>>>>      message CPD_EVT_ND2D_CKPT_INFO_UPDATE instead of using
>>>>>      CPD_EVT_ND2D_CKPT_CREATE. This is because the CPND will
>>>>> create new
>>>> ckpt_id
>>>>>      for the checkpoint which might be different with the current
>>>>> ckpt id
>>>> if the
>>>>>      CPD_EVT_ND2D_CKPT_CREATE is used. The CPD collects checkpoint
>>>> information
>>>>>      within 6s. During this updating time, following requests is
>>>>> rejected
>>>> with
>>>>>      fault code SA_AIS_ERR_TRY_AGAIN:
>>>>>      - CPD_EVT_ND2D_CKPT_CREATE
>>>>>      - CPD_EVT_ND2D_CKPT_UNLINK
>>>>>      - CPD_EVT_ND2D_ACTIVE_SET
>>>>>      - CPD_EVT_ND2D_CKPT_RDSET
>>>>>
>>>>>
>>>>> Complete diffstat:
>>>>> ------------------
>>>>>      osaf/libs/agents/saf/cpa/cpa_proc.c       |   52
>>>> +++++++++++++++++++++++++++++++++++
>>>>> osaf/libs/common/cpsv/cpsv_edu.c          |   43
>>>> +++++++++++++++++++++++++++++
>>>>> osaf/libs/common/cpsv/include/cpd_cb.h    |    3 ++
>>>>>      osaf/libs/common/cpsv/include/cpd_imm.h   |    1 +
>>>>>      osaf/libs/common/cpsv/include/cpd_proc.h  |    7 ++++
>>>>>      osaf/libs/common/cpsv/include/cpd_tmr.h   |    3 +-
>>>>>      osaf/libs/common/cpsv/include/cpnd_cb.h   |    1 +
>>>>>      osaf/libs/common/cpsv/include/cpnd_init.h |    2 +
>>>>>      osaf/libs/common/cpsv/include/cpsv_evt.h  |   20 +++++++++++++
>>>>>      osaf/services/saf/cpsv/cpd/Makefile.am    |    3 +-
>>>>>      osaf/services/saf/cpsv/cpd/cpd_evt.c      |  229
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>
>>>> ++++
>>>>> osaf/services/saf/cpsv/cpd/cpd_imm.c      |  112
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>
>>>>> osaf/services/saf/cpsv/cpd/cpd_init.c     |   20 ++++++++++++-
>>>>>      osaf/services/saf/cpsv/cpd/cpd_proc.c     |  309
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>
>>>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>> osaf/services/saf/cpsv/cpd/cpd_tmr.c      |    7 ++++
>>>>>      osaf/services/saf/cpsv/cpnd/cpnd_db.c     |   16 ++++++++++
>>>>>      osaf/services/saf/cpsv/cpnd/cpnd_evt.c    |   22 +++++++++++++++
>>>>>      osaf/services/saf/cpsv/cpnd/cpnd_init.c   |   23 ++++++++++++++-
>>>>>      osaf/services/saf/cpsv/cpnd/cpnd_mds.c    |   13 ++++++++
>>>>>      osaf/services/saf/cpsv/cpnd/cpnd_proc.c   |  314
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>
>>>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
>>>>>      20 files changed, 1189 insertions(+), 11 deletions(-)
>>>>>
>>>>>
>>>>> Testing Commands:
>>>>> -----------------
>>>>> -
>>>>>
>>>>> Testing, Expected Results:
>>>>> --------------------------
>>>>> -
>>>>>
>>>>>
>>>>> Conditions of Submission:
>>>>> -------------------------
>>>>>      <<HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC>>
>>>>>
>>>>>
>>>>> Arch      Built     Started    Linux distro
>>>>> -------------------------------------------
>>>>> mips        n          n
>>>>> mips64      n          n
>>>>> x86         n          n
>>>>> x86_64      n          n
>>>>> powerpc     n          n
>>>>> powerpc64   n          n
>>>>>
>>>>>
>>>>> Reviewer Checklist:
>>>>> -------------------
>>>>> [Submitters: make sure that your review doesn't trigger any
>>>>> checkmarks!]
>>>>>
>>>>>
>>>>> Your checkin has not passed review because (see checked entries):
>>>>>
>>>>> ___ Your RR template is generally incomplete; it has too many
>>>>> blank
>>>> entries
>>>>>         that need proper data filled in.
>>>>>
>>>>> ___ You have failed to nominate the proper persons for review and
>>>>> push.
>>>>>
>>>>> ___ Your patches do not have proper short+long header
>>>>>
>>>>> ___ You have grammar/spelling in your header that is unacceptable.
>>>>>
>>>>> ___ You have exceeded a sensible line length in your
>>>> headers/comments/text.
>>>>> ___ You have failed to put in a proper Trac Ticket # into your
>>>>> commits.
>>>>>
>>>>> ___ You have incorrectly put/left internal data in your comments/files
>>>>>         (i.e. internal bug tracking tool IDs, product names etc)
>>>>>
>>>>> ___ You have not given any evidence of testing beyond basic build
>>>>> tests.
>>>>>         Demonstrate some level of runtime or other sanity testing.
>>>>>
>>>>> ___ You have ^M present in some of your files. These have to be
>>>>> removed.
>>>>>
>>>>> ___ You have needlessly changed whitespace or added whitespace crimes
>>>>>         like trailing spaces, or spaces before tabs.
>>>>>
>>>>> ___ You have mixed real technical changes with whitespace and other
>>>>>         cosmetic code cleanup changes. These have to be separate
>>>>> commits.
>>>>>
>>>>> ___ You need to refactor your submission into logical chunks; there is
>>>>>         too much content into a single commit.
>>>>>
>>>>> ___ You have extraneous garbage in your review (merge commits etc)
>>>>>
>>>>> ___ You have giant attachments which should never have been sent;
>>>>>         Instead you should place your content in a public tree to
>>>>> be pulled.
>>>>>
>>>>> ___ You have too many commits attached to an e-mail; resend as
>>>>> threaded
>>>>>         commits, or place in a public tree for a pull.
>>>>>
>>>>> ___ You have resent this content multiple times without a clear
>>>>> indication
>>>>>         of what has changed between each re-send.
>>>>>
>>>>> ___ You have failed to adequately and individually address all of the
>>>>>         comments and change requests that were proposed in the
>>>>> initial
>>>> review.
>>>>> ___ You have a misconfigured ~/.hgrc file (i.e. username, email
>>>>> etc)
>>>>>
>>>>> ___ Your computer have a badly configured date and time; confusing the
>>>>>         the threaded patch review.
>>>>>
>>>>> ___ Your changes affect IPC mechanism, and you don't present any
>>>>> results
>>>>>         for in-service upgradability test.
>>>>>
>>>>> ___ Your changes affect user manual and documentation, your patch
>>>>> series
>>>>>         do not contain the patch that updates the Doxygen manual.
>>>>>
>



------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Reply via email to