Hi Mahesh, The patch was updated with your comment. Could you please help to run CPSV regression test? Thanks.
Best regards, Nhat Pham -----Original Message----- From: A V Mahesh [mailto:mahesh.va...@oracle.com] Sent: Monday, December 7, 2015 3:48 PM To: Nhat Pham <nhat.p...@dektech.com.au>; anders.wid...@ericsson.com Cc: opensaf-devel@lists.sourceforge.net Subject: Re: [devel] [PATCH 1 of 1] cpsv: cpd broadcasts CPND_EVT_D2ND_CKPT_RDSET with STOP [#1615] Hi Nhat , I did code review following are my comment : Optional : Not required to have function for this `cpd_proc_broadcast_RDSET_STOP(() you can do write the event construction . If you feel it will be useful in another context in future, just make all lower case character. If you are not immediate requirement , I will complete CPSV regeneration testing and provide you the result/observation , other wise ACK form me not tested. -AVM On 12/7/2015 2:03 PM, Nhat Pham wrote: > Hi Mahesh, > > Have you had chance to look into the patch? Thanks. > > Best regards, > Nhat Pham > > -----Original Message----- > From: Nhat Pham [mailto:nhat.p...@dektech.com.au] > Sent: Wednesday, December 2, 2015 4:48 PM > To: 'A V Mahesh' <mahesh.va...@oracle.com>; anders.wid...@ericsson.com > Cc: opensaf-devel@lists.sourceforge.net > Subject: Re: [devel] [PATCH 1 of 1] cpsv: cpd broadcasts > CPND_EVT_D2ND_CKPT_RDSET with STOP [#1615] > > Hi Mahesh, > > > > [AVM] If no process has the checkpoint open (S2 & S5 closed checkpoint ) > when saCkptCheckpointUnlink() is > invoked, the checkpoint is immediately deleted. > > I hope in your test case case after the S5 & before S6 no process > has the checkpoint opened , so we need to fix this issue fist. this > fix will fix IMM objects are not created issue. > > > > [Nhat Pham] Yes, I agree that the fix should solve the problem after > S5. I'm working on that. > > The S6 was mentioned as a phenomenon because the user doesn't see any > problem after S5. > > > > [AVM] I mean Open same checkpoint ( S5. Open checkpoint on PL-3 ) > , while retention timer running we can reopen the checkpoint > then it will stop retention timer. > > > > [Nhat Pham] I got that. The step description for #1615 is correct. > > > > BTW, could you please have a look at the patch for #1615 again? Thanks. > > > > Best regards, > > Nhat Pham > > > > From: A V Mahesh [mailto:mahesh.va...@oracle.com] > Sent: Wednesday, December 2, 2015 4:08 PM > To: Nhat Pham <nhat.p...@dektech.com.au>; anders.wid...@ericsson.com > Cc: opensaf-devel@lists.sourceforge.net > Subject: Re: [PATCH 1 of 1] cpsv: cpd broadcasts > CPND_EVT_D2ND_CKPT_RDSET with STOP [#1615] > > > > Hi Nhat, > > Please see below for my comments tagged with [AVM] > > -AVM > > On 12/2/2015 12:33 PM, Nhat Pham wrote: > > Hi Mahesh, > > > > The ticket #1615 and 1616 report 2 different problems although the > steps to reproduce the problem are quite similar. > > The problem scenarios are quite tricky I think. J > > > > Please see below for my comments. > > > > Best regards, > > Nhat Pham > > > > From: A V Mahesh [mailto:mahesh.va...@oracle.com] > Sent: Wednesday, December 2, 2015 12:04 PM > To: Nhat Pham <mailto:nhat.p...@dektech.com.au> > <nhat.p...@dektech.com.au>; anders.wid...@ericsson.com > <mailto:anders.wid...@ericsson.com> > Cc: opensaf-devel@lists.sourceforge.net > <mailto:opensaf-devel@lists.sourceforge.net> > Subject: Re: [PATCH 1 of 1] cpsv: cpd broadcasts > CPND_EVT_D2ND_CKPT_RDSET with STOP [#1615] > > > > Hi Nhat, > > On 12/2/2015 7:42 AM, Nhat Pham wrote: > > Problem: >>>> -------- >>>> A non-collocated checkpoint is firstly created on SC-2. Then the >>>> checkpoint is closed on SC-2. >>>> The CPD broadcasts CPND_EVT_D2ND_CKPT_RDSET with START to start >>>> retention duration timer on CPND because there is no user. During >>>> that time the checkpoint is opened again and using on PL-3. >>>> After retention duration, the checkpoint is destroyed on both SC-1 >>>> and SC-2. > > I interpreted `again using on PL-3` as the PL-3 was also opened the > non-collocated checkpoint at fist time it self both SC-2 & PL-3 > (opened/closed) in sequence > so I thought PL3 CPND already has the database of checkpoint , and > then we are trying to re-open the ckpt again on PL-3. > > In ticket #1616 reproducible step, it was mentioned PL-3 was also > opened ckpt , so I carried out the same test > and interpreted the #1615 & #1616 test cases are same except the > UNLINK > is called , so I was suggesting > to address both together in one go with and with out Unlink called. > > Now I got it they are NOT related testcases. my current > understanding test cases of of is Ticket #1616 & Ticket #1615 is as > follows and you addressed > #1615 in this patch , please confirm : > > ========================================== > Ticket #1616 - Non-collocated ckpt > > S1. Create a checkpoint on SC-2 ( OpenFlags with > SA_CKPT_CHECKPOINT_CREATE ) success > S2. Close the checkpoint on SC-2 ( retention timer starts because of > simple > Close ) success > S3. Open checkpoint on PL-3 success > S4. Unlink the checkpoint on PL-3 success > S5. Close the checkpoint on PL-3 success > > After this replicas will be deleted immediately on SC-1 & SC-2 ( no > retention timer starts because of Unlink called ) , the subsequent > checkpoint Create with same name on any node (PL-3/SC-2/SC-1) > > [Nhat Pham] Actually, the replicas on SC-1 and SC-2 are not deleted > immediately after this step. This leads the problem after S6. > > [AVM] If no process has the checkpoint open (S2 & S5 closed checkpoint ) > when saCkptCheckpointUnlink() is > invoked, the checkpoint is immediately deleted. > > I hope in your test case case after the S5 & before S6 no process > has the checkpoint opened , so we need to fix this issue fist. this > fix will fix IMM objects are not created issue. > > > > [Nhat Pham] Yes, I agree that the fix should solve the problem after > S5. I'm working on that. > > The S6 was mentioned as a phenomenon because the user doesn't see any > problem after S5. > > > > > > > S6. Create the checkpoint on PL-3 ( OpenFlags with > SA_CKPT_CHECKPOINT_CREATE ) > > checkpoint is created successfully with replicas BUT the IMM objects > is not created imm database > > [Nhat Pham] It should be "Create the checkpoint on SC-2" (not PL-3). > > Actually, the checkpoint is not created in this case because the CPND > on SC-2 finds that checkpoint exists. > > It just informs the CPD with CPD_EVT_ND2D_CKPT_USR_INFO. Thus the IMM > objects are not created. > > > > [AVM] This shouldn't happen if no process has the checkpoint open > when > saCkptCheckpointUnlink() is invoked > > ========================================== > > > ========================================== > Ticket #1615 - Non-collocated ckpt > > S1. Create a checkpoint on SC-1 ( OpenFlags with > SA_CKPT_CHECKPOINT_CREATE ) success > S2. Open a checkpoint on SC-2 success > S3. Close the checkpoint on SC-1 success > S4. Close the checkpoint on SC-2 success ( After this replicas Still > exist > on SC-1 & SC-2 and the retention timer started ) > > [Nhat Pham] S2 and S4 might not be necessary to trigger the retention > timer started. > > S5. Open checkpoint on PL-3 success ( retention timer still running on > SC-1 & SC-2 it was not stopped ) , > S6. Create Section Failed with SA_AIS_ERR_NOT_EXIST > > The reason for the Section Create SA_AIS_ERR_NOT_EXIST is the > retention timer SC-1 & SC-2 was not stopped even after step 5 (S5), > the subsequent checkpoint Create with same name on any node > (PL-3/SC-2/SC-1) > > [Nhat Pham] regarding "the subsequent checkpoint Create with same > name on any node (PL-3/SC-2/SC-1) ", do you mean that: > > The checkpoint with same name can be re-created on any node after this step. > > [AVM] I mean Open same checkpoint ( S5. Open checkpoint on PL-3 ) > , while retention timer running we can reopen the checkpoint > then it will stop retention timer. > > [Nhat Pham] I got that. The step description above is correct. > > > > ========================================== > > -AVM > > On 12/2/2015 7:42 AM, Nhat Pham wrote: > > Hi Mahesh, > > I'm not clear about the your proposal below. Could you please help to > make it clearer? Thanks. > > My understanding about the existing implementation: > In case the non-collocated checkpoint exist on controllers, when the > checkpoint is opened on PL first time (i.e the PL doesn't know that if > the checkpoint exist and the cp_node doesn't exist in CPND database) > the cpnd on PL sends CPD_EVT_ND2D_CKPT_CREATE to CPD to create the > checkpoint. > The CPD finds the checkpoint existing so it returns the message > CPND_EVT_D2ND_CKPT_INFO with create_replica == false. > The CPND updates its database with new checkpoint node without > creating a replica on PL. > > Dec 1 9:03:35.780616 osafckptnd [468:cpsv_evt.c:2199] TR cpnd <<== > CPND_EVT_A2ND_CKPT_OPEN(hdl=1, safCkpt=test3) from node 0x2030F Dec 1 > 9:03:35.780874 osafckptnd [468:cpsv_evt.c:2195] TR cpnd ==>> > CPD_EVT_ND2D_CKPT_CREATE(safCkpt=test3, creationFlags=0x2) to CPD Dec > 1 > 9:03:35.782806 osafckptnd [468:cpsv_evt.c:2201] TR cpnd <<== [3] > CPND_EVT_D2ND_CKPT_INFO(err=1, active=0x2020F, create_rep=false) from > CPD > > So, the flow in this case is: > cpnd_evt_proc_ckpt_open() --> CPD_EVT_ND2D_CKPT_CREATE --> > cpd_evt_proc_ckpt_create() > > Best regards, > Nhat Pham > > -----Original Message----- > From: A V Mahesh [mailto:mahesh.va...@oracle.com] > Sent: Tuesday, December 1, 2015 5:51 PM > To: Nhat Pham <mailto:nhat.p...@dektech.com.au> > <nhat.p...@dektech.com.au>; anders.wid...@ericsson.com > <mailto:anders.wid...@ericsson.com> > Cc: opensaf-devel@lists.sourceforge.net > <mailto:opensaf-devel@lists.sourceforge.net> > Subject: Re: [PATCH 1 of 1] cpsv: cpd broadcasts > CPND_EVT_D2ND_CKPT_RDSET with STOP [#1615] > > Hi, > we need to check wha cpnd_ckpt_node_find_by_name() is returning on > PL-3 if a no-collocated ckpt replicas exist on controller with > unlinked , > > If it returns null we also need to find any non-collated replica > exist on Controller nodes , while opening a checkpoint from PL-3, We > are not suppose to create new Replica on PL-3 if replica exist on > controllers ( sc-1 & sc-2 ) > > -AVM > > On 12/1/2015 3:47 PM, A V Mahesh wrote: > > Hi , > > We may need to handle else condition of below with > `cp_node->is_unlink == true` case in function > cpnd_evt_proc_ckpt_open() > > `if(((cp_node = cpnd_ckpt_node_find_by_name(cb, ckpt_name)) != NULL) > && cp_node->is_unlink == false) {` > > -AVM > > On 12/1/2015 3:25 PM, A V Mahesh wrote: > > Hi , > > The approach of stopping existing ckpt is different , it should be > through > > cpnd_evt_proc_ckpt_open() --> cpnd_send_ckpt_usr_info_to_cpd --> > CPD_EVT_ND2D_CKPT_USR_INFO --> cpd_evt_proc_ckpt_usr_info() So please > do change based on this flow in > cpd_evt_proc_ckpt_usr_info() and republish the patch . > > > -AVM > > > On 12/1/2015 12:25 PM, Nhat Pham wrote: > > osaf/libs/common/cpsv/include/cpd_proc.h | 2 ++ > osaf/services/saf/cpsv/cpd/cpd_evt.c | 8 +++++++- > osaf/services/saf/cpsv/cpd/cpd_proc.c | 22 ++++++++++++++++++++++ > 3 files changed, 31 insertions(+), 1 deletions(-) > > > Problem: > -------- > A non-collocated checkpoint is firstly created on SC-2. Then the > checkpoint is closed on SC-2. > The CPD broadcasts CPND_EVT_D2ND_CKPT_RDSET with START to start > retention duration timer on CPND because there is no user. During that > time the checkpoint is opened again and using on PL-3. > After retention duration, the checkpoint is destroyed on both SC-1 and SC-2. > > Solution: > --------- > The problem happens because the CPD doesn't broadcasts > CPND_EVT_D2ND_CKPT_RDSET with STOP when the checkpoint is opened again > on PL-3. The CPD is updated to broadcasts CPND_EVT_D2ND_CKPT_RDSET > with STOP when the checkpoint is opened again. > > diff --git a/osaf/libs/common/cpsv/include/cpd_proc.h > b/osaf/libs/common/cpsv/include/cpd_proc.h > --- a/osaf/libs/common/cpsv/include/cpd_proc.h > +++ b/osaf/libs/common/cpsv/include/cpd_proc.h > @@ -71,6 +71,8 @@ uint32_t cpd_proc_retention_set(CPD_CB * > uint32_t cpd_proc_unlink_set(CPD_CB *cb, CPD_CKPT_INFO_NODE **ckpt_node, > CPD_CKPT_MAP_INFO *map_info, SaNameT *ckpt_name); > +void cpd_proc_broadcast_RDSET_STOP(SaCkptCheckpointHandleT > ckpt_id, CPD_CB *cb); > + > void cpd_cb_dump(void); > uint32_t cpd_mbcsv_chgrole(CPD_CB *cb); diff --git > a/osaf/services/saf/cpsv/cpd/cpd_evt.c > b/osaf/services/saf/cpsv/cpd/cpd_evt.c > --- a/osaf/services/saf/cpsv/cpd/cpd_evt.c > +++ b/osaf/services/saf/cpsv/cpd/cpd_evt.c > @@ -355,8 +355,14 @@ static uint32_t cpd_evt_proc_ckpt_create > } > if (is_first_rep) > TRACE_2("cpd ckpt create success for first replica > ckpt_id:%llx,dest :%"PRIu64,map_info->ckpt_id,sinfo->dest); > - else > + else > TRACE_2("cpd ckpt create success ckpt_id:%llx,dest > :%"PRIu64,map_info->ckpt_id,sinfo->dest); > + > + > + /* In case the first user re-creates the existing > non-collocated checkpoint. All CPND should stop RD timer */ > + if ((is_first_rep == false) && > (!(map_info->attributes.creationFlags & > SA_CKPT_CHECKPOINT_COLLOCATED))) > + if (ckpt_node->num_users == 1) > + cpd_proc_broadcast_RDSET_STOP(ckpt_node->ckpt_id, cb); > TRACE_LEAVE(); > return proc_rc; > diff --git a/osaf/services/saf/cpsv/cpd/cpd_proc.c > b/osaf/services/saf/cpsv/cpd/cpd_proc.c > --- a/osaf/services/saf/cpsv/cpd/cpd_proc.c > +++ b/osaf/services/saf/cpsv/cpd/cpd_proc.c > @@ -1251,3 +1251,25 @@ uint32_t cpd_ckpt_reploc_imm_object_dele > } > return NCSCC_RC_SUCCESS; > } > + > +/****************************************************************** > +************************ > > + * Name : cpd_proc_broadcast_RDSET_STOP > + * > + * Description : This routine broadcast message > CPND_EVT_D2ND_CKPT_RDSET with STOP > + * > + * Return Values : None > + * > + * Notes : None > +******************************************************************* > +***********************/ > > + > +void cpd_proc_broadcast_RDSET_STOP(SaCkptCheckpointHandleT ckpt_id, > CPD_CB *cb) > +{ > + CPSV_EVT send_evt; > + > + memset(&send_evt, 0, sizeof(CPSV_EVT)); > + send_evt.type = CPSV_EVT_TYPE_CPND; > + send_evt.info.cpnd.type = CPND_EVT_D2ND_CKPT_RDSET; > + send_evt.info.cpnd.info.rdset.ckpt_id = ckpt_id; > + send_evt.info.cpnd.info.rdset.type = CPSV_CKPT_RDSET_STOP; > + cpd_mds_bcast_send(cb, &send_evt, NCSMDS_SVC_ID_CPND); } > > > > > > > > > > > > > ---------------------------------------------------------------------- > -------- Go from Idea to Many App Stores Faster with Intel(R) XDK Give > your users amazing mobile app experiences with Intel(R) XDK. > Use one codebase in this all-in-one HTML5 development environment. > Design, debug & build mobile apps & 2D/3D high-impact games for multiple > OSs. > http://pubads.g.doubleclick.net/gampad/clk?id=254741911&iu=/4140 > _______________________________________________ > Opensaf-devel mailing list > Opensaf-devel@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/opensaf-devel > > begin 666 cpd_broadcast_RDSET_STOP_v2.zip M4$L#!!0````(`'Q%B$>L^;$+N@0``+4,```A````8W!D7V)R;V%D8V%S=%]2 M1%-%5%]35$]07W8R+G!A=&-HM5;[;]LV$/[9^BMN#9K9EF77=IS7T**>["[! M4L>(O ##,! 41=ML)5$0J;3=D/]]1\KR8TD;9P\"MDCQN^/==P_J`"Y^`K:D MZ8(KKB&CFBV=`YC2G*<:>KV3/CT^II2=]6E_$'+VZHQVHUZ?=N>OSMC1X.0T M/)V?SAV6J;MS8%D$82YIQ*C2"OSI9$3&MS,RZN'$_WDZ(S>C8#R#3T(O(9A= M3^&W@^YQ=_"[XTQS&<8\.7>\U7"&D,K48S*.):.:1V@E9Q\S*= NH6 N<J7C M+\!R;G=E"H'O]=HP6_(4])+_#<]BJ;9@#L+0P-%S[!W>S$!+4)KF&G*ND2&! M^J(BIW:B1<)S<X)1Y"!9M%#<F))S8T$J`==Y&T9%+M(%;E!M91ZQ5F8\16OI M@HH4:!JAI!%!W=,KK]]VAG.-1SVTH?6(KH@KG<LOI?.A-*[X7M=J+:EP`AD7 M1GK#OF?YR<J@P))F:(^"+9<L=Y'D*OU>/X=#C/FG58"<;[A<.0I5F!!09)&- M-(9@WP.=G0._P3&2$(GY'#QO(330CE1TWHE%J#I,)HE,.R;!.R)E<1%Q7$0$ MN6'M)81[0PVWS]#LN*[['.UOWX)WTFT=@XO_IX#+`OWL]XB&"D76^4*PUNM( M*_%_A*;S"+)(8Y%^W(&QL 5V;BB^G+R[)I/KT1B:3:14DU1&O.5 #0? !O=^ M.+58:"8T(R*=RQ8$=$(3/H.5(,X;/SC@N'=21!L+UB$F>81F$*5E5@^HCS+^ M.HP7F,0QJK*:1+0RT!J+.ET'UCI92*(BR>KFA3UNU^DD9.J.L.4BES&O[VAY MF!=8Q'>"<=4Q"QL/5&%CP>]TFU51>P*VG0]/0;=RX2FHR8/^8- Z`]<\N@.3 M"MBRM&"[+B.\9-J25[91#. ]_L0<ZD(1VV(Q:;*&">SL9NB/2:_^PG1Y([-J MO: *AO9@1Y9YV96Q+V6Q8+0*R_G+./[<,FT(SE^^F-Y<%L='K2H?O#=5\%2Y M-#BDW:OQ6'''M0_8RX!_<QSF7ZW3A,L4,.?*#E?Z8EHV.N259RF[PS\+I4T_ M_NH-U3+ZH DTCFV'`K6411R!R>*O7QW-#DH9\G?8A]>O84Z1A08<'D+]N_K& M%ZIU+L("[6I;^U#5NY@N%!Q","PKT+\8X_/Z<H+3ZZNK:W\X&X\:C88QT)ZU M+E_O35HDQ/BKS)'=AD'4OEV06\+K&JQJ;Q6PJ_'P=EPW%-?0\2)/H>Q%#-_< M/ZNZ;*?;H[Q*W'[UM<+N56 KK*FP;F_0;?7!M<_>T8-V:\DP98">BB0A,OS MF281CZL:6W$Q\0/?)S<^"7[Q_7$06%)<Q^TT_[?AN)B8I@7#>IP_UG?M-4K, M#6HDK-2(*Y:+S*:LD9HM\1[-)7X\I'QS*4."U4@7?(^/@;7FFY*.6QH76&3G M,)&H<;T[D:;T-M::7<=]RM%_/CK_^6WDN']B.?G3X-80`HJGM@.;0G%K"4_, M37M8O6W!JQ8H\0>7\WHETC!%5:L0;?TEX_ :JETR^W4Z)H;O'93I$FV6I=$& M_WA$OB)E9];C]LHIU+&:/2VR;>/F*)M11MC>NY$BH>75J*J;;XPM%K XWH\" M$MSZY')DO3,DW#M_`5!+`0(4`!0````(`'Q%B$>L^;$+N@0``+4,```A```` M``````$`( ````````!C<&1?8G)O861C87-T7U)$4T547U-43U!?=C(N<&%T 88VA02P4&``````$``0!/````^00````` ` end ------------------------------------------------------------------------------ Go from Idea to Many App Stores Faster with Intel(R) XDK Give your users amazing mobile app experiences with Intel(R) XDK. Use one codebase in this all-in-one HTML5 development environment. Design, debug & build mobile apps & 2D/3D high-impact games for multiple OSs. http://pubads.g.doubleclick.net/gampad/clk?id=254741911&iu=/4140 _______________________________________________ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel