Re: [devel] [PATCH 2 of 2] AMFND: Fix SC failover during headless sync before standby AMFD comes up [#2162]

2017-01-15 Thread minh chau
Hi Nagu,

I misunderstood your point, and now I get it.
In my test I see it works as expected - SU2 becomes Act and no 
assignment for SU1
I guess in your test some how the cluster initiation timer has not been 
started on SC2 (new active), there could be a missing case in the patch.
Could you please share me the trace?

Thanks,
Minh

On 13/01/17 21:48, Nagendra Kumar wrote:
> Hi Minh,
>   Please check my response inlined with [Nagu].
>
> Thanks
> -Nagu
>> -Original Message-
>> From: minh chau [mailto:minh.c...@dektech.com.au]
>> Sent: 13 January 2017 03:53
>> To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
>> gary@dektech.com.au
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless sync
>> before standby AMFD comes up [#2162]
>>
>> Hi Nagu,
>>
>> Thanks for reviewing, please see comments inline.
>>
>> Thanks,
>> Minh
>>
>> On 12/01/17 21:48, Nagendra Kumar wrote:
>>> Hi Minh,
>>>  Though I am not able to simulate the problem, I tested as below:
>>> 1. Start SC1, SC2, PL-3 and PL-4. Configure SU1 on PL-3 as Act and SU2 on
>> PL-4 as Standby.
>>> 2. Stop SC1 and SC2 and then stop PL-3.
>>> 3. Start SC-1 and SC-2. When SC-2 prints Cold sync complete, stop SC1. SC2
>> becomes Act.
>> [M]: As SU1 is on PL3, SU2 is on PL4, and If PL-3 is stopped, then only
>> SU2 has active assignment
> [Nagu]: PL-3 is stopped in step #2.
>>> In this case, SC-2 contains both SU1(Act) and SU2(Standby) assignments.
>>> Ideally, SU2 assignments should have been Act and there shouldn't be SU1
>> assignment.
>> [M]: This seems to be another test where SU1 and SU2 are hosted on SC2,
>> then both SU1 and SU2 should get assignment
> [Nagu]: I mean to say command 'amf-state siass' run on SC-1 displays both SU1 
> and SU2 assignments.
>  SU1 and SU2 are hosted on PL-3 and PL-4 respectively.
> This is similar test case, which is mentioned in the ticket?
>>>
>> safSISU=safSu=SU1\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDe
>> mo,safApp=AmfDemo1
>>>   saAmfSISUHAState=ACTIVE(1)
>>>   saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>>>
>> safSISU=safSu=SU2\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDe
>> mo,safApp=AmfDemo1
>>>   saAmfSISUHAState=STANDBY(2)
>>>   saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>>>
>>> Please check.
>>>
>>> Thanks
>>> -Nagu
>>>
 -Original Message-
 From: Minh Hon Chau [mailto:minh.c...@dektech.com.au]
 Sent: 08 November 2016 08:53
 To: hans.nordeb...@ericsson.com; Nagendra Kumar; Praveen Malviya;
 gary@dektech.com.au; minh.c...@dektech.com.au
 Cc: opensaf-devel@lists.sourceforge.net
 Subject: [PATCH 2 of 2] AMFND: Fix SC failover during headless sync
 before standby AMFD comes up [#2162]

osaf/services/saf/amf/amfnd/di.cc   |  7 +--
osaf/services/saf/amf/amfnd/susm.cc |  6 ++
2 files changed, 11 insertions(+), 2 deletions(-)


 This case of SC failover causes new active AMFD getting stuck in
 node_up messages

 Say first active controller is SC1, which goes down during headless sync.
 Therefore, the amfnd on SC2 receives mds_down of AVD, then both
 is_avd_down and amfd_sync_required are set to true. When SC2 takes
 over active role, amfnd on SC2 receives mds_up, but only is_avd_down
 is set to false and the variable amfd_sync_required remains true.
 When amfnd-SC2 finishes initiating middleware SU, it needs to send
 su_oper message to AMFD, but it is failed to send out due to
>> amfd_sync_required.
 In this scenario of SC failover, amfd_sync_required needs to set to
 false when amfnd on SC2 receives su_pres message on middleware SUs.
 That means amfnd on active controller does not need to wait for
 set_leds message, to be informed that cluster initiation is done, so
 that amfnd can sen su_oper messages to AMFD. This logic also aligns
 with normal headless scenario, where amfnd on active controller has
 amfd_sync_required initially marked as false because no middleware
 SUs are initiated. When amfd_sync_required is true that means amfnd
 all middleware SUs are initiated and assigned before headless, thus
 amfnd needs to wait for cluster initiation after headless.

 diff --git a/osaf/services/saf/amf/amfnd/di.cc
 b/osaf/services/saf/amf/amfnd/di.cc
 --- a/osaf/services/saf/amf/amfnd/di.cc
 +++ b/osaf/services/saf/amf/amfnd/di.cc
 @@ -748,7 +748,8 @@ uint32_t avnd_di_oper_send(AVND_CB *cb,
if (avnd_diq_rec_add(cb, &msg) == nullptr) {
rc = NCSCC_RC_FAILURE;
}
 -  LOG_NO("avnd_di_oper_send() deferred as AMF director is
 offline");
 +  LOG_NO("avnd_di_oper_send() deferred as AMF director is
 offline(%d),"
 +  " or sync is required(%d)", cb->is_avd_d

Re: [devel] [PATCH 1 of 1] ckpt: fix extended name issues in the library [#2128]

2017-01-15 Thread Vo Minh Hoang
Hi Zoran,

ACK from me.

Sincerely,
Hoang

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Friday, January 13, 2017 10:15 AM
To: Zoran Milinkovic 
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] [PATCH 1 of 1] ckpt: fix extended name issues in the
library [#2128]

Hi Zora,

ACK.

LONG DN not tested .

-AVM


On 1/10/2017 8:57 PM, Zoran Milinkovic wrote:
>   src/ckpt/agent/cpa_api.c   |  26 +++---
>   src/ckpt/ckptnd/cpnd_evt.c |  12 
>   2 files changed, 19 insertions(+), 19 deletions(-)
>
>
> Fix string temination issues when SaNameT value is provided to
saCkptCheckpointOpen(), saCkptCheckpointOpenAsync() and
saCkptCheckpointUnlink().
>
> diff --git a/src/ckpt/agent/cpa_api.c b/src/ckpt/agent/cpa_api.c
> --- a/src/ckpt/agent/cpa_api.c
> +++ b/src/ckpt/agent/cpa_api.c
> @@ -871,13 +871,17 @@ SaAisErrorT saCkptCheckpointOpen(SaCkptH
>   SaTimeT time_out=0;
>   CPA_GLOBAL_CKPT_NODE *gc_node = NULL;
>   SaConstStringT ckpt_name = NULL;
> + size_t ckpt_name_len;
>   
>   TRACE_ENTER2("SaCkptCheckpointHandleT passed is %llx",ckptHandle);
>   if ((checkpointName == NULL) || (checkpointHandle == NULL) ||
(osaf_extended_name_length(checkpointName) == 0)) {
>   TRACE_4("Cpa CkptOpen Api failed with return
value:%d,ckptHandle:%llx", SA_AIS_ERR_INVALID_PARAM, ckptHandle);
>   TRACE_LEAVE2("API return code = %u",
SA_AIS_ERR_INVALID_PARAM);
>   return SA_AIS_ERR_INVALID_PARAM;
> - } else if (osaf_extended_name_length(checkpointName) >
kOsafMaxDnLength) {
> + }
> +
> + ckpt_name_len = osaf_extended_name_length(checkpointName);
> + if (ckpt_name_len > kOsafMaxDnLength) {
>   TRACE_4("Cpa CkptOpen Api failed with return
value:%d,ckptHandle:%llx", SA_AIS_ERR_TOO_BIG, ckptHandle);
>   TRACE_LEAVE2("API return code = %u", SA_AIS_ERR_TOO_BIG);
>   return SA_AIS_ERR_TOO_BIG;
> @@ -962,7 +966,7 @@ SaAisErrorT saCkptCheckpointOpen(SaCkptH
>   lc_node->cl_hdl = ckptHandle;
>   lc_node->open_flags = checkpointOpenFlags;
>   
> - lc_node->ckpt_name = strdup(ckpt_name);
> + lc_node->ckpt_name = strndup(ckpt_name, ckpt_name_len);
>   
>   /* Add CPA_LOCAL_CKPT_NODE to lcl_ckpt_hdl_tree */
>   proc_rc = cpa_lcl_ckpt_node_add(&cb->lcl_ckpt_tree, lc_node); @@ 
> -981,7 +985,7 @@ SaAisErrorT saCkptCheckpointOpen(SaCkptH
>   evt.info.cpnd.info.openReq.client_hdl = ckptHandle;
>   evt.info.cpnd.info.openReq.lcl_ckpt_hdl = lc_node->lcl_ckpt_hdl;
>   
> - osaf_extended_name_lend(ckpt_name,
&evt.info.cpnd.info.openReq.ckpt_name);
> + osaf_extended_name_lend(lc_node->ckpt_name, 
> +&evt.info.cpnd.info.openReq.ckpt_name);
>   
>   if (checkpointCreationAttributes) {
>   evt.info.cpnd.info.openReq.ckpt_attrib = 
> *checkpointCreationAttributes; @@ -1178,6 +1182,7 @@ SaAisErrorT
saCkptCheckpointOpenAsync(Sa
>   CPA_CLIENT_NODE *cl_node = NULL;
>   uint32_t proc_rc = NCSCC_RC_SUCCESS;
>   SaConstStringT ckpt_name = NULL;
> + size_t ckpt_name_len;
>   
>   TRACE_ENTER2("SaCkptCheckpointHandleT passed is %llx",ckptHandle);
>   
> @@ -1185,7 +1190,10 @@ SaAisErrorT saCkptCheckpointOpenAsync(Sa
>   TRACE_4("cpa CkptOpenAsync Api failed with return
value:%d,ckptHandle:%llx", SA_AIS_ERR_INVALID_PARAM, ckptHandle);
>   TRACE_LEAVE2("API return code = %u",
SA_AIS_ERR_INVALID_PARAM);
>   return SA_AIS_ERR_INVALID_PARAM;
> - } else if (osaf_extended_name_length(checkpointName) >
kOsafMaxDnLength) {
> + }
> +
> + ckpt_name_len = osaf_extended_name_length(checkpointName);
> + if (ckpt_name_len > kOsafMaxDnLength) {
>   TRACE_4("Cpa CkptOpenAsync Api failed with return
value:%d,ckptHandle:%llx", SA_AIS_ERR_TOO_BIG, ckptHandle);
>   TRACE_LEAVE2("API return code = %u", SA_AIS_ERR_TOO_BIG);
>   return SA_AIS_ERR_TOO_BIG;
> @@ -1262,7 +1270,7 @@ SaAisErrorT saCkptCheckpointOpenAsync(Sa
>   lc_node->lcl_ckpt_hdl = NCS_PTR_TO_UNS64_CAST(lc_node);
>   lc_node->cl_hdl = ckptHandle;
>   lc_node->open_flags = checkpointOpenFlags;
> - lc_node->ckpt_name = strdup(ckpt_name);
> + lc_node->ckpt_name = strndup(ckpt_name, ckpt_name_len);
>   
>   /* Add CPA_LOCAL_CKPT_NODE to lcl_ckpt_hdl_tree */
>   proc_rc = cpa_lcl_ckpt_node_add(&cb->lcl_ckpt_tree, lc_node); @@ 
> -1281,7 +1289,7 @@ SaAisErrorT saCkptCheckpointOpenAsync(Sa
>   evt.info.cpnd.info.openReq.client_hdl = ckptHandle;
>   evt.info.cpnd.info.openReq.lcl_ckpt_hdl = lc_node->lcl_ckpt_hdl;
>   
> - osaf_extended_name_lend(ckpt_name,
&evt.info.cpnd.info.openReq.ckpt_name);
> + osaf_extended_name_lend(lc_node->ckpt_name, 
> +&evt.info.cpnd.info.openReq.ckpt_name);
>   
>   if (checkpointCreationAttributes) {
>   evt.info.cpnd.info.openReq.ckpt_attrib = 
> *checkpointCreationAttributes; @@ -1585,7 

Re: [devel] [PATCH 1 of 1] ckpt: fix memory leak in saCkptCheckpointRead [#2256]

2017-01-15 Thread Vo Minh Hoang
Hi Zoran,

ACK.

Sincerely,
Hoang

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Wednesday, January 11, 2017 11:07 AM
To: Zoran Milinkovic 
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] [PATCH 1 of 1] ckpt: fix memory leak in
saCkptCheckpointRead [#2256]

Hi Zoran,

ACK Not tested.

-AVM


On 1/10/2017 9:58 PM, Zoran Milinkovic wrote:
>   src/ckpt/agent/cpa_proc.c |  22 --
>   1 files changed, 16 insertions(+), 6 deletions(-)
>
>
> Allocated memory in cpa_proc_build_data_access_evt() is freed.
>
> diff --git a/src/ckpt/agent/cpa_proc.c b/src/ckpt/agent/cpa_proc.c
> --- a/src/ckpt/agent/cpa_proc.c
> +++ b/src/ckpt/agent/cpa_proc.c
> @@ -1124,11 +1124,13 @@ uint32_t cpa_proc_build_data_access_evt(
>SaSizeT maxSectionSize, SaUint32T
*errflag, CPSV_CKPT_DATA **ckpt_data)
>   {
>   CPSV_CKPT_DATA *tmp_ckpt_data = NULL;
> + *ckpt_data = NULL;
>   if (numberOfElements > 0) {
>   while (numberOfElements > 0) {
>   tmp_ckpt_data = m_MMGR_ALLOC_CPSV_CKPT_DATA;
> - if (tmp_ckpt_data == NULL)
> - return NCSCC_RC_FAILURE;
> + if (tmp_ckpt_data == NULL) {
> + goto free_mem;
> + }
>   memset(tmp_ckpt_data, '\0', sizeof(CPSV_CKPT_DATA));
>   
>   switch (data_access_type) {
> @@ -1150,7 +1152,7 @@ uint32_t cpa_proc_build_data_access_evt(
>   ioVector[numberOfElements -
1].dataSize) > maxSectionSize) {
>   if (errflag != NULL)
>   *errflag =
(numberOfElements - 1);
> - return NCSCC_RC_FAILURE;
> + goto free_mem;
>   } else
>   tmp_ckpt_data->dataSize =
ioVector[numberOfElements - 1].dataSize;
>   }
> @@ -1159,7 +1161,7 @@ uint32_t cpa_proc_build_data_access_evt(
>   break;
>   
>   default:
> - return NCSCC_RC_FAILURE;
> + goto free_mem;
>   }
>   
>   if (*ckpt_data == NULL)
> @@ -1171,9 +1173,17 @@ uint32_t cpa_proc_build_data_access_evt(
>   numberOfElements--;
>   }
>   return NCSCC_RC_SUCCESS;
> - } else {
> - return NCSCC_RC_FAILURE;
>   }
> +
> +free_mem:
> + free(tmp_ckpt_data);
> + while(*ckpt_data) {
> + tmp_ckpt_data = *ckpt_data;
> + *ckpt_data = tmp_ckpt_data->next;
> + m_MMGR_FREE_CPSV_CKPT_DATA(tmp_ckpt_data);
> + }
> +
> + return NCSCC_RC_FAILURE;
>   }
>   
>   
> /*
> ***



--
Developer Access Program for Intel Xeon Phi Processors Access to Intel Xeon
Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today. http://sdm.link/xeonphi
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


--
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today. http://sdm.link/xeonphi
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1 of 1] ckpt: fix memory leak in cpd_a2s_ckpt_usr_info [#2257]

2017-01-15 Thread Vo Minh Hoang
Hi Zoran,

ACK.

Sincerely,
Hoang

-Original Message-
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: Wednesday, January 11, 2017 10:59 AM
To: Zoran Milinkovic 
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [devel] [PATCH 1 of 1] ckpt: fix memory leak in
cpd_a2s_ckpt_usr_info [#2257]

Hi Zoran,

ACK Not tested.

-AVM


On 1/10/2017 9:18 PM, Zoran Milinkovic wrote:
>   src/ckpt/ckptd/cpd_red.c |  3 +++
>   1 files changed, 3 insertions(+), 0 deletions(-)
>
>
> Fix memory leak in cpd_a2s_ckpt_usr_info() in cpd_red.c
>
> diff --git a/src/ckpt/ckptd/cpd_red.c b/src/ckpt/ckptd/cpd_red.c
> --- a/src/ckpt/ckptd/cpd_red.c
> +++ b/src/ckpt/ckptd/cpd_red.c
> @@ -339,5 +339,8 @@ void cpd_a2s_ckpt_usr_info(CPD_CB *cb, C
>   TRACE_4("cpd A2S ckpt user info async update failed");
>   else
>   TRACE_1("cpd A2S ckpt user info async update success");
> +
> + free(cpd_msg.info.usr_info_2.node_list);
> +
>   TRACE_LEAVE();
>   }



--
Developer Access Program for Intel Xeon Phi Processors Access to Intel Xeon
Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today. http://sdm.link/xeonphi
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


--
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today. http://sdm.link/xeonphi
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel