Re: [devel] checkpoint problems
Hi Alex, Use the below patch as workaround for you to proceed your testing . This patch just increases the MDS internal fragmentation value to ~ TIPC_MAX_USER_MSG_SIZE define in tipc.h I will work with Hans to have final patch by considering the both TIPC TCP transports, and testing involved as a part of ticket `#654 MDS improvements` (https://sourceforge.net/p/opensaf/tickets/654/ ). I tested this patch with 10K sections checkpoint memory used was : 10136000 on TIPC transport. == diff --git a/osaf/libs/core/mds/include/mds_dt.h b/osaf/libs/core/mds/include/mds_dt.h --- a/osaf/libs/core/mds/include/mds_dt.h +++ b/osaf/libs/core/mds/include/mds_dt.h @@ -32,6 +32,7 @@ #include ncs_main_papi.h #include ncssysf_mem.h #include ncspatricia.h +#include linux/tipc.h /* This file is private to the MDTM layer. */ @@ -109,7 +110,7 @@ typedef struct mdtm_reassembly_queue { #define MDTM_MAX_DIRECT_BUFF_SIZE MDTM_MAX_SEGMENT_SIZE -#define MDTM_NORMAL_MSG_FRAG_SIZE 1400 +#define MDTM_NORMAL_MSG_FRAG_SIZE (TIPC_MAX_USER_MSG_SIZE-1000) /* TIPC_MAX_USER_MSG_SIZE = 66000 define linux/tipc.h */ #define MDTM_RECV_BUFFER_SIZE ((MDS_DIRECT_BUF_MAXSIZEMDTM_NORMAL_MSG_FRAG_SIZE)? \ (MDS_DIRECT_BUF_MAXSIZE+SUM_MDS_HDR_PLUS_MDTM_HDR_PLUS_LEN):(MDTM_NORMAL_MSG_FRAG_SIZE+SUM_MDS_HDR_PLUS_MDTM_HDR_PLUS_LEN)) == -AVM On 1/8/2014 10:42 PM, Alex Jones wrote: Hi Hans, Changing rmem_default and rmem_max has no effect on the problem. I even tried up to 2M to no avail. However, after looking at the cpnd_transfer_replica function in cpnd_evt.c, I found the following in cpsv_evt.h which controls how large the packets are which are sent through MDS: #define MAX_SYNC_TRANSFER_SIZE (30 * 1024 * 1024) 30M? What is the rationale for this number? This seems way too high. When I change it to (4*1024*1024) (4M) it solves my problem, and doesn't appear to affect performance. Alex On 01/08/2014 08:30 AM, Hans Feldt wrote: sysctl -a | grep rmem set rmem_default to 256K or so /Hans -Original Message- From: Hans Feldt [mailto:hans.fe...@ericsson.com] Sent: den 8 januari 2014 14:01 To: A V Mahesh; Alex Jones Cc: opensaf-devel@lists.sourceforge.net Subject: Re: [devel] checkpoint problems The socket receive buffer size used is the system default. It can be too small, pump it up. I plan todo some change in MDS for this (and other stuff). /Hans -Original Message- From: A V Mahesh [mailto:mahesh.va...@oracle.com] Sent: den 8 januari 2014 11:29 To: Alex Jones Cc: opensaf-devel@lists.sourceforge.net Subject: Re: [devel] checkpoint problems Hi Alex, I suggest you increase and try the following TIPC values ( tipc code ) and rebuild `tipc.ko`: net/tipc/tipc_socket.c:#define OVERLOAD_LIMIT_BASE 5000 You can increase it to 5 and try again. - AVM. On 1/8/2014 4:16 AM, Alex Jones wrote: After doing some deep debugging I am seeing the following in the MDS log on node B. This is when the CPND_EVT_ND2ND_CKPT_ACTIVE_SYNC is sent from the active replica on node A to the replica on node B. The sync message never gets up to the CPND layer on node B because it is dropped. This is with 10k sections, each section 1k. Jan 7 21:32:32.772347 1789648919 ERR|MDTM: Frag recd is not next frag so dropping adest=0x010010023922604c Jan 7 21:32:32.772399 1789648919 ERR|MDTM: Message is dropped as msg is out of seq TRANSPOR-ID=0x010010023922604c I've turned on MDS debug on node B, and the packet being sent over is gigantic. It starts failing at fragment number 2703. The next fragment that comes in is 2707, then 2722. The last fragment that comes in is 7444. I've done a cursory look at the hardware stats, and nothing is being rate-limited or dropped. I'm going to take a deeper look at this, but I'm mentioning it in case it rings any bells. I am using TIPC as the transport. Alex On 01/07/2014 07:24 AM, Alex Jones wrote: AVM, I get SA_AIS_ERR_TIMEOUT even when I pass SA_TIME_END as the timeout value. Is this not a bug? the synchronous CheckpointOpen call doesn't work at all in this scenario. It never succeeds. I can reproduce the problem with sectionCreationAttributes.expirationTime set to SA_TIME_ONE_DAY. You should be able to reproduce the problem with the code I sent in the last e-mail. Alex On 01/06/2014 10:31 PM, A V Mahesh wrote: Hi Alex, CheckpointOpen call failing with SA_AIS_ERR_TIMEOUT NOT a bug , it is expected if you pass less time out value `timeout = 10` to saCkptCheckpointOpen(,timeout ...) call ,when ckpt has very large data/section. just increasing timeout will avoids the SA_AIS_ERR_TIMEOUT. Let us focus on your original issue/scenario, are you able to reproduce the
Re: [devel] [PATCH 1 of 1] base: memory leaks in base [#712]
Also please push this fix in 4.3.x too. Thanks, Ramesh. On 1/9/2014 1:49 PM, Ramesh Betham wrote: Ack. Thanks, Ramesh. On 1/8/2014 7:56 PM, reddy.neelaka...@oracle.com wrote: osaf/libs/core/leap/sysf_mem.c| 1 + osaf/libs/core/mbcsv/mbcsv_util.c | 12 ++-- 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/osaf/libs/core/leap/sysf_mem.c b/osaf/libs/core/leap/sysf_mem.c --- a/osaf/libs/core/leap/sysf_mem.c +++ b/osaf/libs/core/leap/sysf_mem.c @@ -426,6 +426,7 @@ USRBUF *sysf_alloc_pkt(unsigned char poo m_PMGR_LK(gl_ub_pool_mgr.lock); if (pool_id = UB_MAX_POOLS) { +m_NCS_MEM_FREE(ub, NCS_MEM_REGION_IO_DATA_HDR, NCS_SERVICE_ID_OS_SVCS, 2); m_PMGR_UNLK(gl_ub_pool_mgr.lock); m_LEAP_DBG_SINK(0); return NULL; diff --git a/osaf/libs/core/mbcsv/mbcsv_util.c b/osaf/libs/core/mbcsv/mbcsv_util.c --- a/osaf/libs/core/mbcsv/mbcsv_util.c +++ b/osaf/libs/core/mbcsv/mbcsv_util.c @@ -340,6 +340,7 @@ uint32_t mbcsv_send_ckpt_data_to_all_pee NCS_UBAID *uba = NULL; USRBUF *dup_ub = NULL; MBCSV_EVT evt_msg; +uint32_t rc = NCSCC_RC_SUCCESS; TRACE_ENTER(); if (NULL == ckpt_inst-peer_list) { @@ -460,7 +461,7 @@ uint32_t mbcsv_send_ckpt_data_to_all_pee switch (msg_to_send-i_send_type) { case NCS_MBCSV_SND_SYNC: { - m_NCS_MBCSV_MDS_SYNC_SEND(evt_msg, +rc = m_NCS_MBCSV_MDS_SYNC_SEND(evt_msg, tmp_ptr-my_ckpt_inst, tmp_ptr-peer_anchor); } break; @@ -468,15 +469,22 @@ uint32_t mbcsv_send_ckpt_data_to_all_pee case NCS_MBCSV_SND_USR_ASYNC: case NCS_MBCSV_SND_MBC_ASYNC: { - m_NCS_MBCSV_MDS_ASYNC_SEND(evt_msg, +rc = m_NCS_MBCSV_MDS_ASYNC_SEND(evt_msg, tmp_ptr-my_ckpt_inst, tmp_ptr-peer_anchor); } break; default: +m_MMGR_FREE_BUFR_LIST(dup_ub); TRACE_LEAVE2(unsupported send type); return NCSCC_RC_FAILURE; } tmp_ptr-ckpt_msg_sent = true; + /* In failure scenario, there is a chance that MDS may not free dup_ub */ +if (rc != NCSCC_RC_SUCCESS) +{ +TRACE_LEAVE2(MBCSv SEND Failed); +m_MMGR_FREE_BUFR_LIST(dup_ub); +} } tmp_ptr = tmp_ptr-next; } -- CenturyLink Cloud: The Leader in Enterprise Cloud Services. Learn Why More Businesses Are Choosing CenturyLink Cloud For Critical Workloads, Development Environments Everything In Between. Get a Quote or Start a Free Trial Today. http://pubads.g.doubleclick.net/gampad/clk?id=119420431iu=/4140/ostg.clktrk ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel -- CenturyLink Cloud: The Leader in Enterprise Cloud Services. Learn Why More Businesses Are Choosing CenturyLink Cloud For Critical Workloads, Development Environments Everything In Between. Get a Quote or Start a Free Trial Today. http://pubads.g.doubleclick.net/gampad/clk?id=119420431iu=/4140/ostg.clktrk ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
Re: [devel] [PATCH 1 of 1] amfd : clear node level admin op related flags during sufailover [#663]
I have a similar problem and solution in my pending patch for http://sourceforge.net/p/opensaf/tickets/601/ I don't think we should return TIMEOUT, this can cause upgrades to fail. I mean the purpose of the LOCK operation (to un-assign the service) has been fulfilled albeit with some errors. So AMF should return OK to the IMM admin operation. Another comment, if you move the node_complete_admin_op up you can remove the forward declaration and make it static (file local). Will you send another patch with the above comments? Thanks, Hans On 01/09/2014 07:39 AM, praveen.malv...@oracle.com wrote: osaf/services/saf/amf/amfd/include/proc.h | 1 + osaf/services/saf/amf/amfd/sgproc.cc | 31 +++ 2 files changed, 32 insertions(+), 0 deletions(-) When admin operation like lock and shutdown are performed on Node and this leads to sufailover, AMFD is not clearing the operation related flag in node pointer. This patch ensures if sufailover gets escalated when admin operation is performed on the node, then AMF will clear operation related flag. Also AMF will respond to IMM for the completion of operation. diff --git a/osaf/services/saf/amf/amfd/include/proc.h b/osaf/services/saf/amf/amfd/include/proc.h --- a/osaf/services/saf/amf/amfd/include/proc.h +++ b/osaf/services/saf/amf/amfd/include/proc.h @@ -161,6 +161,7 @@ extern AVD_SU *get_other_su_from_oper_li extern void su_complete_admin_op(AVD_SU *su, SaAisErrorT result); extern void comp_complete_admin_op(AVD_COMP *comp, SaAisErrorT result); extern void su_disable_comps(AVD_SU *su, SaAisErrorT result); +extern void node_complete_admin_op(AVD_AVND *node, SaAisErrorT result); extern bool cluster_su_instantiation_done(AVD_CL_CB *cb, AVD_SU *su); extern void cluster_startup_expiry_event_generate(AVD_CL_CB *cb); diff --git a/osaf/services/saf/amf/amfd/sgproc.cc b/osaf/services/saf/amf/amfd/sgproc.cc --- a/osaf/services/saf/amf/amfd/sgproc.cc +++ b/osaf/services/saf/amf/amfd/sgproc.cc @@ -277,6 +277,20 @@ static uint32_t sg_su_failover_func(AVD_ avd_su_readiness_state_set(su, SA_AMF_READINESS_OUT_OF_SERVICE); su_complete_admin_op(su, SA_AIS_ERR_TIMEOUT); su_disable_comps(su, SA_AIS_ERR_TIMEOUT); + if (su-su_on_node-admin_node_pend_cbk.invocation != 0) { + + /* Node level operation is going on the node hosting the SU for which +sufailover got escalated. Sufailover event will always come after the +initiation of node level operation on the list of SUs. So if this SU has +list of SUSIs, AMF would have sent assignment as a part of node level operation. +Now SUSIs in this SU are going to be deleted so we can decrement the counter in node. + */ + su-su_on_node-su_cnt_admin_oper--; + + /* If node level operation is finished on all the SUs, reply to imm.*/ + if (su-su_on_node-su_cnt_admin_oper == 0) + node_complete_admin_op(su-su_on_node, SA_AIS_ERR_TIMEOUT); + } /*If the AvD is in AVD_APP_STATE then reassign all the SUSI assignments for this SU */ if (avd_cb-init_state == AVD_APP_STATE) { @@ -2116,3 +2130,20 @@ void su_disable_comps(AVD_SU *su, SaAisE m_AVSV_SEND_CKPT_UPDT_ASYNC_UPDT(avd_cb, comp, AVSV_CKPT_AVD_COMP_CONFIG); } } + +/** + * @briefThis function completes admin operation on node. + * It responds IMM with the result of admin operation on node. + * @paramptr to node + * @paramresult + * + */ +void node_complete_admin_op(AVD_AVND *node, SaAisErrorT result) +{ + if (node-admin_node_pend_cbk.invocation != 0) { + avd_saImmOiAdminOperationResult(avd_cb-immOiHandle, + node-admin_node_pend_cbk.invocation, result); + node-admin_node_pend_cbk.invocation = 0; + node-admin_node_pend_cbk.admin_oper = static_castSaAmfAdminOperationIdT(0); + } +} -- CenturyLink Cloud: The Leader in Enterprise Cloud Services. Learn Why More Businesses Are Choosing CenturyLink Cloud For Critical Workloads, Development Environments Everything In Between. Get a Quote or Start a Free Trial Today. http://pubads.g.doubleclick.net/gampad/clk?id=119420431iu=/4140/ostg.clktrk ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
Re: [devel] [PATCH 1 of 1] amf: Donot allow more than 1 csi addition in single ccb [#681]
Hi, As I previously proposed (and even provided patch for) to change the SG FSM state to non stable would solve the problem in a more elegant and standard way. Why can't we just do that instead? And what about the use case, a fact is that no documentation states that only a single CSI per CCB can be used. What are the complications of supporting multiple CSIs per CCB? Thanks, Hans On 12/19/2013 04:45 PM, nagendr...@oracle.com wrote: osaf/libs/common/immsv/include/immutil.h | 1 + osaf/services/saf/amf/amfd/csi.cc| 12 +++- osaf/tools/safimm/src/immutil.c | 15 +++ 3 files changed, 27 insertions(+), 1 deletions(-) diff --git a/osaf/libs/common/immsv/include/immutil.h b/osaf/libs/common/immsv/include/immutil.h --- a/osaf/libs/common/immsv/include/immutil.h +++ b/osaf/libs/common/immsv/include/immutil.h @@ -172,6 +172,7 @@ CcbUtilOperationData_t *ccbutil_getNextC * @return CcbUtilOperationData_t* */ CcbUtilOperationData_t *ccbutil_getCcbOpDataByDN(SaImmOiCcbIdT id, const SaNameT *dn); +CcbUtilOperationData_t *ccbutil_getCcbOpDataByClassName(SaImmOiCcbIdT ccbId, CcbUtilOperationData_t *opData); /*@}*/ /** diff --git a/osaf/services/saf/amf/amfd/csi.cc b/osaf/services/saf/amf/amfd/csi.cc --- a/osaf/services/saf/amf/amfd/csi.cc +++ b/osaf/services/saf/amf/amfd/csi.cc @@ -424,7 +424,17 @@ static SaAisErrorT csi_ccb_completed_cre avsv_sanamet_init(opdata-objectName, si_name, safSi); avd_si = avd_si_get(si_name); - + /* Donot allow two csi in single ccb if si is already assigned. */ + if ((opdata != NULL) (avd_si != NULL)) { + if ((ccbutil_getCcbOpDataByClassName(opdata-ccbId, opdata) != NULL) + (NULL != avd_si-list_of_sisu)) { + report_ccb_validation_error(opdata, '%s' More than one csi in CCB for assigned si, + not allowed, opdata-objectName.value); + rc = SA_AIS_ERR_BAD_OPERATION; + LOG_WA(More than one csi in CCB for assigned si); + goto done; + } + } if (NULL != avd_si) { /* Check whether si has been assigned to any SU. */ if (NULL != avd_si-list_of_sisu) { diff --git a/osaf/tools/safimm/src/immutil.c b/osaf/tools/safimm/src/immutil.c --- a/osaf/tools/safimm/src/immutil.c +++ b/osaf/tools/safimm/src/immutil.c @@ -222,6 +222,21 @@ CcbUtilOperationData_t *ccbutil_getCcbOp return opData; } +CcbUtilOperationData_t *ccbutil_getCcbOpDataByClassName(SaImmOiCcbIdT ccbId, CcbUtilOperationData_t *opData) +{ +CcbUtilOperationData_t *opData_p = ccbutil_getNextCcbOp(ccbId, NULL); + +while (opData_p != NULL) { + if ((strcmp(opData-param.create.className, opData_p-param.create.className) == 0) + (opData_p != opData)) /* Should return another object with the same name. */ + break; + + opData_p = ccbutil_getNextCcbOp(ccbId, opData_p); + } + +return opData_p; +} + /* -- * General IMM help utilities; */ -- CenturyLink Cloud: The Leader in Enterprise Cloud Services. Learn Why More Businesses Are Choosing CenturyLink Cloud For Critical Workloads, Development Environments Everything In Between. Get a Quote or Start a Free Trial Today. http://pubads.g.doubleclick.net/gampad/clk?id=119420431iu=/4140/ostg.clktrk ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
Re: [devel] [PATCH 00 of 15] Review Request for AMF: Coverity issues in amfd/amfnd [#680]
Is this a patch series for 4.5 or default? The ticket is an enhancement, is the 4.4 window closed? Thanks, Hans On 01/03/2014 06:51 AM, Gary Lee wrote: Summary: AMF: Coverity issues in amfd/amfnd [#680] Review request for Trac Ticket(s): 680 Peer Reviewer(s): Hans N, Hans F, Praveen, Nagendra Pull request to: Affected branch(es): default Development branch: default Impacted area Impact y/n Docsn Build systemn RPM/packaging n Configuration files n Startup scripts n SAF servicesy OpenSAF servicesn Core libraries n Samples n Tests n Other n Comments (indicate scope for each y above): - This patch set fixes a number of 'issues' in amfd and amfnd identified by Coverity. A number of 'issues' still remain and will be fixed in a later patch/ticket. Basic testing and valgrind have been run. changeset 0b48fcffb4ff4a739e56d33e5780e535cd232d3a Author: Gary Lee gary@dektech.com.au Date: Fri, 03 Jan 2014 16:26:02 +1100 amfd: Correct a number of issues identified by Coverity [#680] * Unnecessary header file (HFA) * Logically dead code (DEADCODE) * Dereference null return value (NULL_RETURNS) * Uninitialized scalar variable (UNINIT) * Unused pointer value (UNUSED_VALUE) * Also remove check for NULL return value from avd_app_new() as this is no longer required changeset f4290079e205f56acd5597253927b384b64765da Author: Gary Lee gary@dektech.com.au Date: Fri, 03 Jan 2014 16:26:02 +1100 amfd: Correct a number of issues identified by Coverity [#680] * Unnecessary header file (HFA) * Logically dead code (DEADCODE) * Dereference null return value (NULL_RETURNS) * Uninitialized scalar variable (UNINIT) * Unused pointer value (UNUSED_VALUE) changeset b19a672ceb57f418503aeced80da5a452943d16a Author: Gary Lee gary@dektech.com.au Date: Fri, 03 Jan 2014 16:26:02 +1100 amfd: Correct a number of issues identified by Coverity [#680] * Dereference null return value (NULL_RETURNS) * Uninitialized scalar variable (UNINIT) * Big parameter passed by value (PASS_BY_VALUE) * Remove saAmfStgValidSuTypes from AVD_AMF_SG_TYPE, as it appears to be a typo of saAmfSGtValidSuTypes changeset 80ff873eb3f034d2ca4ff30953bfb6bb925ac723 Author: Gary Lee gary@dektech.com.au Date: Fri, 03 Jan 2014 16:26:02 +1100 amfd: Correct a number of issues identified by Coverity [#680] * Unchecked return value (CHECKED_RETURN) * Dereference null return value (NULL_RETURNS) * Uninitialized scalar variable (UNINIT) * Unnecessary header file (HFA) * Big parameter passed by value (PASS_BY_VALUE) changeset 91e108b0576533083ea81dbf4ffebfe5f86e33f0 Author: Gary Lee gary@dektech.com.au Date: Fri, 03 Jan 2014 16:26:02 +1100 amfd: Correct a number of issues identified by Coverity [#680] * Unchecked return value (CHECKED_RETURN) * Dereference null return value (NULL_RETURNS) * Logically dead code (DEADCODE) * Uninitialized scalar variable (UNINIT) * Unnecessary header file (HFA) * Big parameter passed by value (PASS_BY_VALUE) * Unused pointer value (UNUSED_VALUE) * Also, replace saAmfStgValidSuTypes with saAmfSGtValidSuTypes in sgtype.cc changeset df1a77bc86b2cdf885cf444280bbef6c9ad90174 Author: Gary Lee gary@dektech.com.au Date: Fri, 03 Jan 2014 16:26:02 +1100 amfd: Correct a number of issues identified by Coverity [#680] * Unnecessary header file (HFA) * Dereference after null check (FORWARD_NULL) * Dereference null return value (NULL_RETURNS) * Logically dead code (DEADCODE) * Big parameter passed by value (PASS_BY_VALUE) * Unchecked return value (CHECKED_RETURN) * Side effect in assertion (ASSERT_SIDE_EFFECT) * Unused pointer value (UNUSED_VALUE) changeset ea43d2c57d577714d67c849cd2cca74d3a5b9b06 Author: Gary Lee gary@dektech.com.au Date: Fri, 03 Jan 2014 16:26:02 +1100 amfd: Correct a number of issues identified by Coverity [#680] * Dereference null return value (NULL_RETURNS) * Uninitialized scalar variable (UNINIT) * Unused pointer value (UNUSED_VALUE) * Unnecessary header file (HFA) changeset ae6184583eff874828c0433f82755881f395c4c4 Author: Gary Lee gary@dektech.com.au Date: Fri, 03 Jan 2014 16:26:02 +1100 amfd: Correct a number of issues identified by Coverity [#680] * Calling risky function (SECURE_CODING) replace calls to sprintf with snprintf changeset
Re: [devel] [PATCH 0 of 4] Review Request for clm: Add CLMNA as a component and support node eviction after opensafd stop [#220]
Then I am confused, CLMNA is now a daemon for what purpose? /Hans -Original Message- From: Mathivanan Naickan Palanivelu [mailto:mathi.naic...@oracle.com] Sent: den 9 januari 2014 13:18 To: Hans Feldt; Ramesh Babu Betham Cc: opensaf-devel@lists.sourceforge.net Subject: RE: [devel] [PATCH 0 of 4] Review Request for clm: Add CLMNA as a component and support node eviction after opensafd stop [#220] -Original Message- From: Hans Feldt [mailto:hans.fe...@ericsson.com] Sent: Thursday, January 09, 2014 5:38 PM To: Mathivanan Naickan Palanivelu; Ramesh Babu Betham Cc: opensaf-devel@lists.sourceforge.net Subject: Re: [devel] [PATCH 0 of 4] Review Request for clm: Add CLMNA as a component and support node eviction after opensafd stop [#220] One question, should the new CLMNA component be terminated by amfnd or ignore the cleanup and get killed by the init script a bit like dtmd? [Mathi] CLMNA being a component must be killed by AMFND during 'opensafd stop' The CLM server shall mark a node as non-member upon receiving AMFND down Event on a node on which 'opensafd stop' is being done. If killed by amfnd can it introduce a failover race problem of some kind? [Mathi] No, FM shall act on AMFND down to perform failover and AMFND being the last guy to exit. Mathi. Thanks, Hans On 12/04/2013 02:00 PM, mathi.naic...@oracle.com wrote: Summary: Add CLMNA as a component, evict a node after 'opensafd stop' [#220] Review request for Trac Ticket(s): #220 Peer Reviewer(s): Ramesh Pull request to: LIST THE PERSON WITH PUSH ACCESS HERE Affected branch(es): default Development branch: IF ANY GIVE THE REPO URL Impacted area Impact y/n Docsn Build systemn RPM/packaging n Configuration files n Startup scripts n SAF servicesy OpenSAF servicesy Core libraries n Samples n Tests n Other n Comments (indicate scope for each y above): - This patch adds CLMNA as a component to the NoRedundancy middleware SU with recovery as node failfast! CLM Server shall treat CLMNA down as an additional evidence to indicate that the can now be marked as 'left the cluster'. On controllers, FM shall subscribe to AMFND down. During an OpenSAF stop, when AMFND exits (at last), then FM receives this DOWN and shall perform controller failover. For users who have set OPENSAF_MANAGE_TIPC=no, when they do a /etc/init.d/opensafd stop, then the node will leave the cluster even if TIPC is running. TODO: - The patch works, however some testing is inprogress (i.e. dynamic addition to informaiton model is pending). - condition to restrict this behaviour only when PLM is disabled to be added. - provide a cli tool like 'clm-list' that lists the status of nodes in the cluster. changeset 00db20213aed87af57cb62940e38334d262d96fd Author: Mathivanan N.P.mathi.naic...@oracle.com Date: Wed, 04 Dec 2013 18:19:37 +0530 clm: add to the information model and update config, scripts [#220] changeset 8f93e72faea5f0dbeae073fb5868245db38b23f4 Author: Mathivanan N.P.mathi.naic...@oracle.com Date: Wed, 04 Dec 2013 18:20:15 +0530 clm: process clmna down at the clm server[#220] changeset 84e4801c4bf5aa036c94a0c6844e9476fd0b32ba Author: Mathivanan N.P.mathi.naic...@oracle.com Date: Wed, 04 Dec 2013 18:20:40 +0530 clm: register with amf [#220] changeset 858af6ff6a26d9f6fef5df54375febb2b82d8e13 Author: Mathivanan N.P.mathi.naic...@oracle.com Date: Wed, 04 Dec 2013 18:25:12 +0530 fm: trigger failover upon receiving AMFND down [#220] Added Files: osaf/services/saf/clmsv/config/clmna.conf Complete diffstat: -- opensaf.spec.in | 1 + osaf/services/infrastructure/fm/fms/fm_mds.c | 33 ++--- osaf/services/saf/clmsv/clms/clms_evt.c | 3 +++ osaf/services/saf/clmsv/clms/clms_evt.h | 1 + osaf/services/saf/clmsv/clms/clms_mds.c | 18 ++-- -- osaf/services/saf/clmsv/config/Makefile.am | 3 ++- osaf/services/saf/clmsv/config/clmna.conf| 14 ++ osaf/services/saf/clmsv/config/clmsv_objects.xml | 85 ++ +++ osaf/services/saf/clmsv/config/clmsv_plm_sc_template.xml | 17 + osaf/services/saf/clmsv/config/clmsv_sc_template.xml
[devel] [PATCH 0 of 1] Review Request for amfd : clear node level admin op related flags during sufailover [#663]
Summary: amfd : clear node level admin op related flags during sufailover [#663]. Review request for Trac Ticket(s): AMF #663 Peer Reviewer(s): LIST THE TECH REVIEWER(S) / MAINTAINER(S) HERE Pull request to: Hans F., Nagendra Affected branch(es):Default and 4.4 Development branch: IF ANY GIVE THE REPO URL Impacted area Impact y/n Docsn Build systemn RPM/packaging n Configuration files n Startup scripts n SAF servicesy OpenSAF servicesn Core libraries n Samples n Tests n Other n Comments (indicate scope for each y above): - Refloating the patch after incorporating the comments. changeset a430348290f3d820a9d3b1fdcad14fd1440b3c1d Author: praveen.malv...@oracle.com Date: Thu, 09 Jan 2014 17:42:22 +0530 amfd : clear node level admin op related flags during sufailover [#663] When admin operation like lock and shutdown are performed on Node and this leads to sufailover, AMFD is not clearing the operation related flag in node pointer. This patch ensures if sufailover gets escalated when admin operation is performed on the node, then AMF will clear operation related flag. Also AMF will respond to IMM for the completion of operation. Complete diffstat: -- osaf/services/saf/amf/amfd/sgproc.cc | 31 +++ 1 files changed, 31 insertions(+), 0 deletions(-) Testing Commands: - Already mentioned in previous version of the patch Testing, Expected Results: -- Already mentioned in previous version of the patch Conditions of Submission: - Ack from Hans F. or Nagendra. Arch Built StartedLinux distro --- mipsn n mips64 n n x86 y y x86_64 n n powerpc n n powerpc64 n n Reviewer Checklist: --- [Submitters: make sure that your review doesn't trigger any checkmarks!] Your checkin has not passed review because (see checked entries): ___ Your RR template is generally incomplete; it has too many blank entries that need proper data filled in. ___ You have failed to nominate the proper persons for review and push. ___ Your patches do not have proper short+long header ___ You have grammar/spelling in your header that is unacceptable. ___ You have exceeded a sensible line length in your headers/comments/text. ___ You have failed to put in a proper Trac Ticket # into your commits. ___ You have incorrectly put/left internal data in your comments/files (i.e. internal bug tracking tool IDs, product names etc) ___ You have not given any evidence of testing beyond basic build tests. Demonstrate some level of runtime or other sanity testing. ___ You have ^M present in some of your files. These have to be removed. ___ You have needlessly changed whitespace or added whitespace crimes like trailing spaces, or spaces before tabs. ___ You have mixed real technical changes with whitespace and other cosmetic code cleanup changes. These have to be separate commits. ___ You need to refactor your submission into logical chunks; there is too much content into a single commit. ___ You have extraneous garbage in your review (merge commits etc) ___ You have giant attachments which should never have been sent; Instead you should place your content in a public tree to be pulled. ___ You have too many commits attached to an e-mail; resend as threaded commits, or place in a public tree for a pull. ___ You have resent this content multiple times without a clear indication of what has changed between each re-send. ___ You have failed to adequately and individually address all of the comments and change requests that were proposed in the initial review. ___ You have a misconfigured ~/.hgrc file (i.e. username, email etc) ___ Your computer have a badly configured date and time; confusing the the threaded patch review. ___ Your changes affect IPC mechanism, and you don't present any results for in-service upgradability test. ___ Your changes affect user manual and documentation, your patch series do not contain the patch that updates the Doxygen manual. -- CenturyLink Cloud: The Leader in Enterprise Cloud Services. Learn Why More Businesses Are Choosing CenturyLink Cloud For Critical Workloads, Development Environments Everything In Between. Get a Quote or Start a Free Trial Today. http://pubads.g.doubleclick.net/gampad/clk?id=119420431iu=/4140/ostg.clktrk ___ Opensaf-devel mailing
Re: [devel] [PATCH 0 of 4] Review Request for clm: Add CLMNA as a component and support node eviction after opensafd stop [#220]
-Original Message- From: Hans Feldt [mailto:hans.fe...@ericsson.com] Sent: Thursday, January 09, 2014 5:45 PM To: Mathivanan Naickan Palanivelu; Ramesh Babu Betham Cc: opensaf-devel@lists.sourceforge.net Subject: RE: [devel] [PATCH 0 of 4] Review Request for clm: Add CLMNA as a component and support node eviction after opensafd stop [#220] Then I am confused, CLMNA is now a daemon for what purpose? It is a 'middleware' component! And also for future enhancements. (And previously also was getting daemonized). -Mathi. /Hans -Original Message- From: Mathivanan Naickan Palanivelu [mailto:mathi.naic...@oracle.com] Sent: den 9 januari 2014 13:18 To: Hans Feldt; Ramesh Babu Betham Cc: opensaf-devel@lists.sourceforge.net Subject: RE: [devel] [PATCH 0 of 4] Review Request for clm: Add CLMNA as a component and support node eviction after opensafd stop [#220] -Original Message- From: Hans Feldt [mailto:hans.fe...@ericsson.com] Sent: Thursday, January 09, 2014 5:38 PM To: Mathivanan Naickan Palanivelu; Ramesh Babu Betham Cc: opensaf-devel@lists.sourceforge.net Subject: Re: [devel] [PATCH 0 of 4] Review Request for clm: Add CLMNA as a component and support node eviction after opensafd stop [#220] One question, should the new CLMNA component be terminated by amfnd or ignore the cleanup and get killed by the init script a bit like dtmd? [Mathi] CLMNA being a component must be killed by AMFND during 'opensafd stop' The CLM server shall mark a node as non-member upon receiving AMFND down Event on a node on which 'opensafd stop' is being done. If killed by amfnd can it introduce a failover race problem of some kind? [Mathi] No, FM shall act on AMFND down to perform failover and AMFND being the last guy to exit. Mathi. Thanks, Hans On 12/04/2013 02:00 PM, mathi.naic...@oracle.com wrote: Summary: Add CLMNA as a component, evict a node after 'opensafd stop' [#220] Review request for Trac Ticket(s): #220 Peer Reviewer(s): Ramesh Pull request to: LIST THE PERSON WITH PUSH ACCESS HERE Affected branch(es): default Development branch: IF ANY GIVE THE REPO URL Impacted area Impact y/n Docsn Build systemn RPM/packaging n Configuration files n Startup scripts n SAF servicesy OpenSAF servicesy Core libraries n Samples n Tests n Other n Comments (indicate scope for each y above): - This patch adds CLMNA as a component to the NoRedundancy middleware SU with recovery as node failfast! CLM Server shall treat CLMNA down as an additional evidence to indicate that the can now be marked as 'left the cluster'. On controllers, FM shall subscribe to AMFND down. During an OpenSAF stop, when AMFND exits (at last), then FM receives this DOWN and shall perform controller failover. For users who have set OPENSAF_MANAGE_TIPC=no, when they do a /etc/init.d/opensafd stop, then the node will leave the cluster even if TIPC is running. TODO: - The patch works, however some testing is inprogress (i.e. dynamic addition to informaiton model is pending). - condition to restrict this behaviour only when PLM is disabled to be added. - provide a cli tool like 'clm-list' that lists the status of nodes in the cluster. changeset 00db20213aed87af57cb62940e38334d262d96fd Author: Mathivanan N.P.mathi.naic...@oracle.com Date: Wed, 04 Dec 2013 18:19:37 +0530 clm: add to the information model and update config, scripts [#220] changeset 8f93e72faea5f0dbeae073fb5868245db38b23f4 Author: Mathivanan N.P.mathi.naic...@oracle.com Date: Wed, 04 Dec 2013 18:20:15 +0530 clm: process clmna down at the clm server[#220] changeset 84e4801c4bf5aa036c94a0c6844e9476fd0b32ba Author: Mathivanan N.P.mathi.naic...@oracle.com Date: Wed, 04 Dec 2013 18:20:40 +0530 clm: register with amf [#220] changeset 858af6ff6a26d9f6fef5df54375febb2b82d8e13 Author: Mathivanan N.P.mathi.naic...@oracle.com Date: Wed, 04 Dec 2013 18:25:12 +0530 fm: trigger failover upon receiving AMFND down [#220] Added Files: osaf/services/saf/clmsv/config/clmna.conf Complete diffstat: -- opensaf.spec.in | 1 + osaf/services/infrastructure/fm/fms/fm_mds.c | 33 ++--- osaf/services/saf/clmsv/clms/clms_evt.c | 3 +++
Re: [devel] [PATCH 00 of 15] Review Request for AMF: Coverity issues in amfd/amfnd [#680]
Hi There are some memory leaks fixed in the patch series that would be applicable to 4.4. I guess I should extract just the leak fixes for the next 4.4 candidate / correction release? Thanks Gary Quoting Anders Björnerstedt anders.bjornerst...@ericsson.com: Yes 4.4 window is closed since we have tagg'ed 4.4.FC. /AndersBj -Original Message- From: Hans Feldt [mailto:hans.fe...@ericsson.com] Sent: den 9 januari 2014 13:06 To: Gary Lee; Hans Nordebäck; praveen.malv...@oracle.com; nagendr...@oracle.com Cc: opensaf-devel@lists.sourceforge.net Subject: Re: [devel] [PATCH 00 of 15] Review Request for AMF: Coverity issues in amfd/amfnd [#680] Is this a patch series for 4.5 or default? The ticket is an enhancement, is the 4.4 window closed? Thanks, Hans On 01/03/2014 06:51 AM, Gary Lee wrote: Summary: AMF: Coverity issues in amfd/amfnd [#680] Review request for Trac Ticket(s): 680 Peer Reviewer(s): Hans N, Hans F, Praveen, Nagendra Pull request to: Affected branch(es): default Development branch: default Impacted area Impact y/n Docsn Build systemn RPM/packaging n Configuration files n Startup scripts n SAF servicesy OpenSAF servicesn Core libraries n Samples n Tests n Other n Comments (indicate scope for each y above): - This patch set fixes a number of 'issues' in amfd and amfnd identified by Coverity. A number of 'issues' still remain and will be fixed in a later patch/ticket. Basic testing and valgrind have been run. changeset 0b48fcffb4ff4a739e56d33e5780e535cd232d3a Author: Gary Lee gary@dektech.com.au Date:Fri, 03 Jan 2014 16:26:02 +1100 amfd: Correct a number of issues identified by Coverity [#680] * Unnecessary header file (HFA) * Logically dead code (DEADCODE) * Dereference null return value (NULL_RETURNS) * Uninitialized scalar variable (UNINIT) * Unused pointer value (UNUSED_VALUE) * Also remove check for NULL return value from avd_app_new() as this is no longer required changeset f4290079e205f56acd5597253927b384b64765da Author: Gary Lee gary@dektech.com.au Date:Fri, 03 Jan 2014 16:26:02 +1100 amfd: Correct a number of issues identified by Coverity [#680] * Unnecessary header file (HFA) * Logically dead code (DEADCODE) * Dereference null return value (NULL_RETURNS) * Uninitialized scalar variable (UNINIT) * Unused pointer value (UNUSED_VALUE) changeset b19a672ceb57f418503aeced80da5a452943d16a Author: Gary Lee gary@dektech.com.au Date:Fri, 03 Jan 2014 16:26:02 +1100 amfd: Correct a number of issues identified by Coverity [#680] * Dereference null return value (NULL_RETURNS) * Uninitialized scalar variable (UNINIT) * Big parameter passed by value (PASS_BY_VALUE) * Remove saAmfStgValidSuTypes from AVD_AMF_SG_TYPE, as it appears to be a typo of saAmfSGtValidSuTypes changeset 80ff873eb3f034d2ca4ff30953bfb6bb925ac723 Author: Gary Lee gary@dektech.com.au Date:Fri, 03 Jan 2014 16:26:02 +1100 amfd: Correct a number of issues identified by Coverity [#680] * Unchecked return value (CHECKED_RETURN) * Dereference null return value (NULL_RETURNS) * Uninitialized scalar variable (UNINIT) * Unnecessary header file (HFA) * Big parameter passed by value (PASS_BY_VALUE) changeset 91e108b0576533083ea81dbf4ffebfe5f86e33f0 Author: Gary Lee gary@dektech.com.au Date:Fri, 03 Jan 2014 16:26:02 +1100 amfd: Correct a number of issues identified by Coverity [#680] * Unchecked return value (CHECKED_RETURN) * Dereference null return value (NULL_RETURNS) * Logically dead code (DEADCODE) * Uninitialized scalar variable (UNINIT) * Unnecessary header file (HFA) * Big parameter passed by value (PASS_BY_VALUE) * Unused pointer value (UNUSED_VALUE) * Also, replace saAmfStgValidSuTypes with saAmfSGtValidSuTypes in sgtype.cc changeset df1a77bc86b2cdf885cf444280bbef6c9ad90174 Author: Gary Lee gary@dektech.com.au Date:Fri, 03 Jan 2014 16:26:02 +1100 amfd: Correct a number of issues identified by Coverity [#680] * Unnecessary header file (HFA) * Dereference after null check (FORWARD_NULL) * Dereference null return value (NULL_RETURNS) * Logically dead code (DEADCODE) * Big parameter passed by value (PASS_BY_VALUE) * Unchecked return value (CHECKED_RETURN) * Side effect in assertion (ASSERT_SIDE_EFFECT) * Unused pointer value (UNUSED_VALUE) changeset ea43d2c57d577714d67c849cd2cca74d3a5b9b06 Author:
Re: [devel] [PATCH 00 of 15] Review Request for AMF: Coverity issues in amfd/amfnd [#680]
Yes please do that asap, you do need defect tickets per service also... Thanks, Hans -Original Message- From: Gary Lee [mailto:gary@dektech.com.au] Sent: den 9 januari 2014 13:26 To: Anders Björnerstedt Cc: Hans Feldt; Hans Nordebäck; praveen.malv...@oracle.com; nagendr...@oracle.com; opensaf-devel@lists.sourceforge.net Subject: Re: [devel] [PATCH 00 of 15] Review Request for AMF: Coverity issues in amfd/amfnd [#680] Hi There are some memory leaks fixed in the patch series that would be applicable to 4.4. I guess I should extract just the leak fixes for the next 4.4 candidate / correction release? Thanks Gary Quoting Anders Björnerstedt anders.bjornerst...@ericsson.com: Yes 4.4 window is closed since we have tagg'ed 4.4.FC. /AndersBj -Original Message- From: Hans Feldt [mailto:hans.fe...@ericsson.com] Sent: den 9 januari 2014 13:06 To: Gary Lee; Hans Nordebäck; praveen.malv...@oracle.com; nagendr...@oracle.com Cc: opensaf-devel@lists.sourceforge.net Subject: Re: [devel] [PATCH 00 of 15] Review Request for AMF: Coverity issues in amfd/amfnd [#680] Is this a patch series for 4.5 or default? The ticket is an enhancement, is the 4.4 window closed? Thanks, Hans On 01/03/2014 06:51 AM, Gary Lee wrote: Summary: AMF: Coverity issues in amfd/amfnd [#680] Review request for Trac Ticket(s): 680 Peer Reviewer(s): Hans N, Hans F, Praveen, Nagendra Pull request to: Affected branch(es): default Development branch: default Impacted area Impact y/n Docsn Build systemn RPM/packaging n Configuration files n Startup scripts n SAF servicesy OpenSAF servicesn Core libraries n Samples n Tests n Other n Comments (indicate scope for each y above): - This patch set fixes a number of 'issues' in amfd and amfnd identified by Coverity. A number of 'issues' still remain and will be fixed in a later patch/ticket. Basic testing and valgrind have been run. changeset 0b48fcffb4ff4a739e56d33e5780e535cd232d3a Author:Gary Lee gary@dektech.com.au Date: Fri, 03 Jan 2014 16:26:02 +1100 amfd: Correct a number of issues identified by Coverity [#680] * Unnecessary header file (HFA) * Logically dead code (DEADCODE) * Dereference null return value (NULL_RETURNS) * Uninitialized scalar variable (UNINIT) * Unused pointer value (UNUSED_VALUE) * Also remove check for NULL return value from avd_app_new() as this is no longer required changeset f4290079e205f56acd5597253927b384b64765da Author:Gary Lee gary@dektech.com.au Date: Fri, 03 Jan 2014 16:26:02 +1100 amfd: Correct a number of issues identified by Coverity [#680] * Unnecessary header file (HFA) * Logically dead code (DEADCODE) * Dereference null return value (NULL_RETURNS) * Uninitialized scalar variable (UNINIT) * Unused pointer value (UNUSED_VALUE) changeset b19a672ceb57f418503aeced80da5a452943d16a Author:Gary Lee gary@dektech.com.au Date: Fri, 03 Jan 2014 16:26:02 +1100 amfd: Correct a number of issues identified by Coverity [#680] * Dereference null return value (NULL_RETURNS) * Uninitialized scalar variable (UNINIT) * Big parameter passed by value (PASS_BY_VALUE) * Remove saAmfStgValidSuTypes from AVD_AMF_SG_TYPE, as it appears to be a typo of saAmfSGtValidSuTypes changeset 80ff873eb3f034d2ca4ff30953bfb6bb925ac723 Author:Gary Lee gary@dektech.com.au Date: Fri, 03 Jan 2014 16:26:02 +1100 amfd: Correct a number of issues identified by Coverity [#680] * Unchecked return value (CHECKED_RETURN) * Dereference null return value (NULL_RETURNS) * Uninitialized scalar variable (UNINIT) * Unnecessary header file (HFA) * Big parameter passed by value (PASS_BY_VALUE) changeset 91e108b0576533083ea81dbf4ffebfe5f86e33f0 Author:Gary Lee gary@dektech.com.au Date: Fri, 03 Jan 2014 16:26:02 +1100 amfd: Correct a number of issues identified by Coverity [#680] * Unchecked return value (CHECKED_RETURN) * Dereference null return value (NULL_RETURNS) * Logically dead code (DEADCODE) * Uninitialized scalar variable (UNINIT) * Unnecessary header file (HFA) * Big parameter passed by value (PASS_BY_VALUE) * Unused pointer value (UNUSED_VALUE) * Also, replace saAmfStgValidSuTypes with saAmfSGtValidSuTypes in sgtype.cc changeset df1a77bc86b2cdf885cf444280bbef6c9ad90174 Author:Gary Lee gary@dektech.com.au Date: Fri, 03 Jan 2014 16:26:02 +1100 amfd: Correct a number of issues
Re: [devel] [PATCH 08 of 15] amfd: Correct a number of issues identified by Coverity [#680]
Hi! Replacing sprintf with snprintf is an improvement, but error handling is still missing in the code below. snprintf() returns -1 in case of an error, and it can return a value larger than or equal to SA_MAX_NAME_LENGTH in case the concatenation below would have overflowed the buffer. In these cases, SaNameT.length would get a value that is 0x or = SA_MAX_NAME_LENGTH, respectively. is_config_valid() is a validation function that seems to return 1 when successful and 0 otherwise. So if the above case happens, I would guess it is a good idea to return 0. regards, Anders Widell 2014-01-03 06:51, Gary Lee skrev: osaf/services/saf/amf/amfd/compcstype.cc | 5 +++-- osaf/services/saf/amf/amfd/hlt.cc| 2 +- osaf/services/saf/amf/amfd/imm.cc| 9 ++--- 3 files changed, 10 insertions(+), 6 deletions(-) * Calling risky function (SECURE_CODING) replace calls to sprintf with snprintf diff --git a/osaf/services/saf/amf/amfd/compcstype.cc b/osaf/services/saf/amf/amfd/compcstype.cc --- a/osaf/services/saf/amf/amfd/compcstype.cc +++ b/osaf/services/saf/amf/amfd/compcstype.cc @@ -204,7 +204,7 @@ p = strchr(p, ','); *p = '\0'; - ctcstype_name.length = sprintf((char*)ctcstype_name.value, %s,%s, cstype_name, comptype_name-value); + ctcstype_name.length = snprintf((char*)ctcstype_name.value, SA_MAX_NAME_LENGTH, %s,%s, cstype_name, comptype_name-value); if (avd_ctcstype_get(ctcstype_name) == NULL) { if (opdata == NULL) { @@ -256,7 +256,8 @@ p = strchr(cstype_name, ',') + 1; p = strchr(p, ','); *p = '\0'; - ctcstype_name.length = sprintf((char*)ctcstype_name.value, + ctcstype_name.length = snprintf((char*)ctcstype_name.value, + SA_MAX_NAME_LENGTH, %s,%s, cstype_name, comp-comp_type-name.value); ctcstype = avd_ctcstype_get(ctcstype_name); diff --git a/osaf/services/saf/amf/amfd/hlt.cc b/osaf/services/saf/amf/amfd/hlt.cc --- a/osaf/services/saf/amf/amfd/hlt.cc +++ b/osaf/services/saf/amf/amfd/hlt.cc @@ -115,7 +115,7 @@ comp_name = strstr((char *)opdata-objectName.value, safComp); osafassert(comp_name); - comp_dn.length = sprintf((char *)comp_dn.value, %s, comp_name); + comp_dn.length = snprintf((char *)comp_dn.value, SA_MAX_NAME_LENGTH, %s, comp_name); comp = avd_comp_get(comp_dn); osafassert(comp); diff --git a/osaf/services/saf/amf/amfd/imm.cc b/osaf/services/saf/amf/amfd/imm.cc --- a/osaf/services/saf/amf/amfd/imm.cc +++ b/osaf/services/saf/amf/amfd/imm.cc @@ -725,15 +725,18 @@ if (attrValue-attrValueType == SA_IMM_ATTR_SASTRINGT) { SaStringT rdnVal = *((SaStringT *)attrValue-attrValues[0]); if ((parent_name != NULL) (parent_name-length 0)) { - operation-objectName.length = sprintf((char *)operation-objectName.value, + operation-objectName.length = snprintf((char *)operation-objectName.value, + SA_MAX_NAME_LENGTH, %s,%s, rdnVal, parent_name-value); } else { - operation-objectName.length = sprintf((char *)operation-objectName.value, + operation-objectName.length = snprintf((char *)operation-objectName.value, + SA_MAX_NAME_LENGTH, %s, rdnVal); } } else { SaNameT *rdnVal = ((SaNameT *)attrValue-attrValues[0]); - operation-objectName.length = sprintf((char *)operation-objectName.value, + operation-objectName.length = snprintf((char *)operation-objectName.value, + SA_MAX_NAME_LENGTH, %s,%s, rdnVal-value, parent_name-value); } -- Rapidly troubleshoot problems before they affect your business. Most IT organizations don't have a clear picture of how application performance affects their revenue. With AppDynamics, you get 100% visibility into your Java,.NET, PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro! http://pubads.g.doubleclick.net/gampad/clk?id=84349831iu=/4140/ostg.clktrk ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel -- CenturyLink Cloud: The Leader in
Re: [devel] [PATCH 00 of 15] Review Request for AMF: Coverity issues in amfd/amfnd [#680]
Yeah, fixes to memory leaks would be fixes of defects, not classed as enhancements. Youn need to write at least one separate defect ticket for the memory leak problems. Defects should be fixed on all supported branches, where applicable. In general, the point with discriminating between the two (enhancement and defect) and only allowing defects in FC (functionally complete) branches, is to significantly reduce the risk of new changesets introducing new problems. To maintain stability. /AndersBj -Original Message- From: Gary Lee [mailto:gary@dektech.com.au] Sent: den 9 januari 2014 13:26 To: Anders Björnerstedt Cc: Hans Feldt; Hans Nordebäck; praveen.malv...@oracle.com; nagendr...@oracle.com; opensaf-devel@lists.sourceforge.net Subject: Re: [devel] [PATCH 00 of 15] Review Request for AMF: Coverity issues in amfd/amfnd [#680] Hi There are some memory leaks fixed in the patch series that would be applicable to 4.4. I guess I should extract just the leak fixes for the next 4.4 candidate / correction release? Thanks Gary Quoting Anders Björnerstedt anders.bjornerst...@ericsson.com: Yes 4.4 window is closed since we have tagg'ed 4.4.FC. /AndersBj -Original Message- From: Hans Feldt [mailto:hans.fe...@ericsson.com] Sent: den 9 januari 2014 13:06 To: Gary Lee; Hans Nordebäck; praveen.malv...@oracle.com; nagendr...@oracle.com Cc: opensaf-devel@lists.sourceforge.net Subject: Re: [devel] [PATCH 00 of 15] Review Request for AMF: Coverity issues in amfd/amfnd [#680] Is this a patch series for 4.5 or default? The ticket is an enhancement, is the 4.4 window closed? Thanks, Hans On 01/03/2014 06:51 AM, Gary Lee wrote: Summary: AMF: Coverity issues in amfd/amfnd [#680] Review request for Trac Ticket(s): 680 Peer Reviewer(s): Hans N, Hans F, Praveen, Nagendra Pull request to: Affected branch(es): default Development branch: default Impacted area Impact y/n Docsn Build systemn RPM/packaging n Configuration files n Startup scripts n SAF servicesy OpenSAF servicesn Core libraries n Samples n Tests n Other n Comments (indicate scope for each y above): - This patch set fixes a number of 'issues' in amfd and amfnd identified by Coverity. A number of 'issues' still remain and will be fixed in a later patch/ticket. Basic testing and valgrind have been run. changeset 0b48fcffb4ff4a739e56d33e5780e535cd232d3a Author: Gary Lee gary@dektech.com.au Date:Fri, 03 Jan 2014 16:26:02 +1100 amfd: Correct a number of issues identified by Coverity [#680] * Unnecessary header file (HFA) * Logically dead code (DEADCODE) * Dereference null return value (NULL_RETURNS) * Uninitialized scalar variable (UNINIT) * Unused pointer value (UNUSED_VALUE) * Also remove check for NULL return value from avd_app_new() as this is no longer required changeset f4290079e205f56acd5597253927b384b64765da Author: Gary Lee gary@dektech.com.au Date:Fri, 03 Jan 2014 16:26:02 +1100 amfd: Correct a number of issues identified by Coverity [#680] * Unnecessary header file (HFA) * Logically dead code (DEADCODE) * Dereference null return value (NULL_RETURNS) * Uninitialized scalar variable (UNINIT) * Unused pointer value (UNUSED_VALUE) changeset b19a672ceb57f418503aeced80da5a452943d16a Author: Gary Lee gary@dektech.com.au Date:Fri, 03 Jan 2014 16:26:02 +1100 amfd: Correct a number of issues identified by Coverity [#680] * Dereference null return value (NULL_RETURNS) * Uninitialized scalar variable (UNINIT) * Big parameter passed by value (PASS_BY_VALUE) * Remove saAmfStgValidSuTypes from AVD_AMF_SG_TYPE, as it appears to be a typo of saAmfSGtValidSuTypes changeset 80ff873eb3f034d2ca4ff30953bfb6bb925ac723 Author: Gary Lee gary@dektech.com.au Date:Fri, 03 Jan 2014 16:26:02 +1100 amfd: Correct a number of issues identified by Coverity [#680] * Unchecked return value (CHECKED_RETURN) * Dereference null return value (NULL_RETURNS) * Uninitialized scalar variable (UNINIT) * Unnecessary header file (HFA) * Big parameter passed by value (PASS_BY_VALUE) changeset 91e108b0576533083ea81dbf4ffebfe5f86e33f0 Author: Gary Lee gary@dektech.com.au Date:Fri, 03 Jan 2014 16:26:02 +1100 amfd: Correct a number of issues identified by Coverity [#680] * Unchecked return value (CHECKED_RETURN) * Dereference null return value (NULL_RETURNS) * Logically dead code (DEADCODE) * Uninitialized scalar variable (UNINIT) * Unnecessary header
Re: [devel] checkpoint problems
I went back to MAX_SYNC_TRANSFER_SIZE (30 * 1024 * 1024), and made the MDS changes you suggest. This works great guys, even up at 40k and 50k sections. Thanks! Alex On 01/09/2014 04:43 AM, A V Mahesh wrote: Hi Alex, Use the below patch as workaround for you to proceed your testing . This patch just increases the MDS internal fragmentation value to ~ TIPC_MAX_USER_MSG_SIZE define in tipc.h I will work with Hans to have final patch by considering the both TIPC TCP transports, and testing involved as a part of ticket `#654 MDS improvements` (https://sourceforge.net/p/opensaf/tickets/654/ ). I tested this patch with 10K sections checkpoint memory used was : 10136000 on TIPC transport. == diff --git a/osaf/libs/core/mds/include/mds_dt.h b/osaf/libs/core/mds/include/mds_dt.h --- a/osaf/libs/core/mds/include/mds_dt.h +++ b/osaf/libs/core/mds/include/mds_dt.h @@ -32,6 +32,7 @@ #include ncs_main_papi.h #include ncssysf_mem.h #include ncspatricia.h +#include linux/tipc.h /* This file is private to the MDTM layer. */ @@ -109,7 +110,7 @@ typedef struct mdtm_reassembly_queue { #define MDTM_MAX_DIRECT_BUFF_SIZE MDTM_MAX_SEGMENT_SIZE -#define MDTM_NORMAL_MSG_FRAG_SIZE 1400 +#define MDTM_NORMAL_MSG_FRAG_SIZE (TIPC_MAX_USER_MSG_SIZE-1000) /* TIPC_MAX_USER_MSG_SIZE = 66000 define linux/tipc.h */ #define MDTM_RECV_BUFFER_SIZE ((MDS_DIRECT_BUF_MAXSIZEMDTM_NORMAL_MSG_FRAG_SIZE)? \ (MDS_DIRECT_BUF_MAXSIZE+SUM_MDS_HDR_PLUS_MDTM_HDR_PLUS_LEN):(MDTM_NORMAL_MSG_FRAG_SIZE+SUM_MDS_HDR_PLUS_MDTM_HDR_PLUS_LEN)) == -AVM On 1/8/2014 10:42 PM, Alex Jones wrote: Hi Hans, Changing rmem_default and rmem_max has no effect on the problem. I even tried up to 2M to no avail. However, after looking at the cpnd_transfer_replica function in cpnd_evt.c, I found the following in cpsv_evt.h which controls how large the packets are which are sent through MDS: #define MAX_SYNC_TRANSFER_SIZE (30 * 1024 * 1024) 30M? What is the rationale for this number? This seems way too high. When I change it to (4*1024*1024) (4M) it solves my problem, and doesn't appear to affect performance. Alex On 01/08/2014 08:30 AM, Hans Feldt wrote: sysctl -a | grep rmem set rmem_default to 256K or so /Hans -Original Message- From: Hans Feldt [mailto:hans.fe...@ericsson.com] Sent: den 8 januari 2014 14:01 To: A V Mahesh; Alex Jones Cc: opensaf-devel@lists.sourceforge.net Subject: Re: [devel] checkpoint problems The socket receive buffer size used is the system default. It can be too small, pump it up. I plan todo some change in MDS for this (and other stuff). /Hans -Original Message- From: A V Mahesh [mailto:mahesh.va...@oracle.com] Sent: den 8 januari 2014 11:29 To: Alex Jones Cc: opensaf-devel@lists.sourceforge.net Subject: Re: [devel] checkpoint problems Hi Alex, I suggest you increase and try the following TIPC values ( tipc code ) and rebuild `tipc.ko`: net/tipc/tipc_socket.c:#define OVERLOAD_LIMIT_BASE 5000 You can increase it to 5 and try again. - AVM. On 1/8/2014 4:16 AM, Alex Jones wrote: After doing some deep debugging I am seeing the following in the MDS log on node B. This is when the CPND_EVT_ND2ND_CKPT_ACTIVE_SYNC is sent from the active replica on node A to the replica on node B. The sync message never gets up to the CPND layer on node B because it is dropped. This is with 10k sections, each section 1k. Jan 7 21:32:32.772347 1789648919 ERR|MDTM: Frag recd is not next frag so dropping adest=0x010010023922604c Jan 7 21:32:32.772399 1789648919 ERR|MDTM: Message is dropped as msg is out of seq TRANSPOR-ID=0x010010023922604c I've turned on MDS debug on node B, and the packet being sent over is gigantic. It starts failing at fragment number 2703. The next fragment that comes in is 2707, then 2722. The last fragment that comes in is 7444. I've done a cursory look at the hardware stats, and nothing is being rate-limited or dropped. I'm going to take a deeper look at this, but I'm mentioning it in case it rings any bells. I am using TIPC as the transport. Alex On 01/07/2014 07:24 AM, Alex Jones wrote: AVM, I get SA_AIS_ERR_TIMEOUT even when I pass SA_TIME_END as the timeout value. Is this not a bug? the synchronous CheckpointOpen call doesn't work at all in this scenario. It never succeeds. I can reproduce the problem with sectionCreationAttributes.expirationTime set to SA_TIME_ONE_DAY. You should be able to reproduce the problem with the code I sent in the last e-mail. Alex On 01/06/2014 10:31 PM, A V Mahesh wrote: Hi Alex, CheckpointOpen call failing with SA_AIS_ERR_TIMEOUT NOT a bug , it is expected if you pass less time out value