Re: [devel] [PATCH 1 of 1] amfd: send node down notification during controller failover [#914]

2014-05-20 Thread Hans Feldt


> -Original Message-
> From: Nagendra Kumar [mailto:nagendr...@oracle.com]
> Sent: den 21 maj 2014 07:18
> To: Hans Feldt; Hans Nordebäck; Praveen Malviya
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: RE: [PATCH 1 of 1] amfd: send node down notification during 
> controller failover [#914]
> 
> > I can see other alternatives then to change the code like you proposed:
> > * skip setting disabled in avd_mds_avnd_down_evh when the active controller
> > disappeared
> You mean " remove the setting of saAmfNodeOperState to DISABLED " as your 
> previous comment?
[Hans] yes, but only when the active controller disappears
> 
> > * remove filtering in avd_node_oper_state_set (this filter is an indication 
> > that
> > something else should be fixed anyway)
> I didn't get that. You mean remove " if (node->saAmfNodeOperState == 
> oper_state) " ?
[Hans] yes
> This will lead to send many duplicate notifications as we set the states in 
> many flows.
> And in some flows, duplicate notification can be sent.
[Hans] can be sent perhaps, but we don't know for sure. Duplicates are better 
than no notification. As I said then that flow is incorrect anyway and should 
be fixed.

> 
> Thanks
> -Nagu
> 
> > -Original Message-
> > From: Hans Feldt [mailto:hans.fe...@ericsson.com]
> > Sent: 21 May 2014 10:35
> > To: Nagendra Kumar; Hans Nordebäck; Praveen Malviya
> > Cc: opensaf-devel@lists.sourceforge.net
> > Subject: RE: [PATCH 1 of 1] amfd: send node down notification during 
> > controller
> > failover [#914]
> >
> > I can see other alternatives then to change the code like you proposed:
> > * skip setting disabled in avd_mds_avnd_down_evh when the active controller
> > disappeared
> > * remove filtering in avd_node_oper_state_set (this filter is an indication 
> > that
> > something else should be fixed anyway)
> >
> > Thanks,
> > Hans
> >
> > > -Original Message-
> > > From: Nagendra Kumar [mailto:nagendr...@oracle.com]
> > > Sent: den 20 maj 2014 08:30
> > > To: Hans Feldt; Hans Nordebäck; Praveen Malviya
> > > Cc: opensaf-devel@lists.sourceforge.net
> > > Subject: RE: [PATCH 1 of 1] amfd: send node down notification during
> > > controller failover [#914]
> > >
> > > >  Can't you just remove the setting of saAmfNodeOperState to DISABLED
> > > > from avd_mds_avnd_down_evh?
> > >
> > > It may break existing functionality as per comment :
> > >
> > > /* Remove dynamic info for node but keep in 
> > > nodeid tree.
> > >  * Possibly used at the end of controller 
> > > failover to
> > >  * to failover payload nodes.
> > >  */
> > >
> > > Node oper state is used for determining the assignment(SI dep as well) in 
> > > case
> > of payload failover followed by controller failover.
> > >
> > > Thanks
> > > -Nagu
> > >
> > > > -Original Message-
> > > > From: Hans Feldt [mailto:hans.fe...@ericsson.com]
> > > > Sent: 20 May 2014 01:20
> > > > To: Nagendra Kumar; Hans Nordebäck; Praveen Malviya
> > > > Cc: opensaf-devel@lists.sourceforge.net
> > > > Subject: RE: [PATCH 1 of 1] amfd: send node down notification during
> > > > controller failover [#914]
> > > >
> > > > Sorry but this is a rather ugly code change and I hope we can find a
> > > > better change.
> > > >  Can't you just remove the setting of saAmfNodeOperState to DISABLED
> > > > from avd_mds_avnd_down_evh?
> > > > Thanks,
> > > > Hans
> > > >
> > > > > -Original Message-
> > > > > From: nagendr...@oracle.com [mailto:nagendr...@oracle.com]
> > > > > Sent: den 19 maj 2014 12:36
> > > > > To: Hans Feldt; Hans Nordebäck; praveen.malv...@oracle.com
> > > > > Cc: opensaf-devel@lists.sourceforge.net
> > > > > Subject: [PATCH 1 of 1] amfd: send node down notification during
> > > > > controller failover [#914]
> > > > >
> > > > >  osaf/services/saf/amf/amfd/clm.cc |  2 +-
> > > > >  osaf/services/saf/amf/amfd/include/node.h |  2 +-
> > > > >  osaf/services/saf/amf/amfd/ndfsm.cc   |  4 ++--
> > > > >  osaf/services/saf/amf/amfd/node.cc|  8 +---
> > > > >  osaf/services/saf/amf/amfd/sgproc.cc  |  4 ++--
> > > > >  5 files changed, 11 insertions(+), 9 deletions(-)
> > > > >
> > > > >
> > > > > When Act controller is stopped, newly act controller is not
> > > > > sending node oper state disable notification.
> > > > > This is because of marking node oper state disable first in
> > > > > avd_mds_avnd_down_evh at standby controller and then calling
> > > > avd_node_oper_state_set() from avd_node_failover.
> > > > > In avd_node_oper_state_set, since oper state remain disable, the
> > > > > function only update rt attributes and donot send notification.
> > > > > Amf is maintaining context while calling avd_node_oper_state_set()
> > > > > whether it is being called during failover or in other context.
> > > > > If avd_node_oper_state_set is being called during failover, then
> > > > > notification is being sent for no

Re: [devel] [PATCH 1 of 1] amfd: send node down notification during controller failover [#914]

2014-05-20 Thread Nagendra Kumar
> I can see other alternatives then to change the code like you proposed:
> * skip setting disabled in avd_mds_avnd_down_evh when the active controller
> disappeared
You mean " remove the setting of saAmfNodeOperState to DISABLED " as your 
previous comment? 

> * remove filtering in avd_node_oper_state_set (this filter is an indication 
> that
> something else should be fixed anyway)
I didn't get that. You mean remove " if (node->saAmfNodeOperState == 
oper_state) " ?
This will lead to send many duplicate notifications as we set the states in 
many flows. 
And in some flows, duplicate notification can be sent.

Thanks
-Nagu

> -Original Message-
> From: Hans Feldt [mailto:hans.fe...@ericsson.com]
> Sent: 21 May 2014 10:35
> To: Nagendra Kumar; Hans Nordebäck; Praveen Malviya
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: RE: [PATCH 1 of 1] amfd: send node down notification during 
> controller
> failover [#914]
> 
> I can see other alternatives then to change the code like you proposed:
> * skip setting disabled in avd_mds_avnd_down_evh when the active controller
> disappeared
> * remove filtering in avd_node_oper_state_set (this filter is an indication 
> that
> something else should be fixed anyway)
> 
> Thanks,
> Hans
> 
> > -Original Message-
> > From: Nagendra Kumar [mailto:nagendr...@oracle.com]
> > Sent: den 20 maj 2014 08:30
> > To: Hans Feldt; Hans Nordebäck; Praveen Malviya
> > Cc: opensaf-devel@lists.sourceforge.net
> > Subject: RE: [PATCH 1 of 1] amfd: send node down notification during
> > controller failover [#914]
> >
> > >  Can't you just remove the setting of saAmfNodeOperState to DISABLED
> > > from avd_mds_avnd_down_evh?
> >
> > It may break existing functionality as per comment :
> >
> > /* Remove dynamic info for node but keep in nodeid 
> > tree.
> >  * Possibly used at the end of controller failover 
> > to
> >  * to failover payload nodes.
> >  */
> >
> > Node oper state is used for determining the assignment(SI dep as well) in 
> > case
> of payload failover followed by controller failover.
> >
> > Thanks
> > -Nagu
> >
> > > -Original Message-
> > > From: Hans Feldt [mailto:hans.fe...@ericsson.com]
> > > Sent: 20 May 2014 01:20
> > > To: Nagendra Kumar; Hans Nordebäck; Praveen Malviya
> > > Cc: opensaf-devel@lists.sourceforge.net
> > > Subject: RE: [PATCH 1 of 1] amfd: send node down notification during
> > > controller failover [#914]
> > >
> > > Sorry but this is a rather ugly code change and I hope we can find a
> > > better change.
> > >  Can't you just remove the setting of saAmfNodeOperState to DISABLED
> > > from avd_mds_avnd_down_evh?
> > > Thanks,
> > > Hans
> > >
> > > > -Original Message-
> > > > From: nagendr...@oracle.com [mailto:nagendr...@oracle.com]
> > > > Sent: den 19 maj 2014 12:36
> > > > To: Hans Feldt; Hans Nordebäck; praveen.malv...@oracle.com
> > > > Cc: opensaf-devel@lists.sourceforge.net
> > > > Subject: [PATCH 1 of 1] amfd: send node down notification during
> > > > controller failover [#914]
> > > >
> > > >  osaf/services/saf/amf/amfd/clm.cc |  2 +-
> > > >  osaf/services/saf/amf/amfd/include/node.h |  2 +-
> > > >  osaf/services/saf/amf/amfd/ndfsm.cc   |  4 ++--
> > > >  osaf/services/saf/amf/amfd/node.cc|  8 +---
> > > >  osaf/services/saf/amf/amfd/sgproc.cc  |  4 ++--
> > > >  5 files changed, 11 insertions(+), 9 deletions(-)
> > > >
> > > >
> > > > When Act controller is stopped, newly act controller is not
> > > > sending node oper state disable notification.
> > > > This is because of marking node oper state disable first in
> > > > avd_mds_avnd_down_evh at standby controller and then calling
> > > avd_node_oper_state_set() from avd_node_failover.
> > > > In avd_node_oper_state_set, since oper state remain disable, the
> > > > function only update rt attributes and donot send notification.
> > > > Amf is maintaining context while calling avd_node_oper_state_set()
> > > > whether it is being called during failover or in other context.
> > > > If avd_node_oper_state_set is being called during failover, then
> > > > notification is being sent for node oper state down along with rt 
> > > > update.
> > > > Otherwise, the functionality of avd_node_oper_state_set is the same.
> > > >
> > > > diff --git a/osaf/services/saf/amf/amfd/clm.cc
> > > > b/osaf/services/saf/amf/amfd/clm.cc
> > > > --- a/osaf/services/saf/amf/amfd/clm.cc
> > > > +++ b/osaf/services/saf/amf/amfd/clm.cc
> > > > @@ -38,7 +38,7 @@ static void clm_node_join_complete(AVD_A
> > > > goto done;
> > > > }
> > > >
> > > > -   avd_node_oper_state_set(node, SA_AMF_OPERATIONAL_ENABLED);
> > > > +   avd_node_oper_state_set(node, SA_AMF_OPERATIONAL_ENABLED,
> > > false);
> > > > su = node->list_of_su;
> > > > while (su != NULL) {
> > > > /* For non-preinstantiable SU unlock

Re: [devel] [PATCH 1 of 1] amfd: send node down notification during controller failover [#914]

2014-05-20 Thread Hans Feldt
I can see other alternatives then to change the code like you proposed:
* skip setting disabled in avd_mds_avnd_down_evh when the active controller 
disappeared
* remove filtering in avd_node_oper_state_set (this filter is an indication 
that something else should be fixed anyway)

Thanks,
Hans

> -Original Message-
> From: Nagendra Kumar [mailto:nagendr...@oracle.com]
> Sent: den 20 maj 2014 08:30
> To: Hans Feldt; Hans Nordebäck; Praveen Malviya
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: RE: [PATCH 1 of 1] amfd: send node down notification during 
> controller failover [#914]
> 
> >  Can't you just remove the setting of saAmfNodeOperState to DISABLED from
> > avd_mds_avnd_down_evh?
> 
> It may break existing functionality as per comment :
> 
> /* Remove dynamic info for node but keep in nodeid 
> tree.
>  * Possibly used at the end of controller failover to
>  * to failover payload nodes.
>  */
> 
> Node oper state is used for determining the assignment(SI dep as well) in 
> case of payload failover followed by controller failover.
> 
> Thanks
> -Nagu
> 
> > -Original Message-
> > From: Hans Feldt [mailto:hans.fe...@ericsson.com]
> > Sent: 20 May 2014 01:20
> > To: Nagendra Kumar; Hans Nordebäck; Praveen Malviya
> > Cc: opensaf-devel@lists.sourceforge.net
> > Subject: RE: [PATCH 1 of 1] amfd: send node down notification during 
> > controller
> > failover [#914]
> >
> > Sorry but this is a rather ugly code change and I hope we can find a better
> > change.
> >  Can't you just remove the setting of saAmfNodeOperState to DISABLED from
> > avd_mds_avnd_down_evh?
> > Thanks,
> > Hans
> >
> > > -Original Message-
> > > From: nagendr...@oracle.com [mailto:nagendr...@oracle.com]
> > > Sent: den 19 maj 2014 12:36
> > > To: Hans Feldt; Hans Nordebäck; praveen.malv...@oracle.com
> > > Cc: opensaf-devel@lists.sourceforge.net
> > > Subject: [PATCH 1 of 1] amfd: send node down notification during
> > > controller failover [#914]
> > >
> > >  osaf/services/saf/amf/amfd/clm.cc |  2 +-
> > >  osaf/services/saf/amf/amfd/include/node.h |  2 +-
> > >  osaf/services/saf/amf/amfd/ndfsm.cc   |  4 ++--
> > >  osaf/services/saf/amf/amfd/node.cc|  8 +---
> > >  osaf/services/saf/amf/amfd/sgproc.cc  |  4 ++--
> > >  5 files changed, 11 insertions(+), 9 deletions(-)
> > >
> > >
> > > When Act controller is stopped, newly act controller is not sending
> > > node oper state disable notification.
> > > This is because of marking node oper state disable first in
> > > avd_mds_avnd_down_evh at standby controller and then calling
> > avd_node_oper_state_set() from avd_node_failover.
> > > In avd_node_oper_state_set, since oper state remain disable, the
> > > function only update rt attributes and donot send notification.
> > > Amf is maintaining context while calling avd_node_oper_state_set()
> > > whether it is being called during failover or in other context.
> > > If avd_node_oper_state_set is being called during failover, then
> > > notification is being sent for node oper state down along with rt update.
> > > Otherwise, the functionality of avd_node_oper_state_set is the same.
> > >
> > > diff --git a/osaf/services/saf/amf/amfd/clm.cc
> > > b/osaf/services/saf/amf/amfd/clm.cc
> > > --- a/osaf/services/saf/amf/amfd/clm.cc
> > > +++ b/osaf/services/saf/amf/amfd/clm.cc
> > > @@ -38,7 +38,7 @@ static void clm_node_join_complete(AVD_A
> > >   goto done;
> > >   }
> > >
> > > - avd_node_oper_state_set(node, SA_AMF_OPERATIONAL_ENABLED);
> > > + avd_node_oper_state_set(node, SA_AMF_OPERATIONAL_ENABLED,
> > false);
> > >   su = node->list_of_su;
> > >   while (su != NULL) {
> > >   /* For non-preinstantiable SU unlock-inst will not lead to its 
> > > inst
> > > until unlock. */ diff --git
> > > a/osaf/services/saf/amf/amfd/include/node.h
> > > b/osaf/services/saf/amf/amfd/include/node.h
> > > --- a/osaf/services/saf/amf/amfd/include/node.h
> > > +++ b/osaf/services/saf/amf/amfd/include/node.h
> > > @@ -195,7 +195,7 @@ extern AVD_AVND *avd_node_find_nodeid(Sa
> > extern
> > > AVD_AVND *avd_node_getnext_nodeid(SaClmNodeIdT node_id);  extern
> > > SaAisErrorT avd_node_config_get(void);  extern void
> > > avd_node_state_set(AVD_AVND *node, AVD_AVND_STATE node_state); -
> > extern
> > > void avd_node_oper_state_set(AVD_AVND *node, SaAmfOperationalStateT
> > > oper_state);
> > > +extern void avd_node_oper_state_set(AVD_AVND *node,
> > > +SaAmfOperationalStateT oper_state, bool fail_over);
> > >  extern void node_admin_state_set(AVD_AVND *node, SaAmfAdminStateT
> > > admin_state);  extern void avd_node_constructor(void);  extern void
> > > avd_node_add_su(AVD_SU *su); diff --git
> > > a/osaf/services/saf/amf/amfd/ndfsm.cc
> > > b/osaf/services/saf/amf/amfd/ndfsm.cc
> > > --- a/osaf/services/saf/amf/amfd/ndfsm.cc
> > > +++ b/osaf/services/saf/amf/amfd/ndfsm.cc
> > > @@ -203,7 

Re: [devel] [PATCH 0 of 1] Review Request for cpnd: increase performance when creating large numbers of sections [#770]

2014-05-20 Thread A V Mahesh
HI Alex,

I appreciate your contribution in CPSV , thanks for your patch , I will 
take some time reviewing/testing
the complete patch (database a C++ STL map for fast access ) ,for now my 
initial comment is :

On 5/21/2014 2:57 AM, Alex Jones wrote:
> make MAX_SYNC_TRANSFER_SIZE much smaller: 3M instead of   30M.

Considering the below  fragmentation changes (changesets 5279- 5282) , I 
hope it is not required to change form 30MB to 3MB
so can you please re-test  ckpt API functions return with 
SA_AIS_ERR_TIMEOUT case and let me know the result.

==
changeset:   5282:0a995a368ec3
user:Hans Feldt 
date:Mon May 19 21:00:03 2014 +0200
summary: mds: adjust some constants [#654]

changeset:   5281:ad842ed4d464
summary: mds: use TIPC segmentation/reassembly [#654]

changeset:   5280:4d0da0f9e7cf
summary: mds: support variable size fragmentation limit [#654]

changeset:   5279:2a60ec043b3d
summary: mds: use TIPC defined max msg size for rec buffer [#654]
==

-AVM


On 5/21/2014 2:57 AM, Alex Jones wrote:
> Summary: cpnd: increase performance when creating large numbers of sections 
> [#770]
> Review request for Trac Ticket(s): 770
> Peer Reviewer(s): AVM
> Pull request to: AVM
> Affected branch(es): default
> Development branch:
>
> 
> Impacted area   Impact y/n
> 
>   Docsn
>   Build systemn
>   RPM/packaging   n
>   Configuration files n
>   Startup scripts n
>   SAF servicesy
>   OpenSAF servicesn
>   Core libraries  n
>   Samples n
>   Tests   n
>   Other   n
>
>
> Comments (indicate scope for each "y" above):
> -
> Please ignore the previous "fat-finger" submission.
>
> changeset 6f9c8080657af783462c52d81137e41038aee6a6
> Author:   Alex Jones 
> Date: Tue, 20 May 2014 17:03:25 -0400
>
>   cpnd: increase performance when creating large numbers of sections 
> [#770]
>
>   Jan 7 21:32:32.772347 <1789648919> ERR |MDTM: Frag recd is not next frag
>   so dropping adest=<0x010010023922604c> Jan 7 21:32:32.772399 
> <1789648919>
>   ERR |MDTM: Message is dropped as msg is out of seq TRANSPOR-
>   ID=<0x010010023922604c> With large numbers of sections (>5k) on the 
> standby
>   the CPU is pegged and all ckpt API functions return with 
> SA_AIS_ERR_TIMEOUT,
>   including ActiveReplicaSet, and CheckpointClose!
>
>   The section id database is implemented as a linked list. Each write to a
>   section must traverse the list in order to find the section. With 
> 1000's of
>   sections this takes a looong time. Also, sync data being sent over is 
> too
>   large for one packet (30M). This causes the transport layer (in this 
> case
>   TIPC), to drop packets. Lastly, the SectionCreate message is not
>   asynchronous when ACTIVE_REPLICA is specified.
>
>   Solution is in 3 parts: (1) make the section id database a C++ STL map 
> for
>   fast access. (2) make MAX_SYNC_TRANSFER_SIZE much smaller: 3M instead of
>   30M. (3) SectionCreate message should be asynchronous when 
> ACTIVE_REPLICA is
>   specified.
>
>
> Added Files:
> 
>   osaf/services/saf/cpsv/cpnd/cpnd_sec.cc
>
>
> Complete diffstat:
> --
>   osaf/libs/common/cpsv/include/cpnd_cb.h   |6 +-
>   osaf/libs/common/cpsv/include/cpnd_init.h |   14 +-
>   osaf/libs/common/cpsv/include/cpsv_evt.h  |2 +-
>   osaf/services/saf/cpsv/cpnd/Makefile.am   |3 +-
>   osaf/services/saf/cpsv/cpnd/cpnd_db.c |  246 +
>   osaf/services/saf/cpsv/cpnd/cpnd_evt.c|  115 ---
>   osaf/services/saf/cpsv/cpnd/cpnd_proc.c   |   26 +-
>   osaf/services/saf/cpsv/cpnd/cpnd_res.c|   15 +-
>   osaf/services/saf/cpsv/cpnd/cpnd_sec.cc   |  424 
> ++
>   9 files changed, 531 insertions(+), 320 deletions(-)
>
>
> Testing Commands:
> -
> (1) Create a 6-node setup
> (2) Open 5 checkpoints (40k sections for each checkpoint, each section 1k) for
>  writing, 1 on each active node
> (3) On the standby node, open these 5 for reading
> (4) Continuously write the active checkpoints as fast as possible.
>
>
> Testing, Expected Results:
> --
> The standby opening all 5 checkpoints should be able to keep up with all the
> writes (200k sections).  No dropped packets, no timeouts.
>
>
> Conditions of Submission:
> -
>
>
>
> Arch  Built StartedLinux distro
> ---
> mipsn  n
> mips64  n  n
> x86 n  n
> x86_64  y  y
> powerpc n 

Re: [devel] [PATCH 1 of 1] v4 amfnd: Not all npi-components are restarted during SuRestart escalation [#885]

2014-05-20 Thread praveen malviya
I will push this after incorporating the comments.

Thanks,
Praveen
On 16-May-14 1:02 PM, Hans Feldt wrote:
> Sorry for being picky but see inline ...
>
> Anyway ack from me, but this could be fixed (by the maintainer) before push.
>
> Thanks,
> Hans
>
>> -Original Message-
>> From: Minh Hon Chau [mailto:minh.c...@dektech.com.au]
>> Sent: den 15 maj 2014 15:28
>> To: nagendr...@oracle.com; Hans Nordebäck; Hans Feldt; 
>> praveen.malv...@oracle.com
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: [PATCH 1 of 1] v4 amfnd: Not all npi-components are restarted 
>> during SuRestart escalation [#885]
> [Hans] again the problem is repeated in the short commit message. This one 
> should just say in forward terms what the patch is doing with the code base 
> like:
> "amfnd: fix SU restart escalation [#885]"
>
>>   osaf/services/saf/amf/amfnd/susm.cc |  16 +---
>>   1 files changed, 13 insertions(+), 3 deletions(-)
>>
>>
> [Hans] problem/symptom should be here (which now is the short commit message)
>
> This is the analysis:
>> In case of npi su restart recovery, this problem appears because the 
>> condition
>> of changing su presence state to INSTANTIATED is not sufficient.
>>
>> If the component of the last csi in csi_list has been restarted first due
>> to componentRestart (before suRestart), amfnd can not find any next csi
>> and changes the su presence state to INSTANTIATED, this will miss the
>> restart for the rest of components.
>>
>> The fix uses UNASSIGNED csi to avoid duplication of restarting the same 
>> component
>> many times during su restart, only component linked to UNASSIGNED csi is 
>> restarted.
>> Therefore, the condition of changing su presence state to INSTANTIATED 
>> should be
>> all csis are ASSIGNED
>>
>> diff --git a/osaf/services/saf/amf/amfnd/susm.cc 
>> b/osaf/services/saf/amf/amfnd/susm.cc
>> --- a/osaf/services/saf/amf/amfnd/susm.cc
>> +++ b/osaf/services/saf/amf/amfnd/susm.cc
>> @@ -2638,12 +2638,22 @@ uint32_t avnd_su_pres_restart_compinst_h
>>
>>  /* get the next csi */
>>  curr_csi = (AVND_COMP_CSI_REC 
>> *)m_NCS_DBLIST_FIND_NEXT(&curr_csi->si_dll_node);
>> -if (curr_csi) {
>> +/* To be taken into restart, the next found csi must be 
>> UNASSIGNED (avoid
>> + * duplication of restarting component). The component linked 
>> to csi must
>> + * not be in RESTARTING (avoid the component just coming in 
>> component restart
>> + * recovery), and not be in INSTANTIATING (avoid the component 
>> in progress
>> + * of instantiation). But for now the check for INSTANTIATING 
>> is not included
>> + * because avnd_su_pres_insting_surestart_hdler has not been 
>> implemented.
> [Hans] cryptic comment that sounds like a TODO?
>
>> + */
>> +if (curr_csi != NULL &&
>> +
>> m_AVND_COMP_CSI_CURR_ASSIGN_STATE_IS_UNASSIGNED(curr_csi) == true &&
>> +curr_csi->comp->pres != SA_AMF_PRESENCE_RESTARTING) {
> [Hans] some parenthesis around the expressions would be nice and is safer
>
>>  /* we have another csi. trigger the comp fsm with 
>> RestartEv */
>> -rc = avnd_comp_clc_fsm_trigger(cb, curr_csi->comp, 
>> AVND_COMP_CLC_PRES_FSM_EV_RESTART);
>> +rc = avnd_comp_clc_fsm_trigger(cb, curr_csi->comp,
>> + 
>> AVND_COMP_CLC_PRES_FSM_EV_RESTART);
>>  if (NCSCC_RC_SUCCESS != rc)
>>  goto done;
>> -} else {
>> +} else if (all_csis_in_assigned_state(su)) {
> [Hans] == true
>
>>  /* => si assignment done */
>>  avnd_su_pres_state_set(su, 
>> SA_AMF_PRESENCE_INSTANTIATED);
>>  }


--
"Accelerate Dev Cycles with Automated Cross-Browser Testing - For FREE
Instantly run your Selenium tests across 300+ browser/OS combos.
Get unparalleled scalability from the best Selenium testing platform available
Simple to use. Nothing to install. Get started now for free."
http://p.sf.net/sfu/SauceLabs
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 3 of 3] dtm: make TCP_NODELAY set socket option configurable [#654]

2014-05-20 Thread A V Mahesh
Hi,

On 5/20/2014 4:21 PM, Hans Feldt wrote:
> But if you have strong requirement for this fine. But you must describe why 
> anyone ever wanted to enable Nagle
Their no strong requirement , for now I will withdraw this
if a practical use-case requirements comes I will re-consider it.

-AVM

On 5/20/2014 4:21 PM, Hans Feldt wrote:
>
>> -Original Message-
>> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
>> Sent: den 20 maj 2014 09:02
>> To: Hans Feldt
>> Cc: de...@list.opensaf.org; opensaf-devel@lists.sourceforge.net
>> Subject: Re: [PATCH 3 of 3] dtm: make TCP_NODELAY set socket option 
>> configurable [#654]
>>
>> Hi Hans,
>>
>> On 5/20/2014 1:03 AM, Hans Feldt wrote:
>>> Not sure I understand the use case for setting NODELAY to false. In the 
>>> code you have a comment about setting it to true which
>> anyway is the default. So what is the benefit of setting it to false?
>> [AVM]I  came across  one requirement where ,user don't want much
>> frequent traffic in network , and doesn't bother about performance .
> [Hans] Nagle is at core about tradeoffs between network utilization and 
> latency. In my opinion the MW should have as predictable latencies as 
> possible. Can we take responsibility for MW characteristics with 
> NODELAY=False?
>
>>> Since dtm is a system wide messaging service it will affect all services 
>>> for example AMFs healtchecks can be delayed sitting in a
>> queue somewhere...
>>[AVM]  Opnesaf  was working  NODELAY to false previously , we have
>> introduced NODELAY to true recently , so their wont be any problem.
> [Hans] I doubt it was properly tested in production
>
> But if you have strong requirement for this fine. But you must describe why 
> anyone ever wanted to enable Nagle
>
> Thanks,
> Hans
>
>> -AVM
>>
>>> Thanks,
>>> Hans
>>>
 -Original Message-
 From: mahesh.va...@oracle.com [mailto:mahesh.va...@oracle.com]
 Sent: den 19 maj 2014 06:41
 To: de...@list.opensaf.org; Hans Feldt
 Cc: opensaf-devel@lists.sourceforge.net
 Subject: [PATCH 3 of 3] dtm: make TCP_NODELAY set socket option 
 configurable [#654]

osaf/services/infrastructure/dtms/config/dtmd.conf   |  11 
osaf/services/infrastructure/dtms/dtm/dtm_node_sockets.c |  36 
 
osaf/services/infrastructure/dtms/dtm/dtm_read_config.c  |  15 ++
osaf/services/infrastructure/dtms/include/dtm_cb.h   |   1 +
4 files changed, 45 insertions(+), 18 deletions(-)


 Made TCP_NODELAY configurable to specifi whether MDS TCP
 transport should follow the Nagle algorithm for deciding when to send 
 data.By
 default, TCP will follow the Nagle algorithm is disable this behavior , 
 can be
 configured to TRUE(1) or FALSE(0)

 Note :This path is on top of Hans Feldt published   on 4-29-2014 10:58 AM.

 diff --git a/osaf/services/infrastructure/dtms/config/dtmd.conf 
 b/osaf/services/infrastructure/dtms/config/dtmd.conf
 --- a/osaf/services/infrastructure/dtms/config/dtmd.conf
 +++ b/osaf/services/infrastructure/dtms/config/dtmd.conf
 @@ -85,3 +85,14 @@ DTM_TCP_KEEPALIVE_PROBES=2
#export MDS_SOCK_SND_RCV_BUF_SIZE=16777216
DTM_SOCK_SND_RCV_BUF_SIZE=126976

 +#
 +#Specifies whether MDS TCP transport should follow the Nagle algorithm for
 +#deciding when to send data.By default, TCP will follow the Nagle 
 algorithm is disable
 +#this behavior,MDS TCP transport can enable TCP_NODELAY to force
 +#TCP to always send data immediately.For example,TCP_NODELAY should be 
 used
 +#when there is an application using TCP for a request/response.
 +#This option is only supported for sockets with an address family
 +#of AF_INET or AF_INET6 (internode )and type of SOCK_STREAM
 +# TRUE(1) or FALSE(0)
 +DTM_TCP_NODELAY_FLAG=1
 +
 diff --git a/osaf/services/infrastructure/dtms/dtm/dtm_node_sockets.c
 b/osaf/services/infrastructure/dtms/dtm/dtm_node_sockets.c
 --- a/osaf/services/infrastructure/dtms/dtm/dtm_node_sockets.c
 +++ b/osaf/services/infrastructure/dtms/dtm/dtm_node_sockets.c
 @@ -577,12 +577,12 @@ int comm_socket_setup_new(DTM_INTERNODE_
goto done;
}

 -int flag = 1;
 -if (setsockopt(sock_desc, IPPROTO_TCP, TCP_NODELAY, (void 
 *)&flag, sizeof(flag)) != 0) {
 -LOG_ER("DTM:Socket TCP_NODELAY set failed err :%s", 
 strerror(errno));
 -dtm_comm_socket_close(&sock_desc);
 -goto done;
 -}
 +  int flag = dtms_cb->tcp_nodelay_flag;
 +  if (setsockopt(sock_desc, IPPROTO_TCP, TCP_NODELAY, (void *)&flag, 
 sizeof(flag)) != 0) {
 +  LOG_ER("DTM:Socket TCP_NODELAY set failed err :%s", 
 strerror(errno));
 +  dtm_comm_socket_close(&sock_desc);
 +  goto done;
 +  }

  

[devel] [PATCH 1 of 1] cpnd: increase performance when creating large numbers of sections [#770]

2014-05-20 Thread Alex Jones
 osaf/libs/common/cpsv/include/cpnd_cb.h   |6 +-
 osaf/libs/common/cpsv/include/cpnd_init.h |   14 +-
 osaf/libs/common/cpsv/include/cpsv_evt.h  |2 +-
 osaf/services/saf/cpsv/cpnd/Makefile.am   |3 +-
 osaf/services/saf/cpsv/cpnd/cpnd_db.c |  246 +
 osaf/services/saf/cpsv/cpnd/cpnd_evt.c|  115 ---
 osaf/services/saf/cpsv/cpnd/cpnd_proc.c   |   26 +-
 osaf/services/saf/cpsv/cpnd/cpnd_res.c|   15 +-
 osaf/services/saf/cpsv/cpnd/cpnd_sec.cc   |  424 ++
 9 files changed, 531 insertions(+), 320 deletions(-)


Jan  7 21:32:32.772347 <1789648919> ERR|MDTM: Frag recd is not next frag so 
dropping adest=<0x010010023922604c>
Jan  7 21:32:32.772399 <1789648919> ERR|MDTM: Message is dropped as msg is 
out of seq TRANSPOR-ID=<0x010010023922604c>
With large numbers of sections (>5k) on the standby the CPU is pegged and all
ckpt API functions return with SA_AIS_ERR_TIMEOUT, including ActiveReplicaSet,
and CheckpointClose!

The section id database is implemented as a linked list.  Each write to a
section must traverse the list in order to find the section.  With 1000's of
sections this takes a looong time.  Also, sync data being sent over is too large
for one packet (30M).  This causes the transport layer (in this case TIPC), to
drop packets.  Lastly, the SectionCreate message is not asynchronous when
ACTIVE_REPLICA is specified.

Solution is in 3 parts:  (1) make the section id database a C++ STL map for fast
access.  (2) make MAX_SYNC_TRANSFER_SIZE much smaller: 3M instead of 30M.
(3) SectionCreate message should be asynchronous when ACTIVE_REPLICA is
specified.

diff --git a/osaf/libs/common/cpsv/include/cpnd_cb.h 
b/osaf/libs/common/cpsv/include/cpnd_cb.h
--- a/osaf/libs/common/cpsv/include/cpnd_cb.h
+++ b/osaf/libs/common/cpsv/include/cpnd_cb.h
@@ -23,7 +23,7 @@
 #include "ncs_queue.h"
 
 /* global variables */
-uint32_t gl_cpnd_cb_hdl;
+extern uint32_t gl_cpnd_cb_hdl;
 
 /* macros for the CB handle */
 #define m_CPND_TAKE_CPND_CB  ncshm_take_hdl(NCS_SERVICE_ID_CPND, 
gl_cpnd_cb_hdl)
@@ -131,7 +131,6 @@ typedef struct cpnd_ckpt_section_info {
SaSizeT sec_size;
SaTimeT exp_tmr;
SaTimeT lastUpdate;
-   struct cpnd_ckpt_section_info *prev, *next;
 } CPND_CKPT_SECTION_INFO;
 
 #define CPND_CKPT_SECTION_INFO_NULL ((CPND_CKPT_SECTION_INFO *)0)
@@ -144,7 +143,8 @@ typedef struct cpnd_ckpt_replica_info_ta
SaUint32T mem_used; /* Used for status */
NCS_OS_POSIX_SHM_REQ_INFO open; /* for shm open */
uint32_t *shm_sec_mapping;  /* for validity of sec */
-   CPND_CKPT_SECTION_INFO *section_info;   /* Sections in the shared 
memory */
+   void *section_db;   /* used for C++ STL map */
+   void *local_section_db; /* used for C++ STL map */
 } CPND_CKPT_REPLICA_INFO;
 
 /*Structure to store info for ALL_REPL_WRITE EVT processing*/
diff --git a/osaf/libs/common/cpsv/include/cpnd_init.h 
b/osaf/libs/common/cpsv/include/cpnd_init.h
--- a/osaf/libs/common/cpsv/include/cpnd_init.h
+++ b/osaf/libs/common/cpsv/include/cpnd_init.h
@@ -157,15 +157,21 @@ void cpnd_evt_node_getnext(CPND_CB *cb, 
 uint32_t cpnd_evt_node_add(CPND_CB *cb, CPSV_CPND_ALL_REPL_EVT_NODE *evt_node);
 uint32_t cpnd_evt_node_del(CPND_CB *cb, CPSV_CPND_ALL_REPL_EVT_NODE *evt_node);
 CPND_CKPT_NODE *cpnd_ckpt_node_find_by_name(CPND_CB *cpnd_cb, SaNameT 
ckpt_name);
-CPND_CKPT_SECTION_INFO *cpnd_ckpt_sec_get(CPND_CKPT_NODE *cp_node, 
SaCkptSectionIdT *id);
-CPND_CKPT_SECTION_INFO *cpnd_ckpt_sec_get_create(CPND_CKPT_NODE *cp_node, 
SaCkptSectionIdT *id);
-uint32_t cpnd_ckpt_sec_find(CPND_CKPT_NODE *cp_node, SaCkptSectionIdT *id);
+void cpnd_ckpt_sec_map_init(CPND_CKPT_REPLICA_INFO *replica_info);
+void cpnd_ckpt_sec_map_destroy(CPND_CKPT_REPLICA_INFO *replica_info);
+CPND_CKPT_SECTION_INFO *cpnd_ckpt_sec_get_first(const CPND_CKPT_REPLICA_INFO 
*replicaInfo);
+CPND_CKPT_SECTION_INFO *cpnd_ckpt_sec_get_next(const CPND_CKPT_REPLICA_INFO 
*replicaInfo, const CPND_CKPT_SECTION_INFO *section);
+CPND_CKPT_SECTION_INFO *cpnd_ckpt_sec_get(const CPND_CKPT_NODE *cp_node, const 
SaCkptSectionIdT *id);
+CPND_CKPT_SECTION_INFO *cpnd_ckpt_sec_get_create(const CPND_CKPT_NODE 
*cp_node, const SaCkptSectionIdT *id);
+uint32_t cpnd_ckpt_sec_find(const CPND_CKPT_NODE *cp_node, const 
SaCkptSectionIdT *id);
 CPND_CKPT_SECTION_INFO *cpnd_ckpt_sec_del(CPND_CKPT_NODE *cp_node, 
SaCkptSectionIdT *id);
 CPND_CKPT_SECTION_INFO *cpnd_ckpt_sec_add(CPND_CKPT_NODE *cp_node, 
SaCkptSectionIdT *id, SaTimeT exp_time,
  uint32_t gen_flag);
+uint32_t cpnd_ckpt_sec_add_db(CPND_CKPT_REPLICA_INFO *replicaInfo, 
CPND_CKPT_SECTION_INFO *sectionInfo);
 void cpnd_ckpt_delete_all_sect(CPND_CKPT_NODE *cp_node);
+bool cpnd_ckpt_sec_empty(const CPND_CKPT_REPLICA_INFO *replicaInfo);
 void cpnd_evt_backup_queue_add(CPND_CKPT_NODE *cp_node, CPND_EVT *evt);
-CPND_CKPT_SECTION_INFO *cpnd_get_sect_with_id(CPND_CKPT_NODE *cp_node, 

[devel] [PATCH 0 of 1] Review Request for cpnd: increase performance when creating large numbers of sections [#770]

2014-05-20 Thread Alex Jones
Summary: cpnd: increase performance when creating large numbers of sections 
[#770]
Review request for Trac Ticket(s): 770
Peer Reviewer(s): AVM
Pull request to: AVM
Affected branch(es): default
Development branch: 


Impacted area   Impact y/n

 Docsn
 Build systemn
 RPM/packaging   n
 Configuration files n
 Startup scripts n
 SAF servicesy
 OpenSAF servicesn
 Core libraries  n
 Samples n
 Tests   n
 Other   n


Comments (indicate scope for each "y" above):
-
Please ignore the previous "fat-finger" submission.

changeset 6f9c8080657af783462c52d81137e41038aee6a6
Author: Alex Jones 
Date:   Tue, 20 May 2014 17:03:25 -0400

cpnd: increase performance when creating large numbers of sections 
[#770]

Jan 7 21:32:32.772347 <1789648919> ERR |MDTM: Frag recd is not next frag
so dropping adest=<0x010010023922604c> Jan 7 21:32:32.772399 
<1789648919>
ERR |MDTM: Message is dropped as msg is out of seq TRANSPOR-
ID=<0x010010023922604c> With large numbers of sections (>5k) on the 
standby
the CPU is pegged and all ckpt API functions return with 
SA_AIS_ERR_TIMEOUT,
including ActiveReplicaSet, and CheckpointClose!

The section id database is implemented as a linked list. Each write to a
section must traverse the list in order to find the section. With 
1000's of
sections this takes a looong time. Also, sync data being sent over is 
too
large for one packet (30M). This causes the transport layer (in this 
case
TIPC), to drop packets. Lastly, the SectionCreate message is not
asynchronous when ACTIVE_REPLICA is specified.

Solution is in 3 parts: (1) make the section id database a C++ STL map 
for
fast access. (2) make MAX_SYNC_TRANSFER_SIZE much smaller: 3M instead of
30M. (3) SectionCreate message should be asynchronous when 
ACTIVE_REPLICA is
specified.


Added Files:

 osaf/services/saf/cpsv/cpnd/cpnd_sec.cc


Complete diffstat:
--
 osaf/libs/common/cpsv/include/cpnd_cb.h   |6 +-
 osaf/libs/common/cpsv/include/cpnd_init.h |   14 +-
 osaf/libs/common/cpsv/include/cpsv_evt.h  |2 +-
 osaf/services/saf/cpsv/cpnd/Makefile.am   |3 +-
 osaf/services/saf/cpsv/cpnd/cpnd_db.c |  246 +
 osaf/services/saf/cpsv/cpnd/cpnd_evt.c|  115 ---
 osaf/services/saf/cpsv/cpnd/cpnd_proc.c   |   26 +-
 osaf/services/saf/cpsv/cpnd/cpnd_res.c|   15 +-
 osaf/services/saf/cpsv/cpnd/cpnd_sec.cc   |  424 ++
 9 files changed, 531 insertions(+), 320 deletions(-)


Testing Commands:
-
(1) Create a 6-node setup
(2) Open 5 checkpoints (40k sections for each checkpoint, each section 1k) for
writing, 1 on each active node
(3) On the standby node, open these 5 for reading
(4) Continuously write the active checkpoints as fast as possible.


Testing, Expected Results:
--
The standby opening all 5 checkpoints should be able to keep up with all the
writes (200k sections).  No dropped packets, no timeouts.


Conditions of Submission:
-



Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  y  y
powerpc n  n
powerpc64   n  n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneo

[devel] [PATCH 1 of 1] cpnd: increase performance when creating large numbers of sections [#770]

2014-05-20 Thread Alex Jones
 osaf/libs/common/cpsv/include/cpnd_cb.h   |6 +-
 osaf/libs/common/cpsv/include/cpnd_init.h |   14 +-
 osaf/libs/common/cpsv/include/cpsv_evt.h  |2 +-
 osaf/services/saf/cpsv/cpnd/Makefile.am   |3 +-
 osaf/services/saf/cpsv/cpnd/cpnd_db.c |  246 +
 osaf/services/saf/cpsv/cpnd/cpnd_evt.c|  115 ---
 osaf/services/saf/cpsv/cpnd/cpnd_proc.c   |   26 +-
 osaf/services/saf/cpsv/cpnd/cpnd_res.c|   15 +-
 osaf/services/saf/cpsv/cpnd/cpnd_sec.cc   |  424 ++
 9 files changed, 531 insertions(+), 320 deletions(-)


Jan  7 21:32:32.772347 <1789648919> ERR|MDTM: Frag recd is not next frag so 
dropping adest=<0x010010023922604c>
Jan  7 21:32:32.772399 <1789648919> ERR|MDTM: Message is dropped as msg is 
out of seq TRANSPOR-ID=<0x010010023922604c>
With large numbers of sections (>5k) on the standby the CPU is pegged and all
ckpt API functions return with SA_AIS_ERR_TIMEOUT, including ActiveReplicaSet,
and CheckpointClose!

The section id database is implemented as a linked list.  Each write to a
section must traverse the list in order to find the section.  With 1000's of
sections this takes a looong time.  Also, sync data being sent over is too large
for one packet (30M).  This causes the transport layer (in this case TIPC), to
drop packets.  Lastly, the SectionCreate message is not asynchronous when
ACTIVE_REPLICA is specified.

Solution is in 3 parts:  (1) make the section id database a C++ STL map for fast
access.  (2) make MAX_SYNC_TRANSFER_SIZE much smaller: 3M instead of 30M.
(3) SectionCreate message should be asynchronous when ACTIVE_REPLICA is
specified.

diff --git a/osaf/libs/common/cpsv/include/cpnd_cb.h 
b/osaf/libs/common/cpsv/include/cpnd_cb.h
--- a/osaf/libs/common/cpsv/include/cpnd_cb.h
+++ b/osaf/libs/common/cpsv/include/cpnd_cb.h
@@ -23,7 +23,7 @@
 #include "ncs_queue.h"
 
 /* global variables */
-uint32_t gl_cpnd_cb_hdl;
+extern uint32_t gl_cpnd_cb_hdl;
 
 /* macros for the CB handle */
 #define m_CPND_TAKE_CPND_CB  ncshm_take_hdl(NCS_SERVICE_ID_CPND, 
gl_cpnd_cb_hdl)
@@ -131,7 +131,6 @@ typedef struct cpnd_ckpt_section_info {
SaSizeT sec_size;
SaTimeT exp_tmr;
SaTimeT lastUpdate;
-   struct cpnd_ckpt_section_info *prev, *next;
 } CPND_CKPT_SECTION_INFO;
 
 #define CPND_CKPT_SECTION_INFO_NULL ((CPND_CKPT_SECTION_INFO *)0)
@@ -144,7 +143,8 @@ typedef struct cpnd_ckpt_replica_info_ta
SaUint32T mem_used; /* Used for status */
NCS_OS_POSIX_SHM_REQ_INFO open; /* for shm open */
uint32_t *shm_sec_mapping;  /* for validity of sec */
-   CPND_CKPT_SECTION_INFO *section_info;   /* Sections in the shared 
memory */
+   void *section_db;   /* used for C++ STL map */
+   void *local_section_db; /* used for C++ STL map */
 } CPND_CKPT_REPLICA_INFO;
 
 /*Structure to store info for ALL_REPL_WRITE EVT processing*/
diff --git a/osaf/libs/common/cpsv/include/cpnd_init.h 
b/osaf/libs/common/cpsv/include/cpnd_init.h
--- a/osaf/libs/common/cpsv/include/cpnd_init.h
+++ b/osaf/libs/common/cpsv/include/cpnd_init.h
@@ -157,15 +157,21 @@ void cpnd_evt_node_getnext(CPND_CB *cb, 
 uint32_t cpnd_evt_node_add(CPND_CB *cb, CPSV_CPND_ALL_REPL_EVT_NODE *evt_node);
 uint32_t cpnd_evt_node_del(CPND_CB *cb, CPSV_CPND_ALL_REPL_EVT_NODE *evt_node);
 CPND_CKPT_NODE *cpnd_ckpt_node_find_by_name(CPND_CB *cpnd_cb, SaNameT 
ckpt_name);
-CPND_CKPT_SECTION_INFO *cpnd_ckpt_sec_get(CPND_CKPT_NODE *cp_node, 
SaCkptSectionIdT *id);
-CPND_CKPT_SECTION_INFO *cpnd_ckpt_sec_get_create(CPND_CKPT_NODE *cp_node, 
SaCkptSectionIdT *id);
-uint32_t cpnd_ckpt_sec_find(CPND_CKPT_NODE *cp_node, SaCkptSectionIdT *id);
+void cpnd_ckpt_sec_map_init(CPND_CKPT_REPLICA_INFO *replica_info);
+void cpnd_ckpt_sec_map_destroy(CPND_CKPT_REPLICA_INFO *replica_info);
+CPND_CKPT_SECTION_INFO *cpnd_ckpt_sec_get_first(const CPND_CKPT_REPLICA_INFO 
*replicaInfo);
+CPND_CKPT_SECTION_INFO *cpnd_ckpt_sec_get_next(const CPND_CKPT_REPLICA_INFO 
*replicaInfo, const CPND_CKPT_SECTION_INFO *section);
+CPND_CKPT_SECTION_INFO *cpnd_ckpt_sec_get(const CPND_CKPT_NODE *cp_node, const 
SaCkptSectionIdT *id);
+CPND_CKPT_SECTION_INFO *cpnd_ckpt_sec_get_create(const CPND_CKPT_NODE 
*cp_node, const SaCkptSectionIdT *id);
+uint32_t cpnd_ckpt_sec_find(const CPND_CKPT_NODE *cp_node, const 
SaCkptSectionIdT *id);
 CPND_CKPT_SECTION_INFO *cpnd_ckpt_sec_del(CPND_CKPT_NODE *cp_node, 
SaCkptSectionIdT *id);
 CPND_CKPT_SECTION_INFO *cpnd_ckpt_sec_add(CPND_CKPT_NODE *cp_node, 
SaCkptSectionIdT *id, SaTimeT exp_time,
  uint32_t gen_flag);
+uint32_t cpnd_ckpt_sec_add_db(CPND_CKPT_REPLICA_INFO *replicaInfo, 
CPND_CKPT_SECTION_INFO *sectionInfo);
 void cpnd_ckpt_delete_all_sect(CPND_CKPT_NODE *cp_node);
+bool cpnd_ckpt_sec_empty(const CPND_CKPT_REPLICA_INFO *replicaInfo);
 void cpnd_evt_backup_queue_add(CPND_CKPT_NODE *cp_node, CPND_EVT *evt);
-CPND_CKPT_SECTION_INFO *cpnd_get_sect_with_id(CPND_CKPT_NODE *cp_node, 

[devel] [PATCH 0 of 1] Review Request for test

2014-05-20 Thread Alex Jones
Summary: 
Review request for Trac Ticket(s): <>
Peer Reviewer(s): <>
Pull request to: <>
Affected branch(es): <>
Development branch: <>


Impacted area   Impact y/n

 Docsn
 Build systemn
 RPM/packaging   n
 Configuration files n
 Startup scripts n
 SAF servicesn
 OpenSAF servicesn
 Core libraries  n
 Samples n
 Tests   n
 Other   n


Comments (indicate scope for each "y" above):
-
 <>

changeset 6f9c8080657af783462c52d81137e41038aee6a6
Author: Alex Jones 
Date:   Tue, 20 May 2014 17:03:25 -0400

cpnd: increase performance when creating large numbers of sections 
[#770]

Jan 7 21:32:32.772347 <1789648919> ERR |MDTM: Frag recd is not next frag
so dropping adest=<0x010010023922604c> Jan 7 21:32:32.772399 
<1789648919>
ERR |MDTM: Message is dropped as msg is out of seq TRANSPOR-
ID=<0x010010023922604c> With large numbers of sections (>5k) on the 
standby
the CPU is pegged and all ckpt API functions return with 
SA_AIS_ERR_TIMEOUT,
including ActiveReplicaSet, and CheckpointClose!

The section id database is implemented as a linked list. Each write to a
section must traverse the list in order to find the section. With 
1000's of
sections this takes a looong time. Also, sync data being sent over is 
too
large for one packet (30M). This causes the transport layer (in this 
case
TIPC), to drop packets. Lastly, the SectionCreate message is not
asynchronous when ACTIVE_REPLICA is specified.

Solution is in 3 parts: (1) make the section id database a C++ STL map 
for
fast access. (2) make MAX_SYNC_TRANSFER_SIZE much smaller: 3M instead of
30M. (3) SectionCreate message should be asynchronous when 
ACTIVE_REPLICA is
specified.


Added Files:

 osaf/services/saf/cpsv/cpnd/cpnd_sec.cc


Complete diffstat:
--
 osaf/libs/common/cpsv/include/cpnd_cb.h   |6 +-
 osaf/libs/common/cpsv/include/cpnd_init.h |   14 +-
 osaf/libs/common/cpsv/include/cpsv_evt.h  |2 +-
 osaf/services/saf/cpsv/cpnd/Makefile.am   |3 +-
 osaf/services/saf/cpsv/cpnd/cpnd_db.c |  246 +
 osaf/services/saf/cpsv/cpnd/cpnd_evt.c|  115 ---
 osaf/services/saf/cpsv/cpnd/cpnd_proc.c   |   26 +-
 osaf/services/saf/cpsv/cpnd/cpnd_res.c|   15 +-
 osaf/services/saf/cpsv/cpnd/cpnd_sec.cc   |  424 ++
 9 files changed, 531 insertions(+), 320 deletions(-)


Testing Commands:
-
 <>


Testing, Expected Results:
--
 <>


Conditions of Submission:
-
 <>


Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  n  n
powerpc n  n
powerpc64   n  n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
co

[devel] [PATCH 1 of 1] imm: fix loading the first value of a double type attribute in immload [#53]

2014-05-20 Thread Zoran Milinkovic
 osaf/services/saf/immsv/immloadd/imm_pbe_load.cc |  80 +--
 1 files changed, 57 insertions(+), 23 deletions(-)


When immload loads data from PBE, in the first loaded value of a double type 
attribute, the floating-point precision is not fully supported.
The patch provides the fix for correct loading the first value of a double type 
attribute.

diff --git a/osaf/services/saf/immsv/immloadd/imm_pbe_load.cc 
b/osaf/services/saf/immsv/immloadd/imm_pbe_load.cc
--- a/osaf/services/saf/immsv/immloadd/imm_pbe_load.cc
+++ b/osaf/services/saf/immsv/immloadd/imm_pbe_load.cc
@@ -436,9 +436,7 @@ bool loadObjectFromPbe(void* pbeHandle, 
sqlite3* dbHandle = (sqlite3 *) pbeHandle;
sqlite3_stmt *stmt = NULL;
int rc=0;
-   char **resultF=NULL;
char *zErr=NULL;
-   int nrows=0;
int ncols=0;
int c;
std::string sqlF("select \"");
@@ -486,33 +484,44 @@ bool loadObjectFromPbe(void* pbeHandle, 
 
TRACE("GENERATED F:%s", sqlF.c_str());
 
-   rc = sqlite3_get_table(dbHandle, sqlF.c_str(), &resultF, &nrows,
-   &ncols, &zErr);
-   if(rc) {
+   rc = sqlite3_prepare_v2(dbHandle, sqlF.c_str(), -1, &stmt, NULL);
+   if(rc != SQLITE_OK) {
+   LOG_IN("Failed to prepare SQL statement");
+   goto bailout;
+   }
+
+   rc = sqlite3_step(stmt);
+   if(rc != SQLITE_ROW && rc != SQLITE_DONE) {
LOG_IN("Could not access table '%s', error:%s",
class_info->className.c_str(), zErr);
sqlite3_free(zErr);
goto bailout;
}
-   if(nrows != 1) {
-   LOG_ER("Expected 1 row got %u rows", nrows);
+
+   if(rc == SQLITE_DONE) {
+   LOG_ER("Expected 1 row got 0 rows");
goto bailout;
}
+
+   ncols = sqlite3_column_count(stmt);
TRACE_2("Successfully accessed '%s' table. cols:%u",
class_info->className.c_str(), ncols);
 
+   const unsigned char *res;
for(c=0; c attrValueBuffers;
+   res = sqlite3_column_text(stmt, c);
+   if(res) {
SaImmValueTypeT attrType = (SaImmValueTypeT) 0;
-   size_t len = strlen(resultF[ncols+c]);
-   char * str = (char *) malloc(len+1);
-   strncpy(str, (const char *) resultF[ncols+c], len);
-   str[len] = '\0';
-   attrValueBuffers.push_front(str);
+   const char *colname = sqlite3_column_name(stmt, c);
+   assert(colname != NULL);
+
+   if((strcmp(colname, "SaImmAttrImplementerName") == 0) &&
+   (class_info->class_category == 
SA_IMM_CLASS_CONFIG))
+   continue;
+
it = class_info->attrInfoVector.begin();
while(it != class_info->attrInfoVector.end()) {
-   if((*it)->attrName == std::string(resultF[c]))
+   if((*it)->attrName == std::string(colname))
{
attrType = (*it)->attrValueType;
break;
@@ -520,17 +529,42 @@ bool loadObjectFromPbe(void* pbeHandle, 
++it;
}
assert(it != class_info->attrInfoVector.end());
-   if((strcmp(resultF[c], "SaImmAttrImplementerName") == 
0) && 
-   (class_info->class_category == 
SA_IMM_CLASS_CONFIG))
-   continue;
-   
-   addObjectAttributeDefinition((char *) 
-   class_info->className.c_str(), 
-   resultF[c], &attrValueBuffers, 
+
+   char *val;
+   if(attrType == SA_IMM_ATTR_SADOUBLET) {
+   double dbl = sqlite3_column_double(stmt, c);
+
+   val = (char *)malloc(30);
+   int size = snprintf(val, 30, "%.17g", dbl);
+   size++;
+   if(size > 30) {
+   val = (char *)realloc(val, size);
+   snprintf(val, size, "%.17g", dbl);
+   }
+   } else {
+   val = strdup((const char *)res);
+   }
+
+   std::list attrValueBuffers;
+   attrValueBuffers.push_front(val);
+
+   addObjectAttributeDefinition((char *)
+   class_info->className.c_str(),
+   (SaImmAttrNameT)colname, &attrValueBuffer

[devel] [PATCH 0 of 1] Review Request for imm: fix loading the first value of a double type attribute in immload [#53]

2014-05-20 Thread Zoran Milinkovic
Summary: imm: fix loading the first value of a double type attribute in immload 
[#53]
Review request for Trac Ticket(s): 53
Peer Reviewer(s): Neelakanta
Pull request to: Zoran
Affected branch(es): default(4.5)
Development branch: default(4.5)


Impacted area   Impact y/n

 Docsn
 Build systemn
 RPM/packaging   n
 Configuration files n
 Startup scripts n
 SAF servicesy
 OpenSAF servicesn
 Core libraries  n
 Samples n
 Tests   n
 Other   n


Comments (indicate scope for each "y" above):
-

changeset 49409a21fed3a54a1690d59ac9fd7db421ad83b8
Author: Zoran Milinkovic 
Date:   Tue, 20 May 2014 17:33:02 +0200

imm: fix loading the first value of a double type attribute in immload 
[#53]

When immload loads data from PBE, in the first loaded value of a double 
type
attribute, the floating-point precision is not fully supported. The 
patch
provides the fix for correct loading the first value of a double type
attribute.


Complete diffstat:
--
 osaf/services/saf/immsv/immloadd/imm_pbe_load.cc |  80 
+---
 1 files changed, 57 insertions(+), 23 deletions(-)


Testing Commands:
-


Testing, Expected Results:
--
Test multivalue and non-mulitvalue double type attributes with enough long 
mantissa before and after cluster reboot.
Double values of attributes after the cluster reboot must be the same as double 
values before the cluster reboot.
PBE must be enabled.


Conditions of Submission:
-
Ack from Neelakanta


Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  n  n
powerpc n  n
powerpc64   n  n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)

___ Your computer have a badly configured date and time; confusing the
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.


--
"Accelerate Dev Cycles with Automated Cross-Browser Testing - For FREE
Instantly run your Selenium tests across 300+ browser/OS combos.
Get unparalleled scalability from the best Selenium testing platform available
Simple to use. Nothing to install. Get started now for free."
http://p.sf.net/sfu/SauceLabs
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.source

Re: [devel] [PATCH 0 of 1] Review Request for immpbe: unlink any /tmp/imm.db.xxxx file and journal when exiting on error [#869]

2014-05-20 Thread Anders Björnerstedt
Ok, hanks for the review and comment.
I guess it is the journal file in this case that is left behind.
If it is a trivial fix then I will add it before pushing. 

/AndersBj
 

-Original Message-
From: Neelakanta Reddy [mailto:reddy.neelaka...@oracle.com] 
Sent: den 20 maj 2014 16:29
To: Anders Björnerstedt; Zoran Milinkovic
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 0 of 1] Review Request for immpbe: unlink any 
/tmp/imm.db. file and journal when exiting on error [#869]

Hi AndersBj,

Reviewed and tested the patch.
Ack with following comments:

  immdump -p /tmp/dump.db

In the immdump case, the temporary file created because of immdump is not 
unlinked.

This is because "IMMSV_PBE_TMP_DIR" is not exported for immdump.

If the defect scope is not immdump, then a new ticket has to be created.

/Neel.

On Friday 09 May 2014 06:09 PM, Anders Bjornerstedt wrote:
> Summary: immpbe: unlink any /tmp/imm.db. file and journal when 
> exiting on error [#869] Review request for Trac Ticket(s): 869 Peer 
> Reviewer(s): Neel; Zoran Pull request to:
> Affected branch(es): 4.3; 4.4; default(4.5) Development branch:
>
> 
> Impacted area   Impact y/n
> 
>   Docsn
>   Build systemn
>   RPM/packaging   n
>   Configuration files n
>   Startup scripts n
>   SAF servicesn
>   OpenSAF servicesn
>   Core libraries  n
>   Samples n
>   Tests   n
>   Other   n
>
>
> Comments (indicate scope for each "y" above):
> -
>
> changeset c71ace4de3807588089ad78ec00f5a9db1a9bca3
> Author:   Anders Bjornerstedt 
> Date: Fri, 09 May 2014 14:18:13 +0200
>
>   immpbe: unlink any /tmp/imm.db. file and journal when exiting on 
> error
>   [#869]
>
>   Symptoms of the problem are dropped imm.db.xx files, mkstemp(3), and
>   imm.db.xx-journal files. The journal files are actually consistently
>   dropped as empty files even on successfull generation of imm.db.
>
>   The cause is simply an omission to unlink the temp file when exiting on
>   error and the fact that that sqlite never unlinks the journal file when
>   journaling mode is TRUNCATE. The journaling mode was altered for PBE a 
> few
>   years ago.
>
>   Solution is to catch all cases of controlled exit of PBE during the
>   generation of the temporary imm.db file and unlink these files before
>   exiting. This ticket does not take care of cleaning up files dropped by 
> an
>   uncontrolled crash of PBE. To solve that problem safely and reliably we 
> need
>   to create all temporary imm.db files in a sub directory to the 
> configured
>   IMMSV_PBE_TMP_DIR in immnd.conf. That is a change of file representation
>   that is more risky, needs more righorous sytem testing and will thus be
>   added as an enhancement.
>
>
> Complete diffstat:
> --
>   osaf/libs/common/immsv/immpbe_dump.cc|  66 
> ++
>   osaf/libs/common/immsv/include/immpbe_dump.hh|  10 +-
>   osaf/services/saf/immsv/immpbed/immpbe.cc|  32 
> +++-
>   osaf/services/saf/immsv/immpbed/immpbe_daemon.cc |   8 ++--
>   osaf/tools/safimm/immdump/imm_dumper.cc  |  19 +++
>   5 files changed, 91 insertions(+), 44 deletions(-)
>
>
> Testing Commands:
> -
> Actually testing dropped tmp files requires fault injection to the PBE.
> The patch also fixes the removval of the droped empty journal file.
> This is easy to check.
>
>
> Testing, Expected Results:
> --
> Without this patch, the empty imm.db.XX-journal file is always 
> droped in the /tmp direcotry (or the directory configured for tmp 
> files). Alaso on any case of controlledexit of PBE during generation 
> of imm.db.XX temprary file will drop that file in the sam tmp directory.
>
> With this patch, no imm.db files should be dropped in the tmp 
> directory except possibly after uncontrolled crash of PBE. 
> Implementation of cleanup of tmp files dropped after crash  will be tracked 
> by a separate enhancement ticket.
> This is a defect ticket.
>
>
> Conditions of Submission:
> -
> Ack from Neel and Zoran.
>
>
> Arch  Built StartedLinux distro
> ---
> mipsn  n
> mips64  n  n
> x86 n  n
> x86_64  n  n
> powerpc n  n
> powerpc64   n  n
>
>
> Reviewer Checklist:
> ---
> [Submitters: make sure that your review doesn't trigger any 
> checkmarks!]
>
>
> Your checkin has not passed review because (see checked entries):
>
> ___ Your RR template is generally incomp

Re: [devel] [PATCH 0 of 1] Review Request for immpbe: unlink any /tmp/imm.db.xxxx file and journal when exiting on error [#869]

2014-05-20 Thread Neelakanta Reddy
Hi AndersBj,

Reviewed and tested the patch.
Ack with following comments:

  immdump -p /tmp/dump.db

In the immdump case, the temporary file created because of immdump is 
not unlinked.

This is because "IMMSV_PBE_TMP_DIR" is not exported for immdump.

If the defect scope is not immdump, then a new ticket has to be created.

/Neel.

On Friday 09 May 2014 06:09 PM, Anders Bjornerstedt wrote:
> Summary: immpbe: unlink any /tmp/imm.db. file and journal when exiting on 
> error [#869]
> Review request for Trac Ticket(s): 869
> Peer Reviewer(s): Neel; Zoran
> Pull request to:
> Affected branch(es): 4.3; 4.4; default(4.5)
> Development branch:
>
> 
> Impacted area   Impact y/n
> 
>   Docsn
>   Build systemn
>   RPM/packaging   n
>   Configuration files n
>   Startup scripts n
>   SAF servicesn
>   OpenSAF servicesn
>   Core libraries  n
>   Samples n
>   Tests   n
>   Other   n
>
>
> Comments (indicate scope for each "y" above):
> -
>
> changeset c71ace4de3807588089ad78ec00f5a9db1a9bca3
> Author:   Anders Bjornerstedt 
> Date: Fri, 09 May 2014 14:18:13 +0200
>
>   immpbe: unlink any /tmp/imm.db. file and journal when exiting on 
> error
>   [#869]
>
>   Symptoms of the problem are dropped imm.db.xx files, mkstemp(3), and
>   imm.db.xx-journal files. The journal files are actually consistently
>   dropped as empty files even on successfull generation of imm.db.
>
>   The cause is simply an omission to unlink the temp file when exiting on
>   error and the fact that that sqlite never unlinks the journal file when
>   journaling mode is TRUNCATE. The journaling mode was altered for PBE a 
> few
>   years ago.
>
>   Solution is to catch all cases of controlled exit of PBE during the
>   generation of the temporary imm.db file and unlink these files before
>   exiting. This ticket does not take care of cleaning up files dropped by 
> an
>   uncontrolled crash of PBE. To solve that problem safely and reliably we 
> need
>   to create all temporary imm.db files in a sub directory to the 
> configured
>   IMMSV_PBE_TMP_DIR in immnd.conf. That is a change of file representation
>   that is more risky, needs more righorous sytem testing and will thus be
>   added as an enhancement.
>
>
> Complete diffstat:
> --
>   osaf/libs/common/immsv/immpbe_dump.cc|  66 
> ++
>   osaf/libs/common/immsv/include/immpbe_dump.hh|  10 +-
>   osaf/services/saf/immsv/immpbed/immpbe.cc|  32 
> +++-
>   osaf/services/saf/immsv/immpbed/immpbe_daemon.cc |   8 ++--
>   osaf/tools/safimm/immdump/imm_dumper.cc  |  19 +++
>   5 files changed, 91 insertions(+), 44 deletions(-)
>
>
> Testing Commands:
> -
> Actually testing dropped tmp files requires fault injection to the PBE.
> The patch also fixes the removval of the droped empty journal file.
> This is easy to check.
>
>
> Testing, Expected Results:
> --
> Without this patch, the empty imm.db.XX-journal file is always droped in 
> the
> /tmp direcotry (or the directory configured for tmp files). Alaso on any case
> of controlledexit of PBE during generation of imm.db.XX temprary file will
> drop that file in the sam tmp directory.
>
> With this patch, no imm.db files should be dropped in the tmp directory except
> possibly after uncontrolled crash of PBE. Implementation of cleanup of tmp 
> files
> dropped after crash  will be tracked by a separate enhancement ticket.
> This is a defect ticket.
>
>
> Conditions of Submission:
> -
> Ack from Neel and Zoran.
>
>
> Arch  Built StartedLinux distro
> ---
> mipsn  n
> mips64  n  n
> x86 n  n
> x86_64  n  n
> powerpc n  n
> powerpc64   n  n
>
>
> Reviewer Checklist:
> ---
> [Submitters: make sure that your review doesn't trigger any checkmarks!]
>
>
> Your checkin has not passed review because (see checked entries):
>
> ___ Your RR template is generally incomplete; it has too many blank entries
>  that need proper data filled in.
>
> ___ You have failed to nominate the proper persons for review and push.
>
> ___ Your patches do not have proper short+long header
>
> ___ You have grammar/spelling in your header that is unacceptable.
>
> ___ You have exceeded a sensible line length in your headers/comments/text.
>
> ___ You have failed to put in a proper Trac Ticket # into your commits.
>
> ___ You have incorrectly put/left internal data in your comments

Re: [devel] [PATCH 3 of 3] dtm: make TCP_NODELAY set socket option configurable [#654]

2014-05-20 Thread Hans Feldt


> -Original Message-
> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
> Sent: den 20 maj 2014 09:02
> To: Hans Feldt
> Cc: de...@list.opensaf.org; opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 3 of 3] dtm: make TCP_NODELAY set socket option 
> configurable [#654]
> 
> Hi Hans,
> 
> On 5/20/2014 1:03 AM, Hans Feldt wrote:
> > Not sure I understand the use case for setting NODELAY to false. In the 
> > code you have a comment about setting it to true which
> anyway is the default. So what is the benefit of setting it to false?
> [AVM]I  came across  one requirement where ,user don't want much
> frequent traffic in network , and doesn't bother about performance .
[Hans] Nagle is at core about tradeoffs between network utilization and 
latency. In my opinion the MW should have as predictable latencies as possible. 
Can we take responsibility for MW characteristics with NODELAY=False?

> > Since dtm is a system wide messaging service it will affect all services 
> > for example AMFs healtchecks can be delayed sitting in a
> queue somewhere...
>   [AVM]  Opnesaf  was working  NODELAY to false previously , we have
> introduced NODELAY to true recently , so their wont be any problem.
[Hans] I doubt it was properly tested in production

But if you have strong requirement for this fine. But you must describe why 
anyone ever wanted to enable Nagle

Thanks,
Hans

> 
> -AVM
> 
> >
> > Thanks,
> > Hans
> >
> >> -Original Message-
> >> From: mahesh.va...@oracle.com [mailto:mahesh.va...@oracle.com]
> >> Sent: den 19 maj 2014 06:41
> >> To: de...@list.opensaf.org; Hans Feldt
> >> Cc: opensaf-devel@lists.sourceforge.net
> >> Subject: [PATCH 3 of 3] dtm: make TCP_NODELAY set socket option 
> >> configurable [#654]
> >>
> >>   osaf/services/infrastructure/dtms/config/dtmd.conf   |  11 
> >>   osaf/services/infrastructure/dtms/dtm/dtm_node_sockets.c |  36 
> >> 
> >>   osaf/services/infrastructure/dtms/dtm/dtm_read_config.c  |  15 ++
> >>   osaf/services/infrastructure/dtms/include/dtm_cb.h   |   1 +
> >>   4 files changed, 45 insertions(+), 18 deletions(-)
> >>
> >>
> >> Made TCP_NODELAY configurable to specifi whether MDS TCP
> >> transport should follow the Nagle algorithm for deciding when to send 
> >> data.By
> >> default, TCP will follow the Nagle algorithm is disable this behavior , 
> >> can be
> >> configured to TRUE(1) or FALSE(0)
> >>
> >> Note :This path is on top of Hans Feldt published   on 4-29-2014 10:58 AM.
> >>
> >> diff --git a/osaf/services/infrastructure/dtms/config/dtmd.conf 
> >> b/osaf/services/infrastructure/dtms/config/dtmd.conf
> >> --- a/osaf/services/infrastructure/dtms/config/dtmd.conf
> >> +++ b/osaf/services/infrastructure/dtms/config/dtmd.conf
> >> @@ -85,3 +85,14 @@ DTM_TCP_KEEPALIVE_PROBES=2
> >>   #export MDS_SOCK_SND_RCV_BUF_SIZE=16777216
> >>   DTM_SOCK_SND_RCV_BUF_SIZE=126976
> >>
> >> +#
> >> +#Specifies whether MDS TCP transport should follow the Nagle algorithm for
> >> +#deciding when to send data.By default, TCP will follow the Nagle 
> >> algorithm is disable
> >> +#this behavior,MDS TCP transport can enable TCP_NODELAY to force
> >> +#TCP to always send data immediately.For example,TCP_NODELAY should be 
> >> used
> >> +#when there is an application using TCP for a request/response.
> >> +#This option is only supported for sockets with an address family
> >> +#of AF_INET or AF_INET6 (internode )and type of SOCK_STREAM
> >> +# TRUE(1) or FALSE(0)
> >> +DTM_TCP_NODELAY_FLAG=1
> >> +
> >> diff --git a/osaf/services/infrastructure/dtms/dtm/dtm_node_sockets.c
> >> b/osaf/services/infrastructure/dtms/dtm/dtm_node_sockets.c
> >> --- a/osaf/services/infrastructure/dtms/dtm/dtm_node_sockets.c
> >> +++ b/osaf/services/infrastructure/dtms/dtm/dtm_node_sockets.c
> >> @@ -577,12 +577,12 @@ int comm_socket_setup_new(DTM_INTERNODE_
> >>goto done;
> >>}
> >>
> >> -int flag = 1;
> >> -if (setsockopt(sock_desc, IPPROTO_TCP, TCP_NODELAY, (void 
> >> *)&flag, sizeof(flag)) != 0) {
> >> -LOG_ER("DTM:Socket TCP_NODELAY set failed err :%s", 
> >> strerror(errno));
> >> -dtm_comm_socket_close(&sock_desc);
> >> -goto done;
> >> -}
> >> +  int flag = dtms_cb->tcp_nodelay_flag;
> >> +  if (setsockopt(sock_desc, IPPROTO_TCP, TCP_NODELAY, (void *)&flag, 
> >> sizeof(flag)) != 0) {
> >> +  LOG_ER("DTM:Socket TCP_NODELAY set failed err :%s", 
> >> strerror(errno));
> >> +  dtm_comm_socket_close(&sock_desc);
> >> +  goto done;
> >> +  }
> >>
> >>if (NCSCC_RC_SUCCESS != set_keepalive(dtms_cb, sock_desc)) {
> >>LOG_ER("DTM :set_keepalive failed ");
> >> @@ -727,12 +727,12 @@ uint32_t dtm_stream_nonblocking_listener
> >>LOG_ER("DTM:Socket snd buf size set failed err :%s", 
> >> strerror(errno));
> >>}
> >>
> >> -int flag = 1;
> >> -if (setsockopt(dtms_cb->stream_sock, IPPROTO_TCP, TCP_NODE

[devel] [PATCH 0 of 1] Review Request for imm: Add upgrade flag for OpenSAF4.5 in opensafImmNostdFlags [#842]

2014-05-20 Thread Anders Bjornerstedt
Summary: imm: Add upgrade flag for OpenSAF4.5 in opensafImmNostdFlags [#842]
Review request for Trac Ticket(s): 842
Peer Reviewer(s): Neel; Zoran
Pull request to: 
Affected branch(es): default(4.5)
Development branch: 


Impacted area   Impact y/n

 Docsn
 Build systemn
 RPM/packaging   n
 Configuration files n
 Startup scripts n
 SAF servicesn
 OpenSAF servicesy
 Core libraries  n
 Samples n
 Tests   n
 Other   n


Comments (indicate scope for each "y" above):
-

changeset e8d651611b7260f371e50f3aa19e717fe23c415b
Author: Anders Bjornerstedt 
Date:   Tue, 20 May 2014 11:07:54 +0200

imm: Add upgrade flag for OpenSAF4.5 in opensafImmNostdFlags [#842]

New message types and new server states have been added to the immsv in
OpenSAF 4.5. During upgrades from older OpenSAF releases to the OpenSAF 
4.5
release, usage of the new message types and new states would cause 
problems
for the mixed version cluster.

The OpensAF IMM uses the opensafImmNostdFlags in the imm service object:
'opensafImm=opensafImm,safApp=safImmService' for managing such 
transient and
special functionality. This changeset allocates bit 5 of the bitvector 
for
this purpose. See osaf/services/saf/immsv/README for details.

This patch only adds support for the flag and auto enablement of the flag at
cluster restart. Actual usage of the new flag (bit 5) will be added as a part
for each 4.5 ticket that needs upgade protection.


Complete diffstat:
--
 osaf/libs/common/immsv/include/immsv_api.h |   1 +
 osaf/services/saf/immsv/README |  46 
--
 osaf/services/saf/immsv/immnd/ImmModel.cc  |  31 
+++
 osaf/services/saf/immsv/immnd/ImmModel.hh  |   1 +
 osaf/services/saf/immsv/immnd/immnd_init.h |   1 +
 5 files changed, 74 insertions(+), 6 deletions(-)


Testing Commands:
-
After cluster start check the value of the opensafImmNostdFlags RTA in
the object: 'opensafImm=opensafImm,safApp=safImmService'.

immlist opensafImm=opensafImm,safApp=safImmService


Testing, Expected Results:
--

After cluster start the opensafImmNostdFlags of the object: 
'opensafImm=opensafImm,safApp=safImmService' should have bits 2, 3 and 5 
turned on. That would be the bits controlling:

Bit 2 controls OpenSAF4.1 protocols allowed or not.
Bit 3 controls OpenSAF4.3 protocols allowed or not.
Bit 5 controls OpenSAF4.5 protocols allowed or not.

This ticket only tracks the addition of support for the 4.5 protocol bit.
Actually testing the usage/sensitivity to the bit will be part of each
relevant 4.5 enhancement ticket. 
I will re-open relevant tickets for this accordingly.


Conditions of Submission:
-
Ack from Neel.


Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  n  n
powerpc n  n
powerpc64   n  n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for 

[devel] [PATCH 1 of 1] imm: Add upgrade flag for OpenSAF4.5 in opensafImmNostdFlags [#842]

2014-05-20 Thread Anders Bjornerstedt
 osaf/libs/common/immsv/include/immsv_api.h |   1 +
 osaf/services/saf/immsv/README |  46 ++---
 osaf/services/saf/immsv/immnd/ImmModel.cc  |  31 
 osaf/services/saf/immsv/immnd/ImmModel.hh  |   1 +
 osaf/services/saf/immsv/immnd/immnd_init.h |   1 +
 5 files changed, 74 insertions(+), 6 deletions(-)


New message types and new server states have been added to the immsv
in OpenSAF 4.5. During upgrades from older OpenSAF releases to the
OpenSAF 4.5 release, usage of the new message types and new states would
cause problems for the mixed version cluster.

The OpensAF IMM uses the opensafImmNostdFlags in the imm service object:
'opensafImm=opensafImm,safApp=safImmService' for managing such transient
and special functionality. This changeset allocates bit 5 of the bitvector
for this purpose. See osaf/services/saf/immsv/README for details.

diff --git a/osaf/libs/common/immsv/include/immsv_api.h 
b/osaf/libs/common/immsv/include/immsv_api.h
--- a/osaf/libs/common/immsv/include/immsv_api.h
+++ b/osaf/libs/common/immsv/include/immsv_api.h
@@ -119,6 +119,7 @@ extern "C" {
 #define OPENSAF_IMM_FLAG_PRT41_ALLOW 0x0002
 #define OPENSAF_IMM_FLAG_PRT43_ALLOW 0x0004
 #define OPENSAF_IMM_FLAG_2PBE1_ALLOW 0x0008
+#define OPENSAF_IMM_FLAG_PRT45_ALLOW 0x0010
 
 
 #define OPENSAF_IMM_SERVICE_NAME "safImmService"
diff --git a/osaf/services/saf/immsv/README b/osaf/services/saf/immsv/README
--- a/osaf/services/saf/immsv/README
+++ b/osaf/services/saf/immsv/README
@@ -1625,12 +1625,7 @@ has the meaning of 'flags-ON'. Operation
 This flag needs to be turned ON when the upgrade to OpenSAF 4.3 has been
 successfully completed. Thus in the final steps of the upgrade. A cluster
 start/restart of an OpenSAF4.3 system will always automatically turn on
-this flag. In summary:
-
-Bit 1 controls schema (imm class) changes allowed or not.
-Bit 2 controls OpenSAF4.1 protocols allowed or not.
-Bit 3 controls OpenSAF4.3 protocols allowed or not.
-Bit 4 controls 2PBE oneSafe2PBE (see 2PBE feature in OpenSAF4.4 below).
+this flag. 
 
 2PBE Allow IMM PBE to be configured without shared file system (4.4)
 ===
@@ -1870,6 +1865,45 @@ Currently the default OI callback timeou
 recommended to set an OI callback timeout to a value higher than 10 seconds, 
since
 that is the default om/oi-client side timeout for synchronous downcalls. 
 
+
+Notes on upgrading from OpenSAF 4.[1,2,3,4] to OpenSAF 4.5
+==
+Several enhancements in OpenSAF4.5 add new message types or add new imm server 
states
+for ccb handling (#798,  #799, #16). During a rolling upgrade from an earlier 
OpenSAF
+release to the 4.5 release there will be nodes executing the older release 
concurrently
+with nodes executing OpenSAF 4.5. Nodes executing the earlier release will not
+recognize new message types originating from nodes executing 4.5 and messages 
from
+nodes executing the old release may interfere with the extended state machine 
for ccbs
+in new 4.5 nodes.
+
+Because of this upgrade issue, the older more restrictive behavior is still 
enforced
+in OpenSAF 4.5 unless a flag is toggled on in the opensafImmNostdFlags runtime
+attribute in the object: opensafImm=opensafImm,safApp=safImmService.
+The following is the shell command:
+
+immadm -o 1 -p opensafImmNostdFlags:SA_UINT32_T:16 \
+   opensafImm=opensafImm,safApp=safImmService
+
+This will set bit 5 of the 'opensafImmNostdFlags' runtime attribute inside the 
immsv.
+Operation-id '1' invoked on the object:
+
+ 'opensafImm=opensafImm,safApp=safImmService'
+
+has the meaning of 'flags-ON'. Operation-id '2' has the meaning of 'flags-OFF'.
+This flag (and possibly other relevant flags) needs to be toggled ON when the 
upgrade
+to OpenSAF 4.5 has been successfully completed. This would be in some final 
step of
+the upgrade. Any cluster start/restart of an OpenSAF4.5 system will always
+automatically toggle on relevant flags. 
+
+In summary:
+
+Bit 1 controls schema (imm class) changes allowed or not.
+Bit 2 controls OpenSAF4.1 protocols allowed or not.
+Bit 3 controls OpenSAF4.3 protocols allowed or not.
+Bit 4 controls 2PBE oneSafe2PBE (see 2PBE feature in OpenSAF4.4 above).
+Bit 5 controls OpenSAF4.5 protocols allowed or not.
+
+
 
 DEPENDENCIES
 
diff --git a/osaf/services/saf/immsv/immnd/ImmModel.cc 
b/osaf/services/saf/immsv/immnd/ImmModel.cc
--- a/osaf/services/saf/immsv/immnd/ImmModel.cc
+++ b/osaf/services/saf/immsv/immnd/ImmModel.cc
@@ -890,6 +890,14 @@ immModel_oneSafe2PBEAllowed(IMMND_CB *cb
 }
 
 SaBoolT
+immModel_protocol45Allowed(IMMND_CB *cb)
+{
+return (ImmModel::instance(&cb->immModel)->protocol45Allowed()) ?
+SA_TRUE : SA_FALSE;
+}
+
+
+SaBoolT
 immModel_purgeSyncRequest(IMMND_CB *cb, SaUint32T clientId)
 {
 return (ImmModel::instance(&cb->immM

Re: [devel] [PATCH 3 of 3] dtm: make TCP_NODELAY set socket option configurable [#654]

2014-05-20 Thread A V Mahesh
Hi Hans,

On 5/20/2014 1:03 AM, Hans Feldt wrote:
> Not sure I understand the use case for setting NODELAY to false. In the code 
> you have a comment about setting it to true which anyway is the default. So 
> what is the benefit of setting it to false?
[AVM]I  came across  one requirement where ,user don't want much 
frequent traffic in network , and doesn't bother about performance .
> Since dtm is a system wide messaging service it will affect all services for 
> example AMFs healtchecks can be delayed sitting in a queue somewhere...
  [AVM]  Opnesaf  was working  NODELAY to false previously , we have 
introduced NODELAY to true recently , so their wont be any problem.

-AVM

>
> Thanks,
> Hans
>
>> -Original Message-
>> From: mahesh.va...@oracle.com [mailto:mahesh.va...@oracle.com]
>> Sent: den 19 maj 2014 06:41
>> To: de...@list.opensaf.org; Hans Feldt
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: [PATCH 3 of 3] dtm: make TCP_NODELAY set socket option configurable 
>> [#654]
>>
>>   osaf/services/infrastructure/dtms/config/dtmd.conf   |  11 
>>   osaf/services/infrastructure/dtms/dtm/dtm_node_sockets.c |  36 
>> 
>>   osaf/services/infrastructure/dtms/dtm/dtm_read_config.c  |  15 ++
>>   osaf/services/infrastructure/dtms/include/dtm_cb.h   |   1 +
>>   4 files changed, 45 insertions(+), 18 deletions(-)
>>
>>
>> Made TCP_NODELAY configurable to specifi whether MDS TCP
>> transport should follow the Nagle algorithm for deciding when to send data.By
>> default, TCP will follow the Nagle algorithm is disable this behavior , can 
>> be
>> configured to TRUE(1) or FALSE(0)
>>
>> Note :This path is on top of Hans Feldt published   on 4-29-2014 10:58 AM.
>>
>> diff --git a/osaf/services/infrastructure/dtms/config/dtmd.conf 
>> b/osaf/services/infrastructure/dtms/config/dtmd.conf
>> --- a/osaf/services/infrastructure/dtms/config/dtmd.conf
>> +++ b/osaf/services/infrastructure/dtms/config/dtmd.conf
>> @@ -85,3 +85,14 @@ DTM_TCP_KEEPALIVE_PROBES=2
>>   #export MDS_SOCK_SND_RCV_BUF_SIZE=16777216
>>   DTM_SOCK_SND_RCV_BUF_SIZE=126976
>>
>> +#
>> +#Specifies whether MDS TCP transport should follow the Nagle algorithm for
>> +#deciding when to send data.By default, TCP will follow the Nagle algorithm 
>> is disable
>> +#this behavior,MDS TCP transport can enable TCP_NODELAY to force
>> +#TCP to always send data immediately.For example,TCP_NODELAY should be used
>> +#when there is an application using TCP for a request/response.
>> +#This option is only supported for sockets with an address family
>> +#of AF_INET or AF_INET6 (internode )and type of SOCK_STREAM
>> +# TRUE(1) or FALSE(0)
>> +DTM_TCP_NODELAY_FLAG=1
>> +
>> diff --git a/osaf/services/infrastructure/dtms/dtm/dtm_node_sockets.c
>> b/osaf/services/infrastructure/dtms/dtm/dtm_node_sockets.c
>> --- a/osaf/services/infrastructure/dtms/dtm/dtm_node_sockets.c
>> +++ b/osaf/services/infrastructure/dtms/dtm/dtm_node_sockets.c
>> @@ -577,12 +577,12 @@ int comm_socket_setup_new(DTM_INTERNODE_
>>  goto done;
>>  }
>>
>> -int flag = 1;
>> -if (setsockopt(sock_desc, IPPROTO_TCP, TCP_NODELAY, (void *)&flag, 
>> sizeof(flag)) != 0) {
>> -LOG_ER("DTM:Socket TCP_NODELAY set failed err :%s", 
>> strerror(errno));
>> -dtm_comm_socket_close(&sock_desc);
>> -goto done;
>> -}
>> +int flag = dtms_cb->tcp_nodelay_flag;
>> +if (setsockopt(sock_desc, IPPROTO_TCP, TCP_NODELAY, (void *)&flag, 
>> sizeof(flag)) != 0) {
>> +LOG_ER("DTM:Socket TCP_NODELAY set failed err :%s", 
>> strerror(errno));
>> +dtm_comm_socket_close(&sock_desc);
>> +goto done;
>> +}
>>
>>  if (NCSCC_RC_SUCCESS != set_keepalive(dtms_cb, sock_desc)) {
>>  LOG_ER("DTM :set_keepalive failed ");
>> @@ -727,12 +727,12 @@ uint32_t dtm_stream_nonblocking_listener
>>  LOG_ER("DTM:Socket snd buf size set failed err :%s", 
>> strerror(errno));
>>  }
>>
>> -int flag = 1;
>> -if (setsockopt(dtms_cb->stream_sock, IPPROTO_TCP, TCP_NODELAY, 
>> (void *)&flag, sizeof(flag)) != 0) {
>> -LOG_ER("DTM:Socket TCP_NODELAY set failed err :%s", 
>> strerror(errno));
>> -dtm_sockdesc_close(dtms_cb->stream_sock);
>> -return NCSCC_RC_FAILURE;
>> -}
>> +int flag = dtms_cb->tcp_nodelay_flag;
>> +if (setsockopt(dtms_cb->stream_sock, IPPROTO_TCP, TCP_NODELAY, (void 
>> *)&flag, sizeof(flag)) != 0) {
>> +LOG_ER("DTM:Socket TCP_NODELAY set failed err :%s", 
>> strerror(errno));
>> +dtm_sockdesc_close(dtms_cb->stream_sock);
>> +return NCSCC_RC_FAILURE;
>> +}
>>
>>  if (set_keepalive(dtms_cb, dtms_cb->stream_sock) != NCSCC_RC_SUCCESS) {
>>  LOG_ER("DTM : set_keepalive() failed");
>> @@ -1296,12 +1296,12 @@ int dtm_process_accept(DTM_INTERNODE_CB
>>  goto done;
>>  }
>>
>> -