[devel] Announcement: OpenSAF policy on limits

2013-06-25 Thread Anders Widell
I would like to announce that OpenSAF now has a policy regarding
limits: limits should, in general, be possible to configure without
recompiling the source code. This means that we would like to move
away from using hard-coded limits, and we would like the code to be
designed in a scalable way so that limits can be increased. This way, we
can ensure that OpenSAF is flexible enough to be used in a wide
variety of applications and deployments.
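As a minimal sketch of what "configurable without recompiling" can look like in C, the helpers below read a limit from an environment variable and fall back to a compiled-in default when the value is missing or malformed. The variable names and the default of 1024 are hypothetical and only for illustration; they are not existing OpenSAF configuration options.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical compiled-in default, used only when no override is given. */
#define EXAMPLE_DEFAULT_MAX_HANDLES 1024UL

/*
 * Parse a limit given as plain decimal digits. Returns `fallback` if the
 * text is missing, empty, not purely numeric, out of range, or zero.
 */
static unsigned long parse_limit(const char *text, unsigned long fallback)
{
    char *end = NULL;
    unsigned long value;

    /* Require a leading digit: rejects whitespace and '+'/'-' signs. */
    if (text == NULL || *text < '0' || *text > '9')
        return fallback;

    errno = 0;
    value = strtoul(text, &end, 10);
    if (errno != 0 || *end != '\0' || value == 0)
        return fallback;
    return value;
}

/* Read a limit from the environment variable `name` (for instance one
 * exported by the script that starts the service; an assumption about
 * deployment), so the limit can change without a rebuild. */
static unsigned long read_limit(const char *name, unsigned long fallback)
{
    return parse_limit(getenv(name), fallback);
}
```

A service would then call something like `read_limit("OSAF_EXAMPLE_MAX_HANDLES", EXAMPLE_DEFAULT_MAX_HANDLES)` once at start-up instead of using a hard-coded constant.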

The rest of this message contains some detailed discussions around
limits. Read on if you are interested, but the main point of this
message has already been covered. :-)

First of all I would like to point out that instead of making a limit
configurable, the limit could be removed altogether. However, in many
cases there is a good reason to have a limit, especially in the case
of resource limits that I will describe in more detail below. The
main reason for having a limit is to protect the system from a
misbehaving application. There are many reasons why an application may
misbehave so that it will exceed the limits we have set:

* Due to a resource leak
* Due to a fault in the program logic (e.g. miscalculating the size)
* Due to memory corruption
* Due to an attack

By having a limit, we can detect the problem early and stop it before
the whole system becomes unstable.
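To make the "detect it early" point concrete, here is a minimal sketch of a per-client limit kept inside a service as a simple counter that refuses new allocations once the configured maximum is reached. The struct and function names are hypothetical and not taken from any OpenSAF service.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-client bookkeeping inside a service. */
struct client_quota {
    size_t in_use;  /* handles currently allocated by this client */
    size_t limit;   /* configured per-client maximum */
};

/* Try to account for one more handle; refuse when the limit is reached. */
static bool quota_acquire(struct client_quota *q)
{
    if (q->in_use >= q->limit)
        return false;   /* likely a leak or runaway client: fail early */
    q->in_use++;
    return true;
}

/* Release one handle; ignore spurious releases instead of underflowing. */
static void quota_release(struct client_quota *q)
{
    if (q->in_use > 0)
        q->in_use--;
}
```

A leaking client hits the refusal in `quota_acquire()` long before it can exhaust memory that other clients of the same service depend on.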

EXPLICIT LIMITS
===

Resource limits
---

By a resource limit I mean a limit for something that the application
allocates and deallocates. Thus, if the application forgets to
deallocate the resource, we have a resource leak. We are mainly
concerned about resources that are visible outside the application
process: a resource that is allocated either in the memory of an
OpenSAF service, on disk, or in the Linux OS (e.g. a shared memory
segment). If the resource is allocated locally inside the application
then a resource leak could be regarded as an ordinary memory leak in
the application.

A typical example of a resource is a SAF handle returned by one of the
saXxxInitialize() functions. Compare this with a file handle in
Linux. In Linux, there are two limits for file handles: a per-process
limit that can be configured using the setrlimit() system call, and a
system-global limit that can be configured using the /proc file
system. The reason for having a per-process limit is that if the
application process is leaking file handles, we don't want it to
exhaust all available file handles in the entire system. Therefore,
the per-process limit is set to a small value (1024 by default) that
is much lower than the system-global limit.
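For reference, the per-process file handle limit can be read and changed from C with getrlimit()/setrlimit() and RLIMIT_NOFILE (the system-global counterpart is /proc/sys/fs/file-max). The sketch below sets the soft limit, clamped to the hard limit, which is as far as an unprivileged process may raise it. The helper name is our own.

```c
#include <sys/resource.h>

/*
 * Set the soft per-process limit on open file descriptors to `wanted`,
 * clamped to the hard limit (an unprivileged process may not exceed it).
 * Returns 0 on success, -1 on failure (errno set by getrlimit/setrlimit).
 */
static int set_fd_soft_limit(rlim_t wanted)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (rl.rlim_max != RLIM_INFINITY && wanted > rl.rlim_max)
        wanted = rl.rlim_max;
    rl.rlim_cur = wanted;
    return setrlimit(RLIMIT_NOFILE, &rl);
}
```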

Size limits
---

By a size limit I typically mean the maximum size of an object. It
could for example be a maximum string length. Again, if the string is
just allocated locally in the application process, we don't have to be
so concerned about it. But if the string is passed as a parameter to
an OpenSAF service, we may wish to protect the service against
receiving too large strings. Imagine that the application has a memory
corruption bug, so that the variable containing the length of the
string is garbled. The OpenSAF service would receive an insanely large
string which, if not rejected, is likely to cause problems for the
service.
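A minimal sketch of such a check on the service side: a length-prefixed string received from a client is rejected before any memory is allocated if its claimed length exceeds a configured maximum. The function name and the idea that `max_len` comes from configuration are assumptions for illustration.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy a length-prefixed string received from a client into a freshly
 * allocated, NUL-terminated buffer. Returns NULL if `len` exceeds the
 * configured `max_len` (a garbled or hostile length is rejected before
 * any memory is allocated) or if allocation fails.
 */
static char *copy_string_param(const char *src, size_t len, size_t max_len)
{
    char *copy;

    /* len == SIZE_MAX would overflow len + 1 below. */
    if (src == NULL || len > max_len || len == SIZE_MAX)
        return NULL;
    copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, src, len);
    copy[len] = '\0';
    return copy;
}
```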

Distinguished names, normally stored in the SaNameT type, are an
example of something for which we have a size limit. Unfortunately, the size limit
is in this particular case specified by the SAF standard, and is
therefore difficult to change. However, there are suggestions for how
we can make an OpenSAF extension that would allow longer distinguished
names.
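The reason the SaNameT limit is hard to change is that it is part of the type layout itself: the SAF headers define it as a 16-bit length plus a fixed array of SA_MAX_NAME_LENGTH (256) bytes. The sketch below mirrors that layout under a different name so it compiles without the SAF headers; the setter function is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the SaNameT layout, renamed so the example compiles
 * without the SAF headers. */
#define EXAMPLE_MAX_NAME_LENGTH 256
typedef struct {
    uint16_t length;
    uint8_t value[EXAMPLE_MAX_NAME_LENGTH];
} example_name_t;

/* Fill `name` from a C string; fails if the DN does not fit the
 * fixed-size array. Because the limit is baked into the struct, raising
 * it means changing the type and recompiling every user of it. */
static bool example_name_set(example_name_t *name, const char *dn)
{
    size_t len = strlen(dn);

    if (len > EXAMPLE_MAX_NAME_LENGTH)
        return false;
    memcpy(name->value, dn, len);   /* value is not NUL-terminated */
    name->length = (uint16_t)len;
    return true;
}
```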

Time limits
---

Time limits, or maybe timeout limits, are yet another category of
limits. There may be a need to configure time limits differently on
different types of systems. OpenSAF can be used on large systems with
many nodes, possibly geographically distributed, and with a large disk
accessed over the network. Or it may be used on a small two-node
embedded system with a fast node-local solid state disk or flash
memory.

IMPLICIT LIMITS
===

The different types of limits mentioned above are expressed explicitly
in the code; there is a place in the code where we check if we are
above or below the limit, and take different actions depending on
this. I would also like to mention implicit limits: limits that are
not expressed directly in the code, but are rather the result of the
way the code is written: the data types, OS functions or algorithms
that were chosen. The sections below describe different types of
implicit limits:

Limited range of (integer) data type
---

A 64-bit integer can for all practical purposes be regarded as being
able to hold an infinite range of numbers. There may be a perfectly
good reason to use a smaller integer type in order to save memory,
network bandwidth, or disk space. But please be careful to ensure that
the data type has sufficient range to hold all possible values also in
the case where the 
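To illustrate the integer-range concern, here is a minimal sketch: narrowing a 64-bit value into a 16-bit field silently wraps around, so a range check before the conversion turns an implicit limit into an explicit, detectable one. The helper name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* A 16-bit field saves space on the wire, but a plain cast silently
 * truncates: (uint16_t)70000 == 4464. Check the range before narrowing. */
static bool to_u16_checked(uint64_t value, uint16_t *out)
{
    if (value > UINT16_MAX)
        return false;   /* would wrap around: report instead of truncating */
    *out = (uint16_t)value;
    return true;
}
```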

Re: [devel] [PATCH 6 of 6] amf: handle sufailover in SU FSM and Comp FSM at amfnd [#98]

2013-06-25 Thread Hans Feldt

On 06/25/2013 11:50 AM, Mathivanan Naickan Palanivelu wrote:
 -Original Message-
 From: Hans Feldt [mailto:hans.fe...@ericsson.com]

 - I'm sure you would have had a reason to do it this way, but I thought
 alternatively performing the repair at the AMFND itself(instead of notifying
 to AMFD) is one viable option!

 Guess not it interferes with a potential auto adjust feature right?

 I meant that we could avoid the 'extra step' of informing AMFD.
 May be, we could reduce the latency (in repairing) if avoid going through 
 AMFD.
 Well, AMFND could be made aware of autoadjust attributes, isn't it!

There might be another higher-ranked SU that should be activated
instead. A bit like auto adjust at failures, but not with the full
semantics. I think it will actually work like that. Praveen?

Thanks,
Hans


___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 6 of 6] amf: handle sufailover in SU FSM and Comp FSM at amfnd [#98]

2013-06-25 Thread praveen malviya
Thanks for the comments and the discussion. I will respond to the other
comments soon; for now I am starting with this mail.
Please see my response below.

One more point can be brought into the discussion here: the case where
the admin restart operation is invoked on a component that has
saAmfDisableRestart set to true.
If saAmfSUFailover is also true for the SU of this component, then either:
case1) AMF should honor saAmfSUFailover and perform a failover of the
whole SU, or
case2) AMF should reject the component restart admin operation, because
the whole SU should fail over as a single entity. In my opinion AMF
should reject the operation.
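A rough sketch of the two alternatives as a decision function, with all names hypothetical; `prefer_reject` selects case2, the proposed behaviour. Combinations outside the case discussed here are deliberately left to the existing handling.

```c
#include <stdbool.h>

/* Hypothetical outcomes of a component admin restart request. */
enum restart_decision {
    DECISION_EXISTING_BEHAVIOUR, /* not the case discussed: handled as today */
    DECISION_FAILOVER_WHOLE_SU,  /* case1: honour saAmfSUFailover */
    DECISION_REJECT_REQUEST      /* case2: refuse the admin operation */
};

static enum restart_decision
comp_admin_restart_decision(bool comp_disable_restart, bool su_failover,
                            bool prefer_reject)
{
    if (!(comp_disable_restart && su_failover))
        return DECISION_EXISTING_BEHAVIOUR;
    return prefer_reject ? DECISION_REJECT_REQUEST : DECISION_FAILOVER_WHOLE_SU;
}
```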

Any comments?

On 25-Jun-13 3:20 PM, Mathivanan Naickan Palanivelu wrote:
 -Original Message-
 From: Hans Feldt [mailto:hans.fe...@ericsson.com]
 Sent: Tuesday, June 25, 2013 3:01 PM
 To: Mathivanan Naickan Palanivelu
 Cc: Praveen Malviya; Nagendra Kumar; opensaf-devel@lists.sourceforge.net
 Subject: Re: [PATCH 6 of 6] amf: handle sufailover in SU FSM and Comp FSM
 at amfnd [#98]


 On 06/24/2013 04:39 PM, Mathivanan Naickan Palanivelu wrote:
 Hi Praveen,

 Thanks for the clarification.
 Please find some other comments below:

 - I'm sure you would have had a reason to do it this way, but I thought
 alternatively performing the repair at the AMFND itself(instead of notifying
 to AMFD) is one viable option!

 Guess not it interferes with a potential auto adjust feature right?
 I meant that we could avoid the 'extra step' of informing AMFD.
 May be, we could reduce the latency (in repairing) if avoid going through 
 AMFD.
 Well, AMFND could be made aware of autoadjust attributes, isn't it!
The saAmfSGAutoRepair attribute is maintained at amfd only. The current
implementation supports repair of an SU through an admin operation.
When the repair admin operation is invoked, amfd informs amfnd to perform
the repair, so functionally it is amfnd that performs the repair.
These patches use the same mechanism to perform the repair. After
performing failover of the assignments of the faulted SU (which only amfd
can do), amfd checks saAmfSGAutoRepair and, if it is true, informs amfnd
to perform the repair. The rest of the flow is the same as for the admin
repair operation.
If saAmfSGAutoRepair were maintained at amfnd, then on CCB modify
operations on this attribute amfd would have to update every amfnd
hosting SUs of this SG. Those would again be extra steps.
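The flow described above can be sketched at amfd level roughly as follows. This is a simplified illustration, not the code in the patch: the structs, counters and function names are hypothetical stand-ins for the real amfd objects and messages.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified views of the objects amfd keeps. */
struct sg { bool auto_repair; /* stands in for saAmfSGAutoRepair */ };
struct su { struct sg *sg; };

/* Counters standing in for the real amfd actions (illustration only). */
static int failovers_done;
static int repairs_requested;

static void failover_assignments(struct su *su) { (void)su; failovers_done++; }
static void send_repair_request_to_amfnd(struct su *su) { (void)su; repairs_requested++; }

/* SU failover: amfd first moves the assignments of the faulted SU, then,
 * if auto-repair is enabled for the SG, asks amfnd to repair the SU via
 * the same path as the admin repair operation. Returns true if a repair
 * was requested. */
static bool handle_su_failover(struct su *su)
{
    failover_assignments(su);           /* only amfd can move assignments */
    if (!su->sg->auto_repair)
        return false;                   /* no automatic repair */
    send_repair_request_to_amfnd(su);   /* same path as admin repair */
    return true;
}
```

Keeping the attribute check at amfd means a CCB modify of saAmfSGAutoRepair only has to update amfd's own copy.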

 Some other comments I had in mind are:
 - is si-si dependencies automatically taken care by these changes?
 - are we honouring 'order' of termination of components based on the
 'instantiationlevel'(in reverse)?

 3.11.1.3.2 SU failover:

 If the service unit is configured to fail over as a single entity
 (saAmfSUFailover set to SA_TRUE), all other components of the service unit
 are abruptly terminated

 Does not say anything about ordering. Should we add some defined ordering
 semantics on top of that you mean?


 I meant the below. Also, SU failover 'effectively' involves restart of the 
 failed SU!
- In case of SU failover, no quiesced assignment is given, so SI
dependencies are honored only while giving the active assignment to the
standby or spare SU.
- As part of SU failover, components are abruptly terminated without
honoring the instantiation level. I think the instantiation level is to
be considered during graceful termination of components, as in the
Lock-in operation.
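A minimal sketch of the ordering distinction: graceful termination processes components in reverse instantiation order (highest level first), while abrupt termination during SU failover skips the ordering entirely. The struct is hypothetical; `instantiation_level` stands in for the component's saAmfCompInstantiationLevel, where lower levels are instantiated first.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical component record. */
struct comp {
    const char *name;
    unsigned instantiation_level;
};

/* qsort comparator: higher instantiation level first. */
static int by_level_desc(const void *a, const void *b)
{
    const struct comp *x = a, *y = b;

    if (x->instantiation_level != y->instantiation_level)
        return x->instantiation_level > y->instantiation_level ? -1 : 1;
    return 0;
}

/* Graceful termination: reverse order of instantiation. Abrupt
 * termination (SU failover) leaves the order untouched. */
static void order_for_termination(struct comp *comps, size_t n, bool graceful)
{
    if (graceful)
        qsort(comps, n, sizeof comps[0], by_level_desc);
}
```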
For SU restart, a ticket already exists: #315.
 FYI just realized ordering and semantics of SU restart is wrong. Will
 write a defect.
As pointed out above, a ticket already exists: #315.
 I suspected this problem could exist.

 Thanks,
 Mathi.

 Thanks,
 Hans

 Thanks,
 Mathi.


 -Original Message-
 From: praveen malviya
 Sent: Friday, June 21, 2013 6:20 PM
 To: Mathivanan Naickan Palanivelu
 Cc: hans.fe...@ericsson.com; Nagendra Kumar; opensaf-
 de...@lists.sourceforge.net
 Subject: Re: [PATCH 6 of 6] amf: handle sufailover in SU FSM and Comp
 FSM
 at amfnd [#98]

 Can you briefly highlight during which scenario the **autorepair is
 taken
 into account by this patch.

 **autorepiar is one of the attributes of SG (saAmfSGAutoRepair). During
 sufailover AMF will perform recovery first.
 If saAmfSGAutoRepair is true for the SG of faulted SU then AMF will
 perform
 auto-repair by enabling the SU.

 Thanks
 Praveen

 On 21-Jun-13 5:39 PM, Mathivanan Naickan Palanivelu wrote:
 Hi Praveen,

 Good that you have sent them as patch series and thanks for working on
 this long awaited ticket.
 A quick question.

 Can you briefly highlight during which scenario the **autorepair is taken
 into account by this patch.
 Thanks,
 Mathi.

 -Original Message-
 From: Praveen Malviya
 Sent: Friday, June 07, 2013 12:10 PM
 To: hans.fe...@ericsson.com; Mathivanan Naickan Palanivelu;
 Nagendra
 Kumar
 Cc: opensaf-devel@lists.sourceforge.net
 Subject: [PATCH 6 of 6] amf: handle sufailover in SU FSM and Comp
 FSM
 at amfnd [#98]

 osaf/services/saf/avsv/avnd/avnd_clc.c  |  130
 ++-
 
 

Re: [devel] [PATCH 6 of 6] amf: handle sufailover in SU FSM and Comp FSM at amfnd [#98]

2013-06-25 Thread praveen malviya

On 25-Jun-13 4:24 PM, Hans Feldt wrote:

 On 06/25/2013 11:50 AM, Mathivanan Naickan Palanivelu wrote:
 -Original Message-
 From: Hans Feldt [mailto:hans.fe...@ericsson.com]

 - I'm sure you would have had a reason to do it this way, but I 
 thought
 alternatively performing the repair at the AMFND itself(instead of 
 notifying
 to AMFD) is one viable option!

 Guess not it interferes with a potential auto adjust feature right?

 I meant that we could avoid the 'extra step' of informing AMFD.
 May be, we could reduce the latency (in repairing) if avoid going 
 through AMFD.
 Well, AMFND could be made aware of autoadjust attributes, isn't it!

 There might be a another higher ranked SU that should be activated 
 instead. A bit like auto adjust at failures, not the full semantics. 
 A think it will actually work like that. Praveen?

Yes, after failover avd_sg_app_su_inst_func() will do that. It will
try to instantiate another higher-ranked SU, honoring the other
conditions and SG attributes.
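The selection step can be pictured with the following sketch. It is not avd_sg_app_su_inst_func() itself: the struct is hypothetical, `eligible` lumps together "other conditions" such as node and admin state, `num_pref_inservice` stands in for saAmfSGNumPrefInserviceSUs, and it assumes that a numerically lower saAmfSURank means a higher rank.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical SU view used only for this illustration. */
struct su_info {
    unsigned rank;      /* assumed: lower value = higher rank */
    bool instantiated;
    bool eligible;      /* e.g. node enabled, SU unlocked */
};

/* Return the index of the best-ranked eligible, uninstantiated SU, or -1
 * if there is none or the SG already has enough in-service SUs. */
static int pick_su_to_instantiate(const struct su_info *sus, size_t n,
                                  size_t in_service, size_t num_pref_inservice)
{
    int best = -1;

    if (in_service >= num_pref_inservice)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (sus[i].instantiated || !sus[i].eligible)
            continue;
        if (best < 0 || sus[i].rank < sus[(size_t)best].rank)
            best = (int)i;
    }
    return best;
}
```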
 Thanks,
 Hans


