[devel] Announcement: OpenSAF policy on limits
I would like to announce that OpenSAF now has a policy regarding limits:
limits should, in general, be possible to configure without recompiling
the source code. This means that we would like to move away from using
hard-coded limits, and we would like the code to be designed in a
scalable way so that limits can be increased. This way, we can ensure
that OpenSAF is flexible enough to be used in a wide variety of
applications and deployments.

The rest of this message contains some detailed discussions around
limits. Read on if you are interested, but the main point of this
message has already been covered. :-)

First of all I would like to point out that instead of making a limit
configurable, the limit could be removed altogether. However, in many
cases there is a good reason to have a limit, especially in the case of
resource limits that I will describe in more detail below. The main
reason for having a limit is to protect the system from a misbehaving
application. There are many reasons why an application may misbehave so
that it will exceed the limits we have set:

* Due to a resource leak
* Due to a fault in the program logic (e.g. miscalculating the size)
* Due to memory corruption
* Due to an attack

By having a limit, we can detect the problem early and stop it before
the whole system becomes unstable.

EXPLICIT LIMITS
===============

Resource limits
---------------

By a resource limit I mean a limit for something that the application
allocates and deallocates. Thus, if the application forgets to
deallocate the resource, we have a resource leak. We are mainly
concerned about resources that are visible outside the application
process: a resource that is allocated either in the memory of an OpenSAF
service, on disk, or in the Linux OS (e.g. a shared memory segment). If
the resource is allocated locally inside the application then a resource
leak could be regarded as an ordinary memory leak in the application.
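
As an illustration of a resource limit that can be changed without
recompiling, here is a minimal sketch in C. It assumes the limit is read
from an environment variable at start-up. The variable name
EXAMPLE_MAX_HANDLES, the default of 1024, and the helper names are all
hypothetical; none of them are part of OpenSAF.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical default, used when the variable is unset or invalid. */
#define DEFAULT_MAX_HANDLES 1024UL

/* Read a limit from the named environment variable. Returns fallback if
 * the variable is unset, empty, not a plain decimal number, zero, or too
 * large to fit in an unsigned long. */
static unsigned long limit_from_env(const char *name, unsigned long fallback)
{
	const char *text = getenv(name);
	char *end = NULL;
	unsigned long value;

	if (text == NULL || *text == '\0')
		return fallback;
	if (*text < '0' || *text > '9')
		return fallback;	/* reject signs and leading whitespace */

	errno = 0;
	value = strtoul(text, &end, 10);
	if (errno != 0 || *end != '\0' || value == 0)
		return fallback;
	return value;
}

/* Per-client bookkeeping: number of handles currently allocated. */
struct client {
	unsigned long handles_in_use;
};

/* Returns 0 on success, or -1 if the client has reached the limit. */
static int client_alloc_handle(struct client *c, unsigned long max_handles)
{
	if (c->handles_in_use >= max_handles)
		return -1;
	c->handles_in_use++;
	return 0;
}
```

A service could call limit_from_env("EXAMPLE_MAX_HANDLES",
DEFAULT_MAX_HANDLES) once at start-up and pass the result to every
client_alloc_handle() call, so that a leaking client is rejected at the
limit instead of exhausting the service.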

A typical example of a resource is a SAF handle returned by one of the
saXxxInitialize() functions. Compare this with a file handle in Linux.
In Linux, there are two limits for file handles: a per-process limit
that can be configured using the setrlimit() system call, and a
system-global limit that can be configured using the /proc file system.
The reason for having a per-process limit is that if the application
process is leaking file handles, we don't want it to exhaust all
available file handles in the entire system. Therefore, the per-process
limit is set to a small value (1024 by default) that is much lower than
the system-global limit.

Size limits
-----------

By size limit I typically mean the maximum size for an object. It could
for example be a maximum string length. Again, if the string is just
allocated locally in the application process, we don't have to be so
concerned about it. But if the string is passed as a parameter to an
OpenSAF service, we may wish to protect the service against receiving
strings that are too large. Imagine that the application has a memory
corruption bug, so that the variable containing the length of the string
is garbled. The OpenSAF service would receive an insanely large string,
which if not rejected is likely to cause problems for the service.

Distinguished names, normally stored in the SaNameT type, are an example
for which we have a size limit. Unfortunately, the size limit is in this
particular case specified by the SAF standard, and is therefore
difficult to change. However, there are suggestions for how we can make
an OpenSAF extension that would allow longer distinguished names.

Time limits
-----------

Time limits, or maybe timeout limits, are yet another category of
limits. There may be a need to configure time limits differently on
different types of systems. OpenSAF can be used on large systems with
many nodes, possibly geographically distributed, and with a large disk
accessed over the network.
Or it may be used on a small two-node embedded system with a fast
node-local solid state disk or flash memory.

IMPLICIT LIMITS
===============

The different types of limits mentioned above are expressed explicitly
in the code; there is a place in the code where we check if we are above
or below the limit, and take different actions depending on this. I
would also like to mention implicit limits: limits that are not
expressed directly in the code, but rather are the results of the way
the code is written: the data types, OS functions or algorithms that
were chosen. The sections below describe different types of implicit
limits:

Limited range of (integer) data type
------------------------------------

A 64-bit integer can for all practical purposes be regarded as being
able to hold an infinite range of numbers. There may be a perfectly good
reason to use a smaller integer type in order to save memory, network
bandwidth, or disk space. But please be careful to ensure that the data
type has sufficient range to hold all possible values also in the case
where the
Re: [devel] [PATCH 6 of 6] amf: handle sufailover in SU FSM and Comp FSM at amfnd [#98]
On 06/25/2013 11:50 AM, Mathivanan Naickan Palanivelu wrote:

-----Original Message-----
From: Hans Feldt [mailto:hans.fe...@ericsson.com]

- I'm sure you would have had a reason to do it this way, but I thought
  alternatively performing the repair at the AMFND itself (instead of
  notifying AMFD) is one viable option!

Guess not, it interferes with a potential auto adjust feature, right?

I meant that we could avoid the 'extra step' of informing AMFD. Maybe we
could reduce the latency (in repairing) if we avoid going through AMFD.
Well, AMFND could be made aware of autoadjust attributes, isn't it!

There might be another higher ranked SU that should be activated
instead. A bit like auto adjust at failures, not the full semantics. I
think it will actually work like that. Praveen?

Thanks,
Hans

--
This SF.net email is sponsored by Windows:
Build for Windows Store.
http://p.sf.net/sfu/windows-dev2dev
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
Re: [devel] [PATCH 6 of 6] amf: handle sufailover in SU FSM and Comp FSM at amfnd [#98]
Thanks for the comments and discussion. I will respond to the other
comments soon; for now I am starting with this mail. Please see the
response below.

One more point can be brought into the discussion here. The case is when
the admin restart operation is invoked on a component and
saAmfDisableRestart is true for it. Now if saAmfSUFailover is also true
for the SU of this component, then:

case1) AMF should honor saAmfSUFailover and perform the failover of the
       whole SU.
case2) AMF should reject the comp restart admin operation because the
       whole SU should fail over as a single entity.

In my opinion AMF should reject the operation. Any comments?

On 25-Jun-13 3:20 PM, Mathivanan Naickan Palanivelu wrote:

-----Original Message-----
From: Hans Feldt [mailto:hans.fe...@ericsson.com]
Sent: Tuesday, June 25, 2013 3:01 PM
To: Mathivanan Naickan Palanivelu
Cc: Praveen Malviya; Nagendra Kumar; opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 6 of 6] amf: handle sufailover in SU FSM and Comp FSM at amfnd [#98]

On 06/24/2013 04:39 PM, Mathivanan Naickan Palanivelu wrote:

Hi Praveen,

Thanks for the clarification. Please find some other comments below:

- I'm sure you would have had a reason to do it this way, but I thought
  alternatively performing the repair at the AMFND itself (instead of
  notifying AMFD) is one viable option!

Guess not, it interferes with a potential auto adjust feature, right?

I meant that we could avoid the 'extra step' of informing AMFD. Maybe we
could reduce the latency (in repairing) if we avoid going through AMFD.
Well, AMFND could be made aware of autoadjust attributes, isn't it!

The saAmfSGAutoRepair attribute is maintained at amfd only. The current
implementation supports repair of an SU through an admin operation. When
the repair admin operation is invoked, amfd informs amfnd to perform the
repair. So functionally it is amfnd that performs the repair. In these
patches, this same mechanism is used to perform the repair.
After performing failover of the assignments of the faulted SU (only
amfd can do this), amfd checks saAmfSGAutoRepair and, if it is true,
informs amfnd to perform the repair. The rest of the flow is the same as
for the admin repair operation. If saAmfSGAutoRepair were maintained at
amfnd, then on CCB modify operations on this attribute amfd would have
to send the update to each amfnd hosting SUs of this SG. These again
would be extra steps.

Some other comments I had in mind are:

- is si-si dependencies automatically taken care by these changes?
- are we honouring 'order' of termination of components based on the
  'instantiationlevel' (in reverse)?

3.11.1.3.2 SU failover: If the service unit is configured to fail over
as a single entity (saAmfSUFailover set to SA_TRUE), all other
components of the service unit are abruptly terminated

Does not say anything about ordering. Should we add some defined
ordering semantics on top of that you mean?

I meant the below. Also, SU failover 'effectively' involves restart of
the failed SU!

- In case of SU failover no quiesced assignment will be given. So SI
  dependencies will be honored only while giving the active assignment
  to the standby or spare SU.
- As a part of SU failover, components will be abruptly terminated
  without honoring instantiation-level. I think instantiation-level is
  to be considered during graceful termination of a component, like the
  Lock-in operation.

In case of SU restart one ticket exists: #315.

FYI just realized ordering and semantics of SU restart is wrong. Will
write a defect.

As pointed out, a ticket already exists: #315.

I suspected this problem could exist.

Thanks,
Mathi.

Thanks,
Hans

Thanks,
Mathi.

-----Original Message-----
From: praveen malviya
Sent: Friday, June 21, 2013 6:20 PM
To: Mathivanan Naickan Palanivelu
Cc: hans.fe...@ericsson.com; Nagendra Kumar; opensaf-de...@lists.sourceforge.net
Subject: Re: [PATCH 6 of 6] amf: handle sufailover in SU FSM and Comp FSM at amfnd [#98]

Can you briefly highlight during which scenario the **autorepair is
taken into account by this patch.

**autorepair is one of the attributes of the SG (saAmfSGAutoRepair).
During sufailover AMF will perform recovery first. If saAmfSGAutoRepair
is true for the SG of the faulted SU then AMF will perform auto-repair
by enabling the SU.

Thanks
Praveen

On 21-Jun-13 5:39 PM, Mathivanan Naickan Palanivelu wrote:

Hi Praveen,

Good that you have sent them as a patch series and thanks for working on
this long awaited ticket. A quick question. Can you briefly highlight
during which scenario the **autorepair is taken into account by this
patch.

Thanks,
Mathi.

-----Original Message-----
From: Praveen Malviya
Sent: Friday, June 07, 2013 12:10 PM
To: hans.fe...@ericsson.com; Mathivanan Naickan Palanivelu; Nagendra Kumar
Cc: opensaf-devel@lists.sourceforge.net
Subject: [PATCH 6 of 6] amf: handle sufailover in SU FSM and Comp FSM at amfnd [#98]

osaf/services/saf/avsv/avnd/avnd_clc.c | 130 ++-
Re: [devel] [PATCH 6 of 6] amf: handle sufailover in SU FSM and Comp FSM at amfnd [#98]
On 25-Jun-13 4:24 PM, Hans Feldt wrote:

On 06/25/2013 11:50 AM, Mathivanan Naickan Palanivelu wrote:

-----Original Message-----
From: Hans Feldt [mailto:hans.fe...@ericsson.com]

- I'm sure you would have had a reason to do it this way, but I thought
  alternatively performing the repair at the AMFND itself (instead of
  notifying AMFD) is one viable option!

Guess not, it interferes with a potential auto adjust feature, right?

I meant that we could avoid the 'extra step' of informing AMFD. Maybe we
could reduce the latency (in repairing) if we avoid going through AMFD.
Well, AMFND could be made aware of autoadjust attributes, isn't it!

There might be another higher ranked SU that should be activated
instead. A bit like auto adjust at failures, not the full semantics. I
think it will actually work like that. Praveen?

Yes, after fail-over avd_sg_app_su_inst_func() will do that. It will try
to instantiate another higher ranked SU, honoring other conditions and SG
attributes.

Thanks,
Hans