Ack from me for the second part of the patch (code review only).

I think the ticket should be kept open for further investigation, supported
by updated logs. Also, the description needs to be changed to match the
published/pushed patch.
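
For the first part, the amfnd.conf/amfd.conf values quoted below are in
nanoseconds. A minimal sketch of the conversion, assuming SA_TIME_ONE_SECOND
is 1000000000 as in the SAF AIS headers; hb_seconds_to_ns() is a hypothetical
helper name used only for illustration, not an OpenSAF function:

```cpp
#include <cassert>
#include <cstdint>

// Assumption: mirrors SA_TIME_ONE_SECOND from the SAF AIS headers
// (one second expressed in nanoseconds).
constexpr int64_t kSaTimeOneSecond = 1000000000LL;

// Hypothetical helper: convert a heartbeat setting given in seconds
// into the nanosecond value written to amfnd.conf / amfd.conf.
constexpr int64_t hb_seconds_to_ns(int64_t seconds) {
  return seconds * kSaTimeOneSecond;
}

// Values already quoted in this thread.
static_assert(hb_seconds_to_ns(60) == 60000000000LL,
              "AVSV_HB_DURATION example from amfnd.conf");
static_assert(hb_seconds_to_ns(10) == 10000000000LL,
              "AVSV_HB_PERIOD example from amfd.conf");

// The 3 minute duration suggested for the 75 node case.
static_assert(hb_seconds_to_ns(180) == 180000000000LL,
              "3 minutes in nanoseconds");
```

So for the reported case the amfnd.conf line would be
export AVSV_HB_DURATION=180000000000.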


Thanks,
Praveen

On 10-May-16 5:02 PM, praveen malviya wrote:
>
>
> On 10-May-16 2:36 PM, Gary Lee wrote:
>> Hi Praveen
>>
>> Thanks for the comment. Yes, I think changing AVSV_HB_DURATION through 
>> amfnd.conf is probably better.
>>
>> Are you OK with the change in role.cc?
>>
> For the second part: I did not get it completely. As mentioned in the
> commit log, the problem seems to be in MDS. Is it related to the sender
> part or the receiver part of MDS? Is there any chance that this 75 node
> cluster is crossing an MDS limitation (documented or undocumented)?
>   Please share traces and syslog, if possible, for a clear understanding
> of the problem.
>
> Thanks,
> Praveen
>>
>> Thanks
>> Gary
>>
>> On 10/05/2016, 7:01 PM, "praveen malviya" <[email protected]> wrote:
>>
>>> One comment on first part of the patch.
>>>
>>> Thanks,
>>> Praveen
>>>
>>> On 09-May-16 12:46 PM, Gary Lee wrote:
>>>>  osaf/libs/common/amf/include/amf_defs.h |   2 +-
>>>>  osaf/services/saf/amf/amfd/role.cc      |  10 +++++-----
>>>>  2 files changed, 6 insertions(+), 6 deletions(-)
>>>>
>>>>
>>>> On a 75 node deployment, avd_imm_config_get() can take up to 3 minutes
>>>> to complete. The default heartbeat duration of 60s between amfnd and
>>>> amfd does not allow enough time for avd_imm_config_get() to finish.
>>>> The same thread is handling both heartbeat and reading of IMM.
>>>>
>>>> Furthermore, it has been observed that CLM callbacks to amfd can become
>>>> 'lost' in a large cluster. It seems to be occurring in MDS, when the
>>>> callbacks are sent around the same time as amfd is calling
>>>> avd_imm_config_get(). A workaround is to call avd_clm_track_start()
>>>> after avd_imm_config_get() is completed. Further investigations are
>>>> required. It seems avd_imm_config_get() generates a large amount of
>>>> traffic through MDS.
>>>>
>>>> diff --git a/osaf/libs/common/amf/include/amf_defs.h b/osaf/libs/common/amf/include/amf_defs.h
>>>> --- a/osaf/libs/common/amf/include/amf_defs.h
>>>> +++ b/osaf/libs/common/amf/include/amf_defs.h
>>>> @@ -58,7 +58,7 @@
>>>>  #define AVSV_DEF_HB_PERIOD (10 * SA_TIME_ONE_SECOND)
>>>>
>>>>  /* Default Heart beat duration */
>>>> -#define AVSV_DEF_HB_DURATION (60 * SA_TIME_ONE_SECOND)
>>>> +#define AVSV_DEF_HB_DURATION (180 * SA_TIME_ONE_SECOND)
>>>>
>>> Heartbeat duration can be configured through amfnd.conf (#export
>>> AVSV_HB_DURATION=60000000000).
>>> It is left to the user to judge and fine-tune it for the environment in
>>> which applications are being deployed. A user may also want to change
>>> the corresponding value of the period (amfd.conf
>>> AVSV_HB_PERIOD=10000000000).
>>> So in the reported case it can be configured to 3 minutes.
>>>
>>> I think that, instead of changing the default value, the optimized values
>>> for different cluster sizes can be mentioned in the PR doc. AMF cannot
>>> always chase cluster size through defects.
>>>
>>>
>>>
>>>>  typedef enum {
>>>>    AVSV_COMP_TYPE_INVALID,
>>>> diff --git a/osaf/services/saf/amf/amfd/role.cc b/osaf/services/saf/amf/amfd/role.cc
>>>> --- a/osaf/services/saf/amf/amfd/role.cc
>>>> +++ b/osaf/services/saf/amf/amfd/role.cc
>>>> @@ -174,11 +174,6 @@ uint32_t avd_active_role_initialization(
>>>>
>>>>    TRACE_ENTER();
>>>>
>>>> -  if (avd_clm_track_start() != SA_AIS_OK) {
>>>> -          LOG_ER("avd_clm_track_start FAILED");
>>>> -          goto done;
>>>> -  }
>>>> -
>>>>    if (avd_imm_config_get() != NCSCC_RC_SUCCESS) {
>>>>            LOG_ER("avd_imm_config_get FAILED");
>>>>            goto done;
>>>> @@ -193,6 +188,11 @@ uint32_t avd_active_role_initialization(
>>>>
>>>>    avd_imm_update_runtime_attrs();
>>>>
>>>> +  if (avd_clm_track_start() != SA_AIS_OK) {
>>>> +          LOG_ER("avd_clm_track_start FAILED");
>>>> +          goto done;
>>>> +  }
>>>> +
>>>>    status = NCSCC_RC_SUCCESS;
>>>>  done:
>>>>    TRACE_LEAVE();
>>>>
>>
>
> ------------------------------------------------------------------------------
> Mobile security can be enabling, not merely restricting. Employees who
> bring their own devices (BYOD) to work are irked by the imposition of MDM
> restrictions. Mobile Device Manager Plus allows you to control only the
> apps on BYO-devices by containerizing them, leaving personal data untouched!
> https://ad.doubleclick.net/ddm/clk/304595813;131938128;j
> _______________________________________________
> Opensaf-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
>
