Hi HansN,

 >> any  how GA is tagged.

Sorry, I meant RC2 is tagged.

-AVM

On 9/21/2016 12:41 PM, A V Mahesh wrote:
> Hi HansN,
>
> I just tested with uniform buffer sizes on all nodes, sending 
> messages in the normal phase, and the results look OK, 
> even after hitting TIPC_ERR_OVERLOAD.
>
> So my conclusion is: in general all nodes will have the same buffer sizes, 
> so let us go with the V2 patch. Anyhow, GA is tagged, 
> so we have enough time for testing, and if we find any issues we can 
> resolve them by the next release.
>
> ==================================================================================================
>  
>
>
> Sep 21 11:51:40 SC-1 osafamfd[15792]: NO Node 'PL-4' joined the cluster
> Sep 21 11:51:40 SC-1 osafimmnd[15741]: NO Implementer connected: 17 
> (MsgQueueService132111) <0, 2040f>
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message 
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 7777 MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message 
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 7777 MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message 
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 7777 MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message 
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 7777 MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message 
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 7777 MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message 
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 7777 MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message 
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 7777 MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message 
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 7777 MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message 
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 7777 MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message 
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 7777 MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message 
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 7777 MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message 
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 7777 MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message 
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 7777 MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message 
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 7777 MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message 
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 7777 MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message 
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 7777 MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message 
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 7777 MDTM: undelivered message 
> condition ancillary data: TIPC_RETDATA
>
> ==================================================================================================
>  
>
>
>
> On 9/21/2016 11:37 AM, A V Mahesh wrote:
>> Hi HansN,
>>
>> On 9/20/2016 4:17 PM, Hans Nordebäck wrote:
>>> Hi Mahesh,
>>>
>>> I think only logging is needed, as proposed in the patch, since some 
>>> services already handle dropped messages. This logging will help in 
>>> troubleshooting. Keeping TIPC_DEST_DROPPABLE set to true will only make 
>>> TIPC silently drop messages; the original problem persists and 
>>> needs investigation, i.e. why the socket receive buffer is overloaded. 
>>> One reason may be the MDS poll/receive loop together with the "big" 
>>> mutex lock (ticket #520).
>> [AVM]  One valid reason could be that, in the TIPC_ERR_OVERLOAD case,
>> recd_bytes is NOT zero, so the buffer overload can occur at either the
>> TIPC or the MDS level.
>> I will investigate more and update.
>>
>>> Did you check why the MDS message loss mechanism doesn't detect 
>>> messages dropped by TIPC? AMF
>>> does detect this via e.g. "out of sync", "msg id mismatch" and so on.
>> [AVM]  You mean the IMMD message loss mechanism?
>>
>> -AVM
>>> /Regards HansN
>>>
>>> -----Original Message-----
>>> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
>>> Sent: den 20 september 2016 12:29
>>> To: Anders Widell <anders.wid...@ericsson.com>; Hans Nordebäck 
>>> <hans.nordeb...@ericsson.com>
>>> Cc: opensaf-devel@lists.sourceforge.net; mathi.naic...@oracle.com
>>> Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
>>>
>>> HI Anders Widell / HansN,
>>>
>>> On 9/16/2016 2:03 PM, Anders Widell wrote:
>>>> The idea was to just log reception of error info messages, for
>>>> trouble-shooting purposes.
>>> After multiple attempts, I managed to simulate the TIPC_ERR_OVERLOAD
>>> error. After the TIPC_ERR_OVERLOAD error is hit,
>>> the cluster goes into an unrecoverable state, because the send buffers 
>>> are full.
>>>
>>> So we have two options:
>>>
>>> 1)  Set TIPC_DEST_DROPPABLE to false, log the TIPC_ERR_OVERLOAD error, 
>>>     and then gracefully exit the sender,
>>>     which allows the remaining nodes to survive.
>>>
>>> 2)  Keep the current configuration as it is (TIPC_DEST_DROPPABLE set to 
>>>     true).
>>>
>>> =================================================================================================================
>>>  
>>>
>>> Sep 20 15:14:09 SC-1 osafamfd[3759]: NO Received node_up from 2040f:
>>> msg_id 1
>>> Sep 20 15:14:09 SC-1 osafamfd[3759]: NO Node 'PL-4' joined the 
>>> cluster Sep 20 15:14:09 SC-1 osafimmnd[3695]: NO Implementer 
>>> connected: 19
>>> (MsgQueueService132111) <0, 2040f>
>>> *Sep 20 15:16:59 SC-1 osafimmd[3684]: 77 MDTM: undelivered message 
>>> condition ancillary data: TIPC_ERR_OVERLOAD* Sep 20 15:17:00 SC-1 
>>> osafimmnd[3695]: WA Director Service in NOACTIVE state - fevs 
>>> replies pending:1 fevs highest processed:218744 Sep 20 15:17:00 SC-1 
>>> osafamfnd[3773]: NO 
>>> 'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 
>>> 'avaDown' : Recovery is 'nodeFailfast'
>>> Sep 20 15:17:00 SC-1 osafamfnd[3773]: ER 
>>> safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due 
>>> to:avaDown Recovery is:nodeFailfast Sep 20 15:17:00 SC-1 
>>> osafamfnd[3773]: Rebooting OpenSAF NodeId = 131343 EE Name = , 
>>> Reason: Component faulted: recovery is node failfast, OwnNodeId = 
>>> 131343, SupervisionTime = 60 Sep 20 15:17:00 SC-1 osafimmnd[3695]: 
>>> WA DISCARD DUPLICATE FEVS
>>> message:218744
>>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: WA Error code 2 returned for 
>>> message type 82 - ignoring Sep 20 15:17:00 SC-1 opensaf_reboot: 
>>> Rebooting local node; timeout=60 Sep 20 15:17:00 SC-1 
>>> osafimmnd[3695]: WA SC Absence IS allowed:900 IMMD service is DOWN 
>>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO IMMD SERVICE IS DOWN, HYDRA 
>>> IS CONFIGURED => UNREGISTERING IMMND form MDS Sep 20 15:17:00 SC-1 
>>> osafntfimcnd[3742]: NO saImmOiDispatch() Fail SA_AIS_ERR_BAD_HANDLE 
>>> (9) Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client 
>>> id:20002010f
>>> sv_id:27
>>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 1 <2,
>>> 2010f> (safLogService)
>>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:d0d0002010f
>>> sv_id:26
>>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:100002010f
>>> sv_id:27
>>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 2 
>>> <16,
>>> 2010f> (@safLogService_appl)
>>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:130002010f
>>> sv_id:27
>>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 3 
>>> <19,
>>> 2010f> (@OpenSafImmReplicatorA)
>>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:140002010f
>>> sv_id:26
>>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:150002010f
>>> sv_id:27
>>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 4 
>>> <21,
>>> 2010f> (safClmService)
>>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:1a0002010f
>>> sv_id:27
>>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 5 
>>> <26,
>>> 2010f> (safAmfService)
>>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:1b0002010f
>>> sv_id:26
>>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:5bc0002010f
>>> sv_id:26
>>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:5bd0002010f
>>> sv_id:27
>>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 6 
>>> <1469, 2010f> (MsgQueueService131343) Sep 20 15:17:00 SC-1 
>>> osafimmnd[3695]: NO Removing client id:5c00002010f
>>> sv_id:27
>>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 10 
>>> <1472, 2010f> (safEvtService) Sep 20 15:17:00 SC-1 osafimmnd[3695]: 
>>> NO Removing client id:5c40002010f
>>> sv_id:27
>>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 8 
>>> <1476, 2010f> (safSmfService) Sep 20 15:17:00 SC-1 osafimmnd[3695]: 
>>> NO Removing client id:5c60002010f
>>> sv_id:27
>>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 9 
>>> <1478, 2010f> (safLckService) Sep 20 15:17:00 SC-1 osafimmnd[3695]: 
>>> NO Removing client id:5c70002010f
>>> sv_id:27
>>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 7 
>>> <1479, 2010f> (safMsgGrpService) Sep 20 15:17:00 SC-1 
>>> osafimmnd[3695]: NO Removing client id:5cc0002010f
>>> sv_id:27
>>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:5ce0002010f
>>> sv_id:27
>>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 12 
>>> <1486, 2010f> (safCheckPointService) Sep 20 15:17:00 SC-1 
>>> osafimmnd[3695]: NO Implementer disconnected 13 <0, 2020f(down)> 
>>> (MsgQueueService131599) Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO 
>>> Implementer disconnected 14 <0, 2020f(down)> 
>>> (@OpenSafImmReplicatorB) Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO 
>>> Implementer disconnected 15 <0, 2020f(down)> (@safAmfService2020f) 
>>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Impl Discarded node 2020f 
>>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 16 
>>> <0, 2030f(down)> (MsgQueueService131855) Sep 20 15:17:00 SC-1 
>>> osafimmnd[3695]: NO Impl Discarded node 2030f Sep 20 15:17:00 SC-1 
>>> osafimmnd[3695]: NO Implementer disconnected 19 <0, 2040f(down)> 
>>> (MsgQueueService132111) Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO 
>>> Impl Discarded node 2040f Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO 
>>> MDS unregisterede. sleeping ...
>>> Sep 20 15:17:01 SC-1 osafimmnd[3695]: NO Sleep done registering 
>>> IMMND with MDS Sep 20 15:17:01 SC-1 osafimmnd[3695]: NO MDS: 
>>> mds_register_callback:
>>> dest 2010fe8fa0043 already exist
>>> Sep 20 15:17:01 SC-1 osafimmnd[3695]: NO MDS: mds_register_callback:
>>> dest 2010fdcb60040 already exist
>>> Sep 20 15:17:01 SC-1 osafimmnd[3695]: NO MDS: mds_register_callback:
>>> dest 2010fdcb6002e already exist
>>> Sep 20 15:17:01 SC-1 osafimmnd[3695]: NO MDS: mds_register_callback:
>>> dest 2010fdcb60037 already exist
>>> Sep 20 15:17:01 SC-1 osafimmnd[3695]: NO MDS: mds_register_callback:
>>> dest 2010fdcb60028 already exist
>>> Sep 20 15:17:01 SC-1 osafimmnd[3695]: NO MDS: mds_register_callback:
>>> dest 2010fdcb6003d already exist
>>> Sep 20 15:17:01 SC-1 osafimmnd[3695]: NO MDS: mds_register_callback:
>>> dest 2010fdcb6002b already exist
>>> Sep 20 15:17:01 SC-1 osafimmnd[3695]: NO MDS: mds_register_callback:
>>> dest 2010fdcb6001c already exist
>>> Sep 20 15:17:01 SC-1 osafimmnd[3695]: NO MDS: mds_register_callback:
>>> dest 2010fdcb60019 already exist
>>> Sep 20 15:17:01 SC-1 osafimmnd[3695]: NO MDS: mds_register_callback:
>>> dest 2010fdcba0012 already exist
>>> Sep 20 15:17:01 SC-1 osafimmnd[3695]: NO MDS: mds_register_callback:
>>> dest 2010fdcb60028 already exist
>>> Sep 20 15:17:01 SC-1 osafimmnd[3695]: NO MDS: mds_register_callback:
>>> dest 2010fdcb60019 already exist
>>> Sep 20 15:17:01 SC-1 osafimmnd[3695]: NO SUCCESS IN REGISTERING 
>>> IMMND WITH MDS Sep 20 15:17:01 SC-1 osafimmnd[3695]: NO Re-introduce-me
>>> highestProcessed:218744 highestReceived:218744 Sep 20 15:17:03 SC-1 
>>> kernel: [ 1794.198381] md: stopping all md devices.
>>> Sep 20 15:17:03 SC-1 osafntfimcnd[8997]: WA ntfimcn_imm_init
>>> saImmOiInitialize_2() returned SA_AIS_ERR_TIMEOUT (5) Sep 20 
>>> 15:18:00 SC-1 syslog-ng[1221]: syslog-ng starting up; version='2.0.9'
>>> =================================================================================================================
>>>  
>>>
>>>
>>> -AVM
>>>
>>> On 9/16/2016 2:03 PM, Anders Widell wrote:
>>>> I don't think we need (or even should) inform the sender when MDS
>>>> receives an error information message from TIPC. Note that these error
>>>> information messages are received asynchronously, when the sender has
>>>> already received an OK return code from the MDS send call. The idea
>>>> was to just log reception of error info messages, for trouble-shooting
>>>> purposes. We already have a mechanism in MDS that informs the receiver
>>>> about lost MDS messages. If we wish to inform the sender we would need
>>>> to introduce a second mechanism in MDS, and at this point I don't
>>>> think it is needed. Another approach we could consider is that MDS
>>>> retransmits the message transparently without informing the sender.
>>>> This would require MDS to internally store sent messages for a while,
>>>> so that they can be retransmitted. It would also require the receiver
>>>> to re-order received messages, since a retransmitted message will be
>>>> received out of sequence.
>>>>
>>>> regards,
>>>>
>>>> Anders Widell
>>>>
>>>>
>>>> On 09/16/2016 06:40 AM, A V Mahesh wrote:
>>>>> Hi HansN,
>>>>>
>>>>> I managed to create TIPC_ERRINFO/TIPC_RETDATA error cases (not the
>>>>> TIPC_ERR_OVERLOAD error) with normal messages. I observed that with
>>>>> TIPC_DEST_DROPPABLE set to true, not even the TIPC_ERRINFO error is
>>>>> notified (this would include TIPC_ERR_OVERLOAD); with
>>>>> TIPC_DEST_DROPPABLE set to false, the TIPC_ERRINFO/TIPC_RETDATA errors
>>>>> are notified.
>>>>>
>>>>> Now I will also check the implications of setting TIPC_DEST_DROPPABLE
>>>>> to false on multicast and broadcast messages. Based on that, we can
>>>>> restrict setting TIPC_DEST_DROPPABLE to false to the cases where the
>>>>> agent has `i_msg_loss_indication = true`, so that MDS can return the
>>>>> same TIPC_ERR_OVERLOAD error to the agent.
>>>>>
>>>>> TIPC_DEST_DROPPABLE to false:
>>>>>
>>>>> ==================================================================
>>>>>
>>>>> Sep 15 16:10:39 SC-1 osafimmnd[32051]: NO Implementer disconnected 13
>>>>> <0, 2040f> (MsgQueueService132111) Sep 15 16:10:39 SC-1
>>>>> osafimmd[32040]:  777 MDTM: undelivered message condition ancillary
>>>>> data: TIPC_ERRINFO abort err : 2 Sep 15 16:10:39 SC-1
>>>>> osafimmd[32040]: 7777 MDTM: undelivered message condition ancillary
>>>>> data: TIPC_RETDATA Sep 15 16:10:39 SC-1 osafimmd[32040]: NO MDS event
>>>>> from svc_id 25 (change:4, dest:567413369208836) Sep 15 16:10:39 SC-1
>>>>> osafimmd[32040]:  777 MDTM: undelivered message condition ancillary
>>>>> data: TIPC_ERRINFO abort err : 2 Sep 15 16:10:39 SC-1
>>>>> osafimmd[32040]: 7777 MDTM: undelivered message condition ancillary
>>>>> data: TIPC_RETDATA Sep 15 16:10:39 SC-1 osafimmd[32040]: 777 MDTM:
>>>>> undelivered message condition ancillary data: TIPC_ERRINFO abort err
>>>>> : 2 Sep 15 16:10:39 SC-1 osafimmd[32040]: 7777 MDTM: undelivered
>>>>> message condition ancillary data: TIPC_RETDATA Sep 15 16:10:39 SC-1
>>>>> osafimmd[32040]:  777 MDTM: undelivered message condition ancillary
>>>>> data: TIPC_ERRINFO abort err : 2 Sep 15 16:10:39 SC-1
>>>>> osafimmd[32040]: 7777 MDTM: undelivered message condition ancillary
>>>>> data: TIPC_RETDATA Sep 15 16:10:39 SC-1 osafimmd[32040]: 777 MDTM:
>>>>> undelivered message condition ancillary data: TIPC_ERRINFO abort err
>>>>> : 2 Sep 15 16:10:39 SC-1 osafimmd[32040]: 7777 MDTM: undelivered
>>>>> message condition ancillary data: TIPC_RETDATA Sep 15 16:10:39 SC-1
>>>>> osafimmd[32040]:  777 MDTM: undelivered message condition ancillary
>>>>> data: TIPC_ERRINFO abort err : 2 Sep 15 16:10:39 SC-1
>>>>> osafimmd[32040]: 7777 MDTM: undelivered message condition ancillary
>>>>> data: TIPC_RETDATA Sep 15 16:10:39 SC-1 osafimmd[32040]: 777 MDTM:
>>>>> undelivered message condition ancillary data: TIPC_ERRINFO abort err
>>>>> : 2 Sep 15 16:10:39 SC-1 osafimmd[32040]: 7777 MDTM: undelivered
>>>>> message condition ancillary data: TIPC_RETDATA Sep 15 16:10:39 SC-1
>>>>> osafimmd[32040]:  777 MDTM: undelivered message condition ancillary
>>>>> data: TIPC_ERRINFO abort err : 2 Sep 15 16:10:39 SC-1
>>>>> osafimmd[32040]: 7777 MDTM: undelivered message condition ancillary
>>>>> data: TIPC_RETDATA Sep 15 16:10:39 SC-1 osafimmd[32040]: 777 MDTM:
>>>>> undelivered message condition ancillary data: TIPC_ERRINFO abort err
>>>>> : 2 Sep 15 16:10:39 SC-1 osafimmd[32040]: 7777 MDTM: undelivered
>>>>> message condition ancillary data: TIPC_RETDATA Sep 15 16:10:39 SC-1
>>>>> osafimmd[32040]:  777 MDTM: undelivered message condition ancillary
>>>>> data: TIPC_ERRINFO abort err : 2 Sep 15 16:10:39 SC-1
>>>>> osafimmd[32040]: 7777 MDTM: undelivered message condition ancillary
>>>>> data: TIPC_RETDATA Sep 15 16:10:39 SC-1 osafamfd[32114]: NO Node
>>>>> 'PL-4' left the cluster
>>>>>
>>>>> ==================================================================
>>>>>
>>>>> TIPC_DEST_DROPPABLE to true:
>>>>>
>>>>> ==================================================================
>>>>>
>>>>> Sep 15 15:59:55 SC-1 osafimmnd[26461]: NO Implementer disconnected 13
>>>>> <0, 2040f> (MsgQueueService132111) Sep 15 15:59:55 SC-1
>>>>> osafimmd[26450]: NO MDS event from svc_id 25 (change:4,
>>>>> dest:567412923957252) Sep 15 15:59:55 SC-1 osafimmnd[26461]: NO
>>>>> Global discard node received for nodeId:2040f pid:410 Sep 15 15:59:55
>>>>> SC-1 osafamfd[28810]: NO Node 'PL-4' left the cluster Sep 15 15:59:58
>>>>> SC-1 kernel: [ 5147.648737] tipc: Resetting link
>>>>> <1.1.1:eth0-1.1.4:eth0>, peer not responding Sep 15 15:59:58 SC-1
>>>>> kernel: [ 5147.648756] tipc: Lost link <1.1.1:eth0-1.1.4:eth0> on
>>>>> network plane A Sep 15 15:59:58 SC-1 kernel: [ 5147.648771] tipc:
>>>>> Lost contact with <1.1.4>
>>>>>
>>>>> ==================================================================
>>>>>
>>>>> -AVM
>>>>>
>>>>>
>>>>> On 9/1/2016 10:59 AM, Hans Nordebäck wrote:
>>>>>> Hi Mahesh,
>>>>>>
>>>>>> I have not tested this, but the following should work:
>>>>>>
>>>>>> - Set BSRsock TIPC_IMPORTANCE to TIPC_LOW_IMPORTANCE
>>>>>>
>>>>>> - Set the socket receive buffer to a small value:
>>>>>>
>>>>>>     optval = "small socket receive buffer size", 5000?
>>>>>>
>>>>>>     setsockopt(tipc_cb.BSRsock, SOL_SOCKET, SO_RCVBUF, &optval, optlen)
>>>>>>
>>>>>> -  sysctl -w net.tipc.tipc_rmem="5000 40000000 68240400" (or smaller
>>>>>> values)
>>>>>>
>>>>>> - add some delays when processing messages in
>>>>>> mdtm_process_recv_events(), to provoke overloading the socket
>>>>>> receive buffer.
>>>>>>
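
A minimal sketch of the first two steps of the recipe above, for illustration only. It is not code from the patch: `shrink_rcvbuf` and `set_low_importance` are hypothetical helper names, and the `fd` argument stands in for `tipc_cb.BSRsock`. Note that Linux typically reports a larger effective SO_RCVBUF than requested (the kernel doubles the value and enforces a minimum), so the helper returns what getsockopt() reports.

```c
#include <sys/socket.h>
#include <linux/tipc.h>

#ifndef SOL_TIPC
#define SOL_TIPC 271 /* from the kernel's socket.h; not exposed by every libc */
#endif

/* Hypothetical helper: request a small receive buffer on socket fd.
 * Returns the effective size reported by the kernel, or -1 on error. */
int shrink_rcvbuf(int fd, int bytes)
{
	int effective = 0;
	socklen_t len = sizeof(effective);

	if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) != 0)
		return -1;
	if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &effective, &len) != 0)
		return -1;
	return effective;
}

/* Hypothetical helper: set the lowest TIPC importance level on a socket.
 * Only meaningful on an AF_TIPC socket; returns 0 on success, -1 on error. */
int set_low_importance(int tipc_fd)
{
	int imp = TIPC_LOW_IMPORTANCE;

	return setsockopt(tipc_fd, SOL_TIPC, TIPC_IMPORTANCE, &imp, sizeof(imp));
}
```

For example, `shrink_rcvbuf(fd, 5000)` followed by `set_low_importance(fd)` would cover the two socket-level steps; the sysctl change and the artificial delay in mdtm_process_recv_events() still have to be done separately.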
>>>>>> We experience dropped packets in a 75-node system; as a
>>>>>> workaround, increasing the default socket receive buffer size seems
>>>>>> to work for that setup.
>>>>>>
>>>>>> /Thanks HansN
>>>>>>
>>>>>> On 09/01/2016 05:50 AM, A V Mahesh wrote:
>>>>>>> Hi HansN,
>>>>>>>
>>>>>>> Do you have any tips for creating the overload case?
>>>>>>>
>>>>>>> I would like to test and observe the TIPC_DEST_DROPPABLE enabled
>>>>>>> and disabled cases.
>>>>>>>
>>>>>>> -AVM
>>>>>>>
>>>>>>>
>>>>>>> On 9/1/2016 9:12 AM, A V Mahesh wrote:
>>>>>>>> Hi HansN,
>>>>>>>>
>>>>>>>> Sorry for the delay.
>>>>>>>>
>>>>>>>> I will test it and get back to you soon.
>>>>>>>>
>>>>>>>> -AVM
>>>>>>>>
>>>>>>>>
>>>>>>>> On 8/31/2016 4:29 PM, Hans Nordebäck wrote:
>>>>>>>>> Hi Mahesh,
>>>>>>>>> Any updates on this?
>>>>>>>>>
>>>>>>>>> /Regards HansN
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Anders Widell
>>>>>>>>> Sent: den 25 augusti 2016 13:11
>>>>>>>>> To: A V Mahesh <mahesh.va...@oracle.com>; Hans Nordebäck
>>>>>>>>> <hans.nordeb...@ericsson.com>; mathi.naic...@oracle.com
>>>>>>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>>>>>>> Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages
>>>>>>>>> [#1957]
>>>>>>>>>
>>>>>>>>> Hi!
>>>>>>>>>
>>>>>>>>> This is what the TIPC user documentation says about
>>>>>>>>> TIPC_DEST_DROPPABLE:
>>>>>>>>> "This option governs the handling of messages sent by the socket
>>>>>>>>> if the message cannot be delivered to its destination, either
>>>>>>>>> because the receiver is congested or because the specified
>>>>>>>>> receiver does not exist.
>>>>>>>>> If enabled, the message is discarded; otherwise the message is
>>>>>>>>> returned to the sender."
>>>>>>>>>
>>>>>>>>> This is what the TIPC user documentation says about the return
>>>>>>>>> value from the recvmsg() system call: "When used with a
>>>>>>>>> connectionless socket, a return value of 0 indicates the arrival
>>>>>>>>> of a returned data message that was originally sent by this 
>>>>>>>>> socket."
>>>>>>>>>
>>>>>>>>> I think the documentation is pretty clear. If you set
>>>>>>>>> TIPC_DEST_DROPPABLE to true, the receiver can discard messages
>>>>>>>>> e.g. when the receive buffer is full. The sender will not be
>>>>>>>>> notified in this case. If TIPC_DEST_DROPPABLE is set to false,
>>>>>>>>> the message will be returned to the sender in case of a full
>>>>>>>>> receive buffer. The sender knows that it has received such a
>>>>>>>>> returned message when the recvmsg() call returns zero.
>>>>>>>>>
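
A minimal sketch of how a sender could recognise such a returned message from the ancillary data that recvmsg() fills in. This is an illustration and not the MDS implementation: `struct undelivered_info`, `set_dest_droppable` and `inspect_tipc_cmsgs` are hypothetical names, and the sketch assumes that the TIPC_ERRINFO object carries two 32-bit values (the error code and the length of the returned data).

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/tipc.h>

#ifndef SOL_TIPC
#define SOL_TIPC 271 /* from the kernel's socket.h; not exposed by every libc */
#endif

/* Hypothetical summary of the TIPC ancillary data of one received message. */
struct undelivered_info {
	int has_errinfo;  /* a TIPC_ERRINFO object was present */
	int has_retdata;  /* a TIPC_RETDATA object was present */
	uint32_t err;     /* TIPC error code, e.g. TIPC_ERR_OVERLOAD */
	uint32_t ret_len; /* length of the returned data (assumed layout) */
};

/* Enable (1) or disable (0) TIPC_DEST_DROPPABLE on a TIPC socket. */
int set_dest_droppable(int tipc_fd, int droppable)
{
	return setsockopt(tipc_fd, SOL_TIPC, TIPC_DEST_DROPPABLE,
			  &droppable, sizeof(droppable));
}

/* Walk the control messages of msg and record TIPC_ERRINFO/TIPC_RETDATA.
 * Returns 1 if msg looks like a returned (undelivered) message, else 0. */
int inspect_tipc_cmsgs(struct msghdr *msg, struct undelivered_info *out)
{
	struct cmsghdr *anc;

	memset(out, 0, sizeof(*out));
	for (anc = CMSG_FIRSTHDR(msg); anc != NULL; anc = CMSG_NXTHDR(msg, anc)) {
		if (anc->cmsg_level != SOL_TIPC)
			continue;
		if (anc->cmsg_type == TIPC_ERRINFO &&
		    anc->cmsg_len >= CMSG_LEN(2 * sizeof(uint32_t))) {
			uint32_t v[2];

			memcpy(v, CMSG_DATA(anc), sizeof(v));
			out->has_errinfo = 1;
			out->err = v[0];
			out->ret_len = v[1];
		} else if (anc->cmsg_type == TIPC_RETDATA) {
			out->has_retdata = 1;
		}
	}
	return out->has_errinfo || out->has_retdata;
}
```

A receive loop would call `inspect_tipc_cmsgs()` on the msghdr after each recvmsg() and, for instance, log `info.err` when it equals TIPC_ERR_OVERLOAD.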
>>>>>>>>> regards,
>>>>>>>>> Anders Widell
>>>>>>>>>
>>>>>>>>> On 08/25/2016 11:30 AM, A V Mahesh wrote:
>>>>>>>>>> Hi HansN,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 8/23/2016 5:22 PM, Hans Nordebäck wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Mahesh,
>>>>>>>>>>>
>>>>>>>>>>> Yes, this is my understanding too: if TIPC_DROPPABLE = true,
>>>>>>>>>>> TIPC may drop messages silently when the receive socket buffer
>>>>>>>>>>> is full, and does not return any ancillary message.
>>>>>>>>>>> If TIPC_DROPPABLE = false, TIPC may drop the message but will
>>>>>>>>>>> send an ancillary message to report TIPC_ERR_OVERLOAD.
>>>>>>>>>> [AVM]
>>>>>>>>>>
>>>>>>>>>> My observation and understanding are different. Based on the
>>>>>>>>>> TIPC code and the Linux TIPC 2.0 Programmer's Guide, the
>>>>>>>>>> TIPC_ERR_OVERLOAD error is returned when TIPC is unable to
>>>>>>>>>> enqueue an incoming message on the receiving socket's receive
>>>>>>>>>> queue, irrespective of whether TIPC_DEST_DROPPABLE is enabled
>>>>>>>>>> or disabled.
>>>>>>>>>>
>>>>>>>>>> The only difference between TIPC_DEST_DROPPABLE enabled and
>>>>>>>>>> disabled is this: if TIPC_DEST_DROPPABLE is enabled, the message
>>>>>>>>>> is discarded, the size returned by recvmsg() is ZERO, and the
>>>>>>>>>> application gets errors; if TIPC_DEST_DROPPABLE is disabled, the
>>>>>>>>>> message is returned to the sender, which means the size returned
>>>>>>>>>> by recvmsg() is the size of the data the user sent, and the
>>>>>>>>>> application gets errors.
>>>>>>>>>>
>>>>>>>>>> I checked the TIPC code and documentation and have not found
>>>>>>>>>> any evidence that the TIPC_ERR_OVERLOAD error code is sent
>>>>>>>>>> only if TIPC_DEST_DROPPABLE = false.
>>>>>>>>>>
>>>>>>>>>> Even while testing #1227
>>>>>>>>>> (https://sourceforge.net/p/opensaf/mailman/message/33207717/) my
>>>>>>>>>> observation and understanding was that an individual TIPC socket
>>>>>>>>>> is only allowed to queue up OVERLOAD_LIMIT_BASE/2 messages of the
>>>>>>>>>> lowest importance level before it starts rejecting them.
>>>>>>>>>> Once a socket's receive queue length exceeds the maximum limit,
>>>>>>>>>> the receiving socket sends out a reject message with the
>>>>>>>>>> TIPC_ERR_OVERLOAD error code and cmsg_type
>>>>>>>>>> TIPC_ERRINFO/TIPC_RETDATA; the TIPC code and the Linux TIPC 2.0
>>>>>>>>>> Programmer's Guide confirm this.
>>>>>>>>>>
>>>>>>>>>> tipc/socket.c
>>>>>>>>>> =======================================================
>>>>>>>>>> /* Reject message if there isn't room to queue it */
>>>>>>>>>>
>>>>>>>>>> recv_q_len = (u32)atomic_read(&tipc_queue_size);
>>>>>>>>>> if (unlikely(recv_q_len >= OVERLOAD_LIMIT_BASE)) {
>>>>>>>>>>        if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE))
>>>>>>>>>>            return TIPC_ERR_OVERLOAD;
>>>>>>>>>> }
>>>>>>>>>> recv_q_len = skb_queue_len(&sk->sk_receive_queue);
>>>>>>>>>> if (unlikely(recv_q_len >= (OVERLOAD_LIMIT_BASE / 2))) {
>>>>>>>>>>        if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE / 2))
>>>>>>>>>>            return TIPC_ERR_OVERLOAD;
>>>>>>>>>> }
>>>>>>>>>> =======================================================
>>>>>>>>>>
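
The two-stage check quoted above can be modelled in user space roughly as follows. This is a sketch under stated assumptions, not the kernel code: `queue_full` only stands in for the kernel's rx_queue_full(), the per-importance multipliers in it are illustrative and not taken from this thread, and `limit_base` is a parameter because the value of OVERLOAD_LIMIT_BASE is not given here.

```c
#include <stdint.h>

/* Importance levels, lowest to highest (names local to this sketch). */
enum importance { IMP_LOW, IMP_MEDIUM, IMP_HIGH, IMP_CRITICAL };

/* Stand-in for rx_queue_full(): the threshold grows with importance.
 * The multipliers are assumptions made for illustration. */
static int queue_full(enum importance imp, uint32_t queue_len, uint32_t base)
{
	uint32_t threshold;

	switch (imp) {
	case IMP_LOW:
		threshold = base;
		break;
	case IMP_MEDIUM:
		threshold = base * 2;
		break;
	case IMP_HIGH:
		threshold = base * 100;
		break;
	default:
		return 0; /* critical messages are never rejected in this model */
	}
	return queue_len >= threshold;
}

/* Returns 1 if the message would be rejected with TIPC_ERR_OVERLOAD:
 * first the global queue is checked against limit_base, then the
 * per-socket queue against limit_base / 2. */
int would_overload(enum importance imp, uint32_t global_len,
		   uint32_t sock_len, uint32_t limit_base)
{
	if (global_len >= limit_base && queue_full(imp, global_len, limit_base))
		return 1;
	if (sock_len >= limit_base / 2 &&
	    queue_full(imp, sock_len, limit_base / 2))
		return 1;
	return 0;
}
```

With a hypothetical `limit_base` of 5000, a lowest-importance message is rejected once the socket already holds 2500 messages, which matches the OVERLOAD_LIMIT_BASE/2 behaviour described above.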
>>>>>>>>>>
>>>>>>>>>> 2.1.17. setsockopt() of  TIPC 2.0 Programmer's Guide
>>>>>>>>>> =======================================================
>>>>>>>>>> TIPC_DEST_DROPPABLE
>>>>>>>>>> This option governs the handling of messages sent by the socket
>>>>>>>>>> if the message cannot be delivered to its destination, either
>>>>>>>>>> because the receiver is congested or because the specified
>>>>>>>>>> receiver does not exist. If enabled, the message is discarded;
>>>>>>>>>> otherwise the message is returned to the sender.
>>>>>>>>>>
>>>>>>>>>> By default, this option is disabled for SOCK_SEQPACKET and
>>>>>>>>>> SOCK_STREAM socket types, and enabled for SOCK_RDM and
>>>>>>>>>> SOCK_DGRAM, This arrangement ensures proper teardown of failed
>>>>>>>>>> connections when connection-oriented data transfer is used,
>>>>>>>>>> without increasing the complexity of connectionless data
>>>>>>>>>> transfer.
>>>>>>>>>>
>>>>>>>>>> TIPC_SRC_DROPPABLE
>>>>>>>>>> This option governs the handling of messages sent by the socket
>>>>>>>>>> if link congestion occurs. If enabled, the message is discarded;
>>>>>>>>>> otherwise the system queues the message for later transmission.
>>>>>>>>>> By default, this option is disabled for SOCK_SEQPACKET,
>>>>>>>>>> SOCK_STREAM, and SOCK_RDM socket types (resulting in "reliable"
>>>>>>>>>> data transfer), and enabled for SOCK_DGRAM (resulting in
>>>>>>>>>> "unreliable" data transfer).
>>>>>>>>>> =======================================================
>>>>>>>>>>
>>>>>>>>>> Now I will try to create the OVERLOAD case and update you soon
>>>>>>>>>> with my latest observations.
>>>>>>>>>>
>>>>>>>>>> -AVM
>>>>>>>>>>
>>>>>>>>>>> Correcting this and adding an abort is not backward compatible,
>>>>>>>>>>> as some services already handle flow control in some way; only
>>>>>>>>>>> log when packets are dropped.
>>>>>>>>>>> Regarding ticket #1960, there are other solutions than
>>>>>>>>>>> introducing flow control in MDS, e.g. exposing an option to the
>>>>>>>>>>> service to choose connection-oriented or connectionless.
>>>>>>>>>>> The problem with dropped messages seems in one case to be
>>>>>>>>>>> related to intensive MDS logging (by MDS).
>>>>>>>>>>>
>>>>>>>>>>> /Thanks HansN
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
>>>>>>>>>>> Sent: den 23 augusti 2016 11:27
>>>>>>>>>>> To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell
>>>>>>>>>>> <anders.wid...@ericsson.com>; mathi.naic...@oracle.com
>>>>>>>>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>>>>>>>>> Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages
>>>>>>>>>>> [#1957]
>>>>>>>>>>>
>>>>>>>>>>> Hi HansN,
>>>>>>>>>>>
>>>>>>>>>>> It seems I am missing something; please allow me to
>>>>>>>>>>> understand.
>>>>>>>>>>>
>>>>>>>>>>> If I correctly understand your observation:
>>>>>>>>>>>
>>>>>>>>>>> With the current OpenSAF code (this #1957 patch NOT applied),
>>>>>>>>>>> TIPC_DROPPABLE=true by default. While running OpenSAF with that
>>>>>>>>>>> binary, when TIPC_ERR_OVERLOAD occurs, TIPC does not deliver the
>>>>>>>>>>> TIPC_ERRINFO or TIPC_RETDATA errors, and the following code in
>>>>>>>>>>> the function recvfrom_connectionless() is not hit. Is my
>>>>>>>>>>> understanding right?
>>>>>>>>>>>
>>>>>>>>>>> ===============================================================
>>>>>>>>>>>
>>>>>>>>>>> *if (anc->cmsg_type == TIPC_ERRINFO) {*
>>>>>>>>>>>          /* TIPC_ERRINFO - TIPC error code associated with a
>>>>>>>>>>>             returned data message or a connection termination
>>>>>>>>>>>             message, so abort */
>>>>>>>>>>>          m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary "
>>>>>>>>>>>                             "data: TIPC_ERRINFO abort err :%s", strerror(errno));
>>>>>>>>>>> *abort();*
>>>>>>>>>>> *} else if (anc->cmsg_type == TIPC_RETDATA) {*
>>>>>>>>>>>          /* If we set TIPC_DEST_DROPPABLE off (configure TIPC to
>>>>>>>>>>>             return rejected messages to the sender), we will hit
>>>>>>>>>>>             this. When we implement MDS retransmission of lost
>>>>>>>>>>>             messages, abort can be replaced with flow control
>>>>>>>>>>>             logic */
>>>>>>>>>>>          for (i = anc->cmsg_len - sizeof(*anc); i > 0; i--) {
>>>>>>>>>>>              m_MDS_LOG_DBG("MDTM: returned byte 0x%02x\n", *cptr);
>>>>>>>>>>>              cptr++;
>>>>>>>>>>>          }
>>>>>>>>>>>          /* TIPC_RETDATA - the contents of a returned data
>>>>>>>>>>>             message, so abort */
>>>>>>>>>>>          m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary "
>>>>>>>>>>>                             "data: TIPC_RETDATA abort err :%s", strerror(errno));
>>>>>>>>>>> *abort();*
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> ===============================================================
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -AVM
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 8/23/2016 1:08 PM, Hans Nordebäck wrote:
>>>>>>>>>>>> Hi Mahesh,
>>>>>>>>>>>>
>>>>>>>>>>>> Please see response below with [HansN] /Thanks HansN
>>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
>>>>>>>>>>>> Sent: den 23 augusti 2016 08:25
>>>>>>>>>>>> To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders
>>>>>>>>>>>> Widell <anders.wid...@ericsson.com>; mathi.naic...@oracle.com
>>>>>>>>>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>>>>>>>>>> Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages
>>>>>>>>>>>> [#1957]
>>>>>>>>>>>>
>>>>>>>>>>>> Hi HansN
>>>>>>>>>>>>
>>>>>>>>>>>> Please see response below with [AVM]
>>>>>>>>>>>>
>>>>>>>>>>>> -AVM
>>>>>>>>>>>>
>>>>>>>>>>>> On 8/23/2016 11:41 AM, Hans Nordebäck wrote:
>>>>>>>>>>>>> Hi Mahesh,
>>>>>>>>>>>>>
>>>>>>>>>>>>> please see comments below.
>>>>>>>>>>>>>
>>>>>>>>>>>>> /Thanks HansN
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 08/23/2016 07:21 AM, A V Mahesh wrote:
>>>>>>>>>>>>>> Hi HansN,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Let us first discuss the error handling and abort; then we
>>>>>>>>>>>>>> can come back to the question of whether TIPC currently
>>>>>>>>>>>>>> does or does not permit an application to send a
>>>>>>>>>>>>>> multicast message with the "destination droppable" setting
>>>>>>>>>>>>>> disabled.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Let us disable TIPC_DEST_DROPPABLE, so that TIPC will try to
>>>>>>>>>>>>>> return an undelivered multicast message to its sender and we
>>>>>>>>>>>>>> can determine whether the issue is caused by
>>>>>>>>>>>>>> TIPC_ERR_OVERLOAD. This helps in debugging, so that the
>>>>>>>>>>>>>> application may increase SO_SNDBUF/SO_RCVBUF to reduce the
>>>>>>>>>>>>>> problem.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But we still need to abort(). The reason is that the current
>>>>>>>>>>>>>> MDS implementation doesn't have flow control logic (no
>>>>>>>>>>>>>> retry on error), so an application like AMF can go
>>>>>>>>>>>>>> wrong and the cluster will go into an unstable/unrecoverable
>>>>>>>>>>>>>> state.
>>>>>>>>>>>>>>
>>>>>>>>>>>>> [HansN] In the current implementation messages are dropped
>>>>>>>>>>>>> silently and no abort is done.
>>>>>>>>>>>> [AVM] I can see abort() in the current code. Do you mean
>>>>>>>>>>>> abort() is not working and the application (amf) is not
>>>>>>>>>>>> exiting?
>>>>>>>>>>>> [HansN] When TIPC_DEST_DROPPABLE is true and messages are
>>>>>>>>>>>> dropped (TIPC_ERR_OVERLOAD), no abort is performed; e.g. amfd
>>>>>>>>>>>> detects this in the msg sanity check and logs "invalid msg id
>>>>>>>>>>>> ..."
>>>>>>>>>>>> ====================================================================
>>>>>>>>>>>> if (anc->cmsg_type == TIPC_ERRINFO) {
>>>>>>>>>>>>           /* TIPC_ERRINFO - TIPC error code associated with a returned data
>>>>>>>>>>>>              message or a connection termination message so abort */
>>>>>>>>>>>>           m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err :%s", strerror(errno) );
>>>>>>>>>>>> *abort();*
>>>>>>>>>>>> } else if (anc->cmsg_type == TIPC_RETDATA) {
>>>>>>>>>>>>           /* If we set TIPC_DEST_DROPPABLE off messge (configure TIPC to
>>>>>>>>>>>>              return rejected messages to the sender )
>>>>>>>>>>>>              we will hit this when we implement MDS retransmit lost
>>>>>>>>>>>>              messages abort can be replaced with flow control logic*/
>>>>>>>>>>>>           for (i = anc->cmsg_len - sizeof(*anc); i > 0; i--) {
>>>>>>>>>>>>               m_MDS_LOG_DBG("MDTM: returned byte 0x%02x\n", *cptr);
>>>>>>>>>>>>               cptr++;
>>>>>>>>>>>>           }
>>>>>>>>>>>>           /* TIPC_RETDATA -The contents of a returned data message  so abort */
>>>>>>>>>>>>           m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_RETDATA abort err :%s", strerror(errno) );
>>>>>>>>>>>> *abort();*
>>>>>>>>>>>> }
>>>>>>>>>>>> ====================================================================
>>>>>>>>>>>>> This patch enables logging
>>>>>>>>>>>>> when messages are dropped to help in debugging. I don't agree
>>>>>>>>>>>>> that we should also introduce abort, but instead:
>>>>>>>>>>>>> 1) Implement a solution to handle dropped messages, ticket
>>>>>>>>>>>>> #1960
>>>>>>>>>>>> [AVM] This is nothing but a flow control implementation in
>>>>>>>>>>>> MDS; this is a future enhancement.
>>>>>>>>>>>>
>>>>>>>>>>>>> 2) Investigate why messages may be dropped. The receiving MDS
>>>>>>>>>>>>> thread is a real-time thread and should be able to consume a
>>>>>>>>>>>>> large amount of incoming messages.
>>>>>>>>>>>>> E.g. is the receiving MDS thread "live hanging" due to locks,
>>>>>>>>>>>>> file I/O etc?
>>>>>>>>>>>>>> This was the reason we didn't go for it while addressing
>>>>>>>>>>>>>> ticket #1227
>>>>>>>>>>>>>> (https://sourceforge.net/p/opensaf/mailman/message/33207717/).
>>>>>>>>>>>>>> So currently we gain no advantage from disabling
>>>>>>>>>>>>>> TIPC_DEST_DROPPABLE and not allowing multicast messages.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -AVM
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 8/18/2016 2:43 PM, Hans Nordeback wrote:
>>>>>>>>>>>>>>> osaf/libs/core/mds/mds_dt_tipc.c |  32 +++++++++++++++++++++++++-------
>>>>>>>>>>>>>>>  1 files changed, 25 insertions(+), 7 deletions(-)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> diff --git a/osaf/libs/core/mds/mds_dt_tipc.c b/osaf/libs/core/mds/mds_dt_tipc.c
>>>>>>>>>>>>>>> --- a/osaf/libs/core/mds/mds_dt_tipc.c
>>>>>>>>>>>>>>> +++ b/osaf/libs/core/mds/mds_dt_tipc.c
>>>>>>>>>>>>>>> @@ -320,6 +320,15 @@ uint32_t mdtm_tipc_init(NODE_ID nodeid,
>>>>>>>>>>>>>>>              m_MDS_LOG_INFO("MDTM: Successfully set default socket option TIPC_IMP = %d", TIPCIMPORTANCE);
>>>>>>>>>>>>>>>          }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +        int droppable = 0;
>>>>>>>>>>>>>>> +        if (setsockopt(tipc_cb.BSRsock, SOL_TIPC, TIPC_DEST_DROPPABLE, &droppable, sizeof(droppable)) != 0) {
>>>>>>>>>>>>>>> +                LOG_ER("MDTM: Can't set TIPC_DEST_DROPPABLE to zero err :%s\n", strerror(errno));
>>>>>>>>>>>>>>> +                m_MDS_LOG_ERR("MDTM: Can't set TIPC_DEST_DROPPABLE to zero err :%s\n", strerror(errno));
>>>>>>>>>>>>>>> +                osafassert(0);
>>>>>>>>>>>>>>> +        } else {
>>>>>>>>>>>>>>> +                m_MDS_LOG_NOTIFY("MDTM: Successfully set TIPC_DEST_DROPPABLE to zero");
>>>>>>>>>>>>>>> +        }
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>      return NCSCC_RC_SUCCESS;
>>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> @@ -563,6 +572,8 @@ ssize_t recvfrom_connectionless (int sd,
>>>>>>>>>>>>>>>      unsigned char *cptr;
>>>>>>>>>>>>>>>      int i;
>>>>>>>>>>>>>>>      int has_addr;
>>>>>>>>>>>>>>> +    int anc_data[2];
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>      ssize_t sz;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>      has_addr = (from != NULL) && (addrlen != NULL);
>>>>>>>>>>>>>>> @@ -591,19 +602,26 @@ ssize_t recvfrom_connectionless (int sd,
>>>>>>>>>>>>>>>                 if the message was sent using a TIPC name or name sequence as the
>>>>>>>>>>>>>>>                 destination rather than a TIPC port ID So abort for TIPC_ERRINFO and TIPC_RETDATA*/
>>>>>>>>>>>>>>>              if (anc->cmsg_type == TIPC_ERRINFO) {
>>>>>>>>>>>>>>> -                /* TIPC_ERRINFO - TIPC error code associated with a returned data message or a connection termination message so abort */
>>>>>>>>>>>>>>> -                m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err :%s", strerror(errno) );
>>>>>>>>>>>>>>> -                abort();
>>>>>>>>>>>>>>> +                anc_data[0] = *((unsigned int*)(CMSG_DATA(anc) + 0));
>>>>>>>>>>>>>>> +                if (anc_data[0] == TIPC_ERR_OVERLOAD) {
>>>>>>>>>>>>>>> +                    LOG_CR("MDTM: undelivered message condition ancillary data: TIPC_ERR_OVERLOAD");
>>>>>>>>>>>>>>> +                    m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERR_OVERLOAD");
>>>>>>>>>>>>>>> +                } else {
>>>>>>>>>>>>>>> +                    /* TIPC_ERRINFO - TIPC error code associated with a returned data message or a connection termination message so abort */
>>>>>>>>>>>>>>> +                    LOG_CR("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err : %d", anc_data[0]);
>>>>>>>>>>>>>>> +                    m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err : %d", anc_data[0]);
>>>>>>>>>>>>>>> +                }
>>>>>>>>>>>>>>>              } else if (anc->cmsg_type == TIPC_RETDATA) {
>>>>>>>>>>>>>>> -                /* If we set TIPC_DEST_DROPPABLE off messge (configure TIPC to return rejected messages to the sender )
>>>>>>>>>>>>>>> +                /* If we set TIPC_DEST_DROPPABLE off message (configure TIPC to return rejected messages to the sender )
>>>>>>>>>>>>>>>                     we will hit this when we implement MDS retransmit lost messages  abort can be replaced with flow control logic*/
>>>>>>>>>>>>>>>                  for (i = anc->cmsg_len - sizeof(*anc); i > 0; i--) {
>>>>>>>>>>>>>>> -                    m_MDS_LOG_DBG("MDTM: returned byte 0x%02x\n", *cptr);
>>>>>>>>>>>>>>> +                    LOG_CR("MDTM: returned byte 0x%02x\n", *cptr);
>>>>>>>>>>>>>>> +                    m_MDS_LOG_CRITICAL("MDTM: returned byte 0x%02x\n", *cptr);
>>>>>>>>>>>>>>>                      cptr++;
>>>>>>>>>>>>>>>                  }
>>>>>>>>>>>>>>>                  /* TIPC_RETDATA -The contents of a returned data message  so abort */
>>>>>>>>>>>>>>> -                m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_RETDATA abort err :%s", strerror(errno) );
>>>>>>>>>>>>>>> -                abort();
>>>>>>>>>>>>>>> +                LOG_CR("MDTM: undelivered message condition ancillary data: TIPC_RETDATA");
>>>>>>>>>>>>>>> +                m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_RETDATA");
>>>>>>>>>>>>>>>              } else if (anc->cmsg_type == TIPC_DESTNAME) {
>>>>>>>>>>>>>>>                  if (sz == 0) {
>>>>>>>>>>>>>>>                      m_MDS_LOG_DBG("MDTM: recd bytes=0 on received on sock, abnormal/unknown condition. Ignoring");
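
For readers following the TIPC_ERRINFO branch in the patch above, here is a minimal standalone sketch of the same idea: walk the control messages of a received msghdr and separate TIPC_ERR_OVERLOAD from other error codes. It is not the MDS code. The names classify_ancillary, demo_classify_code and the DEMO_* constants are hypothetical; the numeric values are assumed to mirror <linux/tipc.h> and are defined locally only so the sketch builds without TIPC headers. The payload is assumed to be two 32-bit words (error code, then length of returned data), which is consistent with the anc_data[2] declaration in the patch. memcpy is used instead of a pointer cast so that no alignment assumption is made.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Assumed to mirror <linux/tipc.h>; defined locally for this sketch only. */
enum {
    DEMO_SOL_TIPC = 271,
    DEMO_TIPC_ERRINFO = 1,
    DEMO_TIPC_ERR_OVERLOAD = 4
};

typedef enum { ANC_NONE, ANC_OVERLOAD, ANC_OTHER_ERROR } anc_verdict_t;

/* Scan the ancillary data of msg for a TIPC_ERRINFO object and report
 * whether it carries the overload code. err_code and ret_len receive the
 * two payload words when such an object is found, otherwise 0. */
anc_verdict_t classify_ancillary(struct msghdr *msg, uint32_t *err_code,
                                 uint32_t *ret_len)
{
    struct cmsghdr *anc;
    anc_verdict_t verdict = ANC_NONE;

    assert(msg != NULL && err_code != NULL && ret_len != NULL);
    *err_code = 0;
    *ret_len = 0;

    for (anc = CMSG_FIRSTHDR(msg); anc != NULL; anc = CMSG_NXTHDR(msg, anc)) {
        uint32_t info[2];

        if (anc->cmsg_level != DEMO_SOL_TIPC ||
            anc->cmsg_type != DEMO_TIPC_ERRINFO)
            continue;
        if (anc->cmsg_len < CMSG_LEN(sizeof(info)))
            continue; /* payload too short to hold both words */

        memcpy(info, CMSG_DATA(anc), sizeof(info));
        *err_code = info[0];
        *ret_len = info[1];
        verdict = (info[0] == DEMO_TIPC_ERR_OVERLOAD) ? ANC_OVERLOAD
                                                      : ANC_OTHER_ERROR;
    }
    return verdict;
}

/* Build a fake msghdr holding one TIPC_ERRINFO object with the given error
 * code and classify it. No socket or TIPC kernel module is needed. */
anc_verdict_t demo_classify_code(uint32_t code)
{
    union {
        char buf[CMSG_SPACE(2 * sizeof(uint32_t))];
        struct cmsghdr align; /* forces suitable alignment of buf */
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *anc;
    uint32_t info[2] = { code, 0 };
    uint32_t err = 0, len = 0;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    anc = CMSG_FIRSTHDR(&msg);
    anc->cmsg_level = DEMO_SOL_TIPC;
    anc->cmsg_type = DEMO_TIPC_ERRINFO;
    anc->cmsg_len = CMSG_LEN(sizeof(info));
    memcpy(CMSG_DATA(anc), info, sizeof(info));

    return classify_ancillary(&msg, &err, &len);
}
```

In a real receive path the msghdr would come from recvmsg() on the TIPC socket, and the caller would decide what to do per verdict (log only, as the patch does, or abort, as the existing code does).
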
>>
>> ------------------------------------------------------------------------------
>>  
>>
>> _______________________________________________
>> Opensaf-devel mailing list
>> Opensaf-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
>
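
On the "no retry on error" point discussed in the thread: the sketch below shows one possible shape of a bounded retry, purely as an illustration of what flow control logic could replace abort() with. It is not taken from MDS or from ticket #1960. send_with_retry, send_fn, demo_flaky_send and demo_attempts_needed are hypothetical names, and the callback convention (0 on delivery, -1 when the message came back) is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: returns 0 if the message was delivered,
 * -1 if it was returned to the sender (for example on overload). */
typedef int (*send_fn)(void *ctx);

/* Try to send up to max_attempts times. Returns the 1-based attempt number
 * that succeeded, or -1 if every attempt failed. */
int send_with_retry(send_fn send_once, void *ctx, int max_attempts)
{
    int attempt;

    assert(send_once != NULL && max_attempts > 0);
    for (attempt = 1; attempt <= max_attempts; attempt++) {
        if (send_once(ctx) == 0)
            return attempt;
        /* A real implementation would back off here before retrying,
         * so that the receiver has time to drain its queue. */
    }
    return -1;
}

/* Demo sender: fails as many times as the counter in ctx says,
 * then succeeds. */
int demo_flaky_send(void *ctx)
{
    int *remaining_failures = ctx;

    if (*remaining_failures > 0) {
        (*remaining_failures)--;
        return -1;
    }
    return 0;
}

/* Convenience wrapper used to exercise the retry loop. */
int demo_attempts_needed(int failures, int max_attempts)
{
    int remaining = failures;

    return send_with_retry(demo_flaky_send, &remaining, max_attempts);
}
```

The attempt limit matters: without it, a receiver that never recovers would block the sender forever, which is the kind of unstable state the abort() was meant to avoid.
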

