Hi HansN,

>> any how GA is tagged.
Sorry, I mean RC2 is tagged.

-AVM

On 9/21/2016 12:41 PM, A V Mahesh wrote:
> Hi HansN,
>
> I just tested with uniform buffer sizes on all nodes, sending
> messages in the normal phase, and the results look OK,
> even after hitting TIPC_ERR_OVERLOAD.
>
> So my conclusion is: in general all nodes will have the same buffer
> sizes, so let us go with the V2 patch. any how GA is tagged,
> so we have enough time for testing, and if we get some issues we can
> resolve them by the next release.
>
> ==================================================================================================
>
>
>
> Sep 21 11:51:40 SC-1 osafamfd[15792]: NO Node 'PL-4' joined the cluster
> Sep 21 11:51:40 SC-1 osafimmnd[15741]: NO Implementer connected: 17
> (MsgQueueService132111) <0, 2040f>
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 7777 MDTM: undelivered message
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 7777 MDTM: undelivered message
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 7777 MDTM: undelivered message
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 7777 MDTM: undelivered message
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 7777 MDTM: undelivered message
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM:
undelivered message > condition ancillary data size: 0 : TIPC_ERR_OVERLOAD > Sep 21 11:52:41 SC-1 osafimmd[15730]: 7777 MDTM: undelivered message > condition ancillary data: TIPC_RETDATA > Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message > condition ancillary data size: 0 : TIPC_ERR_OVERLOAD > Sep 21 11:52:41 SC-1 osafimmd[15730]: 7777 MDTM: undelivered message > condition ancillary data: TIPC_RETDATA > Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message > condition ancillary data size: 0 : TIPC_ERR_OVERLOAD > Sep 21 11:52:41 SC-1 osafimmd[15730]: 7777 MDTM: undelivered message > condition ancillary data: TIPC_RETDATA > Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message > condition ancillary data size: 0 : TIPC_ERR_OVERLOAD > Sep 21 11:52:41 SC-1 osafimmd[15730]: 7777 MDTM: undelivered message > condition ancillary data: TIPC_RETDATA > Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message > condition ancillary data size: 0 : TIPC_ERR_OVERLOAD > Sep 21 11:52:41 SC-1 osafimmd[15730]: 7777 MDTM: undelivered message > condition ancillary data: TIPC_RETDATA > Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message > condition ancillary data size: 0 : TIPC_ERR_OVERLOAD > Sep 21 11:52:41 SC-1 osafimmd[15730]: 7777 MDTM: undelivered message > condition ancillary data: TIPC_RETDATA > Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message > condition ancillary data size: 0 : TIPC_ERR_OVERLOAD > Sep 21 11:52:41 SC-1 osafimmd[15730]: 7777 MDTM: undelivered message > condition ancillary data: TIPC_RETDATA > Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message > condition ancillary data size: 0 : TIPC_ERR_OVERLOAD > Sep 21 11:52:41 SC-1 osafimmd[15730]: 7777 MDTM: undelivered message > condition ancillary data: TIPC_RETDATA > Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message > condition ancillary data size: 0 : TIPC_ERR_OVERLOAD > Sep 21 11:52:41 SC-1 
osafimmd[15730]: 7777 MDTM: undelivered message
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 7777 MDTM: undelivered message
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 7777 MDTM: undelivered message
> condition ancillary data: TIPC_RETDATA
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message
> condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 7777 MDTM: undelivered message
> condition ancillary data: TIPC_RETDATA
>
> ==================================================================================================
>
>
>
>
> On 9/21/2016 11:37 AM, A V Mahesh wrote:
>> Hi HansN,
>>
>> On 9/20/2016 4:17 PM, Hans Nordebäck wrote:
>>> Hi Mahesh,
>>>
>>> I think only logging is needed, as proposed in the patch, since some
>>> services already handle dropped messages. This logging will help in
>>> troubleshooting. Keeping TIPC_DEST_DROPPABLE set to true only makes
>>> TIPC drop messages silently; the original problem persists and
>>> needs investigation, i.e. why the socket receive buffer is
>>> overloaded. One reason may be the MDS poll/receive loop together
>>> with the "big" mutex lock (ticket #520).
>> [AVM] One valid reason could be that, in the TIPC_ERR_OVERLOAD case,
>> recd_bytes is NOT zero, so the buffer overload can occur at either
>> the TIPC or the MDS level. I will investigate more and update.
>>
>>> Did you check why the MDS message loss mechanism doesn't detect
>>> messages dropped by TIPC? AMF does detect this via e.g. "out of
>>> sync", "msg id mismatch" and so on.
>> [AVM] You mean the IMMD message loss mechanism?
>>
>> -AVM
>>> /Regards HansN
>>>
>>> -----Original Message-----
>>> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
>>> Sent: 20 September 2016 12:29
>>> To: Anders Widell <anders.wid...@ericsson.com>; Hans Nordebäck
>>> <hans.nordeb...@ericsson.com>
>>> Cc: opensaf-devel@lists.sourceforge.net; mathi.naic...@oracle.com
>>> Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
>>>
>>> Hi Anders Widell / HansN,
>>>
>>> On 9/16/2016 2:03 PM, Anders Widell wrote:
>>>> The idea was to just log reception of error info messages, for
>>>> trouble-shooting purposes.
>>> After multiple attempts, I managed to simulate the TIPC_ERR_OVERLOAD
>>> error. After the TIPC_ERR_OVERLOAD error is hit,
>>> the cluster goes into an unrecoverable state, because the send
>>> buffers are full.
>>>
>>> So we have two options:
>>>
>>> 1) Set TIPC_DEST_DROPPABLE to false, log the TIPC_ERR_OVERLOAD error
>>> and then exit the sender gracefully,
>>> which allows the remaining nodes to survive.
>>>
>>> 2) Keep the current configuration as it is (TIPC_DEST_DROPPABLE set
>>> to true).
>>>
>>> =================================================================================================================
>>>
>>>
>>> Sep 20 15:14:09 SC-1 osafamfd[3759]: NO Received node_up from 2040f:
>>> msg_id 1
>>> Sep 20 15:14:09 SC-1 osafamfd[3759]: NO Node 'PL-4' joined the
>>> cluster Sep 20 15:14:09 SC-1 osafimmnd[3695]: NO Implementer
>>> connected: 19
>>> (MsgQueueService132111) <0, 2040f>
>>> *Sep 20 15:16:59 SC-1 osafimmd[3684]: 77 MDTM: undelivered message
>>> condition ancillary data: TIPC_ERR_OVERLOAD* Sep 20 15:17:00 SC-1
>>> osafimmnd[3695]: WA Director Service in NOACTIVE state - fevs
>>> replies pending:1 fevs highest processed:218744 Sep 20 15:17:00 SC-1
>>> osafamfnd[3773]: NO
>>> 'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to
>>> 'avaDown' : Recovery is 'nodeFailfast'
>>> Sep 20 15:17:00 SC-1 osafamfnd[3773]: ER
>>> safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due
>>> to:avaDown Recovery is:nodeFailfast Sep 20 15:17:00 SC-1 >>> osafamfnd[3773]: Rebooting OpenSAF NodeId = 131343 EE Name = , >>> Reason: Component faulted: recovery is node failfast, OwnNodeId = >>> 131343, SupervisionTime = 60 Sep 20 15:17:00 SC-1 osafimmnd[3695]: >>> WA DISCARD DUPLICATE FEVS >>> message:218744 >>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: WA Error code 2 returned for >>> message type 82 - ignoring Sep 20 15:17:00 SC-1 opensaf_reboot: >>> Rebooting local node; timeout=60 Sep 20 15:17:00 SC-1 >>> osafimmnd[3695]: WA SC Absence IS allowed:900 IMMD service is DOWN >>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO IMMD SERVICE IS DOWN, HYDRA >>> IS CONFIGURED => UNREGISTERING IMMND form MDS Sep 20 15:17:00 SC-1 >>> osafntfimcnd[3742]: NO saImmOiDispatch() Fail SA_AIS_ERR_BAD_HANDLE >>> (9) Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client >>> id:20002010f >>> sv_id:27 >>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 1 <2, >>> 2010f> (safLogService) >>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:d0d0002010f >>> sv_id:26 >>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:100002010f >>> sv_id:27 >>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 2 >>> <16, >>> 2010f> (@safLogService_appl) >>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:130002010f >>> sv_id:27 >>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 3 >>> <19, >>> 2010f> (@OpenSafImmReplicatorA) >>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:140002010f >>> sv_id:26 >>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:150002010f >>> sv_id:27 >>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 4 >>> <21, >>> 2010f> (safClmService) >>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:1a0002010f >>> sv_id:27 >>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 5 >>> <26, >>> 2010f> (safAmfService) >>> Sep 20 15:17:00 
SC-1 osafimmnd[3695]: NO Removing client id:1b0002010f >>> sv_id:26 >>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:5bc0002010f >>> sv_id:26 >>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:5bd0002010f >>> sv_id:27 >>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 6 >>> <1469, 2010f> (MsgQueueService131343) Sep 20 15:17:00 SC-1 >>> osafimmnd[3695]: NO Removing client id:5c00002010f >>> sv_id:27 >>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 10 >>> <1472, 2010f> (safEvtService) Sep 20 15:17:00 SC-1 osafimmnd[3695]: >>> NO Removing client id:5c40002010f >>> sv_id:27 >>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 8 >>> <1476, 2010f> (safSmfService) Sep 20 15:17:00 SC-1 osafimmnd[3695]: >>> NO Removing client id:5c60002010f >>> sv_id:27 >>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 9 >>> <1478, 2010f> (safLckService) Sep 20 15:17:00 SC-1 osafimmnd[3695]: >>> NO Removing client id:5c70002010f >>> sv_id:27 >>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 7 >>> <1479, 2010f> (safMsgGrpService) Sep 20 15:17:00 SC-1 >>> osafimmnd[3695]: NO Removing client id:5cc0002010f >>> sv_id:27 >>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:5ce0002010f >>> sv_id:27 >>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 12 >>> <1486, 2010f> (safCheckPointService) Sep 20 15:17:00 SC-1 >>> osafimmnd[3695]: NO Implementer disconnected 13 <0, 2020f(down)> >>> (MsgQueueService131599) Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO >>> Implementer disconnected 14 <0, 2020f(down)> >>> (@OpenSafImmReplicatorB) Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO >>> Implementer disconnected 15 <0, 2020f(down)> (@safAmfService2020f) >>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Impl Discarded node 2020f >>> Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 16 >>> <0, 2030f(down)> (MsgQueueService131855) Sep 20 15:17:00 SC-1 >>> 
osafimmnd[3695]: NO Impl Discarded node 2030f Sep 20 15:17:00 SC-1 >>> osafimmnd[3695]: NO Implementer disconnected 19 <0, 2040f(down)> >>> (MsgQueueService132111) Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO >>> Impl Discarded node 2040f Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO >>> MDS unregisterede. sleeping ... >>> Sep 20 15:17:01 SC-1 osafimmnd[3695]: NO Sleep done registering >>> IMMND with MDS Sep 20 15:17:01 SC-1 osafimmnd[3695]: NO MDS: >>> mds_register_callback: >>> dest 2010fe8fa0043 already exist >>> Sep 20 15:17:01 SC-1 osafimmnd[3695]: NO MDS: mds_register_callback: >>> dest 2010fdcb60040 already exist >>> Sep 20 15:17:01 SC-1 osafimmnd[3695]: NO MDS: mds_register_callback: >>> dest 2010fdcb6002e already exist >>> Sep 20 15:17:01 SC-1 osafimmnd[3695]: NO MDS: mds_register_callback: >>> dest 2010fdcb60037 already exist >>> Sep 20 15:17:01 SC-1 osafimmnd[3695]: NO MDS: mds_register_callback: >>> dest 2010fdcb60028 already exist >>> Sep 20 15:17:01 SC-1 osafimmnd[3695]: NO MDS: mds_register_callback: >>> dest 2010fdcb6003d already exist >>> Sep 20 15:17:01 SC-1 osafimmnd[3695]: NO MDS: mds_register_callback: >>> dest 2010fdcb6002b already exist >>> Sep 20 15:17:01 SC-1 osafimmnd[3695]: NO MDS: mds_register_callback: >>> dest 2010fdcb6001c already exist >>> Sep 20 15:17:01 SC-1 osafimmnd[3695]: NO MDS: mds_register_callback: >>> dest 2010fdcb60019 already exist >>> Sep 20 15:17:01 SC-1 osafimmnd[3695]: NO MDS: mds_register_callback: >>> dest 2010fdcba0012 already exist >>> Sep 20 15:17:01 SC-1 osafimmnd[3695]: NO MDS: mds_register_callback: >>> dest 2010fdcb60028 already exist >>> Sep 20 15:17:01 SC-1 osafimmnd[3695]: NO MDS: mds_register_callback: >>> dest 2010fdcb60019 already exist >>> Sep 20 15:17:01 SC-1 osafimmnd[3695]: NO SUCCESS IN REGISTERING >>> IMMND WITH MDS Sep 20 15:17:01 SC-1 osafimmnd[3695]: NO Re-introduce-me >>> highestProcessed:218744 highestReceived:218744 Sep 20 15:17:03 SC-1 >>> kernel: [ 1794.198381] md: stopping all md devices. 
>>> Sep 20 15:17:03 SC-1 osafntfimcnd[8997]: WA ntfimcn_imm_init
>>> saImmOiInitialize_2() returned SA_AIS_ERR_TIMEOUT (5) Sep 20
>>> 15:18:00 SC-1 syslog-ng[1221]: syslog-ng starting up; version='2.0.9'
>>> =================================================================================================================
>>>
>>>
>>>
>>> -AVM
>>>
>>> On 9/16/2016 2:03 PM, Anders Widell wrote:
>>>> I don't think we need (or even should) inform the sender when MDS
>>>> receives an error information message from TIPC. Note that these error
>>>> information messages are received asynchronously, when the sender has
>>>> already received an OK return code from the MDS send call. The idea
>>>> was to just log reception of error info messages, for trouble-shooting
>>>> purposes. We already have a mechanism in MDS that informs the receiver
>>>> about lost MDS messages. If we wish to inform the sender we would need
>>>> to introduce a second mechanism in MDS, and at this point I don't
>>>> think it is needed. Another approach we could consider is that MDS
>>>> retransmits the message transparently without informing the sender.
>>>> This would require MDS to internally store sent messages for a while,
>>>> so that they can be retransmitted. It would also require the receiver
>>>> to re-order received messages, since a retransmitted message will be
>>>> received out of sequence.
>>>>
>>>> regards,
>>>>
>>>> Anders Widell
>>>>
>>>>
>>>> On 09/16/2016 06:40 AM, A V Mahesh wrote:
>>>>> Hi HansN,
>>>>>
>>>>> I managed to create TIPC_ERRINFO/TIPC_RETDATA error cases (not the
>>>>> TIPC_ERR_OVERLOAD error) with normal messages. I observed that
>>>>> with TIPC_DEST_DROPPABLE set to true the TIPC_ERRINFO error is NOT
>>>>> notified (it means TIPC_ERR_OVERLOAD); with TIPC_DEST_DROPPABLE set
>>>>> to false the TIPC_ERRINFO/TIPC_RETDATA errors are notified.
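To make the two configurations compared above concrete, here is a small sketch of toggling the option on a socket. The helper name set_dest_droppable() is hypothetical and not part of MDS; the setsockopt() call with SOL_TIPC and TIPC_DEST_DROPPABLE is the standard TIPC socket option discussed in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <linux/tipc.h>

/*
 * Hypothetical helper: enable (1) or disable (0) TIPC_DEST_DROPPABLE
 * on an already opened TIPC socket.
 * Returns 0 on success, or the errno value reported by setsockopt().
 */
static int set_dest_droppable(int sock, int droppable)
{
    assert(droppable == 0 || droppable == 1);

    if (setsockopt(sock, SOL_TIPC, TIPC_DEST_DROPPABLE,
                   &droppable, sizeof(droppable)) != 0)
        return errno;

    return 0;
}
```

Calling set_dest_droppable(sock, 0) on the sending socket corresponds to the "TIPC_DEST_DROPPABLE to false" runs, where undeliverable messages are returned to the sender; set_dest_droppable(sock, 1) corresponds to the "true" runs, where they are discarded.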
>>>>>
>>>>> Now I will also check the implications of setting
>>>>> TIPC_DEST_DROPPABLE to false on multicast and broadcast messages.
>>>>> Based on that, we can re-arrange the conditions for setting
>>>>> TIPC_DEST_DROPPABLE to false: when the agent has set
>>>>> `i_msg_loss_indication = true`, MDS can return the same
>>>>> TIPC_ERR_OVERLOAD error to the agent.
>>>>>
>>>>> TIPC_DEST_DROPPABLE to false:
>>>>>
>>>>> ==================================================================
>>>>>
>>>>> Sep 15 16:10:39 SC-1 osafimmnd[32051]: NO Implementer disconnected 13
>>>>> <0, 2040f> (MsgQueueService132111) Sep 15 16:10:39 SC-1
>>>>> osafimmd[32040]: 777 MDTM: undelivered message condition ancillary
>>>>> data: TIPC_ERRINFO abort err : 2 Sep 15 16:10:39 SC-1
>>>>> osafimmd[32040]: 7777 MDTM: undelivered message condition ancillary
>>>>> data: TIPC_RETDATA Sep 15 16:10:39 SC-1 osafimmd[32040]: NO MDS event
>>>>> from svc_id 25 (change:4, dest:567413369208836) Sep 15 16:10:39 SC-1
>>>>> osafimmd[32040]: 777 MDTM: undelivered message condition ancillary
>>>>> data: TIPC_ERRINFO abort err : 2 Sep 15 16:10:39 SC-1
>>>>> osafimmd[32040]: 7777 MDTM: undelivered message condition ancillary
>>>>> data: TIPC_RETDATA Sep 15 16:10:39 SC-1 osafimmd[32040]: 777 MDTM:
>>>>> undelivered message condition ancillary data: TIPC_ERRINFO abort err
>>>>> : 2 Sep 15 16:10:39 SC-1 osafimmd[32040]: 7777 MDTM: undelivered
>>>>> message condition ancillary data: TIPC_RETDATA Sep 15 16:10:39 SC-1
>>>>> osafimmd[32040]: 777 MDTM: undelivered message condition ancillary
>>>>> data: TIPC_ERRINFO abort err : 2 Sep 15 16:10:39 SC-1
>>>>> osafimmd[32040]: 7777 MDTM: undelivered message condition ancillary
>>>>> data: TIPC_RETDATA Sep 15 16:10:39 SC-1 osafimmd[32040]: 777 MDTM:
>>>>> undelivered message condition ancillary data: TIPC_ERRINFO abort err
>>>>> : 2 Sep 15 16:10:39 SC-1 osafimmd[32040]: 7777 MDTM: undelivered
>>>>> message condition ancillary data: TIPC_RETDATA Sep 15 16:10:39 SC-1
>>>>> osafimmd[32040]: 777 MDTM:
undelivered message condition ancillary >>>>> data: TIPC_ERRINFO abort err : 2 Sep 15 16:10:39 SC-1 >>>>> osafimmd[32040]: 7777 MDTM: undelivered message condition ancillary >>>>> data: TIPC_RETDATA Sep 15 16:10:39 SC-1 osafimmd[32040]: 777 MDTM: >>>>> undelivered message condition ancillary data: TIPC_ERRINFO abort err >>>>> : 2 Sep 15 16:10:39 SC-1 osafimmd[32040]: 7777 MDTM: undelivered >>>>> message condition ancillary data: TIPC_RETDATA Sep 15 16:10:39 SC-1 >>>>> osafimmd[32040]: 777 MDTM: undelivered message condition ancillary >>>>> data: TIPC_ERRINFO abort err : 2 Sep 15 16:10:39 SC-1 >>>>> osafimmd[32040]: 7777 MDTM: undelivered message condition ancillary >>>>> data: TIPC_RETDATA Sep 15 16:10:39 SC-1 osafimmd[32040]: 777 MDTM: >>>>> undelivered message condition ancillary data: TIPC_ERRINFO abort err >>>>> : 2 Sep 15 16:10:39 SC-1 osafimmd[32040]: 7777 MDTM: undelivered >>>>> message condition ancillary data: TIPC_RETDATA Sep 15 16:10:39 SC-1 >>>>> osafimmd[32040]: 777 MDTM: undelivered message condition ancillary >>>>> data: TIPC_ERRINFO abort err : 2 Sep 15 16:10:39 SC-1 >>>>> osafimmd[32040]: 7777 MDTM: undelivered message condition ancillary >>>>> data: TIPC_RETDATA Sep 15 16:10:39 SC-1 osafamfd[32114]: NO Node >>>>> 'PL-4' left the cluster >>>>> >>>>> ================================================================== >>>>> >>>>> TIPC_DEST_DROPPABLE to true: >>>>> >>>>> ================================================================== >>>>> >>>>> Sep 15 15:59:55 SC-1 osafimmnd[26461]: NO Implementer disconnected 13 >>>>> <0, 2040f> (MsgQueueService132111) Sep 15 15:59:55 SC-1 >>>>> osafimmd[26450]: NO MDS event from svc_id 25 (change:4, >>>>> dest:567412923957252) Sep 15 15:59:55 SC-1 osafimmnd[26461]: NO >>>>> Global discard node received for nodeId:2040f pid:410 Sep 15 15:59:55 >>>>> SC-1 osafamfd[28810]: NO Node 'PL-4' left the cluster Sep 15 15:59:58 >>>>> SC-1 kernel: [ 5147.648737] tipc: Resetting link >>>>> <1.1.1:eth0-1.1.4:eth0>, peer not 
responding Sep 15 15:59:58 SC-1
>>>>> kernel: [ 5147.648756] tipc: Lost link <1.1.1:eth0-1.1.4:eth0> on
>>>>> network plane A Sep 15 15:59:58 SC-1 kernel: [ 5147.648771] tipc:
>>>>> Lost contact with <1.1.4>
>>>>>
>>>>> ==================================================================
>>>>>
>>>>> -AVM
>>>>>
>>>>>
>>>>> On 9/1/2016 10:59 AM, Hans Nordebäck wrote:
>>>>>> Hi Mahesh,
>>>>>>
>>>>>> I have not tested this, but the following should work:
>>>>>>
>>>>>> - Set BSRsock TIPC_IMPORTANCE to TIPC_LOW_IMPORTANCE
>>>>>>
>>>>>> - Set the socket receive buffer to a small value:
>>>>>>
>>>>>> int optval = 5000; /* small socket receive buffer size, 5000? */
>>>>>>
>>>>>> setsockopt(tipc_cb.BSRsock, SOL_SOCKET, SO_RCVBUF, &optval,
>>>>>> sizeof(optval));
>>>>>>
>>>>>> - sysctl -w net.tipc.tipc_rmem="5000 40000000 68240400" (or smaller
>>>>>> values)
>>>>>>
>>>>>> - Add some delays when processing messages in
>>>>>> mdtm_process_recv_events(), to provoke overloading the socket
>>>>>> receive buffer.
>>>>>>
>>>>>> We experience dropped packets in a 75 node system, and as a
>>>>>> workaround, increasing the default socket receive buffer size
>>>>>> seems to work for that setup.
>>>>>>
>>>>>> /Thanks HansN
>>>>>>
>>>>>> On 09/01/2016 05:50 AM, A V Mahesh wrote:
>>>>>>> Hi HansN,
>>>>>>>
>>>>>>> Do you have any tips for creating an overload case?
>>>>>>>
>>>>>>> I would like to test and observe the TIPC_DEST_DROPPABLE enabled
>>>>>>> and disabled cases.
>>>>>>>
>>>>>>> -AVM
>>>>>>>
>>>>>>>
>>>>>>> On 9/1/2016 9:12 AM, A V Mahesh wrote:
>>>>>>>> Hi HansN,
>>>>>>>>
>>>>>>>> Sorry for the delay.
>>>>>>>>
>>>>>>>> I will test it and get back to you soon.
>>>>>>>>
>>>>>>>> -AVM
>>>>>>>>
>>>>>>>>
>>>>>>>> On 8/31/2016 4:29 PM, Hans Nordebäck wrote:
>>>>>>>>> Hi Mahesh,
>>>>>>>>> Any updates on this?
>>>>>>>>>
>>>>>>>>> /Regards HansN
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Anders Widell
>>>>>>>>> Sent: 25 August 2016 13:11
>>>>>>>>> To: A V Mahesh <mahesh.va...@oracle.com>; Hans Nordebäck
>>>>>>>>> <hans.nordeb...@ericsson.com>; mathi.naic...@oracle.com
>>>>>>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>>>>>>> Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages
>>>>>>>>> [#1957]
>>>>>>>>>
>>>>>>>>> Hi!
>>>>>>>>>
>>>>>>>>> This is what the TIPC user documentation says about
>>>>>>>>> TIPC_DEST_DROPPABLE:
>>>>>>>>> "This option governs the handling of messages sent by the socket
>>>>>>>>> if the message cannot be delivered to its destination, either
>>>>>>>>> because the receiver is congested or because the specified
>>>>>>>>> receiver does not exist.
>>>>>>>>> If enabled, the message is discarded; otherwise the message is
>>>>>>>>> returned to the sender."
>>>>>>>>>
>>>>>>>>> This is what the TIPC user documentation says about the return
>>>>>>>>> value from the recvmsg() system call: "When used with a
>>>>>>>>> connectionless socket, a return value of 0 indicates the arrival
>>>>>>>>> of a returned data message that was originally sent by this
>>>>>>>>> socket."
>>>>>>>>>
>>>>>>>>> I think the documentation is pretty clear. If you set
>>>>>>>>> TIPC_DEST_DROPPABLE to true, the receiver can discard messages
>>>>>>>>> e.g. when the receive buffer is full. The sender will not be
>>>>>>>>> notified in this case. If TIPC_DEST_DROPPABLE is set to false,
>>>>>>>>> the message will be returned to the sender in case of a full
>>>>>>>>> receive buffer. The sender knows that it has received such a
>>>>>>>>> returned message when the recvmsg() call returns zero.
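As a rough sketch of how a sender might recognise such a returned message in code, the helper below walks the ancillary data that recvmsg() fills in and extracts the TIPC_ERRINFO object. The struct tipc_reject_info type and the inspect_ancillary() name are made up for this illustration and do not exist in MDS. The two-word layout of TIPC_ERRINFO (error code followed by the length of the returned data) is assumed from the TIPC documentation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/tipc.h>

/* Hypothetical result type; not part of MDS or TIPC. */
struct tipc_reject_info {
    int rejected;     /* 1 if a TIPC_ERRINFO object was present */
    uint32_t err;     /* TIPC error code, e.g. TIPC_ERR_OVERLOAD */
    uint32_t ret_len; /* length of the returned data reported by TIPC */
};

/* Scan the control messages of a msghdr filled in by recvmsg(). */
static struct tipc_reject_info inspect_ancillary(struct msghdr *msg)
{
    struct tipc_reject_info info = { 0, 0, 0 };
    struct cmsghdr *anc;

    assert(msg != NULL);

    for (anc = CMSG_FIRSTHDR(msg); anc != NULL; anc = CMSG_NXTHDR(msg, anc)) {
        uint32_t words[2];

        if (anc->cmsg_level != SOL_TIPC || anc->cmsg_type != TIPC_ERRINFO)
            continue;
        /* Skip objects too short to hold the two assumed 32-bit words. */
        if (anc->cmsg_len < CMSG_LEN(sizeof(words)))
            continue;

        memcpy(words, CMSG_DATA(anc), sizeof(words));
        info.rejected = 1;
        info.err = words[0];
        info.ret_len = words[1];
    }
    return info;
}
```

A receive loop could call inspect_ancillary() after every recvmsg() and log the case where info.rejected is set and info.err equals TIPC_ERR_OVERLOAD, without changing how the message itself is handled.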
>>>>>>>>>
>>>>>>>>> regards,
>>>>>>>>> Anders Widell
>>>>>>>>>
>>>>>>>>> On 08/25/2016 11:30 AM, A V Mahesh wrote:
>>>>>>>>>> Hi HansN,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 8/23/2016 5:22 PM, Hans Nordebäck wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Mahesh,
>>>>>>>>>>>
>>>>>>>>>>> Yes, this is my understanding too: if TIPC_DEST_DROPPABLE = true,
>>>>>>>>>>> TIPC may drop messages silently when the receive socket buffer
>>>>>>>>>>> is full, and does not return any ancillary message.
>>>>>>>>>>> If TIPC_DEST_DROPPABLE = false, TIPC may drop messages but will
>>>>>>>>>>> send an ancillary message to inform about TIPC_ERR_OVERLOAD.
>>>>>>>>>> [AVM]
>>>>>>>>>>
>>>>>>>>>> My observation and understanding is different. Based on the TIPC
>>>>>>>>>> code and the Linux TIPC 2.0 Programmer's Guide, the
>>>>>>>>>> TIPC_ERR_OVERLOAD error is returned when TIPC is unable to enqueue
>>>>>>>>>> an incoming message on the receiving socket's receive queue,
>>>>>>>>>> irrespective of whether TIPC_DEST_DROPPABLE is enabled or disabled.
>>>>>>>>>>
>>>>>>>>>> The only difference between TIPC_DEST_DROPPABLE enabled and
>>>>>>>>>> disabled is this: if TIPC_DEST_DROPPABLE is enabled, the message
>>>>>>>>>> is discarded, the size returned by recvmsg() is ZERO and the
>>>>>>>>>> application will get errors;
>>>>>>>>>> if TIPC_DEST_DROPPABLE is disabled, the message is returned to the
>>>>>>>>>> sender, which means the size returned by recvmsg() is the size of
>>>>>>>>>> the data the user sent, and the application will get errors.
>>>>>>>>>>
>>>>>>>>>> I did check the TIPC code and documentation and I haven't found
>>>>>>>>>> any evidence that the TIPC_ERR_OVERLOAD error code will be sent
>>>>>>>>>> only if TIPC_DEST_DROPPABLE = false.
>>>>>>>>>>
>>>>>>>>>> Even while testing #1227
>>>>>>>>>> (https://sourceforge.net/p/opensaf/mailman/message/33207717/) my
>>>>>>>>>> observation and understanding was that an individual TIPC socket
>>>>>>>>>> is only allowed to queue up
>>>>>>>>>> OVERLOAD_LIMIT_BASE/2 messages of the lowest importance level
>>>>>>>>>> before it starts rejecting them.
>>>>>>>>>> Once a socket's receive queue length exceeds the maximum limit
>>>>>>>>>> value, the receiving socket will send out a reject message with
>>>>>>>>>> the TIPC_ERR_OVERLOAD error code and cmsg_type
>>>>>>>>>> TIPC_ERRINFO/TIPC_RETDATA; the TIPC code and the Linux TIPC 2.0
>>>>>>>>>> Programmer's Guide confirm the same.
>>>>>>>>>>
>>>>>>>>>> tipc/socket.c
>>>>>>>>>> =======================================================
>>>>>>>>>> /* Reject message if there isn't room to queue it */
>>>>>>>>>>
>>>>>>>>>> recv_q_len = (u32)atomic_read(&tipc_queue_size);
>>>>>>>>>> if (unlikely(recv_q_len >= OVERLOAD_LIMIT_BASE)) {
>>>>>>>>>>         if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE))
>>>>>>>>>>                 return TIPC_ERR_OVERLOAD;
>>>>>>>>>> }
>>>>>>>>>> recv_q_len = skb_queue_len(&sk->sk_receive_queue);
>>>>>>>>>> if (unlikely(recv_q_len >= (OVERLOAD_LIMIT_BASE / 2))) {
>>>>>>>>>>         if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE / 2))
>>>>>>>>>>                 return TIPC_ERR_OVERLOAD;
>>>>>>>>>> }
>>>>>>>>>> =======================================================
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2.1.17. setsockopt() of TIPC 2.0 Programmer's Guide
>>>>>>>>>> =======================================================
>>>>>>>>>> TIPC_DEST_DROPPABLE
>>>>>>>>>> This option governs the handling of messages sent by the socket
>>>>>>>>>> if the message cannot be delivered to its destination, either
>>>>>>>>>> because the receiver is congested or because the specified
>>>>>>>>>> receiver does not exist. If enabled, the message is discarded;
>>>>>>>>>> otherwise the message is returned to the sender.
>>>>>>>>>>
>>>>>>>>>> By default, this option is disabled for SOCK_SEQPACKET and
>>>>>>>>>> SOCK_STREAM socket types, and enabled for SOCK_RDM and
>>>>>>>>>> SOCK_DGRAM. This arrangement ensures proper teardown of failed
>>>>>>>>>> connections when connection-oriented data transfer is used,
>>>>>>>>>> without increasing the complexity of connectionless data
>>>>>>>>>> transfer.
>>>>>>>>>>
>>>>>>>>>> TIPC_SRC_DROPPABLE
>>>>>>>>>> This option governs the handling of messages sent by the socket
>>>>>>>>>> if link congestion occurs. If enabled, the message is discarded;
>>>>>>>>>> otherwise the system queues the message for later transmission.
>>>>>>>>>> By default, this option is disabled for SOCK_SEQPACKET,
>>>>>>>>>> SOCK_STREAM, and SOCK_RDM socket types (resulting in "reliable"
>>>>>>>>>> data transfer), and enabled for SOCK_DGRAM (resulting in
>>>>>>>>>> "unreliable" data transfer).
>>>>>>>>>> =======================================================
>>>>>>>>>>
>>>>>>>>>> Now I will try to create the OVERLOAD case and update you soon
>>>>>>>>>> with my latest observations.
>>>>>>>>>>
>>>>>>>>>> -AVM
>>>>>>>>>>
>>>>>>>>>>> Correcting this and adding an abort is not backward compatible,
>>>>>>>>>>> as some services already handle flow control in some way; only
>>>>>>>>>>> log when packets are dropped.
>>>>>>>>>>> Regarding ticket #1960, there are other solutions than
>>>>>>>>>>> introducing flow control in MDS, e.g. exposing an option to the
>>>>>>>>>>> service to choose connection-oriented or connectionless.
>>>>>>>>>>> The problem with dropped messages seems in one case to be
>>>>>>>>>>> related to intensive logging by MDS.
>>>>>>>>>>>
>>>>>>>>>>> /Thanks HansN
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
>>>>>>>>>>> Sent: 23 August 2016 11:27
>>>>>>>>>>> To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell
>>>>>>>>>>> <anders.wid...@ericsson.com>; mathi.naic...@oracle.com
>>>>>>>>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>>>>>>>>> Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages
>>>>>>>>>>> [#1957]
>>>>>>>>>>>
>>>>>>>>>>> Hi HansN,
>>>>>>>>>>>
>>>>>>>>>>> It seems I am missing something; please allow me to
>>>>>>>>>>> understand.
>>>>>>>>>>>
>>>>>>>>>>> If I correctly understand your observation:
>>>>>>>>>>>
>>>>>>>>>>> With the current OpenSAF code (this #1957 patch NOT applied), by
>>>>>>>>>>> default TIPC_DEST_DROPPABLE=true. While running OpenSAF with
>>>>>>>>>>> that binary, when TIPC_ERR_OVERLOAD occurs, TIPC does not give
>>>>>>>>>>> the errors TIPC_ERRINFO or TIPC_RETDATA, and the following code
>>>>>>>>>>> in the function recvfrom_connectionless() is not hit. Is my
>>>>>>>>>>> understanding right?
>>>>>>>>>>>
>>>>>>>>>>> ===============================================================
>>>>>>>>>>>
>>>>>>>>>>> *if (anc->cmsg_type == TIPC_ERRINFO) {*
>>>>>>>>>>>         /* TIPC_ERRINFO - TIPC error code associated with a
>>>>>>>>>>>            returned data message or a connection termination
>>>>>>>>>>>            message so abort */
>>>>>>>>>>>         m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err :%s", strerror(errno) );
>>>>>>>>>>>         *abort();*
>>>>>>>>>>> *} else if (anc->cmsg_type == TIPC_RETDATA) {*
>>>>>>>>>>>         /* If we set TIPC_DEST_DROPPABLE off message (configure
>>>>>>>>>>>            TIPC to return rejected messages to the sender )
>>>>>>>>>>>            we will hit this when we implement MDS retransmit
>>>>>>>>>>>            lost messages abort can be replaced with flow control logic*/
>>>>>>>>>>>         for (i = anc->cmsg_len - sizeof(*anc); i > 0; i--) {
>>>>>>>>>>>                 m_MDS_LOG_DBG("MDTM: returned byte 0x%02x\n", *cptr);
>>>>>>>>>>>                 cptr++;
>>>>>>>>>>>         }
>>>>>>>>>>>         /* TIPC_RETDATA -The contents of a returned data
>>>>>>>>>>>            message so abort */
>>>>>>>>>>>         m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_RETDATA abort err :%s", strerror(errno) );
>>>>>>>>>>>         *abort();*
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> ===============================================================
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -AVM
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 8/23/2016 1:08 PM, Hans Nordebäck wrote:
>>>>>>>>>>>> Hi Mahesh,
>>>>>>>>>>>>
>>>>>>>>>>>> Please see response below with [HansN] /Thanks HansN
>>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
>>>>>>>>>>>> Sent: 23 August 2016 08:25
>>>>>>>>>>>> To: Hans Nordebäck
<hans.nordeb...@ericsson.com>; Anders
>>>>>>>>>>>> Widell <anders.wid...@ericsson.com>; mathi.naic...@oracle.com
>>>>>>>>>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>>>>>>>>>> Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages
>>>>>>>>>>>> [#1957]
>>>>>>>>>>>>
>>>>>>>>>>>> Hi HansN
>>>>>>>>>>>>
>>>>>>>>>>>> Please see response below with [AVM]
>>>>>>>>>>>>
>>>>>>>>>>>> -AVM
>>>>>>>>>>>>
>>>>>>>>>>>> On 8/23/2016 11:41 AM, Hans Nordebäck wrote:
>>>>>>>>>>>>> Hi Mahesh,
>>>>>>>>>>>>>
>>>>>>>>>>>>> please see comments below.
>>>>>>>>>>>>>
>>>>>>>>>>>>> /Thanks HansN
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 08/23/2016 07:21 AM, A V Mahesh wrote:
>>>>>>>>>>>>>> Hi HansN,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Let us first discuss the error handling and abort; then we
>>>>>>>>>>>>>> can come back to the interpretation of whether TIPC
>>>>>>>>>>>>>> currently does or does not permit an application to send a
>>>>>>>>>>>>>> multicast message with the "destination droppable" setting
>>>>>>>>>>>>>> disabled.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Let us disable TIPC_DEST_DROPPABLE, so that TIPC will try to
>>>>>>>>>>>>>> return an undelivered multicast message to its sender and we
>>>>>>>>>>>>>> can determine that the issue is because of TIPC_ERR_OVERLOAD.
>>>>>>>>>>>>>> This helps in debugging, so that the application may increase
>>>>>>>>>>>>>> SO_SNDBUF/SO_RCVBUF to reduce the problem.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But we still need to abort(). The reason is that the current
>>>>>>>>>>>>>> MDS implementation doesn't have flow control logic (no
>>>>>>>>>>>>>> retry on error), so an application like AMF can go
>>>>>>>>>>>>>> wrong and the cluster will go into an unstable/unrecoverable
>>>>>>>>>>>>>> state.
>>>>>>>>>>>>>>
>>>>>>>>>>>>> [HansN] In the current implementation messages are dropped
>>>>>>>>>>>>> silently and no abort is done.
>>>>>>>>>>>> [AVM] I can see abort(); in the current code. You mean abort();
>>>>>>>>>>>> is not working and the application (amf) is not exiting?
>>>>>>>>>>>> [HansN] When TIPC_DROPPABLE=true and messages are dropped
>>>>>>>>>>>> (TIPC_ERR_OVERLOAD), no abort is performed; e.g. amfd
>>>>>>>>>>>> detects this in the msg sanity chk and logs "invalid msg id ..."
>>>>>>>>>>>> =====================================================================
>>>>>>>>>>>> if (anc->cmsg_type == TIPC_ERRINFO) {
>>>>>>>>>>>>     /* TIPC_ERRINFO - TIPC error code associated with a
>>>>>>>>>>>>        returned data message or a connection termination message so abort */
>>>>>>>>>>>>     m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err :%s", strerror(errno) );
>>>>>>>>>>>>     *abort();*
>>>>>>>>>>>> } else if (anc->cmsg_type == TIPC_RETDATA) {
>>>>>>>>>>>>     /* If we set TIPC_DEST_DROPPABLE off messge (configure TIPC to return rejected messages to the sender )
>>>>>>>>>>>>        we will hit this when we implement MDS retransmit lost messages abort can be replaced with flow control logic*/
>>>>>>>>>>>>     for (i = anc->cmsg_len - sizeof(*anc); i > 0; i--) {
>>>>>>>>>>>>         m_MDS_LOG_DBG("MDTM: returned byte 0x%02x\n", *cptr);
>>>>>>>>>>>>         cptr++;
>>>>>>>>>>>>     }
>>>>>>>>>>>>     /* TIPC_RETDATA -The contents of a returned data message so abort */
>>>>>>>>>>>>     m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_RETDATA abort err :%s", strerror(errno) );
>>>>>>>>>>>>     *abort();*
>>>>>>>>>>>> }
>>>>>>>>>>>> =====================================================================
>>>>>>>>>>>>> This patch enables logging when packages are dropped to help in debugging.
>>>>>>>>>>>>> I don't agree that we should also introduce abort, but instead:
>>>>>>>>>>>>> 1) Implement a solution to handle dropped packages, ticket
>>>>>>>>>>>>> #1960
>>>>>>>>>>>> [AVM] This is nothing but flow control implementation in MDS,
>>>>>>>>>>>> this is future enhancement
>>>>>>>>>>>>
>>>>>>>>>>>>> 2) Investigate why packages may be dropped, the receiving MDS
>>>>>>>>>>>>> thread is a real time thread and should be able to consume a
>>>>>>>>>>>>> large amount of incoming messages.
>>>>>>>>>>>>> E.g. is the receiving MDS thread "live hanging" due to locks,
>>>>>>>>>>>>> file I/O etc?
>>>>>>>>>>>>>> This was the reason we haven't gone for it while addressing
>>>>>>>>>>>>>> Ticket #1227
>>>>>>>>>>>>>> (https://sourceforge.net/p/opensaf/mailman/message/33207717/)
>>>>>>>>>>>>>> So currently we don't have any advantage of disabling
>>>>>>>>>>>>>> TIPC_DEST_DROPPABLE and not allowing multicast messages.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -AVM
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 8/18/2016 2:43 PM, Hans Nordeback wrote:
>>>>>>>>>>>>>>>  osaf/libs/core/mds/mds_dt_tipc.c |  32 +++++++++++++++++++++++++-------
>>>>>>>>>>>>>>>  1 files changed, 25 insertions(+), 7 deletions(-)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> diff --git a/osaf/libs/core/mds/mds_dt_tipc.c b/osaf/libs/core/mds/mds_dt_tipc.c
>>>>>>>>>>>>>>> --- a/osaf/libs/core/mds/mds_dt_tipc.c
>>>>>>>>>>>>>>> +++ b/osaf/libs/core/mds/mds_dt_tipc.c
>>>>>>>>>>>>>>> @@ -320,6 +320,15 @@ uint32_t mdtm_tipc_init(NODE_ID nodeid,
>>>>>>>>>>>>>>>          m_MDS_LOG_INFO("MDTM: Successfully set default socket option TIPC_IMP = %d", TIPCIMPORTANCE);
>>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>> +    int droppable = 0;
>>>>>>>>>>>>>>> +    if (setsockopt(tipc_cb.BSRsock, SOL_TIPC, TIPC_DEST_DROPPABLE, &droppable, sizeof(droppable)) != 0) {
>>>>>>>>>>>>>>> +        LOG_ER("MDTM: Can't set TIPC_DEST_DROPPABLE to
>>>>>>>>>>>>>>> +            zero err :%s\n", strerror(errno));
>>>>>>>>>>>>>>> +        m_MDS_LOG_ERR("MDTM: Can't set TIPC_DEST_DROPPABLE to zero err :%s\n", strerror(errno));
>>>>>>>>>>>>>>> +        osafassert(0);
>>>>>>>>>>>>>>> +    } else {
>>>>>>>>>>>>>>> +        m_MDS_LOG_NOTIFY("MDTM: Successfully set TIPC_DEST_DROPPABLE to zero");
>>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>      return NCSCC_RC_SUCCESS;
>>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>> @@ -563,6 +572,8 @@ ssize_t recvfrom_connectionless (int sd,
>>>>>>>>>>>>>>>      unsigned char *cptr;
>>>>>>>>>>>>>>>      int i;
>>>>>>>>>>>>>>>      int has_addr;
>>>>>>>>>>>>>>> +    int anc_data[2];
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>      ssize_t sz;
>>>>>>>>>>>>>>>      has_addr = (from != NULL) && (addrlen != NULL);
>>>>>>>>>>>>>>> @@ -591,19 +602,26 @@ ssize_t recvfrom_connectionless (int sd,
>>>>>>>>>>>>>>>             if the message was sent using a TIPC name or name sequence as the
>>>>>>>>>>>>>>>             destination rather than a TIPC port ID
>>>>>>>>>>>>>>>             So abort for TIPC_ERRINFO and TIPC_RETDATA*/
>>>>>>>>>>>>>>>          if (anc->cmsg_type == TIPC_ERRINFO) {
>>>>>>>>>>>>>>> -            /* TIPC_ERRINFO - TIPC error code associated with a returned data message or a connection termination message so abort */
>>>>>>>>>>>>>>> -            m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err :%s", strerror(errno) );
>>>>>>>>>>>>>>> -            abort();
>>>>>>>>>>>>>>> +            anc_data[0] = *((unsigned int*)(CMSG_DATA(anc) + 0));
>>>>>>>>>>>>>>> +            if (anc_data[0] == TIPC_ERR_OVERLOAD) {
>>>>>>>>>>>>>>> +                LOG_CR("MDTM: undelivered message condition ancillary data: TIPC_ERR_OVERLOAD");
>>>>>>>>>>>>>>> +                m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERR_OVERLOAD");
>>>>>>>>>>>>>>> +            } else {
>>>>>>>>>>>>>>> +                /* TIPC_ERRINFO - TIPC error code associated with a returned data message or a connection termination message so abort */
>>>>>>>>>>>>>>> +                LOG_CR("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err : %d", anc_data[0]);
>>>>>>>>>>>>>>> +                m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err : %d", anc_data[0]);
>>>>>>>>>>>>>>> +            }
>>>>>>>>>>>>>>>          } else if (anc->cmsg_type == TIPC_RETDATA) {
>>>>>>>>>>>>>>> -            /* If we set TIPC_DEST_DROPPABLE off messge (configure TIPC to return rejected messages to the sender )
>>>>>>>>>>>>>>> +            /* If we set TIPC_DEST_DROPPABLE off message (configure TIPC to return rejected messages to the sender )
>>>>>>>>>>>>>>>                 we will hit this when we implement MDS retransmit lost messages abort can be replaced with flow control logic*/
>>>>>>>>>>>>>>>              for (i = anc->cmsg_len - sizeof(*anc); i > 0; i--) {
>>>>>>>>>>>>>>> -                m_MDS_LOG_DBG("MDTM: returned byte 0x%02x\n", *cptr);
>>>>>>>>>>>>>>> +                LOG_CR("MDTM: returned byte 0x%02x\n", *cptr);
>>>>>>>>>>>>>>> +                m_MDS_LOG_CRITICAL("MDTM: returned byte 0x%02x\n", *cptr);
>>>>>>>>>>>>>>>                  cptr++;
>>>>>>>>>>>>>>>              }
>>>>>>>>>>>>>>>              /* TIPC_RETDATA -The contents of a returned data message so abort */
>>>>>>>>>>>>>>> -            m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_RETDATA abort err :%s", strerror(errno) );
>>>>>>>>>>>>>>> -            abort();
>>>>>>>>>>>>>>> +            LOG_CR("MDTM: undelivered message condition ancillary data: TIPC_RETDATA");
>>>>>>>>>>>>>>> +            m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_RETDATA");
>>>>>>>>>>>>>>>          } else if (anc->cmsg_type == TIPC_DESTNAME) {
>>>>>>>>>>>>>>>              if (sz == 0) {
>>>>>>>>>>>>>>>                  m_MDS_LOG_DBG("MDTM: recd bytes=0 on received on sock, abnormal/unknown condition. Ignoring");

------------------------------------------------------------------------------
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
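
For reference, the handling discussed in this thread (inspect the TIPC ancillary data and report TIPC_ERR_OVERLOAD instead of calling abort()) can be sketched outside of MDS roughly as below. This is only a minimal sketch under stated assumptions, not the patch itself: classify_tipc_ancillary(), struct undelivered_info and the UNDELIVERED_* values are hypothetical names that do not exist in OpenSAF, the fallback constants are assumed to match <linux/tipc.h> and are used only when that header is not found, and TIPC_ERRINFO is assumed to carry two 32-bit words (error code, length of returned data), as the anc_data[2] array in the patch suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Prefer the kernel header; the fallback values are assumptions that
 * should be verified against <linux/tipc.h> on the target system. */
#if defined(__has_include)
#  if __has_include(<linux/tipc.h>)
#    include <linux/tipc.h>
#  endif
#endif
#ifndef SOL_TIPC
#  define SOL_TIPC 271
#endif
#ifndef TIPC_ERRINFO
#  define TIPC_ERRINFO 1
#  define TIPC_RETDATA 2
#endif
#ifndef TIPC_ERR_OVERLOAD
#  define TIPC_ERR_OVERLOAD 4
#endif

/* Hypothetical result types, for illustration only. */
enum undelivered_kind {
    UNDELIVERED_NONE,     /* no TIPC_ERRINFO object was present */
    UNDELIVERED_OVERLOAD, /* rejected with TIPC_ERR_OVERLOAD */
    UNDELIVERED_OTHER     /* rejected with some other TIPC error */
};

struct undelivered_info {
    enum undelivered_kind kind;
    unsigned int err_code;     /* first word of TIPC_ERRINFO */
    unsigned int returned_len; /* second word of TIPC_ERRINFO */
    size_t retdata_len;        /* payload bytes carried by TIPC_RETDATA */
};

/* Walk the control messages of a msghdr filled in by recvmsg() and
 * summarise any TIPC "undelivered message" information found. */
enum undelivered_kind classify_tipc_ancillary(struct msghdr *msg,
                                              struct undelivered_info *out)
{
    struct cmsghdr *anc;

    assert(msg != NULL && out != NULL);
    memset(out, 0, sizeof(*out));
    out->kind = UNDELIVERED_NONE;

    for (anc = CMSG_FIRSTHDR(msg); anc != NULL; anc = CMSG_NXTHDR(msg, anc)) {
        if (anc->cmsg_level != SOL_TIPC)
            continue;
        if (anc->cmsg_type == TIPC_ERRINFO &&
            anc->cmsg_len >= CMSG_LEN(2 * sizeof(unsigned int))) {
            unsigned int words[2];
            /* memcpy avoids the unaligned pointer cast used in the patch. */
            memcpy(words, CMSG_DATA(anc), sizeof(words));
            out->err_code = words[0];
            out->returned_len = words[1];
            out->kind = (words[0] == TIPC_ERR_OVERLOAD) ? UNDELIVERED_OVERLOAD
                                                        : UNDELIVERED_OTHER;
        } else if (anc->cmsg_type == TIPC_RETDATA &&
                   anc->cmsg_len >= CMSG_LEN(0)) {
            out->retdata_len = anc->cmsg_len - CMSG_LEN(0);
        }
    }
    return out->kind;
}
```

A receive path could call classify_tipc_ancillary() on the msghdr it passed to recvmsg() and log or count UNDELIVERED_OVERLOAD rather than aborting; whether and how to retransmit is the flow-control question tracked in ticket #1960.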