Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
Hi HansN,

>> any how GA is tagged.

Sorry, I mean RC2 is tagged.

-AVM

On 9/21/2016 12:41 PM, A V Mahesh wrote:
> Hi HansN,
>
> I just tested with uniform buffer sizes on all nodes, sending messages in
> the normal phase, and the results look OK, even after hitting
> TIPC_ERR_OVERLOAD.
>
> So my conclusion is: in general all nodes will have the same buffer sizes,
> so let us go with the V2 patch. Anyhow, GA is tagged, so we have enough
> time for testing, and if we get some issues we can resolve them by the
> next release.
>
> ==
> Sep 21 11:51:40 SC-1 osafamfd[15792]: NO Node 'PL-4' joined the cluster
> Sep 21 11:51:40 SC-1 osafimmnd[15741]: NO Implementer connected: 17 (MsgQueueService132111) <0, 2040f>
> Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
> Sep 21 11:52:41 SC-1 osafimmd[15730]: MDTM: undelivered message condition ancillary data: TIPC_RETDATA
> [...]
> ==
>
> On 9/21/2016 11:37 AM, A V
Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
Hi HansN,

I just tested with uniform buffer sizes on all nodes, sending messages in
the normal phase, and the results look OK, even after hitting
TIPC_ERR_OVERLOAD.

So my conclusion is: in general all nodes will have the same buffer sizes,
so let us go with the V2 patch. Anyhow, GA is tagged, so we have enough time
for testing, and if we get some issues we can resolve them by the next
release.

==
Sep 21 11:51:40 SC-1 osafamfd[15792]: NO Node 'PL-4' joined the cluster
Sep 21 11:51:40 SC-1 osafimmnd[15741]: NO Implementer connected: 17 (MsgQueueService132111) <0, 2040f>
Sep 21 11:52:41 SC-1 osafimmd[15730]: 77 MDTM: undelivered message condition ancillary data size: 0 : TIPC_ERR_OVERLOAD
Sep 21 11:52:41 SC-1 osafimmd[15730]: MDTM: undelivered message condition ancillary data: TIPC_RETDATA
[... the two lines above repeat many more times, all stamped 11:52:41 ...]
==

On 9/21/2016 11:37 AM, A V Mahesh wrote:
> Hi HansN,
>
> On 9/20/2016 4:17 PM, Hans Nordebäck wrote:
>> Hi Mahesh,
>>
>> I think only logging is needed, as proposed in the patch, since some
>> services are already handling dropped messages. This logging will help in
>> troubleshooting. Keeping TIPC_DEST_DROPPABLE to tr
Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
Hi HansN,

On 9/20/2016 4:17 PM, Hans Nordebäck wrote:
> Hi Mahesh,
>
> I think only logging is needed, as proposed in the patch, since some
> services are already handling dropped messages. This logging will help in
> troubleshooting. Keeping TIPC_DEST_DROPPABLE set to true will only make
> TIPC silently drop messages; the original problem persists and needs
> investigation, i.e. why the socket receive buffer is overloaded. One
> reason may be the MDS poll/receive loop together with the "big" mutex
> lock (ticket #520).

[AVM] One valid reason could be that, in the TIPC_ERR_OVERLOAD case,
recd_bytes is NOT zero, so the buffer overload can occur at either the TIPC
or the MDS level. I will investigate more and update.

> Did you check why the MDS message loss mechanism doesn't detect TIPC
> dropped messages? AMF does detect this via e.g. "out of sync",
> "msg id mismatch" and so on.

[AVM] You mean the IMMD message loss mechanism?

-AVM

> /Regards HansN
>
> -----Original Message-----
> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
> Sent: den 20 september 2016 12:29
> To: Anders Widell ; Hans Nordebäck
> Cc: opensaf-devel@lists.sourceforge.net; mathi.naic...@oracle.com
> Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
>
> Hi Anders Widell / HansN,
>
> On 9/16/2016 2:03 PM, Anders Widell wrote:
>> The idea was to just log reception of error info messages, for
>> trouble-shooting purposes.
>
> After multiple attempts, I managed to simulate the TIPC_ERR_OVERLOAD
> error. After the TIPC_ERR_OVERLOAD error is hit, the cluster goes into an
> unrecoverable state, because the send buffers are full.
>
> So we have two options:
>
> 1) Set TIPC_DEST_DROPPABLE to false, log the TIPC_ERR_OVERLOAD error and
>    then exit the sender gracefully, which allows the remaining nodes to
>    survive.
>
> 2) Keep the current configuration as it is (TIPC_DEST_DROPPABLE set to
>    true).
>
> =
> Sep 20 15:14:09 SC-1 osafamfd[3759]: NO Received node_up from 2040f: msg_id 1
> Sep 20 15:14:09 SC-1 osafamfd[3759]: NO Node 'PL-4' joined the cluster
> Sep 20 15:14:09 SC-1 osafimmnd[3695]: NO Implementer connected: 19 (MsgQueueService132111) <0, 2040f>
> *Sep 20 15:16:59 SC-1 osafimmd[3684]: 77 MDTM: undelivered message condition ancillary data: TIPC_ERR_OVERLOAD*
> Sep 20 15:17:00 SC-1 osafimmnd[3695]: WA Director Service in NOACTIVE state - fevs replies pending:1 fevs highest processed:218744
> Sep 20 15:17:00 SC-1 osafamfnd[3773]: NO 'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
> Sep 20 15:17:00 SC-1 osafamfnd[3773]: ER safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
> Sep 20 15:17:00 SC-1 osafamfnd[3773]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131343, SupervisionTime = 60
> [...]
Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
Hi Mahesh,

I think only logging is needed, as proposed in the patch, since some
services are already handling dropped messages. This logging will help in
troubleshooting. Keeping TIPC_DEST_DROPPABLE set to true will only make TIPC
silently drop messages; the original problem persists and needs
investigation, i.e. why the socket receive buffer is overloaded. One reason
may be the MDS poll/receive loop together with the "big" mutex lock
(ticket #520).

Did you check why the MDS message loss mechanism doesn't detect TIPC dropped
messages? AMF does detect this via e.g. "out of sync", "msg id mismatch" and
so on.

/Regards HansN

-----Original Message-----
From: A V Mahesh [mailto:mahesh.va...@oracle.com]
Sent: den 20 september 2016 12:29
To: Anders Widell ; Hans Nordebäck
Cc: opensaf-devel@lists.sourceforge.net; mathi.naic...@oracle.com
Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]

Hi Anders Widell / HansN,

On 9/16/2016 2:03 PM, Anders Widell wrote:
> The idea was to just log reception of error info messages, for
> trouble-shooting purposes.

After multiple attempts, I managed to simulate the TIPC_ERR_OVERLOAD error.
After the TIPC_ERR_OVERLOAD error is hit, the cluster goes into an
unrecoverable state, because the send buffers are full.

So we have two options:

1) Set TIPC_DEST_DROPPABLE to false, log the TIPC_ERR_OVERLOAD error and
   then exit the sender gracefully, which allows the remaining nodes to
   survive.

2) Keep the current configuration as it is (TIPC_DEST_DROPPABLE set to
   true).

=
Sep 20 15:14:09 SC-1 osafamfd[3759]: NO Received node_up from 2040f: msg_id 1
Sep 20 15:14:09 SC-1 osafamfd[3759]: NO Node 'PL-4' joined the cluster
Sep 20 15:14:09 SC-1 osafimmnd[3695]: NO Implementer connected: 19 (MsgQueueService132111) <0, 2040f>
*Sep 20 15:16:59 SC-1 osafimmd[3684]: 77 MDTM: undelivered message condition ancillary data: TIPC_ERR_OVERLOAD*
Sep 20 15:17:00 SC-1 osafimmnd[3695]: WA Director Service in NOACTIVE state - fevs replies pending:1 fevs highest processed:218744
Sep 20 15:17:00 SC-1 osafamfnd[3773]: NO 'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Sep 20 15:17:00 SC-1 osafamfnd[3773]: ER safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Sep 20 15:17:00 SC-1 osafamfnd[3773]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131343, SupervisionTime = 60
[...]
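Note on the "msg id mismatch" style of detection mentioned in this message: a receiver can notice lost messages by tracking per-sender sequence numbers and reporting any gap. The following is only a minimal sketch of that general idea; the struct and function names are hypothetical and are not taken from MDS or AMF, whose actual mechanisms are not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-sender receive state; not an MDS or AMF structure. */
struct rx_state {
    uint32_t next_expected;  /* sequence number expected next */
    uint32_t lost_total;     /* messages presumed lost so far */
};

/*
 * Record an arriving sequence number and return how many messages were
 * skipped in front of it (0 means it arrived in order). Unsigned
 * subtraction keeps the arithmetic valid across 32-bit wrap-around.
 * Sequence numbers behind the expected one (duplicates, late arrivals)
 * are ignored and reported as 0.
 */
static uint32_t rx_check_seq(struct rx_state *st, uint32_t seq)
{
    uint32_t gap = seq - st->next_expected;

    if (gap >= 0x80000000u)  /* seq is older than what we expect */
        return 0;

    st->next_expected = seq + 1;
    st->lost_total += gap;
    if (gap != 0)
        fprintf(stderr, "msg id mismatch: %u message(s) missing before id %u\n",
                (unsigned)gap, (unsigned)seq);
    return gap;
}
```

A receiver would call rx_check_seq() once per incoming message and treat a non-zero return value as a loss indication, for example by logging it or by triggering a resynchronisation.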
Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
Hi Anders Widell / HansN,

On 9/16/2016 2:03 PM, Anders Widell wrote:
> The idea was to just log reception of error info messages, for
> trouble-shooting purposes.

After multiple attempts, I managed to simulate the TIPC_ERR_OVERLOAD error.
After the TIPC_ERR_OVERLOAD error is hit, the cluster goes into an
unrecoverable state, because the send buffers are full.

So we have two options:

1) Set TIPC_DEST_DROPPABLE to false, log the TIPC_ERR_OVERLOAD error and
   then exit the sender gracefully, which allows the remaining nodes to
   survive.

2) Keep the current configuration as it is (TIPC_DEST_DROPPABLE set to
   true).

=
Sep 20 15:14:09 SC-1 osafamfd[3759]: NO Received node_up from 2040f: msg_id 1
Sep 20 15:14:09 SC-1 osafamfd[3759]: NO Node 'PL-4' joined the cluster
Sep 20 15:14:09 SC-1 osafimmnd[3695]: NO Implementer connected: 19 (MsgQueueService132111) <0, 2040f>
*Sep 20 15:16:59 SC-1 osafimmd[3684]: 77 MDTM: undelivered message condition ancillary data: TIPC_ERR_OVERLOAD*
Sep 20 15:17:00 SC-1 osafimmnd[3695]: WA Director Service in NOACTIVE state - fevs replies pending:1 fevs highest processed:218744
Sep 20 15:17:00 SC-1 osafamfnd[3773]: NO 'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Sep 20 15:17:00 SC-1 osafamfnd[3773]: ER safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Sep 20 15:17:00 SC-1 osafamfnd[3773]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131343, SupervisionTime = 60
Sep 20 15:17:00 SC-1 osafimmnd[3695]: WA DISCARD DUPLICATE FEVS message:218744
Sep 20 15:17:00 SC-1 osafimmnd[3695]: WA Error code 2 returned for message type 82 - ignoring
Sep 20 15:17:00 SC-1 opensaf_reboot: Rebooting local node; timeout=60
Sep 20 15:17:00 SC-1 osafimmnd[3695]: WA SC Absence IS allowed:900 IMMD service is DOWN
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO IMMD SERVICE IS DOWN, HYDRA IS CONFIGURED => UNREGISTERING IMMND form MDS
Sep 20 15:17:00 SC-1 osafntfimcnd[3742]: NO saImmOiDispatch() Fail SA_AIS_ERR_BAD_HANDLE (9)
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:20002010f sv_id:27
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 1 <2, 2010f> (safLogService)
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:d0d0002010f sv_id:26
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:12010f sv_id:27
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 2 <16, 2010f> (@safLogService_appl)
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:130002010f sv_id:27
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 3 <19, 2010f> (@OpenSafImmReplicatorA)
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:140002010f sv_id:26
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:150002010f sv_id:27
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 4 <21, 2010f> (safClmService)
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:1a0002010f sv_id:27
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 5 <26, 2010f> (safAmfService)
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:1b0002010f sv_id:26
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:5bc0002010f sv_id:26
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:5bd0002010f sv_id:27
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 6 <1469, 2010f> (MsgQueueService131343)
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:5c2010f sv_id:27
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 10 <1472, 2010f> (safEvtService)
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:5c40002010f sv_id:27
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 8 <1476, 2010f> (safSmfService)
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:5c60002010f sv_id:27
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 9 <1478, 2010f> (safLckService)
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:5c70002010f sv_id:27
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 7 <1479, 2010f> (safMsgGrpService)
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:5cc0002010f sv_id:27
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Removing client id:5ce0002010f sv_id:27
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 12 <1486, 2010f> (safCheckPointService)
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 13 <0, 2020f(down)> (MsgQueueService131599)
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 14 <0, 2020f(down)> (@OpenSafImmReplicatorB)
Sep 20 15:17:00 SC-1 osafimmnd[3695]: NO Implementer disconnected 15 <0, 2020f(down)> (@safAmfService2020f)
Sep
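Note on option 1 in this message: it depends on switching TIPC_DEST_DROPPABLE off on the sending socket, so that TIPC returns undeliverable messages instead of discarding them. A minimal sketch of that call follows, assuming a Linux system that provides <linux/tipc.h>. The helper name set_dest_droppable() is made up for illustration and is not an MDS function; where MDS actually sets the option is not shown in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/tipc.h>

#ifndef SOL_TIPC
#define SOL_TIPC 271  /* Linux value; defined here only if the libc headers lack it */
#endif

/*
 * Hypothetical helper: set TIPC_DEST_DROPPABLE on an existing TIPC socket.
 *   droppable == 0: undeliverable messages are returned to the sender
 *   droppable != 0: undeliverable messages are discarded silently
 * Returns 0 on success, -1 on failure with errno left as set by setsockopt().
 */
static int set_dest_droppable(int sd, int droppable)
{
    int val = droppable ? 1 : 0;

    if (setsockopt(sd, SOL_TIPC, TIPC_DEST_DROPPABLE, &val, sizeof(val)) != 0) {
        int saved = errno;  /* fprintf() may overwrite errno */
        fprintf(stderr, "setsockopt(TIPC_DEST_DROPPABLE=%d) failed: %s\n",
                val, strerror(saved));
        errno = saved;
        return -1;
    }
    return 0;
}
```

With the option set to 0, the sender has to be prepared to receive returned messages on the same socket; with it set to 1, losses are only visible to the receiver, if at all.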
Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
Hi Anders Widell,

On 9/16/2016 2:03 PM, Anders Widell wrote:
> Another approach we could consider is that MDS retransmits the message
> transparently without informing the sender. This would require MDS to
> internally store sent messages for a while, so that they can be
> retransmitted. It would also require the receiver to re-order received
> messages, since a retransmitted message will be received out of sequence.

This is nothing but flow control; it is a long-pending enhancement.

-AVM

On 9/16/2016 2:03 PM, Anders Widell wrote:
>
> I don't think we need (or even should) inform the sender when MDS
> receives an error information message from TIPC. Note that these error
> information messages are received asynchronously, when the sender has
> already received an OK return code from the MDS send call. The idea
> was to just log reception of error info messages, for trouble-shooting
> purposes. We already have a mechanism in MDS that informs the receiver
> about lost MDS messages. If we wish to inform the sender we would need
> to introduce a second mechanism in MDS, and at this point I don't
> think it is needed. Another approach we could consider is that MDS
> retransmits the message transparently without informing the sender.
> This would require MDS to internally store sent messages for a while,
> so that they can be retransmitted. It would also require the receiver
> to re-order received messages, since a retransmitted message will be
> received out of sequence.
>
> regards,
>
> Anders Widell
>
>
> On 09/16/2016 06:40 AM, A V Mahesh wrote:
>> Hi HansN,
>>
>> I managed to create TIPC_ERRINFO/TIPC_RETDATA error cases (not the
>> TIPC_ERR_OVERLOAD error) with normal messages. It is observed that with
>> TIPC_DEST_DROPPABLE set to true even the TIPC_ERRINFO error is NOT
>> notified (which implies TIPC_ERR_OVERLOAD is not notified either); if
>> TIPC_DEST_DROPPABLE is set to false, TIPC_ERRINFO/TIPC_RETDATA errors
>> are notified.
>>
>> Next I will check the implications of setting TIPC_DEST_DROPPABLE to
>> false on multicast and broadcast messages. Based on that, we can
>> rearrange the conditions under which TIPC_DEST_DROPPABLE is set to
>> false; for agents that set `i_msg_loss_indication = true`, MDS can
>> return the same TIPC_ERR_OVERLOAD error to the agent.
>>
>> TIPC_DEST_DROPPABLE to false:
>>
>> ==
>> Sep 15 16:10:39 SC-1 osafimmnd[32051]: NO Implementer disconnected 13 <0, 2040f> (MsgQueueService132111)
>> Sep 15 16:10:39 SC-1 osafimmd[32040]: 777 MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err : 2
>> Sep 15 16:10:39 SC-1 osafimmd[32040]: MDTM: undelivered message condition ancillary data: TIPC_RETDATA
>> Sep 15 16:10:39 SC-1 osafimmd[32040]: NO MDS event from svc_id 25 (change:4, dest:567413369208836)
>> [...]
Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
I don't think we need (or even should) inform the sender when MDS receives
an error information message from TIPC. Note that these error information
messages are received asynchronously, when the sender has already received
an OK return code from the MDS send call. The idea was to just log reception
of error info messages, for trouble-shooting purposes. We already have a
mechanism in MDS that informs the receiver about lost MDS messages. If we
wish to inform the sender we would need to introduce a second mechanism in
MDS, and at this point I don't think it is needed.

Another approach we could consider is that MDS retransmits the message
transparently without informing the sender. This would require MDS to
internally store sent messages for a while, so that they can be
retransmitted. It would also require the receiver to re-order received
messages, since a retransmitted message will be received out of sequence.

regards,

Anders Widell

On 09/16/2016 06:40 AM, A V Mahesh wrote:
> Hi HansN,
>
> I managed to create TIPC_ERRINFO/TIPC_RETDATA error cases (not the
> TIPC_ERR_OVERLOAD error) with normal messages. It is observed that with
> TIPC_DEST_DROPPABLE set to true even the TIPC_ERRINFO error is NOT
> notified (which implies TIPC_ERR_OVERLOAD is not notified either); if
> TIPC_DEST_DROPPABLE is set to false, TIPC_ERRINFO/TIPC_RETDATA errors
> are notified.
>
> Next I will check the implications of setting TIPC_DEST_DROPPABLE to
> false on multicast and broadcast messages. Based on that, we can
> rearrange the conditions under which TIPC_DEST_DROPPABLE is set to false;
> for agents that set `i_msg_loss_indication = true`, MDS can return the
> same TIPC_ERR_OVERLOAD error to the agent.
>
> TIPC_DEST_DROPPABLE to false:
>
> ==
> Sep 15 16:10:39 SC-1 osafimmnd[32051]: NO Implementer disconnected 13 <0, 2040f> (MsgQueueService132111)
> Sep 15 16:10:39 SC-1 osafimmd[32040]: 777 MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err : 2
> Sep 15 16:10:39 SC-1 osafimmd[32040]: MDTM: undelivered message condition ancillary data: TIPC_RETDATA
> Sep 15 16:10:39 SC-1 osafimmd[32040]: NO MDS event from svc_id 25 (change:4, dest:567413369208836)
> [... the TIPC_ERRINFO / TIPC_RETDATA pair above repeats many more times, all stamped 16:10:39 ...]
> Sep 15 16:10:39 SC-1 osafamfd[32114]: NO Node 'PL-4' left the cluster
> ==
>
> TIPC_DEST_DROPPABLE to true:
>
> ==
> Sep 15 15:59:55 SC-1 osafimmnd[26461]: NO Implementer disconnected 13 <0, 2040f> (MsgQueueService132111)
> Sep 15 15:59:55 SC-1 osafimmd[26450]: NO MDS event from svc_id 25 (change:4, dest:567412923957252)
> Sep 15 15:59:55 SC-1 osafimmnd[26461]: NO Global discard node received for nodeId:2040f pid:410
> Sep 15 15:59:55 SC-1 osafamfd
Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
Hi HansN, I managed to create TIPC_ERRINFO/TIPC_RETDATA error cases (not the TIPC_ERR_OVERLOAD error) with normal messages. I observed that when TIPC_DEST_DROPPABLE is set to true, not even the TIPC_ERRINFO error is notified (so TIPC_ERR_OVERLOAD would not be notified either); when TIPC_DEST_DROPPABLE is set to false, the TIPC_ERRINFO/TIPC_RETDATA errors are notified. I will now also check the implications of setting TIPC_DEST_DROPPABLE to false for multicast and broadcast messages. Based on that, we can rearrange the conditions under which TIPC_DEST_DROPPABLE is set to false, so that when the agent sets `i_msg_loss_indication = true`, MDS can return the same TIPC_ERR_OVERLOAD error to the agent. TIPC_DEST_DROPPABLE to false: == Sep 15 16:10:39 SC-1 osafimmnd[32051]: NO Implementer disconnected 13 <0, 2040f> (MsgQueueService132111) Sep 15 16:10:39 SC-1 osafimmd[32040]: 777 MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err : 2 Sep 15 16:10:39 SC-1 osafimmd[32040]: MDTM: undelivered message condition ancillary data: TIPC_RETDATA Sep 15 16:10:39 SC-1 osafimmd[32040]: NO MDS event from svc_id 25 (change:4, dest:567413369208836) Sep 15 16:10:39 SC-1 osafimmd[32040]: 777 MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err : 2 Sep 15 16:10:39 SC-1 osafimmd[32040]: MDTM: undelivered message condition ancillary data: TIPC_RETDATA Sep 15 16:10:39 SC-1 osafimmd[32040]: 777 MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err : 2 Sep 15 16:10:39 SC-1 osafimmd[32040]: MDTM: undelivered message condition ancillary data: TIPC_RETDATA Sep 15 16:10:39 SC-1 osafimmd[32040]: 777 MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err : 2 Sep 15 16:10:39 SC-1 osafimmd[32040]: MDTM: undelivered message condition ancillary data: TIPC_RETDATA Sep 15 16:10:39 SC-1 osafimmd[32040]: 777 MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err : 2 Sep 15 16:10:39 SC-1 osafimmd[32040]: MDTM: undelivered message condition ancillary data: TIPC_RETDATA Sep 
15 16:10:39 SC-1 osafimmd[32040]: 777 MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err : 2 Sep 15 16:10:39 SC-1 osafimmd[32040]: MDTM: undelivered message condition ancillary data: TIPC_RETDATA Sep 15 16:10:39 SC-1 osafimmd[32040]: 777 MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err : 2 Sep 15 16:10:39 SC-1 osafimmd[32040]: MDTM: undelivered message condition ancillary data: TIPC_RETDATA Sep 15 16:10:39 SC-1 osafimmd[32040]: 777 MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err : 2 Sep 15 16:10:39 SC-1 osafimmd[32040]: MDTM: undelivered message condition ancillary data: TIPC_RETDATA Sep 15 16:10:39 SC-1 osafimmd[32040]: 777 MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err : 2 Sep 15 16:10:39 SC-1 osafimmd[32040]: MDTM: undelivered message condition ancillary data: TIPC_RETDATA Sep 15 16:10:39 SC-1 osafimmd[32040]: 777 MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err : 2 Sep 15 16:10:39 SC-1 osafimmd[32040]: MDTM: undelivered message condition ancillary data: TIPC_RETDATA Sep 15 16:10:39 SC-1 osafamfd[32114]: NO Node 'PL-4' left the cluster == TIPC_DEST_DROPPABLE to true: == Sep 15 15:59:55 SC-1 osafimmnd[26461]: NO Implementer disconnected 13 <0, 2040f> (MsgQueueService132111) Sep 15 15:59:55 SC-1 osafimmd[26450]: NO MDS event from svc_id 25 (change:4, dest:567412923957252) Sep 15 15:59:55 SC-1 osafimmnd[26461]: NO Global discard node received for nodeId:2040f pid:410 Sep 15 15:59:55 SC-1 osafamfd[28810]: NO Node 'PL-4' left the cluster Sep 15 15:59:58 SC-1 kernel: [ 5147.648737] tipc: Resetting link <1.1.1:eth0-1.1.4:eth0>, peer not responding Sep 15 15:59:58 SC-1 kernel: [ 5147.648756] tipc: Lost link <1.1.1:eth0-1.1.4:eth0> on network plane A Sep 15 15:59:58 SC-1 kernel: [ 5147.648771] tipc: Lost contact with <1.1.4> == -AVM On 9/1/2016 10:59 AM, Hans Nordebäck wrote: > Hi Mahesh, > > I have not tested this, but the following should 
work: > > - Set BSRsock TIPC_IMPORTANCE to TIPC_LOW_IMPORTANCE > > - set socket receive buffer to a small value: > > optval = "small socket recieive buffer size" , 5000 ? > > setsockopt(tipc_cb.BSRsock, SOL_SOCKET, SO_RCVBUF, &optval, optlen) > > - sysctl -w net.tipc.tipc_rmem="5000 4000 68240400" (or smaller > values) > > - add some delays when processing messages in > mdtm_process_recv_events(), to provoke overloading the socket receive > buffer. > > We experience dropped packages in a 75 node system, and as a > workaround increasing the default so receive buffer size it seems > working for that setup. > > /Than
Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
Hi HansN, So far I have not succeeded in creating the TIPC_ERR_OVERLOAD case, so I am planning to rebuild `tipc.ko` with a lower OVERLOAD_LIMIT_BASE value. Currently I am working on priority open tickets for the 5.1.RC1 milestone; I will get back to you soon. -AVM On 9/8/2016 2:02 PM, Hans Nordebäck wrote: > Hi Mahesh, > > Any updates on this? > > /Thanks HansN > > -Original Message- > From: A V Mahesh [mailto:mahesh.va...@oracle.com] > Sent: den 1 september 2016 07:55 > To: Hans Nordebäck > Cc: opensaf-devel@lists.sourceforge.net; Anders Widell > ; mathi.naic...@oracle.com > Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957] > > Hi HansN, > > >> I have not tested this > > Ok Thanks for the tips, I will check partially TIPC_DEST_DROPPABLE enabled & > disabled case and then we can conclude . > > -AVM > > On 9/1/2016 10:59 AM, Hans Nordebäck wrote: >> Hi Mahesh, >> >> I have not tested this, but the following should work: >> >> - Set BSRsock TIPC_IMPORTANCE to TIPC_LOW_IMPORTANCE >> >> - set socket receive buffer to a small value: >> >>optval = "small socket recieive buffer size" , 5000 ? >> >>setsockopt(tipc_cb.BSRsock, SOL_SOCKET, SO_RCVBUF, &optval, optlen) >> >> - sysctl -w net.tipc.tipc_rmem="5000 4000 68240400" (or smaller >> values) >> >> - add some delays when processing messages in >> mdtm_process_recv_events(), to provoke overloading the socket receive >> buffer. >> >> We experience dropped packages in a 75 node system, and as a >> workaround increasing the default so receive buffer size it seems >> working for that setup. >> >> /Thanks HansN >> >> On 09/01/2016 05:50 AM, A V Mahesh wrote: >>> Hi HansN, >>> >>> Do you have any tips to created overload case, >>> >>> I would like test and observe TIPC_DEST_DROPPABLE enabled & disabled >>> cases. >>> >>> -AVM >>> >>> >>> On 9/1/2016 9:12 AM, A V Mahesh wrote: Hi HansN, Sorry for the delay. I will test it and get back to you soon. 
-AVM On 8/31/2016 4:29 PM, Hans Nordebäck wrote: > Hi Mahesh, > Any updates on this? > > /Regards HansN > > -Original Message- > From: Anders Widell > Sent: den 25 augusti 2016 13:11 > To: A V Mahesh ; Hans Nordebäck > ; mathi.naic...@oracle.com > Cc: opensaf-devel@lists.sourceforge.net > Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957] > > Hi! > > This is what the TIPC user documentation says about > TIPC_DEST_DROPPABLE: > "This option governs the handling of messages sent by the socket if > the message cannot be delivered to its destination, either because > the receiver is congested or because the specified receiver does > not exist. > If enabled, the message is discarded; otherwise the message is > returned to the sender." > > This is what the TIPC user documentation says about the return > value from the recvmsg() system call: "When used with a > connectionless socket, a return value of 0 indicates the arrival of > a returned data message that was originally sent by this socket." > > I think the documentation is pretty clear. If you set > TIPC_DEST_DROPPABLE to true, the receiver can discard messages e.g. > when the receive buffer is full. The sender will not be notified in > this case. If TIPC_DEST_DROPPABLE is set to false, the message will > be returned to the sender in case of a full receive buffer. The > sender knows that it has received such a returned message when the > recvmsg() call returns zero. > > regards, > Anders Widell > > On 08/25/2016 11:30 AM, A V Mahesh wrote: >> Hi HansN, >> >> >> On 8/23/2016 5:22 PM, Hans Nordebäck wrote: >> >>> Hi Mahesh, >>> >>> Yes, this is my understanding too, if TIPC_DROPPABLE = true tipc >>> may drop messages silently, at receive sock buffer full >>> condition, but do not return any ancillary message. >>> If TIPC_DROPPABLE = false tipc may drop message but will send an >>> ancillary message to inform about TIPC_ERR_OVERLOAD. 
>> [AVM] >> >> My observation are understanding is different, based on TIPC code >> and Linux TIPC 2.0 Programmer's Guide , that the TIPC_ERR_OVERLOAD >> error returned when TIPC is unable to enqueue an incoming message >> on the receiving socket's receive queue irrelevant of >> TIPC_DEST_DROPPABLE enabled or disabled. >> >> The only difference between TIPC_DEST_DROPPABLE enabled or >> disabled is , If TIPC_DEST_DROPPABLE enabled, the message is >> discarded and >> recvmsg() returned size is ZERO and application will get errors, >> if TIPC_DEST_DROPPABLE disabled the message is returned to the >> sender it means the recvmsg() returned size is user send data size >> and application will get errors . >> >> I did check the TIPC code and documentations and
Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
Hi Mahesh, Any updates on this? /Thanks HansN -Original Message- From: A V Mahesh [mailto:mahesh.va...@oracle.com] Sent: den 1 september 2016 07:55 To: Hans Nordebäck Cc: opensaf-devel@lists.sourceforge.net; Anders Widell ; mathi.naic...@oracle.com Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957] Hi HansN, >> I have not tested this Ok Thanks for the tips, I will check partially TIPC_DEST_DROPPABLE enabled & disabled case and then we can conclude . -AVM On 9/1/2016 10:59 AM, Hans Nordebäck wrote: > Hi Mahesh, > > I have not tested this, but the following should work: > > - Set BSRsock TIPC_IMPORTANCE to TIPC_LOW_IMPORTANCE > > - set socket receive buffer to a small value: > > optval = "small socket recieive buffer size" , 5000 ? > > setsockopt(tipc_cb.BSRsock, SOL_SOCKET, SO_RCVBUF, &optval, optlen) > > - sysctl -w net.tipc.tipc_rmem="5000 4000 68240400" (or smaller > values) > > - add some delays when processing messages in > mdtm_process_recv_events(), to provoke overloading the socket receive > buffer. > > We experience dropped packages in a 75 node system, and as a > workaround increasing the default so receive buffer size it seems > working for that setup. > > /Thanks HansN > > On 09/01/2016 05:50 AM, A V Mahesh wrote: >> Hi HansN, >> >> Do you have any tips to created overload case, >> >> I would like test and observe TIPC_DEST_DROPPABLE enabled & disabled >> cases. >> >> -AVM >> >> >> On 9/1/2016 9:12 AM, A V Mahesh wrote: >>> Hi HansN, >>> >>> Sorry for the delay. >>> >>> I will test it and get back to you soon. >>> >>> -AVM >>> >>> >>> On 8/31/2016 4:29 PM, Hans Nordebäck wrote: Hi Mahesh, Any updates on this? /Regards HansN -Original Message- From: Anders Widell Sent: den 25 augusti 2016 13:11 To: A V Mahesh ; Hans Nordebäck ; mathi.naic...@oracle.com Cc: opensaf-devel@lists.sourceforge.net Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957] Hi! 
This is what the TIPC user documentation says about TIPC_DEST_DROPPABLE: "This option governs the handling of messages sent by the socket if the message cannot be delivered to its destination, either because the receiver is congested or because the specified receiver does not exist. If enabled, the message is discarded; otherwise the message is returned to the sender." This is what the TIPC user documentation says about the return value from the recvmsg() system call: "When used with a connectionless socket, a return value of 0 indicates the arrival of a returned data message that was originally sent by this socket." I think the documentation is pretty clear. If you set TIPC_DEST_DROPPABLE to true, the receiver can discard messages e.g. when the receive buffer is full. The sender will not be notified in this case. If TIPC_DEST_DROPPABLE is set to false, the message will be returned to the sender in case of a full receive buffer. The sender knows that it has received such a returned message when the recvmsg() call returns zero. regards, Anders Widell On 08/25/2016 11:30 AM, A V Mahesh wrote: > Hi HansN, > > > On 8/23/2016 5:22 PM, Hans Nordebäck wrote: > >> Hi Mahesh, >> >> Yes, this is my understanding too, if TIPC_DROPPABLE = true tipc >> may drop messages silently, at receive sock buffer full >> condition, but do not return any ancillary message. >> If TIPC_DROPPABLE = false tipc may drop message but will send an >> ancillary message to inform about TIPC_ERR_OVERLOAD. > [AVM] > > My observation are understanding is different, based on TIPC code > and Linux TIPC 2.0 Programmer's Guide , that the TIPC_ERR_OVERLOAD > error returned when TIPC is unable to enqueue an incoming message > on the receiving socket's receive queue irrelevant of > TIPC_DEST_DROPPABLE enabled or disabled. 
> > The only difference between TIPC_DEST_DROPPABLE enabled or > disabled is , If TIPC_DEST_DROPPABLE enabled, the message is > discarded and > recvmsg() returned size is ZERO and application will get errors, > if TIPC_DEST_DROPPABLE disabled the message is returned to the > sender it means the recvmsg() returned size is user send data size > and application will get errors . > > I did check the TIPC code and documentations and I haven't get > any evidences that TIPC_ERR_OVERLOAD error code will be send only > If TIPC_DEST_DROPPABLE = false. > > Even while testing #1227 > (https://sourceforge.net/p/opensaf/mailman/message/33207717/) my > observations and understanding was, an individual TIPC socket is > only allowed to queue up > OVERLOAD_LIMIT_BASE/2 messages of the lowest importance level > before it starts rejecting them
Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
Hi HansN, >> I have not tested this Ok Thanks for the tips, I will check partially TIPC_DEST_DROPPABLE enabled & disabled case and then we can conclude . -AVM On 9/1/2016 10:59 AM, Hans Nordebäck wrote: > Hi Mahesh, > > I have not tested this, but the following should work: > > - Set BSRsock TIPC_IMPORTANCE to TIPC_LOW_IMPORTANCE > > - set socket receive buffer to a small value: > > optval = "small socket recieive buffer size" , 5000 ? > > setsockopt(tipc_cb.BSRsock, SOL_SOCKET, SO_RCVBUF, &optval, optlen) > > - sysctl -w net.tipc.tipc_rmem="5000 4000 68240400" (or smaller > values) > > - add some delays when processing messages in > mdtm_process_recv_events(), to provoke overloading the socket receive > buffer. > > We experience dropped packages in a 75 node system, and as a > workaround increasing the default so receive buffer size it seems > working for that setup. > > /Thanks HansN > > On 09/01/2016 05:50 AM, A V Mahesh wrote: >> Hi HansN, >> >> Do you have any tips to created overload case, >> >> I would like test and observe TIPC_DEST_DROPPABLE enabled & disabled >> cases. >> >> -AVM >> >> >> On 9/1/2016 9:12 AM, A V Mahesh wrote: >>> Hi HansN, >>> >>> Sorry for the delay. >>> >>> I will test it and get back to you soon. >>> >>> -AVM >>> >>> >>> On 8/31/2016 4:29 PM, Hans Nordebäck wrote: Hi Mahesh, Any updates on this? /Regards HansN -Original Message- From: Anders Widell Sent: den 25 augusti 2016 13:11 To: A V Mahesh ; Hans Nordebäck ; mathi.naic...@oracle.com Cc: opensaf-devel@lists.sourceforge.net Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957] Hi! This is what the TIPC user documentation says about TIPC_DEST_DROPPABLE: "This option governs the handling of messages sent by the socket if the message cannot be delivered to its destination, either because the receiver is congested or because the specified receiver does not exist. If enabled, the message is discarded; otherwise the message is returned to the sender." 
This is what the TIPC user documentation says about the return value from the recvmsg() system call: "When used with a connectionless socket, a return value of 0 indicates the arrival of a returned data message that was originally sent by this socket." I think the documentation is pretty clear. If you set TIPC_DEST_DROPPABLE to true, the receiver can discard messages e.g. when the receive buffer is full. The sender will not be notified in this case. If TIPC_DEST_DROPPABLE is set to false, the message will be returned to the sender in case of a full receive buffer. The sender knows that it has received such a returned message when the recvmsg() call returns zero. regards, Anders Widell On 08/25/2016 11:30 AM, A V Mahesh wrote: > Hi HansN, > > > On 8/23/2016 5:22 PM, Hans Nordebäck wrote: > >> Hi Mahesh, >> >> Yes, this is my understanding too, if TIPC_DROPPABLE = true tipc may >> drop messages silently, at receive sock buffer full condition, but >> do not return any ancillary message. >> If TIPC_DROPPABLE = false tipc may drop message but will send an >> ancillary message to inform about TIPC_ERR_OVERLOAD. > [AVM] > > My observation are understanding is different, based on TIPC code and > Linux TIPC 2.0 Programmer's Guide , that the TIPC_ERR_OVERLOAD error > returned when TIPC is unable to enqueue an incoming message on the > receiving socket's receive queue irrelevant of TIPC_DEST_DROPPABLE > enabled or disabled. > > The only difference between TIPC_DEST_DROPPABLE enabled or > disabled is > , If TIPC_DEST_DROPPABLE enabled, the message is discarded and > recvmsg() returned size is ZERO and application will get errors, if > TIPC_DEST_DROPPABLE disabled the message is returned to the > sender it > means the recvmsg() returned size is user send data size and > application will get errors . > > I did check the TIPC code and documentations and I haven't get any > evidences that TIPC_ERR_OVERLOAD error code will be send only If > TIPC_DEST_DROPPABLE = false. 
> > Even while testing #1227 > (https://sourceforge.net/p/opensaf/mailman/message/33207717/) my > observations and understanding was, an individual TIPC socket is only > allowed to queue up > OVERLOAD_LIMIT_BASE/2 messages of the lowest importance level before > it starts rejecting them. > Once a socket receiving queue length exceeds the maximum limit value, > the receiving socket will send out a reject message with > TIPC_ERR_OVERLOAD error code with cmsg_type as > TIPC_ERRINFO/TIPC_RETDATA, and the tipc code and Linux TIPC 2.0 > Programmer's Guide confirmed the same . > > tipc/socket.
Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
Hi Mahesh, I have not tested this, but the following should work: - Set BSRsock TIPC_IMPORTANCE to TIPC_LOW_IMPORTANCE - Set the socket receive buffer to a small value: optval = "small socket receive buffer size", 5000? setsockopt(tipc_cb.BSRsock, SOL_SOCKET, SO_RCVBUF, &optval, optlen) - sysctl -w net.tipc.tipc_rmem="5000 4000 68240400" (or smaller values) - Add some delays when processing messages in mdtm_process_recv_events(), to provoke overloading of the socket receive buffer. We experience dropped packets in a 75-node system, and as a workaround, increasing the default socket receive buffer size seems to work for that setup. /Thanks HansN On 09/01/2016 05:50 AM, A V Mahesh wrote: > Hi HansN, > > Do you have any tips to created overload case, > > I would like test and observe TIPC_DEST_DROPPABLE enabled & disabled > cases. > > -AVM > > > On 9/1/2016 9:12 AM, A V Mahesh wrote: >> Hi HansN, >> >> Sorry for the delay. >> >> I will test it and get back to you soon. >> >> -AVM >> >> >> On 8/31/2016 4:29 PM, Hans Nordebäck wrote: >>> Hi Mahesh, >>> Any updates on this? >>> >>> /Regards HansN >>> >>> -Original Message- >>> From: Anders Widell >>> Sent: den 25 augusti 2016 13:11 >>> To: A V Mahesh ; Hans Nordebäck >>> ; mathi.naic...@oracle.com >>> Cc: opensaf-devel@lists.sourceforge.net >>> Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957] >>> >>> Hi! >>> >>> This is what the TIPC user documentation says about >>> TIPC_DEST_DROPPABLE: >>> "This option governs the handling of messages sent by the socket if >>> the message cannot be delivered to its destination, either because >>> the receiver is congested or because the specified receiver does not >>> exist. >>> If enabled, the message is discarded; otherwise the message is >>> returned to the sender." 
>>> >>> This is what the TIPC user documentation says about the return value >>> from the recvmsg() system call: "When used with a connectionless >>> socket, a return value of 0 indicates the arrival of a returned data >>> message that was originally sent by this socket." >>> >>> I think the documentation is pretty clear. If you set >>> TIPC_DEST_DROPPABLE to true, the receiver can discard messages e.g. >>> when the receive buffer is full. The sender will not be notified in >>> this case. If TIPC_DEST_DROPPABLE is set to false, the message will >>> be returned to the sender in case of a full receive buffer. The >>> sender knows that it has received such a returned message when the >>> recvmsg() call returns zero. >>> >>> regards, >>> Anders Widell >>> >>> On 08/25/2016 11:30 AM, A V Mahesh wrote: Hi HansN, On 8/23/2016 5:22 PM, Hans Nordebäck wrote: > Hi Mahesh, > > Yes, this is my understanding too, if TIPC_DROPPABLE = true tipc may > drop messages silently, at receive sock buffer full condition, but > do not return any ancillary message. > If TIPC_DROPPABLE = false tipc may drop message but will send an > ancillary message to inform about TIPC_ERR_OVERLOAD. [AVM] My observation are understanding is different, based on TIPC code and Linux TIPC 2.0 Programmer's Guide , that the TIPC_ERR_OVERLOAD error returned when TIPC is unable to enqueue an incoming message on the receiving socket's receive queue irrelevant of TIPC_DEST_DROPPABLE enabled or disabled. The only difference between TIPC_DEST_DROPPABLE enabled or disabled is , If TIPC_DEST_DROPPABLE enabled, the message is discarded and recvmsg() returned size is ZERO and application will get errors, if TIPC_DEST_DROPPABLE disabled the message is returned to the sender it means the recvmsg() returned size is user send data size and application will get errors . 
I did check the TIPC code and documentations and I haven't get any evidences that TIPC_ERR_OVERLOAD error code will be send only If TIPC_DEST_DROPPABLE = false. Even while testing #1227 (https://sourceforge.net/p/opensaf/mailman/message/33207717/) my observations and understanding was, an individual TIPC socket is only allowed to queue up OVERLOAD_LIMIT_BASE/2 messages of the lowest importance level before it starts rejecting them. Once a socket receiving queue length exceeds the maximum limit value, the receiving socket will send out a reject message with TIPC_ERR_OVERLOAD error code with cmsg_type as TIPC_ERRINFO/TIPC_RETDATA, and the tipc code and Linux TIPC 2.0 Programmer's Guide confirmed the same . tipc/socket.c === /* Reject message if there isn't room to queue it */ recv_q_len = (u32)atomic_read(&tipc_queue_size); if (unlikely(recv_q_len >= OVERLOAD_LIMIT_BASE)) { if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE)) return TIPC_ERR_OVERLOAD; } recv_q_len = sk
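The socket settings suggested at the top of this message could be sketched roughly as follows. This is only an illustration under stated assumptions: `configure_overload_test_socket` is a hypothetical helper (it is not MDS code), the `dest_droppable` parameter is added here so that both cases under discussion can be tried, and 5000 is just the example size mentioned above. Note that TIPC_DEST_DROPPABLE governs messages sent by the socket it is set on, and that Linux may adjust the SO_RCVBUF value it is given.

```c
#include <linux/tipc.h>   /* SOL_TIPC, TIPC_IMPORTANCE, TIPC_DEST_DROPPABLE */
#include <sys/socket.h>

/* Hypothetical helper, not MDS code: apply the suggested test settings to an
 * already-open TIPC socket. Returns 0 on success and -1 on the first failing
 * setsockopt() call (errno is left as set by that call). */
static int configure_overload_test_socket(int sock, int rcvbuf_bytes,
                                          int dest_droppable)
{
    int importance = TIPC_LOW_IMPORTANCE;

    /* Lowest importance, as suggested in the first bullet. */
    if (setsockopt(sock, SOL_TIPC, TIPC_IMPORTANCE,
                   &importance, sizeof(importance)) != 0)
        return -1;

    /* Small receive buffer, so that it fills up quickly. */
    if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF,
                   &rcvbuf_bytes, sizeof(rcvbuf_bytes)) != 0)
        return -1;

    /* 0: undeliverable messages are returned to the sender;
     * non-zero: they are discarded. */
    if (setsockopt(sock, SOL_TIPC, TIPC_DEST_DROPPABLE,
                   &dest_droppable, sizeof(dest_droppable)) != 0)
        return -1;

    return 0;
}
```

A test run might call `configure_overload_test_socket(sock, 5000, 0)` once and `configure_overload_test_socket(sock, 5000, 1)` once, and compare the logs of the two runs.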
Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
Hi HansN, Do you have any tips for creating an overload case? I would like to test and observe the TIPC_DEST_DROPPABLE enabled and disabled cases. -AVM On 9/1/2016 9:12 AM, A V Mahesh wrote: > Hi HansN, > > Sorry for the delay. > > I will test it and get back to you soon. > > -AVM > > > On 8/31/2016 4:29 PM, Hans Nordebäck wrote: >> Hi Mahesh, >> Any updates on this? >> >> /Regards HansN >> >> -Original Message- >> From: Anders Widell >> Sent: den 25 augusti 2016 13:11 >> To: A V Mahesh ; Hans Nordebäck >> ; mathi.naic...@oracle.com >> Cc: opensaf-devel@lists.sourceforge.net >> Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957] >> >> Hi! >> >> This is what the TIPC user documentation says about TIPC_DEST_DROPPABLE: >> "This option governs the handling of messages sent by the socket if >> the message cannot be delivered to its destination, either because >> the receiver is congested or because the specified receiver does not >> exist. >> If enabled, the message is discarded; otherwise the message is >> returned to the sender." >> >> This is what the TIPC user documentation says about the return value >> from the recvmsg() system call: "When used with a connectionless >> socket, a return value of 0 indicates the arrival of a returned data >> message that was originally sent by this socket." >> >> I think the documentation is pretty clear. If you set >> TIPC_DEST_DROPPABLE to true, the receiver can discard messages e.g. >> when the receive buffer is full. The sender will not be notified in >> this case. If TIPC_DEST_DROPPABLE is set to false, the message will >> be returned to the sender in case of a full receive buffer. The >> sender knows that it has received such a returned message when the >> recvmsg() call returns zero. 
>> >> regards, >> Anders Widell >> >> On 08/25/2016 11:30 AM, A V Mahesh wrote: >>> Hi HansN, >>> >>> >>> On 8/23/2016 5:22 PM, Hans Nordebäck wrote: >>> Hi Mahesh, Yes, this is my understanding too, if TIPC_DROPPABLE = true tipc may drop messages silently, at receive sock buffer full condition, but do not return any ancillary message. If TIPC_DROPPABLE = false tipc may drop message but will send an ancillary message to inform about TIPC_ERR_OVERLOAD. >>> [AVM] >>> >>> My observation are understanding is different, based on TIPC code and >>> Linux TIPC 2.0 Programmer's Guide , that the TIPC_ERR_OVERLOAD error >>> returned when TIPC is unable to enqueue an incoming message on the >>> receiving socket's receive queue irrelevant of TIPC_DEST_DROPPABLE >>> enabled or disabled. >>> >>> The only difference between TIPC_DEST_DROPPABLE enabled or disabled is >>> , If TIPC_DEST_DROPPABLE enabled, the message is discarded and >>> recvmsg() returned size is ZERO and application will get errors, if >>> TIPC_DEST_DROPPABLE disabled the message is returned to the sender it >>> means the recvmsg() returned size is user send data size and >>> application will get errors . >>> >>> I did check the TIPC code and documentations and I haven't get any >>> evidences that TIPC_ERR_OVERLOAD error code will be send only If >>> TIPC_DEST_DROPPABLE = false. >>> >>> Even while testing #1227 >>> (https://sourceforge.net/p/opensaf/mailman/message/33207717/) my >>> observations and understanding was, an individual TIPC socket is only >>> allowed to queue up >>> OVERLOAD_LIMIT_BASE/2 messages of the lowest importance level before >>> it starts rejecting them. >>> Once a socket receiving queue length exceeds the maximum limit value, >>> the receiving socket will send out a reject message with >>> TIPC_ERR_OVERLOAD error code with cmsg_type as >>> TIPC_ERRINFO/TIPC_RETDATA, and the tipc code and Linux TIPC 2.0 >>> Programmer's Guide confirmed the same . 
>>> >>> tipc/socket.c >>> === >>> /* Reject message if there isn't room to queue it */ >>> >>> recv_q_len = (u32)atomic_read(&tipc_queue_size); >>> if (unlikely(recv_q_len >= OVERLOAD_LIMIT_BASE)) { >>> if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE)) >>> return TIPC_ERR_OVERLOAD; >>> } >>> recv_q_len = skb_queue_len(&sk->sk_receive_queue); >>> if (unlikely(recv_q_len >= (OVERLOAD_LIMIT_BASE / 2))) { >>> if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE / 2)) >>> return TIPC_ERR_OVERLOAD; >>> } >>> === >>> >>> >>> 2.1.17. setsockopt() of TIPC 2.0 Programmer's Guide >>> === >>> TIPC_DEST_DROPPABLE >>> This option governs the handling of messages sent by the socket if the >>> message cannot be delivered to its destination, either because the >>> receiver is congested or because the specified receiver does not >>> exist. If enabled, the message is discarded; otherwise the message is >>> returned to the sender. >>> >>> By default, this option is disabled for SOCK_SEQPACKET and SOCK_STREAM >>> socket types, and en
Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
Hi HansN, Sorry for the delay. I will test it and get back to you soon. -AVM On 8/31/2016 4:29 PM, Hans Nordebäck wrote: > Hi Mahesh, > Any updates on this? > > /Regards HansN > > -Original Message- > From: Anders Widell > Sent: den 25 augusti 2016 13:11 > To: A V Mahesh ; Hans Nordebäck > ; mathi.naic...@oracle.com > Cc: opensaf-devel@lists.sourceforge.net > Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957] > > Hi! > > This is what the TIPC user documentation says about TIPC_DEST_DROPPABLE: > "This option governs the handling of messages sent by the socket if the > message cannot be delivered to its destination, either because the receiver > is congested or because the specified receiver does not exist. > If enabled, the message is discarded; otherwise the message is returned to > the sender." > > This is what the TIPC user documentation says about the return value from the > recvmsg() system call: "When used with a connectionless socket, a return > value of 0 indicates the arrival of a returned data message that was > originally sent by this socket." > > I think the documentation is pretty clear. If you set TIPC_DEST_DROPPABLE to > true, the receiver can discard messages e.g. when the receive buffer is full. > The sender will not be notified in this case. If TIPC_DEST_DROPPABLE is set > to false, the message will be returned to the sender in case of a full > receive buffer. The sender knows that it has received such a returned message > when the recvmsg() call returns zero. > > regards, > Anders Widell > > On 08/25/2016 11:30 AM, A V Mahesh wrote: >> Hi HansN, >> >> >> On 8/23/2016 5:22 PM, Hans Nordebäck wrote: >> >>> Hi Mahesh, >>> >>> Yes, this is my understanding too, if TIPC_DROPPABLE = true tipc may >>> drop messages silently, at receive sock buffer full condition, but >>> do not return any ancillary message. >>> If TIPC_DROPPABLE = false tipc may drop message but will send an >>> ancillary message to inform about TIPC_ERR_OVERLOAD. 
>> [AVM] >> >> My observation are understanding is different, based on TIPC code and >> Linux TIPC 2.0 Programmer's Guide , that the TIPC_ERR_OVERLOAD error >> returned when TIPC is unable to enqueue an incoming message on the >> receiving socket's receive queue irrelevant of TIPC_DEST_DROPPABLE >> enabled or disabled. >> >> The only difference between TIPC_DEST_DROPPABLE enabled or disabled is >> , If TIPC_DEST_DROPPABLE enabled, the message is discarded and >> recvmsg() returned size is ZERO and application will get errors, if >> TIPC_DEST_DROPPABLE disabled the message is returned to the sender it >> means the recvmsg() returned size is user send data size and >> application will get errors . >> >> I did check the TIPC code and documentations and I haven't get any >> evidences that TIPC_ERR_OVERLOAD error code will be send only If >> TIPC_DEST_DROPPABLE = false. >> >> Even while testing #1227 >> (https://sourceforge.net/p/opensaf/mailman/message/33207717/) my >> observations and understanding was, an individual TIPC socket is only >> allowed to queue up >> OVERLOAD_LIMIT_BASE/2 messages of the lowest importance level before >> it starts rejecting them. >> Once a socket receiving queue length exceeds the maximum limit value, >> the receiving socket will send out a reject message with >> TIPC_ERR_OVERLOAD error code with cmsg_type as >> TIPC_ERRINFO/TIPC_RETDATA, and the tipc code and Linux TIPC 2.0 >> Programmer's Guide confirmed the same . >> >> tipc/socket.c >> === >> /* Reject message if there isn't room to queue it */ >> >> recv_q_len = (u32)atomic_read(&tipc_queue_size); >> if (unlikely(recv_q_len >= OVERLOAD_LIMIT_BASE)) { >> if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE)) >> return TIPC_ERR_OVERLOAD; >> } >> recv_q_len = skb_queue_len(&sk->sk_receive_queue); >> if (unlikely(recv_q_len >= (OVERLOAD_LIMIT_BASE / 2))) { >> if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE / 2)) >> return TIPC_ERR_OVERLOAD; >> } >> === >> >> >> 2.1.17. 
setsockopt() of TIPC 2.0 Programmer's Guide >> === >> TIPC_DEST_DROPPABLE >> This option governs the handling of messages sent by the socket if the >> message cannot be delivered to its destination, either because the >> receiver is congested or because the specified receiver does not >> exist. If enabled, the message is discarded; otherwise the message is >> returned to the sender. >> >> By default, this option is disabled for SOCK_SEQPACKET and SOCK_STREAM >> socket types, and enabled for SOCK_RDM and SOCK_DGRAM, This >> arrangement ensures proper teardown of failed connections when >> connection-oriented data transfer is used, without increasing the >> complexity of connectionless data transfer. >> >> TIPC_SRC_DROPPABLE >> This option governs the handling of messages sent by the socket
Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
Hi Mahesh, Any updates on this? /Regards HansN -Original Message- From: Anders Widell Sent: den 25 augusti 2016 13:11 To: A V Mahesh ; Hans Nordebäck ; mathi.naic...@oracle.com Cc: opensaf-devel@lists.sourceforge.net Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957] Hi! This is what the TIPC user documentation says about TIPC_DEST_DROPPABLE: "This option governs the handling of messages sent by the socket if the message cannot be delivered to its destination, either because the receiver is congested or because the specified receiver does not exist. If enabled, the message is discarded; otherwise the message is returned to the sender." This is what the TIPC user documentation says about the return value from the recvmsg() system call: "When used with a connectionless socket, a return value of 0 indicates the arrival of a returned data message that was originally sent by this socket." I think the documentation is pretty clear. If you set TIPC_DEST_DROPPABLE to true, the receiver can discard messages e.g. when the receive buffer is full. The sender will not be notified in this case. If TIPC_DEST_DROPPABLE is set to false, the message will be returned to the sender in case of a full receive buffer. The sender knows that it has received such a returned message when the recvmsg() call returns zero. regards, Anders Widell On 08/25/2016 11:30 AM, A V Mahesh wrote: > Hi HansN, > > > On 8/23/2016 5:22 PM, Hans Nordebäck wrote: > >> Hi Mahesh, >> >> Yes, this is my understanding too, if TIPC_DROPPABLE = true tipc may >> drop messages silently, at receive sock buffer full condition, but >> do not return any ancillary message. >> If TIPC_DROPPABLE = false tipc may drop message but will send an >> ancillary message to inform about TIPC_ERR_OVERLOAD. 
> [AVM] > > My observation are understanding is different, based on TIPC code and > Linux TIPC 2.0 Programmer's Guide , that the TIPC_ERR_OVERLOAD error > returned when TIPC is unable to enqueue an incoming message on the > receiving socket's receive queue irrelevant of TIPC_DEST_DROPPABLE > enabled or disabled. > > The only difference between TIPC_DEST_DROPPABLE enabled or disabled is > , If TIPC_DEST_DROPPABLE enabled, the message is discarded and > recvmsg() returned size is ZERO and application will get errors, if > TIPC_DEST_DROPPABLE disabled the message is returned to the sender it > means the recvmsg() returned size is user send data size and > application will get errors . > > I did check the TIPC code and documentations and I haven't get any > evidences that TIPC_ERR_OVERLOAD error code will be send only If > TIPC_DEST_DROPPABLE = false. > > Even while testing #1227 > (https://sourceforge.net/p/opensaf/mailman/message/33207717/) my > observations and understanding was, an individual TIPC socket is only > allowed to queue up > OVERLOAD_LIMIT_BASE/2 messages of the lowest importance level before > it starts rejecting them. > Once a socket receiving queue length exceeds the maximum limit value, > the receiving socket will send out a reject message with > TIPC_ERR_OVERLOAD error code with cmsg_type as > TIPC_ERRINFO/TIPC_RETDATA, and the tipc code and Linux TIPC 2.0 > Programmer's Guide confirmed the same . > > tipc/socket.c > === > /* Reject message if there isn't room to queue it */ > > recv_q_len = (u32)atomic_read(&tipc_queue_size); > if (unlikely(recv_q_len >= OVERLOAD_LIMIT_BASE)) { > if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE)) > return TIPC_ERR_OVERLOAD; > } > recv_q_len = skb_queue_len(&sk->sk_receive_queue); > if (unlikely(recv_q_len >= (OVERLOAD_LIMIT_BASE / 2))) { > if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE / 2)) > return TIPC_ERR_OVERLOAD; > } > === > > > 2.1.17. 
setsockopt() of TIPC 2.0 Programmer's Guide > === > TIPC_DEST_DROPPABLE > This option governs the handling of messages sent by the socket if the > message cannot be delivered to its destination, either because the > receiver is congested or because the specified receiver does not > exist. If enabled, the message is discarded; otherwise the message is > returned to the sender. > > By default, this option is disabled for SOCK_SEQPACKET and SOCK_STREAM > socket types, and enabled for SOCK_RDM and SOCK_DGRAM, This > arrangement ensures proper teardown of failed connections when > connection-oriented data transfer is used, without increasing the > complexity of connectionless data transfer. > > TIPC_SRC_DROPPABLE > This option governs the handling of messages sent by the socket if > link congestion occurs. If enabled, the message is discarded; > otherwise the system queues the message for later transmission. > By default, this option is disabled for SOCK_SEQPACKET, SOCK_STREAM, > and SOCK_RDM socket types (result
Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
Hi!

This is what the TIPC user documentation says about TIPC_DEST_DROPPABLE:
"This option governs the handling of messages sent by the socket if the message cannot be delivered to its destination, either because the receiver is congested or because the specified receiver does not exist. If enabled, the message is discarded; otherwise the message is returned to the sender."

This is what the TIPC user documentation says about the return value from the recvmsg() system call:
"When used with a connectionless socket, a return value of 0 indicates the arrival of a returned data message that was originally sent by this socket."

I think the documentation is pretty clear. If you set TIPC_DEST_DROPPABLE to true, the receiver can discard messages e.g. when the receive buffer is full. The sender will not be notified in this case. If TIPC_DEST_DROPPABLE is set to false, the message will be returned to the sender in case of a full receive buffer. The sender knows that it has received such a returned message when the recvmsg() call returns zero.

regards,
Anders Widell

On 08/25/2016 11:30 AM, A V Mahesh wrote:
> Hi HansN,
>
> On 8/23/2016 5:22 PM, Hans Nordebäck wrote:
>> Hi Mahesh,
>>
>> Yes, this is my understanding too, if TIPC_DROPPABLE = true tipc may
>> drop messages silently, at receive sock buffer full condition, but
>> do not return any ancillary message.
>> If TIPC_DROPPABLE = false tipc may drop message but will send an
>> ancillary message to inform about TIPC_ERR_OVERLOAD.
> [AVM]
>
> My observation are understanding is different, based on TIPC code and
> Linux TIPC 2.0 Programmer's Guide ,
> that the TIPC_ERR_OVERLOAD error returned when TIPC is unable to
> enqueue an incoming message on the receiving socket's receive queue
> irrelevant of TIPC_DEST_DROPPABLE enabled or disabled.
> > The only difference between TIPC_DEST_DROPPABLE enabled or disabled is > , If TIPC_DEST_DROPPABLE enabled, the message is discarded and > recvmsg() returned size is ZERO and application will get errors, > if TIPC_DEST_DROPPABLE disabled the message is returned to the sender > it means the recvmsg() returned size is user send data size and > application will get errors . > > I did check the TIPC code and documentations and I haven't get any > evidences that TIPC_ERR_OVERLOAD error code will be send only > If TIPC_DEST_DROPPABLE = false. > > Even while testing #1227 > (https://sourceforge.net/p/opensaf/mailman/message/33207717/) my > observations and understanding was, > an individual TIPC socket is only allowed to queue up > OVERLOAD_LIMIT_BASE/2 messages of the lowest importance level before > it starts rejecting them. > Once a socket receiving queue length exceeds the maximum limit value, > the receiving socket will send out a reject message with > TIPC_ERR_OVERLOAD error code > with cmsg_type as TIPC_ERRINFO/TIPC_RETDATA, and the tipc code and > Linux TIPC 2.0 Programmer's Guide confirmed the same . > > tipc/socket.c > === > /* Reject message if there isn't room to queue it */ > > recv_q_len = (u32)atomic_read(&tipc_queue_size); > if (unlikely(recv_q_len >= OVERLOAD_LIMIT_BASE)) { > if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE)) > return TIPC_ERR_OVERLOAD; > } > recv_q_len = skb_queue_len(&sk->sk_receive_queue); > if (unlikely(recv_q_len >= (OVERLOAD_LIMIT_BASE / 2))) { > if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE / 2)) > return TIPC_ERR_OVERLOAD; > } > === > > > 2.1.17. setsockopt() of TIPC 2.0 Programmer's Guide > === > TIPC_DEST_DROPPABLE > This option governs the handling of messages sent by the socket if the > message cannot be delivered to its destination, > either because the receiver is congested or because the specified > receiver does not exist. 
If enabled, the message is discarded; > otherwise the message is returned to the sender. > > By default, this option is disabled for SOCK_SEQPACKET and SOCK_STREAM > socket types, and enabled for SOCK_RDM and SOCK_DGRAM, > This arrangement ensures proper teardown of failed connections when > connection-oriented data transfer is used, without increasing the > complexity of connectionless data transfer. > > TIPC_SRC_DROPPABLE > This option governs the handling of messages sent by the socket if > link congestion occurs. If enabled, the message is discarded; > otherwise the system queues the message for later transmission. > By default, this option is disabled for SOCK_SEQPACKET, SOCK_STREAM, > and SOCK_RDM socket types (resulting in "reliable" data transfer), and > enabled for SOCK_DGRAM (resulting in "unreliable" data transfer). > === > > Now I will try to create OVERLOAD case and update you soon my latest > observations. > > -AVM > >> Correcting this and adding an ab
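The option discussed above is set per socket with setsockopt(). The following is a minimal sketch, assuming a Linux system with the TIPC user-space header installed; set_dest_droppable() is a hypothetical helper name and is not part of MDS. It only shows how the option is toggled, not how MDS should react to returned messages.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <linux/tipc.h>   /* SOL_TIPC, TIPC_DEST_DROPPABLE */

/*
 * Hypothetical helper (not part of MDS): switch TIPC_DEST_DROPPABLE on or
 * off for socket sd.
 *   enabled == 1: an undeliverable message is discarded.
 *   enabled == 0: an undeliverable message is returned to the sender.
 * Returns 0 on success, or the negated errno value from setsockopt().
 */
int set_dest_droppable(int sd, int enabled)
{
    /* Precondition: the option is a boolean. */
    assert(enabled == 0 || enabled == 1);

    int value = enabled;
    if (setsockopt(sd, SOL_TIPC, TIPC_DEST_DROPPABLE,
                   &value, sizeof(value)) != 0) {
        return -errno;
    }
    return 0;
}
```

A caller would pass the descriptor of an already created TIPC socket, for example set_dest_droppable(sd, 0) to ask for rejected messages to be returned. On a descriptor that is not a valid socket the helper simply reports the error from setsockopt().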
Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
Hi HansN,

On 8/23/2016 5:22 PM, Hans Nordebäck wrote:
> Hi Mahesh,
>
> Yes, this is my understanding too, if TIPC_DROPPABLE = true tipc may drop
> messages silently, at receive sock buffer full condition, but do not return
> any ancillary message.
> If TIPC_DROPPABLE = false tipc may drop message but will send an ancillary
> message to inform about TIPC_ERR_OVERLOAD.

[AVM]

My observation and understanding are different. Based on the TIPC code and the Linux TIPC 2.0 Programmer's Guide, the TIPC_ERR_OVERLOAD error is returned when TIPC is unable to enqueue an incoming message on the receiving socket's receive queue, irrespective of whether TIPC_DEST_DROPPABLE is enabled or disabled.

The only difference between TIPC_DEST_DROPPABLE enabled and disabled is this: if TIPC_DEST_DROPPABLE is enabled, the message is discarded, the size returned by recvmsg() is zero, and the application gets errors; if TIPC_DEST_DROPPABLE is disabled, the message is returned to the sender, which means the size returned by recvmsg() is the size of the data the user sent, and the application gets errors.

I checked the TIPC code and documentation and have not found any evidence that the TIPC_ERR_OVERLOAD error code is sent only if TIPC_DEST_DROPPABLE = false.

Even while testing #1227 (https://sourceforge.net/p/opensaf/mailman/message/33207717/), my observation and understanding was that an individual TIPC socket is only allowed to queue up OVERLOAD_LIMIT_BASE/2 messages of the lowest importance level before it starts rejecting them. Once a socket's receive queue length exceeds the maximum limit, the receiving socket sends out a reject message with the TIPC_ERR_OVERLOAD error code, with cmsg_type TIPC_ERRINFO/TIPC_RETDATA. The TIPC code and the Linux TIPC 2.0 Programmer's Guide confirm the same.

tipc/socket.c
===
/* Reject message if there isn't room to queue it */

recv_q_len = (u32)atomic_read(&tipc_queue_size);
if (unlikely(recv_q_len >= OVERLOAD_LIMIT_BASE)) {
        if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE))
                return TIPC_ERR_OVERLOAD;
}
recv_q_len = skb_queue_len(&sk->sk_receive_queue);
if (unlikely(recv_q_len >= (OVERLOAD_LIMIT_BASE / 2))) {
        if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE / 2))
                return TIPC_ERR_OVERLOAD;
}
===

2.1.17. setsockopt() of TIPC 2.0 Programmer's Guide
===
TIPC_DEST_DROPPABLE
This option governs the handling of messages sent by the socket if the message cannot be delivered to its destination, either because the receiver is congested or because the specified receiver does not exist. If enabled, the message is discarded; otherwise the message is returned to the sender.

By default, this option is disabled for SOCK_SEQPACKET and SOCK_STREAM socket types, and enabled for SOCK_RDM and SOCK_DGRAM. This arrangement ensures proper teardown of failed connections when connection-oriented data transfer is used, without increasing the complexity of connectionless data transfer.

TIPC_SRC_DROPPABLE
This option governs the handling of messages sent by the socket if link congestion occurs. If enabled, the message is discarded; otherwise the system queues the message for later transmission.
By default, this option is disabled for SOCK_SEQPACKET, SOCK_STREAM, and SOCK_RDM socket types (resulting in "reliable" data transfer), and enabled for SOCK_DGRAM (resulting in "unreliable" data transfer).
===

Now I will try to reproduce the OVERLOAD case and update you soon with my latest observations.

-AVM

> Correcting this and adding an abort is not backward compatible as some
> service already handle flow control in some way, only log when packages are
> dropped.
> Regarding ticket #1960 there are other solutions than introducing flow
> control in MDS, e.g. expose an option to the service to choose connection
> oriented
> or connection less.
> The problem with dropped messages seems in one case related to, (by MDS), > intensive MDS logging. > > /Thanks HansN > -Original Message- > From: A V Mahesh [mailto:mahesh.va...@oracle.com] > Sent: den 23 augusti 2016 11:27 > To: Hans Nordebäck ; Anders Widell > ; mathi.naic...@oracle.com > Cc: opensaf-devel@lists.sourceforge.net > Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957] > > Hi HansN, > > It seems I am missing some thing , please allow me to under stand > > If I currently understand you observation : > > With current Opensaf code ( this #1957 patch NOT applied ) , by default > TIPC_DROPPABLE=true ,while running Opensaf with that binary when > TIPC_ERR_OVERLOAD occurring, TIPC is not given errors TIPC_ERRINFO or > TIPC_RETDATA and following code is not being get hit of function > recvfrom_connectionless(), is my understanding right ? > > ==
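The two-stage queue check quoted from tipc/socket.c in this thread can be modelled in user space. The sketch below is only an illustration: rx_queue_full() is not shown in the thread, so the per-importance multipliers (1, 2, 100, and never rejecting the highest level) are assumptions, and the names would_reject(), queue_over_threshold() and enum importance are hypothetical. The limit is passed in as a parameter because its value is not given here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Importance levels, defined locally for this model only. */
enum importance { IMP_LOW, IMP_MEDIUM, IMP_HIGH, IMP_CRITICAL };

/*
 * Assumed stand-in for rx_queue_full(): the threshold scales with the
 * message importance. The factors are illustrative assumptions.
 */
static bool queue_over_threshold(enum importance imp, uint32_t queue_len,
                                 uint32_t base)
{
    uint32_t factor;

    switch (imp) {
    case IMP_LOW:    factor = 1;   break;
    case IMP_MEDIUM: factor = 2;   break;
    case IMP_HIGH:   factor = 100; break;
    default:         return false; /* highest level: never rejected here */
    }
    /* 64-bit multiply so base * factor cannot overflow. */
    return queue_len >= (uint64_t)base * factor;
}

/*
 * Model of the quoted check: first the global queue against the full
 * limit, then the socket's own receive queue against half the limit.
 * Returns true where the kernel code would return TIPC_ERR_OVERLOAD.
 */
bool would_reject(enum importance imp, uint32_t global_len,
                  uint32_t socket_len, uint32_t base)
{
    assert(base >= 2);

    if (global_len >= base &&
        queue_over_threshold(imp, global_len, base))
        return true;
    if (socket_len >= base / 2 &&
        queue_over_threshold(imp, socket_len, base / 2))
        return true;
    return false;
}
```

With a limit of 10000 in this model, a lowest-importance message is rejected once the socket's own queue already holds 5000 messages, which matches the "OVERLOAD_LIMIT_BASE/2 messages of the lowest importance level" statement in the thread. Nothing in the model depends on TIPC_DEST_DROPPABLE; that option only decides what happens to the message after the rejection.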
Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
Hi Mahesh,

Yes, this is my understanding too: if TIPC_DROPPABLE = true, TIPC may drop messages silently when the receive socket buffer is full, and it does not return any ancillary message. If TIPC_DROPPABLE = false, TIPC may drop the message but will send an ancillary message to inform about TIPC_ERR_OVERLOAD.

Correcting this and adding an abort is not backward compatible, as some services already handle flow control in some way; we should only log when messages are dropped. Regarding ticket #1960, there are other solutions than introducing flow control in MDS, e.g. exposing an option that lets the service choose connection-oriented or connectionless transport. In one case, the problem with dropped messages seems to be related to intensive MDS logging (by MDS itself).

/Thanks HansN

-----Original Message-----
From: A V Mahesh [mailto:mahesh.va...@oracle.com]
Sent: den 23 augusti 2016 11:27
To: Hans Nordebäck ; Anders Widell ; mathi.naic...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]

Hi HansN,

It seems I am missing some thing , please allow me to under stand

If I currently understand you observation :

With current Opensaf code ( this #1957 patch NOT applied ) , by default TIPC_DROPPABLE=true ,while running Opensaf with that binary when TIPC_ERR_OVERLOAD occurring, TIPC is not given errors TIPC_ERRINFO or TIPC_RETDATA and following code is not being get hit of function recvfrom_connectionless(), is my understanding right ?
= *if (anc->cmsg_type == TIPC_ERRINFO) {* /* TIPC_ERRINFO - TIPC error code associated with a returned data message or a connection termination message so abort */ m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err :%s", strerror(errno) ); *abort();* *} else if (anc->cmsg_type == TIPC_RETDATA) {* /* If we set TIPC_DEST_DROPPABLE off messge (configure TIPC to return rejected messages to the sender ) we will hit this when we implement MDS retransmit lost messages abort can be replaced with flow control logic*/ for (i = anc->cmsg_len - sizeof(*anc); i > 0; i--) { m_MDS_LOG_DBG("MDTM: returned byte 0x%02x\n", *cptr); cptr++; } /* TIPC_RETDATA -The contents of a returned data message so abort */ m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_RETDATA abort err :%s", strerror(errno) ); *abort();* } = -AVM On 8/23/2016 1:08 PM, Hans Nordebäck wrote: > Hi Mahesh, > > Please see response below with [HansN] /Thanks HansN > > -Original Message- > From: A V Mahesh [mailto:mahesh.va...@oracle.com] > Sent: den 23 augusti 2016 08:25 > To: Hans Nordebäck ; Anders Widell > ; mathi.naic...@oracle.com > Cc: opensaf-devel@lists.sourceforge.net > Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957] > > Hi HansN > > Please see response below with [AVM] > > -AVM > > On 8/23/2016 11:41 AM, Hans Nordebäck wrote: >> Hi Mahesh, >> >> please see comments below. >> >> /Thanks HansN >> >> >> On 08/23/2016 07:21 AM, A V Mahesh wrote: >>> Hi HansN, >>> >>> Let us fist discuss the error handling and abort, then we can come >>> back to interpretation of TIPC currently does permit OR does not >>> permit an application to send a multicast message with the >>> "destination droppable" setting disabled. 
>>> >>> Let us disable TIPC_DEST_DROPPABLE, so that TIPC will try to return >>> an undelivered multicast message to its sender and we can determine >>> issue is because of TIPC_ERR_OVERLOAD, this helps in debugging , so >>> that application may increased SO_SNDBUF/SO_RCVBUF to reduce the >>> problem. >>> >>> But still we need to abort(), the reason for that is current MDS >>> implementations doesn't have flow control logic ( no retry because >>> of error ) , so Application like AMF can go wrong and cluster will >>> go into unstable/recoverble state. >>> >> [HansN] In the current implementation messages are dropped silently >> and no abort is done. > [AVM] I can see abort(); in current code , you mean abort(); is not working > and application(amf) is not existing ? > [HansN] In case of TIPC_DROPPABLE=true and messages are dropped, > (TIPC_ERR_OVERLOAD) no abort is be performed, e.g amfd detects this in the > msg sanity chk and logs "invalid msg id ..." > == > == > if (anc->cmsg_type == TIPC_ERRINFO) { > /* TIPC_ERRINFO - TIPC error code associated with a returned data > message or a connection termination message so abort */ > m_MDS_LOG_CRITICAL("MDTM: undelivered message condition > ancillary > data: TIPC_ERRINFO abort err :%s", strerror(errno) ); > *abort();* > } else if (anc->cmsg_type == TIPC_RE
Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
Hi HansN, It seems I am missing some thing , please allow me to under stand If I currently understand you observation : With current Opensaf code ( this #1957 patch NOT applied ) , by default TIPC_DROPPABLE=true ,while running Opensaf with that binary when TIPC_ERR_OVERLOAD occurring, TIPC is not given errors TIPC_ERRINFO or TIPC_RETDATA and following code is not being get hit of function recvfrom_connectionless(), is my understanding right ? = *if (anc->cmsg_type == TIPC_ERRINFO) {* /* TIPC_ERRINFO - TIPC error code associated with a returned data message or a connection termination message so abort */ m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err :%s", strerror(errno) ); *abort();* *} else if (anc->cmsg_type == TIPC_RETDATA) {* /* If we set TIPC_DEST_DROPPABLE off messge (configure TIPC to return rejected messages to the sender ) we will hit this when we implement MDS retransmit lost messages abort can be replaced with flow control logic*/ for (i = anc->cmsg_len - sizeof(*anc); i > 0; i--) { m_MDS_LOG_DBG("MDTM: returned byte 0x%02x\n", *cptr); cptr++; } /* TIPC_RETDATA -The contents of a returned data message so abort */ m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_RETDATA abort err :%s", strerror(errno) ); *abort();* } = -AVM On 8/23/2016 1:08 PM, Hans Nordebäck wrote: > Hi Mahesh, > > Please see response below with [HansN] > /Thanks HansN > > -Original Message- > From: A V Mahesh [mailto:mahesh.va...@oracle.com] > Sent: den 23 augusti 2016 08:25 > To: Hans Nordebäck ; Anders Widell > ; mathi.naic...@oracle.com > Cc: opensaf-devel@lists.sourceforge.net > Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957] > > Hi HansN > > Please see response below with [AVM] > > -AVM > > On 8/23/2016 11:41 AM, Hans Nordebäck wrote: >> Hi Mahesh, >> >> please see comments below. 
>> >> /Thanks HansN >> >> >> On 08/23/2016 07:21 AM, A V Mahesh wrote: >>> Hi HansN, >>> >>> Let us fist discuss the error handling and abort, then we can come >>> back to interpretation of TIPC currently does permit OR does not >>> permit an application to send a multicast message with the >>> "destination droppable" setting disabled. >>> >>> Let us disable TIPC_DEST_DROPPABLE, so that TIPC will try to return >>> an undelivered multicast message to its sender and we can determine >>> issue is because of TIPC_ERR_OVERLOAD, this helps in debugging , so >>> that application may increased SO_SNDBUF/SO_RCVBUF to reduce the >>> problem. >>> >>> But still we need to abort(), the reason for that is current MDS >>> implementations doesn't have flow control logic ( no retry because of >>> error ) , so Application like AMF can go wrong and cluster will go >>> into unstable/recoverble state. >>> >> [HansN] In the current implementation messages are dropped silently >> and no abort is done. > [AVM] I can see abort(); in current code , you mean abort(); is not working > and application(amf) is not existing ? > [HansN] In case of TIPC_DROPPABLE=true and messages are dropped, > (TIPC_ERR_OVERLOAD) no abort is be performed, e.g > amfd detects this in the msg sanity chk and logs "invalid msg id ..." 
> > if (anc->cmsg_type == TIPC_ERRINFO) { > /* TIPC_ERRINFO - TIPC error code associated with a returned data > message or a connection termination message so abort */ > m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary > data: TIPC_ERRINFO abort err :%s", strerror(errno) ); > *abort();* > } else if (anc->cmsg_type == TIPC_RETDATA) { > /* If we set TIPC_DEST_DROPPABLE off messge (configure TIPC to return > rejected messages to the sender ) > we will hit this when we implement MDS retransmit lost messages > abort can be replaced with flow control logic*/ > for (i = anc->cmsg_len - sizeof(*anc); i > 0; i--) { > m_MDS_LOG_DBG("MDTM: returned byte 0x%02x\n", *cptr); > cptr++; > } > /* TIPC_RETDATA -The contents of a returned data message so abort */ > m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary > data: TIPC_RETDATA abort err :%s", strerror(errno) ); > *abort();* > } > >> This patch enables logging >> when packages are dropped to help in debugging. I don't agree that we >> should also introduce abort, but instead: >> 1) Implement a solution to handle dropped packages, ticket #1960 > [AVM] This is nothing but flow control implementation in MDS, this is future > enhancement > >> 2) Investigate why pa
Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
Hi Mahesh, Please see response below with [HansN] /Thanks HansN -Original Message- From: A V Mahesh [mailto:mahesh.va...@oracle.com] Sent: den 23 augusti 2016 08:25 To: Hans Nordebäck ; Anders Widell ; mathi.naic...@oracle.com Cc: opensaf-devel@lists.sourceforge.net Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957] Hi HansN Please see response below with [AVM] -AVM On 8/23/2016 11:41 AM, Hans Nordebäck wrote: > Hi Mahesh, > > please see comments below. > > /Thanks HansN > > > On 08/23/2016 07:21 AM, A V Mahesh wrote: >> Hi HansN, >> >> Let us fist discuss the error handling and abort, then we can come >> back to interpretation of TIPC currently does permit OR does not >> permit an application to send a multicast message with the >> "destination droppable" setting disabled. >> >> Let us disable TIPC_DEST_DROPPABLE, so that TIPC will try to return >> an undelivered multicast message to its sender and we can determine >> issue is because of TIPC_ERR_OVERLOAD, this helps in debugging , so >> that application may increased SO_SNDBUF/SO_RCVBUF to reduce the >> problem. >> >> But still we need to abort(), the reason for that is current MDS >> implementations doesn't have flow control logic ( no retry because of >> error ) , so Application like AMF can go wrong and cluster will go >> into unstable/recoverble state. >> > [HansN] In the current implementation messages are dropped silently > and no abort is done. [AVM] I can see abort(); in current code , you mean abort(); is not working and application(amf) is not existing ? [HansN] In case of TIPC_DROPPABLE=true and messages are dropped, (TIPC_ERR_OVERLOAD) no abort is be performed, e.g amfd detects this in the msg sanity chk and logs "invalid msg id ..." 
if (anc->cmsg_type == TIPC_ERRINFO) { /* TIPC_ERRINFO - TIPC error code associated with a returned data message or a connection termination message so abort */ m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err :%s", strerror(errno) ); *abort();* } else if (anc->cmsg_type == TIPC_RETDATA) { /* If we set TIPC_DEST_DROPPABLE off messge (configure TIPC to return rejected messages to the sender ) we will hit this when we implement MDS retransmit lost messages abort can be replaced with flow control logic*/ for (i = anc->cmsg_len - sizeof(*anc); i > 0; i--) { m_MDS_LOG_DBG("MDTM: returned byte 0x%02x\n", *cptr); cptr++; } /* TIPC_RETDATA -The contents of a returned data message so abort */ m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_RETDATA abort err :%s", strerror(errno) ); *abort();* } > This patch enables logging > when packages are dropped to help in debugging. I don't agree that we > should also introduce abort, but instead: > 1) Implement a solution to handle dropped packages, ticket #1960 [AVM] This is nothing but flow control implementation in MDS, this is future enhancement > 2) Investigate why packages may be dropped, the receiving MDS thread > is a real time thread and should be able to consume a large amount of > incoming messages. > E.g. is the receiving MDS thread "live hanging" due to locks, file I/O > etc? >> This was the reason we haven't gone for it while addressing Ticket >> #1227 (https://sourceforge.net/p/opensaf/mailman/message/33207717/) >> So currently we don't have any advantage of disabling >> TIPC_DEST_DROPPABLE and not allowing multicast messages. 
>> >> -AVM >> >> >> On 8/18/2016 2:43 PM, Hans Nordeback wrote: >>> osaf/libs/core/mds/mds_dt_tipc.c | 32 >>> +--- >>> 1 files changed, 25 insertions(+), 7 deletions(-) >>> >>> >>> diff --git a/osaf/libs/core/mds/mds_dt_tipc.c >>> b/osaf/libs/core/mds/mds_dt_tipc.c >>> --- a/osaf/libs/core/mds/mds_dt_tipc.c >>> +++ b/osaf/libs/core/mds/mds_dt_tipc.c >>> @@ -320,6 +320,15 @@ uint32_t mdtm_tipc_init(NODE_ID nodeid, >>> m_MDS_LOG_INFO("MDTM: Successfully set default >>> socket option TIPC_IMP = %d", TIPCIMPORTANCE); >>> } >>> +int droppable = 0; >>> +if (setsockopt(tipc_cb.BSRsock, SOL_TIPC, >>> TIPC_DEST_DROPPABLE, &droppable, sizeof(droppable)) != 0) { >>> +LOG_ER("MDTM: Can't set TIPC_DEST_DROPPABLE to zero >>> err :%s\n", strerror(errno)); >>> +m_MDS_LOG_ERR("MDTM: Can't set TIPC_DEST_DROPPABLE >>> to zero err :%s\n", strerror(errno)); >>> +osafassert(0); >>> +} else { >>> +m_MDS_LOG_NOTIFY("MDTM: Successfully set >>> TIPC_DEST_DROPPABLE to zero"); >>> +} >>> + >>> return NCSCC_RC_SUCCESS; >>> } >>> @@ -563,6 +572,8 @@ ssize_t recvfrom_connectionless (int sd, >>> unsigned char *cptr; >>> int i; >>> int has_
Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
Hi!

I don't think the sender would need to unconditionally abort() down in the MDS layer when it gets back an undelivered message from TIPC. We have the message loss callback in MDS, which can be used by the receiver to detect lost messages. The receiver can take an appropriate action when it receives this callback. If the appropriate action is to restart the sender, then the receiver can inform the sender about the message loss so that the sender can restart itself.

regards,
Anders Widell

On 08/23/2016 07:21 AM, A V Mahesh wrote:
> Hi HansN,
>
> Let us fist discuss the error handling and abort, then we can come
> back to interpretation of TIPC currently does permit OR does not
> permit an application to send a multicast message with the
> "destination droppable" setting disabled.
>
> Let us disable TIPC_DEST_DROPPABLE, so that TIPC will try to return
> an undelivered multicast message to its sender and we can determine
> issue is because of TIPC_ERR_OVERLOAD, this helps in debugging ,
> so that application may increased SO_SNDBUF/SO_RCVBUF to reduce the
> problem.
>
> But still we need to abort(), the reason for that is current MDS
> implementations doesn't have flow control logic ( no retry because of
> error ) , so Application like AMF can go wrong and cluster will go
> into unstable/recoverble state.
>
> This was the reason we haven't gone for it while addressing Ticket
> #1227 (https://sourceforge.net/p/opensaf/mailman/message/33207717/)
> So currently we don't have any advantage of disabling
> TIPC_DEST_DROPPABLE and not allowing multicast messages.
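The receiver-side detection described in this reply can be illustrated with per-sender sequence numbers. The sketch below is not the MDS message loss callback itself: `loss_tracker` and `loss_tracker_on_receive` are hypothetical names, and it assumes each sender stamps its messages with a 32-bit counter that increases by one per message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-sender receive state; not an MDS type. */
struct loss_tracker {
	bool started;       /* false until the first message is seen */
	uint32_t next_seq;  /* sequence number expected next */
};

/*
 * Record an incoming sequence number and return how many messages were
 * skipped since the previous one (0 means nothing was lost). Unsigned
 * arithmetic keeps the gap correct across counter wrap-around.
 * Reordered or duplicated messages are not handled in this sketch.
 */
uint32_t loss_tracker_on_receive(struct loss_tracker *t, uint32_t seq)
{
	uint32_t lost = 0;

	if (t->started)
		lost = seq - t->next_seq;
	t->started = true;
	t->next_seq = seq + 1;
	return lost;
}
```

A receiver would keep one `loss_tracker` per sender and, on a non-zero return value, take whatever action the service considers appropriate, for example asking the sender to restart.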
Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
Hi HansN,

Please see my responses below, marked [AVM].

-AVM

On 8/23/2016 11:41 AM, Hans Nordebäck wrote:
> Hi Mahesh,
>
> please see comments below.
>
> /Thanks HansN
>
> On 08/23/2016 07:21 AM, A V Mahesh wrote:
>> Hi HansN,
>>
>> Let us first discuss the error handling and abort; then we can come back
>> to the interpretation of whether TIPC currently does or does not permit
>> an application to send a multicast message with the "destination
>> droppable" setting disabled.
>>
>> Let us disable TIPC_DEST_DROPPABLE, so that TIPC will try to return an
>> undelivered multicast message to its sender and we can determine whether
>> the issue is because of TIPC_ERR_OVERLOAD. This helps in debugging, so
>> that the application may increase SO_SNDBUF/SO_RCVBUF to reduce the
>> problem.
>>
>> But we still need to abort(). The reason is that the current MDS
>> implementation doesn't have flow control logic (no retry on error), so
>> an application like AMF can go wrong and the cluster will go into an
>> unstable/unrecoverable state.
>>
> [HansN] In the current implementation messages are dropped silently
> and no abort is done.

[AVM] I can see abort(); in the current code. Do you mean that abort(); is
not working and the application (AMF) is not exiting?

    if (anc->cmsg_type == TIPC_ERRINFO) {
        /* TIPC_ERRINFO - TIPC error code associated with a returned data
           message or a connection termination message so abort */
        m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err :%s", strerror(errno) );
        *abort();*
    } else if (anc->cmsg_type == TIPC_RETDATA) {
        /* If we set TIPC_DEST_DROPPABLE off messge (configure TIPC to
           return rejected messages to the sender ) we will hit this when
           we implement MDS retransmit lost messages abort can be replaced
           with flow control logic*/
        for (i = anc->cmsg_len - sizeof(*anc); i > 0; i--) {
            m_MDS_LOG_DBG("MDTM: returned byte 0x%02x\n", *cptr);
            cptr++;
        }
        /* TIPC_RETDATA -The contents of a returned data message so abort */
        m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_RETDATA abort err :%s", strerror(errno) );
        *abort();*
    }

> This patch enables logging when messages are dropped, to help in
> debugging. I don't agree that we should also introduce abort, but
> instead:
> 1) Implement a solution to handle dropped messages, ticket #1960

[AVM] This is nothing but a flow control implementation in MDS; it is a
future enhancement.

> 2) Investigate why messages may be dropped; the receiving MDS thread
> is a real-time thread and should be able to consume a large amount of
> incoming messages.
> E.g. is the receiving MDS thread "live hanging" due to locks, file I/O
> etc.?
>> This was the reason we didn't go for it while addressing Ticket #1227
>> (https://sourceforge.net/p/opensaf/mailman/message/33207717/).
>> So currently we don't have any advantage in disabling
>> TIPC_DEST_DROPPABLE and not allowing multicast messages.
Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
Hi Mahesh,

please see comments below.

/Thanks HansN

On 08/23/2016 07:21 AM, A V Mahesh wrote:
> Hi HansN,
>
> Let us first discuss the error handling and abort; then we can come back
> to the interpretation of whether TIPC currently does or does not permit an
> application to send a multicast message with the "destination droppable"
> setting disabled.
>
> Let us disable TIPC_DEST_DROPPABLE, so that TIPC will try to return an
> undelivered multicast message to its sender and we can determine whether
> the issue is because of TIPC_ERR_OVERLOAD. This helps in debugging, so
> that the application may increase SO_SNDBUF/SO_RCVBUF to reduce the
> problem.
>
> But we still need to abort(). The reason is that the current MDS
> implementation doesn't have flow control logic (no retry on error), so an
> application like AMF can go wrong and the cluster will go into an
> unstable/unrecoverable state.
>

[HansN] In the current implementation messages are dropped silently and no
abort is done. This patch enables logging when messages are dropped, to help
in debugging. I don't agree that we should also introduce abort, but
instead:
1) Implement a solution to handle dropped messages, ticket #1960
2) Investigate why messages may be dropped; the receiving MDS thread is a
real-time thread and should be able to consume a large amount of incoming
messages. E.g. is the receiving MDS thread "live hanging" due to locks,
file I/O etc.?

> This was the reason we didn't go for it while addressing Ticket #1227
> (https://sourceforge.net/p/opensaf/mailman/message/33207717/).
> So currently we don't have any advantage in disabling TIPC_DEST_DROPPABLE
> and not allowing multicast messages.
Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
Hi HansN,

Let us first discuss the error handling and abort; then we can come back to
the interpretation of whether TIPC currently does or does not permit an
application to send a multicast message with the "destination droppable"
setting disabled.

Let us disable TIPC_DEST_DROPPABLE, so that TIPC will try to return an
undelivered multicast message to its sender and we can determine whether the
issue is because of TIPC_ERR_OVERLOAD. This helps in debugging, so that the
application may increase SO_SNDBUF/SO_RCVBUF to reduce the problem.

But we still need to abort(). The reason is that the current MDS
implementation doesn't have flow control logic (no retry on error), so an
application like AMF can go wrong and the cluster will go into an
unstable/unrecoverable state.

This was the reason we didn't go for it while addressing Ticket #1227
(https://sourceforge.net/p/opensaf/mailman/message/33207717/).
So currently we don't have any advantage in disabling TIPC_DEST_DROPPABLE
and not allowing multicast messages.

-AVM
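The suggestion above to raise SO_SNDBUF/SO_RCVBUF can be tried with a small helper like the one below. It is a sketch, not MDS code: `set_rcvbuf` is a hypothetical name, and the size that getsockopt() reports back may differ from the request (Linux, for example, doubles the requested value and caps it using net.core.rmem_max).

```c
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Request a receive buffer of `bytes` on socket `sd` and return the size
 * the kernel reports afterwards, or -1 on error (errno is set).
 */
int set_rcvbuf(int sd, int bytes)
{
	int actual = 0;
	socklen_t len = sizeof(actual);

	if (setsockopt(sd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) != 0)
		return -1;
	if (getsockopt(sd, SOL_SOCKET, SO_RCVBUF, &actual, &len) != 0)
		return -1;
	return actual;
}
```

Logging the returned value next to the requested one makes it visible when the system-wide limit, rather than the application, decides the effective buffer size. The same pattern applies to SO_SNDBUF.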
Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
OK, I need some time to re-check the TIPC code; I will get back to you soon.

-AVM

On 8/19/2016 1:46 PM, Hans Nordebäck wrote:
> Hi Mahesh,
>
> there is a problem that TIPC may silently drop messages in overload
> situations, as MDS uses the SOCK_RDM option.
>
> At the least it has to be logged when messages are dropped. It is allowed
> in TIPC to set TIPC_DEST_DROPPABLE=false and also use multicast. The
> concern may be that the send buffer may also be overloaded when the
> receive buffer is full, as the ancillary message has to be sent; this
> case is very unlikely, though.
>
> I'll update the patch with the logging of the returned message removed,
> and only log that a message has been dropped, which should be enough for
> debugging purposes.
>
> /Thanks HansN
>
> On 08/18/2016 11:27 AM, A V Mahesh wrote:
>> Hi HansN,
>>
>> It seems you missed the text below:
>>
>> On 8/12/2016 9:11 AM, A V Mahesh wrote:
>>> Hi HansN,
>>>
>>> We had a ticket for this, raised by Hans Feldt:
>>> `https://sourceforge.net/p/opensaf/tickets/634/`
>>>
>>> At that time I gave my analysis, based on the MDS code at that time,
>>> as below. Please check.
>>>
>>> The Linux TIPC 2.0 Programmer's Guide, in section 1.5.7 (Multicast
>>> Message Delivery), mentions that:
>>>
>>> TIPC currently does not permit an application to send a multicast
>>> message with the "destination droppable" setting disabled.
>>> Consequently, TIPC will never try to return an undeliverable
>>> multicast message to its sender.
>>>
>>> So if we set destination droppable disabled, multicast is not
>>> permitted.
>>> I experimented with setting TIPC_DEST_DROPPABLE=off in multicast_demo
>>> and observed that multicast is working.
>>>
>>> As OpenSAF is using multicast, it is not allowed to set
>>> TIPC_DEST_DROPPABLE=off.
>>>
>>> ==
>>>
>>
>> So TIPC_DEST_DROPPABLE should be enabled only if
>> MDS_TIPC_MCAST_ENABLED is disabled. Currently TIPC Multicast Messaging
>> is enabled by default (MDS_TIPC_MCAST_ENABLED=1) in
>> /etc/opensaf/nid.conf; if TIPC Multicast Messaging is disabled we can
>> set TIPC_DEST_DROPPABLE dynamically.
>>
>> ==
>>
>> # This is valid when above MDS_TRANSPORT is set to TIPC.
>> # Setting MDS_TIPC_MCAST_ENABLED to 1 or 0, allows OpenSAF
>> # to enable or disable TIPC Multicast Messaging.
>> # By Default TIPC Multicast Messaging is Enabled.
>> # Note: In case of TIPC Multicast Messaging disabled (0), the performance
>> # of OpenSAF will be considerably lower as compared to Enabled (1).
>> export MDS_TIPC_MCAST_ENABLED=1
>>
>> ==
>>
>> -AVM
Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
Hi Mahesh,

there is a problem that TIPC may silently drop messages in overload
situations, as MDS uses the SOCK_RDM option.

At the least it has to be logged when messages are dropped. It is allowed in
TIPC to set TIPC_DEST_DROPPABLE=false and also use multicast. The concern
may be that the send buffer may also be overloaded when the receive buffer
is full, as the ancillary message has to be sent; this case is very
unlikely, though.

I'll update the patch with the logging of the returned message removed, and
only log that a message has been dropped, which should be enough for
debugging purposes.

/Thanks HansN

On 08/18/2016 11:27 AM, A V Mahesh wrote:
> Hi HansN,
>
> It seems you missed the text below:
>
> On 8/12/2016 9:11 AM, A V Mahesh wrote:
>> Hi HansN,
>>
>> We had a ticket for this, raised by Hans Feldt:
>> `https://sourceforge.net/p/opensaf/tickets/634/`
>>
>> At that time I gave my analysis, based on the MDS code at that time,
>> as below. Please check.
>>
>> The Linux TIPC 2.0 Programmer's Guide, in section 1.5.7 (Multicast
>> Message Delivery), mentions that:
>>
>> TIPC currently does not permit an application to send a multicast
>> message with the "destination droppable" setting disabled.
>> Consequently, TIPC will never try to return an undeliverable
>> multicast message to its sender.
>>
>> So if we set destination droppable disabled, multicast is not permitted.
>> I experimented with setting TIPC_DEST_DROPPABLE=off in multicast_demo
>> and observed that multicast is working.
>>
>> As OpenSAF is using multicast, it is not allowed to set
>> TIPC_DEST_DROPPABLE=off.
>>
>> ==
>>
>
> So TIPC_DEST_DROPPABLE should be enabled only if MDS_TIPC_MCAST_ENABLED
> is disabled. Currently TIPC Multicast Messaging is enabled by default
> (MDS_TIPC_MCAST_ENABLED=1) in /etc/opensaf/nid.conf; if TIPC Multicast
> Messaging is disabled we can set TIPC_DEST_DROPPABLE dynamically.
>
> ==
>
> # This is valid when above MDS_TRANSPORT is set to TIPC.
> # Setting MDS_TIPC_MCAST_ENABLED to 1 or 0, allows OpenSAF
> # to enable or disable TIPC Multicast Messaging.
> # By Default TIPC Multicast Messaging is Enabled.
> # Note: In case of TIPC Multicast Messaging disabled (0), the performance
> # of OpenSAF will be considerably lower as compared to Enabled (1).
> export MDS_TIPC_MCAST_ENABLED=1
>
> ==
>
> -AVM
Re: [devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
Hi HansN,

It seems you missed the text below:

On 8/12/2016 9:11 AM, A V Mahesh wrote:
> Hi HansN,
>
> We had a ticket for this, raised by Hans Feldt:
> `https://sourceforge.net/p/opensaf/tickets/634/`
>
> At that time I gave my analysis, based on the MDS code at that time, as
> below. Please check.
>
> The Linux TIPC 2.0 Programmer's Guide, in section 1.5.7 (Multicast
> Message Delivery), mentions that:
>
> TIPC currently does not permit an application to send a multicast
> message with the "destination droppable" setting disabled.
> Consequently, TIPC will never try to return an undeliverable multicast
> message to its sender.
>
> So if we set destination droppable disabled, multicast is not permitted.
> I experimented with setting TIPC_DEST_DROPPABLE=off in multicast_demo and
> observed that multicast is working.
>
> As OpenSAF is using multicast, it is not allowed to set
> TIPC_DEST_DROPPABLE=off.
>
> ==
>

So TIPC_DEST_DROPPABLE should be enabled only if MDS_TIPC_MCAST_ENABLED is
disabled. Currently TIPC Multicast Messaging is enabled by default
(MDS_TIPC_MCAST_ENABLED=1) in /etc/opensaf/nid.conf; if TIPC Multicast
Messaging is disabled we can set TIPC_DEST_DROPPABLE dynamically.

==

# This is valid when above MDS_TRANSPORT is set to TIPC.
# Setting MDS_TIPC_MCAST_ENABLED to 1 or 0, allows OpenSAF
# to enable or disable TIPC Multicast Messaging.
# By Default TIPC Multicast Messaging is Enabled.
# Note: In case of TIPC Multicast Messaging disabled (0), the performance
# of OpenSAF will be considerably lower as compared to Enabled (1).
export MDS_TIPC_MCAST_ENABLED=1

==

-AVM
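The rule proposed in this message can be written down as a small decision helper. This is only a sketch under one reading of the proposal, namely that TIPC_DEST_DROPPABLE is cleared (so that TIPC returns undelivered messages) only when MDS_TIPC_MCAST_ENABLED is 0. `dest_droppable_for` is a hypothetical name, and treating an unset variable as "multicast enabled" follows the default stated in the nid.conf excerpt.

```c
#include <string.h>

/*
 * Given the value of MDS_TIPC_MCAST_ENABLED (NULL if unset), return the
 * value to use for TIPC_DEST_DROPPABLE: 1 keeps messages droppable while
 * multicast is in use, 0 asks TIPC to return undelivered messages.
 */
int dest_droppable_for(const char *mcast_enabled)
{
	if (mcast_enabled != NULL && strcmp(mcast_enabled, "0") == 0)
		return 0; /* multicast disabled: returned messages can be requested */
	return 1;         /* multicast enabled (the default) */
}
```

The result of `dest_droppable_for(getenv("MDS_TIPC_MCAST_ENABLED"))` (with `<stdlib.h>` included for getenv()) would then be passed to the setsockopt() call on the TIPC socket in place of the constant 0 used in the patch.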
[devel] [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
 osaf/libs/core/mds/mds_dt_tipc.c |  32 +---
 1 files changed, 25 insertions(+), 7 deletions(-)


diff --git a/osaf/libs/core/mds/mds_dt_tipc.c b/osaf/libs/core/mds/mds_dt_tipc.c
--- a/osaf/libs/core/mds/mds_dt_tipc.c
+++ b/osaf/libs/core/mds/mds_dt_tipc.c
@@ -320,6 +320,15 @@ uint32_t mdtm_tipc_init(NODE_ID nodeid,
 		m_MDS_LOG_INFO("MDTM: Successfully set default socket option TIPC_IMP = %d", TIPCIMPORTANCE);
 	}
 
+	int droppable = 0;
+	if (setsockopt(tipc_cb.BSRsock, SOL_TIPC, TIPC_DEST_DROPPABLE, &droppable, sizeof(droppable)) != 0) {
+		LOG_ER("MDTM: Can't set TIPC_DEST_DROPPABLE to zero err :%s\n", strerror(errno));
+		m_MDS_LOG_ERR("MDTM: Can't set TIPC_DEST_DROPPABLE to zero err :%s\n", strerror(errno));
+		osafassert(0);
+	} else {
+		m_MDS_LOG_NOTIFY("MDTM: Successfully set TIPC_DEST_DROPPABLE to zero");
+	}
+
 	return NCSCC_RC_SUCCESS;
 }
 
@@ -563,6 +572,8 @@ ssize_t recvfrom_connectionless (int sd,
 	unsigned char *cptr;
 	int i;
 	int has_addr;
+	int anc_data[2];
+
 	ssize_t sz;
 
 	has_addr = (from != NULL) && (addrlen != NULL);
@@ -591,19 +602,26 @@ ssize_t recvfrom_connectionless (int sd,
 				if the message was sent using a TIPC name or name sequence as the
 				destination rather than a TIPC port ID So abort for TIPC_ERRINFO and TIPC_RETDATA*/
 			if (anc->cmsg_type == TIPC_ERRINFO) {
-				/* TIPC_ERRINFO - TIPC error code associated with a returned data message or a connection termination message so abort */
-				m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err :%s", strerror(errno) );
-				abort();
+				anc_data[0] = *((unsigned int*)(CMSG_DATA(anc) + 0));
+				if (anc_data[0] == TIPC_ERR_OVERLOAD) {
+					LOG_CR("MDTM: undelivered message condition ancillary data: TIPC_ERR_OVERLOAD");
+					m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERR_OVERLOAD");
+				} else {
+					/* TIPC_ERRINFO - TIPC error code associated with a returned data message or a connection termination message so abort */
+					LOG_CR("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err : %d", anc_data[0]);
+					m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err : %d", anc_data[0]);
+				}
 			} else if (anc->cmsg_type == TIPC_RETDATA) {
-				/* If we set TIPC_DEST_DROPPABLE off messge (configure TIPC to return rejected messages to the sender )
+				/* If we set TIPC_DEST_DROPPABLE off message (configure TIPC to return rejected messages to the sender )
 				   we will hit this when we implement MDS retransmit lost messages abort can be replaced with flow control logic*/
 				for (i = anc->cmsg_len - sizeof(*anc); i > 0; i--) {
-					m_MDS_LOG_DBG("MDTM: returned byte 0x%02x\n", *cptr);
+					LOG_CR("MDTM: returned byte 0x%02x\n", *cptr);
+					m_MDS_LOG_CRITICAL("MDTM: returned byte 0x%02x\n", *cptr);
 					cptr++;
 				}
 				/* TIPC_RETDATA -The contents of a returned data message so abort */
-				m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_RETDATA abort err :%s", strerror(errno) );
-				abort();
+				LOG_CR("MDTM: undelivered message condition ancillary data: TIPC_RETDATA");
+				m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_RETDATA");
 			} else if (anc->cmsg_type == TIPC_DESTNAME) {
 				if (sz == 0) {
 					m_MDS_LOG_DBG("MDTM: recd bytes=0 on received on sock, abnormal/unknown condition. Ignoring");

--
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
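For readers following the TIPC_ERRINFO branch of the patch, the sketch below shows one way to pull the error code out of the ancillary data of a message received with recvmsg(). It is an illustration, not part of the patch: `find_tipc_errinfo` is a hypothetical helper, the fallback constants are assumed to match <linux/tipc.h> where that header is unavailable, and the two-word layout (error code, then length of the returned data) is the one described for TIPC_ERRINFO in the TIPC programmer's guide. It copies the words with memcpy() instead of casting the CMSG_DATA() pointer.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#if defined(__has_include)
#  if __has_include(<linux/tipc.h>)
#    include <linux/tipc.h>
#  endif
#endif
/* Fallback values, assumed to match <linux/tipc.h> when it is missing. */
#ifndef SOL_TIPC
#define SOL_TIPC 271
#endif
#ifndef TIPC_ERRINFO
#define TIPC_ERRINFO 1
#endif
#ifndef TIPC_ERR_OVERLOAD
#define TIPC_ERR_OVERLOAD 4
#endif

/*
 * Look for a TIPC_ERRINFO control message in `msg` (as filled in by
 * recvmsg()). If one is found, store the TIPC error code and the length
 * of the returned data and return 1; otherwise return 0.
 */
int find_tipc_errinfo(struct msghdr *msg, uint32_t *err, uint32_t *ret_len)
{
	struct cmsghdr *anc;
	uint32_t info[2];

	for (anc = CMSG_FIRSTHDR(msg); anc != NULL; anc = CMSG_NXTHDR(msg, anc)) {
		if (anc->cmsg_level != SOL_TIPC || anc->cmsg_type != TIPC_ERRINFO)
			continue;
		if (anc->cmsg_len < CMSG_LEN(sizeof(info)))
			continue; /* too short to hold both words */
		memcpy(info, CMSG_DATA(anc), sizeof(info)); /* no pointer cast */
		*err = info[0];
		*ret_len = info[1];
		return 1;
	}
	return 0;
}
```

A caller could then compare `err` against TIPC_ERR_OVERLOAD and log both values, which is roughly what the new branch in recvfrom_connectionless() does with anc_data[0].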