Hi Mahesh,

I have not tested this, but the following should work:

- Set TIPC_IMPORTANCE on BSRsock to TIPC_LOW_IMPORTANCE.
- Set the socket receive buffer to a small value:
    optval = a small socket receive buffer size, e.g. 5000
    setsockopt(tipc_cb.BSRsock, SOL_SOCKET, SO_RCVBUF, &optval, optlen)
- sysctl -w net.tipc.tipc_rmem="5000 40000000 68240400" (or smaller values)
- Add some delays when processing messages in mdtm_process_recv_events(), to
  provoke overloading of the socket receive buffer.

We see dropped packets in a 75-node system; as a workaround, increasing the
default socket receive buffer size seems to work for that setup.

/Thanks HansN

On 09/01/2016 05:50 AM, A V Mahesh wrote:
> Hi HansN,
>
> Do you have any tips to created overload case,
>
> I would like test and observe TIPC_DEST_DROPPABLE enabled & disabled
> cases.
>
> -AVM
>
>
> On 9/1/2016 9:12 AM, A V Mahesh wrote:
>> Hi HansN,
>>
>> Sorry for the delay.
>>
>> I will test it and get back to you soon.
>>
>> -AVM
>>
>>
>> On 8/31/2016 4:29 PM, Hans Nordebäck wrote:
>>> Hi Mahesh,
>>> Any updates on this?
>>>
>>> /Regards HansN
>>>
>>> -----Original Message-----
>>> From: Anders Widell
>>> Sent: 25 August 2016 13:11
>>> To: A V Mahesh <mahesh.va...@oracle.com>; Hans Nordebäck
>>> <hans.nordeb...@ericsson.com>; mathi.naic...@oracle.com
>>> Cc: opensaf-devel@lists.sourceforge.net
>>> Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
>>>
>>> Hi!
>>>
>>> This is what the TIPC user documentation says about
>>> TIPC_DEST_DROPPABLE:
>>> "This option governs the handling of messages sent by the socket if
>>> the message cannot be delivered to its destination, either because
>>> the receiver is congested or because the specified receiver does not
>>> exist.
>>> If enabled, the message is discarded; otherwise the message is
>>> returned to the sender."
>>>
>>> This is what the TIPC user documentation says about the return value
>>> from the recvmsg() system call: "When used with a connectionless
>>> socket, a return value of 0 indicates the arrival of a returned data
>>> message that was originally sent by this socket."
>>>
>>> I think the documentation is pretty clear. If you set
>>> TIPC_DEST_DROPPABLE to true, the receiver can discard messages e.g.
>>> when the receive buffer is full. The sender will not be notified in
>>> this case. If TIPC_DEST_DROPPABLE is set to false, the message will
>>> be returned to the sender in case of a full receive buffer. The
>>> sender knows that it has received such a returned message when the
>>> recvmsg() call returns zero.
>>>
>>> regards,
>>> Anders Widell
>>>
>>> On 08/25/2016 11:30 AM, A V Mahesh wrote:
>>>> Hi HansN,
>>>>
>>>>
>>>> On 8/23/2016 5:22 PM, Hans Nordebäck wrote:
>>>>
>>>>> Hi Mahesh,
>>>>>
>>>>> Yes, this is my understanding too, if TIPC_DROPPABLE = true tipc may
>>>>> drop messages silently, at receive sock buffer full condition, but
>>>>> do not return any ancillary message.
>>>>> If TIPC_DROPPABLE = false tipc may drop message but will send an
>>>>> ancillary message to inform about TIPC_ERR_OVERLOAD.
>>>> [AVM]
>>>>
>>>> My observation are understanding is different, based on TIPC code and
>>>> Linux TIPC 2.0 Programmer's Guide , that the TIPC_ERR_OVERLOAD error
>>>> returned when TIPC is unable to enqueue an incoming message on the
>>>> receiving socket's receive queue irrelevant of TIPC_DEST_DROPPABLE
>>>> enabled or disabled.
>>>>
>>>> The only difference between TIPC_DEST_DROPPABLE enabled or disabled is
>>>> , If TIPC_DEST_DROPPABLE enabled, the message is discarded and
>>>> recvmsg() returned size is ZERO and application will get errors, if
>>>> TIPC_DEST_DROPPABLE disabled the message is returned to the sender it
>>>> means the recvmsg() returned size is user send data size and
>>>> application will get errors .
>>>>
>>>> I did check the TIPC code and documentations and I haven't get any
>>>> evidences that TIPC_ERR_OVERLOAD error code will be send only If
>>>> TIPC_DEST_DROPPABLE = false.
>>>>
>>>> Even while testing #1227
>>>> (https://sourceforge.net/p/opensaf/mailman/message/33207717/) my
>>>> observations and understanding was, an individual TIPC socket is only
>>>> allowed to queue up
>>>> OVERLOAD_LIMIT_BASE/2 messages of the lowest importance level before
>>>> it starts rejecting them.
>>>> Once a socket receiving queue length exceeds the maximum limit value,
>>>> the receiving socket will send out a reject message with
>>>> TIPC_ERR_OVERLOAD error code with cmsg_type as
>>>> TIPC_ERRINFO/TIPC_RETDATA, and the tipc code and Linux TIPC 2.0
>>>> Programmer's Guide confirmed the same .
>>>>
>>>> tipc/socket.c
>>>> =======================================================
>>>> /* Reject message if there isn't room to queue it */
>>>>
>>>> recv_q_len = (u32)atomic_read(&tipc_queue_size);
>>>> if (unlikely(recv_q_len >= OVERLOAD_LIMIT_BASE)) {
>>>>         if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE))
>>>>                 return TIPC_ERR_OVERLOAD;
>>>> }
>>>> recv_q_len = skb_queue_len(&sk->sk_receive_queue);
>>>> if (unlikely(recv_q_len >= (OVERLOAD_LIMIT_BASE / 2))) {
>>>>         if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE / 2))
>>>>                 return TIPC_ERR_OVERLOAD;
>>>> }
>>>> =======================================================
>>>>
>>>>
>>>> 2.1.17. setsockopt() of TIPC 2.0 Programmer's Guide
>>>> =======================================================
>>>> TIPC_DEST_DROPPABLE
>>>> This option governs the handling of messages sent by the socket if the
>>>> message cannot be delivered to its destination, either because the
>>>> receiver is congested or because the specified receiver does not
>>>> exist. If enabled, the message is discarded; otherwise the message is
>>>> returned to the sender.
>>>>
>>>> By default, this option is disabled for SOCK_SEQPACKET and SOCK_STREAM
>>>> socket types, and enabled for SOCK_RDM and SOCK_DGRAM, This
>>>> arrangement ensures proper teardown of failed connections when
>>>> connection-oriented data transfer is used, without increasing the
>>>> complexity of connectionless data transfer.
>>>>
>>>> TIPC_SRC_DROPPABLE
>>>> This option governs the handling of messages sent by the socket if
>>>> link congestion occurs. If enabled, the message is discarded;
>>>> otherwise the system queues the message for later transmission.
>>>> By default, this option is disabled for SOCK_SEQPACKET, SOCK_STREAM,
>>>> and SOCK_RDM socket types (resulting in "reliable" data transfer), and
>>>> enabled for SOCK_DGRAM (resulting in "unreliable" data transfer).
>>>> =======================================================
>>>>
>>>> Now I will try to create OVERLOAD case and update you soon my latest
>>>> observations.
>>>>
>>>> -AVM
>>>>
>>>>> Correcting this and adding an abort is not backward compatible as
>>>>> some service already handle flow control in some way, only log when
>>>>> packages are dropped.
>>>>> Regarding ticket #1960 there are other solutions than introducing
>>>>> flow control in MDS, e.g. expose an option to the service to choose
>>>>> connection oriented or connection less.
>>>>> The problem with dropped messages seems in one case related to, (by
>>>>> MDS), intensive MDS logging.
>>>>>
>>>>> /Thanks HansN
>>>>> -----Original Message-----
>>>>> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
>>>>> Sent: 23 August 2016 11:27
>>>>> To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell
>>>>> <anders.wid...@ericsson.com>; mathi.naic...@oracle.com
>>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>>> Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
>>>>>
>>>>> Hi HansN,
>>>>>
>>>>> It seems I am missing some thing , please allow me to under stand
>>>>>
>>>>> If I currently understand you observation :
>>>>>
>>>>> With current Opensaf code ( this #1957 patch NOT applied ) , by
>>>>> default TIPC_DROPPABLE=true ,while running Opensaf with that binary
>>>>> when TIPC_ERR_OVERLOAD occurring, TIPC is not given errors
>>>>> TIPC_ERRINFO or TIPC_RETDATA and following code is not being get hit
>>>>> of function recvfrom_connectionless(), is my understanding right ?
>>>>>
>>>>> =============================================================================================================
>>>>>
>>>>>
>>>>> *if (anc->cmsg_type == TIPC_ERRINFO) {*
>>>>>         /* TIPC_ERRINFO - TIPC error code associated with a returned
>>>>>            data message or a connection termination message so abort */
>>>>>         m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err :%s", strerror(errno) );
>>>>>         *abort();*
>>>>> *} else if (anc->cmsg_type == TIPC_RETDATA) {*
>>>>>         /* If we set TIPC_DEST_DROPPABLE off messge (configure TIPC to
>>>>>            return rejected messages to the sender )
>>>>>            we will hit this when we implement MDS retransmit lost
>>>>>            messages abort can be replaced with flow control logic*/
>>>>>         for (i = anc->cmsg_len - sizeof(*anc); i > 0; i--) {
>>>>>                 m_MDS_LOG_DBG("MDTM: returned byte 0x%02x\n", *cptr);
>>>>>                 cptr++;
>>>>>         }
>>>>>         /* TIPC_RETDATA -The contents of a returned data message so
>>>>>            abort */
>>>>>         m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_RETDATA abort err :%s", strerror(errno) );
>>>>>         *abort();*
>>>>> }
>>>>>
>>>>> =============================================================================================================
>>>>>
>>>>>
>>>>> -AVM
>>>>>
>>>>>
>>>>> On 8/23/2016 1:08 PM, Hans Nordebäck wrote:
>>>>>> Hi Mahesh,
>>>>>>
>>>>>> Please see response below with [HansN] /Thanks HansN
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
>>>>>> Sent: 23 August 2016 08:25
>>>>>> To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell
>>>>>> <anders.wid...@ericsson.com>; mathi.naic...@oracle.com
>>>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>>>> Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
>>>>>>
>>>>>> Hi HansN
>>>>>>
>>>>>> Please see response below with [AVM]
>>>>>>
>>>>>> -AVM
>>>>>>
>>>>>> On 8/23/2016 11:41 AM, Hans Nordebäck wrote:
>>>>>>> Hi Mahesh,
>>>>>>>
>>>>>>> please see comments below.
>>>>>>>
>>>>>>> /Thanks HansN
>>>>>>>
>>>>>>>
>>>>>>> On 08/23/2016 07:21 AM, A V Mahesh wrote:
>>>>>>>> Hi HansN,
>>>>>>>>
>>>>>>>> Let us fist discuss the error handling and abort, then we can come
>>>>>>>> back to interpretation of TIPC currently does permit OR does
>>>>>>>> not permit an application to send a multicast message with the
>>>>>>>> "destination droppable" setting disabled.
>>>>>>>>
>>>>>>>> Let us disable TIPC_DEST_DROPPABLE, so that TIPC will try to
>>>>>>>> return an undelivered multicast message to its sender and we can
>>>>>>>> determine issue is because of TIPC_ERR_OVERLOAD, this helps in
>>>>>>>> debugging , so that application may increased SO_SNDBUF/SO_RCVBUF
>>>>>>>> to reduce the problem.
>>>>>>>>
>>>>>>>> But still we need to abort(), the reason for that is current MDS
>>>>>>>> implementations doesn't have flow control logic ( no retry because
>>>>>>>> of error ) , so Application like AMF can go wrong and cluster will
>>>>>>>> go into unstable/recoverble state.
>>>>>>>>
>>>>>>> [HansN] In the current implementation messages are dropped silently
>>>>>>> and no abort is done.
>>>>>> [AVM] I can see abort(); in current code , you mean abort(); is
>>>>>> not working and application(amf) is not existing ?
>>>>>> [HansN] In case of TIPC_DROPPABLE=true and messages are dropped,
>>>>>> (TIPC_ERR_OVERLOAD) no abort is be performed, e.g amfd detects this
>>>>>> in the msg sanity chk and logs "invalid msg id ..."
>>>>>> ============================================================================
>>>>>> if (anc->cmsg_type == TIPC_ERRINFO) {
>>>>>>         /* TIPC_ERRINFO - TIPC error code associated with a returned
>>>>>>            data message or a connection termination message so abort */
>>>>>>         m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err :%s", strerror(errno) );
>>>>>>         *abort();*
>>>>>> } else if (anc->cmsg_type == TIPC_RETDATA) {
>>>>>>         /* If we set TIPC_DEST_DROPPABLE off messge (configure TIPC
>>>>>>            to return rejected messages to the sender )
>>>>>>            we will hit this when we implement MDS retransmit lost
>>>>>>            messages abort can be replaced with flow control logic*/
>>>>>>         for (i = anc->cmsg_len - sizeof(*anc); i > 0; i--) {
>>>>>>                 m_MDS_LOG_DBG("MDTM: returned byte 0x%02x\n", *cptr);
>>>>>>                 cptr++;
>>>>>>         }
>>>>>>         /* TIPC_RETDATA -The contents of a returned data message so
>>>>>>            abort */
>>>>>>         m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_RETDATA abort err :%s", strerror(errno) );
>>>>>>         *abort();*
>>>>>> }
>>>>>> ============================================================================
>>>>>>> This patch enables logging
>>>>>>> when packages are dropped to help in debugging. I don't agree that
>>>>>>> we should also introduce abort, but instead:
>>>>>>> 1) Implement a solution to handle dropped packages, ticket #1960
>>>>>> [AVM] This is nothing but flow control implementation in MDS, this
>>>>>> is future enhancement
>>>>>>
>>>>>>> 2) Investigate why packages may be dropped, the receiving MDS
>>>>>>> thread is a real time thread and should be able to consume a large
>>>>>>> amount of incoming messages.
>>>>>>> E.g. is the receiving MDS thread "live hanging" due to locks, file
>>>>>>> I/O etc?
>>>>>>>> This was the reason we haven't gone for it while addressing Ticket
>>>>>>>> #1227
>>>>>>>> (https://sourceforge.net/p/opensaf/mailman/message/33207717/)
>>>>>>>> So currently we don't have any advantage of disabling
>>>>>>>> TIPC_DEST_DROPPABLE and not allowing multicast messages.
>>>>>>>>
>>>>>>>> -AVM
>>>>>>>>
>>>>>>>>
>>>>>>>> On 8/18/2016 2:43 PM, Hans Nordeback wrote:
>>>>>>>>>  osaf/libs/core/mds/mds_dt_tipc.c |  32 +++++++++++++++++++++++++-------
>>>>>>>>>  1 files changed, 25 insertions(+), 7 deletions(-)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> diff --git a/osaf/libs/core/mds/mds_dt_tipc.c b/osaf/libs/core/mds/mds_dt_tipc.c
>>>>>>>>> --- a/osaf/libs/core/mds/mds_dt_tipc.c
>>>>>>>>> +++ b/osaf/libs/core/mds/mds_dt_tipc.c
>>>>>>>>> @@ -320,6 +320,15 @@ uint32_t mdtm_tipc_init(NODE_ID nodeid,
>>>>>>>>>          m_MDS_LOG_INFO("MDTM: Successfully set default socket option TIPC_IMP = %d", TIPCIMPORTANCE);
>>>>>>>>>      }
>>>>>>>>> +    int droppable = 0;
>>>>>>>>> +    if (setsockopt(tipc_cb.BSRsock, SOL_TIPC, TIPC_DEST_DROPPABLE, &droppable, sizeof(droppable)) != 0) {
>>>>>>>>> +        LOG_ER("MDTM: Can't set TIPC_DEST_DROPPABLE to zero err :%s\n", strerror(errno));
>>>>>>>>> +        m_MDS_LOG_ERR("MDTM: Can't set TIPC_DEST_DROPPABLE to zero err :%s\n", strerror(errno));
>>>>>>>>> +        osafassert(0);
>>>>>>>>> +    } else {
>>>>>>>>> +        m_MDS_LOG_NOTIFY("MDTM: Successfully set TIPC_DEST_DROPPABLE to zero");
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>>      return NCSCC_RC_SUCCESS;
>>>>>>>>>  }
>>>>>>>>> @@ -563,6 +572,8 @@ ssize_t recvfrom_connectionless (int sd,
>>>>>>>>>      unsigned char *cptr;
>>>>>>>>>      int i;
>>>>>>>>>      int has_addr;
>>>>>>>>> +    int anc_data[2];
>>>>>>>>> +
>>>>>>>>>      ssize_t sz;
>>>>>>>>>      has_addr = (from != NULL) && (addrlen != NULL);
>>>>>>>>> @@ -591,19 +602,26 @@ ssize_t recvfrom_connectionless (int sd,
>>>>>>>>>             if the message was sent using a TIPC name or name sequence as the
>>>>>>>>>             destination rather than a TIPC port ID So abort for TIPC_ERRINFO and TIPC_RETDATA*/
>>>>>>>>>          if (anc->cmsg_type == TIPC_ERRINFO) {
>>>>>>>>> -            /* TIPC_ERRINFO - TIPC error code associated with a returned data message or a connection termination message so abort */
>>>>>>>>> -            m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err :%s", strerror(errno) );
>>>>>>>>> -            abort();
>>>>>>>>> +            anc_data[0] = *((unsigned int*)(CMSG_DATA(anc) + 0));
>>>>>>>>> +            if (anc_data[0] == TIPC_ERR_OVERLOAD) {
>>>>>>>>> +                LOG_CR("MDTM: undelivered message condition ancillary data: TIPC_ERR_OVERLOAD");
>>>>>>>>> +                m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERR_OVERLOAD");
>>>>>>>>> +            } else {
>>>>>>>>> +                /* TIPC_ERRINFO - TIPC error code associated with a returned data message or a connection termination message so abort */
>>>>>>>>> +                LOG_CR("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err : %d", anc_data[0]);
>>>>>>>>> +                m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err : %d", anc_data[0]);
>>>>>>>>> +            }
>>>>>>>>>          } else if (anc->cmsg_type == TIPC_RETDATA) {
>>>>>>>>> -            /* If we set TIPC_DEST_DROPPABLE off messge (configure TIPC to return rejected messages to the sender )
>>>>>>>>> +            /* If we set TIPC_DEST_DROPPABLE off message (configure TIPC to return rejected messages to the sender )
>>>>>>>>>                 we will hit this when we implement MDS retransmit lost messages abort can be replaced with flow control logic*/
>>>>>>>>>              for (i = anc->cmsg_len - sizeof(*anc); i > 0; i--) {
>>>>>>>>> -                m_MDS_LOG_DBG("MDTM: returned byte 0x%02x\n", *cptr);
>>>>>>>>> +                LOG_CR("MDTM: returned byte 0x%02x\n", *cptr);
>>>>>>>>> +                m_MDS_LOG_CRITICAL("MDTM: returned byte 0x%02x\n", *cptr);
>>>>>>>>>                  cptr++;
>>>>>>>>>              }
>>>>>>>>>              /* TIPC_RETDATA -The contents of a returned data message so abort */
>>>>>>>>> -            m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_RETDATA abort err :%s", strerror(errno) );
>>>>>>>>> -            abort();
>>>>>>>>> +            LOG_CR("MDTM: undelivered message condition ancillary data: TIPC_RETDATA");
>>>>>>>>> +            m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_RETDATA");
>>>>>>>>>          } else if (anc->cmsg_type == TIPC_DESTNAME) {
>>>>>>>>>              if (sz == 0) {
>>>>>>>>>                  m_MDS_LOG_DBG("MDTM: recd bytes=0 on received on sock, abnormal/unknown condition. Ignoring");
>>>
>>
>
------------------------------------------------------------------------------
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
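For reference, the first two steps of the overload recipe at the top of this mail, together with the TIPC_ERRINFO check discussed in the thread, could look roughly like the sketch below. It has not been run against a real TIPC bearer. The helper names (shrink_rcvbuf, set_low_importance, find_tipc_errinfo) are made up for illustration and are not part of MDS. The sketch assumes Linux with <linux/tipc.h> installed, and it assumes, as the patch in this thread does, that the first 32-bit word of the TIPC_ERRINFO ancillary data is the TIPC error code.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/tipc.h>

/* Hypothetical helper (not part of MDS): request a small receive buffer.
 * On Linux the kernel doubles the requested value and enforces a minimum,
 * so the effective size will differ from `bytes`. */
int shrink_rcvbuf(int sd, int bytes)
{
	if (setsockopt(sd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) != 0) {
		fprintf(stderr, "SO_RCVBUF: %s\n", strerror(errno));
		return -1;
	}
	return 0;
}

/* Hypothetical helper: put a TIPC socket at the lowest importance level. */
int set_low_importance(int sd)
{
	int imp = TIPC_LOW_IMPORTANCE;

	if (setsockopt(sd, SOL_TIPC, TIPC_IMPORTANCE, &imp, sizeof(imp)) != 0) {
		fprintf(stderr, "TIPC_IMPORTANCE: %s\n", strerror(errno));
		return -1;
	}
	return 0;
}

/* Hypothetical helper: scan the ancillary data filled in by recvmsg() for
 * TIPC_ERRINFO and copy out the error code (assumed to be the first 32-bit
 * word). memcpy is used instead of a pointer cast to avoid alignment
 * assumptions. Returns 1 if a TIPC_ERRINFO object was found, 0 otherwise. */
int find_tipc_errinfo(struct msghdr *msg, unsigned int *err)
{
	struct cmsghdr *anc;

	for (anc = CMSG_FIRSTHDR(msg); anc != NULL; anc = CMSG_NXTHDR(msg, anc)) {
		if (anc->cmsg_level == SOL_TIPC &&
		    anc->cmsg_type == TIPC_ERRINFO &&
		    anc->cmsg_len >= CMSG_LEN(sizeof(*err))) {
			memcpy(err, CMSG_DATA(anc), sizeof(*err));
			return 1;
		}
	}
	return 0;
}
```

In a test build one would presumably call shrink_rcvbuf(tipc_cb.BSRsock, 5000) and set_low_importance(tipc_cb.BSRsock) from mdtm_tipc_init(), and compare the value returned by find_tipc_errinfo() against TIPC_ERR_OVERLOAD after a recvmsg() that returns 0.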