Hi HansN,

Sorry for the delay. I will test it and get back to you soon.

-AVM

On 8/31/2016 4:29 PM, Hans Nordebäck wrote:
> Hi Mahesh,
>
> Any updates on this?
>
> /Regards HansN
>
> -----Original Message-----
> From: Anders Widell
> Sent: 25 August 2016 13:11
> To: A V Mahesh <mahesh.va...@oracle.com>; Hans Nordebäck
> <hans.nordeb...@ericsson.com>; mathi.naic...@oracle.com
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
>
> Hi!
>
> This is what the TIPC user documentation says about TIPC_DEST_DROPPABLE:
> "This option governs the handling of messages sent by the socket if the
> message cannot be delivered to its destination, either because the receiver
> is congested or because the specified receiver does not exist.
> If enabled, the message is discarded; otherwise the message is returned to
> the sender."
>
> This is what the TIPC user documentation says about the return value from the
> recvmsg() system call: "When used with a connectionless socket, a return
> value of 0 indicates the arrival of a returned data message that was
> originally sent by this socket."
>
> I think the documentation is pretty clear. If you set TIPC_DEST_DROPPABLE to
> true, the receiver can discard messages e.g. when the receive buffer is full.
> The sender will not be notified in this case. If TIPC_DEST_DROPPABLE is set
> to false, the message will be returned to the sender in case of a full
> receive buffer. The sender knows that it has received such a returned message
> when the recvmsg() call returns zero.
>
> regards,
> Anders Widell
>
> On 08/25/2016 11:30 AM, A V Mahesh wrote:
>> Hi HansN,
>>
>>
>> On 8/23/2016 5:22 PM, Hans Nordebäck wrote:
>>
>>> Hi Mahesh,
>>>
>>> Yes, this is my understanding too, if TIPC_DROPPABLE = true tipc may
>>> drop messages silently, at receive sock buffer full condition, but
>>> does not return any ancillary message.
>>> If TIPC_DROPPABLE = false tipc may drop message but will send an
>>> ancillary message to inform about TIPC_ERR_OVERLOAD.
>> [AVM]
>>
>> My observation and understanding are different. Based on the TIPC code
>> and the Linux TIPC 2.0 Programmer's Guide, the TIPC_ERR_OVERLOAD error
>> is returned when TIPC is unable to enqueue an incoming message on the
>> receiving socket's receive queue, irrespective of whether
>> TIPC_DEST_DROPPABLE is enabled or disabled.
>>
>> The only difference between TIPC_DEST_DROPPABLE enabled and disabled
>> is this: if TIPC_DEST_DROPPABLE is enabled, the message is discarded,
>> the size returned by recvmsg() is zero, and the application will get
>> errors; if TIPC_DEST_DROPPABLE is disabled, the message is returned to
>> the sender, which means the size returned by recvmsg() is the size of
>> the data the user sent, and the application will get errors.
>>
>> I checked the TIPC code and documentation and have not found any
>> evidence that the TIPC_ERR_OVERLOAD error code is sent only if
>> TIPC_DEST_DROPPABLE = false.
>>
>> Even while testing #1227
>> (https://sourceforge.net/p/opensaf/mailman/message/33207717/) my
>> observation and understanding was that an individual TIPC socket is
>> only allowed to queue up OVERLOAD_LIMIT_BASE/2 messages of the lowest
>> importance level before it starts rejecting them.
>> Once a socket's receive queue length exceeds the maximum limit value,
>> the receiving socket will send out a reject message with the
>> TIPC_ERR_OVERLOAD error code and cmsg_type TIPC_ERRINFO/TIPC_RETDATA,
>> and the TIPC code and the Linux TIPC 2.0 Programmer's Guide confirm
>> the same.
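
The TIPC_ERRINFO check discussed in this thread can be tried out without a TIPC-enabled kernel. What follows is only a minimal sketch, not the MDS code. The enum and the functions classify_tipc_ancillary() and classify_synthetic_errinfo() are hypothetical names introduced for illustration. The sketch assumes Linux with <linux/tipc.h> installed, and it assumes the TIPC_ERRINFO payload is two 32-bit words (error code, then length of the returned data), which is how the TIPC documentation describes it as far as I know.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/tipc.h>

/* Hypothetical result codes for this sketch; not part of MDS or TIPC. */
enum rejected_kind { REJ_NONE = 0, REJ_OVERLOAD, REJ_OTHER_ERROR };

/*
 * Walk the control messages attached to a received msghdr and report
 * whether TIPC flagged the message as rejected. For TIPC_ERRINFO the
 * first 32-bit word of the payload is assumed to be the TIPC error code.
 */
static enum rejected_kind classify_tipc_ancillary(struct msghdr *m,
                                                  uint32_t *err_out)
{
    enum rejected_kind kind = REJ_NONE;
    struct cmsghdr *anc;
    uint32_t err;

    for (anc = CMSG_FIRSTHDR(m); anc != NULL; anc = CMSG_NXTHDR(m, anc)) {
        if (anc->cmsg_level != SOL_TIPC || anc->cmsg_type != TIPC_ERRINFO)
            continue;
        /* memcpy avoids an unaligned or aliasing-unsafe pointer cast. */
        memcpy(&err, CMSG_DATA(anc), sizeof(err));
        if (err_out != NULL)
            *err_out = err;
        kind = (err == TIPC_ERR_OVERLOAD) ? REJ_OVERLOAD : REJ_OTHER_ERROR;
    }
    return kind;
}

/*
 * Usage example: build a synthetic TIPC_ERRINFO control message in user
 * space and classify it, so the parsing can be checked without TIPC.
 */
static enum rejected_kind classify_synthetic_errinfo(uint32_t err_code)
{
    union {
        char buf[CMSG_SPACE(2 * sizeof(uint32_t))];
        struct cmsghdr align; /* forces correct alignment of buf */
    } ctl;
    uint32_t payload[2] = { err_code, 0 }; /* error code, returned length */
    struct msghdr m;
    struct cmsghdr *anc;

    memset(&ctl, 0, sizeof(ctl));
    memset(&m, 0, sizeof(m));
    m.msg_control = ctl.buf;
    m.msg_controllen = sizeof(ctl.buf);

    anc = CMSG_FIRSTHDR(&m);
    anc->cmsg_level = SOL_TIPC;
    anc->cmsg_type = TIPC_ERRINFO;
    anc->cmsg_len = CMSG_LEN(sizeof(payload));
    memcpy(CMSG_DATA(anc), payload, sizeof(payload));

    return classify_tipc_ancillary(&m, NULL);
}
```

In a real receive path the msghdr would come from recvmsg() on the TIPC socket, and whether TIPC_ERRINFO shows up at all with a given TIPC_DEST_DROPPABLE setting is exactly the open question being tested here.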
>>
>> tipc/socket.c
>> =======================================================
>> /* Reject message if there isn't room to queue it */
>>
>> recv_q_len = (u32)atomic_read(&tipc_queue_size);
>> if (unlikely(recv_q_len >= OVERLOAD_LIMIT_BASE)) {
>>         if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE))
>>                 return TIPC_ERR_OVERLOAD;
>> }
>> recv_q_len = skb_queue_len(&sk->sk_receive_queue);
>> if (unlikely(recv_q_len >= (OVERLOAD_LIMIT_BASE / 2))) {
>>         if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE / 2))
>>                 return TIPC_ERR_OVERLOAD;
>> }
>> =======================================================
>>
>>
>> 2.1.17. setsockopt() of TIPC 2.0 Programmer's Guide
>> =======================================================
>> TIPC_DEST_DROPPABLE
>> This option governs the handling of messages sent by the socket if the
>> message cannot be delivered to its destination, either because the
>> receiver is congested or because the specified receiver does not
>> exist. If enabled, the message is discarded; otherwise the message is
>> returned to the sender.
>>
>> By default, this option is disabled for SOCK_SEQPACKET and SOCK_STREAM
>> socket types, and enabled for SOCK_RDM and SOCK_DGRAM. This
>> arrangement ensures proper teardown of failed connections when
>> connection-oriented data transfer is used, without increasing the
>> complexity of connectionless data transfer.
>>
>> TIPC_SRC_DROPPABLE
>> This option governs the handling of messages sent by the socket if
>> link congestion occurs. If enabled, the message is discarded;
>> otherwise the system queues the message for later transmission.
>> By default, this option is disabled for SOCK_SEQPACKET, SOCK_STREAM,
>> and SOCK_RDM socket types (resulting in "reliable" data transfer), and
>> enabled for SOCK_DGRAM (resulting in "unreliable" data transfer).
>> =======================================================
>>
>> Now I will try to create an OVERLOAD case and update you soon with my
>> latest observations.
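
To make the quoted guide text concrete, here is a rough sketch of how TIPC_DEST_DROPPABLE could be switched off and read back on a connectionless socket. It is not taken from MDS: the helper names set_dest_droppable(), get_dest_droppable() and open_rdm_socket_returning_rejects() are made up for this example, and it assumes Linux with <linux/tipc.h> installed. Opening the socket only succeeds when the kernel has TIPC support loaded.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/tipc.h>

/* Set TIPC_DEST_DROPPABLE on sd. Returns 0 on success, -1 with errno set. */
static int set_dest_droppable(int sd, int droppable)
{
    return setsockopt(sd, SOL_TIPC, TIPC_DEST_DROPPABLE,
                      &droppable, sizeof(droppable));
}

/* Read TIPC_DEST_DROPPABLE back. Returns 0 or 1, or -1 with errno set. */
static int get_dest_droppable(int sd)
{
    int value = 0;
    socklen_t len = sizeof(value);

    if (getsockopt(sd, SOL_TIPC, TIPC_DEST_DROPPABLE, &value, &len) != 0)
        return -1;
    return value != 0;
}

/*
 * Open a SOCK_RDM TIPC socket and ask TIPC to return undeliverable
 * messages to the sender instead of discarding them.
 * Returns the descriptor, or -1 with errno set.
 */
static int open_rdm_socket_returning_rejects(void)
{
    int saved_errno;
    int sd = socket(AF_TIPC, SOCK_RDM, 0);

    if (sd < 0)
        return -1;
    if (set_dest_droppable(sd, 0) != 0) {
        saved_errno = errno; /* close() may overwrite errno */
        close(sd);
        errno = saved_errno;
        return -1;
    }
    return sd;
}
```

Keeping the option in one small helper makes it easy to flip the setting between test runs when trying to reproduce the OVERLOAD case.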
>>
>> -AVM
>>
>>> Correcting this and adding an abort is not backward compatible, as
>>> some services already handle flow control in some way; only log when
>>> packages are dropped.
>>> Regarding ticket #1960, there are other solutions than introducing
>>> flow control in MDS, e.g. expose an option to the service to choose
>>> connection-oriented or connectionless.
>>> In one case, the problem with dropped messages seems to be related
>>> to intensive logging by MDS.
>>>
>>> /Thanks HansN
>>> -----Original Message-----
>>> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
>>> Sent: 23 August 2016 11:27
>>> To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell
>>> <anders.wid...@ericsson.com>; mathi.naic...@oracle.com
>>> Cc: opensaf-devel@lists.sourceforge.net
>>> Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
>>>
>>> Hi HansN,
>>>
>>> It seems I am missing something; please allow me to understand.
>>>
>>> If I correctly understand your observation:
>>>
>>> With the current OpenSAF code (this #1957 patch NOT applied),
>>> TIPC_DROPPABLE=true by default. When running OpenSAF with that binary
>>> and TIPC_ERR_OVERLOAD occurs, TIPC does not give the errors
>>> TIPC_ERRINFO or TIPC_RETDATA, and the following code in the function
>>> recvfrom_connectionless() is not hit. Is my understanding right?
>>>
>>> ======================================================================
>>>
>>>
>>> *if (anc->cmsg_type == TIPC_ERRINFO) {*
>>>         /* TIPC_ERRINFO - TIPC error code associated with a returned
>>>            data message or a connection termination message so abort */
>>>         m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err :%s", strerror(errno) );
>>>         *abort();*
>>> *} else if (anc->cmsg_type == TIPC_RETDATA) {*
>>>         /* If we set TIPC_DEST_DROPPABLE off messge (configure TIPC to
>>>            return rejected messages to the sender )
>>>            we will hit this when we implement MDS retransmit lost
>>>            messages abort can be replaced with flow control logic*/
>>>         for (i = anc->cmsg_len - sizeof(*anc); i > 0; i--) {
>>>                 m_MDS_LOG_DBG("MDTM: returned byte 0x%02x\n", *cptr);
>>>                 cptr++;
>>>         }
>>>         /* TIPC_RETDATA -The contents of a returned data message so
>>>            abort */
>>>         m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_RETDATA abort err :%s", strerror(errno) );
>>>         *abort();*
>>> }
>>>
>>> ======================================================================
>>>
>>>
>>> -AVM
>>>
>>>
>>> On 8/23/2016 1:08 PM, Hans Nordebäck wrote:
>>>> Hi Mahesh,
>>>>
>>>> Please see response below with [HansN]
>>>>
>>>> /Thanks HansN
>>>>
>>>> -----Original Message-----
>>>> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
>>>> Sent: 23 August 2016 08:25
>>>> To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell
>>>> <anders.wid...@ericsson.com>; mathi.naic...@oracle.com
>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>> Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
>>>>
>>>> Hi HansN
>>>>
>>>> Please see response below with [AVM]
>>>>
>>>> -AVM
>>>>
>>>> On 8/23/2016 11:41 AM, Hans Nordebäck wrote:
>>>>> Hi Mahesh,
>>>>>
>>>>> please see comments below.
>>>>>
>>>>> /Thanks HansN
>>>>>
>>>>>
>>>>> On 08/23/2016 07:21 AM, A V Mahesh wrote:
>>>>>> Hi HansN,
>>>>>>
>>>>>> Let us first discuss the error handling and abort, then we can
>>>>>> come back to the interpretation of whether TIPC currently does or
>>>>>> does not permit an application to send a multicast message with
>>>>>> the "destination droppable" setting disabled.
>>>>>>
>>>>>> Let us disable TIPC_DEST_DROPPABLE, so that TIPC will try to
>>>>>> return an undelivered multicast message to its sender and we can
>>>>>> determine whether the issue is because of TIPC_ERR_OVERLOAD. This
>>>>>> helps in debugging, so that the application may increase
>>>>>> SO_SNDBUF/SO_RCVBUF to reduce the problem.
>>>>>>
>>>>>> But we still need to abort(). The reason is that the current MDS
>>>>>> implementation doesn't have flow control logic (no retry on
>>>>>> error), so an application like AMF can go wrong and the cluster
>>>>>> will go into an unstable/unrecoverable state.
>>>>>>
>>>>> [HansN] In the current implementation messages are dropped silently
>>>>> and no abort is done.
>>>> [AVM] I can see abort(); in the current code. Do you mean abort();
>>>> is not working and the application (amf) is not exiting?
>>>> [HansN] In case of TIPC_DROPPABLE=true and messages are dropped
>>>> (TIPC_ERR_OVERLOAD), no abort is performed, e.g. amfd detects this
>>>> in the msg sanity chk and logs "invalid msg id ..."
>>>> ======================================================================
>>>> if (anc->cmsg_type == TIPC_ERRINFO) {
>>>>         /* TIPC_ERRINFO - TIPC error code associated with a returned
>>>>            data message or a connection termination message so abort */
>>>>         m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err :%s", strerror(errno) );
>>>>         *abort();*
>>>> } else if (anc->cmsg_type == TIPC_RETDATA) {
>>>>         /* If we set TIPC_DEST_DROPPABLE off messge (configure TIPC
>>>>            to return rejected messages to the sender )
>>>>            we will hit this when we implement MDS retransmit lost
>>>>            messages abort can be replaced with flow control logic*/
>>>>         for (i = anc->cmsg_len - sizeof(*anc); i > 0; i--) {
>>>>                 m_MDS_LOG_DBG("MDTM: returned byte 0x%02x\n", *cptr);
>>>>                 cptr++;
>>>>         }
>>>>         /* TIPC_RETDATA -The contents of a returned data message so
>>>>            abort */
>>>>         m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_RETDATA abort err :%s", strerror(errno) );
>>>>         *abort();*
>>>> }
>>>> ======================================================================
>>>>> This patch enables logging
>>>>> when packages are dropped to help in debugging. I don't agree that
>>>>> we should also introduce abort, but instead:
>>>>> 1) Implement a solution to handle dropped packages, ticket #1960
>>>> [AVM] This is nothing but a flow control implementation in MDS; this
>>>> is a future enhancement.
>>>>
>>>>> 2) Investigate why packages may be dropped, the receiving MDS
>>>>> thread is a real time thread and should be able to consume a large
>>>>> amount of incoming messages.
>>>>> E.g. is the receiving MDS thread "live hanging" due to locks, file
>>>>> I/O etc?
>>>>>> This was the reason we haven't gone for it while addressing Ticket
>>>>>> #1227
>>>>>> (https://sourceforge.net/p/opensaf/mailman/message/33207717/)
>>>>>> So currently we don't have any advantage of disabling
>>>>>> TIPC_DEST_DROPPABLE and not allowing multicast messages.
>>>>>>
>>>>>> -AVM
>>>>>>
>>>>>>
>>>>>> On 8/18/2016 2:43 PM, Hans Nordeback wrote:
>>>>>>>  osaf/libs/core/mds/mds_dt_tipc.c |  32 +++++++++++++++++++++++++-------
>>>>>>>  1 files changed, 25 insertions(+), 7 deletions(-)
>>>>>>>
>>>>>>>
>>>>>>> diff --git a/osaf/libs/core/mds/mds_dt_tipc.c b/osaf/libs/core/mds/mds_dt_tipc.c
>>>>>>> --- a/osaf/libs/core/mds/mds_dt_tipc.c
>>>>>>> +++ b/osaf/libs/core/mds/mds_dt_tipc.c
>>>>>>> @@ -320,6 +320,15 @@ uint32_t mdtm_tipc_init(NODE_ID nodeid,
>>>>>>>                  m_MDS_LOG_INFO("MDTM: Successfully set default socket option TIPC_IMP = %d", TIPCIMPORTANCE);
>>>>>>>          }
>>>>>>> +        int droppable = 0;
>>>>>>> +        if (setsockopt(tipc_cb.BSRsock, SOL_TIPC, TIPC_DEST_DROPPABLE, &droppable, sizeof(droppable)) != 0) {
>>>>>>> +                LOG_ER("MDTM: Can't set TIPC_DEST_DROPPABLE to zero err :%s\n", strerror(errno));
>>>>>>> +                m_MDS_LOG_ERR("MDTM: Can't set TIPC_DEST_DROPPABLE to zero err :%s\n", strerror(errno));
>>>>>>> +                osafassert(0);
>>>>>>> +        } else {
>>>>>>> +                m_MDS_LOG_NOTIFY("MDTM: Successfully set TIPC_DEST_DROPPABLE to zero");
>>>>>>> +        }
>>>>>>> +
>>>>>>>          return NCSCC_RC_SUCCESS;
>>>>>>>  }
>>>>>>> @@ -563,6 +572,8 @@ ssize_t recvfrom_connectionless (int sd,
>>>>>>>          unsigned char *cptr;
>>>>>>>          int i;
>>>>>>>          int has_addr;
>>>>>>> +        int anc_data[2];
>>>>>>> +
>>>>>>>          ssize_t sz;
>>>>>>>          has_addr = (from != NULL) && (addrlen != NULL);
>>>>>>> @@ -591,19 +602,26 @@ ssize_t recvfrom_connectionless (int sd,
>>>>>>>                     if the message was sent using a TIPC name or name sequence as the
>>>>>>>                     destination rather than a TIPC port ID So abort for TIPC_ERRINFO and TIPC_RETDATA*/
>>>>>>>                  if (anc->cmsg_type == TIPC_ERRINFO) {
>>>>>>> -                        /* TIPC_ERRINFO - TIPC error code associated with a returned data message or a connection termination message so abort */
>>>>>>> -                        m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err :%s", strerror(errno) );
>>>>>>> -                        abort();
>>>>>>> +                        anc_data[0] = *((unsigned int*)(CMSG_DATA(anc) + 0));
>>>>>>> +                        if (anc_data[0] == TIPC_ERR_OVERLOAD) {
>>>>>>> +                                LOG_CR("MDTM: undelivered message condition ancillary data: TIPC_ERR_OVERLOAD");
>>>>>>> +                                m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERR_OVERLOAD");
>>>>>>> +                        } else {
>>>>>>> +                                /* TIPC_ERRINFO - TIPC error code associated with a returned data message or a connection termination message so abort */
>>>>>>> +                                LOG_CR("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err : %d", anc_data[0]);
>>>>>>> +                                m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err : %d", anc_data[0]);
>>>>>>> +                        }
>>>>>>>                  } else if (anc->cmsg_type == TIPC_RETDATA) {
>>>>>>> -                        /* If we set TIPC_DEST_DROPPABLE off messge (configure TIPC to return rejected messages to the sender )
>>>>>>> +                        /* If we set TIPC_DEST_DROPPABLE off message (configure TIPC to return rejected messages to the sender )
>>>>>>>                             we will hit this when we implement MDS retransmit lost messages abort can be replaced with flow control logic*/
>>>>>>>                          for (i = anc->cmsg_len - sizeof(*anc); i > 0; i--) {
>>>>>>> -                                m_MDS_LOG_DBG("MDTM: returned byte 0x%02x\n", *cptr);
>>>>>>> +                                LOG_CR("MDTM: returned byte 0x%02x\n", *cptr);
>>>>>>> +                                m_MDS_LOG_CRITICAL("MDTM: returned byte 0x%02x\n", *cptr);
>>>>>>>                                  cptr++;
>>>>>>>                          }
>>>>>>>                          /* TIPC_RETDATA -The contents of a returned data message so abort */
>>>>>>> -                        m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_RETDATA abort err :%s", strerror(errno) );
>>>>>>> -                        abort();
>>>>>>> +                        LOG_CR("MDTM: undelivered message condition ancillary data: TIPC_RETDATA");
>>>>>>> +                        m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_RETDATA");
>>>>>>>                  } else if (anc->cmsg_type == TIPC_DESTNAME) {
>>>>>>>                          if (sz == 0) {
>>>>>>>                                  m_MDS_LOG_DBG("MDTM: recd bytes=0 on received on sock, abnormal/unknown condition. Ignoring");
> ------------------------------------------------------------------------------
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel