Hi HansN, It seems I am missing some thing , please allow me to under stand
If I currently understand you observation : With current Opensaf code ( this #1957 patch NOT applied ) , by default TIPC_DROPPABLE=true ,while running Opensaf with that binary when TIPC_ERR_OVERLOAD occurring, TIPC is not given errors TIPC_ERRINFO or TIPC_RETDATA and following code is not being get hit of function recvfrom_connectionless(), is my understanding right ? ============================================================================================================= *if (anc->cmsg_type == TIPC_ERRINFO) {* /* TIPC_ERRINFO - TIPC error code associated with a returned data message or a connection termination message so abort */ m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err :%s", strerror(errno) ); *abort();* *} else if (anc->cmsg_type == TIPC_RETDATA) {* /* If we set TIPC_DEST_DROPPABLE off messge (configure TIPC to return rejected messages to the sender ) we will hit this when we implement MDS retransmit lost messages abort can be replaced with flow control logic*/ for (i = anc->cmsg_len - sizeof(*anc); i > 0; i--) { m_MDS_LOG_DBG("MDTM: returned byte 0x%02x\n", *cptr); cptr++; } /* TIPC_RETDATA -The contents of a returned data message so abort */ m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_RETDATA abort err :%s", strerror(errno) ); *abort();* } ============================================================================================================= -AVM On 8/23/2016 1:08 PM, Hans Nordebäck wrote: > Hi Mahesh, > > Please see response below with [HansN] > /Thanks HansN > > -----Original Message----- > From: A V Mahesh [mailto:mahesh.va...@oracle.com] > Sent: den 23 augusti 2016 08:25 > To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell > <anders.wid...@ericsson.com>; mathi.naic...@oracle.com > Cc: opensaf-devel@lists.sourceforge.net > Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957] > > Hi HansN > > Please see response below with [AVM] > > -AVM > > On 8/23/2016 11:41 AM, Hans Nordebäck wrote: >> Hi Mahesh, >> >> please see comments below. >> >> /Thanks HansN >> >> >> On 08/23/2016 07:21 AM, A V Mahesh wrote: >>> Hi HansN, >>> >>> Let us fist discuss the error handling and abort, then we can come >>> back to interpretation of TIPC currently does permit OR does not >>> permit an application to send a multicast message with the >>> "destination droppable" setting disabled. >>> >>> Let us disable TIPC_DEST_DROPPABLE, so that TIPC will try to return >>> an undelivered multicast message to its sender and we can determine >>> issue is because of TIPC_ERR_OVERLOAD, this helps in debugging , so >>> that application may increased SO_SNDBUF/SO_RCVBUF to reduce the >>> problem. >>> >>> But still we need to abort(), the reason for that is current MDS >>> implementations doesn't have flow control logic ( no retry because of >>> error ) , so Application like AMF can go wrong and cluster will go >>> into unstable/recoverble state. >>> >> [HansN] In the current implementation messages are dropped silently >> and no abort is done. > [AVM] I can see abort(); in current code , you mean abort(); is not working > and application(amf) is not existing ? > [HansN] In case of TIPC_DROPPABLE=true and messages are dropped, > (TIPC_ERR_OVERLOAD) no abort is be performed, e.g > amfd detects this in the msg sanity chk and logs "invalid msg id ..." > ============================================================================ > if (anc->cmsg_type == TIPC_ERRINFO) { > /* TIPC_ERRINFO - TIPC error code associated with a returned data > message or a connection termination message so abort */ > m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary > data: TIPC_ERRINFO abort err :%s", strerror(errno) ); > *abort();* > } else if (anc->cmsg_type == TIPC_RETDATA) { > /* If we set TIPC_DEST_DROPPABLE off messge (configure TIPC to return > rejected messages to the sender ) > we will hit this when we implement MDS retransmit lost messages > abort can be replaced with flow control logic*/ > for (i = anc->cmsg_len - sizeof(*anc); i > 0; i--) { > m_MDS_LOG_DBG("MDTM: returned byte 0x%02x\n", *cptr); > cptr++; > } > /* TIPC_RETDATA -The contents of a returned data message so abort */ > m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary > data: TIPC_RETDATA abort err :%s", strerror(errno) ); > *abort();* > } > ============================================================================ >> This patch enables logging >> when packages are dropped to help in debugging. I don't agree that we >> should also introduce abort, but instead: >> 1) Implement a solution to handle dropped packages, ticket #1960 > [AVM] This is nothing but flow control implementation in MDS, this is future > enhancement > >> 2) Investigate why packages may be dropped, the receiving MDS thread >> is a real time thread and should be able to consume a large amount of >> incoming messages. >> E.g. is the receiving MDS thread "live hanging" due to locks, file I/O >> etc? >>> This was the reason we haven't gone for it while addressing Ticket >>> #1227 (https://sourceforge.net/p/opensaf/mailman/message/33207717/) >>> So currently we don't have any advantage of disabling >>> TIPC_DEST_DROPPABLE and not allowing multicast messages. >>> >>> -AVM >>> >>> >>> On 8/18/2016 2:43 PM, Hans Nordeback wrote: >>>> osaf/libs/core/mds/mds_dt_tipc.c | 32 >>>> +++++++++++++++++++++++++------- >>>> 1 files changed, 25 insertions(+), 7 deletions(-) >>>> >>>> >>>> diff --git a/osaf/libs/core/mds/mds_dt_tipc.c >>>> b/osaf/libs/core/mds/mds_dt_tipc.c >>>> --- a/osaf/libs/core/mds/mds_dt_tipc.c >>>> +++ b/osaf/libs/core/mds/mds_dt_tipc.c >>>> @@ -320,6 +320,15 @@ uint32_t mdtm_tipc_init(NODE_ID nodeid, >>>> m_MDS_LOG_INFO("MDTM: Successfully set default >>>> socket option TIPC_IMP = %d", TIPCIMPORTANCE); >>>> } >>>> + int droppable = 0; >>>> + if (setsockopt(tipc_cb.BSRsock, SOL_TIPC, >>>> TIPC_DEST_DROPPABLE, &droppable, sizeof(droppable)) != 0) { >>>> + LOG_ER("MDTM: Can't set TIPC_DEST_DROPPABLE to zero >>>> err :%s\n", strerror(errno)); >>>> + m_MDS_LOG_ERR("MDTM: Can't set TIPC_DEST_DROPPABLE >>>> to zero err :%s\n", strerror(errno)); >>>> + osafassert(0); >>>> + } else { >>>> + m_MDS_LOG_NOTIFY("MDTM: Successfully set >>>> TIPC_DEST_DROPPABLE to zero"); >>>> + } >>>> + >>>> return NCSCC_RC_SUCCESS; >>>> } >>>> @@ -563,6 +572,8 @@ ssize_t recvfrom_connectionless (int sd, >>>> unsigned char *cptr; >>>> int i; >>>> int has_addr; >>>> + int anc_data[2]; >>>> + >>>> ssize_t sz; >>>> has_addr = (from != NULL) && (addrlen != NULL); @@ -591,19 >>>> +602,26 @@ ssize_t recvfrom_connectionless (int sd, >>>> if the message was sent using a TIPC name or name >>>> sequence as the >>>> destination rather than a TIPC port ID So abort for >>>> TIPC_ERRINFO and TIPC_RETDATA*/ >>>> if (anc->cmsg_type == TIPC_ERRINFO) { >>>> - /* TIPC_ERRINFO - TIPC error code associated with a >>>> returned data message or a connection termination message so abort */ >>>> - m_MDS_LOG_CRITICAL("MDTM: undelivered message >>>> condition ancillary data: TIPC_ERRINFO abort err :%s", >>>> strerror(errno) ); >>>> - abort(); >>>> + anc_data[0] = *((unsigned int*)(CMSG_DATA(anc) + 0)); >>>> + if (anc_data[0] == TIPC_ERR_OVERLOAD) { >>>> + LOG_CR("MDTM: undelivered message condition >>>> ancillary data: TIPC_ERR_OVERLOAD"); >>>> + m_MDS_LOG_CRITICAL("MDTM: undelivered message >>>> condition ancillary data: TIPC_ERR_OVERLOAD"); >>>> + } else { >>>> + /* TIPC_ERRINFO - TIPC error code associated >>>> with a returned data message or a connection termination message so >>>> abort */ >>>> + LOG_CR("MDTM: undelivered message condition >>>> ancillary data: TIPC_ERRINFO abort err : %d", anc_data[0]); >>>> + m_MDS_LOG_CRITICAL("MDTM: undelivered message >>>> condition ancillary data: TIPC_ERRINFO abort err : %d", >>>> anc_data[0]); >>>> + } >>>> } else if (anc->cmsg_type == TIPC_RETDATA) { >>>> - /* If we set TIPC_DEST_DROPPABLE off messge >>>> (configure TIPC to return rejected messages to the sender ) >>>> + /* If we set TIPC_DEST_DROPPABLE off message >>>> (configure TIPC to return rejected messages to the sender ) >>>> we will hit this when we implement MDS >>>> retransmit lost messages abort can be replaced with flow control >>>> logic*/ >>>> for (i = anc->cmsg_len - sizeof(*anc); i > 0; i--) { >>>> - m_MDS_LOG_DBG("MDTM: returned byte 0x%02x\n", >>>> *cptr); >>>> + LOG_CR("MDTM: returned byte 0x%02x\n", *cptr); >>>> + m_MDS_LOG_CRITICAL("MDTM: returned byte >>>> 0x%02x\n", *cptr); >>>> cptr++; >>>> } >>>> /* TIPC_RETDATA -The contents of a returned data >>>> message so abort */ >>>> - m_MDS_LOG_CRITICAL("MDTM: undelivered message >>>> condition ancillary data: TIPC_RETDATA abort err :%s", >>>> strerror(errno) ); >>>> - abort(); >>>> + LOG_CR("MDTM: undelivered message condition >>>> ancillary data: TIPC_RETDATA"); >>>> + m_MDS_LOG_CRITICAL("MDTM: undelivered message >>>> condition ancillary data: TIPC_RETDATA"); >>>> } else if (anc->cmsg_type == TIPC_DESTNAME) { >>>> if (sz == 0) { >>>> m_MDS_LOG_DBG("MDTM: recd bytes=0 on received >>>> on sock, abnormal/unknown condition. Ignoring"); ------------------------------------------------------------------------------ _______________________________________________ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel