Hi Mahesh,

I have not tested this, but the following should work:

- Set BSRsock TIPC_IMPORTANCE to TIPC_LOW_IMPORTANCE

- set socket receive buffer to a small value:

   optval = "small socket receive buffer size", e.g. 5000?

   setsockopt(tipc_cb.BSRsock, SOL_SOCKET, SO_RCVBUF, &optval, optlen)

-  sysctl -w net.tipc.tipc_rmem="5000 40000000 68240400" (or smaller values)

- add some delays when processing messages in 
mdtm_process_recv_events(), to provoke overloading the socket receive 
buffer.
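
A rough, untested sketch of the socket-side steps above, assuming a Linux 
host with <linux/tipc.h> available. The helper names 
(mds_test_set_low_importance, mds_test_shrink_rcvbuf, mds_test_delay_ms) 
are made up for illustration and are not existing MDS functions. Note 
that Linux doubles the value passed to SO_RCVBUF and enforces a minimum, 
so the sketch reads the effective size back instead of trusting the 
request. How strictly TIPC applies that limit on a given kernel version 
should be verified.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* nanosleep() under strict -std=c99/c11 */
#endif

#include <sys/socket.h>
#include <linux/tipc.h> /* TIPC_IMPORTANCE, TIPC_LOW_IMPORTANCE */
#include <time.h>

#ifndef SOL_TIPC
#define SOL_TIPC 271 /* fallback for libc headers that lack it */
#endif

/* Hypothetical helper: mark messages sent on sd as low importance, so
 * that a congested receiver starts rejecting them earlier.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int mds_test_set_low_importance(int sd)
{
    int imp = TIPC_LOW_IMPORTANCE;
    return setsockopt(sd, SOL_TIPC, TIPC_IMPORTANCE, &imp, sizeof(imp));
}

/* Hypothetical helper: request a small receive buffer and return the
 * size the kernel actually applied, or -1 on error. */
int mds_test_shrink_rcvbuf(int sd, int bytes)
{
    int effective = 0;
    socklen_t len = sizeof(effective);

    if (setsockopt(sd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) != 0)
        return -1;
    if (getsockopt(sd, SOL_SOCKET, SO_RCVBUF, &effective, &len) != 0)
        return -1;
    return effective;
}

/* Hypothetical helper: artificial delay for the receive path, used only
 * to make the receiver slower than the senders during the test.
 * Returns 0 on success, -1 if the sleep was interrupted. */
int mds_test_delay_ms(long ms)
{
    struct timespec ts;
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    return nanosleep(&ts, NULL);
}
```

For the test, the first two helpers would be called on tipc_cb.BSRsock 
after the socket is set up in mdtm_tipc_init(), and mds_test_delay_ms() 
at the start of mdtm_process_recv_events(). None of this is meant for 
production code.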

We have seen dropped packets in a 75-node system; as a workaround, 
increasing the default socket receive buffer size seems to work for that 
setup.

/Thanks HansN

On 09/01/2016 05:50 AM, A V Mahesh wrote:
> Hi HansN,
>
> Do you have any tips on how to create an overload case?
>
> I would like to test and observe the TIPC_DEST_DROPPABLE enabled and 
> disabled cases.
>
> -AVM
>
>
> On 9/1/2016 9:12 AM, A V Mahesh wrote:
>> Hi HansN,
>>
>> Sorry for the delay.
>>
>> I will test it and get back to you soon.
>>
>> -AVM
>>
>>
>> On 8/31/2016 4:29 PM, Hans Nordebäck wrote:
>>> Hi Mahesh,
>>> Any updates on this?
>>>
>>> /Regards HansN
>>>
>>> -----Original Message-----
>>> From: Anders Widell
>>> Sent: den 25 augusti 2016 13:11
>>> To: A V Mahesh <mahesh.va...@oracle.com>; Hans Nordebäck 
>>> <hans.nordeb...@ericsson.com>; mathi.naic...@oracle.com
>>> Cc: opensaf-devel@lists.sourceforge.net
>>> Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
>>>
>>> Hi!
>>>
>>> This is what the TIPC user documentation says about 
>>> TIPC_DEST_DROPPABLE:
>>> "This option governs the handling of messages sent by the socket if 
>>> the message cannot be delivered to its destination, either because 
>>> the receiver is congested or because the specified receiver does not 
>>> exist.
>>> If enabled, the message is discarded; otherwise the message is 
>>> returned to the sender."
>>>
>>> This is what the TIPC user documentation says about the return value 
>>> from the recvmsg() system call: "When used with a connectionless 
>>> socket, a return value of 0 indicates the arrival of a returned data 
>>> message that was originally sent by this socket."
>>>
>>> I think the documentation is pretty clear. If you set 
>>> TIPC_DEST_DROPPABLE to true, the receiver can discard messages e.g. 
>>> when the receive buffer is full. The sender will not be notified in 
>>> this case. If TIPC_DEST_DROPPABLE is set to false, the message will 
>>> be returned to the sender in case of a full receive buffer. The 
>>> sender knows that it has received such a returned message when the 
>>> recvmsg() call returns zero.
>>>
>>> regards,
>>> Anders Widell
>>>
>>> On 08/25/2016 11:30 AM, A V Mahesh wrote:
>>>> Hi HansN,
>>>>
>>>>
>>>> On 8/23/2016 5:22 PM, Hans Nordebäck wrote:
>>>>
>>>>> Hi Mahesh,
>>>>>
>>>>> Yes, this is my understanding too: if TIPC_DROPPABLE = true, TIPC may
>>>>> drop messages silently when the receive socket buffer is full, and
>>>>> does not return any ancillary message.
>>>>> If TIPC_DROPPABLE = false, TIPC may drop the message but will send an
>>>>> ancillary message to inform about TIPC_ERR_OVERLOAD.
>>>> [AVM]
>>>>
>>>> My observations and understanding are different. Based on the TIPC code
>>>> and the Linux TIPC 2.0 Programmer's Guide, the TIPC_ERR_OVERLOAD error is
>>>> returned when TIPC is unable to enqueue an incoming message on the
>>>> receiving socket's receive queue, irrespective of whether
>>>> TIPC_DEST_DROPPABLE is enabled or disabled.
>>>>
>>>> The only difference between TIPC_DEST_DROPPABLE enabled and disabled is
>>>> this: if TIPC_DEST_DROPPABLE is enabled, the message is discarded, the
>>>> size returned by recvmsg() is ZERO, and the application will get errors;
>>>> if TIPC_DEST_DROPPABLE is disabled, the message is returned to the
>>>> sender, which means the size returned by recvmsg() is the size of the
>>>> data the user sent, and the application will get errors.
>>>>
>>>> I checked the TIPC code and documentation and have not found any
>>>> evidence that the TIPC_ERR_OVERLOAD error code will be sent only if
>>>> TIPC_DEST_DROPPABLE = false.
>>>>
>>>> Even while testing #1227
>>>> (https://sourceforge.net/p/opensaf/mailman/message/33207717/), my
>>>> observation and understanding was that an individual TIPC socket is only
>>>> allowed to queue up OVERLOAD_LIMIT_BASE/2 messages of the lowest
>>>> importance level before it starts rejecting them.
>>>> Once a socket's receive queue length exceeds the maximum limit, the
>>>> receiving socket sends out a reject message with the TIPC_ERR_OVERLOAD
>>>> error code, with cmsg_type TIPC_ERRINFO/TIPC_RETDATA; the TIPC code and
>>>> the Linux TIPC 2.0 Programmer's Guide confirm the same.
>>>>
>>>> tipc/socket.c
>>>> =======================================================
>>>> /* Reject message if there isn't room to queue it */
>>>>
>>>> recv_q_len = (u32)atomic_read(&tipc_queue_size);
>>>> if (unlikely(recv_q_len >= OVERLOAD_LIMIT_BASE)) {
>>>>      if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE))
>>>>          return TIPC_ERR_OVERLOAD;
>>>> }
>>>> recv_q_len = skb_queue_len(&sk->sk_receive_queue);
>>>> if (unlikely(recv_q_len >= (OVERLOAD_LIMIT_BASE / 2))) {
>>>>      if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE / 2))
>>>>          return TIPC_ERR_OVERLOAD;
>>>> }
>>>> =======================================================
>>>>
>>>>
>>>> 2.1.17. setsockopt() of  TIPC 2.0 Programmer's Guide
>>>> =======================================================
>>>> TIPC_DEST_DROPPABLE
>>>> This option governs the handling of messages sent by the socket if the
>>>> message cannot be delivered to its destination, either because the
>>>> receiver is congested or because the specified receiver does not
>>>> exist. If enabled, the message is discarded; otherwise the message is
>>>> returned to the sender.
>>>>
>>>> By default, this option is disabled for SOCK_SEQPACKET and SOCK_STREAM
>>>> socket types, and enabled for SOCK_RDM and SOCK_DGRAM, This
>>>> arrangement ensures proper teardown of failed connections when
>>>> connection-oriented data transfer is used, without increasing the
>>>> complexity of connectionless data transfer.
>>>>
>>>> TIPC_SRC_DROPPABLE
>>>> This option governs the handling of messages sent by the socket if
>>>> link congestion occurs. If enabled, the message is discarded;
>>>> otherwise the system queues the message for later transmission.
>>>> By default, this option is disabled for SOCK_SEQPACKET, SOCK_STREAM,
>>>> and SOCK_RDM socket types (resulting in "reliable" data transfer), and
>>>> enabled for SOCK_DGRAM (resulting in "unreliable" data transfer).
>>>> =======================================================
>>>>
>>>> I will now try to create an OVERLOAD case and update you soon with my
>>>> latest observations.
>>>>
>>>> -AVM
>>>>
>>>>> Correcting this and adding an abort is not backward compatible, as
>>>>> some services already handle flow control in some way; only log when
>>>>> packages are dropped.
>>>>> Regarding ticket #1960, there are other solutions than introducing
>>>>> flow control in MDS, e.g. exposing an option to the service to choose
>>>>> connection-oriented or connectionless.
>>>>> The problem with dropped messages seems in one case to be related to
>>>>> intensive MDS logging (by MDS).
>>>>>
>>>>> /Thanks HansN
>>>>> -----Original Message-----
>>>>> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
>>>>> Sent: den 23 augusti 2016 11:27
>>>>> To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell
>>>>> <anders.wid...@ericsson.com>; mathi.naic...@oracle.com
>>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>>> Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
>>>>>
>>>>> Hi HansN,
>>>>>
>>>>> It seems I am missing something; please help me understand.
>>>>>
>>>>> If I understand your observation correctly:
>>>>>
>>>>> With the current OpenSAF code (this #1957 patch NOT applied), where by
>>>>> default TIPC_DROPPABLE=true, when TIPC_ERR_OVERLOAD occurs while running
>>>>> OpenSAF with that binary, TIPC does not report TIPC_ERRINFO or
>>>>> TIPC_RETDATA, and the following code in recvfrom_connectionless() is
>>>>> not hit. Is my understanding right?
>>>>>
>>>>> =====================================================================
>>>>>
>>>>> if (anc->cmsg_type == TIPC_ERRINFO) {
>>>>>        /* TIPC_ERRINFO - TIPC error code associated with a returned
>>>>>           data message or a connection termination message  so abort */
>>>>>        m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err :%s", strerror(errno) );
>>>>>        abort();
>>>>> } else if (anc->cmsg_type == TIPC_RETDATA) {
>>>>>        /* If we set TIPC_DEST_DROPPABLE off messge (configure TIPC to
>>>>>           return rejected messages to the sender )
>>>>>           we will hit this when we implement MDS retransmit lost
>>>>>           messages abort can be replaced with flow control logic*/
>>>>>        for (i = anc->cmsg_len - sizeof(*anc); i > 0; i--) {
>>>>>            m_MDS_LOG_DBG("MDTM: returned byte 0x%02x\n", *cptr);
>>>>>            cptr++;
>>>>>        }
>>>>>        /* TIPC_RETDATA -The contents of a returned data message so abort */
>>>>>        m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_RETDATA abort err :%s", strerror(errno) );
>>>>>        abort();
>>>>> }
>>>>>
>>>>> =====================================================================
>>>>>
>>>>>
>>>>> -AVM
>>>>>
>>>>>
>>>>> On 8/23/2016 1:08 PM, Hans Nordebäck wrote:
>>>>>> Hi Mahesh,
>>>>>>
>>>>>> Please see response below with [HansN] /Thanks HansN
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
>>>>>> Sent: den 23 augusti 2016 08:25
>>>>>> To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell
>>>>>> <anders.wid...@ericsson.com>; mathi.naic...@oracle.com
>>>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>>>> Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
>>>>>>
>>>>>> Hi HansN
>>>>>>
>>>>>> Please see response below with [AVM]
>>>>>>
>>>>>> -AVM
>>>>>>
>>>>>> On 8/23/2016 11:41 AM, Hans Nordebäck wrote:
>>>>>>> Hi Mahesh,
>>>>>>>
>>>>>>> please see comments below.
>>>>>>>
>>>>>>> /Thanks HansN
>>>>>>>
>>>>>>>
>>>>>>> On 08/23/2016 07:21 AM, A V Mahesh wrote:
>>>>>>>> Hi HansN,
>>>>>>>>
>>>>>>>> Let us first discuss the error handling and abort; then we can come
>>>>>>>> back to the interpretation of whether TIPC currently does or does
>>>>>>>> not permit an application to send a multicast message with the
>>>>>>>> "destination droppable" setting disabled.
>>>>>>>>
>>>>>>>> Let us disable TIPC_DEST_DROPPABLE, so that TIPC will try to
>>>>>>>> return an undelivered multicast message to its sender and we can
>>>>>>>> determine whether the issue is caused by TIPC_ERR_OVERLOAD. This
>>>>>>>> helps in debugging, so that the application may increase
>>>>>>>> SO_SNDBUF/SO_RCVBUF to reduce the problem.
>>>>>>>>
>>>>>>>> But we still need to abort(). The reason is that the current MDS
>>>>>>>> implementation doesn't have flow control logic (no retry on
>>>>>>>> error), so applications like AMF can go wrong and the cluster will
>>>>>>>> go into an unstable/recoverable state.
>>>>>>>>
>>>>>>> [HansN] In the current implementation messages are dropped silently
>>>>>>> and no abort is done.
>>>>>> [AVM]  I can see abort(); in the current code. Do you mean abort(); is
>>>>>> not working and the application (amf) is not exiting?
>>>>>> [HansN] In case of TIPC_DROPPABLE=true and messages are dropped
>>>>>> (TIPC_ERR_OVERLOAD), no abort is performed; e.g. amfd detects this
>>>>>> in the msg sanity check and logs "invalid msg id ..."
>>>>>> ====================================================================
>>>>>> if (anc->cmsg_type == TIPC_ERRINFO) {
>>>>>>         /* TIPC_ERRINFO - TIPC error code associated with a returned
>>>>>>            data message or a connection termination message  so abort */
>>>>>>         m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err :%s", strerror(errno) );
>>>>>>         abort();
>>>>>> } else if (anc->cmsg_type == TIPC_RETDATA) {
>>>>>>         /* If we set TIPC_DEST_DROPPABLE off messge (configure TIPC
>>>>>>            to return rejected messages to the sender )
>>>>>>            we will hit this when we implement MDS retransmit lost
>>>>>>            messages abort can be replaced with flow control logic*/
>>>>>>         for (i = anc->cmsg_len - sizeof(*anc); i > 0; i--) {
>>>>>>             m_MDS_LOG_DBG("MDTM: returned byte 0x%02x\n", *cptr);
>>>>>>             cptr++;
>>>>>>         }
>>>>>>         /* TIPC_RETDATA -The contents of a returned data message  so abort */
>>>>>>         m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_RETDATA abort err :%s", strerror(errno) );
>>>>>>         abort();
>>>>>> }
>>>>>> ====================================================================
>>>>>>> This patch enables logging
>>>>>>> when packages are dropped to help in debugging. I don't agree that
>>>>>>> we should also introduce abort, but instead:
>>>>>>> 1) Implement a solution to handle dropped packages, ticket #1960
>>>>>> [AVM]  This is nothing but flow control implementation in MDS, this
>>>>>> is future enhancement
>>>>>>
>>>>>>> 2) Investigate why packages may be dropped, the receiving MDS
>>>>>>> thread is a real time thread and should be able to consume a large
>>>>>>> amount of incoming messages.
>>>>>>> E.g. is the receiving MDS thread "live hanging" due to locks, file
>>>>>>> I/O etc?
>>>>>>>> This was the reason we haven't gone for it while addressing Ticket
>>>>>>>> #1227
>>>>>>>> (https://sourceforge.net/p/opensaf/mailman/message/33207717/)
>>>>>>>> So currently we don't have any advantage of disabling
>>>>>>>> TIPC_DEST_DROPPABLE and not allowing multicast messages.
>>>>>>>>
>>>>>>>> -AVM
>>>>>>>>
>>>>>>>>
>>>>>>>> On 8/18/2016 2:43 PM, Hans Nordeback wrote:
>>>>>>>>> osaf/libs/core/mds/mds_dt_tipc.c |  32
>>>>>>>>> +++++++++++++++++++++++++-------
>>>>>>>>>      1 files changed, 25 insertions(+), 7 deletions(-)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> diff --git a/osaf/libs/core/mds/mds_dt_tipc.c
>>>>>>>>> b/osaf/libs/core/mds/mds_dt_tipc.c
>>>>>>>>> --- a/osaf/libs/core/mds/mds_dt_tipc.c
>>>>>>>>> +++ b/osaf/libs/core/mds/mds_dt_tipc.c
>>>>>>>>> @@ -320,6 +320,15 @@ uint32_t mdtm_tipc_init(NODE_ID nodeid,
>>>>>>>>>                      m_MDS_LOG_INFO("MDTM: Successfully set
>>>>>>>>> default socket option TIPC_IMP = %d", TIPCIMPORTANCE);
>>>>>>>>>              }
>>>>>>>>>      +        int droppable = 0;
>>>>>>>>> +        if (setsockopt(tipc_cb.BSRsock, SOL_TIPC,
>>>>>>>>> TIPC_DEST_DROPPABLE, &droppable, sizeof(droppable)) != 0) {
>>>>>>>>> +                LOG_ER("MDTM: Can't set TIPC_DEST_DROPPABLE to
>>>>>>>>> + zero
>>>>>>>>> err :%s\n", strerror(errno));
>>>>>>>>> +                m_MDS_LOG_ERR("MDTM: Can't set
>>>>>>>>> + TIPC_DEST_DROPPABLE
>>>>>>>>> to zero err :%s\n", strerror(errno));
>>>>>>>>> +                osafassert(0);
>>>>>>>>> +        } else {
>>>>>>>>> +                m_MDS_LOG_NOTIFY("MDTM: Successfully set
>>>>>>>>> TIPC_DEST_DROPPABLE to zero");
>>>>>>>>> +        }
>>>>>>>>> +
>>>>>>>>>          return NCSCC_RC_SUCCESS;
>>>>>>>>>      }
>>>>>>>>>      @@ -563,6 +572,8 @@ ssize_t recvfrom_connectionless (int sd,
>>>>>>>>>          unsigned char *cptr;
>>>>>>>>>          int i;
>>>>>>>>>          int has_addr;
>>>>>>>>> +    int anc_data[2];
>>>>>>>>> +
>>>>>>>>>          ssize_t sz;
>>>>>>>>>            has_addr = (from != NULL) && (addrlen != NULL); @@
>>>>>>>>> -591,19
>>>>>>>>> +602,26 @@ ssize_t recvfrom_connectionless (int sd,
>>>>>>>>>                     if the message was sent using a TIPC name or
>>>>>>>>> name sequence as the
>>>>>>>>>                     destination rather than a TIPC port ID So
>>>>>>>>> abort for TIPC_ERRINFO and TIPC_RETDATA*/
>>>>>>>>>                  if (anc->cmsg_type == TIPC_ERRINFO) {
>>>>>>>>> -                /* TIPC_ERRINFO - TIPC error code associated 
>>>>>>>>> with a
>>>>>>>>> returned data message or a connection termination message  so
>>>>>>>>> abort */
>>>>>>>>> -                m_MDS_LOG_CRITICAL("MDTM: undelivered message
>>>>>>>>> condition ancillary data: TIPC_ERRINFO abort err :%s",
>>>>>>>>> strerror(errno) );
>>>>>>>>> -                abort();
>>>>>>>>> +                anc_data[0] = *((unsigned int*)(CMSG_DATA(anc) +
>>>>>>>>> 0));
>>>>>>>>> +                if (anc_data[0] == TIPC_ERR_OVERLOAD) {
>>>>>>>>> +                    LOG_CR("MDTM: undelivered message condition
>>>>>>>>> ancillary data: TIPC_ERR_OVERLOAD");
>>>>>>>>> +                    m_MDS_LOG_CRITICAL("MDTM: undelivered
>>>>>>>>> + message
>>>>>>>>> condition ancillary data: TIPC_ERR_OVERLOAD");
>>>>>>>>> +                } else {
>>>>>>>>> +                    /* TIPC_ERRINFO - TIPC error code associated
>>>>>>>>> with a returned data message or a connection termination message
>>>>>>>>> so abort */
>>>>>>>>> +                    LOG_CR("MDTM: undelivered message condition
>>>>>>>>> ancillary data: TIPC_ERRINFO abort err : %d", anc_data[0]);
>>>>>>>>> +                    m_MDS_LOG_CRITICAL("MDTM: undelivered
>>>>>>>>> + message
>>>>>>>>> condition ancillary data: TIPC_ERRINFO abort err : %d",
>>>>>>>>> anc_data[0]);
>>>>>>>>> +                }
>>>>>>>>>                  } else if (anc->cmsg_type == TIPC_RETDATA) {
>>>>>>>>> -                /* If we set TIPC_DEST_DROPPABLE off messge
>>>>>>>>> (configure TIPC to return rejected messages to the sender )
>>>>>>>>> +                /* If we set TIPC_DEST_DROPPABLE off message
>>>>>>>>> (configure TIPC to return rejected messages to the sender )
>>>>>>>>>                         we will hit this when we implement MDS
>>>>>>>>> retransmit lost messages  abort can be replaced with flow control
>>>>>>>>> logic*/
>>>>>>>>>                      for (i = anc->cmsg_len - sizeof(*anc); i 
>>>>>>>>> > 0;
>>>>>>>>> i--) {
>>>>>>>>> -                    m_MDS_LOG_DBG("MDTM: returned byte 
>>>>>>>>> 0x%02x\n",
>>>>>>>>> *cptr);
>>>>>>>>> +                    LOG_CR("MDTM: returned byte 0x%02x\n", 
>>>>>>>>> *cptr);
>>>>>>>>> +                    m_MDS_LOG_CRITICAL("MDTM: returned byte
>>>>>>>>> 0x%02x\n", *cptr);
>>>>>>>>>                          cptr++;
>>>>>>>>>                      }
>>>>>>>>>                      /* TIPC_RETDATA -The contents of a returned
>>>>>>>>> data message  so abort */
>>>>>>>>> -                m_MDS_LOG_CRITICAL("MDTM: undelivered message
>>>>>>>>> condition ancillary data: TIPC_RETDATA abort err :%s",
>>>>>>>>> strerror(errno) );
>>>>>>>>> -                abort();
>>>>>>>>> +                LOG_CR("MDTM: undelivered message condition
>>>>>>>>> ancillary data: TIPC_RETDATA");
>>>>>>>>> +                m_MDS_LOG_CRITICAL("MDTM: undelivered message
>>>>>>>>> condition ancillary data: TIPC_RETDATA");
>>>>>>>>>                  } else if (anc->cmsg_type == TIPC_DESTNAME) {
>>>>>>>>>                      if (sz == 0) {
>>>>>>>>>                          m_MDS_LOG_DBG("MDTM: recd bytes=0 on
>>>>>>>>> received on sock, abnormal/unknown  condition. Ignoring");
>>>
>>
>


