Re: [devel] [PATCH 1/1] mds: fix sender take very long time to send all messages [#3119]

Minh Hon Chau Tue, 12 Nov 2019 19:05:56 -0800

Hi Thuan,

Please see comment inline.


Thanks

Minh

On 13/11/19 1:11 pm, Tran Thuan wrote:

Hi Minh,

Thanks for comments, please check my replies inline.

Best Regards,
ThuanTr

-----Original Message-----
From: Minh Hon Chau <minh.c...@dektech.com.au>
Sent: Wednesday, November 13, 2019 7:47 AM
To: thuan.tran <thuan.t...@dektech.com.au>; 'Nguyen Minh Vu' 
<vu.m.ngu...@dektech.com.au>; gary....@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] mds: fix sender take very long time to send all 
messages [#3119]

Hi Thuan,

Some comments inline.

Thanks

Minh

On 12/11/19 5:04 pm, thuan.tran wrote:

When overload happens, sender will wait for chunkAck to continue
sending more messages, it should send number of message equal chunkAck
size of receiver. If not, receiver don't receive enough messages to send
chunkAck and wait until timer timeout to send chunkAck to sender.
This loop will make sender take very long time to sending all messages.
---
   src/mds/mds_tipc_fctrl_portid.cc | 30 +++++++-----------------------
   1 file changed, 7 insertions(+), 23 deletions(-)

diff --git a/src/mds/mds_tipc_fctrl_portid.cc b/src/mds/mds_tipc_fctrl_portid.cc
index 3704baddb..1fff4c855 100644
--- a/src/mds/mds_tipc_fctrl_portid.cc
+++ b/src/mds/mds_tipc_fctrl_portid.cc
@@ -190,6 +190,7 @@ uint32_t TipcPortId::Queue(const uint8_t* data, uint16_t 
length,
           sndwnd_.acked_.v(), sndwnd_.send_.v(), sndwnd_.nacked_space_);
     } else {
       ++sndwnd_.send_;
+    sndwnd_.nacked_space_ += length;

[Minh] We haven't sent the msg out to wait for ack, thus nacked_space_
should not be increased

       m_MDS_LOG_NOTIFY("FCTRL: [me] --> [node:%x, ref:%u], "
           "QueData[mseq:%u, mfrag:%u, fseq:%u, len:%u], "
           "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "]",
@@ -444,32 +445,29 @@ void TipcPortId::ReceiveChunkAck(uint16_t fseq, uint16_t 
chksize) {
       // the nacked_space_ of sender
       uint64_t acked_bytes = sndqueue_.Erase(Seq16(fseq) - (chksize-1),
           Seq16(fseq));
+    assert(sndwnd_.nacked_space_ >= acked_bytes);
       sndwnd_.nacked_space_ -= acked_bytes;

// try to send a few pending msg

       DataMessage* msg = nullptr;
-    uint64_t resend_bytes = 0;
-    while (resend_bytes < acked_bytes) {
+    uint16_t send_msg_cnt = 0;
+    while (send_msg_cnt++ < chunk_size_) {
         // find the lowest sequence unsent yet
         msg = sndqueue_.FirstUnsent();
         if (msg == nullptr) {
           break;
         } else {
-        if (resend_bytes < acked_bytes) {
             if (Send(msg->msg_data_, msg->header_.msg_len_) == 
NCSCC_RC_SUCCESS) {
-            sndwnd_.nacked_space_ += msg->header_.msg_len_;

[Minh] We now send it out and wait for acked, thus the nacked_space_ is
increased here, so any reason moving the nacked_space_ from Queue() to here?
[Thuan] Because the message could be in sndwnd (resend) either in sndqueue 
(send)
Cannot increase nacked_space with resend message.
I have tried another way to increase/decrease nacked_space dynamic
but it become complex with markUnsent() since sender may receiver Nack for same 
msg > 2 times.

[Minh] OK.

               msg->is_sent_ = true;
-            resend_bytes += msg->header_.msg_len_;
               m_MDS_LOG_NOTIFY("FCTRL: [me] --> [node:%x, ref:%u], "
                   "SndQData[fseq:%u, len:%u], "
                   "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "]",
                   id_.node, id_.ref,
                   msg->header_.fseq_, msg->header_.msg_len_,
                   sndwnd_.acked_.v(), sndwnd_.send_.v(), 
sndwnd_.nacked_space_);
+          } else {
+            break;
             }
-        } else {
-          break;
-        }
         }
       }
       // no more unsent message, back to kEnabled

[Minh] Agree, the new strategy to resend with chunk_size_ is better than
with acked_bytes, it will increase transmission rate and not to depend
on the timer
[Thuan] Thanks

@@ -502,26 +500,12 @@ void TipcPortId::ReceiveNack(uint32_t mseq, uint16_t 
mfrag,
           fseq);
       return;
     }
-  if (state_ == State::kRcvBuffOverflow) {
-    sndqueue_.MarkUnsentFrom(Seq16(fseq));
-    if (Seq16(fseq) - sndwnd_.acked_ > 1) {
-      m_MDS_LOG_ERR("FCTRL: [me] <-- [node:%x, ref:%u], "
-          "RcvNack[fseq:%u], "
-          "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "], "
-          "queue[size:%" PRIu64 "], "
-          "Warning[Ignore Nack]",
-          id_.node, id_.ref, fseq,
-          sndwnd_.acked_.v(), sndwnd_.send_.v(), sndwnd_.nacked_space_,
-          sndqueue_.Size());
-      return;
-    }
-  }
     if (state_ != State::kRcvBuffOverflow) {
       state_ = State::kRcvBuffOverflow;
       m_MDS_LOG_NOTIFY("FCTRL: [node:%x, ref:%u] --> Overflow ",
           id_.node, id_.ref);
-    sndqueue_.MarkUnsentFrom(Seq16(fseq));
     }
+  sndqueue_.MarkUnsentFrom(Seq16(fseq));

[Minh] I have a doubt with this change in ReceiveNack(), so every Nack
will trigger a retransmission on the Nacked sequence even though we are
already in kRcvBufferOverFlow state. This will increase the "unexpected
retransmission" error rate. On reception of 2nd-Nack, 3rd-Nack, .... we
already moved into kRcvBufferOverFlow state, we don't need to resend the
2nd-Nack, 3rd-Nack as we already did at the 1st-Nack. Only mark it as
Unsent, the actual retransmission of 2nd-Nack, 3rd-Nack, .... is done in
the loop ReceiveChunkAck() as you have improved in this patch, that will
keep msg in order at receivers. So any reason for this change?

[Thuan] Since sender send a chunk ack number of message, receiver may
Get msg 1 -> but drop msg 2 (still not send chunk ack to sender).
Sender receive Nack of msg 2 which fseq - sndwnd_.acked > 1 but sender
Still need resend the msg 2, in current code return without resend.

[Minh] sender will resend msg 2 if this is the first nack (state is notkRcvBufferOverflow), there is a state associated with the retransmissionof the nack

Yes, as sender send a mount of messages then it could get Nack of msg 2 > 2 
times.
It increase "unexpected retransmission" error at receiver but receiver will 
ignore these errors.

[Minh] That's we want to avoid, if we know the msg will be for sure ignored.

     DataMessage* msg = sndqueue_.Find(Seq16(fseq));
     if (msg != nullptr) {
       // Resend the msg found



_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Re: [devel] [PATCH 1/1] mds: fix sender take very long time to send all messages [#3119]

Reply via email to