Re: [devel] [PATCH 1/1] mds: fix sender take very long time to send all messages [#3119]

Tran Thuan Tue, 12 Nov 2019 18:13:04 -0800

Hi Minh,

Thanks for comments, please check my replies inline.


Best Regards,
ThuanTr

-----Original Message-----
From: Minh Hon Chau <minh.c...@dektech.com.au> 
Sent: Wednesday, November 13, 2019 7:47 AM
To: thuan.tran <thuan.t...@dektech.com.au>; 'Nguyen Minh Vu' 
<vu.m.ngu...@dektech.com.au>; gary....@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] mds: fix sender take very long time to send all 
messages [#3119]

Hi Thuan,

Some comments inline.

Thanks

Minh

On 12/11/19 5:04 pm, thuan.tran wrote:
> When overload happens, sender will wait for chunkAck to continue
> sending more messages, it should send number of message equal chunkAck
> size of receiver. If not, receiver don't receive enough messages to send
> chunkAck and wait until timer timeout to send chunkAck to sender.
> This loop will make sender take very long time to sending all messages.
> ---
>   src/mds/mds_tipc_fctrl_portid.cc | 30 +++++++-----------------------
>   1 file changed, 7 insertions(+), 23 deletions(-)
>
> diff --git a/src/mds/mds_tipc_fctrl_portid.cc 
> b/src/mds/mds_tipc_fctrl_portid.cc
> index 3704baddb..1fff4c855 100644
> --- a/src/mds/mds_tipc_fctrl_portid.cc
> +++ b/src/mds/mds_tipc_fctrl_portid.cc
> @@ -190,6 +190,7 @@ uint32_t TipcPortId::Queue(const uint8_t* data, uint16_t 
> length,
>           sndwnd_.acked_.v(), sndwnd_.send_.v(), sndwnd_.nacked_space_);
>     } else {
>       ++sndwnd_.send_;
> +    sndwnd_.nacked_space_ += length;
[Minh] We haven't sent the msg out to wait for ack, thus nacked_space_ 
should not be increased
>       m_MDS_LOG_NOTIFY("FCTRL: [me] --> [node:%x, ref:%u], "
>           "QueData[mseq:%u, mfrag:%u, fseq:%u, len:%u], "
>           "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "]",
> @@ -444,32 +445,29 @@ void TipcPortId::ReceiveChunkAck(uint16_t fseq, 
> uint16_t chksize) {
>       // the nacked_space_ of sender
>       uint64_t acked_bytes = sndqueue_.Erase(Seq16(fseq) - (chksize-1),
>           Seq16(fseq));
> +    assert(sndwnd_.nacked_space_ >= acked_bytes);
>       sndwnd_.nacked_space_ -= acked_bytes;
>   
>       // try to send a few pending msg
>       DataMessage* msg = nullptr;
> -    uint64_t resend_bytes = 0;
> -    while (resend_bytes < acked_bytes) {
> +    uint16_t send_msg_cnt = 0;
> +    while (send_msg_cnt++ < chunk_size_) {
>         // find the lowest sequence unsent yet
>         msg = sndqueue_.FirstUnsent();
>         if (msg == nullptr) {
>           break;
>         } else {
> -        if (resend_bytes < acked_bytes) {
>             if (Send(msg->msg_data_, msg->header_.msg_len_) == 
> NCSCC_RC_SUCCESS) {
> -            sndwnd_.nacked_space_ += msg->header_.msg_len_;
[Minh] We now send it out and wait for acked, thus the nacked_space_ is 
increased here, so any reason moving the nacked_space_ from Queue() to here?
[Thuan] Because the message could be in sndwnd (resend) either in sndqueue 
(send)
Cannot increase nacked_space with resend message.
I have tried another way to increase/decrease nacked_space dynamic
but it become complex with markUnsent() since sender may receiver Nack for same 
msg > 2 times.
>               msg->is_sent_ = true;
> -            resend_bytes += msg->header_.msg_len_;
>               m_MDS_LOG_NOTIFY("FCTRL: [me] --> [node:%x, ref:%u], "
>                   "SndQData[fseq:%u, len:%u], "
>                   "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "]",
>                   id_.node, id_.ref,
>                   msg->header_.fseq_, msg->header_.msg_len_,
>                   sndwnd_.acked_.v(), sndwnd_.send_.v(), 
> sndwnd_.nacked_space_);
> +          } else {
> +            break;
>             }
> -        } else {
> -          break;
> -        }
>         }
>       }
>       // no more unsent message, back to kEnabled
[Minh] Agree, the new strategy to resend with chunk_size_ is better than 
with acked_bytes, it will increase transmission rate and not to depend 
on the timer
[Thuan] Thanks
> @@ -502,26 +500,12 @@ void TipcPortId::ReceiveNack(uint32_t mseq, uint16_t 
> mfrag,
>           fseq);
>       return;
>     }
> -  if (state_ == State::kRcvBuffOverflow) {
> -    sndqueue_.MarkUnsentFrom(Seq16(fseq));
> -    if (Seq16(fseq) - sndwnd_.acked_ > 1) {
> -      m_MDS_LOG_ERR("FCTRL: [me] <-- [node:%x, ref:%u], "
> -          "RcvNack[fseq:%u], "
> -          "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "], "
> -          "queue[size:%" PRIu64 "], "
> -          "Warning[Ignore Nack]",
> -          id_.node, id_.ref, fseq,
> -          sndwnd_.acked_.v(), sndwnd_.send_.v(), sndwnd_.nacked_space_,
> -          sndqueue_.Size());
> -      return;
> -    }
> -  }
>     if (state_ != State::kRcvBuffOverflow) {
>       state_ = State::kRcvBuffOverflow;
>       m_MDS_LOG_NOTIFY("FCTRL: [node:%x, ref:%u] --> Overflow ",
>           id_.node, id_.ref);
> -    sndqueue_.MarkUnsentFrom(Seq16(fseq));
>     }
> +  sndqueue_.MarkUnsentFrom(Seq16(fseq));
[Minh] I have a doubt with this change in ReceiveNack(), so every Nack 
will trigger a retransmission on the Nacked sequence even though we are 
already in kRcvBufferOverFlow state. This will increase the "unexpected 
retransmission" error rate. On reception of 2nd-Nack, 3rd-Nack, .... we 
already moved into kRcvBufferOverFlow state, we don't need to resend the 
2nd-Nack, 3rd-Nack as we already did at the 1st-Nack. Only mark it as 
Unsent, the actual retransmission of 2nd-Nack, 3rd-Nack, .... is done in 
the loop ReceiveChunkAck() as you have improved in this patch, that will 
keep msg in order at receivers. So any reason for this change?

[Thuan] Since sender send a chunk ack number of message, receiver may
Get msg 1 -> but drop msg 2 (still not send chunk ack to sender).
Sender receive Nack of msg 2 which fseq - sndwnd_.acked > 1 but sender
Still need resend the msg 2, in current code return without resend.
Yes, as sender send a mount of messages then it could get Nack of msg 2 > 2 
times.
It increase "unexpected retransmission" error at receiver but receiver will 
ignore these errors.
>     DataMessage* msg = sndqueue_.Find(Seq16(fseq));
>     if (msg != nullptr) {
>       // Resend the msg found



_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Re: [devel] [PATCH 1/1] mds: fix sender take very long time to send all messages [#3119]

Reply via email to