Hi Minh, Agree with your latest comment. I will send out V2.
Best Regards, ThuanTr -----Original Message----- From: Minh Hon Chau <minh.c...@dektech.com.au> Sent: Wednesday, November 13, 2019 11:15 AM To: Tran Thuan <thuan.t...@dektech.com.au>; 'Nguyen Minh Vu' <vu.m.ngu...@dektech.com.au>; gary....@dektech.com.au Cc: opensaf-devel@lists.sourceforge.net Subject: Re: [PATCH 1/1] mds: fix sender take very long time to send all messages [#3119] Hi Thuan, Please see comment inline Thanks Minh On 13/11/19 2:24 pm, Tran Thuan wrote: > Hi Minh, > > Please check replies inline. Thanks. > > Best Regards, > ThuanTr > > -----Original Message----- > From: Minh Hon Chau <minh.c...@dektech.com.au> > Sent: Wednesday, November 13, 2019 10:05 AM > To: Tran Thuan <thuan.t...@dektech.com.au>; 'Nguyen Minh Vu' > <vu.m.ngu...@dektech.com.au>; gary....@dektech.com.au > Cc: opensaf-devel@lists.sourceforge.net > Subject: Re: [PATCH 1/1] mds: fix sender take very long time to send all > messages [#3119] > > Hi Thuan, > > Please see comment inline. > > Thanks > > Minh > > On 13/11/19 1:11 pm, Tran Thuan wrote: >> Hi Minh, >> >> Thanks for comments, please check my replies inline. >> >> Best Regards, >> ThuanTr >> >> -----Original Message----- >> From: Minh Hon Chau <minh.c...@dektech.com.au> >> Sent: Wednesday, November 13, 2019 7:47 AM >> To: thuan.tran <thuan.t...@dektech.com.au>; 'Nguyen Minh Vu' >> <vu.m.ngu...@dektech.com.au>; gary....@dektech.com.au >> Cc: opensaf-devel@lists.sourceforge.net >> Subject: Re: [PATCH 1/1] mds: fix sender take very long time to send all >> messages [#3119] >> >> Hi Thuan, >> >> Some comments inline. >> >> Thanks >> >> Minh >> >> On 12/11/19 5:04 pm, thuan.tran wrote: >>> When overload happens, sender will wait for chunkAck to continue >>> sending more messages, it should send number of message equal chunkAck >>> size of receiver. If not, receiver don't receive enough messages to send >>> chunkAck and wait until timer timeout to send chunkAck to sender. >>> This loop will make sender take very long time to sending all messages. >>> --- >>> src/mds/mds_tipc_fctrl_portid.cc | 30 +++++++----------------------- >>> 1 file changed, 7 insertions(+), 23 deletions(-) >>> >>> diff --git a/src/mds/mds_tipc_fctrl_portid.cc >>> b/src/mds/mds_tipc_fctrl_portid.cc >>> index 3704baddb..1fff4c855 100644 >>> --- a/src/mds/mds_tipc_fctrl_portid.cc >>> +++ b/src/mds/mds_tipc_fctrl_portid.cc >>> @@ -190,6 +190,7 @@ uint32_t TipcPortId::Queue(const uint8_t* data, >>> uint16_t length, >>> sndwnd_.acked_.v(), sndwnd_.send_.v(), sndwnd_.nacked_space_); >>> } else { >>> ++sndwnd_.send_; >>> + sndwnd_.nacked_space_ += length; >> [Minh] We haven't sent the msg out to wait for ack, thus nacked_space_ >> should not be increased >>> m_MDS_LOG_NOTIFY("FCTRL: [me] --> [node:%x, ref:%u], " >>> "QueData[mseq:%u, mfrag:%u, fseq:%u, len:%u], " >>> "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "]", >>> @@ -444,32 +445,29 @@ void TipcPortId::ReceiveChunkAck(uint16_t fseq, >>> uint16_t chksize) { >>> // the nacked_space_ of sender >>> uint64_t acked_bytes = sndqueue_.Erase(Seq16(fseq) - (chksize-1), >>> Seq16(fseq)); >>> + assert(sndwnd_.nacked_space_ >= acked_bytes); >>> sndwnd_.nacked_space_ -= acked_bytes; >>> >>> // try to send a few pending msg >>> DataMessage* msg = nullptr; >>> - uint64_t resend_bytes = 0; >>> - while (resend_bytes < acked_bytes) { >>> + uint16_t send_msg_cnt = 0; >>> + while (send_msg_cnt++ < chunk_size_) { >>> // find the lowest sequence unsent yet >>> msg = sndqueue_.FirstUnsent(); >>> if (msg == nullptr) { >>> break; >>> } else { >>> - if (resend_bytes < acked_bytes) { >>> if (Send(msg->msg_data_, msg->header_.msg_len_) == >>> NCSCC_RC_SUCCESS) { >>> - sndwnd_.nacked_space_ += msg->header_.msg_len_; >> [Minh] We now send it out and wait for acked, thus the nacked_space_ is >> increased here, so any reason moving the nacked_space_ from Queue() to here? >> [Thuan] Because the message could be in sndwnd (resend) either in sndqueue >> (send) >> Cannot increase nacked_space with resend message. >> I have tried another way to increase/decrease nacked_space dynamic >> but it become complex with markUnsent() since sender may receiver Nack for >> same msg > 2 times. > [Minh] OK. >>> msg->is_sent_ = true; >>> - resend_bytes += msg->header_.msg_len_; >>> m_MDS_LOG_NOTIFY("FCTRL: [me] --> [node:%x, ref:%u], " >>> "SndQData[fseq:%u, len:%u], " >>> "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "]", >>> id_.node, id_.ref, >>> msg->header_.fseq_, msg->header_.msg_len_, >>> sndwnd_.acked_.v(), sndwnd_.send_.v(), >>> sndwnd_.nacked_space_); >>> + } else { >>> + break; >>> } >>> - } else { >>> - break; >>> - } >>> } >>> } >>> // no more unsent message, back to kEnabled >> [Minh] Agree, the new strategy to resend with chunk_size_ is better than >> with acked_bytes, it will increase transmission rate and not to depend >> on the timer >> [Thuan] Thanks >>> @@ -502,26 +500,12 @@ void TipcPortId::ReceiveNack(uint32_t mseq, uint16_t >>> mfrag, >>> fseq); >>> return; >>> } >>> - if (state_ == State::kRcvBuffOverflow) { >>> - sndqueue_.MarkUnsentFrom(Seq16(fseq)); >>> - if (Seq16(fseq) - sndwnd_.acked_ > 1) { >>> - m_MDS_LOG_ERR("FCTRL: [me] <-- [node:%x, ref:%u], " >>> - "RcvNack[fseq:%u], " >>> - "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "], " >>> - "queue[size:%" PRIu64 "], " >>> - "Warning[Ignore Nack]", >>> - id_.node, id_.ref, fseq, >>> - sndwnd_.acked_.v(), sndwnd_.send_.v(), sndwnd_.nacked_space_, >>> - sndqueue_.Size()); >>> - return; >>> - } >>> - } >>> if (state_ != State::kRcvBuffOverflow) { >>> state_ = State::kRcvBuffOverflow; >>> m_MDS_LOG_NOTIFY("FCTRL: [node:%x, ref:%u] --> Overflow ", >>> id_.node, id_.ref); >>> - sndqueue_.MarkUnsentFrom(Seq16(fseq)); >>> } >>> + sndqueue_.MarkUnsentFrom(Seq16(fseq)); >> [Minh] I have a doubt with this change in ReceiveNack(), so every Nack >> will trigger a retransmission on the Nacked sequence even though we are >> already in kRcvBufferOverFlow state. This will increase the "unexpected >> retransmission" error rate. On reception of 2nd-Nack, 3rd-Nack, .... we >> already moved into kRcvBufferOverFlow state, we don't need to resend the >> 2nd-Nack, 3rd-Nack as we already did at the 1st-Nack. Only mark it as >> Unsent, the actual retransmission of 2nd-Nack, 3rd-Nack, .... is done in >> the loop ReceiveChunkAck() as you have improved in this patch, that will >> keep msg in order at receivers. So any reason for this change? >> >> [Thuan] Since sender send a chunk ack number of message, receiver may >> Get msg 1 -> but drop msg 2 (still not send chunk ack to sender). >> Sender receive Nack of msg 2 which fseq - sndwnd_.acked > 1 but sender >> Still need resend the msg 2, in current code return without resend. > [Minh] sender will resend msg 2 if this is the first nack (state is not > kRcvBufferOverflow), there is a state associated with the retransmission > of the nack > [Thuan] in overflow state, receiver may get 1th msg and drop 2th msg (still > not send chunk ack) > Then sender should resend 2th msg even in state overflow. [Minh] If you say the 2nd should be sent, then it is where "if (Seq16(fseq) - sndwnd_.acked_ > 1)" does help. Even the 2nd is received successfully but not acked, and the nack is at 3, when the timer timeout, the ack is sent, then this patch continues sending a number of chunk_size_ message which does not wait for timer timeout. But within this change, if you look the traffic overall, every nack triggers a retransmission and how successful rate of the retransmission? The receiver ignores/rejects the message but it does impact in term of traffic. [Thuan] Your comment make sense. I will keep this part unchanged. >> Yes, as sender send a mount of messages then it could get Nack of msg 2 > 2 >> times. >> It increase "unexpected retransmission" error at receiver but receiver will >> ignore these errors. > [Minh] That's we want to avoid, if we know the msg will be for sure ignored. > [Thuan] We don't know if nack is redundant or receiver again reject the msg, > resend is good without impacts. > I think number of "unexpected retransmission" is small (< chunk ack size) >>> DataMessage* msg = sndqueue_.Find(Seq16(fseq)); >>> if (msg != nullptr) { >>> // Resend the msg found > _______________________________________________ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel