Hi Minh, Please check replies inline. Thanks.
Best Regards,
ThuanTr

-----Original Message-----
From: Minh Hon Chau <minh.c...@dektech.com.au>
Sent: Wednesday, November 13, 2019 10:05 AM
To: Tran Thuan <thuan.t...@dektech.com.au>; 'Nguyen Minh Vu' <vu.m.ngu...@dektech.com.au>; gary....@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] mds: fix sender take very long time to send all messages [#3119]

Hi Thuan,

Please see comments inline.

Thanks
Minh

On 13/11/19 1:11 pm, Tran Thuan wrote:
> Hi Minh,
>
> Thanks for the comments, please check my replies inline.
>
> Best Regards,
> ThuanTr
>
> -----Original Message-----
> From: Minh Hon Chau <minh.c...@dektech.com.au>
> Sent: Wednesday, November 13, 2019 7:47 AM
> To: thuan.tran <thuan.t...@dektech.com.au>; 'Nguyen Minh Vu'
> <vu.m.ngu...@dektech.com.au>; gary....@dektech.com.au
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 1/1] mds: fix sender take very long time to send all
> messages [#3119]
>
> Hi Thuan,
>
> Some comments inline.
>
> Thanks
>
> Minh
>
> On 12/11/19 5:04 pm, thuan.tran wrote:
>> When overload happens, the sender waits for a chunkAck before sending
>> more messages; it should then send a number of messages equal to the
>> receiver's chunkAck size. If not, the receiver does not receive enough
>> messages to send a chunkAck and waits until its timer expires to send
>> the chunkAck to the sender.
>> This loop makes the sender take a very long time to send all messages.
>> ---
>> src/mds/mds_tipc_fctrl_portid.cc | 30 +++++++-----------------------
>> 1 file changed, 7 insertions(+), 23 deletions(-)
>>
>> diff --git a/src/mds/mds_tipc_fctrl_portid.cc
>> b/src/mds/mds_tipc_fctrl_portid.cc
>> index 3704baddb..1fff4c855 100644
>> --- a/src/mds/mds_tipc_fctrl_portid.cc
>> +++ b/src/mds/mds_tipc_fctrl_portid.cc
>> @@ -190,6 +190,7 @@ uint32_t TipcPortId::Queue(const uint8_t* data, uint16_t
>> length,
>> sndwnd_.acked_.v(), sndwnd_.send_.v(), sndwnd_.nacked_space_);
>> } else {
>> ++sndwnd_.send_;
>> + sndwnd_.nacked_space_ += length;
> [Minh] We haven't sent the msg out to wait for an ack yet, thus nacked_space_
> should not be increased here.
>> m_MDS_LOG_NOTIFY("FCTRL: [me] --> [node:%x, ref:%u], "
>> "QueData[mseq:%u, mfrag:%u, fseq:%u, len:%u], "
>> "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "]",
>> @@ -444,32 +445,29 @@ void TipcPortId::ReceiveChunkAck(uint16_t fseq,
>> uint16_t chksize) {
>> // the nacked_space_ of sender
>> uint64_t acked_bytes = sndqueue_.Erase(Seq16(fseq) - (chksize-1),
>> Seq16(fseq));
>> + assert(sndwnd_.nacked_space_ >= acked_bytes);
>> sndwnd_.nacked_space_ -= acked_bytes;
>>
>> // try to send a few pending msg
>> DataMessage* msg = nullptr;
>> - uint64_t resend_bytes = 0;
>> - while (resend_bytes < acked_bytes) {
>> + uint16_t send_msg_cnt = 0;
>> + while (send_msg_cnt++ < chunk_size_) {
>> // find the lowest sequence unsent yet
>> msg = sndqueue_.FirstUnsent();
>> if (msg == nullptr) {
>> break;
>> } else {
>> - if (resend_bytes < acked_bytes) {
>> if (Send(msg->msg_data_, msg->header_.msg_len_) ==
>> NCSCC_RC_SUCCESS) {
>> - sndwnd_.nacked_space_ += msg->header_.msg_len_;
> [Minh] We now send it out and wait for the ack, thus nacked_space_ is
> increased here. So any reason for moving the nacked_space_ update from here
> to Queue()?
> [Thuan] Because the message could be either in sndwnd (resend) or in
> sndqueue (send).
> We cannot increase nacked_space for a resent message.
> I have tried another way to increase/decrease nacked_space dynamically,
> but it becomes complex with markUnsent() since the sender may receive a
> Nack for the same msg twice.
[Minh] OK.
>> msg->is_sent_ = true;
>> - resend_bytes += msg->header_.msg_len_;
>> m_MDS_LOG_NOTIFY("FCTRL: [me] --> [node:%x, ref:%u], "
>> "SndQData[fseq:%u, len:%u], "
>> "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "]",
>> id_.node, id_.ref,
>> msg->header_.fseq_, msg->header_.msg_len_,
>> sndwnd_.acked_.v(), sndwnd_.send_.v(),
>> sndwnd_.nacked_space_);
>> + } else {
>> + break;
>> }
>> - } else {
>> - break;
>> - }
>> }
>> }
>> // no more unsent message, back to kEnabled
> [Minh] Agree, the new strategy of resending by chunk_size_ is better than
> by acked_bytes; it will increase the transmission rate and not depend
> on the timer.
> [Thuan] Thanks
>> @@ -502,26 +500,12 @@ void TipcPortId::ReceiveNack(uint32_t mseq, uint16_t
>> mfrag,
>> fseq);
>> return;
>> }
>> - if (state_ == State::kRcvBuffOverflow) {
>> - sndqueue_.MarkUnsentFrom(Seq16(fseq));
>> - if (Seq16(fseq) - sndwnd_.acked_ > 1) {
>> - m_MDS_LOG_ERR("FCTRL: [me] <-- [node:%x, ref:%u], "
>> - "RcvNack[fseq:%u], "
>> - "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "], "
>> - "queue[size:%" PRIu64 "], "
>> - "Warning[Ignore Nack]",
>> - id_.node, id_.ref, fseq,
>> - sndwnd_.acked_.v(), sndwnd_.send_.v(), sndwnd_.nacked_space_,
>> - sndqueue_.Size());
>> - return;
>> - }
>> - }
>> if (state_ != State::kRcvBuffOverflow) {
>> state_ = State::kRcvBuffOverflow;
>> m_MDS_LOG_NOTIFY("FCTRL: [node:%x, ref:%u] --> Overflow ",
>> id_.node, id_.ref);
>> - sndqueue_.MarkUnsentFrom(Seq16(fseq));
>> }
>> + sndqueue_.MarkUnsentFrom(Seq16(fseq));
> [Minh] I have a doubt about this change in ReceiveNack(): every Nack
> will trigger a retransmission of the Nacked sequence even though we are
> already in kRcvBuffOverflow state. This will increase the "unexpected
> retransmission" error rate. On reception of the 2nd-Nack, 3rd-Nack, .... we
> have already moved into kRcvBuffOverflow state, so we don't need to resend
> on the 2nd-Nack, 3rd-Nack as we already did at the 1st-Nack. Only mark it as
> Unsent; the actual retransmission for the 2nd-Nack, 3rd-Nack, .... is done in
> the loop in ReceiveChunkAck() as you have improved in this patch, which will
> keep msgs in order at receivers. So any reason for this change?
>
> [Thuan] Since the sender sends a chunk-ack-sized number of messages, the
> receiver may get msg 1 but drop msg 2 (still without sending a chunk ack to
> the sender).
> The sender receives a Nack for msg 2 with fseq - sndwnd_.acked > 1, but the
> sender still needs to resend msg 2; the current code returns without
> resending.
[Minh] The sender will resend msg 2 if this is the first Nack (state is not kRcvBuffOverflow); there is a state associated with the retransmission of the Nack.
[Thuan] In overflow state, the receiver may get the 1st msg and drop the 2nd msg (still without sending a chunk ack). Then the sender should resend the 2nd msg even in overflow state.
> Yes, as the sender sends a number of messages, it could get the Nack for
> msg 2 twice.
> This increases "unexpected retransmission" errors at the receiver, but the
> receiver will ignore these errors.
[Minh] That's what we want to avoid, if we know the msg will be ignored for sure.
[Thuan] We don't know whether the Nack is redundant or the receiver rejected the msg again; resending is harmless. I think the number of "unexpected retransmission" errors is small (< chunk ack size).
>> DataMessage* msg = sndqueue_.Find(Seq16(fseq));
>> if (msg != nullptr) {
>> // Resend the msg found
>
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
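
For readers following the thread, here is a minimal sketch of the two resend policies discussed for ReceiveChunkAck(). It is not the OpenSAF code: SendByBytes and SendByCount are hypothetical helper names, the pending queue is modelled as a plain vector of message lengths, and Send() is assumed to always succeed. It only illustrates why a byte budget can release fewer messages than the receiver's chunk size, leaving the receiver waiting on its timer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of the old policy: after a chunk ack, keep sending
// pending messages until the bytes sent reach the bytes just acked.
std::size_t SendByBytes(const std::vector<uint16_t>& pending_len,
                        uint64_t acked_bytes) {
  std::size_t sent = 0;
  uint64_t resend_bytes = 0;
  while (sent < pending_len.size() && resend_bytes < acked_bytes) {
    resend_bytes += pending_len[sent];  // assumed: Send() succeeded
    ++sent;
  }
  return sent;  // number of messages released by this chunk ack
}

// Hypothetical model of the new policy: after a chunk ack, send up to
// chunk_size pending messages regardless of their length.
std::size_t SendByCount(const std::vector<uint16_t>& pending_len,
                        uint16_t chunk_size) {
  assert(chunk_size > 0);  // a zero chunk size would never release anything
  std::size_t sent = 0;
  while (sent < pending_len.size() && sent < chunk_size) {
    ++sent;  // assumed: Send() succeeded
  }
  return sent;
}
```

With a chunk size of 3, an ack covering three 100-byte messages (300 bytes) and a queue of 1000-byte messages, SendByBytes releases a single message, so the receiver does not reach its chunk size and only acks on timer expiry, while SendByCount releases three.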