Hi Thuan, Please see comment inline.
Thanks Minh On 13/11/19 1:11 pm, Tran Thuan wrote:
Hi Minh, Thanks for comments, please check my replies inline. Best Regards, ThuanTr -----Original Message----- From: Minh Hon Chau <[email protected]> Sent: Wednesday, November 13, 2019 7:47 AM To: thuan.tran <[email protected]>; 'Nguyen Minh Vu' <[email protected]>; [email protected] Cc: [email protected] Subject: Re: [PATCH 1/1] mds: fix sender take very long time to send all messages [#3119] Hi Thuan, Some comments inline. Thanks Minh On 12/11/19 5:04 pm, thuan.tran wrote:When overload happens, sender will wait for chunkAck to continue sending more messages, it should send number of message equal chunkAck size of receiver. If not, receiver don't receive enough messages to send chunkAck and wait until timer timeout to send chunkAck to sender. This loop will make sender take very long time to sending all messages. --- src/mds/mds_tipc_fctrl_portid.cc | 30 +++++++----------------------- 1 file changed, 7 insertions(+), 23 deletions(-) diff --git a/src/mds/mds_tipc_fctrl_portid.cc b/src/mds/mds_tipc_fctrl_portid.cc index 3704baddb..1fff4c855 100644 --- a/src/mds/mds_tipc_fctrl_portid.cc +++ b/src/mds/mds_tipc_fctrl_portid.cc @@ -190,6 +190,7 @@ uint32_t TipcPortId::Queue(const uint8_t* data, uint16_t length, sndwnd_.acked_.v(), sndwnd_.send_.v(), sndwnd_.nacked_space_); } else { ++sndwnd_.send_; + sndwnd_.nacked_space_ += length;[Minh] We haven't sent the msg out to wait for ack, thus nacked_space_ should not be increasedm_MDS_LOG_NOTIFY("FCTRL: [me] --> [node:%x, ref:%u], " "QueData[mseq:%u, mfrag:%u, fseq:%u, len:%u], " "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "]", @@ -444,32 +445,29 @@ void TipcPortId::ReceiveChunkAck(uint16_t fseq, uint16_t chksize) { // the nacked_space_ of sender uint64_t acked_bytes = sndqueue_.Erase(Seq16(fseq) - (chksize-1), Seq16(fseq)); + assert(sndwnd_.nacked_space_ >= acked_bytes); sndwnd_.nacked_space_ -= acked_bytes;// try to send a few pending msgDataMessage* msg = nullptr; - uint64_t resend_bytes = 0; - while (resend_bytes < acked_bytes) { + uint16_t send_msg_cnt = 0; + while (send_msg_cnt++ < chunk_size_) { // find the lowest sequence unsent yet msg = sndqueue_.FirstUnsent(); if (msg == nullptr) { break; } else { - if (resend_bytes < acked_bytes) { if (Send(msg->msg_data_, msg->header_.msg_len_) == NCSCC_RC_SUCCESS) { - sndwnd_.nacked_space_ += msg->header_.msg_len_;[Minh] We now send it out and wait for acked, thus the nacked_space_ is increased here, so any reason moving the nacked_space_ from Queue() to here? [Thuan] Because the message could be in sndwnd (resend) either in sndqueue (send) Cannot increase nacked_space with resend message. I have tried another way to increase/decrease nacked_space dynamic but it become complex with markUnsent() since sender may receiver Nack for same msg > 2 times.
[Minh] OK.
[Minh] sender will resend msg 2 if this is the first nack (state is not kRcvBufferOverflow), there is a state associated with the retransmission of the nackmsg->is_sent_ = true; - resend_bytes += msg->header_.msg_len_; m_MDS_LOG_NOTIFY("FCTRL: [me] --> [node:%x, ref:%u], " "SndQData[fseq:%u, len:%u], " "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "]", id_.node, id_.ref, msg->header_.fseq_, msg->header_.msg_len_, sndwnd_.acked_.v(), sndwnd_.send_.v(), sndwnd_.nacked_space_); + } else { + break; } - } else { - break; - } } } // no more unsent message, back to kEnabled[Minh] Agree, the new strategy to resend with chunk_size_ is better than with acked_bytes, it will increase transmission rate and not to depend on the timer [Thuan] Thanks@@ -502,26 +500,12 @@ void TipcPortId::ReceiveNack(uint32_t mseq, uint16_t mfrag, fseq); return; } - if (state_ == State::kRcvBuffOverflow) { - sndqueue_.MarkUnsentFrom(Seq16(fseq)); - if (Seq16(fseq) - sndwnd_.acked_ > 1) { - m_MDS_LOG_ERR("FCTRL: [me] <-- [node:%x, ref:%u], " - "RcvNack[fseq:%u], " - "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "], " - "queue[size:%" PRIu64 "], " - "Warning[Ignore Nack]", - id_.node, id_.ref, fseq, - sndwnd_.acked_.v(), sndwnd_.send_.v(), sndwnd_.nacked_space_, - sndqueue_.Size()); - return; - } - } if (state_ != State::kRcvBuffOverflow) { state_ = State::kRcvBuffOverflow; m_MDS_LOG_NOTIFY("FCTRL: [node:%x, ref:%u] --> Overflow ", id_.node, id_.ref); - sndqueue_.MarkUnsentFrom(Seq16(fseq)); } + sndqueue_.MarkUnsentFrom(Seq16(fseq));[Minh] I have a doubt with this change in ReceiveNack(), so every Nack will trigger a retransmission on the Nacked sequence even though we are already in kRcvBufferOverFlow state. This will increase the "unexpected retransmission" error rate. On reception of 2nd-Nack, 3rd-Nack, .... we already moved into kRcvBufferOverFlow state, we don't need to resend the 2nd-Nack, 3rd-Nack as we already did at the 1st-Nack. Only mark it as Unsent, the actual retransmission of 2nd-Nack, 3rd-Nack, .... is done in the loop ReceiveChunkAck() as you have improved in this patch, that will keep msg in order at receivers. So any reason for this change? [Thuan] Since sender send a chunk ack number of message, receiver may Get msg 1 -> but drop msg 2 (still not send chunk ack to sender). Sender receive Nack of msg 2 which fseq - sndwnd_.acked > 1 but sender Still need resend the msg 2, in current code return without resend.
Yes, as sender send a mount of messages then it could get Nack of msg 2 > 2 times. It increase "unexpected retransmission" error at receiver but receiver will ignore these errors.
[Minh] That's we want to avoid, if we know the msg will be for sure ignored.
DataMessage* msg = sndqueue_.Find(Seq16(fseq)); if (msg != nullptr) { // Resend the msg found
_______________________________________________ Opensaf-devel mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/opensaf-devel
