Re: [devel] [PATCH 1/1] mds: fix sender take very long time to send all messages [#3119]

Tran Thuan Wed, 13 Nov 2019 00:58:57 -0800

Hi Minh,

Agree with your latest comment. I will send out V2.


Best Regards,
ThuanTr

-----Original Message-----
From: Minh Hon Chau <[email protected]> 
Sent: Wednesday, November 13, 2019 11:15 AM
To: Tran Thuan <[email protected]>; 'Nguyen Minh Vu' 
<[email protected]>; [email protected]
Cc: [email protected]
Subject: Re: [PATCH 1/1] mds: fix sender take very long time to send all 
messages [#3119]

Hi Thuan,

Please see comment inline

Thanks

Minh

On 13/11/19 2:24 pm, Tran Thuan wrote:
> Hi Minh,
>
> Please check replies inline. Thanks.
>
> Best Regards,
> ThuanTr
>
> -----Original Message-----
> From: Minh Hon Chau <[email protected]>
> Sent: Wednesday, November 13, 2019 10:05 AM
> To: Tran Thuan <[email protected]>; 'Nguyen Minh Vu' 
> <[email protected]>; [email protected]
> Cc: [email protected]
> Subject: Re: [PATCH 1/1] mds: fix sender take very long time to send all 
> messages [#3119]
>
> Hi Thuan,
>
> Please see comment inline.
>
> Thanks
>
> Minh
>
> On 13/11/19 1:11 pm, Tran Thuan wrote:
>> Hi Minh,
>>
>> Thanks for comments, please check my replies inline.
>>
>> Best Regards,
>> ThuanTr
>>
>> -----Original Message-----
>> From: Minh Hon Chau <[email protected]>
>> Sent: Wednesday, November 13, 2019 7:47 AM
>> To: thuan.tran <[email protected]>; 'Nguyen Minh Vu' 
>> <[email protected]>; [email protected]
>> Cc: [email protected]
>> Subject: Re: [PATCH 1/1] mds: fix sender take very long time to send all 
>> messages [#3119]
>>
>> Hi Thuan,
>>
>> Some comments inline.
>>
>> Thanks
>>
>> Minh
>>
>> On 12/11/19 5:04 pm, thuan.tran wrote:
>>> When overload happens, sender will wait for chunkAck to continue
>>> sending more messages, it should send number of message equal chunkAck
>>> size of receiver. If not, receiver don't receive enough messages to send
>>> chunkAck and wait until timer timeout to send chunkAck to sender.
>>> This loop will make sender take very long time to sending all messages.
>>> ---
>>>     src/mds/mds_tipc_fctrl_portid.cc | 30 +++++++-----------------------
>>>     1 file changed, 7 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/src/mds/mds_tipc_fctrl_portid.cc 
>>> b/src/mds/mds_tipc_fctrl_portid.cc
>>> index 3704baddb..1fff4c855 100644
>>> --- a/src/mds/mds_tipc_fctrl_portid.cc
>>> +++ b/src/mds/mds_tipc_fctrl_portid.cc
>>> @@ -190,6 +190,7 @@ uint32_t TipcPortId::Queue(const uint8_t* data, 
>>> uint16_t length,
>>>             sndwnd_.acked_.v(), sndwnd_.send_.v(), sndwnd_.nacked_space_);
>>>       } else {
>>>         ++sndwnd_.send_;
>>> +    sndwnd_.nacked_space_ += length;
>> [Minh] We haven't sent the msg out to wait for ack, thus nacked_space_
>> should not be increased
>>>         m_MDS_LOG_NOTIFY("FCTRL: [me] --> [node:%x, ref:%u], "
>>>             "QueData[mseq:%u, mfrag:%u, fseq:%u, len:%u], "
>>>             "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "]",
>>> @@ -444,32 +445,29 @@ void TipcPortId::ReceiveChunkAck(uint16_t fseq, 
>>> uint16_t chksize) {
>>>         // the nacked_space_ of sender
>>>         uint64_t acked_bytes = sndqueue_.Erase(Seq16(fseq) - (chksize-1),
>>>             Seq16(fseq));
>>> +    assert(sndwnd_.nacked_space_ >= acked_bytes);
>>>         sndwnd_.nacked_space_ -= acked_bytes;
>>>     
>>>         // try to send a few pending msg
>>>         DataMessage* msg = nullptr;
>>> -    uint64_t resend_bytes = 0;
>>> -    while (resend_bytes < acked_bytes) {
>>> +    uint16_t send_msg_cnt = 0;
>>> +    while (send_msg_cnt++ < chunk_size_) {
>>>           // find the lowest sequence unsent yet
>>>           msg = sndqueue_.FirstUnsent();
>>>           if (msg == nullptr) {
>>>             break;
>>>           } else {
>>> -        if (resend_bytes < acked_bytes) {
>>>               if (Send(msg->msg_data_, msg->header_.msg_len_) == 
>>> NCSCC_RC_SUCCESS) {
>>> -            sndwnd_.nacked_space_ += msg->header_.msg_len_;
>> [Minh] We now send it out and wait for acked, thus the nacked_space_ is
>> increased here, so any reason moving the nacked_space_ from Queue() to here?
>> [Thuan] Because the message could be in sndwnd (resend) either in sndqueue 
>> (send)
>> Cannot increase nacked_space with resend message.
>> I have tried another way to increase/decrease nacked_space dynamic
>> but it become complex with markUnsent() since sender may receiver Nack for 
>> same msg > 2 times.
> [Minh] OK.
>>>                 msg->is_sent_ = true;
>>> -            resend_bytes += msg->header_.msg_len_;
>>>                 m_MDS_LOG_NOTIFY("FCTRL: [me] --> [node:%x, ref:%u], "
>>>                     "SndQData[fseq:%u, len:%u], "
>>>                     "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "]",
>>>                     id_.node, id_.ref,
>>>                     msg->header_.fseq_, msg->header_.msg_len_,
>>>                     sndwnd_.acked_.v(), sndwnd_.send_.v(), 
>>> sndwnd_.nacked_space_);
>>> +          } else {
>>> +            break;
>>>               }
>>> -        } else {
>>> -          break;
>>> -        }
>>>           }
>>>         }
>>>         // no more unsent message, back to kEnabled
>> [Minh] Agree, the new strategy to resend with chunk_size_ is better than
>> with acked_bytes, it will increase transmission rate and not to depend
>> on the timer
>> [Thuan] Thanks
>>> @@ -502,26 +500,12 @@ void TipcPortId::ReceiveNack(uint32_t mseq, uint16_t 
>>> mfrag,
>>>             fseq);
>>>         return;
>>>       }
>>> -  if (state_ == State::kRcvBuffOverflow) {
>>> -    sndqueue_.MarkUnsentFrom(Seq16(fseq));
>>> -    if (Seq16(fseq) - sndwnd_.acked_ > 1) {
>>> -      m_MDS_LOG_ERR("FCTRL: [me] <-- [node:%x, ref:%u], "
>>> -          "RcvNack[fseq:%u], "
>>> -          "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "], "
>>> -          "queue[size:%" PRIu64 "], "
>>> -          "Warning[Ignore Nack]",
>>> -          id_.node, id_.ref, fseq,
>>> -          sndwnd_.acked_.v(), sndwnd_.send_.v(), sndwnd_.nacked_space_,
>>> -          sndqueue_.Size());
>>> -      return;
>>> -    }
>>> -  }
>>>       if (state_ != State::kRcvBuffOverflow) {
>>>         state_ = State::kRcvBuffOverflow;
>>>         m_MDS_LOG_NOTIFY("FCTRL: [node:%x, ref:%u] --> Overflow ",
>>>             id_.node, id_.ref);
>>> -    sndqueue_.MarkUnsentFrom(Seq16(fseq));
>>>       }
>>> +  sndqueue_.MarkUnsentFrom(Seq16(fseq));
>> [Minh] I have a doubt with this change in ReceiveNack(), so every Nack
>> will trigger a retransmission on the Nacked sequence even though we are
>> already in kRcvBufferOverFlow state. This will increase the "unexpected
>> retransmission" error rate. On reception of 2nd-Nack, 3rd-Nack, .... we
>> already moved into kRcvBufferOverFlow state, we don't need to resend the
>> 2nd-Nack, 3rd-Nack as we already did at the 1st-Nack. Only mark it as
>> Unsent, the actual retransmission of 2nd-Nack, 3rd-Nack, .... is done in
>> the loop ReceiveChunkAck() as you have improved in this patch, that will
>> keep msg in order at receivers. So any reason for this change?
>>
>> [Thuan] Since sender send a chunk ack number of message, receiver may
>> Get msg 1 -> but drop msg 2 (still not send chunk ack to sender).
>> Sender receive Nack of msg 2 which fseq - sndwnd_.acked > 1 but sender
>> Still need resend the msg 2, in current code return without resend.
> [Minh] sender will resend msg 2 if this is the first nack (state is not
> kRcvBufferOverflow), there is a state associated with the retransmission
> of the nack
> [Thuan] in overflow state, receiver may get 1th msg and drop 2th msg (still 
> not send chunk ack)
> Then sender should resend 2th msg even in state overflow.

[Minh] If you say the 2nd should be sent, then it is where "if 
(Seq16(fseq) - sndwnd_.acked_ > 1)" does help. Even the 2nd is received 
successfully but not acked, and the nack is at 3, when the timer 
timeout, the ack is sent, then this patch continues sending a number of 
chunk_size_ message which does not wait for timer timeout.

But within this change, if you look the traffic overall, every nack 
triggers a retransmission and how successful rate of the retransmission? 
The receiver ignores/rejects the message but it does impact in term of 
traffic.

[Thuan] Your comment make sense. I will keep this part unchanged.

>> Yes, as sender send a mount of messages then it could get Nack of msg 2 > 2 
>> times.
>> It increase "unexpected retransmission" error at receiver but receiver will 
>> ignore these errors.
> [Minh] That's we want to avoid, if we know the msg will be for sure ignored.
> [Thuan] We don't know if nack is redundant or receiver again reject the msg, 
> resend is good without impacts.
> I think number of "unexpected retransmission" is small (< chunk ack size)
>>>       DataMessage* msg = sndqueue_.Find(Seq16(fseq));
>>>       if (msg != nullptr) {
>>>         // Resend the msg found
>



_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Re: [devel] [PATCH 1/1] mds: fix sender take very long time to send all messages [#3119]

Reply via email to