Hi Minh,

Please check replies inline. Thanks.

Best Regards,
ThuanTr

-----Original Message-----
From: Minh Hon Chau <minh.c...@dektech.com.au> 
Sent: Wednesday, November 13, 2019 10:05 AM
To: Tran Thuan <thuan.t...@dektech.com.au>; 'Nguyen Minh Vu' 
<vu.m.ngu...@dektech.com.au>; gary....@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] mds: fix sender take very long time to send all 
messages [#3119]

Hi Thuan,

Please see comment inline.

Thanks

Minh

On 13/11/19 1:11 pm, Tran Thuan wrote:
> Hi Minh,
>
> Thanks for comments, please check my replies inline.
>
> Best Regards,
> ThuanTr
>
> -----Original Message-----
> From: Minh Hon Chau <minh.c...@dektech.com.au>
> Sent: Wednesday, November 13, 2019 7:47 AM
> To: thuan.tran <thuan.t...@dektech.com.au>; 'Nguyen Minh Vu' 
> <vu.m.ngu...@dektech.com.au>; gary....@dektech.com.au
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 1/1] mds: fix sender take very long time to send all 
> messages [#3119]
>
> Hi Thuan,
>
> Some comments inline.
>
> Thanks
>
> Minh
>
> On 12/11/19 5:04 pm, thuan.tran wrote:
>> When overload happens, sender will wait for chunkAck to continue
>> sending more messages, it should send number of message equal chunkAck
>> size of receiver. If not, receiver don't receive enough messages to send
>> chunkAck and wait until timer timeout to send chunkAck to sender.
>> This loop will make sender take very long time to sending all messages.
>> ---
>>    src/mds/mds_tipc_fctrl_portid.cc | 30 +++++++-----------------------
>>    1 file changed, 7 insertions(+), 23 deletions(-)
>>
>> diff --git a/src/mds/mds_tipc_fctrl_portid.cc 
>> b/src/mds/mds_tipc_fctrl_portid.cc
>> index 3704baddb..1fff4c855 100644
>> --- a/src/mds/mds_tipc_fctrl_portid.cc
>> +++ b/src/mds/mds_tipc_fctrl_portid.cc
>> @@ -190,6 +190,7 @@ uint32_t TipcPortId::Queue(const uint8_t* data, uint16_t 
>> length,
>>            sndwnd_.acked_.v(), sndwnd_.send_.v(), sndwnd_.nacked_space_);
>>      } else {
>>        ++sndwnd_.send_;
>> +    sndwnd_.nacked_space_ += length;
> [Minh] We haven't sent the msg out to wait for ack, thus nacked_space_
> should not be increased
>>        m_MDS_LOG_NOTIFY("FCTRL: [me] --> [node:%x, ref:%u], "
>>            "QueData[mseq:%u, mfrag:%u, fseq:%u, len:%u], "
>>            "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "]",
>> @@ -444,32 +445,29 @@ void TipcPortId::ReceiveChunkAck(uint16_t fseq, 
>> uint16_t chksize) {
>>        // the nacked_space_ of sender
>>        uint64_t acked_bytes = sndqueue_.Erase(Seq16(fseq) - (chksize-1),
>>            Seq16(fseq));
>> +    assert(sndwnd_.nacked_space_ >= acked_bytes);
>>        sndwnd_.nacked_space_ -= acked_bytes;
>>    
>>        // try to send a few pending msg
>>        DataMessage* msg = nullptr;
>> -    uint64_t resend_bytes = 0;
>> -    while (resend_bytes < acked_bytes) {
>> +    uint16_t send_msg_cnt = 0;
>> +    while (send_msg_cnt++ < chunk_size_) {
>>          // find the lowest sequence unsent yet
>>          msg = sndqueue_.FirstUnsent();
>>          if (msg == nullptr) {
>>            break;
>>          } else {
>> -        if (resend_bytes < acked_bytes) {
>>              if (Send(msg->msg_data_, msg->header_.msg_len_) == 
>> NCSCC_RC_SUCCESS) {
>> -            sndwnd_.nacked_space_ += msg->header_.msg_len_;
> [Minh] We now send it out and wait for acked, thus the nacked_space_ is
> increased here, so any reason moving the nacked_space_ from Queue() to here?
> [Thuan] Because the message could be in sndwnd (resend) either in sndqueue 
> (send)
> Cannot increase nacked_space with resend message.
> I have tried another way to increase/decrease nacked_space dynamic
> but it become complex with markUnsent() since sender may receiver Nack for 
> same msg > 2 times.
[Minh] OK.
>>                msg->is_sent_ = true;
>> -            resend_bytes += msg->header_.msg_len_;
>>                m_MDS_LOG_NOTIFY("FCTRL: [me] --> [node:%x, ref:%u], "
>>                    "SndQData[fseq:%u, len:%u], "
>>                    "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "]",
>>                    id_.node, id_.ref,
>>                    msg->header_.fseq_, msg->header_.msg_len_,
>>                    sndwnd_.acked_.v(), sndwnd_.send_.v(), 
>> sndwnd_.nacked_space_);
>> +          } else {
>> +            break;
>>              }
>> -        } else {
>> -          break;
>> -        }
>>          }
>>        }
>>        // no more unsent message, back to kEnabled
> [Minh] Agree, the new strategy to resend with chunk_size_ is better than
> with acked_bytes, it will increase transmission rate and not to depend
> on the timer
> [Thuan] Thanks
>> @@ -502,26 +500,12 @@ void TipcPortId::ReceiveNack(uint32_t mseq, uint16_t 
>> mfrag,
>>            fseq);
>>        return;
>>      }
>> -  if (state_ == State::kRcvBuffOverflow) {
>> -    sndqueue_.MarkUnsentFrom(Seq16(fseq));
>> -    if (Seq16(fseq) - sndwnd_.acked_ > 1) {
>> -      m_MDS_LOG_ERR("FCTRL: [me] <-- [node:%x, ref:%u], "
>> -          "RcvNack[fseq:%u], "
>> -          "sndwnd[acked:%u, send:%u, nacked:%" PRIu64 "], "
>> -          "queue[size:%" PRIu64 "], "
>> -          "Warning[Ignore Nack]",
>> -          id_.node, id_.ref, fseq,
>> -          sndwnd_.acked_.v(), sndwnd_.send_.v(), sndwnd_.nacked_space_,
>> -          sndqueue_.Size());
>> -      return;
>> -    }
>> -  }
>>      if (state_ != State::kRcvBuffOverflow) {
>>        state_ = State::kRcvBuffOverflow;
>>        m_MDS_LOG_NOTIFY("FCTRL: [node:%x, ref:%u] --> Overflow ",
>>            id_.node, id_.ref);
>> -    sndqueue_.MarkUnsentFrom(Seq16(fseq));
>>      }
>> +  sndqueue_.MarkUnsentFrom(Seq16(fseq));
> [Minh] I have a doubt with this change in ReceiveNack(), so every Nack
> will trigger a retransmission on the Nacked sequence even though we are
> already in kRcvBufferOverFlow state. This will increase the "unexpected
> retransmission" error rate. On reception of 2nd-Nack, 3rd-Nack, .... we
> already moved into kRcvBufferOverFlow state, we don't need to resend the
> 2nd-Nack, 3rd-Nack as we already did at the 1st-Nack. Only mark it as
> Unsent, the actual retransmission of 2nd-Nack, 3rd-Nack, .... is done in
> the loop ReceiveChunkAck() as you have improved in this patch, that will
> keep msg in order at receivers. So any reason for this change?
>
> [Thuan] Since sender send a chunk ack number of message, receiver may
> Get msg 1 -> but drop msg 2 (still not send chunk ack to sender).
> Sender receive Nack of msg 2 which fseq - sndwnd_.acked > 1 but sender
> Still need resend the msg 2, in current code return without resend.
[Minh] sender will resend msg 2 if this is the first nack (state is not 
kRcvBufferOverflow), there is a state associated with the retransmission 
of the nack
[Thuan] in overflow state, receiver may get 1th msg and drop 2th msg (still not 
send chunk ack)
Then sender should resend 2th msg even in state overflow.
> Yes, as sender send a mount of messages then it could get Nack of msg 2 > 2 
> times.
> It increase "unexpected retransmission" error at receiver but receiver will 
> ignore these errors.
[Minh] That's we want to avoid, if we know the msg will be for sure ignored.
[Thuan] We don't know if nack is redundant or receiver again reject the msg, 
resend is good without impacts.
I think number of "unexpected retransmission" is small (< chunk ack size)
>>      DataMessage* msg = sndqueue_.Find(Seq16(fseq));
>>      if (msg != nullptr) {
>>        // Resend the msg found
>



_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Reply via email to