On 29 Aug, 2014, at 5:37 pm, Jerry Jongerius wrote:

>> did you check to see if packets were re-sent even if they weren't lost? one of
>> the side effects of excessive buffering is that it's possible for a packet to
>> be held in the buffer long enough that the sender thinks that it's been
>> lost and retransmits it, so the packet is effectively 'lost' even if it
>> actually arrives at its destination.
> 
> Yes.  A duplicate packet for the missing packet is not seen.
> 
> The receiver 'misses' a packet; starts sending out tons of dup acks (for all
> packets in flight and queued up due to bufferbloat), and then way later, the
> packet does come in (after the RTT caused by bufferbloat; indicating it is
> the 'resent' packet).  

I think I've cracked this one - the cause, if not the solution.

Let's assume, for the moment, that Jerry is correct and PowerBoost plays no 
part in this.  That implies that the flow is not using the full bandwidth after 
the loss, *and* that the additive increase of cwnd isn't sufficient to recover 
to that point within the test period.
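
To put a rough number on how slow additive increase can be, here is a minimal
sketch. It assumes Reno-style congestion avoidance (cwnd grows by about one
full-size packet per RTT), and the link figures (100 Mbit/s, 100 ms RTT,
1448-byte MSS, cwnd at half the pipe) are hypothetical, not measurements from
Jerry's test.

```python
import math

MSS = 1448  # bytes per full-size packet (assumed)


def additive_increase_recovery_time(cwnd_pkts, target_pkts, rtt_s):
    """Seconds for cwnd to climb from cwnd_pkts to target_pkts at +1 packet per RTT."""
    rtts_needed = max(0, math.ceil(target_pkts - cwnd_pkts))
    return rtts_needed * rtt_s


# Hypothetical path: 100 Mbit/s (12.5e6 bytes/s) with a 100 ms RTT.
bdp_pkts = 12.5e6 * 0.1 / MSS

# Suppose cwnd sits at half the pipe size when additive increase starts.
recovery_s = additive_increase_recovery_time(bdp_pkts / 2, bdp_pkts, 0.1)
print(f"pipe = {bdp_pkts:.0f} packets, recovery takes {recovery_s:.1f} s")
```

With these made-up figures the recovery takes a little over 43 seconds, which
is longer than many throughput tests run.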

There *is* a sequence of events that can lead to that happening:

1) Packet is lost, at the tail end of the bottleneck queue.

2) Eventually, the receiver sees the loss and starts sending duplicate acks
(each triggering the CA_EVENT_SLOW_ACK path in the sender).  The sender
(running Westwood+) assumes that each of these represents a received,
full-size packet, for bandwidth estimation purposes.

3) The receiver doesn't send, or the sender doesn't receive, a duplicate ack 
for every packet actually received.  Maybe some firewall sees a large number of 
identical packets arriving - without SACK or timestamps, they *would* be 
identical - and filters some of them.  The bandwidth estimate therefore becomes 
significantly lower than the true value, and additionally the RTO fires and 
causes the sender to reset cwnd to 1 (CA_EVENT_LOSS).

4) The retransmitted packet finally reaches the receiver, and the ack it sends 
includes all the data received in the meantime (about 3.5MB).  This is not 
sufficient to immediately reset the bandwidth estimate to the true value, 
because the BWE is sampled at RTT intervals, and also includes low-pass 
filtering.

5) This ends the recovery phase (CA_EVENT_CWR_COMPLETE), and the sender resets 
the slow-start threshold to correspond to the estimated delay-bandwidth product 
(MinRTT * BWE) at that moment.

6) This estimated DBP is lower than the true value, so the subsequent 
slow-start phase ends with the cwnd inadequately sized.  Additive increase 
would eventually correct that - but the key word is *eventually*.
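
Steps 2 to 6 can be illustrated with a toy model. This is only a sketch under
stated assumptions, not the kernel's code: the two-stage 7/8 low-pass filter is
how I understand Westwood+-style estimators to behave, one sample is taken per
MinRTT, and the final cumulative ack is credited as a single large sample. The
function names and all numbers (link rate, RTT, fraction of duplicate acks that
survive, length of recovery) are hypothetical.

```python
from dataclasses import dataclass

MSS = 1448  # bytes per full-size packet (assumed)


def lowpass(previous, sample):
    """7/8 exponential filter (assumed shape of the Westwood+ filter)."""
    return (7 * previous + sample) / 8


@dataclass
class Outcome:
    bw_est: float         # filtered bandwidth estimate, bytes/s
    ssthresh_pkts: float  # MinRTT * BWE, in packets
    true_bdp_pkts: float  # MinRTT * true bandwidth, in packets


def simulate_recovery(true_bw, min_rtt, dupack_fraction, recovery_intervals):
    """Run the estimator through a recovery phase in which only
    dupack_fraction of the duplicate acks reach the sender."""
    bw_ns_est = bw_est = true_bw        # estimate starts at the true value
    per_interval = true_bw * min_rtt    # bytes really delivered per sample
    unaccounted = 0.0                   # delivered bytes the sender never saw acked

    # Steps 2 and 3: each surviving dup ack counts as one full-size packet.
    for _ in range(recovery_intervals):
        counted = per_interval * dupack_fraction
        unaccounted += per_interval - counted
        bw_ns_est = lowpass(bw_ns_est, counted / min_rtt)
        bw_est = lowpass(bw_est, bw_ns_est)

    # Step 4: one cumulative ack covers everything at once, but it is
    # still only a single sample going into the filter.
    final_sample = (unaccounted + per_interval) / min_rtt
    bw_ns_est = lowpass(bw_ns_est, final_sample)
    bw_est = lowpass(bw_est, bw_ns_est)

    # Step 5: ssthresh is set from the estimate at this moment (floor of 2).
    ssthresh_pkts = max(bw_est * min_rtt / MSS, 2)
    return Outcome(bw_est, ssthresh_pkts, true_bw * min_rtt / MSS)


# Hypothetical case: 100 Mbit/s, 100 ms MinRTT, 1 in 4 dup acks survive,
# recovery lasts 10 sampling intervals.
result = simulate_recovery(12.5e6, 0.1, 0.25, 10)
print(f"BWE / true bandwidth  = {result.bw_est / 12.5e6:.2f}")
print(f"ssthresh vs true pipe = {result.ssthresh_pkts:.0f} / "
      f"{result.true_bdp_pkts:.0f} packets")
```

In this toy case the estimate ends up a little under 80% of the true
bandwidth even after the big cumulative ack, so slow-start stops short of the
real pipe size (step 6) and additive increase has to make up the difference.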

 - Jonathan Morton

_______________________________________________
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat
