I'm not an expert in TSO / GSO or NIC driver design, but what I gathered is that with these schemes, and modern NICs that do scatter/gather DMA of dozens of "independent" header/data chunks directly from memory, the NIC will typically send out non-interleaved trains of segments all belonging to a single TCP session. The implicit assumption is that these bursts of up to 180 segments (Intel supports 256 kB of data per chain) can be absorbed by the buffer at the bottleneck and spread out in time there...
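
A quick back-of-the-envelope check of those burst sizes (Python, illustrative arithmetic only; the MSS and header sizes are my assumptions):

mss = 1448                    # typical MSS with TCP timestamps (assumption)
chain = 256 * 1024            # 256 kB of data per TSO chain (the Intel figure above)
frame = mss + 52 + 14         # payload + TCP/IP headers + Ethernet header (rough)

segments = chain // mss                   # -> 181, matching "up to 180 segments"
burst_time = segments * frame * 8 / 1e9   # time the burst occupies a 1 Gbit/s wire

print(segments, round(burst_time * 1e6), "us")   # -> 181 segments, ~2.2 ms back to back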

From my perspective, having GSO / TSO "cycle" through the different chains belonging to different sessions (without even introducing reordering at the sender) should already help pace the segments per session somewhat; a slightly more sophisticated DMA engine could check how much data each chain has to send, and then clock out an appropriate number of interleaved segments... I do understand that this is "work" for a HW DMA engine and slows down software GSO implementations, but it could severely reduce the instantaneous rate of a single session, and thereby the impact of burst loss due to momentary buffer overload...
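
To make the idea concrete, here is a minimal sketch of such round-robin interleaving (Python, purely illustrative; the function names and per-session byte counts are made up, a real implementation would live in the driver or NIC firmware):

from collections import deque

def segment(nbytes, mss=1448):
    """Split one session's queued byte count into MSS-sized segment lengths."""
    return deque(min(mss, nbytes - off) for off in range(0, nbytes, mss))

def interleave(chains, mss=1448):
    """Yield (session_id, segment_len) round-robin across all queued sessions."""
    pending = {sid: segment(nbytes, mss) for sid, nbytes in chains.items()}
    while pending:
        for sid in list(pending):          # one segment per session per round
            yield sid, pending[sid].popleft()
            if not pending[sid]:
                del pending[sid]

# Three sessions with different amounts of queued data:
order = [sid for sid, _ in interleave({"A": 64 * 1024, "B": 16 * 1024, "C": 8 * 1024})]
print("".join(order))   # ABCABC... until the shorter chains drain, then AB..., then A...

Each session's segments stay in order, so the sender introduces no reordering; only the gaps between a single session's segments grow, which is exactly the pacing effect I mean.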

(Let me know if I should draw a picture of how I understand TSO / HW DMA currently works, and where it could be improved upon.)

Best regards,
  Richard


----- Original Message -----
Back-to-back packets see higher loss rates than packets more spread out in time. Consider a pair of packets, back to back, arriving over a 1 Gbit/s link into a queue being serviced at 34 Mbit/s. The first packet being 'lost' is equivalent to saying that the first packet 'observed' the queue to be full - the system's state is no longer a random variable, it is known to be full. The second packet (let's assume it is also a full-sized one) 'makes an observation' of the state of that queue about 12 us later - but that is only about 3% of the time it takes to service such a large packet at 34 Mbit/s. The system has not had time to 'relax' anywhere near back to its steady state, so it is highly likely that it is still full.
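
A quick check of those figures (Python, illustrative arithmetic only):

pkt_bits = 1500 * 8
arrival_gap  = pkt_bits / 1e9     # second packet arrives ~12 us after the first (1 Gbit/s)
service_time = pkt_bits / 34e6    # ~353 us to drain one such packet at 34 Mbit/s

print(round(arrival_gap * 1e6), "us gap =",
      round(100 * arrival_gap / service_time, 1), "% of one service time")   # 12 us, ~3.4 %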

Fixing this makes a phenomenal difference to goodput (with the usual delay effects that implies); we've even built and deployed systems with this sort of engineering embedded (deployed as a network 'wrap') that let end users sustainably (days on end) achieve effective throughput better than 98% of the (transmission-media-imposed) maximum. What we had done is make the network behave closer to the underlying statistical assumptions made in TCP's design.

Neil




On 5 May 2011, at 17:10, Stephen Hemminger wrote:

On Thu, 05 May 2011 12:01:22 -0400
Jim Gettys <j...@freedesktop.org> wrote:

On 04/30/2011 03:18 PM, Richard Scheffenegger wrote:
I'm curious: has anyone done simulations to check whether the
following qualitative statement holds true, and if so, what the
quantitative effect is:

With bufferbloat, the TCP congestion control reaction is unduly
delayed. When it finally happens, the TCP stream is likely facing a
"burst loss" event - multiple consecutive packets get dropped. Worse
yet, the sender with the lowest RTT across the bottleneck will likely
start to retransmit while the (tail-drop) queue is still overflowing.

And a lost retransmission means a major setback in bandwidth (except
for Linux with bulk transfers and SACK enabled), as the standard
(RFC-documented) behaviour asks for an RTO (1 sec nominally, 200-500 ms
typically) to recover such a lost retransmission...
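
For reference, a minimal sketch of the RFC 6298 timer that governs this worst case (illustrative Python, not any particular stack's implementation; many stacks use a 200 ms floor instead of the RFC's 1 s, matching the 200-500 ms figure above):

class RtoEstimator:
    """RFC 6298 RTO computation: SRTT/RTTVAR smoothing with a hard lower bound."""
    K, ALPHA, BETA = 4, 1 / 8, 1 / 4
    MIN_RTO = 1.0                      # the RFC's 1 s floor

    def __init__(self, granularity=0.001):
        self.g = granularity           # clock granularity G
        self.srtt = self.rttvar = None
        self.rto = 1.0                 # initial RTO before any RTT sample

    def sample(self, r):
        """Feed one RTT measurement r (seconds), return the updated RTO."""
        if self.srtt is None:          # first measurement
            self.srtt, self.rttvar = r, r / 2
        else:
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        self.rto = max(self.MIN_RTO, self.srtt + max(self.g, self.K * self.rttvar))
        return self.rto

est = RtoEstimator()
print(est.sample(0.020))   # even on a 20 ms path the RTO is pinned at the 1 s floor,
                           # which is the stall a lost retransmission can incur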

The second part (more important as an incentive to the ISPs, actually):
how does the fraction of goodput vs. throughput change when AQM
schemes are deployed and TCP CC reacts in a timely manner? Small ISPs
have to pay for their upstream volume, regardless of whether that is
"real" work (goodput) or unnecessary retransmissions.
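
A rough illustration of that incentive in Python (all numbers are placeholders I made up, not measurements from this thread):

monthly_upstream_tb = 500       # assumed billable upstream volume, TB/month
retransmit_fraction = 0.02      # assumed 2% of transmitted bytes are retransmissions
price_per_tb = 1.0              # assumed transit price per TB (any currency)

wasted_tb = monthly_upstream_tb * retransmit_fraction
print("goodput/throughput =", 1 - retransmit_fraction,
      "| paid-for waste =", wasted_tb, "TB =", wasted_tb * price_per_tb, "per month")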

When I was at a small cable ISP in Switzerland last week, sure enough
bufferbloat was readily observable (17 ms -> 220 ms after 30 seconds
of a bulk transfer), but at first they had the "not our problem" view,
until I started discussing burst loss / retransmissions / goodput vs.
throughput - with the last point being a real commercial incentive
to them. (They promised to check whether AQM would be available in the
CPE / CMTS, and to put latency bounds in their tenders going forward.)
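
For what it's worth, that RTT inflation translates directly into an implied standing queue; a quick Python conversion (the access rate here is an assumption of mine, it was not stated):

base_rtt, loaded_rtt = 0.017, 0.220   # seconds, the 17 ms -> 220 ms observation above
access_rate = 5e6                     # assume a 5 Mbit/s upstream bottleneck (made up)

queue_delay = loaded_rtt - base_rtt               # ~203 ms of standing queue
queue_bytes = queue_delay * access_rate / 8
print(round(queue_delay * 1e3), "ms ->", round(queue_bytes / 1024), "KiB buffered")   # ~124 KiB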

I wish I had a good answer to your very good questions.  Simulation
would be interesting, though real data is more convincing.

I haven't looked in detail at all that many traces to try to get a feel
for how much bandwidth waste there actually is, and more formal studies
like Netalyzr, SamKnows, or the Bismark project would be needed to
quantify the loss on the network as a whole.

I did spend some time last fall with the traces I've taken.  In those,
I've typically been seeing 1-3% packet loss in the main TCP transfers.
On the wireless trace I took, I saw 9% loss, but whether that is
bufferbloat-induced loss or not, I don't know (the data is out there for
those who might want to dig).  And as you note, the losses are
concentrated in bursts (probably due to the details of Cubic, so I'm told).
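
For anyone who wants to reproduce that kind of number from their own captures, a rough sketch in Python (assumes scapy is installed; the trace file name is a placeholder, and counting repeated sequence numbers is only an approximation of true retransmissions):

from collections import defaultdict
from scapy.all import rdpcap, IP, TCP

seen = defaultdict(set)
total = retrans = 0
for pkt in rdpcap("trace.pcap"):
    if IP in pkt and TCP in pkt and len(pkt[TCP].payload) > 0:
        flow = (pkt[IP].src, pkt[TCP].sport, pkt[IP].dst, pkt[TCP].dport)
        total += 1
        if pkt[TCP].seq in seen[flow]:
            retrans += 1                 # same sequence number resent on this flow
        seen[flow].add(pkt[TCP].seq)

if total:
    print(retrans, "of", total, "data segments look like retransmissions",
          "(", round(100 * retrans / total, 1), "% )")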

I've had anecdotal reports (and some first-hand experience) of much
higher loss rates, for example from Nick Weaver at ICSI; but I believe
in playing things conservatively with any numbers I quote, and I've not
gotten consistent results when I've tried, so I just report what's in
the packet captures I did take.

A phenomenon that could be occurring is that during congestion avoidance
(until TCP loses its cookies entirely and probes for a higher operating
point), TCP is carefully timing its packets to keep the buffers
almost exactly full, so that competing flows (in my case, simple pings)
are likely to arrive just when there is no buffer space to accept them;
therefore you see higher losses on them than you would on the single
flow I've been tracing and getting loss statistics from.
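
A toy Monte Carlo of that effect (Python; my own model of the mechanism, not derived from any of the traces): a bottleneck drains one full packet per service interval, a bulk flow refills the freed slot after one line-rate packet time, and a ping arriving at a random instant only gets in if it lands in that brief gap.

import random

service = 1500 * 8 / 34e6   # ~353 us to drain one 1500-byte packet (Neil's 34 Mbit/s example)
refill  = 1500 * 8 / 1e9    # ~12 us until the bulk flow refills the freed slot at 1 Gbit/s

trials = 100_000
accepted = sum(random.uniform(0, service) < refill for _ in range(trials))
print("probe loss rate ~", round(100 * (1 - accepted / trials), 1), "%")   # ~96-97 %

In this toy model the bulk flow itself rarely loses anything, while the probe almost always finds the queue full, which would explain why the pings see far more loss than the traced flow.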

People who want to look into this further would be a great help.
                - Jim

I would not put a lot of trust in measuring loss with pings.
I have heard that some ISPs do different processing on the ICMP
packets used for ping. They either prioritize them high to provide
artificially good response times (better marketing numbers), or
prioritize them low since they aren't useful traffic.
There are also filters that only allow N ICMP requests per second,
which means repeated probes will be dropped.



--
_______________________________________________
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat
