Re: [Bloat] The Dark Problem with AQM in the Internet?

2014-09-01 Thread Richard Scheffenegger
Hi Jerry,

Isn't this the problem statement of ConEx?

Again, you at the end host would gain little insight with ConEx, but every 
intermediate network operator can observe the red/black-marked packets, compare 
the ratios (by looking at ingress vs. egress into his network), and know to 
what extent he is contributing...

Best regards,
  Richard

  - Original Message - 
  From: Jerry Jongerius 
  To: 'Rich Brown' 
  Cc: bloat@lists.bufferbloat.net 
  Sent: Thursday, August 28, 2014 6:20 PM
  Subject: Re: [Bloat] The Dark Problem with AQM in the Internet?


  It adds accountability.  Everyone in the path right now denies that they could 
possibly be the one dropping the packet.

  If I want (or need!) to address the problem, I can't now.  I would have to 
make a change and just hope that it fixed the problem.

  With accountability, I can address the problem.  I then have a choice.  If 
the problem is the ISP, I can switch ISPs.  If the problem is the mid-level 
peer or the hosting provider, I can test out new hosting providers.

  - Jerry

  From: Rich Brown [mailto:richb.hano...@gmail.com] 
  Sent: Thursday, August 28, 2014 10:39 AM
  To: Jerry Jongerius
  Cc: Greg White; Sebastian Moeller; bloat@lists.bufferbloat.net
  Subject: Re: [Bloat] The Dark Problem with AQM in the Internet?

  Hi Jerry,

AQM is a great solution for bufferbloat.  End of story.  But if you want to 
track down which device in the network intentionally dropped a packet (when 
many devices in the network path will be running AQM), how are you going to do 
that?  Or how do you propose to do that?

  Yes, but... I want to understand why you are looking to know which device 
dropped the packet. What would you do with the information?

  The great beauty of fq_codel is that it discards packets that have dwelt too 
long in a queue by actually *measuring* how long they've been in the queue. 

  If the drops happen in your local gateway/home router, then it's interesting 
to you as the operator of that device. If the drops happen elsewhere (perhaps 
some enlightened ISP has installed fq_codel, PIE, or some other zoomy queue 
discipline) then they're doing the right thing as well - they're managing their 
traffic as well as they can. But once the data leaves your gateway router, you 
can't make any further predictions.

  The SQM/AQM efforts of CeroWrt/fq_codel are designed to give near optimal 
performance of the *local* gateway, to make it adapt to the remainder of the 
(black box) network. It might make sense to instrument the CeroWrt/OpenWrt code 
to track the number of fq_codel drops to come up with a sense of what's 
'normal'. And if you need to know exactly what's happening, then 
tcpdump/wireshark are your friends. 

  Maybe I'm missing the point of your note, but I'm not sure there's anything 
you can do beyond your gateway. In the broader network, operators are 
continually watching their traffic and drop rates, and adjusting/reconfiguring 
their networks to adapt. But in general, it's impossible for you to have any 
sway/influence on their operations, so I'm not sure what you would do if you 
could know that the third router in traceroute was dropping...

  Best regards,

  Rich



--


  ___
  Bloat mailing list
  Bloat@lists.bufferbloat.net
  https://lists.bufferbloat.net/listinfo/bloat
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] The Dark Problem with AQM in the Internet?

2014-08-30 Thread Jonathan Morton

On 29 Aug, 2014, at 5:37 pm, Jerry Jongerius wrote:

 did you check to see if packets were re-sent even if they weren't lost? one of
 the side effects of excessive buffering is that it's possible for a packet to
 be held in the buffer long enough that the sender thinks that it's been
 lost and retransmits it, so the packet is effectively 'lost' even if it
 actually arrives at its destination.
 
 Yes.  A duplicate packet for the missing packet is not seen.
 
 The receiver 'misses' a packet; starts sending out tons of dup acks (for all
 packets in flight and queued up due to bufferbloat), and then way later, the
 packet does come in (after the RTT caused by bufferbloat; indicating it is
 the 'resent' packet).  

I think I've cracked this one - the cause, if not the solution.

Let's assume, for the moment, that Jerry is correct and PowerBoost plays no 
part in this.  That implies that the flow is not using the full bandwidth after 
the loss, *and* that the additive increase of cwnd isn't sufficient to recover 
to that point within the test period.

There *is* a sequence of events that can lead to that happening:

1) Packet is lost, at the tail end of the bottleneck queue.

2) Eventually, receiver sees the loss and starts sending duplicate acks (each 
triggering CA_EVENT_SLOW_ACK path in the sender).  Sender (running Westwood+) 
assumes that each of these represents a received, full-size packet, for 
bandwidth estimation purposes.

3) The receiver doesn't send, or the sender doesn't receive, a duplicate ack 
for every packet actually received.  Maybe some firewall sees a large number of 
identical packets arriving - without SACK or timestamps, they *would* be 
identical - and filters some of them.  The bandwidth estimate therefore becomes 
significantly lower than the true value, and additionally the RTO fires and 
causes the sender to reset cwnd to 1 (CA_EVENT_LOSS).

4) The retransmitted packet finally reaches the receiver, and the ack it sends 
includes all the data received in the meantime (about 3.5MB).  This is not 
sufficient to immediately reset the bandwidth estimate to the true value, 
because the BWE is sampled at RTT intervals, and also includes low-pass 
filtering.

5) This ends the recovery phase (CA_EVENT_CWR_COMPLETE), and the sender resets 
the slow-start threshold to correspond to the estimated delay-bandwidth product 
(MinRTT * BWE) at that moment.

6) This estimated DBP is lower than the true value, so the subsequent 
slow-start phase ends with the cwnd inadequately sized.  Additive increase 
would eventually correct that - but the key word is *eventually*.
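
For concreteness, here is a minimal Python sketch of the estimator behaviour
described in steps 2-6.  This is an illustration of the mechanism, not the
Linux tcp_westwood code; the once-per-RTT sampling and low-pass filtering
follow the published Westwood+ design, while the MSS value and filter gain
are assumptions:

MSS = 1448  # assumed segment size, bytes

class WestwoodEstimator:
    def __init__(self):
        self.bwe = 0.0       # filtered bandwidth estimate, bytes/sec
        self.acked = 0       # bytes credited as delivered this RTT
        self.min_rtt = None  # lowest RTT observed, seconds

    def on_ack(self, acked_bytes, rtt):
        # Step 2: each dup ack is credited as one delivered segment, so
        # if a middlebox thins the dup ack stream (step 3), delivered
        # bytes are under-counted and the estimate comes out low.
        self.acked += acked_bytes
        self.min_rtt = rtt if self.min_rtt is None else min(self.min_rtt, rtt)

    def end_of_rtt(self, interval):
        # Step 4: sample once per RTT, then low-pass filter the samples.
        sample = self.acked / interval
        self.bwe = sample if self.bwe == 0 else (7 * self.bwe + sample) / 8
        self.acked = 0

    def ssthresh_segments(self):
        # Step 5: ssthresh = MinRTT * BWE, expressed in segments.
        return max(2, int(self.bwe * self.min_rtt / MSS))

With a too-low BWE, ssthresh_segments() ends slow-start well short of the
true BDP, leaving only additive increase to close the gap - which is step 6.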

 - Jonathan Morton

___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] The Dark Problem with AQM in the Internet?

2014-08-30 Thread Stephen Hemminger
On Sat, 30 Aug 2014 09:05:58 +0300
Jonathan Morton chromati...@gmail.com wrote:

 
 On 29 Aug, 2014, at 5:37 pm, Jerry Jongerius wrote:
 
  did you check to see if packets were re-sent even if they weren't lost? one of
  the side effects of excessive buffering is that it's possible for a packet to
  be held in the buffer long enough that the sender thinks that it's been
  lost and retransmits it, so the packet is effectively 'lost' even if it
  actually arrives at its destination.
  
  Yes.  A duplicate packet for the missing packet is not seen.
  
  The receiver 'misses' a packet; starts sending out tons of dup acks (for all
  packets in flight and queued up due to bufferbloat), and then way later, the
  packet does come in (after the RTT caused by bufferbloat; indicating it is
  the 'resent' packet).  
 
 I think I've cracked this one - the cause, if not the solution.
 
 Let's assume, for the moment, that Jerry is correct and PowerBoost plays no 
 part in this.  That implies that the flow is not using the full bandwidth 
 after the loss, *and* that the additive increase of cwnd isn't sufficient to 
 recover to that point within the test period.
 
 There *is* a sequence of events that can lead to that happening:
 
 1) Packet is lost, at the tail end of the bottleneck queue.
 
 2) Eventually, receiver sees the loss and starts sending duplicate acks (each 
 triggering CA_EVENT_SLOW_ACK path in the sender).  Sender (running Westwood+) 
 assumes that each of these represents a received, full-size packet, for 
 bandwidth estimation purposes.
 
 3) The receiver doesn't send, or the sender doesn't receive, a duplicate ack 
 for every packet actually received.  Maybe some firewall sees a large number 
 of identical packets arriving - without SACK or timestamps, they *would* be 
 identical - and filters some of them.  The bandwidth estimate therefore 
 becomes significantly lower than the true value, and additionally the RTO 
 fires and causes the sender to reset cwnd to 1 (CA_EVENT_LOSS).
 
 4) The retransmitted packet finally reaches the receiver, and the ack it 
 sends includes all the data received in the meantime (about 3.5MB).  This is 
 not sufficient to immediately reset the bandwidth estimate to the true value, 
 because the BWE is sampled at RTT intervals, and also includes low-pass 
 filtering.
 
 5) This ends the recovery phase (CA_EVENT_CWR_COMPLETE), and the sender 
 resets the slow-start threshold to correspond to the estimated 
 delay-bandwidth product (MinRTT * BWE) at that moment.
 
 6) This estimated DBP is lower than the true value, so the subsequent 
 slow-start phase ends with the cwnd inadequately sized.  Additive increase 
 would eventually correct that - but the key word is *eventually*.
 
  - Jonathan Morton

Bandwidth estimation by ACK RTT is fraught with problems. The returning ACK can
be delayed for any number of reasons, such as other traffic or aggregation. This
kind of delay-based congestion control suffers badly from any latency induced in
the network. So instead of causing bloat, it gets hit by bloat.

___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] The Dark Problem with AQM in the Internet?

2014-08-30 Thread Jonathan Morton

On 30 Aug, 2014, at 9:28 am, Stephen Hemminger wrote:

 On Sat, 30 Aug 2014 09:05:58 +0300
 Jonathan Morton chromati...@gmail.com wrote:
 
 
 On 29 Aug, 2014, at 5:37 pm, Jerry Jongerius wrote:
 
 did you check to see if packets were re-sent even if they weren't lost? one of
 the side effects of excessive buffering is that it's possible for a packet to
 be held in the buffer long enough that the sender thinks that it's been
 lost and retransmits it, so the packet is effectively 'lost' even if it
 actually arrives at its destination.
 
 Yes.  A duplicate packet for the missing packet is not seen.
 
 The receiver 'misses' a packet; starts sending out tons of dup acks (for all
 packets in flight and queued up due to bufferbloat), and then way later, the
 packet does come in (after the RTT caused by bufferbloat; indicating it is
 the 'resent' packet).  
 
 I think I've cracked this one - the cause, if not the solution.
 
 Let's assume, for the moment, that Jerry is correct and PowerBoost plays no 
 part in this.  That implies that the flow is not using the full bandwidth 
 after the loss, *and* that the additive increase of cwnd isn't sufficient to 
 recover to that point within the test period.
 
 There *is* a sequence of events that can lead to that happening:
 
 1) Packet is lost, at the tail end of the bottleneck queue.
 
 2) Eventually, receiver sees the loss and starts sending duplicate acks 
 (each triggering CA_EVENT_SLOW_ACK path in the sender).  Sender (running 
 Westwood+) assumes that each of these represents a received, full-size 
 packet, for bandwidth estimation purposes.
 
 3) The receiver doesn't send, or the sender doesn't receive, a duplicate ack 
 for every packet actually received.  Maybe some firewall sees a large number 
 of identical packets arriving - without SACK or timestamps, they *would* be 
 identical - and filters some of them.  The bandwidth estimate therefore 
 becomes significantly lower than the true value, and additionally the RTO 
 fires and causes the sender to reset cwnd to 1 (CA_EVENT_LOSS).
 
 4) The retransmitted packet finally reaches the receiver, and the ack it 
 sends includes all the data received in the meantime (about 3.5MB).  This is 
 not sufficient to immediately reset the bandwidth estimate to the true 
 value, because the BWE is sampled at RTT intervals, and also includes 
 low-pass filtering.
 
 5) This ends the recovery phase (CA_EVENT_CWR_COMPLETE), and the sender 
 resets the slow-start threshold to correspond to the estimated 
 delay-bandwidth product (MinRTT * BWE) at that moment.
 
 6) This estimated DBP is lower than the true value, so the subsequent 
 slow-start phase ends with the cwnd inadequately sized.  Additive increase 
 would eventually correct that - but the key word is *eventually*.
 
 - Jonathan Morton
 
 Bandwidth estimates by ack RTT is fraught with problems. The returning ACK 
 can be
 delayed for any number of reasons such as other traffic or aggregation. This 
 kind
 of delay based congestion control suffers badly from any latency induced in 
 the network.
 So instead of causing bloat, it gets hit by bloat.

In this case, the TCP is actually tracking RTT surprisingly well, but the 
bandwidth estimate goes wrong because the duplicate ACKs go missing.  Note that 
if the MinRTT was estimated too high (which is the only direction it could go), 
this would result in the slow-start threshold being *higher* than required, and 
the symptoms observed would not occur, since the cwnd would grow to the 
required value after recovery.

This is the opposite effect from what happens to TCP Vegas in a bloated 
environment.  Vegas stops increasing cwnd when the estimated RTT is noticeably 
higher than MinRTT, but if the true MinRTT changes (or it has to compete with a 
non-Vegas TCP flow), it has trouble tracking that fact.

There is another possibility:  that the assumption of non-queue RTT being 
constant against varying bandwidth is incorrect.  If that is the case, then the 
observed behaviour can be explained without recourse to lost duplicate ACKs - 
so Westwood+ is correctly tracking both MinRTT and BWE - but (MinRTT * BWE) 
turns out to be a poor estimate of the true BDP.  I think this still fails to 
explain why the cwnd is reset (which should occur only on RTO), but everything 
else potentially fits.

I think we can distinguish the two theories by running tests against a server 
that supports SACK and timestamps, and where ideally we can capture packet 
traces at both ends.

 - Jonathan Morton

___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] The Dark Problem with AQM in the Internet?

2014-08-29 Thread Jerry Jongerius
 Okay, that is interesting. Could I convince you to try to enable SACK
 on the server and test whether you still see the catastrophic results?
 And/or try another TCP variant instead of Westwood+, like the default
 cubic.

Would love to, but cannot.  I have read-only access to settings on that
server.

___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] The Dark Problem with AQM in the Internet?

2014-08-29 Thread Sebastian Moeller
Hi Jerry,


On Aug 29, 2014, at 13:33 , Jerry Jongerius jer...@duckware.com wrote:

 Okay, that is interesting. Could I convince you to try to enable SACK
 on the server and test whether you still see the catastrophic results?
 And/or try another TCP variant instead of Westwood+, like the default
 cubic.
 
 Would love to, but can not.  I have read only access to settings on that
 server.

	Ah, too bad; it would have been nice to be able to pinpoint this more 
closely (is this effect a quirk/bug in Westwood+, or is it caused by the 
“archaic” lack of SACK?). But this list contains vast knowledge about 
networking, so I hope that someone has an idea how to get closer to the root 
cause even without root access on the server. Oh, maybe you can ask the 
hosting company/owner of the server to switch the TCP for you?

Best Regards
Sebastian
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] The Dark Problem with AQM in the Internet?

2014-08-29 Thread Jerry Jongerius
A ‘boost’ has never been seen.  Bandwidth graphs where there is no packet loss 
look like:

[inline bandwidth graph omitted; see attachment image001.jpg]

From: Jonathan Morton [mailto:chromati...@gmail.com] 
Sent: Thursday, August 28, 2014 2:15 PM
To: Jerry Jongerius
Cc: bloat
Subject: RE: [Bloat] The Dark Problem with AQM in the Internet?

If it is genuinely a single packet, then I have an alternate theory.

I note from http://www.dslreports.com/faq/14520 that PowerBoost works on the 
first 20MB of a download.  At 100Mbps or so, that's about 2 seconds.  So that's 
quite convincing evidence that your packet loss is happening at the moment 
PowerBoost switches off.

It might be that the switching process takes long enough to drop one packet. Or 
it might be that Comcast deliberately drop one packet in order to signal the 
change in bandwidth to the sender. Clever, if mildly distasteful.

- Jonathan Morton

attachment: image001.jpg
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] The Dark Problem with AQM in the Internet?

2014-08-29 Thread Jerry Jongerius
 did you check to see if packets were re-sent even if they weren't lost? one of
 the side effects of excessive buffering is that it's possible for a packet to
 be held in the buffer long enough that the sender thinks that it's been
 lost and retransmits it, so the packet is effectively 'lost' even if it
 actually arrives at its destination.

Yes.  A duplicate packet for the missing packet is not seen.

The receiver 'misses' a packet; starts sending out tons of dup acks (for all
packets in flight and queued up due to bufferbloat), and then way later, the
packet does come in (after the RTT caused by bufferbloat; indicating it is
the 'resent' packet).  


___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] The Dark Problem with AQM in the Internet?

2014-08-29 Thread Jonathan Morton
 A ‘boost’ has never been seen.  Bandwidth graphs where there is no packet
loss look like:

That's very odd, if true. Westwood+ should still be increasing the
congestion window additively after recovering, so even if it got the
bandwidth or latency estimates wrong, it should still recover full
performance. Not necessarily very quickly, but it should still be visible
on a timescale of several seconds.

More likely is that you're conflating cause and effect. The packet is only
lost when the boost ends, so if for some reason the boost never ends, the
packet is never lost.

- Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] The Dark Problem with AQM in the Internet?

2014-08-29 Thread Jerry Jongerius
The additive increase is there in the raw data.

From: Jonathan Morton [mailto:chromati...@gmail.com] 
Sent: Friday, August 29, 2014 12:31 PM
To: Jerry Jongerius
Cc: bloat
Subject: RE: [Bloat] The Dark Problem with AQM in the Internet?

 A ‘boost’ has never been seen.  Bandwidth graphs where there is no packet 
 loss look like:

That's very odd, if true. Westwood+ should still be increasing the congestion 
window additively after recovering, so even if it got the bandwidth or latency 
estimates wrong, it should still recover full performance. Not necessarily very 
quickly, but it should still be visible on a timescale of several seconds.

More likely is that you're conflating cause and effect. The packet is only lost 
when the boost ends, so if for some reason the boost never ends, the packet is 
never lost.

- Jonathan Morton

___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] The Dark Problem with AQM in the Internet?

2014-08-28 Thread Jerry Jongerius
Mr. White,

AQM is a great solution for bufferbloat.  End of story.  But if you want to
track down which device in the network intentionally dropped a packet (when
many devices in the network path will be running AQM), how are you going to
do that?  Or how do you propose to do that?

The graph presented is caused by the interaction of a single dropped packet,
bufferbloat, and the Westwood+ congestion control algorithm – and not power
boost.

- Jerry

-Original Message-
From: Greg White [mailto:g.wh...@cablelabs.com] 
Sent: Monday, August 25, 2014 1:14 PM
To: Sebastian Moeller; Jerry Jongerius
Cc: bloat@lists.bufferbloat.net
Subject: Re: [Bloat] The Dark Problem with AQM in the Internet?

As far as I know there are no deployments of AQM in DOCSIS networks yet.

So, the effect you are seeing is unlikely to be due to AQM.

As Sebastian indicated, it looks like an interaction between power boost, a
drop tail buffer and the TCP congestion window getting reset to slow-start.

I ran a quick simulation of a simple network with power boost and a basic
(bloated) drop tail buffer (no AQM) this morning in an attempt to understand
what is going on here. You didn't give me a lot to go on in the text of your
blog post, but nonetheless after playing around with parameters a bit, I was
able to get a result that was close to what you are seeing (attached).  Let
me know if you disagree.

I'm a bit concerned with the tone of your article, making AQM out to be the
bad guy here (weapon against end users, etc.).  The folks on this list and
those who participate in the IETF AQM WG are working on AQM and packet
scheduling algorithms in an attempt to fix the Internet.  At this point
AQM/PS is the best known solution; let's not create negative perceptions
unnecessarily.

-Greg

On 8/23/14, 2:01 PM, Sebastian Moeller moell...@gmx.de wrote:

Hi Jerry,

On Aug 23, 2014, at 20:16 , Jerry Jongerius jer...@duckware.com wrote:

 Request for comments on:  http://www.duckware.com/darkaqm

 The bottom line: How do you know which AQM device in a network
 intentionally drops a packet, without cooperation from AQM?

 Or is this in AQM somewhere and I just missed it?

I am sure you will get more expert responses later, but let me try to
comment.

Paragraph 1:

I think you hit the nail on the head with your observation:

 The average user can not figure out what AQM device intentionally
 dropped packets

Only, I might add, this does not depend on AQM; the user can not figure
out where packets were dropped in the case that not all involved
network hops are under said user's control ;) So move on, nothing to
see here ;)

Paragraph 2:

There is no guarantee that any network equipment responds to ICMP
requests at all (for example my DSLAM does not). What about pinging a
host further away and looking at that host's RTT development over time?
(Minor clarification: it's the load-dependent increase of ping RTT to
the CMTS that would be diagnostic of a queue, not the RTT per se.) No
increase of ICMP RTT could also mean there is no AQM involved ;)

I used to think along similar lines, but reading
https://www.nanog.org/meetings/nanog47/presentations/Sunday/RAS_Traceroute_N47_Sun.pdf
made me realize that my assumptions about ping and traceroute were not
really backed up by reality. Notably traceroute will not necessarily
show the real data's path and latencies or drop probability.

Paragraph 3

What is the advertised bandwidth of your link? To my naive eye this
looks a bit like power boosting (the cable company allowing you higher
than advertised bandwidth for a short time that is later reduced to the
advertised speed). Your plot needs a better legend, BTW; what is the
blue line showing? When you say that neither ping nor traceroute
showed anything, I assumed that you measured concurrently with your
download. It would be really great if you could use netperf-wrapper to
get comparable data (see the link on
http://www.bufferbloat.net/projects/cerowrt/wiki/Quick_Test_for_Bufferbloat ).
There the latency is not only assessed by ICMP echo requests
but also by UDP packets, and it is very unlikely that your ISP can
special-case these in any tricky way, short of giving priority to
sparse flows (which is pretty much what you would like your ISP to do
in the first place ;) )

Here is where I reveal that I am just a layman, but you complain about
the loss of one packet; yet how do you assume a (TCP) flow settles on
its transfer speed? Exactly: it keeps increasing until it loses a
packet, then reduces its speed to 50% or so and slowly ramps up again
until the next packet loss. So 

Re: [Bloat] The Dark Problem with AQM in the Internet?

2014-08-28 Thread Jonathan Morton

On 28 Aug, 2014, at 4:19 pm, Jerry Jongerius wrote:

 AQM is a great solution for bufferbloat.  End of story.  But if you want to 
 track down which device in the network intentionally dropped a packet (when 
 many devices in the network path will be running AQM), how are you going to 
 do that?  Or how do you propose to do that?

We don't plan to do that.  Not from the outside.  Frankly, we can't reliably 
tell which routers drop packets today, when AQM is not at all widely deployed, 
so that's no great loss.

But if ECN finally gets deployed, AQM can set the Congestion Experienced flag 
instead of dropping packets, most of the time.  You still don't get to see 
which router did it, but the packet still gets through and the TCP session 
knows what to do about it.
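
(As an aside: on a Linux endpoint, ECN negotiation is a one-line toggle.  A
hedged sketch in Python - net.ipv4.tcp_ecn is the standard sysctl, where 0 =
off, 1 = request ECN on outgoing connections, 2 = accept it only when the
peer asks; writing it requires root:)

with open("/proc/sys/net/ipv4/tcp_ecn", "w") as f:
    f.write("1\n")  # request ECN on outgoing TCP connections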

 The graph presented is caused by the interaction of a single dropped packet, 
 bufferbloat, and the Westwood+ congestion control algorithm – and not power 
 boost.

This surprises me somewhat - Westwood+ is supposed to be deliberately tolerant 
of single packet losses, since it was designed explicitly to get around the 
problem of slight random loss on wireless networks.

I'd be surprised if, in fact, *only* one packet was lost.  The more usual case 
is of burst loss, where several packets are lost in quick succession, and not 
necessarily consecutive packets.  This tends to happen repeatedly on dumb 
drop-tail queues, unless the buffer is so large that it accommodates the entire 
receive window (which, for modern OSes, is quite impressive in a dark sort of 
way).  Burst loss is characteristic of congestion, whereas random loss tends to 
lose isolated packets, so it would be much less surprising for Westwood+ to 
react to it.

The packets were lost in the first place because the queue became chock-full, 
probably at just about the exact moment when the PowerBoost allowance ran out 
and the bandwidth came down (which tends to cause the buffer to fill rapidly), 
so you get the worst-case scenario: the buffer at its fullest, and the 
bandwidth draining it at its minimum.  This maximises the time before your TCP 
gets to even notice the lost packet's nonexistence, during which the sender 
keeps the buffer full because it still thinks everything's fine.

What is probably happening is that the bottleneck queue, being so large, delays 
the retransmission of the lost packet until the Retransmit Timer expires.  This 
will cause Reno-family TCPs to revert to slow-start, assuming (rightly in this 
case) that the characteristics of the channel have changed.  You can see that 
it takes most of the first second for the sender to ramp up to full speed, and 
nearly as long to ramp back up to the reduced speed, both of which are 
characteristic of slow-start at WAN latencies.  NB: during slow-start, the 
buffer remains empty as long as the incoming data rate is less than the output 
capacity, so latency is at a minimum.

Do you have TCP SACK and timestamps turned on?  Those usually allow minor 
losses like that to be handled more gracefully - the sending TCP gets a better 
idea of the RTT (allowing it to set the Retransmit Timer more intelligently), 
and would be able to see that progress is still being made with the backlog of 
buffered packets, even though the core TCP ACK is not advancing.  In the event 
of burst loss, it would also be able to retransmit the correct set of packets 
straight away.

What AQM would do for you here - if your ISP implemented it properly - is to 
eliminate the negative effects of filling that massive buffer at your ISP.  It 
would allow the sending TCP to detect and recover from any packet loss more 
quickly, and with ECN turned on you probably wouldn't even get any packet loss.

What's also interesting is that, after recovering from the change in bandwidth, 
you get smaller bursts of about 15-40KB arriving at roughly half-second 
intervals, mixed in with the relatively steady 1-, 2- and 3-packet stream.  
That is characteristic of low-level packet loss with a low-latency recovery.

This either implies that your ISP has stuck you on a much shorter buffer for 
the lower-bandwidth (non-PowerBoost) regime, *or* that the sender is enforcing 
a smaller congestion window on you after having suffered a slow-start recovery. 
 The latter restricts your bandwidth to match the delay-bandwidth product, but 
happily the delay in that equation is at a minimum if it keeps your buffer 
empty.

And frankly, you're still getting 45Mbps under those conditions.  Many people 
would kill for that sort of performance - although they'd probably then want to 
kill everyone in the Comcast call centre later on.
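
As a sanity check on that, the delay-bandwidth arithmetic in Python (the
45 Mbps figure is from above; the 20 ms RTT and segment size are illustrative
assumptions, not measured values):

rate_bps = 45e6   # observed post-recovery throughput
rtt_s    = 0.020  # assumed minimum RTT, seconds
mss      = 1448   # assumed segment size, bytes

bdp_bytes = rate_bps / 8 * rtt_s
print(f"cwnd needed: {bdp_bytes:.0f} bytes = ~{bdp_bytes / mss:.0f} segments")
# ~112500 bytes (~78 segments); a congestion window clamped below this
# caps the flow under 45 Mbps, and the cap scales with the actual RTT.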

 - Jonathan Morton

___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] The Dark Problem with AQM in the Internet?

2014-08-28 Thread Rich Brown
Hi Jerry,

 AQM is a great solution for bufferbloat.  End of story.  But if you want to 
 track down which device in the network intentionally dropped a packet (when 
 many devices in the network path will be running AQM), how are you going to 
 do that?  Or how do you propose to do that?

Yes, but... I want to understand why you are looking to know which device 
dropped the packet. What would you do with the information?

The great beauty of fq_codel is that it discards packets that have dwelt too 
long in a queue by actually *measuring* how long they've been in the queue. 
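
To make "measuring how long they've been in the queue" concrete, here is a toy
Python sketch of the sojourn-time idea.  It is heavily simplified - real CoDel
drops only after the 5 ms target has been exceeded for a full 100 ms interval,
and shrinks that interval on successive drops - but the key point survives:
the drop decision keys on measured time-in-queue, not on queue length.

import time
from collections import deque

TARGET = 0.005   # CoDel's target sojourn time, 5 ms

queue = deque()  # entries: (enqueue_timestamp, packet)

def enqueue(packet):
    queue.append((time.monotonic(), packet))

def dequeue():
    while queue:
        t_in, packet = queue.popleft()
        if time.monotonic() - t_in <= TARGET:
            return packet
        # Packet dwelt too long in the queue: drop it, try the next one.
    return None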

If the drops happen in your local gateway/home router, then it's interesting to 
you as the operator of that device. If the drops happen elsewhere (perhaps 
some enlightened ISP has installed fq_codel, PIE, or some other zoomy queue 
discipline) then they're doing the right thing as well - they're managing their 
traffic as well as they can. But once the data leaves your gateway router, you 
can't make any further predictions.

The SQM/AQM efforts of CeroWrt/fq_codel are designed to give near optimal 
performance of the *local* gateway, to make it adapt to the remainder of the 
(black box) network. It might make sense to instrument the CeroWrt/OpenWrt code 
to track the number of fq_codel drops to come up with a sense of what's 
'normal'. And if you need to know exactly what's happening, then 
tcpdump/wireshark are your friends. 

Maybe I'm missing the point of your note, but I'm not sure there's anything you 
can do beyond your gateway. In the broader network, operators are continually 
watching their traffic and drop rates, and adjusting/reconfiguring their 
networks to adapt. But in general, it's impossible for you to have any 
sway/influence on their operations, so I'm not sure what you would do if you 
could know that the third router in traceroute was dropping...

Best regards,

Rich


___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] The Dark Problem with AQM in the Internet?

2014-08-28 Thread Fred Baker (fred)

On Aug 28, 2014, at 9:20 AM, Jerry Jongerius jer...@duckware.com wrote:

 It adds accountability.  Everyone in the path right now denies that they could 
 possibly be the one dropping the packet.

 If I want (or need!) to address the problem, I can’t now.  I would have to 
 make a change and just hope that it fixed the problem.

 With accountability, I can address the problem.  I then have a choice.  If 
 the problem is the ISP, I can switch ISPs.  If the problem is the mid-level 
 peer or the hosting provider, I can test out new hosting providers.
 
May I ask what may be a dumb question?

All communication has some probability of error. That’s the reason we have 
CRCs on link-layer frames: to detect and discard errored packets. The 
probability of such an error varies by media type; it’s relatively uncommon 
(O(10^-11)) on fiber, a little more common (perhaps O(10^-9)) on wired 
Ethernet, likely on WiFi (O(10^-7) or so, which is why WiFi incorporates local 
retransmission), and very likely (O(10^-4)) on satellite links, which is why 
they use forward error correction.
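
For scale, those bit error rates translate into rough per-frame loss rates as
follows (a back-of-envelope Python calculation assuming a 1500-byte frame and
independent bit errors - an idealization, since as noted next, real errors
come in blocks):

FRAME_BITS = 1500 * 8  # assumed frame size

for medium, ber in [("fiber", 1e-11), ("wired Ethernet", 1e-9),
                    ("WiFi", 1e-7), ("satellite", 1e-4)]:
    frame_loss = 1 - (1 - ber) ** FRAME_BITS
    print(f"{medium:15s} BER {ber:g} -> frame loss {frame_loss:.2e}")
# fiber 1.2e-07, wired Ethernet 1.2e-05, WiFi 1.2e-03, satellite ~7.0e-01
# (as fractions: WiFi loses ~0.12% of frames, satellite ~70% before FEC)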

Errors are not usually single bit errors. They are far more commonly block 
errors, especially if trellis coding is in use, as once there is an error the 
entire link goes screwy until it works out where the data is going. Such block 
errors might consume entire messages, or sets of messages, including not only 
the messages but the gaps between them.

When a message is lost due to an error, how do you determine whose fault it is?

 - Jerry

 From: Rich Brown [mailto:richb.hano...@gmail.com] 
 Sent: Thursday, August 28, 2014 10:39 AM
 To: Jerry Jongerius
 Cc: Greg White; Sebastian Moeller; bloat@lists.bufferbloat.net
 Subject: Re: [Bloat] The Dark Problem with AQM in the Internet?

 Hi Jerry,

 AQM is a great solution for bufferbloat.  End of story.  But if you want to 
 track down which device in the network intentionally dropped a packet (when 
 many devices in the network path will be running AQM), how are you going to 
 do that?  Or how do you propose to do that?

 Yes, but... I want to understand why you are looking to know which device 
 dropped the packet. What would you do with the information?

 The great beauty of fq_codel is that it discards packets that have dwelt too 
 long in a queue by actually *measuring* how long they've been in the queue. 

 If the drops happen in your local gateway/home router, then it's interesting 
 to you as the operator of that device. If the drops happen elsewhere 
 (perhaps some enlightened ISP has installed fq_codel, PIE, or some other 
 zoomy queue discipline) then they're doing the right thing as well - they're 
 managing their traffic as well as they can. But once the data leaves your 
 gateway router, you can't make any further predictions.

 The SQM/AQM efforts of CeroWrt/fq_codel are designed to give near optimal 
 performance of the *local* gateway, to make it adapt to the remainder of the 
 (black box) network. It might make sense to instrument the CeroWrt/OpenWrt 
 code to track the number of fq_codel drops to come up with a sense of what's 
 'normal'. And if you need to know exactly what's happening, then 
 tcpdump/wireshark are your friends. 

 Maybe I'm missing the point of your note, but I'm not sure there's anything 
 you can do beyond your gateway. In the broader network, operators are 
 continually watching their traffic and drop rates, and 
 adjusting/reconfiguring their networks to adapt. But in general, it's 
 impossible for you to have any sway/influence on their operations, so I'm not 
 sure what you would do if you could know that the third router in traceroute 
 was dropping...

 Best regards,

 Rich
 ___
 Bloat mailing list
 Bloat@lists.bufferbloat.net
 https://lists.bufferbloat.net/listinfo/bloat



___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] The Dark Problem with AQM in the Internet?

2014-08-28 Thread Greg White
And again, AQM is not causing the problem that you observed.  As Jonathan 
indicated, it would almost certainly make your performance better.  I can't 
speak for Comcast, but AFAIK they are on a path to deploy AQM.  If their 
customers start raising FUD, that could change.

TCP requires congestion signals.  In the vast majority of cases today (and for 
the foreseeable future) those signals are dropped packets.  Going on a witch 
hunt to find the evildoer that dropped your packet is counterproductive.  I 
think you should instead be asking "why didn't you drop my packet earlier, 
before the buffer got so bloated and power boost cut the BDP by 60%?"

-Greg

From: Jerry Jongerius jer...@duckware.com
Date: Thursday, August 28, 2014 at 10:20 AM
To: 'Rich Brown' richb.hano...@gmail.com
Cc: bloat@lists.bufferbloat.net
Subject: Re: [Bloat] The Dark Problem with AQM in the Internet?

It adds accountability.  Everyone in the path right now denies that they could 
possibly be the one dropping the packet.

If I want (or need!) to address the problem, I can’t now.  I would have to make 
a change and just hope that it fixed the problem.

With accountability, I can address the problem.  I then have a choice.  If the 
problem is the ISP, I can switch ISPs.  If the problem is the mid-level peer 
or the hosting provider, I can test out new hosting providers.

- Jerry



From: Rich Brown [mailto:richb.hano...@gmail.com]
Sent: Thursday, August 28, 2014 10:39 AM
To: Jerry Jongerius
Cc: Greg White; Sebastian Moeller; bloat@lists.bufferbloat.net
Subject: Re: [Bloat] The Dark Problem with AQM in the Internet?

Hi Jerry,

AQM is a great solution for bufferbloat.  End of story.  But if you want to 
track down which device in the network intentionally dropped a packet (when 
many devices in the network path will be running AQM), how are you going to do 
that?  Or how do you propose to do that?

Yes, but... I want to understand why you are looking to know which device 
dropped the packet. What would you do with the information?

The great beauty of fq_codel is that it discards packets that have dwelt too 
long in a queue by actually *measuring* how long they've been in the queue.

If the drops happen in your local gateway/home router, then it's interesting to 
you as the operator of that device. If the drops happen elsewhere (perhaps 
some enlightened ISP has installed fq_codel, PIE, or some other zoomy queue 
discipline) then they're doing the right thing as well - they're managing their 
traffic as well as they can. But once the data leaves your gateway router, you 
can't make any further predictions.

The SQM/AQM efforts of CeroWrt/fq_codel are designed to give near optimal 
performance of the *local* gateway, to make it adapt to the remainder of the 
(black box) network. It might make sense to instrument the CeroWrt/OpenWrt code 
to track the number of fq_codel drops to come up with a sense of what's 
'normal'. And if you need to know exactly what's happening, then 
tcpdump/wireshark are your friends.

Maybe I'm missing the point of your note, but I'm not sure there's anything you 
can do beyond your gateway. In the broader network, operators are continually 
watching their traffic and drop rates, and adjusting/reconfiguring their 
networks to adapt. But in general, it's impossible for you to have any 
sway/influence on their operations, so I'm not sure what you would do if you 
could know that the third router in traceroute was dropping...

Best regards,

Rich
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] The Dark Problem with AQM in the Internet?

2014-08-28 Thread Bill Ver Steeg (versteb)
Regarding AQM in North American HFC deployments-

I also can't speak for individual Service Providers, but Greg was being modest 
and the following may be interesting.

The most recent DOCSIS 3.1 spec calls for AQM in the CMTS. Specifically, it 
calls for a variant of PIE that is designed with the DOCSIS MAC layer 
in mind. The DOCSIS 3.0 spec is also being amended to require AQM. Both specs 
also recommend including AQM in the cable modems, where it can be turned 
on in the HFC network.

See http://tools.ietf.org/html/draft-white-aqm-docsis-pie-00 for more details.

bvs


Bill Ver Steeg
Distinguished Engineer
Cisco Systems






From: bloat-boun...@lists.bufferbloat.net 
[mailto:bloat-boun...@lists.bufferbloat.net] On Behalf Of Greg White
Sent: Thursday, August 28, 2014 12:36 PM
To: Jerry Jongerius; 'Rich Brown'
Cc: bloat@lists.bufferbloat.net
Subject: Re: [Bloat] The Dark Problem with AQM in the Internet?

And again, AQM is not causing the problem that you observed.  As Jonathan 
indicated, it would almost certainly make your performance better.  I can't 
speak for Comcast, but AFAIK they are on a path to deploy AQM.  If their 
customers start raising FUD, that could change.

TCP requires congestion signals.  In the vast majority of cases today (and for 
the foreseeable future) those signals are dropped packets.  Going on a witch 
hunt to find the evildoer that dropped your packet is counterproductive.  I 
think you should instead be asking "why didn't you drop my packet earlier, 
before the buffer got so bloated and power boost cut the BDP by 60%?"

-Greg

From: Jerry Jongerius jer...@duckware.com
Date: Thursday, August 28, 2014 at 10:20 AM
To: 'Rich Brown' richb.hano...@gmail.com
Cc: bloat@lists.bufferbloat.net
Subject: Re: [Bloat] The Dark Problem with AQM in the Internet?

It adds accountability.  Everyone in the path right now denies that they could 
possibly be the one dropping the packet.

If I want (or need!) to address the problem, I can't now.  I would have to make 
a change and just hope that it fixed the problem.

With accountability, I can address the problem.  I then have a choice.  If the 
problem is the ISP, I can switch ISPs.  If the problem is the mid-level peer 
or the hosting provider, I can test out new hosting providers.

- Jerry



From: Rich Brown [mailto:richb.hano...@gmail.com]
Sent: Thursday, August 28, 2014 10:39 AM
To: Jerry Jongerius
Cc: Greg White; Sebastian Moeller; bloat@lists.bufferbloat.net
Subject: Re: [Bloat] The Dark Problem with AQM in the Internet?

Hi Jerry,

AQM is a great solution for bufferbloat.  End of story.  But if you want to 
track down which device in the network intentionally dropped a packet (when 
many devices in the network path will be running AQM), how are you going to do 
that?  Or how do you propose to do that?

Yes, but... I want to understand why you are looking to know which device 
dropped the packet. What would you do with the information?

The great beauty of fq_codel is that it discards packets that have dwelt too 
long in a queue by actually *measuring* how long they've been in the queue.

If the drops happen in your local gateway/home router, then it's interesting to 
you as the operator of that device. If the drops happen elsewhere (perhaps 
some enlightened ISP has installed fq_codel, PIE, or some other zoomy queue 
discipline) then they're doing the right thing as well - they're managing their 
traffic as well as they can. But once the data leaves your gateway router, you 
can't make any further predictions.

The SQM/AQM efforts of CeroWrt/fq_codel are designed to give near optimal 
performance of the *local* gateway, to make it adapt to the remainder of the 
(black box) network. It might make sense to instrument the CeroWrt/OpenWrt code 
to track the number of fq_codel drops to come up with a sense of what's 
'normal'. And if you need to know exactly what's happening, then 
tcpdump/wireshark are your friends.

Maybe I'm missing the point of your note, but I'm not sure there's anything you 
can do beyond your gateway. In the broader network, operators are continually 
watching their traffic and drop rates, and adjusting/reconfiguring their 
networks to adapt. But in general, it's impossible for you to have any 
sway/influence on their operations, so I'm not sure what you would do if you 
could know that the third router in traceroute was dropping...

Best regards,

Rich
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] The Dark Problem with AQM in the Internet?

2014-08-28 Thread Dave Taht
On Thu, Aug 28, 2014 at 10:20 AM, Jerry Jongerius jer...@duckware.com wrote:
 Jonathan,

 Yes, WireShark shows that *only* one packet gets lost.  Regardless of RWIN
 size.  The RWIN size can be below the BDP (no measurable queuing within the
 CMTS).  Or, the RWIN size can be very large, causing significant queuing
 within the CMTS.  With a larger RWIN value, the single dropped packet
 typically happens sooner in the download, rather than later.  The fact there
 is no burst loss is a significant clue.

 The graph is fully explained by the Westwood+ algorithm that the server is
 using.  If you input the data observed into the Westwood+ bandwidth
 estimator, you end up with the rate seen in the graph after the packet loss
 event.  The reason the rate gets limited (no ramp up) is due to Westwood+
 behavior on an RTO.  And the reason there is the RTO is due to the bufferbloat,
 and the timing of the lost packet in relation to when the bufferbloat
 starts.  When there is no RTO, I see the expected drop (to the Westwood+
 bandwidth estimate) and ramp back up.  On an RTO, Westwood+ sets both
 ssthresh and cwnd to its bandwidth estimate.

On the same network, what does cubic do?

 The PC does SACK, the server does not, so not used.  Timestamps off.

Timestamps are *critical* for good TCP performance above 5-10 Mbit/s on
most cc algos.

I note that the netperf-wrapper test has the ability to test multiple
variants of
TCP, if enabled on the server (basically you need to modprobe the needed
algorithms, enable them in /proc/sys/net/ipv4/tcp_allowed_congestion_control,
and select them in the test tool (iperf and netperf have support)).
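
(In Python terms, that server-side setup might look like the sketch below,
assuming a Linux server with root; tcp_westwood is the standard module name,
cubic and reno are built in, and TCP_CONGESTION is the per-socket selector
that test tools use under the hood:)

import socket
import subprocess

# Load the non-default algorithm and whitelist it for unprivileged use.
subprocess.run(["modprobe", "tcp_westwood"], check=True)
with open("/proc/sys/net/ipv4/tcp_allowed_congestion_control", "w") as f:
    f.write("cubic reno westwood\n")

# Per-socket selection (Linux only), as a test client might do it:
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, b"westwood")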

Everyone here has installed netperf-wrapper already, yes?

Very fast to generate a good test and a variety of plots like those shown
here:  http://burntchrome.blogspot.com/2014_05_01_archive.html

(in reading that over, does anyone have any news on CMTS aqm or packet
scheduling systems? It's the bulk of the problem there...)

 netperf-wrapper is easy to bring up
on linux, on osx it needs macports, and the only way I've come up with to test
windows behavior is using windows as a netperf client rather than server.

I haven't looked into westwood+'s behavior much of late; I will try to add it
and a few other tcps to some future tests. I do have some old plots showing
it misbehaving relative to other TCPs, but that was before many fixes landed
in the kernel.

Note: I keep hoping to find a correctly working ledbat module; the one
I have doesn't look correct (and needs
to be updated to linux 3.15's change to microsecond-based timestamping).


 - Jerry


 -Original Message-
 From: Jonathan Morton [mailto:chromati...@gmail.com]
 Sent: Thursday, August 28, 2014 10:08 AM
 To: Jerry Jongerius
 Cc: 'Greg White'; 'Sebastian Moeller'; bloat@lists.bufferbloat.net
 Subject: Re: [Bloat] The Dark Problem with AQM in the Internet?


 On 28 Aug, 2014, at 4:19 pm, Jerry Jongerius wrote:

 AQM is a great solution for bufferbloat.  End of story.  But if you want
 to track down which device in the network intentionally dropped a packet
 (when many devices in the network path will be running AQM), how are you
 going to do that?  Or how do you propose to do that?

 We don't plan to do that.  Not from the outside.  Frankly, we can't reliably
 tell which routers drop packets today, when AQM is not at all widely
 deployed, so that's no great loss.

 But if ECN finally gets deployed, AQM can set the Congestion Experienced
 flag instead of dropping packets, most of the time.  You still don't get to
 see which router did it, but the packet still gets through and the TCP
 session knows what to do about it.

 The graph presented is caused by the interaction of a single dropped packet,
 bufferbloat, and the Westwood+ congestion control algorithm - and not power
 boost.

 This surprises me somewhat - Westwood+ is supposed to be deliberately
 tolerant of single packet losses, since it was designed explicitly to get
 around the problem of slight random loss on wireless networks.

 I'd be surprised if, in fact, *only* one packet was lost.  The more usual
 case is of burst loss, where several packets are lost in quick succession,
 and not necessarily consecutive packets.  This tends to happen repeatedly on
 dump drop-tail queues, unless the buffer is so large that it accommodates
 the entire receive window (which, for modern OSes, is quite impressive in a
 dark sort of way).  Burst loss is characteristic of congestion, whereas
 random loss tends to lose isolated packets, so it would be much less
 surprising for Westwood+ to react to it.

 The packets were lost in the first place because the queue became
 chock-full, probably at just about the exact moment when the PowerBoost
 allowance ran out and the bandwidth came down (which tends to cause the
 buffer to fill rapidly), so you get the worst-case scenario: the buffer at
 its fullest, and the bandwidth draining it at its minimum.  This maximises
 the time before your TCP gets to even notice 

Re: [Bloat] The Dark Problem with AQM in the Internet?

2014-08-28 Thread Jan Ceuleers
On 08/28/2014 06:35 PM, Fred Baker (fred) wrote:
 When a message is lost due to an error, how do you determine whose fault
 it is?

Links need to be engineered for the optimum combination of power,
bandwidth, overhead and residual error that meets requirements. I agree
with your implied point that a single error is unlikely to be indicative
of a real problem, but a link not meeting requirements is someone's fault.

So like Jerry I'd be interested in an ability for endpoints to be able
to collect statistics on per-hop loss probabilities so that admins can
hold their providers accountable.

Jan

___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] The Dark Problem with AQM in the Internet?

2014-08-28 Thread Jonathan Morton
If it is genuinely a single packet, then I have an alternate theory.

I note from http://www.dslreports.com/faq/14520 that PowerBoost works on
the first 20MB of a download.  At 100Mbps or so, that's about 2 seconds.
So that's quite convincing evidence that your packet loss is happening at
the moment PowerBoost switches off.

It might be that the switching process takes long enough to drop one
packet. Or it might be that Comcast deliberately drop one packet in order
to signal the change in bandwidth to the sender. Clever, if mildly
distasteful.

- Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] The Dark Problem with AQM in the Internet?

2014-08-28 Thread Kenyon Ralph
On 2014-08-28T20:00:54+0200, Jan Ceuleers jan.ceule...@gmail.com wrote:
 On 08/28/2014 06:35 PM, Fred Baker (fred) wrote:
  When a message is lost due to an error, how do you determine whose fault
  it is?
 
 Links need to be engineered for the optimum combination of power,
 bandwidth, overhead and residual error that meets requirements. I agree
 with your implied point that a single error is unlikely to be indicative
 of a real problem, but a link not meeting requirements is someone's fault.
 
 So like Jerry I'd be interested in an ability for endpoints to be able
 to collect statistics on per-hop loss probabilities so that admins can
 hold their providers accountable.

Here is some relevant work:
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2417573
Measurement and Analysis of Internet Interconnection and Congestion

-- 
Kenyon Ralph


___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] The Dark Problem with AQM in the Internet?

2014-08-28 Thread Sebastian Moeller
Hi Jerry,


On Aug 28, 2014, at 19:20 , Jerry Jongerius jer...@duckware.com wrote:

 Jonathan,
 
 Yes, WireShark shows that *only* one packet gets lost.  Regardless of RWIN
 size.  The RWIN size can be below the BDP (no measurable queuing within the
 CMTS).  Or, the RWIN size can be very large, causing significant queuing
 within the CMTS.  With a larger RWIN value, the single dropped packet
 typically happens sooner in the download, rather than later.  The fact there
 is no burst loss is a significant clue.
 
 The graph is fully explained by the Westwood+ algorithm that the server is
 using.  If you input the data observed into the Westwood+ bandwidth
 estimator, you end up with the rate seen in the graph after the packet loss
 event.  The reason the rate gets limited (no ramp up) is due to Westwood+
 behavior on an RTO.  And the reason there is the RTO is due to the bufferbloat,
 and the timing of the lost packet in relation to when the bufferbloat
 starts.  When there is no RTO, I see the expected drop (to the Westwood+
 bandwidth estimate) and ramp back up.  On an RTO, Westwood+ sets both
 ssthresh and cwnd to its bandwidth estimate.
 
 The PC does SACK, the server does not, so not used.  Timestamps off.

	Okay, that is interesting. Could I convince you to try to enable SACK on 
the server and test whether you still see the catastrophic results? And/or try 
another TCP variant instead of Westwood+, like the default cubic.

Best Regards
Sebastian

 
 - Jerry
 
 
 -Original Message-
 From: Jonathan Morton [mailto:chromati...@gmail.com] 
 Sent: Thursday, August 28, 2014 10:08 AM
 To: Jerry Jongerius
 Cc: 'Greg White'; 'Sebastian Moeller'; bloat@lists.bufferbloat.net
 Subject: Re: [Bloat] The Dark Problem with AQM in the Internet?
 
 
 On 28 Aug, 2014, at 4:19 pm, Jerry Jongerius wrote:
 
 AQM is a great solution for bufferbloat.  End of story.  But if you want
 to track down which device in the network intentionally dropped a packet
 (when many devices in the network path will be running AQM), how are you
 going to do that?  Or how do you propose to do that?
 
 We don't plan to do that.  Not from the outside.  Frankly, we can't reliably
 tell which routers drop packets today, when AQM is not at all widely
 deployed, so that's no great loss.
 
 But if ECN finally gets deployed, AQM can set the Congestion Experienced
 flag instead of dropping packets, most of the time.  You still don't get to
 see which router did it, but the packet still gets through and the TCP
 session knows what to do about it.
 
 The graph presented is caused by the interaction of a single dropped packet,
 bufferbloat, and the Westwood+ congestion control algorithm - and not power
 boost.
 
 This surprises me somewhat - Westwood+ is supposed to be deliberately
 tolerant of single packet losses, since it was designed explicitly to get
 around the problem of slight random loss on wireless networks.
 
 I'd be surprised if, in fact, *only* one packet was lost.  The more usual
 case is of burst loss, where several packets are lost in quick succession,
 and not necessarily consecutive packets.  This tends to happen repeatedly on
 dump drop-tail queues, unless the buffer is so large that it accommodates
 the entire receive window (which, for modern OSes, is quite impressive in a
 dark sort of way).  Burst loss is characteristic of congestion, whereas
 random loss tends to lose isolated packets, so it would be much less
 surprising for Westwood+ to react to it.
 
 The packets were lost in the first place because the queue became
 chock-full, probably at just about the exact moment when the PowerBoost
 allowance ran out and the bandwidth came down (which tends to cause the
 buffer to fill rapidly), so you get the worst-case scenario: the buffer at
 its fullest, and the bandwidth draining it at its minimum.  This maximises
 the time before your TCP gets to even notice the lost packet's nonexistence,
 during which the sender keeps the buffer full because it still thinks
 everything's fine.
 
 What is probably happening is that the bottleneck queue, being so large,
 delays the retransmission of the lost packet until the Retransmit Timer
 expires.  This will cause Reno-family TCPs to revert to slow-start, assuming
 (rightly in this case) that the characteristics of the channel have changed.
 You can see that it takes most of the first second for the sender to ramp up
 to full speed, and nearly as long to ramp back up to the reduced speed, both
 of which are characteristic of slow-start at WAN latencies.  NB: during
 slow-start, the buffer remains empty as long as the incoming data rate is
 less than the output capacity, so latency is at a minimum.
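
 To put rough numbers on that (purely illustrative figures, not measurements
 from Jerry's link): a 256 KB drop-tail buffer draining at 100 Mbit/s adds
 about 256e3 * 8 / 100e6 = ~20 ms of queue delay, but the same buffer at a
 post-boost 20 Mbit/s adds ~100 ms.  Since the retransmit timer is seeded
 from the smoothed RTT, an RTT that inflates by ~100 ms right at the moment
 of loss makes an RTO all the more likely.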
 
 Do you have TCP SACK and timestamps turned on?  Those usually allow minor
 losses like that to be handled more gracefully - the sending TCP gets a
 better idea of the RTT (allowing it to set the Retransmit Timer more
 intelligently), and would be able to see that progress is still being 

Re: [Bloat] The Dark Problem with AQM in the Internet?

2014-08-28 Thread David Lang

On Thu, 28 Aug 2014, Dave Taht wrote:


On Thu, Aug 28, 2014 at 11:00 AM, Jan Ceuleers jan.ceule...@gmail.com wrote:

On 08/28/2014 06:35 PM, Fred Baker (fred) wrote:

When a message is lost due to an error, how do you determine whose fault
it is?


Links need to be engineered for the optimum combination of power,
bandwidth, overhead and residual error that meets requirements. I agree
with your implied point that a single error is unlikely to be indicative
of a real problem, but a link not meeting requirements is someone's fault.

So like Jerry I'd be interested in an ability for endpoints to be able
to collect statistics on per-hop loss probabilities so that admins can
hold their providers accountable.


I will argue that a provider demonstrating 3% packet loss and low
latency is better than a provider showing 0.03% packet loss and
exorbitant latency. So I'd rather be measuring latency AND loss.
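
A toy probe along those lines, measuring latency and loss together (assumes
a UDP echo responder at TARGET, which is hypothetical; tools like
netperf-wrapper's RRUL test do this properly):

    import socket, time

    TARGET = ("192.0.2.1", 7)   # hypothetical UDP echo responder
    N, TIMEOUT = 100, 1.0

    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.settimeout(TIMEOUT)
    rtts, lost = [], 0
    for i in range(N):
        t0 = time.time()
        s.sendto(str(i).encode(), TARGET)
        try:
            s.recvfrom(1500)
            rtts.append((time.time() - t0) * 1000.0)
        except socket.timeout:
            lost += 1
        time.sleep(0.1)

    if rtts:
        print("loss %.1f%%  rtt min/avg/max %.1f/%.1f/%.1f ms"
              % (100.0 * lost / N, min(rtts),
                 sum(rtts) / len(rtts), max(rtts)))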


Yep, the drive to never lose a packet is what caused buffer sizes to grow to 
such silly extremes.


David Lang


One very cool thing that went by at sigcomm last week was the concept
of active networking revived in the form of Tiny Packet Programs:
see:

http://arxiv.org/pdf/1405.7143v3.pdf

Which has a core concept of a protocol and virtual machine that can
actively gather data from the path itself about buffering, loss, etc.

No implementation was presented, but I could see a way to do it easily
in Linux via iptables. Regrettably, elsewhere in the real world, we
have to infer these statistics via various means.




Jan

___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat




--
Dave Täht

NSFW: 
https://w2.eff.org/Censorship/Internet_censorship_bills/russell_0296_indecent.article
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] The Dark Problem with AQM in the Internet?

2014-08-28 Thread David Lang

On Thu, 28 Aug 2014, Jerry Jongerius wrote:


Yes, WireShark shows that *only* one packet gets lost.  Regardless of RWIN
size.  The RWIN size can be below the BDP (no measurable queuing within the
CMTS).  Or, the RWIN size can be very large, causing significant queuing
within the CMTS.  With a larger RWIN value, the single dropped packet
typically happens sooner in the download, rather than later.  The fact that
there is no burst loss is a significant clue.
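
(For concreteness, with invented numbers: at 110 Mbit/s and a 20 ms base RTT
the BDP is 110e6 / 8 * 0.020 = ~275 KB, so an RWIN below ~275 KB cannot
sustain a standing queue in the CMTS, while an RWIN of, say, 1 MB can park
roughly 725 KB there.)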


did you check to see if packets were re-sent even if they weren't lost? One of 
the side effects of excessive buffering is that it's possible for a packet to be 
held in the buffer long enough that the sender thinks it has been lost and 
retransmits it, so the packet is effectively 'lost' even if it actually arrives 
at its destination.
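
A rough way to check that from the WireShark capture is to count data
segments whose sequence number has already been seen. A sketch using scapy
(the capture file name is hypothetical, and corner cases such as
retransmissions with different segmentation are ignored):

    from scapy.all import rdpcap, IP, TCP   # pip install scapy

    seen, retx = set(), 0
    for pkt in rdpcap("download.pcap"):      # hypothetical capture file
        if IP in pkt and TCP in pkt and len(pkt[TCP].payload) > 0:
            key = (pkt[IP].src, pkt[IP].dst,
                   pkt[TCP].sport, pkt[TCP].dport, pkt[TCP].seq)
            if key in seen:
                retx += 1   # same data segment seen twice => retransmitted
            seen.add(key)
    print("retransmitted (or duplicated) data segments:", retx)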


David Lang


The graph is fully explained by the Westwood+ algorithm that the server is
using.  If you input the observed data into the Westwood+ bandwidth
estimator, you end up with the rate seen in the graph after the packet loss
event.  The reason the rate gets limited (no ramp-up) is due to Westwood+
behavior on an RTO.  And the reason there is an RTO is due to the
bufferbloat, and the timing of the lost packet in relation to when the
bufferbloat starts.  When there is no RTO, I see the expected drop (to the
Westwood+ bandwidth estimate) and ramp back up.  On an RTO, Westwood+ sets
both ssthresh and cwnd to its bandwidth estimate.
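
For readers unfamiliar with Westwood+, here is a toy model of the behaviour
Jerry describes - an EWMA bandwidth estimate applied to ssthresh and, per
his observation, also to cwnd on an RTO. The real Linux tcp_westwood module
differs in its details (gain values, RTT sampling), so treat this purely as
a sketch:

    class WestwoodPlus:
        # Toy model of the Westwood+ idea, not the Linux implementation.
        def __init__(self, mss=1448):
            self.mss, self.bw_est, self.rtt_min = mss, 0.0, None

        def on_ack(self, acked_bytes, rtt):
            # Bandwidth sample over the last RTT, smoothed with an EWMA.
            self.rtt_min = rtt if self.rtt_min is None else min(self.rtt_min, rtt)
            sample = acked_bytes / rtt
            self.bw_est = sample if self.bw_est == 0 else \
                          0.9 * self.bw_est + 0.1 * sample

        def on_rto(self):
            # Per Jerry's observation: ssthresh and cwnd are both set to
            # the estimate (rate x min RTT, in packets), so no ramp-up.
            pkts = max(2, int(self.bw_est * self.rtt_min / self.mss))
            return pkts, pkts   # (ssthresh, cwnd)

    w = WestwoodPlus()
    w.on_ack(acked_bytes=145000, rtt=0.020)   # one ~58 Mbit/s sample
    print(w.on_rto())                         # -> (100, 100)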

The PC supports SACK, the server does not, so SACK is not used.  Timestamps are off.

- Jerry


-Original Message-
From: Jonathan Morton [mailto:chromati...@gmail.com]
Sent: Thursday, August 28, 2014 10:08 AM
To: Jerry Jongerius
Cc: 'Greg White'; 'Sebastian Moeller'; bloat@lists.bufferbloat.net
Subject: Re: [Bloat] The Dark Problem with AQM in the Internet?


On 28 Aug, 2014, at 4:19 pm, Jerry Jongerius wrote:


AQM is a great solution for bufferbloat.  End of story.  But if you want
to track down which device in the network intentionally dropped a packet
(when many devices in the network path will be running AQM), how are you
going to do that?  Or how do you propose to do that?

We don't plan to do that.  Not from the outside.  Frankly, we can't reliably
tell which routers drop packets today, when AQM is not at all widely
deployed, so that's no great loss.

But if ECN finally gets deployed, AQM can set the Congestion Experienced
flag instead of dropping packets, most of the time.  You still don't get to
see which router did it, but the packet still gets through and the TCP
session knows what to do about it.


The graph presented is caused by the interaction of a single dropped packet,
bufferbloat, and the Westwood+ congestion control algorithm - and not power
boost.

This surprises me somewhat - Westwood+ is supposed to be deliberately
tolerant of single packet losses, since it was designed explicitly to get
around the problem of slight random loss on wireless networks.

I'd be surprised if, in fact, *only* one packet was lost.  The more usual
case is of burst loss, where several packets are lost in quick succession,
and not necessarily consecutive packets.  This tends to happen repeatedly on
dumb drop-tail queues, unless the buffer is so large that it accommodates
the entire receive window (which, for modern OSes, is quite impressive in a
dark sort of way).  Burst loss is characteristic of congestion, whereas
random loss tends to lose isolated packets, so it would be much less
surprising for Westwood+ to react to it.

The packets were lost in the first place because the queue became
chock-full, probably at just about the exact moment when the PowerBoost
allowance ran out and the bandwidth came down (which tends to cause the
buffer to fill rapidly), so you get the worst-case scenario: the buffer at
its fullest, and the bandwidth draining it at its minimum.  This maximises
the time before your TCP gets to even notice the lost packet's nonexistence,
during which the sender keeps the buffer full because it still thinks
everything's fine.

What is probably happening is that the bottleneck queue, being so large,
delays the retransmission of the lost packet until the Retransmit Timer
expires.  This will cause Reno-family TCPs to revert to slow-start, assuming
(rightly in this case) that the characteristics of the channel have changed.
You can see that it takes most of the first second for the sender to ramp up
to full speed, and nearly as long to ramp back up to the reduced speed, both
of which are characteristic of slow-start at WAN latencies.  NB: during
slow-start, the buffer remains empty as long as the incoming data rate is
less than the output capacity, so latency is at a minimum.

Do you have TCP SACK and timestamps turned on?  Those usually allow minor
losses like that to be handled more gracefully - the sending TCP gets a
better idea of the RTT (allowing it to set the Retransmit Timer more
intelligently), and would be able to see that progress is still being made
with the backlog of buffered 

Re: [Bloat] The Dark Problem with AQM in the Internet?

2014-08-25 Thread Greg White
As far as I know there are no deployments of AQM in DOCSIS networks yet.
So, the effect you are seeing is unlikely to be due to AQM.

As Sebastian indicated, it looks like an interaction between power boost,
a drop-tail buffer and the TCP congestion window getting reset to
slow-start.

I ran a quick simulation of a simple network with power boost and a basic
(bloated) drop-tail buffer (no AQM) this morning in an attempt to
understand what is going on here. You didn't give me a lot to go on in the
text of your blog post, but nonetheless after playing around with
parameters a bit, I was able to get a result that was close to what you
are seeing (attached).  Let me know if you disagree.
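
For anyone who wants to reproduce the shape without a full simulator, here
is a deliberately crude discrete-time sketch (every constant is invented for
illustration): a sender that ramps blindly, a link that serves at a boosted
rate until a byte allowance is spent, and a finite FIFO whose overflow is a
tail drop:

    BOOST_RATE, STEADY_RATE = 110e6 / 8, 40e6 / 8   # link rates, bytes/s
    BOOST_BYTES = 20e6      # PowerBoost allowance, bytes
    BUF_LIMIT = 256e3       # drop-tail buffer, bytes
    DT = 0.01               # 10 ms time step

    queue = delivered = 0.0
    send_rate = 1e6         # sender starts at 1 MB/s and ramps unchecked
    first_drop = None
    t = 0.0
    while t < 10.0:
        rate = BOOST_RATE if delivered < BOOST_BYTES else STEADY_RATE
        queue += send_rate * DT                 # arrivals
        if queue > BUF_LIMIT:                   # overflow => tail drop
            if first_drop is None:
                first_drop = t
            queue = BUF_LIMIT
        out = min(queue, rate * DT)             # departures
        queue -= out
        delivered += out
        send_rate *= 1.02                       # crude ramp, no loss response
        t += DT

    print("first tail drop at t = %.2f s, standing queue = %.0f kB"
          % (first_drop, queue / 1e3))

The point is only that the first drop naturally lands near the moment the
boost allowance runs out, with the buffer at its fullest.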

I'm a bit concerned with the tone of your article, making AQM out to be
the bad guy here (weapon against end users, etc.).  The folks on this
list and those who participate in the IETF AQM WG are working on AQM and
packet scheduling algorithms in an attempt to fix the Internet.  At this
point AQM/PS is the best-known solution; let's not create negative
perceptions unnecessarily.

-Greg

On 8/23/14, 2:01 PM, Sebastian Moeller moell...@gmx.de wrote:

Hi Jerry,

On Aug 23, 2014, at 20:16 , Jerry Jongerius jer...@duckware.com wrote:

 Request for comments on: www.duckware.com/darkaqm
 
 The bottom line: How do you know which AQM device in a network
 intentionally drops a packet, without cooperation from AQM?
 
 Or is this in AQM somewhere and I just missed it?


I am sure you will get more expert responses later, but let me try to
comment.

Paragraph 1:

I think you hit the nail on the head with your observation:

The average user can not figure out what AQM device intentionally dropped
packets

Only, I might add, this does not depend on AQM; the user can not figure
out where packets were dropped in the case that not all involved network
hops are under said user's control ;) So move on, nothing to see here ;)

Paragraph 2:

There is no guarantee that any network equipment responds to ICMP
requests at all (for example my DSLAM does not). What about pinging a
host further away and looking at that host's RTT development over time?
(Minor clarification: it's the load-dependent increase of ping RTT to the
CMTS that would be diagnostic of a queue, not the RTT per se). No
increase of ICMP RTT could also mean there is no AQM involved ;)

   I used to think along similar lines, but reading
https://www.nanog.org/meetings/nanog47/presentations/Sunday/RAS_Traceroute_N47_Sun.pdf
made me realize that my assumptions about ping and traceroute were not
really backed up by reality. Notably traceroute will not necessarily show
the real data's path and latencies or drop probability.

Paragraph 3

What is the advertised bandwidth of your link? To my naive eye this looks
a bit like power boosting (the cable company allowing you higher than
advertised bandwidth for a short time that is later reduced to the
advertised speed). Your plot needs a better legend, BTW: what is the blue
line showing? When you say that neither ping nor traceroute showed
anything, I assumed that you measured concurrently with your download. It
would be really great if you could run netperf-wrapper to get comparable
data (see the link on
http://www.bufferbloat.net/projects/cerowrt/wiki/Quick_Test_for_Bufferbloat ).
There the latency is not only assessed by ICMP echo requests but also
by UDP packets, and it is very unlikely that your ISP can special-case
these in any tricky way, short of giving priority to sparse flows (which
is pretty much what you would like your ISP to do in the first place ;) )

   Here is where I reveal that I am just a layman, but you complain about
the loss of one packet, yet how do you assume a TCP flow settles on its
transfer speed? Exactly: it keeps increasing until it loses a packet,
then reduces its speed to 50% or so and slowly ramps up again until the
next packet loss. So unless your test data is not TCP I see no way to
avoid packet loss (and no reason why it is harmful). Now if my power
boost intuition should prove right I can explain the massive drop quite
well: TCP had ramped up to above the long-term stable rate and suffered
several packet losses in a short time, basically resetting it to 0 or so,
therefore the new ramping to 40 Mbps looks pretty similar to the initial
ramping to 110 Mbps...
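
In code form, the sawtooth just described looks roughly like this (a pure
illustration of slow start plus AIMD, not any particular kernel; the pipe
capacity of 100 packets is invented):

    cwnd, ssthresh = 1.0, 64.0     # in packets; illustrative numbers
    trace = []
    for rtt in range(60):
        if cwnd > 100.0:           # pretend pipe + buffer hold 100 packets
            ssthresh = cwnd / 2.0  # multiplicative decrease on loss
            cwnd = ssthresh
        elif cwnd < ssthresh:
            cwnd *= 2.0            # slow start: double per RTT
        else:
            cwnd += 1.0            # congestion avoidance: +1 per RTT
        trace.append(cwnd)
    print(" ".join("%.0f" % c for c in trace))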

Paragraph 4:

I guess ECN (explicit congestion notification) is the best you can
expect: routers will initially set a mark on a packet to notify the
TCP endpoints that they need to throttle the speed unless they want to
risk packet loss. But not all routers are configured to use it (plus you
need to configure your endpoints correctly, see:
http://www.bufferbloat.net/projects/cerowrt/wiki/Enable_ECN ). But this
will not tell you where along the path congestion occurred, only that it
occurred (and if push comes to shove your packets still get dropped.)
   Also, I believe, a congested router is going to drop packets to be able
to "survive" the current load, it 

Re: [Bloat] The Dark Problem with AQM in the Internet?

2014-08-25 Thread Jim Gettys
Note that I worked with Folkert Van Heusden to get some options added to
his httping program to allow ping-style testing against any HTTP server
out there using HTTP/TCP.

See:

http://www.vanheusden.com/httping/

I find it slightly ironic that people are now concerned about ICMP ping no
longer returning queuing information given that when I started working on
bufferbloat, a number of people claimed that ICMP Ping could not be relied
upon to report reliable information, as it may be prioritized differently
by routers. This urban legend may or may not be true; I never observed it
in my explorations.

In any case, you all may find it useful, and my thanks to Folkert for a
very useful tool.
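
For anyone who wants the flavour without installing it, the core idea is
easy to approximate: time a small HTTP request over a fresh TCP connection.
A rough stand-in, not httping itself (example.com is a placeholder target):

    import socket, time

    HOST, PORT, N = "example.com", 80, 10   # placeholder target

    for i in range(N):
        t0 = time.time()
        s = socket.create_connection((HOST, PORT), timeout=2.0)
        s.sendall(b"HEAD / HTTP/1.1\r\nHost: " + HOST.encode() +
                  b"\r\nConnection: close\r\n\r\n")
        s.recv(1024)                        # first bytes of the status line
        print("rtt %.1f ms" % ((time.time() - t0) * 1000.0))
        s.close()
        time.sleep(1)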

   - Jim
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] The Dark Problem with AQM in the Internet?

2014-08-25 Thread Bill Ver Steeg (versteb)
Just a cautionary tale: there was a fairly well-publicized DoS attack that 
involved TCP SYN packets with a zero TTL (if I recall correctly), so be careful 
running that tool. Be particularly careful if you run it in bulk, as you may 
end up on a blacklist on a firewall somewhere...




Bill Ver Steeg
Distinguished Engineer 
Cisco Systems






-Original Message-
From: bloat-boun...@lists.bufferbloat.net 
[mailto:bloat-boun...@lists.bufferbloat.net] On Behalf Of Sebastian Moeller
Sent: Monday, August 25, 2014 3:13 PM
To: Jim Gettys
Cc: bloat@lists.bufferbloat.net
Subject: Re: [Bloat] The Dark Problem with AQM in the Internet?

Hi Jim,


On Aug 25, 2014, at 20:09 , Jim Gettys j...@freedesktop.org wrote:

 Note that I worked with Folkert Van Heusden to get some options added to his 
 httping program to allow ping style testing against any HTTP server out 
 there using HTTP/TCP.
 
 See:
 
 http://www.vanheusden.com/httping/

That is quite cool!

 
 I find it slightly ironic that people are now concerned about ICMP ping no 
 longer returning queuing information given that when I started working on 
 bufferbloat, a number of people claimed that ICMP Ping could not be relied 
 upon to report reliable information, as it may be prioritized differently by 
 routers. 

   Just to add what I learned: some routers seem to rate-limit ICMP 
processing and handle it on a slow path (see 
https://www.nanog.org/meetings/nanog47/presentations/Sunday/RAS_Traceroute_N47_Sun.pdf
 ). Mind you, this applies if the router processes the ICMP packet, not if it 
simply passes it along. So as long as the host responding to the pings is not a 
router with interesting limitations, this should not affect the suitability of 
ICMP for detecting and measuring bufferbloat (heck, this is what netperf-wrapper's 
RRUL test automates). But since Jerry wants to pinpoint the exact location of 
his assumed single packet drop, he wants to use ping/traceroute to actually 
probe routers on the way, so all these urban legends about ICMP processing on 
routers will actually affect him. But then what do I know...

Best Regards
Sebastian

 This urban legend may or may not be true; I never observed it in my 
 explorations.
 
 In any case, you all may find it useful, and my thanks to Folkert for a very 
 useful tool.
 
   - Jim
 

___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] The Dark Problem with AQM in the Internet?

2014-08-25 Thread Bill Ver Steeg (versteb)
Oops - never mind. I thought the tool was doing traceroute-like things with 
varying TTLs in order to get per-hop data. 

Go back to whatever you were doing..


Bill Ver Steeg
Distinguished Engineer 
Cisco Systems






-Original Message-
From: bloat-boun...@lists.bufferbloat.net 
[mailto:bloat-boun...@lists.bufferbloat.net] On Behalf Of Bill Ver Steeg 
(versteb)
Sent: Monday, August 25, 2014 5:17 PM
To: Sebastian Moeller; Jim Gettys
Cc: bloat@lists.bufferbloat.net
Subject: Re: [Bloat] The Dark Problem with AQM in the Internet?

___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat