[Bloat] Apple ECN, Bufferbloat, CoDel

2015-06-13 Thread Mikael Abrahamsson


I highly encourage people to take a look at:

https://developer.apple.com/videos/wwdc/2015/?id=719 (you might have to 
register as an Apple developer to watch it, I don't know)


"Your App and Next Generation Networks
IPv6 is growing exponentially and carriers worldwide are moving to pure 
IPv6 APNs. Learn about new tools to test your apps for compatibility and 
get expert advice on making sure your apps work in all network 
environments. iOS 9 and OS X 10.11 now support the latest TCP standards. 
Hear from the experts on TCP Fast Open and Explicit Congestion 
Notification, and find out how it benefits your apps."


Being on this list you might not learn much from the talk, but I really 
appreciate a talk aimed at a wider (developer) audience that so clearly 
outlines the benefits of ECN, CoDel and TCP host optimization in reducing 
end-to-end application communication latency. One of the major takeaways 
is that Apple is planning to enable ECN by default in iOS 9 and OS X 
10.11. This would mean hundreds of millions of devices will be using ECN 
in a few months.


You can skip to 16 minutes into the talk if you're not interested in the 
new requirement for applications to support an environment where their 
Internet access is IPv6-only behind NAT64+DNS64 (I myself am super excited 
about this).


Let's hope this brings a lot of buzz and requests towards device 
manufacturers to start supporting ECN marking and AQM. Apple is usually a 
good megaphone to bring attention to these kinds of issues...


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] Apple ECN, Bufferbloat, CoDel

2015-06-13 Thread Mikael Abrahamsson

On Sat, 13 Jun 2015, Dave Taht wrote:

I don't understand how badly this is going to break dnssec. dnsmasq in 
particular has been dealing with edge case after edge case on dnssec for 
the last few months, and it was my hope we'd finally got them all.


DNS64 breaks DNSSEC because it creates an AAAA response where none is 
present in the zone being queried. It's basically doing MITM for DNS, 
which is exactly what DNSSEC was supposed to fix.
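
One way to see this in practice (for illustration: ipv4only.arpa is the 
RFC 7050 NAT64 discovery name, and the resolver address is a placeholder):

dig +dnssec AAAA ipv4only.arpa @<dns64 resolver>

The synthesized AAAA (normally inside the well-known 64:ff9b::/96 prefix) 
comes back without a matching RRSIG, so a validating client has to reject it.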


DNSSEC would work if Apple decided to just do NAT64 discovery and then do 
their own DNS64 in the host, but I have no information as to what is being 
done here.


At least DNSSEC still works between the Internet and the ISP DNS64 
resolver, but the end host won't be able to verify the response using 
DNSSEC.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] Fwd: performance testing on the WRT1200AC

2015-06-14 Thread Mikael Abrahamsson


Hi,

Some background:

The WRT1900ACv1 (which has been shipping for 6 months or so) is based on 
the Marvell Armada XP, which uses a packet processor. There is no support 
for this in the generic Linux kernel, which means performance is a lot 
lower with the generic kernel compared to the "special" kernel, which has 
patches and which you compile with the Marvell SDK to support the packet 
processor. With the generic kernel, you get CPU-only forwarding, which is 
around 300-500 megabit/s of TCP.


Now, with the WRT1200AC and WRT1900ACv2, which were released in the last 
few weeks and are just now becoming more widely available, they've changed 
to the Marvell Armada 385, the beefiest generic packet-forwarding CPU 
I have ever heard of or encountered in a "home gateway" kind of package. I 
have a WRT1200AC for testing that I received this week, and so far I have 
been able to verify that it does 940 megabit/s of TCP (iperf) with the 
generic kernel shipped with OpenWRT CC with the below default qdisc. It 
seems to do this using approximately 25% CPU.


So what I would like to do now is try to push it a little bit harder, so 
if someone could give me an example of a more punishing qdisc setup and 
test to run through it, that would be very interesting.
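
One possible example (a sketch: the rates, interface and server names are 
placeholders; flent's rrul test pushes multiple flows each way plus 
latency probes):

tc qdisc replace dev eth0 root handle 1: htb default 10
tc class add dev eth0 parent 1: classid 1:10 htb rate 900mbit
tc qdisc add dev eth0 parent 1:10 fq_codel

flent rrul -p all_scaled -l 60 -H <netperf server> -t wrt1200ac-900mbit -o rrul.png

tc -s qdisc show dev eth0   # watch backlog, drops and fq_codel's maxpacket counters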


But so far, the Armada 385 chipset (and I hope we'll see more devices 
based on it) seems to be a perfect platform for bufferbloat testing and 
development. Yes, it's a lot pricier than the WNDR3800 that for instance 
CeroWRT uses, but on the other hand it seems to have 10x the performance 
of that box, and everything seems to work right out of the box without any 
special patches.


On Sun, 14 Jun 2015, Dave Taht wrote:


a wider audience for the issues in new consumer hardware seems desirable.

forwarding with permission.


-- Forwarded message --
From: Dave Taht 
Date: Sun, Jun 14, 2015 at 8:41 AM
Subject: Re: performance testing on the WRT1200AC
To: Mikael Abrahamsson , Aaron Wood 


Dear Mikael:

netperf-wrapper has been renamed to flent. :) Quite a bit of new stuff
is dropping into it, one of my favorite tests is the new qdisc_stats
test (which I run at the same time as another test). It hasn't been
tested on a multi-queue interface (and doesn't work with openwrt's sh
implementation dang it). But do a pull anyway. :)

On Sun, Jun 14, 2015 at 8:18 AM, Mikael Abrahamsson  wrote:


Hi,

I want to do some more demanding testing of the WRT1200AC. Currently it's
running a few-days-old openwrt CC. It comes with the below qdisc setting. I
will be testing it using the following setup:

linux-switch-wrt1200ac-linux

All links above are gigabit ethernet links.

My plan is to for instance run netperf-wrapper with a few different tests.

Would it strain the WRT1200AC if I configured it to shape to 900 megabit/s
bidirectionally? I guess in order to actually achieve a little bit of

My original tests with the 1900AC showed htb peaking out with sqm +
offloads at about 550/650mbit on the rrul test. (I can't remember if
nat was on or off, but I think off)

but that was months ago. I have a huge hope that cake will do better
on this platform, and recently (yesterday) I think I got it to the
point where we could push it to openwrt to be built regularly.

Aaron, cc'd, has done quite a bit of work with the 1900, and I think
he started running into trouble at 200mbit.


buffering, I'm going to have to run below wirespeed? Because I can't get
more than 1 gigabit/s of traffic to the wrt1200ac because of the above layout,
so doing bidirectional shaping to 900 on eth0 (WAN PORT) would at least give
it a bit more to do and also give a chance to induce some buffering?


Ain't it a bitch? A thought would be to also exercise the wifi a bit
to drive it past gigE overall. So have two clients running flent tests
simultaneously, one on wifi, one on ethernet, and there you go,
driving it into overload.


Do you have some other ideas for testing? I am mostly interested in making
sure the CPU is fast enough to do AQM at gig speeds...


Well, there are other issues.

A) The mvneta ethernet driver in the 1900 did not support BQL when
last I looked, supplying insufficient backpressure to the upper
layers.

B) The multiqueued hardware applies a bit of fq for you automagically,
BUT, even if BQL was in place, BQL's buffering is additive per
hardware queue, so it tends to

What I saw was nearly no drops in the qdisc. I don't think I even saw
maxpacket grow (a sure sign you are backlogging in the qdisc). I ended
up disabling the hardware mq multiqueue[1] stuff entirely with "tc qdisc
add dev eth0 root fq_codel", and even then, see A) - but I did finally
see maxpacket grow...

C) to realize to my horror that they had very aggressively implemented
GRO for everything, giving us 64k "packets" to deal with coming in
from the gigE ethernet... which interacted rather badly with t

[Bloat] [Cerowrt-devel] performance numbers from WRT1200AC (Re: Latest build test - new sqm-scripts seem to work; "cake overhead 40" didn't) (fwd)

2015-06-23 Thread Mikael Abrahamsson


FYI in case some aren't on the cerowrt-devel ml.

-- Forwarded message --
Date: Tue, 23 Jun 2015 14:55:30 +0200 (CEST)
From: Mikael Abrahamsson 
To: cerowrt-devel 
Subject: [Cerowrt-devel] performance numbers from WRT1200AC (Re: Latest build
test - new sqm-scripts seem to work; "cake overhead 40" didn't)

On Tue, 23 Jun 2015, Sebastian Moeller wrote:

	Most likely not. Check http://wiki.openwrt.org/doc/howto/sqm . Rich 
published a great set of instructions for setting up sqm-scripts under 
openwrt proper.


I tried it on a Linksys WRT1200AC with OpenWrt CC RC2. I configured sqm to 
800 megabit/s in each direction, and ran iperf3 over IPv4 with NAT44 from a 
Linux box behind the WRT1200AC to an OS X MacBook connected to a switch on 
the same L2 subnet as the WAN port.


Linux <->WRT1200AC<->switch<->OSX

I get 765 megabit/s of throughput using a single session, at a sirq load of 
around 25%. If I lower the MSS to 300 (to generate higher pps) I get around 
560 megabit/s of throughput at 50% sirq. With 10 parallel TCP sessions, I get 
about the same. At an MSS of 200 bytes, I get 400 megabit/s at 70% sirq.


If I turn off SQM completely, I get 600 megabit/s at 200-byte MSS in a single 
session at 80% sirq, and 930 megabit/s at 26% sirq with the default MSS.
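
The invocations were along these lines (a sketch; the server address is a 
placeholder, -M sets the MSS and -P the number of parallel streams):

iperf3 -c <server> -t 30                # default MSS, single stream
iperf3 -c <server> -t 30 -M 300         # smaller MSS for higher pps
iperf3 -c <server> -t 30 -M 300 -P 10   # ten parallel streams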


So if you want a high-performing device that is OpenWRT compatible and still 
does forwarding using the CPU, so you can test queuing algorithms, the 
WRT1200AC and WRT1900ACv2 are the best I have been able to find currently 
(unless you go for an x86 platform).


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] [Cerowrt-devel] failing to find the "declared victory" in a current wifi router

2015-07-06 Thread Mikael Abrahamsson

On Mon, 6 Jul 2015, Joe Touch wrote:


- CC-rc2 doesn't have a WRT1200AC build
presumably I should have used mvebu-armada-385-linksys-caiman,
but it's not at all clear


Yes, that's the one for the WRT1200AC. It's called "caiman" internally at 
Linksys it seems.



- and I'd have to install LUCI and/or reinstall
factory firmware from the command line, and none
of that is all that clear, esp. a recovery route
that doesn't involve voiding warranty to wire in
a serial port


You can flash back the factory firmware without serial; you just use 
sysupgrade with the Linksys factory image. I've done this. It's not easy 
to get into the box, and I have plastic dents on my unit now because I 
failed to understand how it fits together. I also ended up buying pin 
headers and tweezers to connect the TTL-USB serial device to the 
connector on the PCB. I have since received proper cables, so now I 
have wires sticking out, and I'm waiting for connectors so I can make a 
more permanent solution.


I have also had to use the serial console on mine, because something broke 
in the upgrade process during one of the 30-40 times I did sysupgrade.


I won't speak of the "declared victory". In my opinion the victory might 
be "there is now knowledge of how to do this, and there is substantial 
awareness in the rest of the industry", but it's definitely not executed 
yet.


And yes, you're right, there is very little "mainstream" about OpenWrt. 
It's reasonably easy with a lot of devices (and there are guides to read), 
but it's not like anyone can do it. It's like changing the oil in a car: 
it's not that hard, but if you don't know how to do it, you need to study 
first and find the correct tools. Also, if you get it wrong you might 
damage things.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] [Cerowrt-devel] failing to find the "declared victory" in a current wifi router

2015-07-10 Thread Mikael Abrahamsson

On Tue, 7 Jul 2015, Joe Touch wrote:


Some questions:

On 7/6/2015 11:16 PM, Mikael Abrahamsson wrote:
...

You can flash back the factory firmware without serial, you just use
sysupgrade with the Linksys factory image.


How does that differ from mtd, e.g., as indicated here (which doesn't
mention sysinstall)?:
http://wiki.openwrt.org/doc/howto/generic.uninstall


http://wiki.openwrt.org/toh/linksys/wrt1200ac#how_to_flash_the_firmware_to_device

"Revert to Linksys Stock Firmware"

So you can either use the web ui or the "sysupgrade" command to go back to 
Stock firmware.


I have only used the mtd method once, and that was when there was a 
problem with the flash for some reason and an OpenWrt developer 
recommended that method instead.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] [Cerowrt-devel] failing to find the "declared victory" in a current wifi router

2015-07-10 Thread Mikael Abrahamsson

On Fri, 10 Jul 2015, Joe Touch wrote:

Can you explain where the info on using the sysupgrade command to revert 
to the factory image is??


http://wiki.openwrt.org/toh/linksys/wrt1200ac#how_to_flash_the_firmware_to_device

The web UI uses the "sysupgrade" command. So implicitly, the instruction 
above to use the "web ui" to upgrade means you can use sysupgrade as well.


http://wiki.openwrt.org/doc/howto/generic.sysupgrade

So I just did:

cd /tmp
wget <url of the Linksys factory image>
sysupgrade <factory image file>

Then it rebooted. I then had to use the factory default button on the 
device to reset the configuration before things started working properly. 
"sysupgrade -n <image>" does this as well, but I am not sure the 
Linksys configuration data is stored in the same place, so that might or 
might not help.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] Speed tests - attribution of latency to relevant network hops

2015-07-29 Thread Mikael Abrahamsson

On Wed, 29 Jul 2015, David Lang wrote:

unless you measure it per hop, how are you going to attribute it to each 
hop? and unless you have a server at that layer to talk to, how do you 
know what the latency or bandwidth is?


Measuring latency is doable (using the same mechanism as traceroute, e.g. 
with max-ttl 5), but I don't know how much of this is available to your 
web application?


If you sent 5 packets with TTL 1-5 and measured the time to get back the 
ttl-expired-in-transit ICMP, you could get an indication of where the 
latency increase was happening.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] Fwd: Did *bufferbloat* cause the 2010 flashcrash?

2015-08-07 Thread Mikael Abrahamsson

On Fri, 7 Aug 2015, Steinar H. Gunderson wrote:

[1] For the purposes of the question, the “BitTorrent problem” is when 
you and I are on the same network, and your 200+ BitTorrent upload 
sessions makes it impossible for me to upload my single cat video to 
YouTube.


Isn't this dependent on the upload speed? I would imagine that if it's 
5-10 megabit/s, then using CoDel or a similar technique that doesn't allow 
the buffer to grow to hundreds of milliseconds would improve things a 
lot?


For a 500 kilobit/s link, I'm not sure even that would work...?

--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] RE : Save WiFi from the FCC - DEADLINE is in 3 days *September* 8

2015-09-08 Thread Mikael Abrahamsson

On Tue, 8 Sep 2015, Dave Taht wrote:


wifi, and the carriers... which bugs me. 5.x ghz is the people's
spectrum, that we should be free to use any way we want... and to make


Well, in the US at least, corporations are people, so...

But that aside, I don't know if there is anything that can be done really; 
unlicensed is unlicensed, and if it's not free for everybody to use, what 
is it?


Also, isn't it pretty much the same players in the wifi and LTE space? 
Qualcomm, Broadcom and the others are in both spaces, and I don't see 
what they have to gain from making wifi worse.


And 802.11 isn't really open either, and the unlicensed spectrum still 
requires that devices are approved to be operated there, right? So if the 
FCC and the likes do their job properly, then these technologies should 
work together, at least on the RF level?


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] RE : Save WiFi from the FCC - DEADLINE is in 3 days *September* 8

2015-09-08 Thread Mikael Abrahamsson

On Tue, 8 Sep 2015, Dave Taht wrote:


On Tue, Sep 8, 2015 at 1:22 AM, Mikael Abrahamsson  wrote:

On Tue, 8 Sep 2015, Dave Taht wrote:


wifi, and the carriers... which bugs me. 5.x ghz is the people's
spectrum, that we should be free to use any way we want... and to make


Please note that the LTE-U debate is separate from the lockdown
debate, which only has a day to run. Can we get more letters into the
FCC for the lockdown problem?


I have already posted as well.

And jeeze, what makes sense - on the "licensed" spectrum - is the 
government auctions it off for big bucks one year, and then the public 
pays rents on it for all eternity. Far saner to have more openly 
available spectrum


Well, yes, we need both unlicensed and licensed spectrum.

One failed concept in america, at least, is the idea of a commons - as 
in a tragedy of the commons - elsewhere, for example, "public lands" are 
actually "the queen's" lands and people tend to treat them with more 
respect.


Yes, in Sweden we have something called (translated) "Rights of public 
access" to land, for instance: I'm allowed to go camping in someone else's 
forest as long as it's noncommercial and I leave it as I found it. It's a 
constant battle to keep this freedom, and I agree we need this for radio as 
well. BUT it's not like unlicensed radio today means you can do whatever 
you want; there is still quite a lot of regulation around it. So I can 
understand if they want to ensure that regulated devices in unlicensed 
spectrum actually follow the regulations. The problem is that it's 
different across the world. I've heard that in Thailand, for instance, 
you're only allowed to transmit with a total of 100mW from a device, so if 
you turn on both 2.4GHz and 5GHz radios, you need to limit them to 50mW 
each (or some combination). How would a completely open device solve this 
problem?


So I think a constructive approach would be to try to say how the FCC 
concern can be solved or at least mitigated in a FOSS world. Do we have 
any ideas?


Because I can understand that regulators whose job it is to make sure 
devices follow the rules have a problem with FOSS code that lets people do 
whatever they want.


Do we really want regulators to bring back the vans that might roll 
around and impose a fine because you were running OpenWRT and happened to 
set the output power too high for whatever local regulation was in place?


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] RE : Save WiFi from the FCC - DEADLINE is in 3 days *September* 8

2015-09-08 Thread Mikael Abrahamsson

On Tue, 8 Sep 2015, Dave Taht wrote:


Well, no... we need the concept of "the public's" spectrum, also.


What does that mean? Only devices that have FOSS firmware are allowed to 
send in this spectrum?


Because I actually fail to see the fundamental difference between spectrum 
I use to communicate with purchased devices from VENDOR1 and VENDOR2 that 
I run myself, and spectrum used by a purchased device from VENDOR2 that is 
run by a mobile provider. I mean, do we rule out wifi networks run by 
providers?


Now, I will admit that I have no idea what LTE-U looks like on RF, but 
what's so different about it compared to the other things sending in 
there, like Bluetooth and wifi (and wifi has many different encodings)?



One failed concept in america, at least, is the idea of a commons - as in
a tragedy of the commons - elsewhere, for example, "public lands" are
actually "the queen's" lands and people tend to treat them with more
respect.



Yes, in sweden we have something called (translated) "Rights of public
access" to land for instance, I'm allowed to go camping in someone elses
forest as long as it's noncommercial and I leave it as I found it.


What is the word, actually?


"Allemansrätten". Literally "everymansright".

Well, pushing the responsibility back on the actual users of the 
technology is fine by me. Enforcement seems only to be of a concern on 
the DFS channels around a limited number of airports.




--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] bloat at gigE

2015-09-23 Thread Mikael Abrahamsson

On Wed, 23 Sep 2015, Benjamin Cronce wrote:


The upload wasn't even saturated. Probably why upload bloat was very low.
Large bloat on the download just shows you the server really can push more
than 1Gb. 200ms bloat with 913Mb down is about 20MiB of buffer. That's
insane! That's about 20x more buffer than my entire 24 port 1Gb Procurve
managed switch. What kind of network equipment has that much buffer?


That's typically one distinction between an L3 switch and a "router". The 
L3 switch typically has on-die memory that can be as low as 128KB or up to 
a few megabytes.


Then you have the real "service edge" routers with 128,000 queues that are 
used to aggregate tens of thousands of customers, where a linecard can 
have many gigabytes of packet buffer memory.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] new "vector packet processing" based effort to speed up networking

2016-02-12 Thread Mikael Abrahamsson

On Thu, 11 Feb 2016, Dave Täht wrote:


Someone asked me recently what I thought of the dpdk. I said:
"It's a great way to heat datacenters". Still, there's momentum it
seems, to move more stuff into userspace.


Especially now that Intel CPUs seem to be able to push a lot of PPS 
compared to what they could before. A lot more.


What one has to take into account is that this tech is most likely going 
to be deployed on servers with 10GE NICs or even 25/40/100GE, and they are 
most likely going to be connected to a small buffer datacenter switch 
which will do FIFO on extremely small shared buffer memory (we're talking 
small fractions of a millisecond of buffer at 10GE speed), and usually 
lots of these servers will be behind oversubscribed interconnect links 
between switches.


A completely different use case would of course be if someone started to 
create midrange enterprise routers with 1GE/10GE ports using this 
technology, then it would of course make a lot of sense to have proper 
AQM. I have no idea what kind of performance one can expect out of a low 
power Intel CPU that might fit into one of these...


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] [Cake] are anyone playing with dpdk and vpp?

2016-04-27 Thread Mikael Abrahamsson

On Wed, 27 Apr 2016, Stephen Hemminger wrote:


DPDK gets impressive performance on large systems (like 14M packets/sec per
core), but not convinced on smaller systems.
Performance depends on having good CPU cache. I get poor performance on


As soon as you can't find information in cache and have to go to RAM to 
get it (and you need it to proceed), you've lost the impressive 
performance.


VPP is all about pre-fetching (telling the memory subsystem to fetch into 
cache the information you will probably need in the not-so-distant 
future). It actually reminds me of the demo programming on the C64/Amiga 
that I was involved in in the 80s. Lots of small optimisations are needed 
to yield these results.


So yes, cache is extremely important for VPP.

--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] [Cerowrt-devel] BBR congestion control algorithm for TCP in net-next

2016-09-21 Thread Mikael Abrahamsson

On Wed, 21 Sep 2016, Dave Taht wrote:

I did a fairly comprehensive string of tests today, comparing it at 
20Mbits, 48ms RTT, to cubic and competing with cubic, against a byte 
fifo of 256k, pie, cake, cake flowblind, and fq_codel.


20 megabit/s is 2.5 megabyte/s, so that 256k FIFO is only 100ms worth of 
buffering. I guess you see packet drop in steady state here, ie buffer is 
full?


I'd be interested in seeing the same experiment started with a 10MB FIFO, 
and with the CUBIC flow starting first to give it a proper head start.


My intuition and understanding of what's going to happen might very well 
be completely off, but I think it'd be interesting to know.


I'll take a look at your flent data, thanks for posting them!

--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] [Cerowrt-devel] BBR congestion control algorithm for TCP in net-next

2016-09-21 Thread Mikael Abrahamsson

On Wed, 21 Sep 2016, Dave Taht wrote:

* It seriously outcompetes cubic, particularly on the single queue aqms. 
fq_codel is fine. I need to take apart the captures to see how well it 
is behaving in this case. My general hope was that with fq in place, 
anything that was delay based worked better as it was only competing 
against itself.


I'm looking at 4up-sqwave-fq_bfifo-256k. Is this really fq_bfifo, or just 
bfifo? Looks like there is no fq.


If someone doesn't have the correct Flent available, I posted two 
screenshots here: http://imgur.com/a/cFtMd


What I think I see:

The flows are started in order: "BBR1, CUBIC2, BBR4, CUBIC3" (a bit 
confusing, but according to your description).


So it looks like BBR1 fills the pipe within half a second or so, a nice 
steady state. Then CUBIC2 starts and, slowly over a few seconds, starts to 
starve BBR1 of BW; it looks like steady state here would be CUBIC2 ending 
up with around 65-70% of the BW and BBR1 getting 30-35%. Then BBR4 comes 
along (10 seconds in) and just KILLS them both, smacks them over the head 
with a hammer, taking 90% of the BW, wildly oscillating between way above 
20 megabit/s and down to 10. The ping here goes up to around 150-160ms. 
CUBIC3 starts at 15 seconds and gets basically no bw at all.


Then at around 22 seconds in, I guess pretty close to 12-13 seconds after 
BBR4 was started, BBR4 starts to calm down, slowly letting the other 
streams come back to life. At around 30 seconds, they all seem to get at 
least a bit of the bw each and nobody is completely starved, but BBR1 
seems to not get much BW at all (very dotted line).


When at the end there is only CUBIC3 and BBR4 left, it looks like BBR4 has 
a 2/3 to 1/3 advantage.


Looking at cake_flowblind_noecn, BBR1 and BBR4 just kill both CUBIC 
flows. Same with PIE.


So it seems my intuition was wrong, at least for these scenarios. It 
wasn't CUBIC that would kill BBR, it's the other way around. Great to have 
testing tools! Thanks Flent!


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] "BBR" TCP patches submitted to linux kernel

2016-09-30 Thread Mikael Abrahamsson

On Thu, 29 Sep 2016, Aaron Wood wrote:

While you think 3.10 is old, in my experience it's still seen as cutting 
edge by many.  RHEL is still only at 3.10.  And routers are using much 
older 3.x kernels.  There's a huge lag between what the "enterprise" 
crowd is running in production, and what you guys are developing on. 
Because "stability".


It's been one of my major frustrations (especially on the embedded side 
where 3.x kernels are still considered 'new' and 2.6.x is 'trusted').


The state of affairs is actually improving. What I'm seeing from several SoC 
vendors is that they're moving from a "new kernel every 3 years, and we'll 
choose a 2-year-old kernel when doing the work, so it'll be 5 years old by 
the time a new one comes around" model (with the result that a lot of 
devices are on 2.6.26, 3.2 and 3.4), to a model where they actually do a new 
kernel every 6 months and choose a kernel that's around 12-18 months old at 
that time.


This is of course not great, but it's an improvement. I'm pushing for SoC 
vendors to actually upstream their patches as much as possible and to 
support the creation of a kernel-version-independent HAL/API in the kernel 
that they can write their drivers against.


So if you know any netdev people, please tell them to be supportive when 
SoC vendors come and want changes done to the kernel to support, for 
instance, hw packet accelerators. We want this done right of course (so we 
can live with it for the next 5-10 years at least), but it is very 
important that it gets done.


This of course has interesting effects for AQM, since with packet 
accelerators you're taking the kernel pretty much out of the data path as 
soon as the hardware is programmed... but that's a different, related 
struggle: making sure these aren't as bloated as yesteryear's 
implementations.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] [Make-wifi-fast] the wifi airtime-fair fq_codel stuff on net-next looks mostly good

2016-10-15 Thread Mikael Abrahamsson

On Wed, 12 Oct 2016, Dave Taht wrote:


http://openwrtsummit.org/#quick-details


I've had the discussion with "radio guys" before regarding "fairness" of 
radio resources. They kept talking about "optimising the cell for 
throughput". I told them "then we should give the speaker with the highest 
bitrate and demand for bits as much radio resources as possible, and 
starve everybody else". This is of course not good for general customer 
satisfaction.


After a lot of discussion back and forth, we came to the same conclusion 
as you seem to have come to (if I understood Toke's talk correctly): 
"radio time" is the most fair resource. If someone has bad radio 
conditions, they get lower total throughput than someone with good radio 
conditions, so the fairness is "equal air time". This means everybody gets 
an equal part of the shared resource, which gives people an incentive to 
try to improve radio reception if they have trouble, and doesn't starve 
everybody else of airtime just because one device is having a bad radio 
day.


So full support for this approach from me, good job!

--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] [Cerowrt-devel] Comcast's NANOG slides re Bufferbloat posted (Oct 2016)

2016-10-20 Thread Mikael Abrahamsson

On Thu, 20 Oct 2016, Rich Brown wrote:


https://www.nanog.org/sites/default/files/20160922_Klatsky_First_Steps_In_v1.pdf


Does anyone understand what access speeds these customers had during these 
tests?


A 96 kilobyte buffer on a 1 megabit/s upstream versus a 50 megabit/s 
upstream makes a big difference.
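
Concretely, the drain time of that buffer:

echo '96*1024*8 / 10^6' | bc -l        # ~0.79 s at 1 megabit/s
echo '96*1024*8 / (50*10^6)' | bc -l   # ~0.016 s at 50 megabit/s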


(I have 250/50 on my DOCSIS 3.0 connection, but perhaps it's common 
knowledge what speeds Comcast customers typically have, that I don't know?)


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] Any non-bloaty 4port or 8port GigE switches?

2016-10-20 Thread Mikael Abrahamsson

On Sun, 3 Jul 2016, leetminiwheat wrote:


Hi, sorry for the noise here but can anyone recommend a decent non-bloated
4port or 8port GigE switch? something supporting bonding/failover would be
a bonus but not a requirement.


Do you have bloated gig switches? It's usually a problem the other way 
around: these devices typically have 128 kilobytes of buffer shared between 
all ports, which is way too little.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] Large decrease in speed needed to combat bufferbloat?

2016-10-20 Thread Mikael Abrahamsson

On Wed, 17 Aug 2016, Alec Robertson wrote:

I've managed to get bufferbloat under control, with only 3-4ms of added 
ping when downloading but I've had to set the ingress to 43000, reducing 
my speed not hugely but more than I might have expected.


I personally think that aiming for 3-4ms of bloat is excessive for the 
applications we see today. Most of the time you're not going to notice 
10-20ms of bloat even when using quite time-sensitive applications, and 
that 10-20ms PDV range is probably a better tradeoff between throughput and 
potential interactive-performance downside.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] 22 seconds til bloat on gfiber?

2016-10-22 Thread Mikael Abrahamsson

On Sat, 22 Oct 2016, Dave Taht wrote:


http://www.dslreports.com/speedtest/5408767


What's the setup here? Someone has told me that Google Fiber is PON? So 
there is an ONT at the customer prem which takes the fiber and hands off 
some kind of 1000BASE-T? What more?


Just trying to figure out what device has ~13-15 megabyte buffer so it can 
induce 1200ms buffer lag at 1 gigabit/s.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] 22 seconds til bloat on gfiber?

2016-10-26 Thread Mikael Abrahamsson

On Wed, 26 Oct 2016, Jan Ceuleers wrote:


What I mean is that the OLT optics become very expensive if you need to
support as many lambdas as you have customers. You'd furthermore need an
OLT port for much fewer customers (e.g. 1 port per 64 or 128 customers)
than the thousands you can support on a (shared) GPON port on a single
lambda.


That only works if your customers don't use their Internet access very 
much. If they do, you're in trouble and have to rebuild.


In my market, we're now at access speeds where 100/10 is on the lower end, 
and it's not uncommon for people to have 250, 500 or 1000 downstream. If 
they then actually start using their bw, you'd have to rebuild: either go 
higher speed for some CPE (complicated and expensive), or rebuild to have 
smaller splitter domains.


I guess the answer depends a lot on your cost of labour.

--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] 22 seconds til bloat on gfiber?

2016-10-27 Thread Mikael Abrahamsson

On Thu, 27 Oct 2016, Dave Taht wrote:

interactive, once basic bandwidth needs are slaked, which starts to 
happen once you crack the largest typical load (which these days is 4k 
video streaming).


gbit fiber is *way* on the unneeded side of the demand curve for home users.


I can make up credible scenarios where a home with 4-5 people would need 
200-300 megabit/s of reliably available bandwidth; add downloading 
something large on top, and you can make use of a gig. So not "way" 
unneeded.


I do have problems coming up with scenarios where you need more than a 
gig.



which kind of points out that you need business users to use it all up.


Business users actually use *less* bw than residential. People typically 
don't watch 4k video streams at work.


Btw, what does that report say? I don't want to spend money on it.

--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] [bbr-dev] Re: "BBR" TCP patches submitted to linux kernel

2016-11-02 Thread Mikael Abrahamsson

On Tue, 1 Nov 2016, Yuchung Cheng wrote:

We are curious why you choose the single-queued AQM. Is it just for the 
sake of testing?


Non-flow aware AQM is the most commonly deployed "queue management" on the 
Internet today. Most of them are just stupid FIFOs with taildrop, and the 
buffer size can be anywhere from super small to huge depending on 
equipment used and how it's configured.


Any proposed TCP congestion-avoidance algorithm to be deployed on the 
wider Internet has to be able, at least to some degree, to handle this 
deployment scenario without killing everything else it's sharing capacity 
with.


Dave Täht's test case where BBR just kills Cubic makes me very 
concerned.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] fixing bufferbloat in 2017

2016-11-23 Thread Mikael Abrahamsson

On Tue, 22 Nov 2016, Dave Taht wrote:

I would like to see the industries most affected by bufferbloat - 
voip/videoconferencing/gaming,web gain a good recognition of the 
problem, how to fix it, and who to talk to about it (router makers and 
ISPs)


It would be great if the realtime communications people (gaming, video, 
audio etc) had some kind of help page where people could be pointed to 
understand the problem.


I saw a Youtube video btw, where they had problems with gaming because 
"I'm uploading a youtube video at the same time as I am gaming, stupid 
me". People don't even realise this is not the way it has to be.


My take on this is that the problem is fairly well understood in "our" 
circles, but the wider audience still doesn't know, and even if they know, 
there is nowhere to go to fix it.


If we can find a product that solves the gaming community's problem 
(they're among the people who have "ping" in their applications and who 
immediately notice when it's bad), we could perhaps approach someone 
prominent in that gaming community about making a video on how to solve 
the problem.


"Look here, I did  and now I can game and upload a youtube video at the 
same time without problemsoneoneone"


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] fixing bufferbloat in 2017

2016-11-23 Thread Mikael Abrahamsson

On Wed, 23 Nov 2016, Pedro Tumusok wrote:

If this something we should try, I can help out with the first point, 
but the second one probably needs local bufferbloat evangelists.


I am not worried about getting these people on board to show a solution.

I'm worried that we do not have a solution that is easily deployable for 
"normal" people. If someone has X/Y megabit/s Comcast Internet connection, 
what solution do we have to offer them? I can't think of one that actually 
solves the problem for real.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] fixing bufferbloat in 2017

2016-11-23 Thread Mikael Abrahamsson

On Wed, 23 Nov 2016, Benjamin Cronce wrote:

If there is a simple affordable solution, say Open/DD-WRT distro based 
bridge that all you do is configure your up/down bandwidth and it 
applies Codel/fq-Codel/Cake, then all you need to do is drive up 
awareness. A good channel for awareness would be getting in contact with 
popular Twitch or YouTube gaming streamers. But I wouldn't put much 
effort into driving up awareness until there is a device that people can 
easily acquire, use, and afford. At first I was thinking of telling 
people to use *-WRT supporting routers, but changing the firmware on 
your router requires too much research, and many people care about 
bleeding edge features. You need something that works in tangent with 
whatever they are using.


If Comcast sells you 100/20 (I have no idea if this is a thing), you set 
your upstream on this box to 18 meg fq_codel, and then Comcast 
oversubscribes you so you only get 15 meg up part of the time; then you're 
still bloated by the modem. This is not a solution.
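
For concreteness, the kind of one-box egress setup being discussed (a 
sketch; interface and rate are placeholders, and it only helps while the 
configured rate stays below what the modem actually gets):

tc qdisc replace dev eth0 root handle 1: htb default 10
tc class add dev eth0 parent 1: classid 1:10 htb rate 18mbit
tc qdisc add dev eth0 parent 1:10 fq_codel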


I don't think "buy $thing, install *WRT on it, configure it like this" is 
above most gamers, but I'm afraid we don't even have a working solution 
for someone with that kind of skillset.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] fixing bufferbloat in 2017

2016-11-23 Thread Mikael Abrahamsson

On Wed, 23 Nov 2016, David Lang wrote:

Deploy what we already know to work on the real edge devices and things 
get vastly simpler.


Sure! Sounds Great. How?

--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] Fixing bufferbloat in 2017

2016-11-26 Thread Mikael Abrahamsson

On Sat, 26 Nov 2016, Aaron Wood wrote:

and call it a day.  And those BSPs are _ancient_.  I wouldn't be 
surprised to see 2.6 still coming out on new models, let alone 4.0.


Most seem to be on 3.2 and 3.4, but I've heard people say Broadcom now has 
BSP for 4.1.


However, since basically all high-speed devices use a hardware packet 
accelerator, even with newer kernels you might not get any anti-bufferbloat 
benefit, because these packet accelerators have their own buffer handling.


I might be in a position to test one of these Broadcom 4.1-based devices 
in the next few months; I'll run some tests and report back if that 
happens.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] 22 seconds til bloat on gfiber?

2016-12-06 Thread Mikael Abrahamsson

On Wed, 7 Dec 2016, Jonathan Morton wrote:

That’s not to say it’s *impossible* to sell 4Gbps or 10Gbps connections. 
You could do it by bundling a multi-port switch with a sufficiently fast 
uplink port, and sell it as “a full gigabit for each of N computers”. 
The most obvious customers to target might be apartment complexes or 
entire villages, who could share such a connection over a large number 
of users and defray a relatively high installation cost.


I believe this is what Comcast is doing for their 2 gigabit/s service, and 
why Netgear released their X10 with SFP+ uplink.


I've been told Comcast does SFP+ handoff, and this device seems to be 
tailor made for use with such a service.


Otoh it seems that 2.5GE and 5GE are going to be a thing in the not-so-distant 
future. I've been told 2017 will see shipping products for this at a 
better price point than 10GE is currently at (which means quite 
expensive).


So I imagine we'll be seeing high end "home routers" with built in L2 
switches that have 1/2.5/5GE support to cater for this market in 2017.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] TCP BBR paper is now generally available

2016-12-08 Thread Mikael Abrahamsson

On Fri, 2 Dec 2016, Dave Taht wrote:


http://queue.acm.org/detail.cfm?id=3022184


"BBR converges toward a fair share of the bottleneck bandwidth whether 
competing with other BBR flows or with loss-based congestion control."


That's not what I took away from your tests of having BBR and Cubic flows 
together, where BBR just killed Cubic dead.


What has changed since? Have you re-done your tests with whatever has 
changed? I must have missed that. Or did I misunderstand?


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] TCP BBR paper is now generally available

2016-12-08 Thread Mikael Abrahamsson

On Thu, 8 Dec 2016, Dave Täht wrote:


drop tail works better than any single queue aqm in this scenario.


*confused*

I see nothing in the BBR paper about how it interoperates with other 
TCP algorithms. Your text above didn't help me at all.


How is BBR going to be deployed? Is nobody interested in how it behaves in 
a mixed environment?


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] What does cablelabs certification actually do?

2016-12-08 Thread Mikael Abrahamsson

On Fri, 9 Dec 2016, jb wrote:

And then wondered why certification can't also include verification for 
correctly sized buffers as well?


There is nothing stopping this, and it's being worked on (PIE goes into 
DOCSIS 3.1).


http://www.cablelabs.com/wp-content/uploads/2014/06/DOCSIS-AQM_May2014.pdf
https://www.nanog.org/sites/default/files/20160922_Klatsky_First_Steps_In_v1.pdf

CableLabs (as far as I understand) is an organisation funded by cable 
operators and vendors, and they create standards and tests used by the 
cable industry.

I don't know what tests CableLabs performs, but there is nothing stopping 
them from validating buffers+AQM in the modems as well, and I do hope they 
do this going forward.


Why not reach out to Greg White, who is mentioned in the CableLabs 
DOCSIS-AQM pdf above, and ask? Or even better, invite him to this list if 
he's not already here. He's on the IETF AQM WG list; I have posts from him 
in my folder going back to 2013.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


[Bloat] how much delay is too much delay

2017-01-13 Thread Mikael Abrahamsson


https://www.youtube.com/user/xFPxAUTh0r1ty

This channel analyses several online games and how they work networkwise. 
It seems online games typically "tick" at 30-60Hz, in that the game server 
and user application communicate this often. 60Hz seems to be the "golden 
standard", and I guess a resolution of 17ms is fine for when things are 
happening.


In gaming there are multiple delay components. One is "input delay", which 
is the time from when you, for instance, press the mouse button until the 
game shows that it has responded by showing you the result on screen. It 
seems this is typically 40-60ms, because the game needs to handle the 
input and send data to the graphics card, which needs to render it, and 
then it needs to be sent to the monitor. There is of course a lot more to 
it than this, but you get the idea.


I don't know what the delay is from mouse-click to when the game knows you 
clicked and can send this information to the game server, but from what 
I'm guessing from reading up on the topic, it is in the "less than 10ms" 
range. So theoretically, the game can send an update to the game server 
much quicker than it can display the result on the local screen.


Another data point for instance for the game "Rocket League", is that the 
highest ranking players have a hard time playing effectively when the 
user-to-game server "ping" is more than approximately 100ms. I don't know 
if this is RTT, but considering they're getting around 130ms from a user 
in Texas to a server in Europe, it seems reasonable that this is RTT.


My reason for bringing this up (again) in the bloat forum is that these 
people are exactly the kind of people who are very sensitive to the 
problems that "anti-bloat" solves. If we can come up with a solution that 
makes it less likely that these people will get "ping spikes" etc, and we 
can package up something that actually solves this (preferably something 
they can go to the store and buy outright), this would be a great way to 
"market" it. I'm quite sure they'd be interested in making videos about it 
to make more people aware of the problem.


There are multiple "gaming routers" out there, with "QoS". I have no idea 
what this "QoS" does. If anyone knows, I'd be very interested in knowing 
more.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] how much delay is too much delay

2017-01-13 Thread Mikael Abrahamsson

On Fri, 13 Jan 2017, Jesper Dangaard Brouer wrote:

I love the way he measures the delay by recording the screen with a high 
speed camera, and then correlate mouse-button activation by a visual 
red-blink (some PC-local setup/app) and counting the frames until the 
movement happen in the game.


He actually has an LED connected to the mouse itself, so the red blink is 
when the electrical circuit is closed by the mouse button press.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] emulating non-duplex media in linux qdiscs

2017-10-10 Thread Mikael Abrahamsson

On Mon, 9 Oct 2017, Dave Taht wrote:

Saying that is half duplex, doesn't work for me. In their example of 
"half duplex", (using push to talk), it still means that everybody on 
that channel hears who is talking. "half duplex" to me, given the 
definition of duplex, means more that there is a *p2p* channel (a wire), 
that you can ping pong data across.


A 10BASE-T hub connected to a 10BASE2 or 10BASE5 segment, all in the 
same broadcast domain, is considered to be "half duplex" in ethernet port 
configuration terms.


So it doesn't have to be p2p. And I do think this mimics a shared radio as 
well (because a coax wire with multiple nodes on it seems very similar to 
a radio channel over the air).


Now, radio has the difference that two stations might not hear each other, 
and that's of course a problem in CSMA/CD terms.


Back to your netem problem. What you need is to force all packets through 
the same queue, right? So I tried to dream up a complicated scheme with 4 
bridges and some kind of "forced forwarding", but I don't think it'd pan 
out.


So the best way is probably to have one scheduler that feeds transmit 
tokens to two different shapers (rx and tx on the same interface). 
Whatever scheduling they are fed to tell them the rate at which they're 
allowed to transmit, they get it from the same source. That way they have 
to compete for the same resources.


This will not perfectly mimic the exponential backoff of CSMA/CD, but it 
might be good enough for what you need? Also, I just realised I have no 
idea how wifi is scheduled. Is it even close to CSMA/CD?


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] keyboard bloat

2017-11-24 Thread Mikael Abrahamsson

On Fri, 24 Nov 2017, Dave Taht wrote:


https://danluu.com/keyboard-latency/


This is very interesting.

This guy is doing "button to pixel" delay testing:

https://www.youtube.com/watch?v=4GnKsqDAmgY

At 2:25 he's also talking about the total chain of events that needs to 
happen between input and when you see something on the screen.


I wish there was more focus and testing on these kinds of things, then 
perhaps we'd also get more focus on bufferbloat in the network.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] benefits of ack filtering

2017-11-28 Thread Mikael Abrahamsson

On Tue, 28 Nov 2017, Dave Taht wrote:


Recently Ryan Mounce added ack filtering capabilities to the cake qdisc.

The benefits were pretty impressive at a 50x1 Down/Up ratio:

http://blog.cerowrt.org/post/ack_filtering/

And quite noticeable at 16x1 ratios as well.

I'd rather like to have a compelling list of reasons why not to do
this! And ways to do it better, if not. The relevant code is hovering
at:

https://github.com/dtaht/sch_cake/blob/cobalt/sch_cake.c#L902


Your post is already quite comprehensive when it comes to downsides.

The better solution would of course be to have the TCP peeps change the 
way TCP works so that it sends fewer ACKs. I don't want middleboxes 
making "smart" decisions when the proper solution is for both end TCP 
speakers to do less work by sending fewer ACKs. In the TCP implementations 
I tcpdump regularly, it seems they send one ACK per 2 downstream packets.


At 1 gigabit/s that's on the order of 35k pps of ACKs (100 megabyte/s 
divided by 1440 divided by 2). That is, in my opinion, a completely 
ludicrous rate of ACKs, for no good reason.
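
One way to watch the pure-ACK rate during a download (a sketch; the 
interface is a placeholder, and the length expression matches zero-payload 
segments):

timeout 10 tcpdump -ni eth0 'tcp[tcpflags] == tcp-ack and (ip[2:2] - ((ip[0]&0xf)<<2) - ((tcp[12]&0xf0)>>2)) == 0' > /dev/null
# divide the "packets captured" count tcpdump prints on exit by 10 for pps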


I don't know what the formula should be, but it sounds like the ACK 
sending ratio should be influenced by how many in-flight ACKs there might 
be. Is there any reason to have more than 100 ACKs in flight at any given 
time? 500? 1000?


My DOCSIS connection (inferred through observation) seems to run on 1ms 
upstream time slots, and my modem will delete contiguous ACKs at 16 or 32 
ACK intervals, typically ending up at 1-2 ACKs per 1ms time slot. This 
cuts the upstream bandwidth used for ACKs during a 250 megabit/s download 
from 5-8 megabit/s down to 400 kilobit/s.


Since this ACK reduction is already done on probably hundreds of millions 
of fixed-line subscriber lines today, what arguments do the designers of 
TCP have for keeping one ACK per 2 received TCP packets?


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] benefits of ack filtering

2017-11-29 Thread Mikael Abrahamsson

On Wed, 29 Nov 2017, Sebastian Moeller wrote:

Well, ACK filtering/thinning is a simple trade-off: redundancy versus 
bandwidth. Since the RFCs say a receiver should acknowledge every second 
full MSS, I think the decision whether to filter or not should be kept to


Why does it say to do this? What benefit is there to either end system in 
sending 35kPPS of ACKs to facilitate a 100 megabyte/s TCP transfer?


Sounds like a lot of useless interrupts and handling by the stack, apart 
from offloading it to the NIC to do a lot of handling of these mostly 
useless packets so the CPU doesn't have to do it.


Why isn't 1kPPS of ACKs sufficient for most use cases?

--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] benefits of ack filtering

2017-11-29 Thread Mikael Abrahamsson

On Wed, 29 Nov 2017, Luca Muscariello wrote:


Why does it say to do this? What benefit is there to either end system to
send 35kPPS of ACKs in order to facilitate a 100 megabyte/s of TCP transfer?


Did you check RFC 3449 ?
https://tools.ietf.org/html/rfc3449#section-5.2.1


RFC 3449 is all about middleboxes doing things.

I wanted to understand why TCP implementations find it necessary to send 
one ACK per 2xMSS at really high PPS, especially when NIC offloads and 
middleboxes frequently strip out this information anyway so it never 
reaches the IP stack (right?).


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] benefits of ack filtering

2017-11-30 Thread Mikael Abrahamsson

On Thu, 30 Nov 2017, Eric Dumazet wrote:

I agree that TCP itself should generate ACK smarter, on receivers that 
are lacking GRO. (TCP sends at most one ACK per GRO packets, that is why 
we did not feel an urgent need for better ACK generation)


Could you elaborate a bit more on the practical implications of the above 
text? What is the typical GRO size used when doing gigabit ethernet 
transmissions?


So if we're receiving 70kPPS of 1500-byte packets, each carrying a 
1460-byte MSS-sized segment (~100 megabyte/s), what would a typical ACK 
rate be in that case?


In response to some other postings here: my question "is 35kPPS really 
needed" is not a proposal to send 50 PPS of ACKs. My proposal is that we 
should be able to come up with a smarter algorithm than something from the 
90s that says "send one ACK per 2*MSS", now that we have magnitudes 
higher forwarding rates. Also, on for instance DOCSIS networks you're 
going to get several ACKs back-to-back anyway (because if they're not 
pruned by the DOCSIS network, they're sent in "bursts" within a single 
DOCSIS transmit opportunity), so imagining that 35kPPS gives you higher 
resolution than 1kPPS of ACKs is just an illusion.


So if GRO results in (I'm just speculating here) "we're only sending one 
ACK per X kilobytes received if the packets arrived in the same 
millisecond" and X is in the 16-64 kilobyte range, then that's fine by me.


Any network worth anything should be able to smooth out "bursts" of 16-64 
kilobytes at line rate anyway, in the case where the egress line rate is 
lower than the rate at which the sending end is transmitting.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] Bufferbloat in high resolution + non-stationarity

2017-11-30 Thread Mikael Abrahamsson

On Thu, 30 Nov 2017, Jonathan Morton wrote:

I submit that to provide *deployable* QoS schemes, you must either solve 
the classification problem elegantly (which is a Hard Problem), or else 
show that your scheme works adequately in the absence of classification. 
I'm taking the latter approach with Cake, even though it *also* supports 
Diffserv awareness to enhance its performance where classification is 
straightforward.


In IETF INT-AREA, there is now discussion about allocating a new 
diffserv codepoint for "less-than-best-effort" traffic. I have been an 
advocate of this for quite a while, and I actually believe that this is 
incrementally deployable and has a chance to actually get ISP buy-in.


The idea is to use TOS 0, but use the last 3 diffserv bits to indicate 
that this is less-than-BE. Non-implementing networks will treat this as 
BE; implementing networks can use some kind of DRR scheme to give this 
traffic less bandwidth in case of congestion, or just drop it earlier 
when there is queue buildup.
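
As a purely illustrative sketch (the codepoint had not been allocated at 
the time of writing, so the value 000001 and the interface names here are 
assumptions), an implementing ISP could match such traffic on the 
customer access line with tc and steer it into the lowest-priority band:

  # IPv4-only sketch: DSCP 000001 is 0x04 in the TOS byte, mask 0xfc
  tc qdisc add dev eth0 root handle 1: prio bands 3
  tc filter add dev eth0 parent 1: protocol ip u32 \
      match ip tos 0x04 0xfc flowid 1:3    # band 1:3 = lowest priority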


I think this is the only chance we have to get internet-wide coordination 
for a diffserv codepoint that people will do anything with, and the 
recommendation should be to only act on this at the customer access line 
(the one connecting the ISP to the residential gateway) or perhaps within 
the customer network. The hope is that ISPs will not mangle/bleach this 
codepoint, because it actually indicates traffic should get lower 
priority, not higher.


I am in complete agreement with you that anything that relies on an 
Internet-wide QoS scheme based on diffserv/TOS is a no-go. No ISP will 
listen to this and act on it, as it's a DoS vector.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Make-wifi-fast] benefits of ack filtering

2017-12-03 Thread Mikael Abrahamsson

On Sun, 3 Dec 2017, Juliusz Chroboczek wrote:


As far as I know, DOCSIS has an asymmetry factor that is between 4 and 10,
depending on the deployment.  With worst case asymmetry being 10, this


I can buy 300/10 megabit/s access from my cable provider. So that's a lot 
worse. My cable box has 16 downstream channels, and 4 upstream ones. Each 
channel is TDM based, and there is some kind of scheduler granting sending 
opportunities for each channel to each modem, as needed. I'm not a DOCSIS 
expert.



means that you can send an Ack for every data packet with 400 byte data
packets, every second data packet with 200 byte data packets.  If the
asymmetry is a more reasonable 4, then the figures are 100 and 50
respectively.

Try as I might, I fail to see the problem.  Are we advocating deploying
TCP-aware middleboxes, with all the problems that entails, in order to
work around a problem that doesn't exist?


If I understand correctly, DOCSIS has ~1ms sending opportunities upstream. 
So sending more than 1kPPS of ACKs is meaningless, as these ACKs will just 
come back to back at wire-speed as the CMTS receives them from the modem 
in chunks. So instead, the cable modem just deletes all the sequential 
ACKs and doesn't even send these back-to-back ones.
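
To put numbers on that, assuming ~1ms between upstream transmit 
opportunities: a 35kPPS ACK stream means roughly 35 ACKs queued per 
opportunity, and since each cumulative ACK supersedes the ones before it, 
34 of those 35 carry no new information.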


LTE works the same way: it's also frequency-divided and TDM, so I can 
see the same benefit there from culling sequential ACKs sitting in the 
buffer. I don't know if this is done, though.


I've seen people who I think are involved in TCP design, and they seem 
to be under the impression that more ACKs give TCP higher resolution and 
granularity. My postulation is that this is commonly false because of how 
network access is designed and how the NICs are designed (the 
transmit/receive offloading). So sending 35kPPS of ACKs for a gigabit/s 
transfer is just inefficient and shouldn't be done. I would prefer that 
end points send fewer ACKs, instead of the network killing them.


And the network does kill them, as we have seen, because any novice 
network access technology designer can say "oh, having 16 sequential 
ACKs here in my buffer, sitting waiting to get sent, is just useless 
information. Let's kill the first 15."


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Cerowrt-devel] DC behaviors today

2017-12-04 Thread Mikael Abrahamsson

On Sun, 3 Dec 2017, Dave Taht wrote:

What Jesper's been working on for ages has been to try and get linux's 
PPS up for small packets, which last I heard was hovering at about 
4Gbits.


You might want to look into what the VPP (https://fd.io/) peeps are 
doing. They can at least forward packets at pretty impressive rates: 
200Mpps with zero frame loss and a 2M FIB, limited by NIC and PCIe, not 
CPU (on a many-core machine).


I have never thought there was much of a market for gbit to or from the 
home. 40Mbits is enough for nearly everybody until > 4k video with 
smellovision and tactile feedback become a standard.


I'd say the sweet spot right now is in the 100-250 megabit/s range, 
considering "cost of production" and "what do people need/use". This means 
it still can be done on 1 gigabit/s access links.


Anything faster than 1GE is going to be significantly more expensive than 
1GE because 1GE is "good enough for most" when it comes to hundreds of 
millions of households for their inter/intra home need. Also for SME use, 
1GE is good enough for a lot of use cases.


I personally now have 250/50, which is good enough for me, and I don't 
want to pay 2x my current MRC to get 1000/100. However, if I had to 
downgrade to 30 megabit/s I would most certainly notice it, and in my 
market that would just be a 20-30% saving, which definitely isn't worth 
it.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Cerowrt-devel] DC behaviors today

2017-12-04 Thread Mikael Abrahamsson

On Mon, 4 Dec 2017, Joel Wirāmu Pauling wrote:


I'm not going to pretend that 1Gig isn't enough for most people. But I
refuse to believe it's the networks equivalent of a 10A power (20A
depending on where you live in the world) AC residential phase
distribution circuit.


That's a good analogy. I actually believe it is, at least for the next 
5-10 years.



This isn't a question about what people need, it's more about what the
market can deliver. 10GPON (GPON-X) and others now make it a viable
service that can and is being deployed in residential and commercial
access networks.


Well, you're sharing that bw with everybody else on that splitter. 
Sounds to me that the service being delivered over that would instead be 
in the 2-3 gigabit/s range for the individual subscriber (this is what I 
typically see on equivalent shared mediums: the top-speed individual 
subscriptions will be in the 20-40% range of the max theoretical speed 
the entire solution can deliver).


The problem is now that Retail Service Provider X can deliver a post 
Gigabit service... what is capable of taking it off the ONU/CMNT point 
in the home? As usual it's a follow-the-money question: once RSPs can 
deliver Gbit+ they will need an ecosystem in the home to feed into it, 
and right now there isn't a good technology platform that supports it; 
10GBase-X/10GBaseT is a non-starter due to the variability in home 
wiring. Arguably the 7-year leap from 100 to 1000mbit was easy, yet it's 
been a gap of 12 years and counting for the same step up... it's not 
just the NICs and CPUs in the gateways, it's the connector and in-home 
wiring problems as well.


As soon as one goes above 1GE, prices increase A LOT on everything 
involved. I doubt we'll see any 2.5G or higher-speed equipment in wide 
use in homes/SMEs in the next 5 years.



Blatant Plug - request :
I'm interested to hear opinions on this as I have a talk on this very
topic 'The long and Winding Road to 10Gbit+ in the home'
https://linux.conf.au/ at Linuxconf in January. In particular if you
have any home network gore/horror stories and photos you would be
happy for me to include in my talk, please include.


I am still waiting for a decently priced 10GE switch. I can get 1GE 
24-port managed ones, fanless, for 100-200 USD. As soon as I go 10GE, the 
price jumps up a lot, and I get fans. The NICs aren't widely available, 
even though they're not the biggest problem. My in-house cabling can do 
10GE, but I guess I'm an outlier.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Cerowrt-devel] DC behaviors today

2017-12-04 Thread Mikael Abrahamsson

On Mon, 4 Dec 2017, Joel Wirāmu Pauling wrote:


How to deliver a switch, when the wiring and port standard isn't
actually workable?


Not workable?


10GBase-T is out of Voltage Spec with SFP+ ; you can get copper SFP+


Yep, the "Cu SFP" was a luxury for a while. Physics is harsh mistress 
though.



but they are out of spec... 10GbaseT doesn't really work over Cat5e
more than a couple of meters (if you are lucky) and even Cat6 is only
rated at 30M... there is a reason no-one is producing Home Copper
switches and it's not just the NIC Silicon cost (that was a factor
until Recently obviously, but only part of the equation).


I have CAT6 in my home, with no run of more than 30 meters anywhere, so 
it would work for me. You need CAT6A for 100m, so anyone doing new 
installs should use that. Stiff cable, though.



On the flip side:
Right now I am typing this via a 40gbit network, comprised of the
cheap and readily available Tb3 port - it's daisy chained and limited
to 6 ports, but right now it's easily the cheapest and most effective
port. Pity that the fabled optical tb3 cables are damn expensive...
so you're limited to daisy-chains of 2m. They seem to have screwed the
pooch on the USB-C network standard quite badly - which looked so
promising, so for the moment Tb3 it is for me at least.


At that distance, you could probably run 10GE over CAT3 wiring. There is 
a reason 10GE requires better cable for longer distances: with bad cable 
you instead need lots of power and DSP work to figure out what's going 
on.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Cerowrt-devel] DC behaviors today

2017-12-04 Thread Mikael Abrahamsson

On Mon, 4 Dec 2017, Pedro Tumusok wrote:


Looking at chipsets coming/just arrived from the chipset vendors, I think
we will see CPE with 10G SFP+ and 802.11ax Q3/Q4 this year.
Price is of course a bit steeper than the 15USD USB DSL modem :P, but
probably fits nicely for the SMB segment.


https://kb.netgear.com/31408/What-SFP-modules-are-compatible-with-my-Nighthawk-X10-R9000-router

This has been available for a while now. The only use case I see for it 
is Comcast's 2 gigabit/s service; that's the only one I know of that 
would fit this product (since it has no downlink 10GE ports).


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Cerowrt-devel] DC behaviors today

2017-12-07 Thread Mikael Abrahamsson

On Mon, 4 Dec 2017, dpr...@reed.com wrote:

I suggest we stop talking about throughput, which has been the mistaken 
idea about networking for 30-40 years.


We need to talk both about latency and speed. Yes, speed is talked about 
too much (relative to RTT), but it's not irrelevant.


The speed of light in fiber means RTT is approx 1ms per 100km, so from 
Stockholm to SFO my RTT is never going to be significantly below 85ms 
(8625km great circle). It's currently twice that.


So we just have to accept that some services will never be deliverable 
across the wider Internet, but have to be deployed closer to the customer 
(as per your examples, some need 1ms RTT to work well), and we need lower 
access latency and lower queuing delay. So yes, agreed.


However, I am not going to concede that speed is a "mistaken idea about 
networking". No amount of smarter queuing is going to fix the problem if 
I don't have the throughput available that I need for my application.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] benefits of ack filtering

2017-12-13 Thread Mikael Abrahamsson

On Wed, 13 Dec 2017, Jonathan Morton wrote:

the uplink shaper is set to about a fiftieth of that.  I seriously doubt 
that DOCSIS is ever inherently that asymmetric.


Well, the products are, because that's what the operators seem to want, 
probably also because that's what the customers demand.


So my modem has 16x4 (16 downstream channels and 4 upstream channels), 
meaning that built into the hardware I have a 4:1 down/up split.


What providers then typically do (this is my understanding, I haven't 
worked professionally with DOCSIS networks) is have 24 downstream 
channels and 4 upstream channels. Older modems can have 8 downstream and 
4 upstream, for instance, so they'll "tune" to the number of channels 
they can, and then there is an on-demand scheduler that handles upstream 
and downstream traffic.


So I guess theoretically the operator could (if large enough) make a hw 
vendor create a 16x16 modem and have 32 channels total. But nobody does 
that, because that doesn't sell as well as having more downstream 
(because people don't seem to care about upstream). It just makes more 
market sense to sell these asymmetric services, because typically people 
are eyeballs and they don't need a lot of upstream bw (or don't think 
they need it).


On the ADSL side, I have seen 28/3 (28 down, 3 up) for annex-M with 
proprietary extensions. The fastest symmetric I have seen is 4.6/4.6. So 
if you as an operator can choose between selling a 28/3 or 4.6/4.6 
service, what will you do? To consumers, it's 28/3 all day.


So people can blame the ISPs all day long, but there are still (as you 
stated) physical limitations on capacity in RF spectrum over air/copper, 
and you need to handle this reality somehow. If a lot of power is used 
upstream then you'll get worse SNR for the downstream, meaning less 
capacity overall. Symmetric access capacity costs real money and results 
in less overall capacity, unless it's on point-to-point fiber.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Cerowrt-devel] DC behaviors today

2017-12-13 Thread Mikael Abrahamsson

On Wed, 13 Dec 2017, Jonathan Morton wrote:

Occasionally, of course, practically everyone in the country wants to 
tune into coverage of some event at the same time.  More commonly, they 
simply get home from work and school at the same time every day.  That 
breaks the assumptions behind pure statistical multiplexing, and 
requires a greater provisioning factor.


Reasonable operators have provisioning guidelines that look at actual 
usage, although they probably look at it in 5-minute averages and not at 
the millisecond level discussed here in this context.


So they might say "if busy hour average is over 50% 3 days in a week" this 
will trigger a provisioning alarm for that link, and the person (or 
system) will take a more detailed look and look at 5minute average graph 
and decide if this needs to be upgraded or not.
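
A minimal sketch of such a check (the log format is an assumption: one 
file per link, with one "date,utilization_percent" line per 5-minute 
sample):

  # count days with at least one 5-minute sample above 50%; alarm when
  # 3 or more such days occur within the window covered by the file
  awk -F, '$2 > 50 { days[$1] = 1 }
       END { n = 0; for (d in days) n++;
             if (n >= 3) print "provisioning alarm: " n " days > 50%" }' \
      link.csv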


For me the interesting point is always "what's going on in busy hour of 
the day" and never "what's the monthly average transferred amount of 
data".


Of course, this can hide subsecond bufferbloat extremely well (and has), 
but at least this is typically how statistical overprovisioning is done. 
You look at actual usage and make sure your network is never full for any 
sustained amount of time, in normal operation, and make sure you perform 
upgrades well before the growth has resulted in network being full.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Cerowrt-devel] DC behaviors today

2017-12-14 Thread Mikael Abrahamsson

On Wed, 13 Dec 2017, Jonathan Morton wrote:

Ten times average demand estimated at time of deployment, and struggling 
badly with peak demand a decade later, yes.  And this is the 
transportation industry, where a decade is a *short* time - like less 
than a year in telecoms.


I've worked in ISPs since 1999 or so. I've been at startups and I've been 
at established ISPs.


It's kind of an S curve when it comes to traffic growth: when you're 
adding customers you can easily see 100%-300% growth per year (or more). 
Then, after the market becomes saturated, growth comes from per-customer 
increased usage, and for the past 20 years or so this has been in the 
neighbourhood of 20-30% per year.


Running a network that congests parts of the day, it's hard to tell what 
"Quality of Experience" your customers will have. I've heard horror 
stories from the '90s, when a then-large US ISP was running an OC3 (155 
megabit/s) full most of the day. So someone said "oh, we need to upgrade 
this", and after a while they did, to 2xOC3. Great, right? No, after that 
upgrade both OC3s were completely congested. Ok, then upgrade to OC12 
(622 megabit/s). After that upgrade, the link was evidently uncongested 
only a few hours of the day, and of course needed more upgrades.


So at the places I've been, I've advocated for planning rules that say 
that when a link is peaking at 5-minute averages of more than 50% of 
link capacity, an upgrade needs to be ordered. This 50% number can be 
larger if the link aggregates a larger number of customers, because 
typically your "statistical overbooking" varies less the more customers 
participate.


These backbone devices do not do per-flow anything. They might have 10G 
or 100G links to/from them with many many millions of flows, and it's 
all NPU forwarding. Typically they might do DiffServ-based queueing and 
WRED to mitigate excessive buffering. Today, they typically don't even 
do ECN marking (which I have advocated for, but there is not much 
support from other ISPs in this mission).


Now, on the customer access line it's a completely different matter. 
Typically people build with a BRAS or similar, where (tens of) thousands 
of customers might sit on a (very expensive) access card with hundreds 
of thousands of queues per NPU. This still leaves just a few queues per 
customer, unfortunately. So these do not do per-flow anything either. 
This is where PIE comes in, because devices like these can do PIE in the 
NPU fairly easily, as it's kind of like WRED.


So back to the capacity issue. Since these devices typically aren't good 
at assuring per-customer access to the shared medium (backbone links), 
it's easier to just make sure the backbone links are not regularly full. 
This doesn't mean you're going to have 10x capacity all the time; it 
probably means you're going to be bouncing between 25-70% utilization of 
your links (for the normal case, because you need spare capacity to 
handle events that increase traffic temporarily, plus handle loss of 
capacity in case of a link fault). The upgrade might be to add another 
link, or a higher-tier interface speed, bringing the utilization down to 
typically half or a quarter of what you had before.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Cerowrt-devel] DC behaviors today

2017-12-17 Thread Mikael Abrahamsson

On Sun, 17 Dec 2017, Matthias Tafelmeier wrote:

What I actually wanted to posit in relation to that is that one could 
get sooner a c-capable backbone sibling by marrying two ideas: the 
airborne concept ongoing as outlined, plus what NASA is planning to 
bring about for the space backbone, e.g. [1][2]. It's laser based 
instead of directed radio-wave only. Sure, both are in the speed range 
of c; apparently, laser transmission has in addition a significantly 
higher bandwidth to offer: "10 to 100 times as much data at a time as 
radio-frequency systems"[3]. Attenuations to photons in clean 
atmospheric air are negligible (few mps - refractive index of about 
1.0003), so actually a negligible slowdown - easily competing with top 
notch fibres (99.7% the vacuum speed of light). Sure, that's the ideal 
case; though, if cleverly done from the procurement of platforms and 
overall system steering perspective, it might be feasible.


Today's laser links are in the few-km-per-hop range, which is easily at 
least one order of magnitude shorter than radio-based equivalents.


I don't know the physics behind it, but people who have better insight 
than I do tell me "it's hard" to run longer hops (if one wants any kind of 
high bitrate).


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Cerowrt-devel] DC behaviors today

2017-12-18 Thread Mikael Abrahamsson
 only a 40ms delta
between the pre-hop and hitting my ISP, where it was normally about 11ms
for that link. You could say about 30ms of buffering was going on. The
really interesting thing is I was only getting about 5-10Mb/s, which means
there was virtually zero free bandwidth. but I had almost no packet-loss. I
called my ISP shortly after the issue started and that's when they told me
they were under a DDOS and were at 100% trunk, and they said they were
going to have their trunk bandwidth increased shortly. 5 minutes later, the
issue was gone. About 30 minutes later I was called back and told the DDOS
was still on-going, they just upgraded to enough bandwidth to soak it all.
I found it very interesting that a DDOS large enough to effectively kill
95% of my provisioned bandwidth and increase my ping 30ms over normal, did
not seem to affect packet-loss almost at all. It was well under 0.1%. Is
this due to the statistical nature of large links or did Level 3 have an
AQM to my ISP?


This is interesting. I thought about this for several minutes, but I 
can't come up with an explanation for this behaviour, at least not from 
the typical kind of DDOS that's going around. If there was some kind of 
DDOS mitigation equipment put into the mix, that might explain what you 
were seeing.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] The Blind Men and the Elephant.

2018-02-14 Thread Mikael Abrahamsson

On Mon, 12 Feb 2018, Dave Taht wrote:


but to me the simpler thing would be to garner folk to ask at
vendor/isp press conferences: "Have you implemented RFC8290 yet? If
not, when?"


Has anyone implemented FQ_CODEL in a packet accelerator, or is this still 
a CPU thing only?


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Seen in passing: mention of Valve's networking scheme and RFC 5348

2018-04-03 Thread Mikael Abrahamsson

On Tue, 3 Apr 2018, Jonathan Morton wrote:

notwithstanding).  In the end, people have kept reinventing "reliable 
datagram" protocols on top of UDP, whenever they ran up against 
requirements that TCP didn't fulfil.


Yes, for multiple reasons. TCP is ossified and typically lives in the 
OS. Because of NAT, the only protocols that work are TCP and UDP, so if 
you want to move your "transmission stack" to userspace, your only 
choice is UDP. So enter things like QUIC and other mux:ed stream 
protocols over UDP, which can then live in userland on all major 
operating systems.


This is not ideal, but it's not strange that this is happening. The only 
way to innovate as an application/protocol developer is to use UDP.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Seen in passing: mention of Valve's networking scheme and RFC 5348

2018-04-04 Thread Mikael Abrahamsson

On Tue, 3 Apr 2018, Michael Welzl wrote:

Sure, when you’re in control of both ends of a connection, you can build 
whatever you want on top of UDP - but there’s a lot of wheel 
re-inventing there. Really, the transport layer can’t change as long as 
applications (or their libraries) are exposed to only the services of 
TCP and UDP, and thereby statically bound to these transport protocols.


I'm aware of TAPS and I have been trying to gather support for this kind 
of effort for years now, and I'm happy to see there is movement. I have 
also heard encouraging talk from several entities interested in actually 
doing serious work in this area, including some open-sourcing parts of 
their currently non-FOSS code base as part of that work.


So we need applications to be able to get more access to what's going on 
on the wire, including access to non-TCP/UDP protocols, but also to be 
able to create "pluggable TCP stacks" so that a host can have several 
different ones, and the user can install new ones even on older 
operating systems.


With more and more IPv6 around, I hope we'll be able to deploy new 
protocols that are not TCP/UDP (A+P), and that this will bring back some 
innovation in that area.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Seen in passing: mention of Valve's networking scheme and RFC 5348

2018-04-04 Thread Mikael Abrahamsson

On Wed, 4 Apr 2018, Dave Taht wrote:


How dead is posix these days? Ietf does not generally do apis well.


POSIX nowadays is

http://pubs.opengroup.org/onlinepubs/9699919799/

My take on it is that the IETF should not be scared to do APIs, even 
though there is a lot of resistance still.


However, the IETF should not do POSIX APIs, but instead something of their 
own.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Seen in passing: mention of Valve's networking scheme and RFC 5348

2018-04-04 Thread Mikael Abrahamsson

On Wed, 4 Apr 2018, Michael Welzl wrote:

well - they have been refusing too long to do them at all. i guess 
that’s part of the problem


It's not about refusing to do so, it's because other SDOs have told the 
IETF not to. If the IETF tries to touch POSIX, the SDO that does POSIX 
doesn't appreciate it.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Seen in passing: mention of Valve's networking scheme and RFC 5348

2018-04-04 Thread Mikael Abrahamsson

On Wed, 4 Apr 2018, Luca Muscariello wrote:

And yes, flow queueing, absolutely. Flow isolation becomes fundamental 
in such a zoo, or jungle.


There was talk in the IETF about a transport protocol that was proposed 
to do a lot of things TCP doesn't do, but still retain some things that 
have been useful with TCP.


I think it was this one:

https://datatracker.ietf.org/doc/draft-ietf-nvo3-gue/

I'd like to see it not over UDP, but rather as a native IP protocol. The 
talk was about having the network be able to look into the state machine 
of the protocol (MSS size, equivalent of SYN, etc.) but not into the 
payload (which would be end-to-end encrypted). It would also be able to 
do muxed, message-based streams to avoid head-of-line blocking caused by 
a single packet loss.


If any of this comes up, the whole FQ machinery might benefit from being 
able to identify flows in the new protocol, but I imagine this is not a 
hard thing to do. I still have hopes for the flow label in IPv6 to do 
this job, even though it hasn't seen wide adoption so far.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Fwd: [Bug 1436945] Re: devel: consider fq_codel as the default qdisc for networking

2018-06-05 Thread Mikael Abrahamsson

On Tue, 5 Jun 2018, Jonas Mårtensson wrote:


What about PLPMTU?  Do you think they might tweak that too?

 net.ipv4.tcp_mtu_probing=2
 (despite name, applies to IPv6 too)



Maybe, suggest it on their github. But I would maybe propose instead
net.ipv4.tcp_mtu_probing=1.


MTU probing would be awesome. I am a great fan of PLPMTU and this should 
be default-on everywhere in all protocols.
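
For reference, the Linux knob under discussion (values as documented in 
the kernel's ip-sysctl documentation):

  # 0 = disabled, 1 = probe only after an ICMP black hole is suspected,
  # 2 = always probe (RFC 4821 packetization-layer path MTU discovery)
  sysctl -w net.ipv4.tcp_mtu_probing=1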


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Make-wifi-fast] Van Jacobson's slides on timing wheels at netdevconf

2018-07-23 Thread Mikael Abrahamsson

On Sat, 21 Jul 2018, Jonathan Morton wrote:

An example of such a situation would be sparse flows in DRR++, which is 
a key part of fq_codel and Cake.  So to implement DRR++ using timing 
wheels, you have to choose your scheduling horizon carefully so as to 
minimise the delay to sparse packets.


At the spring IETF, there was a talk from an IEEE person about using 
ethernet pause frames to get senders to stop talking for a while. My 
understanding was that this was on microsecond or even nanosecond time 
scales.


One of the mentions in the presentation was on slide 10 about 
"fat-buffered router". In the data center, these are kind of going away, 
because on-die memory is small and rates are high. A 64x100GE forwarding 
asic might have 16MB of buffer, which is very little buffer for the kind 
of bit rates we're talking here.
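
To put that in perspective: 16 megabytes is 128 megabits, so across 6.4 
Tbit/s of ports that is on the order of 20 microseconds of buffering at 
full load.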


https://www.youtube.com/watch?v=sJMvAqEQCBE 1h44m in (proposed IEEE 
802.1Qcz work) is the one I am thinking of.


Wonder how this would interact with the timing wheel proposed by Van 
Jacobson?


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Flow offload's impact on bufferbloat

2018-08-17 Thread Mikael Abrahamsson

On Fri, 10 Aug 2018, Rosen Penev wrote:


My question is not really how to fix it. I already know that. I just
got the feeling that bypassing parts of the linux network stack would
result in less buffering.


On the OpenWrt configuration page for the "software flow offload":

"Experimental feature. Not fully compatible with QoS/SQM."

I don't know exactly what it does; it seems to reduce the number of CPU 
cycles needed to forward packets in an already established flow, but I'd 
imagine that it might very well bypass some of the scheduling code, 
which could explain what you're seeing. So you might get faster 
forwarding but less AQM.


So if your device isn't fast enough to keep up with your total Internet 
access speed, then this might be a good thing. If your device is faster 
than what's needed, then you'd better spend the cycles on getting good AQM 
instead of freeing up more CPU that isn't used for anything anyway.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] beating the drum for BQL

2018-08-22 Thread Mikael Abrahamsson

On Wed, 22 Aug 2018, Dave Taht wrote:

I/we really should have beat the bql drum harder over the last 6 years. 
It's the basic start to all the debloating.


It only helps with kernel based forwarding. A lot of devices don't even 
use this, especially as speeds go up. They use packet accelerators so the 
kernel never sees the packets after initial flow setup.


So you need to get the people developing that silicon to get with the 
program.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] beating the drum for BQL

2018-08-23 Thread Mikael Abrahamsson

On Thu, 23 Aug 2018, Pete Heist wrote:




On Aug 23, 2018, at 2:49 AM, Dave Taht  wrote:

I had a chance to give a talk at broadcom recently, slides here:

http://flent-fremont.bufferbloat.net/~d/broadcom_aug9.pdf 
<http://flent-fremont.bufferbloat.net/~d/broadcom_aug9.pdf>


Thanks for sharing, this is really useful, raising awareness where it matters. 
Quite a bit of content... :)

Ubiquiti needs some work getting this into more of their products (EdgeMAX in 
particular). A good time to lobby for this might be, well a couple months ago, 
as they’re producing alpha builds for their upcoming 2.0 release with kernel 
4.9 and new Cavium/Mediatek/Octeon SDKs. I just asked about the status in the 
EdgeRouter Beta forum, in case it finds the right eyes before the release:

https://community.ubnt.com/t5/EdgeRouter-Beta/BQL-support/m-p/2466657 
<https://community.ubnt.com/t5/EdgeRouter-Beta/BQL-support/m-p/2466657>

https://community.ubnt.com/t5/EdgeMAX-Beta-Blog/New-EdgeRouter-firmware-2-0-0-alpha-2-has-been-released/ba-p/2414938
 
<https://community.ubnt.com/t5/EdgeMAX-Beta-Blog/New-EdgeRouter-firmware-2-0-0-alpha-2-has-been-released/ba-p/2414938>


My only experience with these devices is the EdgeRouter 3/5/X, and they 
have very low performance if you disable offloads (which you need to do 
to enable AQM) and run everything on the CPU: around 100 megabit/s of 
uni-directional traffic.


Do they have other platforms where this would actually matter?


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Cerowrt-devel] beating the drum for BQL

2018-08-23 Thread Mikael Abrahamsson

On Thu, 23 Aug 2018, Sebastian Moeller wrote:

router should be able to handle at least the sold plan's bandwidth with 
its main CPU...)


There is exactly one SoC on the market that does this, and that's Marvell 
Armada 385, and it hasn't been very successful when it comes to ending up 
in these kinds of devices. It's mostly ended up in NASes and devices such 
as WRT1200AC, WRT1900ACS, WRT3200AC.


Sure, doing less / a half-assed job is less costly than doing it right, 
but in the extreme, not doing the job at all saves even more energy ;). 
And I am not sure we are barking up the right tree here; it is not that 
all home CPE are rigorously optimized for low power and energy saving... 
my gut feeling is that the only optimizing principle is cost for the 
manufacturer/OEM, and that causes underpowered CPUs that are 
"packet-accelerator"-doped to appear able to do their job. I might be 
wrong though, as I have no ISP internal numbers on this issue.


The CPU power and RAM/flash have crept up a lot in the past 5 years 
because of other requirements: having the HGW support other applications 
than just being a very simple NAT44+wifi router.


Cost is definitely an optimization, and when you're expected to have a 
price-to-customer including software in the 20-40 EUR/device range, then 
the SoC can't cost much. There has also been a lot of vendor lock-in.


But now speeds are creeping up even more; we're now seeing 2.5GE and 
10GE platforms, which require substantial CPU power to do forwarding. 
The Linux kernel is now becoming the bottleneck in the forwarding: not 
even on a 3GHz Intel CPU is it possible to forward 10GE using the normal 
Linux kernel path (my guess right now is that this is due to context 
switching etc., not really CPU performance).


Marvell has been the only one to really aim for lots of CPU performance 
in their SoCs. There might be others now going the same path, but it's 
also a downside if the CPU becomes bogged down with packet forwarding 
when it's also expected to perform other tasks on behalf of the user 
(and ISP).


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Cerowrt-devel] beating the drum for BQL

2018-08-23 Thread Mikael Abrahamsson

On Thu, 23 Aug 2018, Rosen Penev wrote:

Flow offloading can save quite a bit of CPU, even when done in software. 
It also helps that the kernel network stack is getting better.


I tried this on my 10GE x86-64 test bed. It didn't help; it seems to be 
%sirq limited, and flow offload changed nothing. It helps on lower-end 
CPU platforms (I've tried it there too), but not for the 10GE forwarding 
case.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Cerowrt-devel] beating the drum for BQL

2018-08-24 Thread Mikael Abrahamsson

On Thu, 23 Aug 2018, Dave Taht wrote:

On the marvell front... yes, they tend to produce hardware that runs too 
hot. I too rather like the chipset, and it's become my default hw for 
most things in the midrange.


I checked my WRT1200AC and it idles at 8W. My similar Broadcom box idles 
at 10W, but that one has a lot more on the motherboard plus 4x4 wifi that 
tends to run very hot. I intend to try them under load though and see how 
much power usage changes.



Lastly... there are still billions of slower ISP links left in the
world to fix, with hardware that now costs well under
40 bucks. The edgerouter X is 50 bucks (sans wifi) and good to
~180mbps for inbound shaping presently. Can we get those edge
connections fixed???


There are indeed these kinds of slower devices, but they tend to be the 
kind of device that last saw development a few years ago, and the only 
reason they're still being newly installed is that they're cheap.


In most of the world, customers do not rent the CPE so there is no cash 
flow to the ISP to fix anything. So they tend to sit there until they 
break.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Cerowrt-devel] beating the drum for BQL

2018-08-24 Thread Mikael Abrahamsson

On Thu, 23 Aug 2018, Dave Taht wrote:


I should also point out that the kinds of routing latency numbers in
those blog entries was on very high end intel hardware. It would be
good to re-run those sort of tests on the armada and others for
1,10,100, 1000 routes. Clever complicated algorithms have a tendency
to bloat icache and cost more than they are worth, fairly often, on
hardware that typically has 32k i/d caches, and a small L2.


My testing has been on OpenWrt with kernel 4.14 on Intel x86-64. Looking 
at how the box behaves, I'd say it's limited by context switching / 
interrupt load, and not actually by the CPU being busy doing "hard 
work".


All of the fast routing implementations (snabbswitch, FD.io/VPP etc.) 
take CPUs and devices away from Linux and run a busy-loop, polling a lot 
of the time and never context switching, which means the L1 cache is 
never churned. This is how they become fast. I see potential to do "XDP 
offload" of forwarding here, basically doing a similar job to what a 
hardware packet accelerator does. Then we can potentially optimise 
forwarding using lessons learnt from those other projects. We need to 
keep the bufferbloat work in mind when doing this though, so we don't 
make that bad again.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Cerowrt-devel] beating the drum for BQL

2018-08-24 Thread Mikael Abrahamsson

On Fri, 24 Aug 2018, Dave Taht wrote:

My ar71xx/ath9k hw - like nanostations - was below 2W. wndr3800 I don't 
remember; I think the ethernet switch added quite a bit. But 8W? Not 
even close to that. A modern LED lightbulb eats that and sheds quite a 
lot of light.


My very simple and stupid 1GE SFP/ethernet fiber media converter uses 
4.3W when idling.



Random curiosity: what do various SFP+ interfaces (notably gpon) eat?
has anyone got a gpon interface for the omnia yet? I *hate* the need
for ONTs.


These can easily be 1-2 Watts. I put a 1GE SFP into the previously 
mentioned Broadcom HGW and power usage went up from 9.4W to 10.2W. So if 
it's a GPON or similar then I'd imagine it's substantially more, 
considering that a GPON device needs to do quite a lot more.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Cerowrt-devel] beating the drum for BQL

2018-08-24 Thread Mikael Abrahamsson

On Fri, 24 Aug 2018, Toke Høiland-Jørgensen wrote:

Are there actually any 10GE embedded platforms one can buy? I've been 
thinking about how to upgrade my home network without putting x86 boxes 
everywhere...


https://www.solid-run.com/marvell-armada-family/macchiatobin/

I know people currently working on XDP-enabling the drivers for that 
board (Marvell 8040).


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Cerowrt-devel] beating the drum for BQL

2018-08-24 Thread Mikael Abrahamsson

On Fri, 24 Aug 2018, Jan Ceuleers wrote:


On 24/08/18 13:46, Jan Ceuleers wrote:

On 24/08/18 10:06, Dave Taht wrote:

Random curiosity: what do various SFP+ interfaces (notably gpon) eat?


I have taken a look at a couple. I see numbers in the range 1.7 - 2.2W
for GPON ONTs.


Just to be clear: that's for GPON SFP ONTs.


Just the SFP, right?

--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Marvell 385

2018-08-26 Thread Mikael Abrahamsson

On Sat, 25 Aug 2018, Dave Taht wrote:

The expressobin is a Marvell Armada "3700LP (88F3720) dual core ARM 
Cortex A53 processor up to 1.2GHz" - how does that compare? I have 
plenty of ath10k and ath9k pcmcia cards


I have one of these, incl. wifi. Right now the drivers are not in great 
shape, but they're being worked on. My espressobin has worse performance 
on its wired ports than my WRT1200AC (Armada 385).


I have talked to people who say the drivers are being worked on 
though... If you have input, Kaloz is probably a great person to send it 
to. I know other people working on Marvell drivers as well.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Marvell 385

2018-08-27 Thread Mikael Abrahamsson

On Sun, 26 Aug 2018, Dave Taht wrote:


I was on that thread. It was broken before entirely. As for the single
interrupt on this chip variant - believe it or not, I'm not huge on


When doing 10GE tests on x86-64, I got the highest performance when I 
set interrupt affinity to a single core per interface.
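
A sketch of that kind of pinning (interface name and core mask are just 
examples; IRQ naming varies by driver):

  # pin all of eth0's IRQs to CPU core 0 (affinity mask 0x1)
  grep eth0 /proc/interrupts | awk '{ sub(":","",$1); print $1 }' |
  while read irq; do
      echo 1 > /proc/irq/$irq/smp_affinity
  done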


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] first bufferbloat free cablemodem?

2018-10-07 Thread Mikael Abrahamsson

On Sun, 7 Oct 2018, Aaron Wood wrote:


Maybe he's on a DOCSIS 3.1 headend that's also using pie?  Pie doesn't need
to know the outbound rate, correct?  as it's meant to be driven by the
RTS/CTS type behavior that the upstream traffic on cable has (the correct
terms for cable aren't coming to mind at the moment).


Correct, PIE acts on the queue just like CoDel does. From what I can 
tell, PIE is a queue discipline that can be implemented on hardware that 
supports WRED (which most can), with the help of extra software and some 
CPU cycles to tune it over time. That's why HW manufacturers like PIE.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Cerowrt-devel] DNSSEC key rollover today

2018-10-11 Thread Mikael Abrahamsson

On Thu, 11 Oct 2018, Dave Taht wrote:


if any of you are still using cerowrt, and dnssec, it's gonna break
unless you update this, or disable dnssec... I do not know if the new
key was in openwrt 18.06 either...

http://www.circleid.com/posts/20181005_how_to_prepare_for_dnssec_root_ksk_rollover_on_october_11_2018/


Just as an operational concern, if you have an old image of something (pre 
mid 2017) that doesn't have the new key, it's not going to be able to 
download the new key using the old key, as of today.


Any old install might have the key update function implemented and might 
have the new key, but as soon as you re-install and the new key is not 
there anymore, it'll stop working.


A DNSSEC validating device needs to have functionality to get the root key 
somehow and keep it updated. Otherwise it's better to just not validate at 
all if one cares about operational availability of the service.
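
As one concrete example of such functionality (assuming unbound is the 
validating resolver), its RFC 5011 helper can bootstrap and keep the 
anchor current, typically run from cron or at boot:

  # create or update the root trust anchor file used for validation
  unbound-anchor -a /var/lib/unbound/root.key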


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] vyatta in AT&T 5G gear

2018-10-16 Thread Mikael Abrahamsson

On Mon, 15 Oct 2018, Dave Taht wrote:


Vyos (the open source fork of vyatta) was one of the first to add
fq_codel support... I wonder

http://linuxgizmos.com/att-releases-white-box-spec-for-its-linux-based-5g-routers/


Isn't Vyos just running the Linux kernel for forwarding? So they received 
fq_codel for free when the Linux kernel got support for it? They just had 
to make it configurable?


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


[Bloat] excellent result with OpenWrt 18.06.1 + CAKE on FTTH

2018-11-11 Thread Mikael Abrahamsson


Hi,

I am running "stock" OpenWrt 18.06.1 on an WRT1200AC with 
CAKE+piece_of_cake.qos and set to 250 down 100 up. This is on an ethernet 
point-to-point FTTH connection in Stockholm, Sweden. Basically just 
installed OpenWrt and then added the sqm-scripts-extra and luci-app-sqm 
packages, went in and configured the correct settings in the web UI, and 
then everything was great.
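
For the curious, a rough sketch of what the sqm scripts end up 
configuring under the hood (simplified, and the interface name eth1.2 is 
specific to my box; the real scripts set more options):

  tc qdisc replace dev eth1.2 root cake bandwidth 100mbit
  ip link add name ifb4eth1.2 type ifb
  ip link set ifb4eth1.2 up
  tc qdisc add dev eth1.2 handle ffff: ingress
  tc filter add dev eth1.2 parent ffff: matchall \
      action mirred egress redirect dev ifb4eth1.2
  tc qdisc replace dev ifb4eth1.2 root cake bandwidth 250mbit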


The biggest benefit of this FTTH setup is that I don't have to 
experience the first-hop scheduler I had with my previous DOCSIS 
connection (which also sometimes didn't deliver the advertised 
bandwidth, so I ended up getting 10-30ms of bufferbloat).


http://www.dslreports.com/speedtest/41682104

The smokeping screenshots below show not only the difference between the 
DOCSIS and FTTH schedulers, but also the much lower access RTT (1-2 ms) 
and the lower PDV (which seems to be several ms on DOCSIS but not on my 
P2P FTTH).


https://imgur.com/a/96dFdho

Thanks everybody for the excellent packaging and ease of use for end users 
to get this to work. I've had this running now for 40 days without any 
issue.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] excellent result with OpenWrt 18.06.1 + CAKE on FTTH

2018-11-12 Thread Mikael Abrahamsson

On Mon, 12 Nov 2018, Dave Taht wrote:


tc -s qdisc show dev your_device?
tc -s qdisc show dev your_ifbdevice?


I haven't restarted in 40 days and I don't remember restarting cake, so 
this should be several weeks of data.


qdisc cake 8031: dev eth1.2 root refcnt 2 bandwidth 100Mbit besteffort 
triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw 
overhead 0
 Sent 70822286277 bytes 202513660 pkt (dropped 13984, overlimits 25350421 
requeues 0)

 backlog 0b 0p requeues 0
memory used: 5156288b of 5000000b
 capacity estimate: 100Mbit
 min/max network layer size:   42 /1514
 min/max overhead-adjusted size:   42 /1514
 average network hdr offset:   14

  Tin 0
  thresh100Mbit
  target  5.0ms
  interval  100.0ms
  pk_delay  4us
  av_delay  1us
  sp_delay  1us
  backlog0b
  pkts202527644
  bytes 70842325936
  way_inds  4939006
  way_miss 11834545
  way_cols0
  drops   13984
  marks 512
  ack_drop0
  sp_flows2
  bk_flows1
  un_flows0
  max_len 28766
  quantum  1514

qdisc ingress ffff: dev eth1.2 parent ffff:fff1 
 Sent 807912654344 bytes 631652827 pkt (dropped 0, overlimits 0 requeues 
0)

 backlog 0b 0p requeues 0
qdisc cake 8032: dev ifb4eth1.2 root refcnt 2 bandwidth 250Mbit besteffort 
triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw 
overhead 0
 Sent 829571211610 bytes 631641016 pkt (dropped 11811, overlimits 
79004 requeues 0)

 backlog 0b 0p requeues 0
memory used: 4540528b of 12500000b
 capacity estimate: 250Mbit
 min/max network layer size:   60 /1514
 min/max overhead-adjusted size:   60 /1514
 average network hdr offset:   14

  Tin 0
  thresh250Mbit
  target  5.0ms
  interval  100.0ms
  pk_delay1.2ms
  av_delay559us
  sp_delay  1us
  backlog0b
  pkts631652827
  bytes829588333230
  way_inds 12061686
  way_miss 12913211
  way_cols1
  drops   11811
  marks3589
  ack_drop0
  sp_flows1
  bk_flows1
  un_flows0
  max_len 38444
  quantum  1514




--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] excellent result with OpenWrt 18.06.1 + CAKE on FTTH

2018-11-12 Thread Mikael Abrahamsson

On Mon, 12 Nov 2018, Dave Taht wrote:


I guess my biggest question is how bloated is the "Before cake"
version of the link?


Not very.

http://www.dslreports.com/speedtest/41693199

I then did another test while at the same time doing a different vendor 
speedtest:


http://www.dslreports.com/speedtest/41693256

Ping just increased 5-10 ms when doing this.

If I then re-enable cake with 250/100 I get:

http://www.dslreports.com/speedtest/41693346

qdisc after this last test:

qdisc cake 8034: dev eth1.2 root refcnt 2 bandwidth 100Mbit besteffort 
triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw 
overhead 0
 Sent 391610860 bytes 650447 pkt (dropped 1430, overlimits 645558 requeues 
0)

 backlog 0b 0p requeues 0
memory used: 2425408b of 5000000b
 capacity estimate: 100Mbit
 min/max network layer size:   46 /1514
 min/max overhead-adjusted size:   46 /1514
 average network hdr offset:   14

  Tin 0
  thresh100Mbit
  target  5.0ms
  interval  100.0ms
  pk_delay 82us
  av_delay  6us
  sp_delay  1us
  backlog0b
  pkts   651877
  bytes   393761357
  way_inds11602
  way_miss 3103
  way_cols0
  drops1430
  marks   0
  ack_drop0
  sp_flows   16
  bk_flows1
  un_flows0
  max_len 18168
  quantum  1514

qdisc ingress ffff: dev eth1.2 parent ffff:fff1 
 Sent 896042971 bytes 760157 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc fq_codel 0: dev tun2025 root refcnt 2 limit 10240p flows 1024 
quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn

 Sent 21580 bytes 166 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc cake 8035: dev ifb4eth1.2 root refcnt 2 bandwidth 250Mbit besteffort 
triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw 
overhead 0
 Sent 912501460 bytes 754253 pkt (dropped 5904, overlimits 926439 requeues 
0)

 backlog 0b 0p requeues 0
memory used: 805712b of 12500000b
 capacity estimate: 250Mbit
 min/max network layer size:   60 /1514
 min/max overhead-adjusted size:   60 /1514
 average network hdr offset:   14

  Tin 0
  thresh250Mbit
  target  5.0ms
  interval  100.0ms
  pk_delay650us
  av_delay429us
  sp_delay  1us
  backlog0b
  pkts   760157
  bytes   921432581
  way_inds17426
  way_miss 3168
  way_cols0
  drops5904
  marks   0
  ack_drop0
  sp_flows7
  bk_flows1
  un_flows0
  max_len 15104
  quantum  1514


It seems to smooth out the flows better than my ISP's shaper.

These tests were done while the rest of the household was also using the 
Internet for other things, so this is not "clean room".


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] excellent result with OpenWrt 18.06.1 + CAKE on FTTH

2018-11-12 Thread Mikael Abrahamsson

On Mon, 12 Nov 2018, Dave Taht wrote:


Don't use this connection much, do you? :)


Last 4 week average is 300 kilobit/s up and 3000 kilobit/s down. So no. 
Mostly streaming Netflix and similar things.



   marks 512


and that you have at least one device with ecn enabled. Would this be
OSX or IOS perhaps?


I typically turn it on on all devices I remember to. There are plenty of 
iOS devices in the household, but also ECN-enabled OSX machines.
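
For reference, the knobs involved (the Linux sysctl is standard; the OSX 
names are given to the best of my knowledge and may vary by release):

  # Linux: negotiate ECN on both incoming and outgoing connections
  sysctl -w net.ipv4.tcp_ecn=1
  # OSX (assumed names):
  sudo sysctl -w net.inet.tcp.ecn_initiate_out=1
  sudo sysctl -w net.inet.tcp.ecn_negotiate_in=1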



I don't suppose you have someone else "across town" you could run some
benchmarks against?


Surely. I can run anything you need: I have a 1GE ubuntu machine ~3ms 
away. What tests do you want me to run? I have an ubuntu laptop here I 
can run wired tests with. It already has flent installed, so just tell 
me what you want me to do and test. If you want me to change qdisc 
settings I'm going to need good instructions; I am not proficient in 
changing those settings.



Similarly, a cpu number under load. I note here, that splitting GSO
has a big cost, (primarily in routing table lookup) and you can at
these speeds, probably disable it.


sirq% peaks out around 35-40% when doing download at 250 megabit/s. Around 
10% when doing upload at 100 megabit/s. Armada 385 is nice.


I was also expecting 64k here. I imagine you are using modern linuxes 
that don't overuse TSO anymore, and osx and windows never got into it to 
the extreme that linux did.


root@wrt1200-hemma:~# uname -a
Linux wrt1200-hemma 4.14.63 #0 SMP Wed Aug 15 20:42:39 2018 armv7l GNU/Linux



--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] excellent result with OpenWrt 18.06.1 + CAKE on FTTH

2018-11-12 Thread Mikael Abrahamsson

On Mon, 12 Nov 2018, Dave Taht wrote:


If I then re-enable cake with 250/100 I get:

http://www.dslreports.com/speedtest/41693346


I don't "get" the knee in the download curve here and the prior test.


That's when I start a competing speedtest to the local Swedish speedtest 
site, using an OSX app they ship.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] one benefit of turning off shaping + fq_codel

2018-11-14 Thread Mikael Abrahamsson

On Tue, 13 Nov 2018, Dave Taht wrote:


It turns out we are contributing to global warming.

https://community.ubnt.com/t5/UniFi-Routing-Switching/USG-temperature/m-p/2547046/highlight/true#M115060


There is a reason vendors have packet accelerators. It's more efficient 
compared to doing everything in CPU.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] known buffer sizes on switches

2018-11-24 Thread Mikael Abrahamsson

On Sat, 24 Nov 2018, Dave Taht wrote:


https://people.ucsc.edu/~warner/buffer.html


Nice resource, thanks.

If someone wonders why things look the way they do, it's all about 
on-die versus off-die memory. On-die buffer is often SRAM, which 
requires 6 transistors per bit, so spending half a billion transistors 
gives you ~10MB of buffer on-die (10 Mbyte x 8 bits x 6 transistors is 
roughly 480 million transistors). If you're doing off-die memory (DRAM 
or similar) then you'll get the gigabytes of memory seen in some 
equipment. There basically is nothing in between. As soon as you go 
off-die you might as well put at least 2-6 GB in there.


Also, off-die memory takes I/O capacity. A forwarding chip might have 4 
"sides" with sets of I/O lanes. If you put it in a 1RU device with no 
buffer, you can connect ports to all of the lanes. This gives you a very 
high port density, low-buffer device at a very good price point.


Now, if you want more buffer and more route memory (taking one "side" 
each) plus a connection to a backplane (another side), you only have a 
single "side" left for ports. This is why high-route-count, high-buffer, 
modular switches are so much more expensive compared to low-route, 
low-buffer, fixed-configuration ones.


Above is principle, there are of course combinations and optimizations to 
be made so not all devices adhere exactly to the above.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] when does the CoDel part of fq_codel help in the real world?

2018-11-27 Thread Mikael Abrahamsson

On Tue, 27 Nov 2018, Luca Muscariello wrote:

"Link fully utilized" is defined as Q>0, unless you don't include the 
packet currently being transmitted. I do, so the transmitter is never 
idle. But that's a detail.


As someone who works with moving packets, it's perplexing to me to 
interact with transport peeps who seem enormously focused on "goodput". 
My personal opinion is that most people would be better off with 80% of 
their available bandwidth being in use without any noticeable 
buffer-induced delay, as opposed to the transport protocol doing its 
damndest to fill up the link to 100% and sometimes failing and inducing 
delay instead.


Could someone perhaps comment on the thinking in the transport protocol 
design "crowd" when it comes to this?


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] when does the CoDel part of fq_codel help in the real world?

2018-11-27 Thread Mikael Abrahamsson

On Tue, 27 Nov 2018, Luca Muscariello wrote:


A BDP is not a large buffer. I'm not unveiling a secret.


It's complicated. I've had people throw in my face that I need 2xBDP in 
buffer size to smooth things out. Personally I don't want more than 10 ms 
of buffer (max), and I don't see why I should need more than that even if 
transfers are running over hundreds of ms of light-speed-in-medium induced 
delay between the communicating systems.
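
To put rough numbers on that (my own illustration, assuming a 100 Mbit/s 
link and a 150 ms RTT path):

$ echo $((100000000 / 8 / 100)) bytes    # 10 ms of buffer at 100 Mbit/s
125000 bytes
$ echo $((2 * 100000000 / 8 * 150 / 1000)) bytes    # 2xBDP at 150 ms RTT
3750000 bytes

The "2xBDP" advice asks for 30x the buffer I'm willing to tolerate.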


I have routers that are perfectly capable of buffering packets for 
hundreds of ms even at hundreds of megabits/s of access speed. I choose 
not to use that capability though, and configure them to drop packets much 
earlier.


My point was that FQ_codel helps to get very close to the optimum w/o 
adding useless queueing and latency. With a single queue that's almost 
impossible. No, sorry. Just impossible.


Right, I realise I wasn't clear I wasn't actually commenting on your 
specific text directly, my question was more generic.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] when does the CoDel part of fq_codel help in the real world?

2018-11-27 Thread Mikael Abrahamsson

On Tue, 27 Nov 2018, Luca Muscariello wrote:


If you, Mikael don't want more than 10ms buffer, how do you achieve that?


class class-default
  random-detect 10 ms 2000 ms
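  ! annotation (mine, not part of the config): WRED here starts random
  ! early drops at ~10 ms of queue depth, ramping up to the 2000 ms
  ! maximum threshold, beyond which everything is tail-dropped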

That's the only thing available to me on the platforms I have. If you 
would like this improved, please reach out to the Cisco ASR9k BU and tell 
them to implement ECN and PIE (or something even better). They won't do it 
just because I say so, it seems. WRED is all they give me.



You change the behaviour of the source and hope flow isolation is available.


Sorry, I only transport the packets, I don't create them.


If you just cut the buffer down to 10ms and do nothing else, the only thing
you get is a short queue and may throw away half of your link capacity.


If I have lots of queue, I might instead get customer complaints about 
high latency for their interactive applications.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] when does the CoDel part of fq_codel help in the real world?

2018-11-27 Thread Mikael Abrahamsson

On Tue, 27 Nov 2018, Luca Muscariello wrote:

This is a whole different discussion but if you want to have a per-user 
context at the BNG level + TM + FQ I'm not sure that kind of beast will 
ever exist. Unless you have a very small user fan-out the hardware 
clocks could loop over several thousands of contexts. You should expect 
those kind of features to be in the CMTS or OLT.


This is per-customer queues on the access port (250 customers per 10GE 
port, so 250 queues). It's on a "service edge" linecard that I imagine 
people use for BNG purposes. I tend to not use words like that, because to 
me a router is a router.


I do not do coax. I do not do PON. I do point-to-point Ethernet using 
routers and switches, like god^WIEEE intended.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


[Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)

2018-11-28 Thread Mikael Abrahamsson

On Wed, 28 Nov 2018, Dave Taht wrote:


see ecn-sane. Please try to write a position paper as to where and why
ecn is good and bad.

if one day we could merely establish a talmud of commentary
around this religion it would help.


From my viewpoint it seems to be all about incremental deployment. We have 
30 years of "crud" that things need to work with, and the worst case must 
not be a disaster for anything that wants to deploy.


This is the thing about L4S: ECT(1) is the last unused "codepoint" in the 
header that can statelessly identify something. If anyone sees a better 
way to use it than "let's put it in a separate queue, CE-mark it 
aggressively at very low queue depths, and also not care about re-ordering 
so an ARQ L2 can re-order all it wants", then they need to speak up, soon.
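
For reference, the four values of the two-bit ECN field defined in RFC 
3168:

  00  Not-ECT  transport does not support ECN
  01  ECT(1)   ECN-capable; RFC 3168 treats it as equivalent to ECT(0),
               which is what leaves it free for L4S to repurpose
  10  ECT(0)   ECN-capable
  11  CE       congestion experienced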


I actually think "let's not care about re-ordering" would be a brilliant 
thing: it'd help quite a lot of packet network types become less costly 
and more efficient, while at the same time not blocking subsequent packets 
just because some earlier packet needed to be retransmitted. Brilliant for 
QUIC, for instance, which already handles this (at least per-stream).


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)

2018-11-28 Thread Mikael Abrahamsson

On Thu, 29 Nov 2018, Jonathan Morton wrote:


You are essentially proposing using ECT(1) to take over an intended function of 
Diffserv.


Well, I am not proposing anything. I am giving people a heads-up that the 
L4S authors are proposing this.


But yes, you're right. Diffserv has shown itself to be really hard to 
incrementally deploy across the Internet, so it's generally bleached 
mid-path.


In my view, that is the wrong approach.  Better to improve Diffserv to 
the point where it becomes useful in practice.


I agree, but unfortunately nobody has made me king of the Internet yet, so 
I can't just decree it into existence.


 Cake has taken steps in that direction, by implementing some reasonable 
interpretation of some Diffserv codepoints.


Great. I don't know if I've asked this before, but is CAKE easily 
implementable in hardware? From what I can tell, it's still only Marvell 
that is trying to put powerful enough CPUs into HGWs to do forwarding in 
the CPU (which can then run CAKE); all others still rely on packet 
accelerators to achieve the desired speeds.


My alternative use of ECT(1) is more in keeping with the other 
codepoints represented by those two bits, to allow ECN to provide more 
fine-grained information about congestion than it presently does.  The 
main challenge is communicating the relevant information back to the 
sender upon receipt, ideally without increasing overhead in the TCP/IP 
headers.


You need to go into the IETF process and voice this opinion then, because 
if nobody opposes in the near term then ECT(1) might go to the L4S 
interpretation. They do have ECN feedback mechanisms in their proposal; 
have you read it? It's a whole suite of documents: architecture, AQM 
proposal, transport proposal, the entire thing.


On the other hand, what you want to do and what L4S tries to do might be 
closely related. It doesn't sound too far off.


Also, Bob Briscoe works for CableLabs now, so he will have silicon behind 
him. This silicon might go into other things, not just DOCSIS equipment, 
so if you have use-cases that L4S doesn't cover but might with minor 
modification, it might be better to join him than to fight him.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)

2018-11-29 Thread Mikael Abrahamsson

On Thu, 29 Nov 2018, Jonathan Morton wrote:


I'd say the important bits are only slightly harder than doing the same with 
fq_codel.


OK, so FQ_CODEL is a long way off from being implemented in HW. I haven't 
heard anyone even discussing it. Have you (or anyone else) heard 
differently?


I believe much of Cake's perceived CPU overhead is actually down to 
inefficiencies in the Linux network stack.  Using a CPU and some modest 
auxiliary hardware dedicated to moving packets, not tied up in handling 
general-purpose duties, then achieving greater efficiency with 
reasonable hardware costs could be quite easy, without losing the 
flexibility to change algorithms later.


I need to watch the MT7621 packet accelerator talk from the most recent 
OpenWrt summit. I installed OpenWrt 18.06.1 on a MikroTik RB750Gr3, just 
clicked my way around in LuCI and enabled flow offload, and b00m, it now 
did full gig NAT44 forwarding. It's implemented as a -j FLOWOFFLOAD 
iptables rule. The good thing here might be that we could throw 
unimportant high-speed flows off to the accelerator and handle only the 
time-sensitive flows in the CPU, making sure the CPU has preferential 
access to the media for its time-sensitive flows. That kind of approach 
might make FQ_CODEL deployable even on slow-CPU platforms with 
accelerators, because you would only run some flows through FQ_CODEL, 
while the bulk high-speed flows would be handed off to acceleration (and 
we guess they don't care about PDV and bufferbloat).
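
For the curious, the rule looks roughly like this (a sketch; the exact 
chain placement and matches are whatever OpenWrt's firewall generates):

iptables -I FORWARD -m conntrack --ctstate ESTABLISHED,RELATED -j FLOWOFFLOAD

Adding --hw to the target requests hardware offload, where the driver 
supports it.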


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)

2018-11-29 Thread Mikael Abrahamsson

On Thu, 29 Nov 2018, Sebastian Moeller wrote:

	As far as I can tell intel is pushing atom/x86 cores into its 
docsis SoCs (puma5/6/7) as well as into the high-end dsl SoCs (formerly 
lantiq, 
https://www.intel.com/content/www/us/en/smart-home/anywan-grx750-home-gateway-brief.html?wapkw=grx750), 
I am quite confident that those also pack enough punch for CPU based 
routing at Gbps-rates. In docsis modems these are already rolled-out, I 
do not know of any DSL modem/router that uses the GRX750


"10 Gbit/s packet processor".

Game over, again.

Call me naive, but the solution to the impasse at getting a common 
definition of diffserv agreed upon is replacing all TCP CC algorithms? 
This is replacing changing all endpoints (and network nodes) to honor 
diffserve with changing all endpoints to use a different TCP CC. At 
least I would call that ambitious (unless L4S offers noticeable 
advantages for all participating without being terribly unfair to the 
non-participating legacy TCP users*).


L4S proposes a separate queue for the L4S-compatible traffic, and some 
kind of fair split between L4S and non-L4S traffic. I guess it's kind of 
along the lines of my earlier proposals about having some kind of fair 
split with 3 queues for the LE PHB, BE and the rest. That makes it 
deployable in current HW without the worst kind of DDoS downsides 
imaginable.


The Internet is all about making things incrementally deployable. It's 
very frustrating, but that's the way it is. Whatever we want to propose 
needs to work so-so with what's already out there and it's ok if it takes 
a while before it makes everything better.


I'd like diffserv to work better, but it would take a lot of work in the 
operator community to bring it out to where it needs to be. It's not 
hopeless though, and I think 
https://tools.ietf.org/html/draft-ietf-tsvwg-le-phb-06 is one step in the 
right direction. Just the fact that we might have two queues instead of 
one in the simplest implementations might help. The first step is to get 
ISPs to not bleach diffserv but at least allow 000xxx.
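
As a purely hypothetical example of what an endpoint could then do, here 
is bulk traffic being marked with the LE codepoint (DSCP 000001, per that 
draft) using iptables' DSCP target; rsync on port 873 is just my 
illustrative pick for "bulk":

iptables -t mangle -A POSTROUTING -p tcp --dport 873 -j DSCP --set-dscp 1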


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)

2018-11-29 Thread Mikael Abrahamsson

On Thu, 29 Nov 2018, Jonathan Morton wrote:

I have to ask, why would the network care?  What optimisations can be 
obtained by reordering packets *within* a flow, when it's usually just 
as easy to deliver them in order?


Because most implementations aren't flow-aware at all and might have 4 
queues, saying "oh, this single queue is for transports that don't care 
about ordering" means everything in that queue can just be sent as soon as 
it can, ignoring HOL blocking caused by ARQ.


Of course, we already have FQ which reorders packets in *different* 
flows.  The benefits are obvious in that case.


FQ is fringe in real life (speaking as a packet-moving monkey). It's just 
on this mailing list that it's the norm.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] when does the CoDel part of fq_codel help in the real world?

2018-11-29 Thread Mikael Abrahamsson

On Thu, 29 Nov 2018, Stephen Hemminger wrote:

The problem is that any protocol is mostly blind to the underlying 
network (and that can change).  To use dave's analogy it is like being 
put in the driver seat of a vehicle blind folded.  When you step on the 
gas you don't know if it is a dragster, jet fighter, or a soviet 
tractor. The only way a protocol can tell is based on the perceived 
inertia and when it runs into things...


Actually, I've made the argument to IETF TCPM that this is not entirely 
true. You can retain data from previous flows to the same destination so 
that new flows can re-use what was learned.


If no flow in the past hour has been able to run faster than 1 megabit/s, 
and PMTUD always arrives at a 1460-byte outbound MTU, then there is a good 
chance that the next flow will encounter the same thing. Why not use this 
information when guessing how things will behave going forward?
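
Linux already keeps a small per-destination cache along these lines (RTT, 
ssthresh and cwnd as observed by earlier flows to the same address), which 
you can inspect with:

ip tcp_metrics show

How much of it is re-used for new flows is governed by sysctls such as 
net.ipv4.tcp_no_metrics_save.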


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)

2018-11-29 Thread Mikael Abrahamsson

On Fri, 30 Nov 2018, Jonathan Morton wrote:

Ah, so you're thinking in terms of link-layers which perform local 
retransmission, like wifi.  So the optimisation is to not delay packets 
"behind" a corrupted packet while the latter is retransmitted.


Yes.

It's possible for a TCP to interpret a reordered packet as missing, 
triggering an end-to-end retransmission which is then discovered to be 
unnecessary.  At the application level, TCP also performs the same HoL 
blocking in response to missing data.  So it's easy to see why links try 
to preserve ordering, even to this extent, but I suspect they typically 
do so on a per-station basis rather than per-flow.


It's a "truth-everybody-knows" in networking that "NEVER RE-ORDER PACKETS 
WITHIN 5-TUPLE FLOW! THERE BE DRAGONS THERE!". I'd also say I see 
enough transport people who says that this should be true generally, if 
nothing else because of legacy.


Personally I think the problem of reordering packets is overblown, and 
that TCPs can cope with occasional missing or reordered packets without 
serious consequences to performance.  So if you add "reordering 
tolerant" to the list of stuff that Diffserv can indicate, you might 
just end up with all traffic being marked that way.  Is that really 
worthwhile?


The question isn't so much about TCP, it's the other things I am worried 
about. TCP handles re-ordering fairly gracefully; other protocols might 
not.


Oddly enough, wifi is now one of the places where FQ is potentially 
easiest to find, with Toke's work reaching the Linux kernel and so many 
wifi routers being Linux based.


Again, even if they're using Linux they will/might have packet 
accelerators that just grab the flow and the kernel never sees it again. 
No FQ_CODEL for that.


An acknowledged problem is overly persistent retries by the ARQ 
mechanism, such that the time horizon for the link-layer retransmission 
often exceeds that of the end-to-end RTO, both for TCP and 
request-response protocols like DNS. I say, retransmit at the link layer 
once or twice, then give up and let the end-hosts sort it out.


I agree, but I also think that it would help some link-layers if the 
re-ordering requirement could be relaxed. However, before that can be 
communicated, a lot of study needs to be done to check whether it is 
actually safe. I've had incidents in my 20-year networking career where it 
was not, and applications misbehaved when packets were re-ordered.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] Does VDSL interleaving+FEC help bufferbloat?

2019-01-04 Thread Mikael Abrahamsson

On Fri, 4 Jan 2019, Dave Taht wrote:

dsl interleave was added primarily to make multicast udp tv streams work 
better (as they are very intolerant of packet loss). Often (as in free's 
implementation) these streams are "invisible" to the overlying IP 
applications. It typically adds at least 6ms of delay to an already slow 
technology.


ADSL2+ is very prone to short bursts of interference, so setting no 
interleaving means quite high packet loss. Setting interleaving to 16 ms 
gives FEC a much better chance of correcting errors and thus reduces 
packet loss.


Several jobs ago we actually had several different profiles for 
customers: they could choose 1, 4 or 16 ms interleaving depending on their 
needs for gaming etc. The 1 and 4 ms interleaving profiles had different 
SNR margin targets, so those customers were sacrificing speed for lower 
latency, because that's the tradeoff you basically have to make with the 
normal L4 protocols that end customers typically use.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] question about ack-filter

2019-02-03 Thread Mikael Abrahamsson

On Sun, 3 Feb 2019, Evuraan wrote:


Greetings!

Since my search-fu has failed:

What is ack-filtering? How is it important?  What's the difference
between ack-filter-aggressive and ack-filter?


In short (my summary): on asymmetric links, cake's ACK filter drops TCP 
ACKs that are already superseded by newer cumulative ACKs sitting in the 
same queue, freeing up scarce uplink capacity; the "aggressive" variant 
just filters more liberally. We discussed it here if you want more 
background information:

https://bloat.bufferbloat.narkive.com/PCQIrEs7/benefits-of-ack-filtering

--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] The "Some Congestion Experienced" ECN codepoint - a new internet draft -

2019-03-11 Thread Mikael Abrahamsson

On Sun, 10 Mar 2019, Jonathan Morton wrote:

An interesting idea, but SCE marks will appear even when there's a lot 
of congestion (at high rates, ie. probably every packet that doesn't 
carry CE), as well as showing up at low frequency when the level of 
congestion only warrants reducing the growth rate.  I think the word 
"Some" is sufficiently descriptive, while "Slight" might cause people to 
ignore it completely.


One way to handle this would be "buffering experienced" or something like 
that, i.e. if this packet is being enqueued into a buffer with a 
non-trivial number of packets in it, mark it.


The L4S proposal also has the property that their use of this last 
codepoint combination in the packet header (and this is a big thing; this 
is the last unicorn) also means the packet is allowed to be re-ordered. I 
thought this was a big and nice property for other areas. This new 
proposal removes it.


From what I can see, L4S actually is quite novel and has the chance to 
seriously change the way queueing is done. This proposal seems more like 
"a little more of what we had before" which I do not think warrants 
claiming this last unicorn codepoint. I'd like its use to be truly novel 
and be more than a tweak.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] [Cake] The "Some Congestion Experienced" ECN codepoint - a new internet draft -

2019-03-11 Thread Mikael Abrahamsson

On Mon, 11 Mar 2019, Sebastian Moeller wrote:

	How is packet reordering for anybody but the folks responsible for 
operating the "conduits" in any way attractive?


For instance, QUIC muxes streams within the same 5-tuple, so it benefits 
from the transport not holding up data just to preserve ordering within 
the 5-tuple flow.


--
Mikael Abrahamsson    email: swm...@swm.pp.se

