[Bloat] Apple ECN, Bufferbloat, CoDel

2015-06-13 Thread Mikael Abrahamsson


I highly encourage people to take a look at:

https://developer.apple.com/videos/wwdc/2015/?id=719 (you might have to 
register as an Apple developer to watch it, I don't know)


"Your App and Next Generation Networks
IPv6 is growing exponentially and carriers worldwide are moving to pure 
IPv6 APNs. Learn about new tools to test your apps for compatibility and 
get expert advice on making sure your apps work in all network 
environments. iOS 9 and OS X 10.11 now support the latest TCP standards. 
Hear from the experts on TCP Fast Open and Explicit Congestion 
Notification, and find out how it benefits your apps."


Being on this list you might not learn much from the talk, but I really 
appreciate a talk aimed at a wider (developer) audience that so clearly 
outlines the benefits of ECN, CoDel and TCP host optimization in reducing 
end-to-end application communication latency. One of the major takeaways 
is that Apple is planning to enable ECN by default in iOS 9 and OS X 
10.11. This would mean hundreds of millions of devices will be using ECN 
in a few months.


You can skip to 16 minutes into the talk if you're not interested in the 
new requirement for applications to support an environment where their 
Internet access is IPv6-only behind NAT64+DNS64 (I myself am super excited 
about this).


Let's hope this brings a lot of buzz and requests towards device 
manufacturers to start supporting ECN marking and AQM. Apple is usually a 
good megaphone to bring attention to these kinds of issues...


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] Apple ECN, Bufferbloat, CoDel

2015-06-13 Thread Mikael Abrahamsson

On Sat, 13 Jun 2015, Dave Taht wrote:

I don't understand how badly this is going to break dnssec. dnsmasq in 
particular has been dealing with edge case after edge case on dnssec for 
the last few months, and it was my hope we'd finally got them all.


DNS64 breaks DNSSEC because it creates an AAAA response where none is 
present in the zone being queried. It's basically doing MITM for DNS, 
which is exactly what DNSSEC was supposed to fix.
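
One way to see this in practice (for illustration: ipv4only.arpa is the 
RFC 7050 NAT64 discovery name, and the resolver address is a placeholder):

dig +dnssec AAAA ipv4only.arpa @<dns64 resolver>

The synthesized AAAA (normally inside the well-known 64:ff9b::/96 prefix) 
comes back without a matching RRSIG, so a validating client has to reject it.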


DNSSEC would work if Apple decided to just do NAT64 discovery and then do 
their own DNS64 in the host, but I have no information as to what is being 
done here.


At least DNSSEC still works between the Internet and the ISP DNS64 
resolver, but the end host won't be able to verify the response using 
DNSSEC.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] Fwd: performance testing on the WRT1200AC

2015-06-14 Thread Mikael Abrahamsson


Hi,

Some background:

The WRT1900ACv1 (which has been shipping for 6 months or so) is based on 
the Marvell Armada XP, which uses a packet processor. There is no support 
for this in the generic Linux kernel, which means performance is a lot 
lower with the generic kernel compared to the "special" kernel, which has 
patches and which you compile with the Marvell SDK to support the packet 
processor. With the generic kernel, you get CPU-only forwarding, which is 
around 300-500 megabit/s of TCP.


Now, with the WRT1200AC and WRT1900ACv2, which were released in the last 
few weeks and are just now becoming more widely available, they've changed 
to the Marvell Armada 385, the beefiest generic packet-forwarding CPU 
I have ever heard of or encountered in a "home gateway" kind of package. I 
have a WRT1200AC for testing that I received this week, and so far I have 
been able to verify that it does 940 megabit/s of TCP (iperf) with the 
generic kernel shipped with OpenWRT CC with the below default qdisc. It 
seems to do this using approximately 25% CPU.


So what I would like to do now is try to push it a little bit harder, so 
if someone could give me an example of a more punishing qdisc setup and 
test to run through it, that would be very interesting.
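
One possible example (a sketch: the rates, interface and server names are 
placeholders; flent's rrul test pushes multiple flows each way plus 
latency probes):

tc qdisc replace dev eth0 root handle 1: htb default 10
tc class add dev eth0 parent 1: classid 1:10 htb rate 900mbit
tc qdisc add dev eth0 parent 1:10 fq_codel

flent rrul -p all_scaled -l 60 -H <netperf server> -t wrt1200ac-900mbit -o rrul.png

tc -s qdisc show dev eth0   # watch backlog, drops and fq_codel's maxpacket counters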


But so far, the Armada 385 chipset (and I hope we'll see more devices 
based on it) seems to be a perfect platform for bufferbloat testing and 
development. Yes, it's a lot pricier than the WNDR3800 that for instance 
CeroWRT uses, but on the other hand it seems to have 10x the performance 
of that box, and everything seems to work right out of the box without any 
special patches.


On Sun, 14 Jun 2015, Dave Taht wrote:


a wider audience for the issues in new consumer hardware seems desirable.

forwarding with permission.


-- Forwarded message --
From: Dave Taht 
Date: Sun, Jun 14, 2015 at 8:41 AM
Subject: Re: performance testing on the WRT1200AC
To: Mikael Abrahamsson , Aaron Wood 


Dear Mikael:

netperf-wrapper has been renamed to flent. :) Quite a bit of new stuff
is dropping into it, one of my favorite tests is the new qdisc_stats
test (which I run at the same time as another test). It hasn't been
tested on a multi-queue interface (and doesn't work with openwrt's sh
implementation dang it). But do a pull anyway. :)

On Sun, Jun 14, 2015 at 8:18 AM, Mikael Abrahamsson  wrote:


Hi,

I want to do some more demanding testing of the WRT1200AC. Currently it's
running a few-days-old openwrt CC. It comes with the below qdisc setting. I
will be testing it using the following setup:

linux-switch-wrt1200ac-linux

All links above are gigabit ethernet links.

My plan is to for instance run netperf-wrapper with a few different tests.

Would it strain the WRT1200AC if I configured it to shape to 900 megabit/s
bidirectionally? I guess in order to actually achieve a little bit of

My original tests with the 1900AC showed htb peaking out with sqm +
offloads at about 550/650mbit on the rrul test. (I can't remember if
nat was on or off, but I think off)

but that was months ago. I have a huge hope that cake will do better
on this platform, and recently (yesterday) I think I got it to the
point where we could push it to openwrt to be built regularly.

Aaron, cc'd, has done quite a bit of work with the 1900, and I think
he started running into trouble at 200mbit.


buffering, I'm going to have to run below wirespeed? Because I can't get
more than 1 gigabit/s of traffic to the wrt1200ac because of the above layout,
so doing bidirectional shaping to 900 on eth0 (WAN PORT) would at least give
it a bit more to do and also give a chance to induce some buffering?


Ain't it a bitch? A thought would be to also exercise the wifi a bit
to drive it past gigE overall. So have two clients running flent tests
simultaneously, one on wifi, one on ethernet, and there you go,
driving it into overload.


Do you have some other ideas for testing? I am mostly interested in making
sure the CPU is fast enough to do AQM at gig speeds...


Well, there are other issues.

A) The mvneta ethernet driver in the 1900 did not support BQL when
last I looked, supplying insufficient backpressure to the upper
layers.

B) The multiqueued hardware applies a bit of fq for you automagically,
BUT, even if BQL was in place, BQL's buffering is additive per
hardware queue, so it tends to

What I saw was nearly no drops in the qdisc. I don't think I even saw
maxpacket grow (a sure sign you are backlogging in the qdisc). I ended
up disabling the hardware mq multiqueue[1] stuff entirely with "tc qdisc
add dev eth0 root fq_codel", and even then, see A) - but I did finally
see maxpacket grow...

C) to realize to my horror that they had very aggressively implemented
GRO for everything, giving us 64k "packets" to deal with coming in
from the gigE ethernet... which interacted rather badly with t

[Bloat] [Cerowrt-devel] performance numbers from WRT1200AC (Re: Latest build test - new sqm-scripts seem to work; "cake overhead 40" didn't) (fwd)

2015-06-23 Thread Mikael Abrahamsson


FYI in case some aren't on the cerowrt-devel ml.

-- Forwarded message --
Date: Tue, 23 Jun 2015 14:55:30 +0200 (CEST)
From: Mikael Abrahamsson 
To: cerowrt-devel 
Subject: [Cerowrt-devel] performance numbers from WRT1200AC (Re: Latest build
test - new sqm-scripts seem to work; "cake overhead 40" didn't)

On Tue, 23 Jun 2015, Sebastian Moeller wrote:

	Most likely not. Check http://wiki.openwrt.org/doc/howto/sqm . Rich 
published a great set of instructions for setting up sqm-scripts under 
openwrt proper.


I tried it on a Linksys WRT1200AC with OpenWrt CC RC2. I configured sqm to 
800 megabit/s in each direction, and ran iperf3 over IPv4 with NAT44 from a 
Linux box behind the WRT1200AC to an OS X MacBook connected to a switch on 
the same L2 subnet as the WAN port.


Linux <->WRT1200AC<->switch<->OSX

I get 765 megabit/s of throughput using a single session, at a sirq load of 
around 25%. If I lower the MSS to 300 (to generate higher pps) I get around 
560 megabit/s of throughput at 50% sirq. With 10 parallel TCP sessions, I get 
about the same. At an MSS of 200 bytes, I get 400 megabit/s at 70% sirq.


If I turn off SQM completely, I get 600 megabit/s at 200-byte MSS in a single 
session at 80% sirq, and 930 megabit/s at 26% sirq with the default MSS.
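
The invocations were along these lines (a sketch; the server address is a 
placeholder, -M sets the MSS and -P the number of parallel streams):

iperf3 -c <server> -t 30                # default MSS, single stream
iperf3 -c <server> -t 30 -M 300         # smaller MSS for higher pps
iperf3 -c <server> -t 30 -M 300 -P 10   # ten parallel streams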


So if you want a high-performing device that is OpenWRT compatible and still 
does forwarding using the CPU, so you can test queuing algorithms, the 
WRT1200AC and WRT1900ACv2 are the best I have been able to find currently 
(unless you go for an x86 platform).


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] [Cerowrt-devel] failing to find the "declared victory" in a current wifi router

2015-07-06 Thread Mikael Abrahamsson

On Mon, 6 Jul 2015, Joe Touch wrote:


- CC-rc2 doesn't have a WRT1200AC build
presumably I should have used mvebu-armada-385-linksys-caiman,
but it's not at all clear


Yes, that's the one for the WRT1200AC. It's called "caiman" internally at 
Linksys it seems.



- and I'd have to install LUCI and/or reinstall
factory firmware from the command line, and none
of that is all that clear, esp. a recovery route
that doesn't involve voiding warranty to wire in
a serial port


You can flash back the factory firmware without serial; you just use 
sysupgrade with the Linksys factory image. I've done this. It's not easy 
to get into the box, and I have plastic dents on my unit now because I 
failed to understand how it fits together. I also ended up buying pin 
headers and tweezers to connect the TTL-USB serial device to the 
connector on the PCB. I have since received proper cables, so now I 
have wires sticking out, and I'm waiting for connectors so I can make a 
more permanent solution.


I have also had to use the serial console on mine, because something broke 
in the upgrade process during one of the 30-40 times I did sysupgrade.


I won't speak of the "declared victory". In my opinion the victory might 
be "there is now knowledge of how to do this, and there is substantial 
awareness in the rest of the industry", but it's definitely not executed 
yet.


And yes, you're right, there is very little "mainstream" about OpenWrt. 
It's reasonably easy with a lot of devices (and there are guides to read), 
but it's not like anyone can do it. It's like changing the oil in a car: 
it's not that hard, but if you don't know how to do it, you need to study 
first and find the correct tools. Also, if you get it wrong you might 
damage things.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] [Cerowrt-devel] failing to find the "declared victory" in a current wifi router

2015-07-10 Thread Mikael Abrahamsson

On Tue, 7 Jul 2015, Joe Touch wrote:


Some questions:

On 7/6/2015 11:16 PM, Mikael Abrahamsson wrote:
...

You can flash back the factory firmware without serial, you just use
sysupgrade with the Linksys factory image.


How does that differ from mtd, e.g., as indicated here (which doesn't
mention sysinstall)?:
http://wiki.openwrt.org/doc/howto/generic.uninstall


http://wiki.openwrt.org/toh/linksys/wrt1200ac#how_to_flash_the_firmware_to_device

"Revert to Linksys Stock Firmware"

So you can either use the web ui or the "sysupgrade" command to go back to 
Stock firmware.


I have only used the mtd method once, and that was when there was a 
problem with the flash for some reason and an OpenWrt developer 
recommended that method instead.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] [Cerowrt-devel] failing to find the "declared victory" in a current wifi router

2015-07-10 Thread Mikael Abrahamsson

On Fri, 10 Jul 2015, Joe Touch wrote:

Can you explain where the info on using the sysupgrade command to revert 
to the factory image is??


http://wiki.openwrt.org/toh/linksys/wrt1200ac#how_to_flash_the_firmware_to_device

The web UI uses the "sysupgrade" command. So implicitly, the instruction 
above to use the "web ui" to upgrade means you can use sysupgrade as well.


http://wiki.openwrt.org/doc/howto/generic.sysupgrade

So I just did:

cd /tmp
wget <url of the Linksys factory image>
sysupgrade <factory image file>

Then it rebooted. I then had to use the factory default button on the 
device to reset the configuration before things started working properly. 
"sysupgrade -n <image>" does this as well, but I am not sure the 
Linksys configuration data is stored in the same place, so that might or 
might not help.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] Speed tests - attribution of latency to relevant network hops

2015-07-29 Thread Mikael Abrahamsson

On Wed, 29 Jul 2015, David Lang wrote:

unless you measure it per hop, how are you going to attribute it to each 
hop? and unless you have a server at that layer to talk to, how do you 
know what the latency or bandwidth is?


Measuring latency is doable (using the same mechanism as traceroute, e.g. 
with max-ttl 5), but I don't know how much of this is available to your 
web application?


If you sent 5 packets with TTL 1-5 and measured the time to get back the 
ttl-expired-in-transit ICMP, you could get an indication of where the 
latency increase was happening.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] Fwd: Did *bufferbloat* cause the 2010 flashcrash?

2015-08-07 Thread Mikael Abrahamsson

On Fri, 7 Aug 2015, Steinar H. Gunderson wrote:

[1] For the purposes of the question, the “BitTorrent problem” is when 
you and I are on the same network, and your 200+ BitTorrent upload 
sessions makes it impossible for me to upload my single cat video to 
YouTube.


Isn't this dependent on the upload speed? I would imagine that if it's 
5-10 megabit/s, then using CoDel or a similar technique that doesn't allow 
the buffer to grow to hundreds of milliseconds would improve things a 
lot?


For a 500 kilobit/s link, I'm not sure even that would work...?

--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] RE : Save WiFi from the FCC - DEADLINE is in 3 days *September* 8

2015-09-08 Thread Mikael Abrahamsson

On Tue, 8 Sep 2015, Dave Taht wrote:


wifi, and the carriers... which bugs me. 5.x ghz is the people's
spectrum, that we should be free to use any way we want... and to make


Well, in the US at least, corporations are people, so...

But that aside, I don't know if there is anything that can be done really; 
unlicensed is unlicensed, and if it's not free for everybody to use, what 
is it?


Also, isn't it pretty much the same players in the wifi and LTE space? 
Qualcomm, Broadcom and the others are in both spaces, and I don't see 
what they have to gain from making wifi worse.


And 802.11 isn't really open either, and the unlicensed spectrum still 
requires that devices are approved to be operated there, right? So if the 
FCC and the likes do their job properly, then these technologies should 
work together, at least on the RF level?


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] RE : Save WiFi from the FCC - DEADLINE is in 3 days *September* 8

2015-09-08 Thread Mikael Abrahamsson

On Tue, 8 Sep 2015, Dave Taht wrote:


On Tue, Sep 8, 2015 at 1:22 AM, Mikael Abrahamsson  wrote:

On Tue, 8 Sep 2015, Dave Taht wrote:


wifi, and the carriers... which bugs me. 5.x ghz is the people's
spectrum, that we should be free to use any way we want... and to make


Please note that the LTE-U debate is separate from the lockdown
debate, which only has a day to run. Can we get more letters into the
FCC for the lockdown problem?


I have already posted as well.

And jeeze, what makes sense - on the "licensed" spectrum - is the 
government auctions it off for big bucks one year, and then the public 
pays rents on it for all eternity. Far saner to have more openly 
available spectrum


Well, yes, we need both unlicensed and licensed spectrum.

One failed concept in america, at least, is the idea of a commons - as 
in a tragedy of the commons - elsewhere, for example, "public lands" are 
actually "the queen's" lands and people tend to treat them with more 
respect.


Yes, in Sweden we have something called (translated) "Rights of public 
access" to land, for instance: I'm allowed to go camping in someone else's 
forest as long as it's noncommercial and I leave it as I found it. It's a 
constant battle to keep this freedom, and I agree we need this for radio as 
well. BUT it's not like unlicensed radio today means you can do whatever 
you want; there is still quite a lot of regulation around it. So I can 
understand if they want to ensure that regulated devices in unlicensed 
spectrum actually follow the regulations. The problem is that it's 
different across the world. I've heard that in Thailand, for instance, 
you're only allowed to transmit with a total of 100mW from a device, so if 
you turn on both 2.4GHz and 5GHz radios, you need to limit them to 50mW 
each (or some combination). How would a completely open device solve this 
problem?


So I think a constructive approach would be to try to say how the FCC 
concern can be solved or at least mitigated in a FOSS world. Do we have 
any ideas?


Because I can understand that regulators whose job it is to make sure 
devices follow the rules have a problem with FOSS code that lets people do 
whatever they want.


Do we really want regulators to bring back the vans that might roll 
around and impose a fine because you were running OpenWRT and happened to 
set the output power too high for whatever local regulation was in place?


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] RE : Save WiFi from the FCC - DEADLINE is in 3 days *September* 8

2015-09-08 Thread Mikael Abrahamsson

On Tue, 8 Sep 2015, Dave Taht wrote:


Well, no... we need the concept of "the public's" spectrum, also.


What does that mean? Only devices that have FOSS firmware are allowed to 
send in this spectrum?


Because I actually fail to see the fundamental difference between spectrum 
I use to communicate with purchased devices from VENDOR1 and VENDOR2 that 
I run myself, and spectrum used by a purchased device from VENDOR2 that is 
run by a mobile provider. I mean, do we rule out wifi networks run by 
providers?


Now, I will admit that I have no idea what LTE-U looks like on RF, but 
what's so different about it compared to the other things sending in 
there, like Bluetooth and wifi (and wifi has many different encodings)?



One failed concept in america, at least, is the idea of a commons - as in
a tragedy of the commons - elsewhere, for example, "public lands" are
actually "the queen's" lands and people tend to treat them with more
respect.



Yes, in sweden we have something called (translated) "Rights of public
access" to land for instance, I'm allowed to go camping in someone elses
forest as long as it's noncommercial and I leave it as I found it.


What is the word, actually?


"Allemansrätten". Literally "everymansright".

Well, pushing the responsibility back on the actual users of the 
technology is fine by me. Enforcement seems only to be of a concern on 
the DFS channels around a limited number of airports.




--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] bloat at gigE

2015-09-23 Thread Mikael Abrahamsson

On Wed, 23 Sep 2015, Benjamin Cronce wrote:


The upload wasn't even saturated. Probably why upload bloat was very low.
Large bloat on the download just shows you the server really can push more
than 1Gb. 200ms bloat with 913Mb down is about 20MiB of buffer. That's
insane! That's about 20x more buffer than my entire 24 port 1Gb Procurve
managed switch. What kind of network equipment has that much buffer?


That's typically one distinction between an L3 switch and a "router". The 
L3 switch typically has on-die memory that can be as low as 128KB or up to 
a few megabytes.


Then you have the real "service edge" routers with 128,000 queues that are 
used to aggregate tens of thousands of customers, where a linecard can 
have many gigabytes of packet buffer memory.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] new "vector packet processing" based effort to speed up networking

2016-02-12 Thread Mikael Abrahamsson

On Thu, 11 Feb 2016, Dave Täht wrote:


Someone asked me recently what I thought of the dpdk. I said:
"It's a great way to heat datacenters". Still, there's momentum it
seems, to move more stuff into userspace.


Especially now that Intel CPUs seem to be able to push a lot of PPS 
compared to what they could before. A lot more.


What one has to take into account is that this tech is most likely going 
to be deployed on servers with 10GE NICs or even 25/40/100GE, and they are 
most likely going to be connected to a small buffer datacenter switch 
which will do FIFO on extremely small shared buffer memory (we're talking 
small fractions of a millisecond of buffer at 10GE speed), and usually 
lots of these servers will be behind oversubscribed interconnect links 
between switches.


A completely different use case would of course be if someone started to 
create midrange enterprise routers with 1GE/10GE ports using this 
technology, then it would of course make a lot of sense to have proper 
AQM. I have no idea what kind of performance one can expect out of a low 
power Intel CPU that might fit into one of these...


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] [Cake] are anyone playing with dpdk and vpp?

2016-04-27 Thread Mikael Abrahamsson

On Wed, 27 Apr 2016, Stephen Hemminger wrote:


DPDK gets impressive performance on large systems (like 14M packets/sec per
core), but not convinced on smaller systems.
Performance depends on having good CPU cache. I get poor performance on


As soon as you can't find information in cache and have to go to RAM to 
get it (and you need it to proceed), you've lost the impressive 
performance.


VPP is all about pre-fetching (telling the memory subsystem to fetch into 
cache the information you will probably need in the not-so-distant 
future). It actually reminds me of the demo programming on the C64/Amiga 
that I was involved in in the 80s. Lots of small optimisations are needed 
to yield these results.


So yes, cache is extremely important for VPP.

--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] [Cerowrt-devel] BBR congestion control algorithm for TCP in net-next

2016-09-21 Thread Mikael Abrahamsson

On Wed, 21 Sep 2016, Dave Taht wrote:

I did a fairly comprehensive string of tests today, comparing it at 
20Mbits, 48ms RTT, to cubic and competing with cubic, against a byte 
fifo of 256k, pie, cake, cake flowblind, and fq_codel.


20 megabit/s is 2.5 megabyte/s, so that 256k FIFO is only 100ms worth of 
buffering. I guess you see packet drop in steady state here, ie buffer is 
full?


I'd be interested in seeing the same experiment started with a 10MB FIFO, 
and with the CUBIC flow starting first to give it a proper head start.


My intuition and understanding of what's going to happen might very well 
be completely off, but I think it'd be interesting to know.


I'll take a look at your flent data, thanks for posting them!

--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] [Cerowrt-devel] BBR congestion control algorithm for TCP in net-next

2016-09-21 Thread Mikael Abrahamsson

On Wed, 21 Sep 2016, Dave Taht wrote:

* It seriously outcompetes cubic, particularly on the single queue aqms. 
fq_codel is fine. I need to take apart the captures to see how well it 
is behaving in this case. My general hope was that with fq in place, 
anything that was delay based worked better as it was only competing 
against itself.


I'm looking at 4up-sqwave-fq_bfifo-256k. Is this really fq_bfifo, or just 
bfifo? Looks like there is no fq.


If someone doesn't have the correct Flent available, I posted two 
screenshots here: http://imgur.com/a/cFtMd


What I think I see:

The flows are started in order: "BBR1, CUBIC2, BBR4, CUBIC3" (a bit 
confusing, but according to your description).


So it looks like BBR1 fills the pipe within half a second or so, a nice 
steady state. Then CUBIC2 starts and, slowly over a few seconds, starts to 
starve BBR1 of BW; it looks like steady state here would be CUBIC2 ending 
up with around 65-70% of the BW and BBR1 getting 30-35%. Then BBR4 comes 
along (10 seconds in) and just KILLS them both, smacks them over the head 
with a hammer, taking 90% of the BW, wildly oscillating between way above 
20 megabit/s and down to 10. The ping here goes up to around 150-160ms. 
CUBIC3 starts at 15 seconds and gets basically no bw at all.


Then at around 22 seconds in, I guess pretty close to 12-13 seconds after 
BBR4 was started, BBR4 starts to calm down, slowly letting the other 
streams come back to life. At around 30 seconds, they all seem to get at 
least a bit of the bw each and nobody is completely starved, but BBR1 
seems to not get much BW at all (very dotted line).


When at the end there is only CUBIC3 and BBR4 left, it looks like BBR4 has 
a 2/3 to 1/3 advantage.


Looking at cake_flowblind_noecn, BBR1 and BBR4 just kill both CUBIC 
flows. Same with PIE.


So it seems my intuition was wrong, at least for these scenarios. It 
wasn't CUBIC that would kill BBR, it's the other way around. Great to have 
testing tools! Thanks Flent!


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] "BBR" TCP patches submitted to linux kernel

2016-09-30 Thread Mikael Abrahamsson

On Thu, 29 Sep 2016, Aaron Wood wrote:

While you think 3.10 is old, in my experience it's still seen as cutting 
edge by many.  RHEL is still only at 3.10.  And routers are using much 
older 3.x kernels.  There's a huge lag between what the "enterprise" 
crowd is running in production, and what you guys are developing on. 
Because "stability".


It's been one of my major frustrations (especially on the embedded side 
where 3.x kernels are still considered 'new' and 2.6.x is 'trusted').


The state of affairs is actually improving. What I'm seeing from several SoC 
vendors is that they're moving from a "new kernel every 3 years, and we'll 
choose a 2-year-old kernel when doing the work, so it'll be 5 years old by 
the time a new one comes around" model (with the result that a lot of 
devices are on 2.6.26, 3.2 and 3.4), to a model where they actually do a new 
kernel every 6 months and choose a kernel that's around 12-18 months old at 
that time.


This is of course not great, but it's an improvement. I'm pushing for SoC 
vendors to actually upstream their patches as much as possible and to 
support the creation of a kernel-version-independent HAL/API in the kernel 
that they can write their drivers against.


So if you know any netdev people, please tell them to be supportive when 
SoC vendors come and want changes done to the kernel to support, for 
instance, hw packet accelerators. We want this done right of course (so we 
can live with it for the next 5-10 years at least), but it is very 
important that it gets done.


This of course has interesting effects for AQM, since with packet 
accelerators you're taking the kernel pretty much out of the data path as 
soon as the hardware is programmed... but that's a different, related 
struggle: making sure these aren't as bloated as yesteryear's 
implementations.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] [Make-wifi-fast] the wifi airtime-fair fq_codel stuff on net-next looks mostly good

2016-10-15 Thread Mikael Abrahamsson

On Wed, 12 Oct 2016, Dave Taht wrote:


http://openwrtsummit.org/#quick-details


I've had the discussion with "radio guys" before regarding "fairness" of 
radio resources. They kept talking about "optimising the cell for 
throughput". I told them "then we should give the speaker with the highest 
bitrate and demand for bits as much radio resources as possible, and 
starve everybody else". This is of course not good for general customer 
satisfaction.


After a lot of discussion back and forth, we came to the same conclusion 
as you seem to have come to (if I understood Toke's talk correctly): 
"radio time" is the most fair resource. If someone has bad radio 
conditions, they get lower total throughput than someone with good radio 
conditions, so the fairness is "equal air time". This means everybody gets 
an equal part of the shared resource, which gives people an incentive to 
try to improve radio reception if they have trouble, and doesn't starve 
everybody else of airtime just because one device is having a bad radio 
day.


So full support for this approach from me, good job!

--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] [Cerowrt-devel] Comcast's NANOG slides re Bufferbloat posted (Oct 2016)

2016-10-20 Thread Mikael Abrahamsson

On Thu, 20 Oct 2016, Rich Brown wrote:


https://www.nanog.org/sites/default/files/20160922_Klatsky_First_Steps_In_v1.pdf


Does anyone understand what access speeds these customers had during these 
tests?


A 96 kilobyte buffer on a 1 megabit/s upstream versus a 50 megabit/s 
upstream makes a big difference.
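
Concretely, the drain time of that buffer:

echo '96*1024*8 / 10^6' | bc -l        # ~0.79 s at 1 megabit/s
echo '96*1024*8 / (50*10^6)' | bc -l   # ~0.016 s at 50 megabit/s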


(I have 250/50 on my DOCSIS 3.0 connection, but perhaps it's common 
knowledge what speeds Comcast customers typically have, that I don't know?)


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] Any non-bloaty 4port or 8port GigE switches?

2016-10-20 Thread Mikael Abrahamsson

On Sun, 3 Jul 2016, leetminiwheat wrote:


Hi, sorry for the noise here but can anyone recommend a decent non-bloated
4port or 8port GigE switch? something supporting bonding/failover would be
a bonus but not a requirement.


Do you have bloated gig switches? It's usually a problem the other way 
around: these devices typically have 128 kilobytes of buffer shared between 
all ports, which is way too little.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] Large decrease in speed needed to combat bufferbloat?

2016-10-20 Thread Mikael Abrahamsson

On Wed, 17 Aug 2016, Alec Robertson wrote:

I've managed to get bufferbloat under control, with only 3-4ms of added 
ping when downloading but I've had to set the ingress to 43000, reducing 
my speed not hugely but more than I might have expected.


I personally think that aiming for 3-4ms of bloat is excessive for the 
applications we see today. Most of the time you're not going to notice 
10-20ms of bloat even when using quite time-sensitive applications, and 
that 10-20ms PDV range is probably a better tradeoff between throughput and 
potential interactive-performance downside.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] 22 seconds til bloat on gfiber?

2016-10-22 Thread Mikael Abrahamsson

On Sat, 22 Oct 2016, Dave Taht wrote:


http://www.dslreports.com/speedtest/5408767


What's the setup here? Someone has told me that Google Fiber is PON? So 
there is an ONT at the customer prem which takes the fiber and hands off 
some kind of 1000BASE-T? What more?


Just trying to figure out what device has ~13-15 megabyte buffer so it can 
induce 1200ms buffer lag at 1 gigabit/s.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] 22 seconds til bloat on gfiber?

2016-10-26 Thread Mikael Abrahamsson

On Wed, 26 Oct 2016, Jan Ceuleers wrote:


What I mean is that the OLT optics become very expensive if you need to
support as many lambdas as you have customers. You'd furthermore need an
OLT port for much fewer customers (e.g. 1 port per 64 or 128 customers)
than the thousands you can support on a (shared) GPON port on a single
lambda.


That only works if your customers don't use their Internet access very 
much. If they do, you're in trouble and have to rebuild.


In my market, we're now at access speeds where 100/10 is on the lower end, 
and it's not uncommon for people to have 250, 500 or 1000 downstream. If 
they then actually start using their bw, you'd have to rebuild: either go 
higher speed for some CPE (complicated and expensive), or rebuild to have 
smaller splitter domains.


I guess the answer depends a lot on your cost of labour.

--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] 22 seconds til bloat on gfiber?

2016-10-27 Thread Mikael Abrahamsson

On Thu, 27 Oct 2016, Dave Taht wrote:

interactive, once basic bandwidth needs are slaked, which starts to 
happen once you crack the largest typical load (which these days is 4k 
video streaming).


gbit fiber is *way* on the unneeded side of the demand curve for home users.


I can make up credible scenarios where a home with 4-5 people would need 
200-300 megabit/s of reliably available bandwidth; add downloading 
something large on top, and you can make use of a gig. So not "way" 
unneeded.


I do have problems coming up with scenarios where you need more than a 
gig.



which kind of points out that you need business users to use it all up.


Business users actually use *less* bw than residential. People typically 
don't watch 4k video streams at work.


Btw, what does that report say? I don't want to spend money on it.

--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] [bbr-dev] Re: "BBR" TCP patches submitted to linux kernel

2016-11-02 Thread Mikael Abrahamsson

On Tue, 1 Nov 2016, Yuchung Cheng wrote:

We are curious why you choose the single-queued AQM. Is it just for the 
sake of testing?


Non-flow aware AQM is the most commonly deployed "queue management" on the 
Internet today. Most of them are just stupid FIFOs with taildrop, and the 
buffer size can be anywhere from super small to huge depending on 
equipment used and how it's configured.


Any proposed TCP congestion-avoidance algorithm to be deployed on the 
wider Internet has to be able, at least to some degree, to handle this 
deployment scenario without killing everything else it's sharing capacity 
with.


Dave Täht's test case where BBR just kills Cubic makes me very 
concerned.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] fixing bufferbloat in 2017

2016-11-23 Thread Mikael Abrahamsson

On Tue, 22 Nov 2016, Dave Taht wrote:

I would like to see the industries most affected by bufferbloat - 
voip/videoconferencing/gaming,web gain a good recognition of the 
problem, how to fix it, and who to talk to about it (router makers and 
ISPs)


It would be great if the realtime communications people (gaming, video, 
audio etc) had some kind of help page where people could be pointed to 
understand the problem.


I saw a Youtube video btw, where they had problems with gaming because 
"I'm uploading a youtube video at the same time as I am gaming, stupid 
me". People don't even realise this is not the way it has to be.


My take on this is that the problem is fairly well understood in "our" 
circles, but the wider audience still doesn't know, and even if they know, 
there is nowhere to go to fix it.


If we can find a product that solves the gaming community's problem 
(they're among the people who have "ping" in their applications and who 
immediately notice when it's bad), we could perhaps approach someone 
prominent in that gaming community about making a video on how to solve 
the problem.


"Look here, I did  and now I can game and upload a youtube video at the 
same time without problemsoneoneone"


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] fixing bufferbloat in 2017

2016-11-23 Thread Mikael Abrahamsson

On Wed, 23 Nov 2016, Pedro Tumusok wrote:

If this something we should try, I can help out with the first point, 
but the second one probably needs local bufferbloat evangelists.


I am not worried about getting these people on board to show a solution.

I'm worried that we do not have a solution that is easily deployable for 
"normal" people. If someone has X/Y megabit/s Comcast Internet connection, 
what solution do we have to offer them? I can't think of one that actually 
solves the problem for real.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] fixing bufferbloat in 2017

2016-11-23 Thread Mikael Abrahamsson

On Wed, 23 Nov 2016, Benjamin Cronce wrote:

If there is a simple affordable solution, say Open/DD-WRT distro based 
bridge that all you do is configure your up/down bandwidth and it 
applies Codel/fq-Codel/Cake, then all you need to do is drive up 
awareness. A good channel for awareness would be getting in contact with 
popular Twitch or YouTube gaming streamers. But I wouldn't put much 
effort into driving up awareness until there is a device that people can 
easily acquire, use, and afford. At first I was thinking of telling 
people to use *-WRT supporting routers, but changing the firmware on 
your router requires too much research, and many people care about 
bleeding edge features. You need something that works in tangent with 
whatever they are using.


If Comcast sells you 100/20 (I have no idea if this is a thing), you set 
your upstream on this box to 18 meg fq_codel, and then Comcast 
oversubscribes you so you only get 15 meg up part of the time; then you're 
still bloated by the modem. This is not a solution.
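
For concreteness, the kind of one-box egress setup being discussed (a 
sketch; interface and rate are placeholders, and it only helps while the 
configured rate stays below what the modem actually gets):

tc qdisc replace dev eth0 root handle 1: htb default 10
tc class add dev eth0 parent 1: classid 1:10 htb rate 18mbit
tc qdisc add dev eth0 parent 1:10 fq_codel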


I don't think "buy $thing, install *WRT on it, configure it like this" is 
above most gamers, but I'm afraid we don't even have a working solution 
for someone with that kind of skillset.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] fixing bufferbloat in 2017

2016-11-23 Thread Mikael Abrahamsson

On Wed, 23 Nov 2016, David Lang wrote:

Deploy what we already know to work on the real edge devices and things 
get vastly simpler.


Sure! Sounds Great. How?

--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] Fixing bufferbloat in 2017

2016-11-26 Thread Mikael Abrahamsson

On Sat, 26 Nov 2016, Aaron Wood wrote:

and call it a day.  And those BSPs are _ancient_.  I wouldn't be 
surprised to see 2.6 still coming out on new models, let alone 4.0.


Most seem to be on 3.2 and 3.4, but I've heard people say Broadcom now has 
BSP for 4.1.


However, since basically all high-speed devices use a hardware packet 
accelerator, even with newer kernels you might not get any anti-bufferbloat 
benefit, because these packet accelerators have their own buffer handling.


I might be in a position to test one of these Broadcom 4.1-based devices 
in the next few months; I'll run some tests and report back if that 
happens.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] 22 seconds til bloat on gfiber?

2016-12-06 Thread Mikael Abrahamsson

On Wed, 7 Dec 2016, Jonathan Morton wrote:

That’s not to say it’s *impossible* to sell 4Gbps or 10Gbps connections. 
You could do it by bundling a multi-port switch with a sufficiently fast 
uplink port, and sell it as “a full gigabit for each of N computers”. 
The most obvious customers to target might be apartment complexes or 
entire villages, who could share such a connection over a large number 
of users and defray a relatively high installation cost.


I believe this is what Comcast is doing for their 2 gigabit/s service, and 
why Netgear released their X10 with SFP+ uplink.


I've been told Comcast does SFP+ handoff, and this device seems to be 
tailor made for use with such a service.


Otoh it seems that 2.5GE and 5GE are going to be a thing in the not-so-distant 
future. I've been told 2017 will see shipping products for this at a 
better price point than 10GE is currently at (which means quite 
expensive).


So I imagine we'll be seeing high end "home routers" with built in L2 
switches that have 1/2.5/5GE support to cater for this market in 2017.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] TCP BBR paper is now generally available

2016-12-08 Thread Mikael Abrahamsson

On Fri, 2 Dec 2016, Dave Taht wrote:


http://queue.acm.org/detail.cfm?id=3022184


"BBR converges toward a fair share of the bottleneck bandwidth whether 
competing with other BBR flows or with loss-based congestion control."


That's not what I took away from your tests of having BBR and Cubic flows 
together, where BBR just killed Cubic dead.


What has changed since? Have you re-done your tests with whatever has 
changed? I must have missed that. Or did I misunderstand?


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] TCP BBR paper is now generally available

2016-12-08 Thread Mikael Abrahamsson

On Thu, 8 Dec 2016, Dave Täht wrote:


drop tail works better than any single queue aqm in this scenario.


*confused*

I see nothing in the BBR paper about how it interoperates with other 
TCP algorithms. Your text above didn't help me at all.


How is BBR going to be deployed? Is nobody interested in how it behaves in 
a mixed environment?


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] What does cablelabs certification actually do?

2016-12-08 Thread Mikael Abrahamsson

On Fri, 9 Dec 2016, jb wrote:

And then wondered why certification can't also include verification for 
correctly sized buffers as well?


There is nothing stopping this, and it's being worked on (PIE goes into 
DOCSIS 3.1).


http://www.cablelabs.com/wp-content/uploads/2014/06/DOCSIS-AQM_May2014.pdf
https://www.nanog.org/sites/default/files/20160922_Klatsky_First_Steps_In_v1.pdf

CableLabs (as far as I understand) is an organisation funded by cable 
operators and vendors, and they create standards and tests used by the 
cable industry.

I don't know what tests CableLabs performs, but there is nothing stopping 
them from validating buffers+AQM in the modems as well, and I do hope they 
do this going forward.


Why not reach out to Greg White, who is mentioned in the CableLabs 
DOCSIS-AQM pdf above, and ask? Or even better, invite him to this list if 
he's not already here. He's on the IETF AQM WG list; I have posts from him 
in my folder going back to 2013.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


[Bloat] how much delay is too much delay

2017-01-13 Thread Mikael Abrahamsson


https://www.youtube.com/user/xFPxAUTh0r1ty

This channel analyses several online games and how they work networkwise. 
It seems online games typically "tick" at 30-60Hz, in that the game server 
and user application communicate this often. 60Hz seems to be the "golden 
standard", and I guess a resolution of 17ms is fine for when things are 
happening.


In gaming there are multiple delay components. One is "input delay", which 
is the time from when you, for instance, press the mouse button until the 
game shows that it has responded by showing you the result on screen. It 
seems this is typically 40-60ms, because the game needs to handle the 
input and send data to the graphics card, which needs to render it, and 
then it needs to be sent to the monitor. There is of course a lot more to 
it than this, but you get the idea.


I don't know what the delay is from mouse-click to when the game knows you 
clicked and can send this information to the game server, but from what 
I'm guessing from reading up on the topic, it is in the "less than 10ms" 
range. So theoretically, the game can send an update to the game server 
much quicker than it can display the result on the local screen.


Another data point for instance for the game "Rocket League", is that the 
highest ranking players have a hard time playing effectively when the 
user-to-game server "ping" is more than approximately 100ms. I don't know 
if this is RTT, but considering they're getting around 130ms from a user 
in Texas to a server in Europe, it seems reasonable that this is RTT.


My reason for bringing this up (again) in the bloat forum is that these 
people are exactly the kind of people who are very sensitive to the 
problems that "anti-bloat" solves. If we can come up with a solution that 
makes it less likely that these people will get "ping spikes" etc, and we 
can package up something that actually solves this (preferably something 
they can go to the store and buy outright), this would be a great way to 
"market" it. I'm quite sure they'd be interested in making videos about it 
to make more people aware of the problem.


There are multiple "gaming routers" out there, with "QoS". I have no idea 
what this "QoS" does. If anyone knows, I'd be very interested in knowing 
more.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] how much delay is too much delay

2017-01-13 Thread Mikael Abrahamsson

On Fri, 13 Jan 2017, Jesper Dangaard Brouer wrote:

I love the way he measures the delay by recording the screen with a high 
speed camera, and then correlate mouse-button activation by a visual 
red-blink (some PC-local setup/app) and counting the frames until the 
movement happen in the game.


He actually has an LED connected to the mouse itself, so the red blink is 
when the electrical circuit is closed by the mouse button press.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] emulating non-duplex media in linux qdiscs

2017-10-10 Thread Mikael Abrahamsson

On Mon, 9 Oct 2017, Dave Taht wrote:

Saying that is half duplex, doesn't work for me. In their example of 
"half duplex", (using push to talk), it still means that everybody on 
that channel hears who is talking. "half duplex" to me, given the 
definition of duplex, means more that there is a *p2p* channel (a wire), 
that you can ping pong data across.


A 10BASE-T hub connected to a 10BASE2 or 10BASE5 segment, all in the 
same broadcast domain, is considered to be "half duplex" in ethernet port 
configuration terms.


So it doesn't have to be p2p. And I do think this mimics a shared radio as 
well (because a coax wire with multiple nodes on it seems very similar to 
a radio channel over the air).


Now, radio has the difference that two stations might not hear each other, 
and that's of course a problem in CSMA/CD terms.


Back to your netem problem. What you need is to force all packets through 
the same queue, right? So I tried to dream up a complicated scheme with 4 
bridges and some kind of "forced forwarding", but I don't think it'd pan 
out.


So the best way is probably to have one scheduler that feeds transmit 
tokens to two different shapers (rx and tx on the same interface). 
Whatever scheduling they are fed to tell them the rate at which they're 
allowed to transmit, they get it from the same source. That way they have 
to compete for the same resources.


This will not perfectly mimic the exponential backoff of CSMA/CD, but it 
might be good enough for what you need? Also, I just realised I have no 
idea how wifi is scheduled. Is it even close to CSMA/CD?


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] keyboard bloat

2017-11-24 Thread Mikael Abrahamsson

On Fri, 24 Nov 2017, Dave Taht wrote:


https://danluu.com/keyboard-latency/


This is very interesting.

This guy is doing "button to pixel" delay testing:

https://www.youtube.com/watch?v=4GnKsqDAmgY

At 2:25 he's also talking about the total chain of events that needs to 
happen between input and when you see something on the screen.


I wish there was more focus and testing on these kinds of things, then 
perhaps we'd also get more focus on bufferbloat in the network.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] benefits of ack filtering

2017-11-28 Thread Mikael Abrahamsson

On Tue, 28 Nov 2017, Dave Taht wrote:


Recently Ryan Mounce added ack filtering capabilities to the cake qdisc.

The benefits were pretty impressive at a 50x1 Down/Up ratio:

http://blog.cerowrt.org/post/ack_filtering/

And quite noticeable at 16x1 ratios as well.

I'd rather like to have a compelling list of reasons why not to do
this! And ways to do it better, if not. The relevant code is hovering
at:

https://github.com/dtaht/sch_cake/blob/cobalt/sch_cake.c#L902


Your post is already quite comprehensive when it comes to downsides.

The better solution would of course be to have the TCP peeps change the 
way TCP works so that it sends fewer ACKs. I don't want middleboxes 
making "smart" decisions when the proper solution is for both end TCP 
speakers to do less work by sending fewer ACKs. In the TCP implementations 
I tcpdump regularly, it seems they send one ACK per 2 downstream packets.


At 1 gigabit/s that's on the order of 35k pps of ACKs (100 megabyte/s 
divided by 1440 divided by 2). That is, in my opinion, a completely 
ludicrous rate of ACKs, for no good reason.
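
One way to watch the pure-ACK rate during a download (a sketch; the 
interface is a placeholder, and the length expression matches zero-payload 
segments):

timeout 10 tcpdump -ni eth0 'tcp[tcpflags] == tcp-ack and (ip[2:2] - ((ip[0]&0xf)<<2) - ((tcp[12]&0xf0)>>2)) == 0' > /dev/null
# divide the "packets captured" count tcpdump prints on exit by 10 for pps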


I don't know what the formula should be, but it sounds like the ACK 
sending ratio should be influenced by how many in-flight ACKs there might 
be. Is there any reason to have more than 100 ACKs in flight at any given 
time? 500? 1000?


My DOCSIS connection (inferred through observation) seems to run on 1ms 
upstream time slots, and my modem will delete contiguous ACKs at 16 or 32 
ACK intervals, typically ending up at 1-2 ACKs per 1ms time slot. This 
cuts the upstream bandwidth used for ACKs during a 250 megabit/s download 
from 5-8 megabit/s down to 400 kilobit/s.


Since this ACK reduction is already done on probably hundreds of millions 
of fixed-line subscriber lines today, what arguments do the designers of 
TCP have for keeping one ACK per 2 received TCP packets?


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] benefits of ack filtering

2017-11-29 Thread Mikael Abrahamsson

On Wed, 29 Nov 2017, Sebastian Moeller wrote:

Well, ACK filtering/thinning is a simple trade-off: redundancy versus 
bandwidth. Since the RFCs say a receiver should acknowledge every second 
full MSS, I think the decision whether to filter or not should be kept to


Why does it say to do this? What benefit is there to either end system in 
sending 35kPPS of ACKs to facilitate a 100 megabyte/s TCP transfer?


Sounds like a lot of useless interrupts and handling by the stack, apart 
from offloading it to the NIC to do a lot of handling of these mostly 
useless packets so the CPU doesn't have to do it.


Why isn't 1kPPS of ACKs sufficient for most use cases?

--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] benefits of ack filtering

2017-11-29 Thread Mikael Abrahamsson

On Wed, 29 Nov 2017, Luca Muscariello wrote:


Why does it say to do this? What benefit is there to either end system to
send 35kPPS of ACKs in order to facilitate a 100 megabyte/s of TCP transfer?


Did you check RFC 3449 ?
https://tools.ietf.org/html/rfc3449#section-5.2.1


RFC 3449 is all about middleboxes doing things.

I wanted to understand why TCP implementations find it necessary to send 
one ACK per 2xMSS at really high PPS, especially when NIC offloads and 
middleboxes frequently strip out this information anyway so it never 
reaches the IP stack (right?).


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] benefits of ack filtering

2017-11-30 Thread Mikael Abrahamsson

On Thu, 30 Nov 2017, Eric Dumazet wrote:

I agree that TCP itself should generate ACK smarter, on receivers that 
are lacking GRO. (TCP sends at most one ACK per GRO packets, that is why 
we did not feel an urgent need for better ACK generation)


Could you elaborate a bit more on the practical implications of the above 
text? What is the typical GRO size used when doing gigabit ethernet 
transmissions?


So if we're receiving 70kPPS of 1500-byte packets, each carrying a 
1460-byte MSS-sized segment (~100 megabyte/s), what would a typical ACK 
rate be in that case?


In response to some other postings here: my question "is 35kPPS really 
needed" is not a proposal to send 50 PPS of ACKs. My proposal is that we 
should be able to come up with a smarter algorithm than something from the 
90s that says "send one ACK per 2*MSS", now that we have magnitudes 
higher forwarding rates. Also, on for instance DOCSIS networks you're 
going to get several ACKs back-to-back anyway (because if they're not 
pruned by the DOCSIS network, they're sent in "bursts" within a single 
DOCSIS transmit opportunity), so imagining that 35kPPS gives you higher 
resolution than 1kPPS of ACKs is just an illusion.


So if GRO results in (I'm just speculating here) "we're only sending one 
ACK per X kilobytes received if the packets arrived in the same 
millisecond" and X is in the 16-64 kilobyte range, then that's fine by me.


Any network worth anything should be able to smooth out "bursts" of 16-64 
kilobytes at line rate anyway, in the case where the egress line rate is 
lower than the rate at which the sending end is transmitting.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] Bufferbloat in high resolution + non-stationarity

2017-11-30 Thread Mikael Abrahamsson

On Thu, 30 Nov 2017, Jonathan Morton wrote:

I submit that to provide *deployable* QoS schemes, you must either solve 
the classification problem elegantly (which is a Hard Problem), or else 
show that your scheme works adequately in the absence of classification. 
I'm taking the latter approach with Cake, even though it *also* supports 
Diffserv awareness to enhance its performance where classification is 
straightforward.


In IETF INT-AREA, there is now discussion about allocating a new 
diffserv codepoint for "less-than-best-effort" traffic. I have been an 
advocate of this for quite a while, and I actually believe that this is 
incrementally deployable and has a chance to actually get ISP buy-in.


The idea is to use TOS 0, but use the last 3 diffserv bits to indicate 
that this is less-than-BE. Non-implementing networks will treat this as 
BE; implementing networks can use some kind of DRR scheme to give this 
traffic less bandwidth in case of congestion, or just drop it earlier 
when there is queue buildup.
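
As a purely illustrative sketch (the codepoint had not been allocated at 
the time of writing, so the value 000001 and the interface names here are 
assumptions), an implementing ISP could match such traffic on the 
customer access line with tc and steer it into the lowest-priority band:

  # IPv4-only sketch: DSCP 000001 is 0x04 in the TOS byte, mask 0xfc
  tc qdisc add dev eth0 root handle 1: prio bands 3
  tc filter add dev eth0 parent 1: protocol ip u32 \
      match ip tos 0x04 0xfc flowid 1:3    # band 1:3 = lowest priority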


I think this is the only chance we have to get internet-wide coordination 
for a diffserv codepoint that people will do anything with, and the 
recommendation should be to only act on this at the customer access line 
(the one connecting the ISP to the residential gateway) or perhaps within 
the customer network. The hope is that ISPs will not mangle/bleach this 
codepoint, because it actually indicates traffic should get lower 
priority, not higher.


I am in complete agreement with you that anything that relies on an 
Internet-wide QoS scheme based on diffserv/TOS is a no-go. No ISP will 
listen to this and act on it, as it's a DoS vector.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Make-wifi-fast] benefits of ack filtering

2017-12-03 Thread Mikael Abrahamsson

On Sun, 3 Dec 2017, Juliusz Chroboczek wrote:


As far as I know, DOCSIS has an asymmetry factor that is between 4 and 10,
depending on the deployment.  With worst case asymmetry being 10, this


I can buy 300/10 megabit/s access from my cable provider. So that's a lot 
worse. My cable box has 16 downstream channels, and 4 upstream ones. Each 
channel is TDM based, and there is some kind of scheduler granting sending 
opportunities for each channel to each modem, as needed. I'm not a DOCSIS 
expert.



means that you can send an Ack for every data packet with 400 byte data
packets, every second data packet with 200 byte data packets.  If the
asymmetry is a more reasonable 4, then the figures are 100 and 50
respectively.

Try as I might, I fail to see the problem.  Are we advocating deploying
TCP-aware middleboxes, with all the problems that entails, in order to
work around a problem that doesn't exist?


If I understand correctly, DOCSIS has ~1ms sending opportunities upstream. 
So sending more than 1kPPS of ACKs is meaningless, as these ACKs will just 
come back to back at wire-speed as the CMTS receives them from the modem 
in chunks. So instead, the cable modem just deletes all the sequential 
ACKs and doesn't even send these back-to-back ones.
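
To put numbers on that, assuming ~1ms between upstream transmit 
opportunities: a 35kPPS ACK stream means roughly 35 ACKs queued per 
opportunity, and since each cumulative ACK supersedes the ones before it, 
34 of those 35 carry no new information.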


LTE works the same way: it's also frequency-divided and TDM, so I can 
see the same benefit there from culling sequential ACKs sitting in the 
buffer. I don't know if this is done, though.


I've seen people who I think are involved in TCP design, and they seem 
to be under the impression that more ACKs give TCP higher resolution and 
granularity. My postulation is that this is commonly false because of how 
network access is designed and how the NICs are designed (the 
transmit/receive offloading). So sending 35kPPS of ACKs for a gigabit/s 
transfer is just inefficient and shouldn't be done. I would prefer that 
end points send fewer ACKs, instead of the network killing them.


And the network does kill them, as we have seen, because any novice 
network access technology designer can say "oh, having 16 sequential 
ACKs here in my buffer, sitting waiting to get sent, is just useless 
information. Let's kill the first 15."


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Cerowrt-devel] DC behaviors today

2017-12-04 Thread Mikael Abrahamsson

On Sun, 3 Dec 2017, Dave Taht wrote:

What Jesper's been working on for ages has been to try and get linux's 
PPS up for small packets, which last I heard was hovering at about 
4Gbits.


You might want to look into what the VPP (https://fd.io/) peeps are 
doing. They can at least forward packets at pretty impressive rates: 
200Mpps with zero frame loss and a 2M FIB, limited by NIC and PCIe, not 
CPU (on a many-core machine).


I have never thought there was much of a market for gbit to or from the 
home. 40Mbits is enough for nearly everybody until > 4k video with 
smellovision and tactile feedback become a standard.


I'd say the sweet spot right now is in the 100-250 megabit/s range, 
considering "cost of production" and "what do people need/use". This means 
it still can be done on 1 gigabit/s access links.


Anything faster than 1GE is going to be significantly more expensive than 
1GE because 1GE is "good enough for most" when it comes to hundreds of 
millions of households for their inter/intra home need. Also for SME use, 
1GE is good enough for a lot of use cases.


I personally now have 250/50, which is good enough for me, and I don't 
want to pay 2x my current MRC to get 1000/100. However, if I had to 
downgrade to 30 megabit/s I would most certainly notice it, and in my 
market that would just be a 20-30% saving, which definitely isn't worth 
it.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Cerowrt-devel] DC behaviors today

2017-12-04 Thread Mikael Abrahamsson

On Mon, 4 Dec 2017, Joel Wirāmu Pauling wrote:


I'm not going to pretend that 1Gig isn't enough for most people. But I
refuse to believe it's the networks equivalent of a 10A power (20A
depending on where you live in the world) AC residential phase
distribution circuit.


That's a good analogy. I actually believe it is, at least for the next 
5-10 years.



This isn't a question about what people need, it's more about what the
market can deliver. 10GPON (GPON-X) and others now make it a viable
service that can and is being deployed in residential and commercial
access networks.


Well, you're sharing that bw with everybody else on that splitter. 
Sounds to me that the service being delivered over that would instead be 
in the 2-3 gigabit/s range for the individual subscriber (this is what I 
typically see on equivalent shared mediums: the top-speed individual 
subscriptions will be in the 20-40% range of the max theoretical speed 
the entire solution can deliver).


The problem is now that Retail Service Provider X can deliver a post 
Gigabit service... what is capable of taking it off the ONU/CMNT point 
in the home? As usual it's a follow-the-money question: once RSPs can 
deliver Gbit+ they will need an ecosystem in the home to feed into it, 
and right now there isn't a good technology platform that supports it; 
10GBase-X/10GBaseT is a non-starter due to the variability in home 
wiring. Arguably the 7-year leap from 100 to 1000mbit was easy, yet it's 
been a gap of 12 years and counting for the same step up... it's not 
just the NICs and CPUs in the gateways, it's the connector and in-home 
wiring problems as well.


As soon as one goes above 1GE, prices increase A LOT on everything 
involved. I doubt we'll see any 2.5G or higher-speed equipment in wide 
use in homes/SMEs in the next 5 years.



Blatant Plug - request :
I'm interested to hear opinions on this as I have a talk on this very
topic 'The long and Winding Road to 10Gbit+ in the home'
https://linux.conf.au/ at Linuxconf in January. In particular if you
have any home network gore/horror stories and photos you would be
happy for me to include in my talk, please include.


I am still waiting for a decently priced 10GE switch. I can get 1GE 
24-port managed ones, fanless, for 100-200 USD. As soon as I go 10GE, the 
price jumps up a lot, and I get fans. The NICs aren't widely available, 
even though they're not the biggest problem. My in-house cabling can do 
10GE, but I guess I'm an outlier.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Cerowrt-devel] DC behaviors today

2017-12-04 Thread Mikael Abrahamsson

On Mon, 4 Dec 2017, Joel Wirāmu Pauling wrote:


How to deliver a switch, when the wiring and port standard isn't
actually workable?


Not workable?


10GBase-T is out of Voltage Spec with SFP+ ; you can get copper SFP+


Yep, the "Cu SFP" was a luxury for a while. Physics is harsh mistress 
though.



but they are out of spec... 10GbaseT doesn't really work over Cat5e
more than a couple of meters (if you are lucky) and even Cat6 is only
rated at 30M... there is a reason no-one is producing Home Copper
switches and it's not just the NIC Silicon cost (that was a factor
until Recently obviously, but only part of the equation).


I have CAT6 in my home, with no run of more than 30 meters anywhere, so 
it would work for me. You need CAT6A for 100m, so anyone doing new 
installs should use that. Stiff cable, though.



On the flip side:
Right now I am typing this via a 40gbit network, comprised of the
cheap and readily available Tb3 port - it's daisy chained and limited
to 6 ports, but right now it's easily the cheapest and most effective
port. Pity that the fabled optical tb3 cables are damn expensive...
so you're limited to daisy-chains of 2m. They seem to have screwed the
pooch on the USB-C network standard quite badly - which looked so
promising, so for the moment Tb3 it is for me at least.


At that distance, you could probably run 10GE over CAT3 wiring. There is 
a reason 10GE requires better cable for longer distances: with bad cable 
you instead need lots of power and DSP work to figure out what's going 
on.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Cerowrt-devel] DC behaviors today

2017-12-04 Thread Mikael Abrahamsson

On Mon, 4 Dec 2017, Pedro Tumusok wrote:


Looking at chipsets coming/just arrived from the chipset vendors, I think
we will see CPE with 10G SFP+ and 802.11ax Q3/Q4 this year.
Price is of course a bit steeper than the 15USD USB DSL modem :P, but
probably fits nicely for the SMB segment.


https://kb.netgear.com/31408/What-SFP-modules-are-compatible-with-my-Nighthawk-X10-R9000-router

This has been available for a while now. The only use case I see for it 
is Comcast's 2 gigabit/s service; that's the only one I know of that 
would fit this product (since it has no downlink 10GE ports).


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Cerowrt-devel] DC behaviors today

2017-12-07 Thread Mikael Abrahamsson

On Mon, 4 Dec 2017, dpr...@reed.com wrote:

I suggest we stop talking about throughput, which has been the mistaken 
idea about networking for 30-40 years.


We need to talk both about latency and speed. Yes, speed is talked about 
too much (relative to RTT), but it's not irrelevant.


The speed of light in fiber means RTT is approx 1ms per 100km, so from 
Stockholm to SFO my RTT is never going to be significantly below 85ms 
(8625km great circle). It's currently twice that.


So we just have to accept that some services will never be deliverable 
across the wider Internet, but have to be deployed closer to the customer 
(as per your examples, some need 1ms RTT to work well), and we need lower 
access latency and lower queuing delay. So yes, agreed.


However, I am not going to concede that speed is a "mistaken idea about 
networking". No amount of smarter queuing is going to fix the problem if 
I don't have the throughput available that I need for my application.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] benefits of ack filtering

2017-12-13 Thread Mikael Abrahamsson

On Wed, 13 Dec 2017, Jonathan Morton wrote:

the uplink shaper is set to about a fiftieth of that.  I seriously doubt 
that DOCSIS is ever inherently that asymmetric.


Well, the products are, because that's what the operators seem to want, 
probably also because that's what the customers demand.


So my modem has 16x4 (16 downstream channels and 4 upstream channels), 
meaning that built into the hardware I have a 4:1 down/up split.


What providers then typically do (this is my understanding, I haven't 
worked professionally with DOCSIS networks) is have 24 downstream 
channels and 4 upstream channels. Older modems can have 8 downstream and 
4 upstream, for instance, so they'll "tune" to the number of channels 
they can, and then there is an on-demand scheduler that handles upstream 
and downstream traffic.


So I guess theoretically the operator could (if large enough) make a hw 
vendor create a 16x16 modem and have 32 channels total. But nobody does 
that, because that doesn't sell as well as having more downstream 
(because people don't seem to care about upstream). It just makes more 
market sense to sell these asymmetric services, because typically people 
are eyeballs and they don't need a lot of upstream bw (or don't think 
they need it).


On the ADSL side, I have seen 28/3 (28 down, 3 up) for annex-M with 
proprietary extensions. The fastest symmetric I have seen is 4.6/4.6. So 
if you as an operator can choose between selling a 28/3 or 4.6/4.6 
service, what will you do? To consumers, it's 28/3 all day.


So people can blame the ISPs all day long, but there are still (as you 
stated) physical limitations on capacity in RF spectrum over air/copper, 
and you need to handle this reality somehow. If a lot of power is used 
upstream then you'll get worse SNR for the downstream, meaning less 
capacity overall. Symmetric access capacity costs real money and results 
in less overall capacity, unless it's on point-to-point fiber.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Cerowrt-devel] DC behaviors today

2017-12-13 Thread Mikael Abrahamsson

On Wed, 13 Dec 2017, Jonathan Morton wrote:

Occasionally, of course, practically everyone in the country wants to 
tune into coverage of some event at the same time.  More commonly, they 
simply get home from work and school at the same time every day.  That 
breaks the assumptions behind pure statistical multiplexing, and 
requires a greater provisioning factor.


Reasonable operators have provisioning guidelines that look at actual 
usage, although they probably look at it in 5-minute averages and not at 
the millisecond level discussed here in this context.


So they might say "if busy hour average is over 50% 3 days in a week" this 
will trigger a provisioning alarm for that link, and the person (or 
system) will take a more detailed look and look at 5minute average graph 
and decide if this needs to be upgraded or not.
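
A minimal sketch of such a check (the log format is an assumption: one 
file per link, with one "date,utilization_percent" line per 5-minute 
sample):

  # count days with at least one 5-minute sample above 50%; alarm when
  # 3 or more such days occur within the window covered by the file
  awk -F, '$2 > 50 { days[$1] = 1 }
       END { n = 0; for (d in days) n++;
             if (n >= 3) print "provisioning alarm: " n " days > 50%" }' \
      link.csv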


For me the interesting point is always "what's going on in busy hour of 
the day" and never "what's the monthly average transferred amount of 
data".


Of course, this can hide subsecond bufferbloat extremely well (and has), 
but at least this is typically how statistical overprovisioning is done. 
You look at actual usage and make sure your network is never full for any 
sustained amount of time, in normal operation, and make sure you perform 
upgrades well before the growth has resulted in network being full.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Cerowrt-devel] DC behaviors today

2017-12-14 Thread Mikael Abrahamsson

On Wed, 13 Dec 2017, Jonathan Morton wrote:

Ten times average demand estimated at time of deployment, and struggling 
badly with peak demand a decade later, yes.  And this is the 
transportation industry, where a decade is a *short* time - like less 
than a year in telecoms.


I've worked in ISPs since 1999 or so. I've been at startups and I've been 
at established ISPs.


It's kind of an S curve when it comes to traffic growth: when you're 
adding customers you can easily see 100%-300% growth per year (or more). 
Then, after the market becomes saturated, growth comes from per-customer 
increased usage, and for the past 20 years or so this has been in the 
neighbourhood of 20-30% per year.


Running a network that congests parts of the day, it's hard to tell what 
"Quality of Experience" your customers will have. I've heard horror 
stories from the '90s, when a then-large US ISP was running an OC3 (155 
megabit/s) full most of the day. So someone said "oh, we need to upgrade 
this", and after a while they did, to 2xOC3. Great, right? No, after that 
upgrade both OC3s were completely congested. Ok, then upgrade to OC12 
(622 megabit/s). After that upgrade, the link was evidently uncongested 
only a few hours of the day, and of course needed more upgrades.


So at the places I've been, I've advocated for planning rules that say 
that when a link is peaking at 5-minute averages of more than 50% of 
link capacity, an upgrade needs to be ordered. This 50% number can be 
larger if the link aggregates a larger number of customers, because 
typically your "statistical overbooking" varies less the more customers 
participate.


These backbone devices do not do per-flow anything. They might have 10G 
or 100G links to/from them with many many millions of flows, and it's 
all NPU forwarding. Typically they might do DiffServ-based queueing and 
WRED to mitigate excessive buffering. Today, they typically don't even 
do ECN marking (which I have advocated for, but there is not much 
support from other ISPs in this mission).


Now, on the customer access line it's a completely different matter. 
Typically people build with a BRAS or similar, where (tens of) thousands 
of customers might sit on a (very expensive) access card with hundreds 
of thousands of queues per NPU. This still leaves just a few queues per 
customer, unfortunately. So these do not do per-flow anything either. 
This is where PIE comes in, because devices like these can do PIE in the 
NPU fairly easily, as it's kind of like WRED.


So back to the capacity issue. Since these devices typically aren't good 
at assuring per-customer access to the shared medium (backbone links), 
it's easier to just make sure the backbone links are not regularly full. 
This doesn't mean you're going to have 10x capacity all the time; it 
probably means you're going to be bouncing between 25-70% utilization of 
your links (for the normal case, because you need spare capacity to 
handle events that increase traffic temporarily, plus handle loss of 
capacity in case of a link fault). The upgrade might be to add another 
link, or a higher-tier interface speed, bringing the utilization down to 
typically half or a quarter of what you had before.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Cerowrt-devel] DC behaviors today

2017-12-17 Thread Mikael Abrahamsson

On Sun, 17 Dec 2017, Matthias Tafelmeier wrote:

What I actually wanted to posit in relation to that is that one could 
get sooner a c-capable backbone sibling by marrying two ideas: the 
airborne concept ongoing as outlined, plus what NASA is planning to 
bring about for the space backbone, e.g. [1][2]. It's laser based 
instead of directed radio-wave only. Sure, both are in the speed range 
of c; apparently, laser transmission has in addition a significantly 
higher bandwidth to offer: "10 to 100 times as much data at a time as 
radio-frequency systems"[3]. Attenuations to photons in clean 
atmospheric air are negligible (few mps - refractive index of about 
1.0003), so actually a negligible slowdown - easily competing with top 
notch fibres (99.7% the vacuum speed of light). Sure, that's the ideal 
case; though, if cleverly done from the procurement of platforms and 
overall system steering perspective, it might be feasible.


Today's laser links are in the few-km-per-hop range, which is easily at 
least one order of magnitude shorter than radio-based equivalents.


I don't know the physics behind it, but people who have better insight 
than I do tell me "it's hard" to run longer hops (if one wants any kind of 
high bitrate).


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Cerowrt-devel] DC behaviors today

2017-12-18 Thread Mikael Abrahamsson
 only a 40ms delta
between the pre-hop and hitting my ISP, where it was normally about 11ms
for that link. You could say about 30ms of buffering was going on. The
really interesting thing is I was only getting about 5-10Mb/s, which means
there was virtually zero free bandwidth. but I had almost no packet-loss. I
called my ISP shortly after the issue started and that's when they told me
they were under a DDOS and were at 100% trunk, and they said they were
going to have their trunk bandwidth increased shortly. 5 minutes later, the
issue was gone. About 30 minutes later I was called back and told the DDOS
was still on-going, they just upgraded to enough bandwidth to soak it all.
I found it very interesting that a DDOS large enough to effectively kill
95% of my provisioned bandwidth and increase my ping 30ms over normal, did
not seem to affect packet-loss almost at all. It was well under 0.1%. Is
this due to the statistical nature of large links or did Level 3 have an
AQM to my ISP?


This is interesting. I thought about this for several minutes, but I 
can't come up with an explanation for this behaviour, at least not from 
the typical kind of DDOS that's going around. If there was some kind of 
DDOS mitigation equipment put into the mix, that might explain what you 
were seeing.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] The Blind Men and the Elephant.

2018-02-14 Thread Mikael Abrahamsson

On Mon, 12 Feb 2018, Dave Taht wrote:


but to me the simpler thing would be to garner folk to ask at
vendor/isp press conferences: "Have you implemented RFC8290 yet? If
not, when?"


Has anyone implemented FQ_CODEL in a packet accelerator, or is this still 
a CPU thing only?


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Seen in passing: mention of Valve's networking scheme and RFC 5348

2018-04-03 Thread Mikael Abrahamsson

On Tue, 3 Apr 2018, Jonathan Morton wrote:

notwithstanding).  In the end, people have kept reinventing "reliable 
datagram" protocols on top of UDP, whenever they ran up against 
requirements that TCP didn't fulfil.


Yes, for multiple reasons. TCP is ossified and typically lives in the 
OS. Because of NAT, the only protocols that work are TCP and UDP, so if 
you want to move your "transmission stack" to userspace, your only 
choice is UDP. So enter things like QUIC and other mux:ed stream 
protocols over UDP, which can then live in userland on all major 
operating systems.


This is not ideal, but it's not strange that this is happening. The only 
way to innovate as an application/protocol developer is to use UDP.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Seen in passing: mention of Valve's networking scheme and RFC 5348

2018-04-04 Thread Mikael Abrahamsson

On Tue, 3 Apr 2018, Michael Welzl wrote:

Sure, when you’re in control of both ends of a connection, you can build 
whatever you want on top of UDP - but there’s a lot of wheel 
re-inventing there. Really, the transport layer can’t change as long as 
applications (or their libraries) are exposed to only the services of 
TCP and UDP, and thereby statically bound to these transport protocols.


I'm aware of TAPS and I have been trying to gather support for this kind 
of effort for years now, and I'm happy to see there is movement. I have 
also heard encouraging talk from several entities interested in actually 
doing serious work in this area, including some open-sourcing parts of 
their currently non-FOSS code base as part of that work.


So we need applications to be able to get more access to what's going on 
on the wire, including access to non-TCP/UDP protocols, but also to be 
able to create "pluggable TCP stacks" so that a host can have several 
different ones, and the user can install new ones even on older 
operating systems.


With more and more IPv6 around, I hope we'll be able to deploy new 
protocols that are not TCP/UDP (A+P), and that this will bring back some 
innovation in that area.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Seen in passing: mention of Valve's networking scheme and RFC 5348

2018-04-04 Thread Mikael Abrahamsson

On Wed, 4 Apr 2018, Dave Taht wrote:


How dead is posix these days? Ietf does not generally do apis well.


POSIX nowadays is

http://pubs.opengroup.org/onlinepubs/9699919799/

My take on it is that the IETF should not be scared to do APIs, even 
though there is a lot of resistance still.


However, the IETF should not do POSIX APIs, but instead something of their 
own.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Seen in passing: mention of Valve's networking scheme and RFC 5348

2018-04-04 Thread Mikael Abrahamsson

On Wed, 4 Apr 2018, Michael Welzl wrote:

well - they have been refusing too long to do them at all. i guess 
that’s part of the problem


It's not about refusing to do so, it's because other SDOs have told the 
IETF not to. If the IETF tries to touch POSIX, the SDO that does POSIX 
doesn't appreciate it.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Seen in passing: mention of Valve's networking scheme and RFC 5348

2018-04-04 Thread Mikael Abrahamsson

On Wed, 4 Apr 2018, Luca Muscariello wrote:

And yes, flow queueing, absolutely. Flow isolation becomes fundamental 
in such a zoo, or jungle.


There was talk in the IETF about a transport protocol that was proposed 
to do a lot of things TCP doesn't do, but still retain some things that 
have been useful with TCP.


I think it was this one:

https://datatracker.ietf.org/doc/draft-ietf-nvo3-gue/

I'd like to see it not over UDP, but rather as a native IP protocol. The 
talk was about having the network be able to look into the state machine 
of the protocol (MSS size, equivalent of SYN, etc.) but not into the 
payload (which would be end-to-end encrypted). It would also be able to 
do muxed, message-based streams to avoid head-of-line blocking caused by 
a single packet loss.


If any of this comes up, the whole FQ machinery might benefit from being 
able to identify flows in the new protocol, but I imagine this is not a 
hard thing to do. I still have hopes for the flow label in IPv6 to do 
this job, even though it hasn't seen wide adoption so far.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Fwd: [Bug 1436945] Re: devel: consider fq_codel as the default qdisc for networking

2018-06-05 Thread Mikael Abrahamsson

On Tue, 5 Jun 2018, Jonas Mårtensson wrote:


What about PLPMTU?  Do you think they might tweak that too?

 net.ipv4.tcp_mtu_probing=2
 (despite name, applies to IPv6 too)



Maybe, suggest it on their github. But I would maybe propose instead
net.ipv4.tcp_mtu_probing=1.


MTU probing would be awesome. I am a great fan of PLPMTU and this should 
be default-on everywhere in all protocols.
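
For reference, the Linux knob under discussion (values as documented in 
the kernel's ip-sysctl documentation):

  # 0 = disabled, 1 = probe only after an ICMP black hole is suspected,
  # 2 = always probe (RFC 4821 packetization-layer path MTU discovery)
  sysctl -w net.ipv4.tcp_mtu_probing=1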


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Make-wifi-fast] Van Jacobson's slides on timing wheels at netdevconf

2018-07-23 Thread Mikael Abrahamsson

On Sat, 21 Jul 2018, Jonathan Morton wrote:

An example of such a situation would be sparse flows in DRR++, which is 
a key part of fq_codel and Cake.  So to implement DRR++ using timing 
wheels, you have to choose your scheduling horizon carefully so as to 
minimise the delay to sparse packets.


At the spring IETF, there was a talk from an IEEE person about using 
ethernet pause frames to get senders to stop talking for a while. My 
understanding was that this was on microsecond or even nanosecond time 
scales.


One of the mentions in the presentation was on slide 10 about 
"fat-buffered router". In the data center, these are kind of going away, 
because on-die memory is small and rates are high. A 64x100GE forwarding 
asic might have 16MB of buffer, which is very little buffer for the kind 
of bit rates we're talking here.
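
To put that in perspective: 16 megabytes is 128 megabits, so across 6.4 
Tbit/s of ports that is on the order of 20 microseconds of buffering at 
full load.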


https://www.youtube.com/watch?v=sJMvAqEQCBE 1h44m in (proposed IEEE 
802.1Qcz work) is the one I am thinking of.


Wonder how this would interact with the timing wheel proposed by Van 
Jacobson?


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Flow offload's impact on bufferbloat

2018-08-17 Thread Mikael Abrahamsson

On Fri, 10 Aug 2018, Rosen Penev wrote:


My question is not really how to fix it. I already know that. I just
got the feeling that bypassing parts of the linux network stack would
result in less buffering.


On the OpenWrt configuration page for the "software flow offload":

"Experimental feature. Not fully compatible with QoS/SQM."

I don't know exactly what it does; it seems to reduce the number of CPU 
cycles needed to forward packets in an already established flow, but I'd 
imagine that it might very well bypass some of the scheduling code, 
which could explain what you're seeing. So you might get faster 
forwarding but less AQM.


So if your device isn't fast enough to keep up with your total Internet 
access speed, then this might be a good thing. If your device is faster 
than what's needed, then you'd better spend the cycles on getting good AQM 
instead of freeing up more CPU that isn't used for anything anyway.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] beating the drum for BQL

2018-08-22 Thread Mikael Abrahamsson

On Wed, 22 Aug 2018, Dave Taht wrote:

I/we really should have beat the bql drum harder over the last 6 years. 
It's the basic start to all the debloating.


It only helps with kernel based forwarding. A lot of devices don't even 
use this, especially as speeds go up. They use packet accelerators so the 
kernel never sees the packets after initial flow setup.


So you need to get the people developing that silicon to get with the 
program.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] beating the drum for BQL

2018-08-23 Thread Mikael Abrahamsson

On Thu, 23 Aug 2018, Pete Heist wrote:




On Aug 23, 2018, at 2:49 AM, Dave Taht  wrote:

I had a chance to give a talk at broadcom recently, slides here:

http://flent-fremont.bufferbloat.net/~d/broadcom_aug9.pdf 
<http://flent-fremont.bufferbloat.net/~d/broadcom_aug9.pdf>


Thanks for sharing, this is really useful, raising awareness where it matters. 
Quite a bit of content... :)

Ubiquiti needs some work getting this into more of their products (EdgeMAX in 
particular). A good time to lobby for this might be, well a couple months ago, 
as they’re producing alpha builds for their upcoming 2.0 release with kernel 
4.9 and new Cavium/Mediatek/Octeon SDKs. I just asked about the status in the 
EdgeRouter Beta forum, in case it finds the right eyes before the release:

https://community.ubnt.com/t5/EdgeRouter-Beta/BQL-support/m-p/2466657 
<https://community.ubnt.com/t5/EdgeRouter-Beta/BQL-support/m-p/2466657>

https://community.ubnt.com/t5/EdgeMAX-Beta-Blog/New-EdgeRouter-firmware-2-0-0-alpha-2-has-been-released/ba-p/2414938
 
<https://community.ubnt.com/t5/EdgeMAX-Beta-Blog/New-EdgeRouter-firmware-2-0-0-alpha-2-has-been-released/ba-p/2414938>


My only experience with these devices is the EdgeRouter 3/5/X, and they 
have very low performance if you disable offloads (which you need to do 
to enable AQM) and run everything on the CPU: around 100 megabit/s of 
uni-directional traffic.


Do they have other platforms where this would actually matter?


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Cerowrt-devel] beating the drum for BQL

2018-08-23 Thread Mikael Abrahamsson

On Thu, 23 Aug 2018, Sebastian Moeller wrote:

router should be able to handle at least the sold plan's bandwidth with 
its main CPU...)


There is exactly one SoC on the market that does this, and that's Marvell 
Armada 385, and it hasn't been very successful when it comes to ending up 
in these kinds of devices. It's mostly ended up in NASes and devices such 
as WRT1200AC, WRT1900ACS, WRT3200AC.


Sure, doing less / a half-assed job is less costly than doing it right, 
but in the extreme, not doing the job at all saves even more energy ;). 
And I am not sure we are barking up the right tree here; it is not that 
all home CPE are rigorously optimized for low power and energy saving... 
my gut feeling is that the only optimizing principle is cost for the 
manufacturer/OEM, and that causes underpowered CPUs that are 
"packet-accelerator"-doped to appear able to do their job. I might be 
wrong though, as I have no ISP internal numbers on this issue.


The CPU power and RAM/flash have crept up a lot in the past 5 years 
because of other requirements: having the HGW support other applications 
than just being a very simple NAT44+wifi router.


Cost is definitely an optimization, and when you're expected to have a 
price-to-customer including software in the 20-40 EUR/device range, then 
the SoC can't cost much. There has also been a lot of vendor lock-in.


But now speeds are creeping up even more; we're now seeing 2.5GE and 
10GE platforms, which require substantial CPU power to do forwarding. 
The Linux kernel is now becoming the bottleneck in the forwarding: not 
even on a 3GHz Intel CPU is it possible to forward 10GE using the normal 
Linux kernel path (my guess right now is that this is due to context 
switching etc., not really CPU performance).


Marvell has been the only one to really aim for lots of CPU performance 
in their SoCs. There might be others now going the same path, but it's 
also a downside if the CPU becomes bogged down with packet forwarding 
when it's also expected to perform other tasks on behalf of the user 
(and ISP).


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Cerowrt-devel] beating the drum for BQL

2018-08-23 Thread Mikael Abrahamsson

On Thu, 23 Aug 2018, Rosen Penev wrote:

Flow offloading can save quite a bit of CPU, even when done in software. 
It also helps that the kernel network stack is getting better.


I tried this on my 10GE x86-64 test bed. It didn't help; it seems to be 
%sirq limited, and flow offload changed nothing. It helps on lower-end 
CPU platforms (I've tried it there too), but not for the 10GE forwarding 
case.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Cerowrt-devel] beating the drum for BQL

2018-08-24 Thread Mikael Abrahamsson

On Thu, 23 Aug 2018, Dave Taht wrote:

On the marvell front... yes, they tend to produce hardware that runs too 
hot. I too rather like the chipset, and it's become my default hw for 
most things in the midrange.


I checked my WRT1200AC and it idles at 8W. My similar Broadcom box idles 
at 10W, but that one has a lot more on the motherboard plus 4x4 wifi that 
tends to run very hot. I intend to try them under load though and see how 
much power usage changes.



Lastly... there are still billions of slower ISP links left in the
world to fix, with hardware that now costs well under
40 bucks. The edgerouter X is 50 bucks (sans wifi) and good to
~180mbps for inbound shaping presently. Can we get those edge
connections fixed???


There are indeed these kinds of slower devices, but they tend to be the 
kind of device that last saw development a few years ago, and the only 
reason they're still being newly installed is that they're cheap.


In most of the world, customers do not rent the CPE so there is no cash 
flow to the ISP to fix anything. So they tend to sit there until they 
break.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Cerowrt-devel] beating the drum for BQL

2018-08-24 Thread Mikael Abrahamsson

On Thu, 23 Aug 2018, Dave Taht wrote:


I should also point out that the kinds of routing latency numbers in
those blog entries was on very high end intel hardware. It would be
good to re-run those sort of tests on the armada and others for
1,10,100, 1000 routes. Clever complicated algorithms have a tendency
to bloat icache and cost more than they are worth, fairly often, on
hardware that typically has 32k i/d caches, and a small L2.


My testing has been on OpenWrt with kernel 4.14 on Intel x86-64. Looking 
at how the box behaves, I'd say it's limited by context switching / 
interrupt load, and not actually by the CPU being busy doing "hard 
work".


All of the fast routing implementations (snabbswitch, FD.io/VPP etc.) 
take CPUs and devices away from Linux and run a busy-loop, polling a lot 
of the time and never context switching, which means the L1 cache is 
never churned. This is how they become fast. I see potential to do "XDP 
offload" of forwarding here, basically doing a similar job to what a 
hardware packet accelerator does. Then we can potentially optimise 
forwarding using lessons learnt from those other projects. We need to 
keep the bufferbloat work in mind when doing this though, so we don't 
make that bad again.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Cerowrt-devel] beating the drum for BQL

2018-08-24 Thread Mikael Abrahamsson

On Fri, 24 Aug 2018, Dave Taht wrote:

My ar71xx/ath9k hw - like nanostations - was below 2W. wndr3800 I don't 
remember; I think the ethernet switch added quite a bit. But 8W? Not 
even close to that. A modern LED lightbulb eats that and sheds quite a 
lot of light.


My very simple and stupid 1GE SFP/ethernet fiber media converter uses 
4.3W when idling.



Random curiosity: what do various SFP+ interfaces (notably gpon) eat?
has anyone got a gpon interface for the omnia yet? I *hate* the need
for ONTs.


These can easily be 1-2 Watts. I put a 1GE SFP into the previously 
mentioned Broadcom HGW and power usage went up from 9.4W to 10.2W. So if 
it's a GPON or similar then I'd imagine it's substantially more, 
considering that a GPON device needs to do quite a lot more.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Cerowrt-devel] beating the drum for BQL

2018-08-24 Thread Mikael Abrahamsson

On Fri, 24 Aug 2018, Toke Høiland-Jørgensen wrote:

Are there actually any 10GE embedded platforms one can buy? I've been 
thinking about how to upgrade my home network without putting x86 boxes 
everywhere...


https://www.solid-run.com/marvell-armada-family/macchiatobin/

I know people currently working on XDP-enabling the drivers for that 
board (Marvell 8040).


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Cerowrt-devel] beating the drum for BQL

2018-08-24 Thread Mikael Abrahamsson

On Fri, 24 Aug 2018, Jan Ceuleers wrote:


On 24/08/18 13:46, Jan Ceuleers wrote:

On 24/08/18 10:06, Dave Taht wrote:

Random curiosity: what do various SFP+ interfaces (notably gpon) eat?


I have taken a look at a couple. I see numbers in the range 1.7 - 2.2W
for GPON ONTs.


Just to be clear: that's for GPON SFP ONTs.


Just the SFP, right?

--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Marvell 385

2018-08-26 Thread Mikael Abrahamsson

On Sat, 25 Aug 2018, Dave Taht wrote:

The expressobin is a Marvell Armada "3700LP (88F3720) dual core ARM 
Cortex A53 processor up to 1.2GHz" - how does that compare? I have 
plenty of ath10k and ath9k pcmcia cards


I have one of these, incl. wifi. Right now the drivers are not in great 
shape, but they're being worked on. My espressobin has worse performance 
on its wired ports than my WRT1200AC (Armada 385).


I have talked to people who say the drivers are being worked on 
though... If you have input, Kaloz is probably a great person to send it 
to. I know other people working on Marvell drivers as well.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Marvell 385

2018-08-27 Thread Mikael Abrahamsson

On Sun, 26 Aug 2018, Dave Taht wrote:


I was on that thread. It was broken before entirely. As for the single
interrupt on this chip variant - believe it or not, I'm not huge on


When doing 10GE tests on x86-64, I got the highest performance when I 
set interrupt affinity to a single core per interface.
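
A sketch of that kind of pinning (interface name and core mask are just 
examples; IRQ naming varies by driver):

  # pin all of eth0's IRQs to CPU core 0 (affinity mask 0x1)
  grep eth0 /proc/interrupts | awk '{ sub(":","",$1); print $1 }' |
  while read irq; do
      echo 1 > /proc/irq/$irq/smp_affinity
  done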


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] first bufferbloat free cablemodem?

2018-10-07 Thread Mikael Abrahamsson

On Sun, 7 Oct 2018, Aaron Wood wrote:


Maybe he's on a DOCSIS 3.1 headend that's also using pie?  Pie doesn't need
to know the outbound rate, correct?  as it's meant to be driven by the
RTS/CTS type behavior that the upstream traffic on cable has (the correct
terms for cable aren't coming to mind at the moment).


Correct, PIE acts on the queue just like CoDel does. From what I can 
tell, PIE is a queue discipline that can be implemented on hardware that 
supports WRED (which most can), with the help of extra software and some 
CPU cycles to tune it over time. That's why HW manufacturers like PIE.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Cerowrt-devel] DNSSEC key rollover today

2018-10-11 Thread Mikael Abrahamsson

On Thu, 11 Oct 2018, Dave Taht wrote:


if any of you are still using cerowrt, and dnssec, it's gonna break
unless you update this, or disable dnssec... I do not know if the new
key was in openwrt 18.06 either...

http://www.circleid.com/posts/20181005_how_to_prepare_for_dnssec_root_ksk_rollover_on_october_11_2018/


Just as an operational concern, if you have an old image of something (pre 
mid 2017) that doesn't have the new key, it's not going to be able to 
download the new key using the old key, as of today.


Any old install might have the key update function implemented and might 
have the new key, but as soon as you re-install and the new key is not 
there anymore, it'll stop working.


A DNSSEC validating device needs to have functionality to get the root key 
somehow and keep it updated. Otherwise it's better to just not validate at 
all if one cares about operational availability of the service.
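
As one concrete example of such functionality (assuming unbound is the 
validating resolver), its RFC 5011 helper can bootstrap and keep the 
anchor current, typically run from cron or at boot:

  # create or update the root trust anchor file used for validation
  unbound-anchor -a /var/lib/unbound/root.key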


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] vyatta in AT&T 5G gear

2018-10-16 Thread Mikael Abrahamsson

On Mon, 15 Oct 2018, Dave Taht wrote:


Vyos (the open source fork of vyatta) was one of the first to add
fq_codel support... I wonder

http://linuxgizmos.com/att-releases-white-box-spec-for-its-linux-based-5g-routers/


Isn't Vyos just running the Linux kernel for forwarding? So they received 
fq_codel for free when the Linux kernel got support for it? They just had 
to make it configurable?


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


[Bloat] excellent result with OpenWrt 18.06.1 + CAKE on FTTH

2018-11-11 Thread Mikael Abrahamsson


Hi,

I am running "stock" OpenWrt 18.06.1 on an WRT1200AC with 
CAKE+piece_of_cake.qos and set to 250 down 100 up. This is on an ethernet 
point-to-point FTTH connection in Stockholm, Sweden. Basically just 
installed OpenWrt and then added the sqm-scripts-extra and luci-app-sqm 
packages, went in and configured the correct settings in the web UI, and 
then everything was great.
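
For the curious, a rough sketch of what the sqm scripts end up 
configuring under the hood (simplified, and the interface name eth1.2 is 
specific to my box; the real scripts set more options):

  tc qdisc replace dev eth1.2 root cake bandwidth 100mbit
  ip link add name ifb4eth1.2 type ifb
  ip link set ifb4eth1.2 up
  tc qdisc add dev eth1.2 handle ffff: ingress
  tc filter add dev eth1.2 parent ffff: matchall \
      action mirred egress redirect dev ifb4eth1.2
  tc qdisc replace dev ifb4eth1.2 root cake bandwidth 250mbit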


The biggest benefit of this FTTH setup is that I don't have to 
experience the first-hop scheduler I had with my previous DOCSIS 
connection (which also sometimes didn't deliver the advertised 
bandwidth, so I ended up getting 10-30ms of bufferbloat).


http://www.dslreports.com/speedtest/41682104

The smokeping screenshots below show not only the difference between the 
DOCSIS and FTTH schedulers, but also the much lower access RTT (1-2 ms) 
and the lower PDV (which seems to be several ms on DOCSIS but not on my 
P2P FTTH).


https://imgur.com/a/96dFdho

Thanks everybody for the excellent packaging and ease of use for end users 
to get this to work. I've had this running now for 40 days without any 
issue.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] excellent result with OpenWrt 18.06.1 + CAKE on FTTH

2018-11-12 Thread Mikael Abrahamsson

On Mon, 12 Nov 2018, Dave Taht wrote:


tc -s qdisc show dev your_device?
tc -s qdisc show dev your_ifbdevice?


I haven't restarted in 40 days and I don't remember restarting cake, so 
this should be several weeks of data.


qdisc cake 8031: dev eth1.2 root refcnt 2 bandwidth 100Mbit besteffort 
triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw 
overhead 0
 Sent 70822286277 bytes 202513660 pkt (dropped 13984, overlimits 25350421 
requeues 0)

 backlog 0b 0p requeues 0
memory used: 5156288b of 5000000b
 capacity estimate: 100Mbit
 min/max network layer size:   42 /1514
 min/max overhead-adjusted size:   42 /1514
 average network hdr offset:   14

  Tin 0
  thresh100Mbit
  target  5.0ms
  interval  100.0ms
  pk_delay  4us
  av_delay  1us
  sp_delay  1us
  backlog0b
  pkts202527644
  bytes 70842325936
  way_inds  4939006
  way_miss 11834545
  way_cols0
  drops   13984
  marks 512
  ack_drop0
  sp_flows2
  bk_flows1
  un_flows0
  max_len 28766
  quantum  1514

qdisc ingress ffff: dev eth1.2 parent ffff:fff1 
 Sent 807912654344 bytes 631652827 pkt (dropped 0, overlimits 0 requeues 
0)

 backlog 0b 0p requeues 0
qdisc cake 8032: dev ifb4eth1.2 root refcnt 2 bandwidth 250Mbit besteffort 
triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw 
overhead 0
 Sent 829571211610 bytes 631641016 pkt (dropped 11811, overlimits 
79004 requeues 0)

 backlog 0b 0p requeues 0
memory used: 4540528b of 12500000b
 capacity estimate: 250Mbit
 min/max network layer size:   60 /1514
 min/max overhead-adjusted size:   60 /1514
 average network hdr offset:   14

  Tin 0
  thresh250Mbit
  target  5.0ms
  interval  100.0ms
  pk_delay1.2ms
  av_delay559us
  sp_delay  1us
  backlog0b
  pkts631652827
  bytes829588333230
  way_inds 12061686
  way_miss 12913211
  way_cols1
  drops   11811
  marks3589
  ack_drop0
  sp_flows1
  bk_flows1
  un_flows0
  max_len 38444
  quantum  1514




--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] excellent result with OpenWrt 18.06.1 + CAKE on FTTH

2018-11-12 Thread Mikael Abrahamsson

On Mon, 12 Nov 2018, Dave Taht wrote:


I guess my biggest question is how bloated is the "Before cake"
version of the link?


Not very.

http://www.dslreports.com/speedtest/41693199

I then did another test while at the same time doing a different vendor 
speedtest:


http://www.dslreports.com/speedtest/41693256

Ping just increased 5-10 ms when doing this.

If I then re-enable cake with 250/100 I get:

http://www.dslreports.com/speedtest/41693346

qdisc after this last test:

qdisc cake 8034: dev eth1.2 root refcnt 2 bandwidth 100Mbit besteffort 
triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw 
overhead 0
 Sent 391610860 bytes 650447 pkt (dropped 1430, overlimits 645558 requeues 
0)

 backlog 0b 0p requeues 0
memory used: 2425408b of 5000000b
 capacity estimate: 100Mbit
 min/max network layer size:   46 /1514
 min/max overhead-adjusted size:   46 /1514
 average network hdr offset:   14

  Tin 0
  thresh100Mbit
  target  5.0ms
  interval  100.0ms
  pk_delay 82us
  av_delay  6us
  sp_delay  1us
  backlog0b
  pkts   651877
  bytes   393761357
  way_inds11602
  way_miss 3103
  way_cols0
  drops1430
  marks   0
  ack_drop0
  sp_flows   16
  bk_flows1
  un_flows0
  max_len 18168
  quantum  1514

qdisc ingress ffff: dev eth1.2 parent ffff:fff1 
 Sent 896042971 bytes 760157 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc fq_codel 0: dev tun2025 root refcnt 2 limit 10240p flows 1024 
quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn

 Sent 21580 bytes 166 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc cake 8035: dev ifb4eth1.2 root refcnt 2 bandwidth 250Mbit besteffort 
triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw 
overhead 0
 Sent 912501460 bytes 754253 pkt (dropped 5904, overlimits 926439 requeues 
0)

 backlog 0b 0p requeues 0
memory used: 805712b of 12500000b
 capacity estimate: 250Mbit
 min/max network layer size:   60 /1514
 min/max overhead-adjusted size:   60 /1514
 average network hdr offset:   14

  Tin 0
  thresh250Mbit
  target  5.0ms
  interval  100.0ms
  pk_delay650us
  av_delay429us
  sp_delay  1us
  backlog0b
  pkts   760157
  bytes   921432581
  way_inds17426
  way_miss 3168
  way_cols0
  drops5904
  marks   0
  ack_drop0
  sp_flows7
  bk_flows1
  un_flows0
  max_len 15104
  quantum  1514


It seems to smooth out the flows better than my ISP's shaper.

These tests were done while the rest of the household was also using the 
Internet for other things, so this is not "clean room".


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] excellent result with OpenWrt 18.06.1 + CAKE on FTTH

2018-11-12 Thread Mikael Abrahamsson

On Mon, 12 Nov 2018, Dave Taht wrote:


Don't use this connection much, do you? :)


Last 4 week average is 300 kilobit/s up and 3000 kilobit/s down. So no. 
Mostly streaming Netflix and similar things.



   marks 512


and that you have at least one device with ecn enabled. Would this be
OSX or IOS perhaps?


I typically turn it on on all devices I remember to. There are plenty of 
iOS devices in the household, but also ECN-enabled OSX machines.
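
For reference, the knobs involved (the Linux sysctl is standard; the OSX 
names are given to the best of my knowledge and may vary by release):

  # Linux: negotiate ECN on both incoming and outgoing connections
  sysctl -w net.ipv4.tcp_ecn=1
  # OSX (assumed names):
  sudo sysctl -w net.inet.tcp.ecn_initiate_out=1
  sudo sysctl -w net.inet.tcp.ecn_negotiate_in=1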



I don't suppose you have someone else "across town" you could run some
benchmarks against?


Surely. I can run anything you need: I have a 1GE ubuntu machine ~3ms 
away. What tests do you want me to run? I have an ubuntu laptop here I 
can run wired tests with. It already has flent installed, so just tell 
me what you want me to do and test. If you want me to change qdisc 
settings I'm going to need good instructions; I am not proficient in 
changing those settings.



Similarly, a cpu number under load. I note here, that splitting GSO
has a big cost, (primarily in routing table lookup) and you can at
these speeds, probably disable it.


sirq% peaks out around 35-40% when doing download at 250 megabit/s. Around 
10% when doing upload at 100 megabit/s. Armada 385 is nice.


I was also expecting 64k here. I imagine you are using modern linuxes 
that don't overuse TSO anymore, and osx and windows never got into it to 
the extreme that linux did.


root@wrt1200-hemma:~# uname -a
Linux wrt1200-hemma 4.14.63 #0 SMP Wed Aug 15 20:42:39 2018 armv7l GNU/Linux



--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] excellent result with OpenWrt 18.06.1 + CAKE on FTTH

2018-11-12 Thread Mikael Abrahamsson

On Mon, 12 Nov 2018, Dave Taht wrote:


If I then re-enable cake with 250/100 I get:

http://www.dslreports.com/speedtest/41693346


I don't "get" the knee in the download curve here and the prior test.


That's when I start a competing speedtest to the local Swedish speedtest 
site, using an OSX app they ship.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] one benefit of turning off shaping + fq_codel

2018-11-14 Thread Mikael Abrahamsson

On Tue, 13 Nov 2018, Dave Taht wrote:


It turns out we are contributing to global warming.

https://community.ubnt.com/t5/UniFi-Routing-Switching/USG-temperature/m-p/2547046/highlight/true#M115060


There is a reason vendors have packet accelerators. It's more efficient 
compared to doing everything in CPU.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] known buffer sizes on switches

2018-11-24 Thread Mikael Abrahamsson

On Sat, 24 Nov 2018, Dave Taht wrote:


https://people.ucsc.edu/~warner/buffer.html


Nice resource, thanks.

If someone wonders why things look the way they do, it's all about 
on-die versus off-die memory. On-die buffer is often SRAM, which 
requires 6 transistors per bit, so spending half a billion transistors 
gives you ~10MB of buffer on-die (10 Mbyte x 8 bits x 6 transistors is 
roughly 480 million transistors). If you're doing off-die memory (DRAM 
or similar) then you'll get the gigabytes of memory seen in some 
equipment. There basically is nothing in between. As soon as you go 
off-die you might as well put at least 2-6 GB in there.


Also, off-die memory takes I/O capacity. A forwarding chip might have 4 
"sides" with sets of I/O lanes. If you put it in a 1RU device with no 
buffer, you can connect ports to all of the lanes. This gives you a very 
high port density, low-buffer device at a very good price point.


Now, if you want more buffer and more route memory (taking one "side" 
each) plus a connection to a backplane (another side), you only have a 
single "side" left for ports. This is why high-route-count, high-buffer, 
modular switches are so much more expensive compared to low-route, 
low-buffer, fixed-configuration ones.


Above is principle, there are of course combinations and optimizations to 
be made so not all devices adhere exactly to the above.


--
Mikael Abrahamsson    email: swm...@swm.pp.se
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] when does the CoDel part of fq_codel help in the real world?

2018-11-27 Thread Mikael Abrahamsson

On Tue, 27 Nov 2018, Luca Muscariello wrote:

"Link fully utilized" is defined as Q>0, unless you don't include the 
packet currently being transmitted. I do, so the transmitter is never 
idle. But that's a detail.


As someone who works with moving packets, it's perplexing to me to 
interact with transport peeps who seem enormously focused on "goodput". 
My personal opinion is that most people would be better off with 80% of 
their available bandwidth being in use without any noticeable 
buffer-induced delay, as opposed to the transport protocol doing its 
damndest to fill up the link to 100% and sometimes failing and inducing 
delay instead.


Could someone perhaps comment on the thinking in the transport protocol 
design "crowd" when it comes to this?


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] when does the CoDel part of fq_codel help in the real world?

2018-11-27 Thread Mikael Abrahamsson

On Tue, 27 Nov 2018, Luca Muscariello wrote:


A BDP is not a large buffer. I'm not unveiling a secret.


It's complicated. I've had people throw in my face that I need 2xBDP in 
buffer size to smooth things out. Personally I don't want more than 10 ms 
of buffer (max), and I don't see why I should need more than that even if 
transfers are running over hundreds of ms of light-speed-in-medium induced 
delay between the communicating systems.
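
To put rough numbers on that (my own illustration, assuming a 100 Mbit/s 
link and a 150 ms RTT path):

$ echo $((100000000 / 8 / 100)) bytes    # 10 ms of buffer at 100 Mbit/s
125000 bytes
$ echo $((2 * 100000000 / 8 * 150 / 1000)) bytes    # 2xBDP at 150 ms RTT
3750000 bytes

The "2xBDP" advice asks for 30x the buffer I'm willing to tolerate.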


I have routers that are perfectly capable of buffering packets for 
hundreds of ms even at hundreds of megabits/s of access speed. I choose 
not to use that capability though, and configure them to drop packets much 
earlier.


My point was that FQ_codel helps to get very close to the optimum w/o 
adding useless queueing and latency. With a single queue that's almost 
impossible. No, sorry. Just impossible.


Right, I realise I wasn't clear I wasn't actually commenting on your 
specific text directly, my question was more generic.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] when does the CoDel part of fq_codel help in the real world?

2018-11-27 Thread Mikael Abrahamsson

On Tue, 27 Nov 2018, Luca Muscariello wrote:


If you, Mikael don't want more than 10ms buffer, how do you achieve that?


class class-default
  random-detect 10 ms 2000 ms
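  ! annotation (mine, not part of the config): WRED here starts random
  ! early drops at ~10 ms of queue depth, ramping up to the 2000 ms
  ! maximum threshold, beyond which everything is tail-dropped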

That's the only thing available to me on the platforms I have. If you 
would like this improved, please reach out to the Cisco ASR9k BU and tell 
them to implement ECN and PIE (or something even better). They won't do it 
just because I say so, it seems. WRED is all they give me.



You change the behaviour of the source and hope flow isolation is available.


Sorry, I only transport the packets, I don't create them.


If you just cut the buffer down to 10ms and do nothing else, the only thing
you get is a short queue and may throw away half of your link capacity.


If I have lots of queue, I might instead get customer complaints about 
high latency for their interactive applications.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] when does the CoDel part of fq_codel help in the real world?

2018-11-27 Thread Mikael Abrahamsson

On Tue, 27 Nov 2018, Luca Muscariello wrote:

This is a whole different discussion but if you want to have a per-user 
context at the BNG level + TM + FQ I'm not sure that kind of beast will 
ever exist. Unless you have a very small user fan-out the hardware 
clocks could loop over several thousands of contexts. You should expect 
those kind of features to be in the CMTS or OLT.


This is per-customer queues on the access port (250 customers per 10GE 
port, so 250 queues). It's on a "service edge" linecard that I imagine 
people use for BNG purposes. I tend to not use words like that, because to 
me a router is a router.


I do not do coax. I do not do PON. I do point-to-point Ethernet using 
routers and switches, like god^WIEEE intended.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


[Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)

2018-11-28 Thread Mikael Abrahamsson

On Wed, 28 Nov 2018, Dave Taht wrote:


see ecn-sane. Please try to write a position paper as to where and why
ecn is good and bad.

if one day we could merely establish a talmud of commentary
around this religion it would help.


From my viewpoint it seems to be all about incremental deployment. We have 
30 years of "crud" that things need to work with, and the worst case must 
not be a disaster for anything that wants to deploy.


This is the thing about L4S: ECT(1) is the last unused "codepoint" in the 
header that can statelessly identify something. If anyone sees a better 
way to use it than "let's put it in a separate queue, CE-mark it 
aggressively at very low queue depths, and also not care about re-ordering 
so an ARQ L2 can re-order all it wants", then they need to speak up, soon.
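
For reference, the four values of the two-bit ECN field defined in RFC 
3168:

  00  Not-ECT  transport does not support ECN
  01  ECT(1)   ECN-capable; RFC 3168 treats it as equivalent to ECT(0),
               which is what leaves it free for L4S to repurpose
  10  ECT(0)   ECN-capable
  11  CE       congestion experienced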


I actually think "let's not care about re-ordering" would be a brilliant 
thing: it'd help quite a lot of packet network types become less costly 
and more efficient, while at the same time not blocking subsequent packets 
just because some earlier packet needed to be retransmitted. Brilliant for 
QUIC, for instance, which already handles this (at least per-stream).


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)

2018-11-28 Thread Mikael Abrahamsson

On Thu, 29 Nov 2018, Jonathan Morton wrote:


You are essentially proposing using ECT(1) to take over an intended function of 
Diffserv.


Well, I am not proposing anything. I am giving people a heads-up that the 
L4S authors are proposing this.


But yes, you're right. Diffserv has shown itself to be really hard to 
incrementally deploy across the Internet, so it's generally bleached 
mid-path.


In my view, that is the wrong approach.  Better to improve Diffserv to 
the point where it becomes useful in practice.


I agree, but unfortunately nobody has made me king of the Internet yet, so 
I can't just decree it into existence.


 Cake has taken steps in that direction, by implementing some reasonable 
interpretation of some Diffserv codepoints.


Great. I don't know if I've asked this before, but is CAKE easily 
implementable in hardware? From what I can tell, it's still only Marvell 
that is trying to put powerful enough CPUs into HGWs to do forwarding in 
the CPU (which can then run CAKE); all others still rely on packet 
accelerators to achieve the desired speeds.


My alternative use of ECT(1) is more in keeping with the other 
codepoints represented by those two bits, to allow ECN to provide more 
fine-grained information about congestion than it presently does.  The 
main challenge is communicating the relevant information back to the 
sender upon receipt, ideally without increasing overhead in the TCP/IP 
headers.


You need to go into the IETF process and voice this opinion then, because 
if nobody opposes in the near term then ECT(1) might go to the L4S 
interpretation. They do have ECN feedback mechanisms in their proposal; 
have you read it? It's a whole suite of documents: architecture, AQM 
proposal, transport proposal, the entire thing.


On the other hand, what you want to do and what L4S tries to do might be 
closely related. It doesn't sound too far off.


Also, Bob Briscoe works for CableLabs now, so he will have silicon behind 
him. This silicon might go into other things, not just DOCSIS equipment, 
so if you have use-cases that L4S doesn't cover but might with minor 
modification, it might be better to join him than to fight him.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)

2018-11-29 Thread Mikael Abrahamsson

On Thu, 29 Nov 2018, Jonathan Morton wrote:


I'd say the important bits are only slightly harder than doing the same with 
fq_codel.


OK, so FQ_CODEL is a long way off from being implemented in HW. I haven't 
heard anyone even discussing it. Have you (or anyone else) heard 
differently?


I believe much of Cake's perceived CPU overhead is actually down to 
inefficiencies in the Linux network stack.  Using a CPU and some modest 
auxiliary hardware dedicated to moving packets, not tied up in handling 
general-purpose duties, then achieving greater efficiency with 
reasonable hardware costs could be quite easy, without losing the 
flexibility to change algorithms later.


I need to watch the MT7621 packet accelerator talk from the most recent 
OpenWrt summit. I installed OpenWrt 18.06.1 on a MikroTik RB750Gr3, just 
clicked my way around in LuCI and enabled flow offload, and b00m, it now 
did full gig NAT44 forwarding. It's implemented as a -j FLOWOFFLOAD 
iptables rule. The good thing here might be that we could throw 
unimportant high-speed flows off to the accelerator and handle only the 
time-sensitive flows in the CPU, making sure the CPU has preferential 
access to the media for its time-sensitive flows. That kind of approach 
might make FQ_CODEL deployable even on slow-CPU platforms with 
accelerators, because you would only run some flows through FQ_CODEL, 
while the bulk high-speed flows would be handed off to acceleration (and 
we guess they don't care about PDV and bufferbloat).
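
For the curious, the rule looks roughly like this (a sketch; the exact 
chain placement and matches are whatever OpenWrt's firewall generates):

iptables -I FORWARD -m conntrack --ctstate ESTABLISHED,RELATED -j FLOWOFFLOAD

Adding --hw to the target requests hardware offload, where the driver 
supports it.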


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)

2018-11-29 Thread Mikael Abrahamsson

On Thu, 29 Nov 2018, Sebastian Moeller wrote:

	As far as I can tell intel is pushing atom/x86 cores into its 
docsis SoCs (puma5/6/7) as well as into the high-end dsl SoCs (formerly 
lantiq, 
https://www.intel.com/content/www/us/en/smart-home/anywan-grx750-home-gateway-brief.html?wapkw=grx750), 
I am quite confident that those also pack enough punch for CPU based 
routing at Gbps-rates. In docsis modems these are already rolled-out, I 
do not know of any DSL modem/router that uses the GRX750


"10 Gbit/s packet processor".

Game over, again.

Call me naive, but the solution to the impasse at getting a common 
definition of diffserv agreed upon is replacing all TCP CC algorithms? 
This is replacing changing all endpoints (and network nodes) to honor 
diffserve with changing all endpoints to use a different TCP CC. At 
least I would call that ambitious (unless L4S offers noticeable 
advantages for all participating without being terribly unfair to the 
non-participating legacy TCP users*).


L4S proposes a separate queue for the L4S-compatible traffic, and some 
kind of fair split between L4S and non-L4S traffic. I guess it's kind of 
along the lines of my earlier proposals about having some kind of fair 
split with 3 queues for the LE PHB, BE and the rest. That makes it 
deployable in current HW without the worst kind of DDoS downsides 
imaginable.


The Internet is all about making things incrementally deployable. It's 
very frustrating, but that's the way it is. Whatever we want to propose 
needs to work so-so with what's already out there and it's ok if it takes 
a while before it makes everything better.


I'd like diffserv to work better, but it would take a lot of work in the 
operator community to bring it out to where it needs to be. It's not 
hopeless though, and I think 
https://tools.ietf.org/html/draft-ietf-tsvwg-le-phb-06 is one step in the 
right direction. Just the fact that we might have two queues instead of 
one in the simplest implementations might help. The first step is to get 
ISPs to not bleach diffserv but at least allow 000xxx.
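
As a purely hypothetical example of what an endpoint could then do, here 
is bulk traffic being marked with the LE codepoint (DSCP 000001, per that 
draft) using iptables' DSCP target; rsync on port 873 is just my 
illustrative pick for "bulk":

iptables -t mangle -A POSTROUTING -p tcp --dport 873 -j DSCP --set-dscp 1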


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)

2018-11-29 Thread Mikael Abrahamsson

On Thu, 29 Nov 2018, Jonathan Morton wrote:

I have to ask, why would the network care?  What optimisations can be 
obtained by reordering packets *within* a flow, when it's usually just 
as easy to deliver them in order?


Because most implementations aren't flow-aware at all and might have 4 
queues, saying "oh, this single queue is for transports that don't care 
about ordering" means everything in that queue can just be sent as soon as 
it can, ignoring HOL blocking caused by ARQ.


Of course, we already have FQ which reorders packets in *different* 
flows.  The benefits are obvious in that case.


FQ is fringe in real life (speaking as a packet-moving monkey). It's just 
on this mailing list that it's the norm.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] when does the CoDel part of fq_codel help in the real world?

2018-11-29 Thread Mikael Abrahamsson

On Thu, 29 Nov 2018, Stephen Hemminger wrote:

The problem is that any protocol is mostly blind to the underlying 
network (and that can change).  To use dave's analogy it is like being 
put in the driver seat of a vehicle blind folded.  When you step on the 
gas you don't know if it is a dragster, jet fighter, or a soviet 
tractor. The only way a protocol can tell is based on the perceived 
inertia and when it runs into things...


Actually, I've made the argument to IETF TCPM that this is not entirely 
true. You can retain data from previous flows to the same destination so 
that new flows can re-use what was learned.


If no flow in the past hour has been able to run faster than 1 megabit/s, 
and PMTUD always arrives at a 1460-byte outbound MTU, then there is a good 
chance that the next flow will encounter the same thing. Why not use this 
information when guessing how things will behave going forward?
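
Linux already keeps a small per-destination cache along these lines (RTT, 
ssthresh and cwnd as observed by earlier flows to the same address), which 
you can inspect with:

ip tcp_metrics show

How much of it is re-used for new flows is governed by sysctls such as 
net.ipv4.tcp_no_metrics_save.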


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] incremental deployment, transport and L4S (Re: when does the CoDel part of fq_codel help in the real world?)

2018-11-29 Thread Mikael Abrahamsson

On Fri, 30 Nov 2018, Jonathan Morton wrote:

Ah, so you're thinking in terms of link-layers which perform local 
retransmission, like wifi.  So the optimisation is to not delay packets 
"behind" a corrupted packet while the latter is retransmitted.


Yes.

It's possible for a TCP to interpret a reordered packet as missing, 
triggering an end-to-end retransmission which is then discovered to be 
unnecessary.  At the application level, TCP also performs the same HoL 
blocking in response to missing data.  So it's easy to see why links try 
to preserve ordering, even to this extent, but I suspect they typically 
do so on a per-station basis rather than per-flow.


It's a "truth-everybody-knows" in networking that "NEVER RE-ORDER PACKETS 
WITHIN 5-TUPLE FLOW! THERE BE DRAGONS THERE!". I'd also say I see 
enough transport people who says that this should be true generally, if 
nothing else because of legacy.


Personally I think the problem of reordering packets is overblown, and 
that TCPs can cope with occasional missing or reordered packets without 
serious consequences to performance.  So if you add "reordering 
tolerant" to the list of stuff that Diffserv can indicate, you might 
just end up with all traffic being marked that way.  Is that really 
worthwhile?


The question isn't so much about TCP, it's the other things I am worried 
about. TCP handles re-ordering fairly gracefully; other protocols might 
not.


Oddly enough, wifi is now one of the places where FQ is potentially 
easiest to find, with Toke's work reaching the Linux kernel and so many 
wifi routers being Linux based.


Again, even if they're using Linux they will/might have packet 
accelerators that just grab the flow and the kernel never sees it again. 
No FQ_CODEL for that.


An acknowledged problem is overly persistent retries by the ARQ 
mechanism, such that the time horizon for the link-layer retransmission 
often exceeds that of the end-to-end RTO, both for TCP and 
request-response protocols like DNS. I say, retransmit at the link layer 
once or twice, then give up and let the end-hosts sort it out.


I agree, but I also think that it would help some link-layers if the 
re-ordering requirement could be relaxed. However, before that can be 
communicated, a lot of study needs to be done to check whether it is 
actually safe. I've had incidents in my 20-year networking career where it 
was not, and applications misbehaved when packets were re-ordered.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] Does VDSL interleaving+FEC help bufferbloat?

2019-01-04 Thread Mikael Abrahamsson

On Fri, 4 Jan 2019, Dave Taht wrote:

dsl interleave was added primarily to make multicast udp tv streams work 
better (as they are very intolerant of packet loss). Often (as in free's 
implementation) these streams are "invisible" to the overlying IP 
applications. It typically adds at least 6ms of delay to an already slow 
technology.


ADSL2+ is very prone to short bursts of interference, so setting no 
interleaving means quite high packet loss. Setting interleaving to 16 ms 
gives FEC a much better chance of correcting errors and thus reduces 
packet loss.


Several jobs ago we actually had several different profiles for 
customers: they could choose 1, 4 or 16 ms interleaving depending on their 
needs for gaming etc. The 1 and 4 ms interleaving profiles had different 
SNR margin targets, so those customers were sacrificing speed for lower 
latency, because that's the tradeoff you basically have to make with the 
normal L4 protocols that end customers typically use.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] question about ack-filter

2019-02-03 Thread Mikael Abrahamsson

On Sun, 3 Feb 2019, Evuraan wrote:


Greetings!

Since my search-fu has failed:

What is ack-filtering? How is it important?  What's the difference
between ack-filter-aggressive and ack-filter?


In short (my summary): on asymmetric links, cake's ACK filter drops TCP 
ACKs that are already superseded by newer cumulative ACKs sitting in the 
same queue, freeing up scarce uplink capacity; the "aggressive" variant 
just filters more liberally. We discussed it here if you want more 
background information:

https://bloat.bufferbloat.narkive.com/PCQIrEs7/benefits-of-ack-filtering

--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] The "Some Congestion Experienced" ECN codepoint - a new internet draft -

2019-03-11 Thread Mikael Abrahamsson

On Sun, 10 Mar 2019, Jonathan Morton wrote:

An interesting idea, but SCE marks will appear even when there's a lot 
of congestion (at high rates, ie. probably every packet that doesn't 
carry CE), as well as showing up at low frequency when the level of 
congestion only warrants reducing the growth rate.  I think the word 
"Some" is sufficiently descriptive, while "Slight" might cause people to 
ignore it completely.


One way to handle this would be "buffering experienced" or something like 
that, i.e. if this packet is being enqueued into a buffer with a 
non-trivial number of packets in it, mark it.


The L4S proposal also has the property that their use of this last 
codepoint combination in the packet header (and this is a big thing; this 
is the last unicorn) also means the packet is allowed to be re-ordered. I 
thought this was a big and nice property for other areas. This new 
proposal removes it.


From what I can see, L4S actually is quite novel and has the chance to 
seriously change the way queueing is done. This proposal seems more like 
"a little more of what we had before" which I do not think warrants 
claiming this last unicorn codepoint. I'd like its use to be truly novel 
and be more than a tweak.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] [Cake] The "Some Congestion Experienced" ECN codepoint - a new internet draft -

2019-03-11 Thread Mikael Abrahamsson

On Mon, 11 Mar 2019, Sebastian Moeller wrote:

	How is packet reordering for anybody but the folks responsible for 
operating the "conduits" in any way attractive?


For instance, QUIC muxes streams within the same 5-tuple, so it benefits 
from the transport not holding up data just to preserve ordering within 
the 5-tuple flow.


--
Mikael Abrahamsson    email: swm...@swm.pp.se

