Re: [Bloat] [Starlink] Of interest: Comcast AQM Paper
On Wed, 4 Aug 2021, Sebastian Moeller wrote:

> I guess the point is AQM is not really that expensive, even FQ AQM;
> traffic shaping however is expensive. But for wifi, shaping is not
> required, so AQM became feasible.

My point is that CPU-based forwarding has very bad performance on some
platforms, regardless of whether you're doing shaping, AQM or neither
(FIFO). If it's not hw accelerated, it sucks. When I did tests on MT7621
it did ~100 Mbit/s without flow-offload, and full gig with it.

-- 
Mikael Abrahamsson    email: swm...@swm.pp.se
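[Editor's note: for reference, a hedged sketch of how flow offload is typically toggled on OpenWrt-based CPE. The option names come from recent OpenWrt releases; the surrounding defaults are illustrative and nothing here is quoted from the original message. With hardware flow offloading enabled, forwarded packets never reach the CPU, so software AQM or shaping does not see them.]

    # /etc/config/firewall (defaults section)
    config defaults
            option input 'ACCEPT'
            option output 'ACCEPT'
            option forward 'REJECT'
            # software fastpath via the netfilter flowtable
            option flow_offloading '1'
            # hardware fastpath, only on SoCs with a supported engine (e.g. MT7621)
            option flow_offloading_hw '1'

After editing, reload the firewall (e.g. /etc/init.d/firewall restart) and re-run the throughput test with and without these options to see the difference the offload path makes.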
Re: [Bloat] [Starlink] Of interest: Comcast AQM Paper
On Wed, 4 Aug 2021, Jonathan Morton wrote:

> Linux-based CPE devices have AQM functionality integrated into the Wifi
> stack. The AQM itself operates at layer 3, but the Linux Wifi stack
> implementation uses information from layers 2 and 4 to improve
> scheduling decisions, eg. airtime-fairness and flow-isolation (FQ).
> This works best on soft-MAC Wifi hardware, such as ath9k/10k and MT76,
> where this information is most readily available to software. In
> principle it could also be implemented in the MAC, but I don't know of
> any vendor that's done that yet.

Does this also work with flow-offload enabled, or is that not accelerated
on, for instance, MT76? I'm surprised, since MT76 can barely do 100
Mbit/s of large packets using only the CPU.

-- 
Mikael Abrahamsson    email: swm...@swm.pp.se
Re: [Bloat] Updated Bufferbloat Test
On Thu, 25 Feb 2021, Simon Barber wrote:

> The ITU say voice should be <150 ms, however in the real world people
> are a lot more tolerant. A GSM -> GSM phone call is ~350 ms, and very
> few people complain about that. That said, the quality of the
> conversation is affected, and staying under 150 ms is better for a
> fast, free-flowing conversation. Most people won't have a problem at
> 600 ms and will have a problem at 1000 ms. That is for a 2-party voice
> call. A large group presentation over video can tolerate more, but may
> have issues with talking over when switching from presenter to
> questioner, for example.

I worked at a phone company 10+ years ago. We had some equipment that
internally was ATM based, and each "hop" added 7 ms. This, in combination
with IP-based telephony at the end points that added 40 ms one-way per
end-point (PDV buffer), caused people to complain when RTT started
creeping up to 300-400 ms. This was for PSTN calls.

Yes, people might have more tolerance with mobile phone calls because
they have lower expectations when out and about, but my experience is
that people will definitely notice 300-400 ms RTT, though they might not
get upset enough to open a support ticket until 600 ms or more.

-- 
Mikael Abrahamsson    email: swm...@swm.pp.se
Re: [Bloat] Updated Bufferbloat Test
On Wed, 24 Feb 2021, Sina Khanifar wrote:

> https://www.waveform.com/tools/bufferbloat

I just wanted to confirm that the tool seems to accurately measure even
higher speeds. This is my 1000/1000 debloated with FQ_CODEL to 900/900:

https://www.waveform.com/tools/bufferbloat?test-id=1ad173ce-9b9f-483c-842c-ea5cc08c2ff6

This is with SQM removed:

https://www.waveform.com/tools/bufferbloat?test-id=67168eb7-f7e2-44eb-9720-0dd52c725e8c

My ISP has told me that they have a 10 ms FIFO in my downstream
direction, and OpenWrt defaults to FQ_CODEL in the upstream direction,
and this seems to be accurately reflected in what the tool shows.

Also, my APU2 can't really keep up with 900/900 I think, because when I
set SQM to 500/500 I get very tightly controlled PDV:

https://www.waveform.com/tools/bufferbloat?test-id=58626d8c-2eea-43f9-9904-b1ec43f28235

Tool looks good, I like it! Thanks!

-- 
Mikael Abrahamsson    email: swm...@swm.pp.se
Re: [Bloat] UniFi Dream Machine Pro
On Fri, 22 Jan 2021, Stuart Cheshire via Bloat wrote:

> Is implementing CoDel queueing really 10x more burden than running
> "Ubiquiti's proprietary Deep Packet Inspection (DPI) engine"? Is CoDel
> 4x more burden than Ubiquiti's IDS (Intrusion Detection System) and IPS
> (Intrusion Prevention System)?

No, it isn't, but all the other functions have hw offloads, and the CoDel
numbers shown are with hw offloads turned off, i.e. running only in CPU
forwarding mode. That's when you get those kinds of numbers
(~800 Mbit/s). When enabling SQM on their USG3 you get ~100 Mbit/s of
throughput, because it has a very slow CPU (it does have plenty of
offloads, so full gig with offloads enabled works well, but then you
don't get any SQM/DPI).

-- 
Mikael Abrahamsson    email: swm...@swm.pp.se
Re: [Bloat] Other CAKE territory (was: CAKE in openwrt high CPU)
On Fri, 4 Sep 2020, Jonathan Morton wrote:

> We're usually seeing problems with the smaller-scale CPUs found in CPE
> SoCs, which are very much geared to take advantage of hardware
> accelerated packet forwarding. I think in some cases there might
> actually be insufficient internal I/O bandwidth to get 1Gbps out of the
> NIC, into the CPU, and back out to the NIC again, except through the
> dedicated forwarding path. That could manifest itself as a lot of
> kernel time spent waiting for the hardware, and can only really be
> solved by redesigning the hardware.

There are lots of SoCs where CPU routing results in ~100 Mbit/s of
throughput, whilst the HW offload engine is perfectly capable of full gig
speeds. MT7621 is one that is actually supported in OpenWrt.

-- 
Mikael Abrahamsson    email: swm...@swm.pp.se
Re: [Bloat] CAKE in openwrt high CPU
On Thu, 3 Sep 2020, Sebastian Moeller wrote:

> Mmmh, how did you measure the sirq percentage? Some top versions show
> overall percentage with 100% meaning all CPUs, so 35% in a quadcore
> could mean 1 fully maxed out CPU (25%) plus an additional 10% spread
> over the other three, or something more benign. Better top (so not
> busybox's) or htop versions also can show the load per CPU, which is
> helpful to pinpoint hotspots...

If I run iperf3 with 10 parallel sessions then htop shows this (in the
CAKE upstream direction, I believe):

  1 [*                0.7%]   Tasks: 19, 0 thr; 2 running
  2 [*              100.0%]   Load average: 0.48 0.16 0.05
  3 [#***            44.4%]   Uptime: 10 days, 04:46:37
  4 [                54.2%]   Mem[|#*          36.7M/3.84G]
                              Swp[                  0K/0K]

The other direction (-R), typically this:

  1 [#***            13.0%]   Tasks: 19, 0 thr; 2 running
  2 [***             53.9%]   Load average: 0.54 0.25 0.09
  3 [#*              55.8%]   Uptime: 10 days, 04:47:36
  4 [**              84.4%]

Topology is: PC - HGW -> Internet. iperf3 is run on the PC; the HGW has
CAKE in the -> Internet direction.

> Best Regards
>    Sebastian

root@OpenWrt:~# tc -s qdisc
qdisc noqueue 0: dev lo root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc cake 8034: dev eth0 root refcnt 9 bandwidth 900Mbit diffserv3 triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw overhead 0
 Sent 772001 bytes 959703 pkt (dropped 134, overlimits 221223 requeues 179)
 backlog 0b 0p requeues 179
 memory used: 2751976b of 15140Kb
 capacity estimate: 900Mbit
 min/max network layer size:           42 /    1514
 min/max overhead-adjusted size:       42 /    1514
 average network hdr offset:           14

                   Bulk  Best Effort        Voice
  thresh      56250Kbit      900Mbit      225Mbit
  target          5.0ms        5.0ms        5.0ms
  interval      100.0ms      100.0ms      100.0ms
  pk_delay          0us         22us        232us
  av_delay          0us          6us          7us
  sp_delay          0us          4us          5us
  backlog            0b           0b           0b
  pkts                0       959747           90
  bytes               0  93543739440
  way_inds            0        22964            0
  way_miss            0         2752
  way_cols            0            0            0
  drops               0          134            0
  marks               0            0            0
  ack_drop            0            0            0
  sp_flows            0            3            1
  bk_flows            0            1            0
  un_flows            0            0            0
  max_len             0        68130         3714
  quantum          1514         1514         1514

-- 
Mikael Abrahamsson    email: swm...@swm.pp.se
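[Editor's note: as a cross-check on what htop shows, per-CPU softirq activity can also be read directly from the kernel. This is a generic sketch, not something posted in the thread; it only assumes the standard /proc/softirqs interface.]

    # Per-CPU NET_RX/NET_TX softirq counters; sample twice during an iperf3 run
    # and compare the deltas to see which core is doing the forwarding work.
    grep -E 'NET_RX|NET_TX' /proc/softirqs
    sleep 10
    grep -E 'NET_RX|NET_TX' /proc/softirqs

If one core's counters grow much faster than the others', the forwarding load is concentrated there, matching the single maxed-out CPU Sebastian describes.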
Re: [Bloat] CAKE in openwrt high CPU
On Tue, 1 Sep 2020, Toke Høiland-Jørgensen wrote:

> Yup, the number of cores is only going to go up, so for CAKE to stay
> relevant it'll need to be able to take advantage of this eventually :)

https://www.hardkernel.com/shop/odroid-h2plus/ is an interesting
platform: it's a quad-core machine with 2 x 2.5GbE NICs. When using
something like this for routing, with HTB+CAKE for bidirectional shaping
below line rate, what would be the main things that would need to be
improved?

-- 
Mikael Abrahamsson    email: swm...@swm.pp.se
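[Editor's note: for readers following along, the kind of bidirectional below-line-rate setup being asked about usually looks roughly like this. The sketch uses CAKE's built-in shaper rather than HTB, and the interface name and rates are assumptions for a 2.5GbE link; it is not a config from this thread.]

    # Egress: shape outgoing traffic on the WAN interface (eth0 assumed)
    tc qdisc replace dev eth0 root cake bandwidth 2300mbit

    # Ingress: redirect incoming traffic to an IFB device and shape it there
    ip link add ifb4eth0 type ifb
    ip link set ifb4eth0 up
    tc qdisc replace dev eth0 handle ffff: ingress
    tc filter add dev eth0 parent ffff: protocol all matchall \
            action mirred egress redirect dev ifb4eth0
    tc qdisc replace dev ifb4eth0 root cake bandwidth 2300mbit

The ingress side in particular adds an extra pass through the IFB device, which increases the per-packet CPU cost on top of routing itself.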
Re: [Bloat] CAKE in openwrt high CPU
On Mon, 31 Aug 2020, Toke Høiland-Jørgensen wrote:

> And what about when you're running CAKE in 'unlimited' mode?

I tried this:

# tc qdisc add dev eth0 root cake bandwidth 900mbit

This seems fine from a performance point of view (not that high sirq%,
around 35%) and does seem to limit my upstream traffic correctly. Not
sure it helps though; at these speeds the bufferbloat problem is not that
obvious and easy to test over the Internet :)

root@OpenWrt:~# tc -s qdisc
qdisc noqueue 0: dev lo root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc cake 8034: dev eth0 root refcnt 9 bandwidth 900Mbit diffserv3 triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw overhead 0
 Sent 772001 bytes 959703 pkt (dropped 134, overlimits 221223 requeues 179)
 backlog 0b 0p requeues 179
 memory used: 2751976b of 15140Kb
 capacity estimate: 900Mbit
 min/max network layer size:           42 /    1514
 min/max overhead-adjusted size:       42 /    1514
 average network hdr offset:           14

                   Bulk  Best Effort        Voice
  thresh      56250Kbit      900Mbit      225Mbit
  target          5.0ms        5.0ms        5.0ms
  interval      100.0ms      100.0ms      100.0ms
  pk_delay          0us         22us        232us
  av_delay          0us          6us          7us
  sp_delay          0us          4us          5us
  backlog            0b           0b           0b
  pkts                0       959747           90
  bytes               0  93543739440
  way_inds            0        22964            0
  way_miss            0         2752
  way_cols            0            0            0
  drops               0          134            0
  marks               0            0            0
  ack_drop            0            0            0
  sp_flows            0            3            1
  bk_flows            0            1            0
  un_flows            0            0            0
  max_len             0        68130         3714
  quantum          1514         1514         1514

-- 
Mikael Abrahamsson    email: swm...@swm.pp.se
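[Editor's note: since the question was specifically about 'unlimited' mode, here is a hedged sketch of the two variants side by side. The interface name and rate are taken from the message above; the commands are standard cake invocations, not something quoted from the thread.]

    # CAKE with its shaper disabled: AQM + flow isolation only, no rate limit
    tc qdisc replace dev eth0 root cake unlimited

    # CAKE shaping just below line rate (what was actually tested above)
    tc qdisc replace dev eth0 root cake bandwidth 900mbit

    # Per-tin delays, drops and marks for either setup
    tc -s qdisc show dev eth0

In unlimited mode CAKE never throttles, so its CPU cost is mostly the FQ/AQM bookkeeping; with a bandwidth set, the built-in shaper also has to pace every packet, which is presumably where the extra sirq load comes from.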
Re: [Bloat] CAKE in openwrt high CPU
On Mon, 31 Aug 2020, Toke Høiland-Jørgensen wrote:

> Hmm, you say CAKE and FQ-CoDel - so you're not enabling the shaper
> (that would be FQ-CoDel+HTB)? An exact config might be useful (or just
> the output of tc -s qdisc).

Yeah, I guess I'm also using HTB to get the 900 Mbit/s SQM is looking
for. If I only use FQ_CODEL at interface speed, my performance is fine.

> If you are indeed not shaping, maybe you're hitting the issue fixed by
> this commit?
> https://github.com/dtaht/sch_cake/commit/3152477235c934022049fcddc063c45d37ec10e6

I enabled it just now to get the config.

qdisc cake 8030: dev eth0 root refcnt 9 bandwidth 900Mbit besteffort triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw overhead 0
 Sent 4346128 bytes 11681 pkt (dropped 0, overlimits 1004 requeues 17)
 backlog 0b 0p requeues 17
 memory used: 33328b of 15140Kb
 capacity estimate: 900Mbit
 min/max network layer size:           42 /    1514
 min/max overhead-adjusted size:       42 /    1514
 average network hdr offset:           14

                    Tin 0
  thresh          900Mbit
  target            5.0ms
  interval        100.0ms
  pk_delay           18us
  av_delay            6us
  sp_delay            4us
  backlog              0b
  pkts              11681
  bytes           4346128
  way_inds             30
  way_miss            735
  way_cols              0
  drops                 0
  marks                 0
  ack_drop              0
  sp_flows              3
  bk_flows              1
  un_flows              0
  max_len           22710
  quantum            1514

qdisc ingress ffff: dev eth0 parent ffff:fff1
 Sent 4716199 bytes 10592 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
...
qdisc cake 8031: dev ifb4eth0 root refcnt 2 bandwidth 900Mbit besteffort triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw overhead 0
 Sent 4946683 bytes 10592 pkt (dropped 0, overlimits 492 requeues 0)
 backlog 0b 0p requeues 0
 memory used: 35Kb of 15140Kb
 capacity estimate: 900Mbit
 min/max network layer size:           60 /    1514
 min/max overhead-adjusted size:       60 /    1514
 average network hdr offset:           14

                    Tin 0
  thresh          900Mbit
  target            5.0ms
  interval        100.0ms
  pk_delay           19us
  av_delay            6us
  sp_delay            4us
  backlog              0b
  pkts              10592
  bytes           4946683
  way_inds             33
  way_miss            969
  way_cols              0
  drops                 0
  marks                 0
  ack_drop              0
  sp_flows              2
  bk_flows              1
  un_flows              0
  max_len           21196
  quantum            1514

-- 
Mikael Abrahamsson    email: swm...@swm.pp.se
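[Editor's note: to make the two setups being compared concrete, here is a hedged sketch of what they typically look like as tc commands. The class IDs and the 900 Mbit/s rate are illustrative, and this is the generic HTB+fq_codel pattern rather than the exact config from this message.]

    # Option A: plain fq_codel at interface speed, no shaper (cheap on the CPU)
    tc qdisc replace dev eth0 root fq_codel

    # Option B: HTB shaping to 900 Mbit/s with fq_codel as the leaf qdisc
    tc qdisc replace dev eth0 root handle 1: htb default 10
    tc class add dev eth0 parent 1: classid 1:10 htb rate 900mbit ceil 900mbit
    tc qdisc add dev eth0 parent 1:10 handle 110: fq_codel

Option B is where the per-packet shaping cost shows up; Option A relies on the link itself being the bottleneck.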
Re: [Bloat] Does employing an AQM on the home router also solve bufferbloat between home router and upstream devices?
On Tue, 2 Jun 2020, Tianhe wrote:

> What does it mean?

What I do is that on my WAN, I do bidirectional shaping/AQM at 90% of the
ISP-configured rate, meaning buffering will generally be done in my
device instead of the ISP device, and my device has proper AQM, so I have
no bufferbloat. This is not perfect, but it works well enough to make a
big difference for all normal use-cases.

-- 
Mikael Abrahamsson    email: swm...@swm.pp.se
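[Editor's note: on OpenWrt this is usually done with the sqm-scripts package. Here is a hedged sketch of what the corresponding config might look like; the interface name and rates are pure examples (a symmetric 1 Gbit/s line shaped to 90%) and are not taken from the original message.]

    # /etc/config/sqm
    config queue 'eth0'
            option enabled '1'
            option interface 'eth0'
            option download '900000'   # kbit/s, ~90% of the ISP downstream rate
            option upload '900000'     # kbit/s, ~90% of the ISP upstream rate
            option qdisc 'cake'
            option script 'piece_of_cake.qos'

Shaping slightly below the ISP rate is what moves the queue into the local device, where the AQM can actually manage it.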