Re: [Bloat] CAKE in openwrt high CPU
Hi Mikael, Thanks! That looks like a fully saturated core, no? I do not know how to parse the symbols here, so not sure what "class" of load is denoted by the star, but I would guess something including sirqs? Anyway the average is ~49% load, while clearly CPU is pegged already. I assume the htop data is from the HGW... best regards Sebastian > On Sep 4, 2020, at 15:37, Mikael Abrahamsson wrote: > > On Thu, 3 Sep 2020, Sebastian Moeller wrote: > >> Mmmh, how did you measure the sirq percentage? Some top versions show >> overall percentage with 100% meaning all CPUs, so 35% in a quadcore could >> mean 1 fully maxed out CPU (25%) plus an additional 10% spread over the >> other three, or something more benign. Better top (so not busybox's) or htop >> versions also can show the load per CPU which is helpful to pinpoint >> hotspots... > > If I run iperf3 with 10 parallel sessions then htop shows this (in the CAKE > upstream direction I believe): > > 1 [* > 0.7%] Tasks: 19, 0 thr; 2 running > 2 > [*100.0%] >Load average: 0.48 0.16 0.05 > 3 [#*** > 44.4%] Uptime: 10 days, 04:46:37 > 4 [ > 54.2%] > Mem[|#* >36.7M/3.84G] > Swp[ > 0K/0K] > > The other direction (-R), typically this: > > 1 [#*** > 13.0%] Tasks: 19, 0 thr; 2 running > 2 [*** > 53.9%] Load average: 0.54 0.25 0.09 > 3 [#* > 55.8%] Uptime: 10 days, 04:47:36 > 4 > [** > 84.4%] > > Topology is: > > PC - HGW -> Internet > > iperf3 is run on the PC, HGW has CAKE in the -> Internet direction. 
> >> Best Regards >> Sebastian >> >>> >>> root@OpenWrt:~# tc -s qdisc >>> qdisc noqueue 0: dev lo root refcnt 2 >>> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) >>> backlog 0b 0p requeues 0 >>> qdisc cake 8034: dev eth0 root refcnt 9 bandwidth 900Mbit diffserv3 >>> triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw >>> overhead 0 >>> Sent 772001 bytes 959703 pkt (dropped 134, overlimits 221223 requeues >>> 179) >>> backlog 0b 0p requeues 179 >>> memory used: 2751976b of 15140Kb >>> capacity estimate: 900Mbit >>> min/max network layer size: 42 /1514 >>> min/max overhead-adjusted size: 42 /1514 >>> average network hdr offset: 14 >>> >>> Bulk Best EffortVoice >>> thresh 56250Kbit 900Mbit 225Mbit >>> target 5.0ms5.0ms5.0ms >>> interval 100.0ms 100.0ms 100.0ms >>> pk_delay 0us 22us232us >>> av_delay 0us 6us 7us >>> sp_delay 0us 4us 5us >>> backlog0b 0b 0b >>> pkts0 959747 90 >>> bytes 0 93543739440 >>> way_inds0229640 >>> way_miss0 2752 >>> way_cols000 >>> drops 0 1340 >>> marks 000 >>> ack_drop000 >>> sp_flows031 >>> bk_flows010 >>> un_flows000 >>> max_len 068130 3714 >>> quantum 1514 1514 1514 >>> >>> >>> -- >>> Mikael Abrahamssonemail: >>> swm...@swm.pp.se___ >>> Bloat mailing list >>> Bloat@lists.bufferbloat.net >>> https://lists.bufferbloat.net/listinfo/bloat >> > > -- > Mikael Abrahamssonemail: swm...@swm.pp.se ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
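A side note on reading the htop figures above: averaging the four per-core meters from the upstream test gives just under 50%, even though one core sits at 100%. The following is a minimal Python sketch of that arithmetic, plus a hypothetical helper that derives a per-CPU softirq share from the text of /proc/stat. The function names are made up for illustration, and the parser assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal).

```python
def summarize_cores(per_core_percent):
    """Return (average, busiest) load across cores, both in percent."""
    average = sum(per_core_percent) / len(per_core_percent)
    return average, max(per_core_percent)


def softirq_share_per_cpu(proc_stat_text):
    """Map 'cpuN' to the fraction of its jiffies spent in softirq since boot.

    Assumes the usual /proc/stat layout:
    cpuN user nice system idle iowait irq softirq steal [guest guest_nice]
    """
    shares = {}
    for line in proc_stat_text.splitlines():
        fields = line.split()
        # Skip the aggregate "cpu" line and anything that is not a per-CPU line.
        if len(fields) >= 8 and fields[0].startswith("cpu") and fields[0] != "cpu":
            counters = [int(x) for x in fields[1:]]
            total = sum(counters[:8])  # user .. steal; guest time is already in user
            shares[fields[0]] = counters[6] / total if total else 0.0
    return shares


# Per-core meters from the upstream iperf3 run quoted above.
average, busiest = summarize_cores([0.7, 100.0, 44.4, 54.2])
print(f"average {average:.1f}%, busiest core {busiest:.1f}%")
```

Since the counters in /proc/stat are cumulative, a real measurement would sample twice and take the difference; the sketch leaves that out.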
Re: [Bloat] CAKE in openwrt high CPU
On Thu, 3 Sep 2020, Sebastian Moeller wrote: Mmmh, how did you measure the sirq percentage? Some top versions show overall percentage with 100% meaning all CPUs, so 35% in a quadcore could mean 1 fully maxed out CPU (25%) plus an additional 10% spread over the other three, or something more benign. Better top (so not busybox's) or htop versions also can show the load per CPU which is helpful to pinpoint hotspots... If I run iperf3 with 10 parallel sessions then htop shows this (in the CAKE upstream direction I believe): 1 [* 0.7%] Tasks: 19, 0 thr; 2 running 2 [*100.0%] Load average: 0.48 0.16 0.05 3 [#*** 44.4%] Uptime: 10 days, 04:46:37 4 [ 54.2%] Mem[|#* 36.7M/3.84G] Swp[ 0K/0K] The other direction (-R), typically this: 1 [#*** 13.0%] Tasks: 19, 0 thr; 2 running 2 [*** 53.9%] Load average: 0.54 0.25 0.09 3 [#* 55.8%] Uptime: 10 days, 04:47:36 4 [** 84.4%] Topology is: PC - HGW -> Internet iperf3 is run on the PC, HGW has CAKE in the -> Internet direction. Best Regards Sebastian root@OpenWrt:~# tc -s qdisc qdisc noqueue 0: dev lo root refcnt 2 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 qdisc cake 8034: dev eth0 root refcnt 9 bandwidth 900Mbit diffserv3 triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw overhead 0 Sent 772001 bytes 959703 pkt (dropped 134, overlimits 221223 requeues 179) backlog 0b 0p requeues 179 memory used: 2751976b of 15140Kb capacity estimate: 900Mbit min/max network layer size: 42 /1514 min/max overhead-adjusted size: 42 /1514 average network hdr offset: 14 Bulk Best EffortVoice thresh 56250Kbit 900Mbit 225Mbit target 5.0ms5.0ms5.0ms interval 100.0ms 100.0ms 100.0ms pk_delay 0us 22us232us av_delay 0us 6us 7us sp_delay 0us 4us 5us backlog0b 0b 0b pkts0 959747 90 bytes 0 93543739440 way_inds0229640 way_miss0 2752 way_cols000 drops 0 1340 marks 000 ack_drop000 sp_flows031 bk_flows010 un_flows000 max_len 068130 3714 quantum 1514 1514 1514 -- Mikael Abrahamssonemail: swm...@swm.pp.se___ Bloat 
mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat -- Mikael Abrahamssonemail: swm...@swm.pp.se ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] CAKE in openwrt high CPU
> On 3 Sep, 2020, at 5:32 pm, Toke Høiland-Jørgensen via Bloat wrote:
>
> Yeah, offloading of some sort is another option, but I consider that
> outside of the "CAKE stays relevant" territory, since that will most
> likely involve an entirely programmable packet scheduler.

Offload of *just* shaping could be valuable in itself at higher rates, when combined with BQL, as it would avoid having to interact with the CPU-side timer infrastructure so much. It would also not be difficult at all to implement in hardware at line rate, even with overhead compensation. It's the sort of thing you could sensibly do with 74-series logic and a lookup table in a cheap SRAM, up to millions of PPS, and considerably faster in FPGA or ASIC territory.

I think that's what the questions about combining "unlimited Cake" with some other shaper are angling towards, though I suspect that the way Cake's shaper is integrated is still better than having an external one in software.

With that said, it's also possible that something a bit lighter than Cake might be appropriate at cable speeds. There is background work in this general area going on, so don't despair.

 - Jonathan Morton
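To make the "lookup table" idea a little more concrete, here is one possible reading of it as a Python sketch: a table indexed by packet length that returns how long the packet occupies the link at the shaped rate, with a fixed per-packet overhead folded in. This is only an illustration under that assumption, not a description of any actual hardware design; `build_delay_table`, `release_times`, and the chosen parameters are hypothetical.

```python
def build_delay_table(rate_bps, overhead_bytes, max_len=1514):
    """table[n] = nanoseconds an n-byte packet occupies the link at rate_bps.

    overhead_bytes is the per-packet compensation added to every length.
    """
    return [
        ((n + overhead_bytes) * 8 * 1_000_000_000) // rate_bps
        for n in range(max_len + 1)
    ]


def release_times(packet_lengths, table, start_ns=0):
    """Earliest send time for each packet when they are sent back to back."""
    times = []
    next_ns = start_ns
    for length in packet_lengths:
        times.append(next_ns)
        next_ns += table[length]  # one table lookup per packet, no division
    return times


# 1 Gbit/s with 24 bytes of extra framing on top of a 1514-byte frame.
table = build_delay_table(1_000_000_000, 24)
print(release_times([1514, 1514, 64], table))
```

The point of the table is that the per-packet work reduces to one memory read and one addition, which is the kind of operation that is cheap to do at line rate.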
Re: [Bloat] CAKE in openwrt high CPU
On 3 September 2020 17:31:07 CEST, Luca Muscariello wrote:
>On Thu, Sep 3, 2020 at 4:32 PM Toke Høiland-Jørgensen wrote:
>>
>> Luca Muscariello writes:
>>
>> > On Thu, Sep 3, 2020 at 3:19 PM Mikael Abrahamsson via Bloat wrote:
>> >>
>> >> On Tue, 1 Sep 2020, Toke Høiland-Jørgensen wrote:
>> >>
>> >> > Yup, the number of cores is only going to go up, so for CAKE to stay
>> >> > relevant it'll need to be able to take advantage of this eventually :)
>> >>
>> >> https://www.hardkernel.com/shop/odroid-h2plus/ is an interesting platform,
>> >> it has a quad core machine with 2 x 2.5GbE NICs.
>> >>
>> >> When using something like this for routing with HTB+CAKE for bidirectional
>> >> shaping below line rate, what would be the main things that would need to
>> >> be improved?
>> >
>> > IMO, hardware offloading for shaping, beyond this specific platform.
>> > I do not know whether there is any roadmap with that objective.
>>
>> Yeah, offloading of some sort is another option, but I consider that
>> outside of the "CAKE stays relevant" territory, since that will most
>> likely involve an entirely programmable packet scheduler. There was some
>> discussion of adding such a qdisc to Linux at LPC[0]. The Eiffel[1]
>> algorithm seems promising.
>>
>> -Toke
>>
>> [0] https://linuxplumbersconf.org/event/7/contributions/679/
>> [1] https://www.usenix.org/conference/nsdi19/presentation/saeed
>
>These are all interesting efforts for scheduling but orthogonal to shaping
>and not going to help make shaping more scalable.

Eiffel says it can do shaping by way of a global calendar queue... Planning to put that to the test :)

-Toke
Re: [Bloat] CAKE in openwrt high CPU
On Thu, Sep 3, 2020 at 4:32 PM Toke Høiland-Jørgensen wrote:
>
> Luca Muscariello writes:
>
> > On Thu, Sep 3, 2020 at 3:19 PM Mikael Abrahamsson via Bloat wrote:
> >>
> >> On Tue, 1 Sep 2020, Toke Høiland-Jørgensen wrote:
> >>
> >> > Yup, the number of cores is only going to go up, so for CAKE to stay
> >> > relevant it'll need to be able to take advantage of this eventually :)
> >>
> >> https://www.hardkernel.com/shop/odroid-h2plus/ is an interesting platform,
> >> it has a quad core machine with 2 x 2.5GbE NICs.
> >>
> >> When using something like this for routing with HTB+CAKE for bidirectional
> >> shaping below line rate, what would be the main things that would need to
> >> be improved?
> >
> > IMO, hardware offloading for shaping, beyond this specific platform.
> > I do not know whether there is any roadmap with that objective.
>
> Yeah, offloading of some sort is another option, but I consider that
> outside of the "CAKE stays relevant" territory, since that will most
> likely involve an entirely programmable packet scheduler. There was some
> discussion of adding such a qdisc to Linux at LPC[0]. The Eiffel[1]
> algorithm seems promising.
>
> -Toke
>
> [0] https://linuxplumbersconf.org/event/7/contributions/679/
> [1] https://www.usenix.org/conference/nsdi19/presentation/saeed

These are all interesting efforts for scheduling, but they are orthogonal to shaping and are not going to help make shaping more scalable.
Re: [Bloat] CAKE in openwrt high CPU
Luca Muscariello writes:

> On Thu, Sep 3, 2020 at 3:19 PM Mikael Abrahamsson via Bloat wrote:
>>
>> On Tue, 1 Sep 2020, Toke Høiland-Jørgensen wrote:
>>
>> > Yup, the number of cores is only going to go up, so for CAKE to stay
>> > relevant it'll need to be able to take advantage of this eventually :)
>>
>> https://www.hardkernel.com/shop/odroid-h2plus/ is an interesting platform,
>> it has a quad core machine with 2 x 2.5GbE NICs.
>>
>> When using something like this for routing with HTB+CAKE for bidirectional
>> shaping below line rate, what would be the main things that would need to
>> be improved?
>
> IMO, hardware offloading for shaping, beyond this specific platform.
> I do not know whether there is any roadmap with that objective.

Yeah, offloading of some sort is another option, but I consider that outside of the "CAKE stays relevant" territory, since that will most likely involve an entirely programmable packet scheduler. There was some discussion of adding such a qdisc to Linux at LPC[0]. The Eiffel[1] algorithm seems promising.

-Toke

[0] https://linuxplumbersconf.org/event/7/contributions/679/
[1] https://www.usenix.org/conference/nsdi19/presentation/saeed
Re: [Bloat] CAKE in openwrt high CPU
Hi Toke,

> On Sep 3, 2020, at 15:29, Toke Høiland-Jørgensen via Bloat wrote:
>
> Mikael Abrahamsson writes:
>
>> On Mon, 31 Aug 2020, Toke Høiland-Jørgensen wrote:
>>
>>> And what about when you're running CAKE in 'unlimited' mode?
>>
>> I tried this:
>>
>> # tc qdisc add dev eth0 root cake bandwidth 900mbit
>
> So the difference from before is just the lack of inbound shaping, or?

Good point, so worst-case just half the load to handle, indicating that a single CPU is sufficient for gigabit shaping, but not for dual-gigabit shaping, no?

Best Regards
        Sebastian

>
> -Toke
Re: [Bloat] CAKE in openwrt high CPU
Hi Mikael,

> On Sep 3, 2020, at 15:10, Mikael Abrahamsson via Bloat wrote:
>
> On Mon, 31 Aug 2020, Toke Høiland-Jørgensen wrote:
>
>> And what about when you're running CAKE in 'unlimited' mode?
>
> I tried this:
>
> # tc qdisc add dev eth0 root cake bandwidth 900mbit

That still employs the cake shaper, so it is not equivalent to unlimited, I believe.

[PEDANT_MODE]
900 Mbps without explicit overhead will result in a typical maximum TCP/IPv4 goodput of
900 * ((1500-20-20)/(1500+14)) = 867.899603699 Mbps
but since ethernet overhead is actually 38 bytes instead of 14, this actually occupies
(900 * ((1500-20-20)/(1500+14))) * ((1500+38)/(1500-20-20)) = 914.266842801 Mbps
on the ethernet link, which for small packets will become problematic:
(900 * ((150-20-20)/(150+14))) * ((150+38)/(150-20-20)) = 1031.70731707 Mbps
gross speed out of the 1000 Mbps that Gigabit ethernet offers. In fact, packet sizes below 202 bytes will spend all the "credit" you got from reducing the shaper rate to 900 Mbps in the first place:
(900 * ((202-20-20)/(202+14))) * ((202+38)/(202-20-20)) = 1000 Mbps
Maybe tell cake that you run on ethernet by adding the "ethernet" keyword, which will take care of both the per-packet overhead of 38 bytes and the minimum packet size on the link of 84 bytes? Please note that for throughput this does not really matter that much, but latency-under-load is not going to be pretty when too many small packets are in flight...
[/PEDANT_MODE]

> This seems fine from a performance point of view (not that high sirq%, around
> 35%) and does seem to limit my upstream traffic correctly. Not sure it helps
> though, at these speeds the bufferbloat problem is not that obvious and easy
> to test over the Internet :)

Mmmh, how did you measure the sirq percentage? Some top versions show overall percentage with 100% meaning all CPUs, so 35% in a quadcore could mean 1 fully maxed out CPU (25%) plus an additional 10% spread over the other three, or something more benign.
Better top (so not busybox's) or htop versions also can show the load per CPU which is helpful to pinpoint hotspots... Best Regards Sebastian > > root@OpenWrt:~# tc -s qdisc > qdisc noqueue 0: dev lo root refcnt 2 > Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) > backlog 0b 0p requeues 0 > qdisc cake 8034: dev eth0 root refcnt 9 bandwidth 900Mbit diffserv3 > triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw overhead 0 > Sent 772001 bytes 959703 pkt (dropped 134, overlimits 221223 requeues 179) > backlog 0b 0p requeues 179 > memory used: 2751976b of 15140Kb > capacity estimate: 900Mbit > min/max network layer size: 42 /1514 > min/max overhead-adjusted size: 42 /1514 > average network hdr offset: 14 > > Bulk Best EffortVoice > thresh 56250Kbit 900Mbit 225Mbit > target 5.0ms5.0ms5.0ms > interval 100.0ms 100.0ms 100.0ms > pk_delay 0us 22us232us > av_delay 0us 6us 7us > sp_delay 0us 4us 5us > backlog0b 0b 0b > pkts0 959747 90 > bytes 0 93543739440 > way_inds0229640 > way_miss0 2752 > way_cols000 > drops 0 1340 > marks 000 > ack_drop000 > sp_flows031 > bk_flows010 > un_flows000 > max_len 068130 3714 > quantum 1514 1514 1514 > > > -- > Mikael Abrahamssonemail: > swm...@swm.pp.se___ > Bloat mailing list > Bloat@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/bloat ___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
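The overhead arithmetic in the PEDANT_MODE section of this message can be checked with a few lines of Python. This is a minimal sketch that evaluates the same formulas with one packet size throughout; the function names are made up for illustration, and it assumes plain TCP/IPv4 (20 + 20 header bytes), 14 bytes counted per packet by the shaper, and 38 bytes of actual per-packet Ethernet overhead, as stated in the mail.

```python
IP_TCP_HEADERS = 20 + 20  # IPv4 + TCP headers without options
SHAPER_FRAMING = 14       # per-packet bytes the shaper accounts for here
WIRE_FRAMING = 38         # per-packet Ethernet overhead on the wire


def goodput_mbps(shaper_mbps, ip_len):
    """TCP payload rate when the shaper is saturated with ip_len-byte packets."""
    return shaper_mbps * (ip_len - IP_TCP_HEADERS) / (ip_len + SHAPER_FRAMING)


def wire_rate_mbps(shaper_mbps, ip_len):
    """Gross rate those packets actually occupy on the Ethernet link."""
    return shaper_mbps * (ip_len + WIRE_FRAMING) / (ip_len + SHAPER_FRAMING)


def break_even_ip_len(shaper_mbps, link_mbps):
    """Packet size at which the wire rate equals the link rate.

    Solves shaper * (s + 38) / (s + 14) = link for s.
    """
    return (shaper_mbps * WIRE_FRAMING - link_mbps * SHAPER_FRAMING) / (
        link_mbps - shaper_mbps
    )


for size in (1500, 202, 150, 100):
    print(size, round(wire_rate_mbps(900, size), 2))
print("break-even size:", break_even_ip_len(900, 1000))
```

For 1500-byte packets the wire rate stays comfortably below 1000 Mbps, while anything smaller than the break-even size pushes the gross rate above what the link can carry, which is where the shaper stops controlling the queue.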
Re: [Bloat] CAKE in openwrt high CPU
Mikael Abrahamsson writes:

> On Mon, 31 Aug 2020, Toke Høiland-Jørgensen wrote:
>
>> And what about when you're running CAKE in 'unlimited' mode?
>
> I tried this:
>
> # tc qdisc add dev eth0 root cake bandwidth 900mbit

So the difference from before is just the lack of inbound shaping, or?

-Toke
Re: [Bloat] CAKE in openwrt high CPU
Mikael Abrahamsson writes:

> On Tue, 1 Sep 2020, Toke Høiland-Jørgensen wrote:
>
>> Yup, the number of cores is only going to go up, so for CAKE to stay
>> relevant it'll need to be able to take advantage of this eventually :)
>
> https://www.hardkernel.com/shop/odroid-h2plus/ is an interesting platform,
> it has a quad core machine with 2 x 2.5GbE NICs.
>
> When using something like this for routing with HTB+CAKE for bidirectional
> shaping below line rate, what would be the main things that would need to
> be improved?

The aforementioned multi-processor support...

-Toke
Re: [Bloat] CAKE in openwrt high CPU
On Thu, Sep 3, 2020 at 3:19 PM Mikael Abrahamsson via Bloat wrote:
>
> On Tue, 1 Sep 2020, Toke Høiland-Jørgensen wrote:
>
> > Yup, the number of cores is only going to go up, so for CAKE to stay
> > relevant it'll need to be able to take advantage of this eventually :)
>
> https://www.hardkernel.com/shop/odroid-h2plus/ is an interesting platform,
> it has a quad core machine with 2 x 2.5GbE NICs.
>
> When using something like this for routing with HTB+CAKE for bidirectional
> shaping below line rate, what would be the main things that would need to
> be improved?

IMO, hardware offloading for shaping, beyond this specific platform. I do not know whether there is any roadmap with that objective.

>
> --
> Mikael Abrahamsson    email: swm...@swm.pp.se
Re: [Bloat] CAKE in openwrt high CPU
On Tue, 1 Sep 2020, Toke Høiland-Jørgensen wrote:

> Yup, the number of cores is only going to go up, so for CAKE to stay
> relevant it'll need to be able to take advantage of this eventually :)

https://www.hardkernel.com/shop/odroid-h2plus/ is an interesting platform, it has a quad core machine with 2 x 2.5GbE NICs.

When using something like this for routing with HTB+CAKE for bidirectional shaping below line rate, what would be the main things that would need to be improved?

--
Mikael Abrahamsson    email: swm...@swm.pp.se
Re: [Bloat] CAKE in openwrt high CPU
On Mon, 31 Aug 2020, Toke Høiland-Jørgensen wrote: And what about when you're running CAKE in 'unlimited' mode? I tried this: # tc qdisc add dev eth0 root cake bandwidth 900mbit This seems fine from a performance point of view (not that high sirq%, around 35%) and does seem to limit my upstream traffic correctly. Not sure it helps though, at these speeds the bufferbloat problem is not that obvious and easy to test over the Internet :) root@OpenWrt:~# tc -s qdisc qdisc noqueue 0: dev lo root refcnt 2 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 qdisc cake 8034: dev eth0 root refcnt 9 bandwidth 900Mbit diffserv3 triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw overhead 0 Sent 772001 bytes 959703 pkt (dropped 134, overlimits 221223 requeues 179) backlog 0b 0p requeues 179 memory used: 2751976b of 15140Kb capacity estimate: 900Mbit min/max network layer size: 42 /1514 min/max overhead-adjusted size: 42 /1514 average network hdr offset: 14 Bulk Best EffortVoice thresh 56250Kbit 900Mbit 225Mbit target 5.0ms5.0ms5.0ms interval 100.0ms 100.0ms 100.0ms pk_delay 0us 22us232us av_delay 0us 6us 7us sp_delay 0us 4us 5us backlog0b 0b 0b pkts0 959747 90 bytes 0 93543739440 way_inds0229640 way_miss0 2752 way_cols000 drops 0 1340 marks 000 ack_drop000 sp_flows031 bk_flows010 un_flows000 max_len 068130 3714 quantum 1514 1514 1514 -- Mikael Abrahamssonemail: swm...@swm.pp.se___ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat
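For readers trying to make sense of the "thresh" row in the tc output above: with diffserv3 the Bulk tin shows 1/16 of the configured bandwidth and the Voice tin 1/4, while Best Effort gets the full rate. The sketch below reproduces those figures in Python. The ratios are read off the tc outputs posted in this thread (the diffserv4 output elsewhere in the thread additionally shows Video at 1/2); the shift-based rounding and the names used are assumptions for illustration, not taken from the CAKE source.

```python
# Right-shift applied to the configured bandwidth for each tin.
TIN_SHIFTS = {
    "diffserv3": {"Bulk": 4, "Best Effort": 0, "Voice": 2},
    "diffserv4": {"Bulk": 4, "Best Effort": 0, "Video": 1, "Voice": 2},
}


def tin_thresholds_kbit(bandwidth_kbit, mode):
    """Return the per-tin threshold in Kbit for a given shaper bandwidth."""
    return {
        tin: bandwidth_kbit >> shift
        for tin, shift in TIN_SHIFTS[mode].items()
    }


# 900Mbit with diffserv3, as in the output above.
print(tin_thresholds_kbit(900_000, "diffserv3"))
# 22478Kbit with diffserv4, as in the IQrouter output elsewhere in the thread.
print(tin_thresholds_kbit(22_478, "diffserv4"))
```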
Re: [Bloat] CAKE in openwrt high CPU
Jonathan Foulkes writes:

>> Right, so some benefit might be possible here. Does the NIC have
>> multiple hardware queues (`ls /sys/class/net/$IFACE/queues` should tell
>> you)?
>
> Here is the output of:
> /sys/devices/virtual/net/eth0.2/queues# ls
> rx-0 tx-0
> /sys/devices/virtual/net/eth0.2/queues/rx-0# cat rps_cpus
> 0
>
> /sys/devices/virtual/net/eth0.2/queues/tx-0# cat xps_cpus
> 0

Hmm, so no multiq support on this driver, it looks like. So not sure to what extent it will be possible to effectively utilise both cores on this box, sadly :/

>> Yup, the number of cores is only going to go up, so for CAKE to stay
>> relevant it'll need to be able to take advantage of this eventually :)
>
> True, the mid-range market is already there, and so soon will be the
> lower-end. And with ISPs lighting up more and more capacity, the
> demand will be there to be able to shape higher and higher rates.
>
> But I agree with Jonathan Morton that once every device has sufficient
> capacity, more makes no difference. I went from 100/15 to 300/24 and
> never noticed the difference.
>
> Hell, there are days I switch to my backup 10/0.7 DSL line for a test,
> and forget to switch back, and will work for hours and not notice I’m
> not on the 300Mbps line ;-)

Heh, if you can live with a 10/0.7 line without noticing I think you're more patient than me ;)

But still, fair point; doesn't mean that people won't still *want* to run at higher speeds, though... :)

-Toke
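The sysfs check discussed above can also be scripted. Below is a small Python sketch that counts rx-/tx- queues from a directory listing and decodes an rps_cpus or xps_cpus value into a list of CPU numbers. The helper names are hypothetical; the decoding assumes the usual sysfs format of a hexadecimal CPU bitmap, with commas separating 32-bit groups on machines with many CPUs.

```python
def count_queues(names):
    """Count receive and transmit queues from a queues/ directory listing."""
    rx = sum(1 for name in names if name.startswith("rx-"))
    tx = sum(1 for name in names if name.startswith("tx-"))
    return rx, tx


def cpus_in_mask(mask_text):
    """Decode a hex CPU bitmap such as '3' or '00000001,00000000'."""
    value = int(mask_text.strip().replace(",", ""), 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


# Values from the output quoted above: one queue pair, no CPUs selected.
print(count_queues(["rx-0", "tx-0"]))  # a single rx and a single tx queue
print(cpus_in_mask("0"))               # empty list: RPS/XPS not configured
```

On a live system the listing could come from `os.listdir(f"/sys/class/net/{iface}/queues")`, with `iface` being whatever interface is under test.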
Re: [Bloat] CAKE in openwrt high CPU
> Right, so some benefit might be possible here. Does the NIC have
> multiple hardware queues (`ls /sys/class/net/$IFACE/queues` should tell
> you)?

Here is the output of:

/sys/devices/virtual/net/eth0.2/queues# ls
rx-0 tx-0
/sys/devices/virtual/net/eth0.2/queues/rx-0# cat rps_cpus
0
/sys/devices/virtual/net/eth0.2/queues/tx-0# cat xps_cpus
0

> Yup, the number of cores is only going to go up, so for CAKE to stay
> relevant it'll need to be able to take advantage of this eventually :)

True, the mid-range market is already there, and so soon will be the lower-end. And with ISPs lighting up more and more capacity, the demand will be there to be able to shape higher and higher rates.

But I agree with Jonathan Morton that once every device has sufficient capacity, more makes no difference. I went from 100/15 to 300/24 and never noticed the difference.

Hell, there are days I switch to my backup 10/0.7 DSL line for a test, and forget to switch back, and will work for hours and not notice I’m not on the 300Mbps line ;-)

Cheers,

Jonathan

> On Sep 1, 2020, at 5:11 PM, Toke Høiland-Jørgensen wrote:
>
> Jonathan Foulkes writes:
>
>> Thanks Toke, we currently are on an MT7621a @880, so a dual-core.
>
> Right, so some benefit might be possible here. Does the NIC have
> multiple hardware queues (`ls /sys/class/net/$IFACE/queues` should tell
> you)?
>
>> And we are looking for a good quad-core platform that will support
>> 600Mbps or more with Cake enabled, hopefully with AX radios as well.
>
> Yup, the number of cores is only going to go up, so for CAKE to stay
> relevant it'll need to be able to take advantage of this eventually :)
>
> -Toke
Re: [Bloat] CAKE in openwrt high CPU
Jonathan Foulkes writes:

> Thanks Toke, we currently are on an MT7621a @880, so a dual-core.

Right, so some benefit might be possible here. Does the NIC have multiple hardware queues (`ls /sys/class/net/$IFACE/queues` should tell you)?

> And we are looking for a good quad-core platform that will support
> 600Mbps or more with Cake enabled, hopefully with AX radios as well.

Yup, the number of cores is only going to go up, so for CAKE to stay relevant it'll need to be able to take advantage of this eventually :)

-Toke
Re: [Bloat] CAKE in openwrt high CPU
Jonathan Morton writes:

>> On 1 Sep, 2020, at 9:45 pm, Toke Høiland-Jørgensen via Bloat wrote:
>>
>> CAKE takes the global qdisc lock.
>
> Presumably this is a default mechanism because CAKE doesn't handle any
> locking itself.
>
> Obviously it would need to be replaced with at least a lock over
> CAKE's complete data structures, taking the lock on each entry point
> and releasing it at each return point, and I assume there is a flag we
> can set to indicate we do so. Finer-grained locking might be possible,
> but CAKE is fairly complex so that might be hard to implement. Locking
> per CAKE instance would at least allow running ingress and egress on
> different CPUs.

What you're describing here is basically the existing qdisc root lock. It is per instance of the qdisc, and it is held only while enqueueing and dequeueing packets from that qdisc. So it is possible today to run the ingress and egress instances of CAKE on different CPUs. All you have to do is schedule the packets to be processed on different CPUs in the different directions - which usually means messing with RPS settings for the NIC, and as I remarked to Sebastian, for many OpenWrt SoCs this is not really supported...

To make CAKE truly take advantage of multiple CPUs, there are two options:

1. Make it aware of multiple hardware queues. To do this, we need to implement the 'attach()' method in the Qdisc_ops struct (see sch_mq for an example). The idea here would be to create stub child qdiscs with a separate struct Qdisc_ops implementing enqueue() and dequeue(). These would be called separately for each hardware queue, with their separate locks held at the time; and with proper XPS steering, each hardware queue can be serviced by a separate CPU.

2. Set the TCQ_F_NOLOCK flag in the qdisc flags; this will cause the existing enqueue() and dequeue() functions to be called without the root lock being held, and the qdisc is responsible for dealing with that itself.

Of course in either case, the trick is to get the CAKE data structures to play nice with concurrent access from multiple CPUs. For option 1 above, we could just duplicate all the flow queues for each netdev queue and take the hit in wasted space - or we could partition the data structure, either statically at init, or dynamically as each flow becomes active. But at a minimum there would need to be some way for the shaper to enforce the maximum rate. Maybe a granular lock or an atomic is good enough for this, though?

Note also that for option 2 there's an ongoing issue[0] with packets getting stuck which is still unresolved, as far as I can tell - so not sure if this is the right way to go. However, apart from this, the benefit of option 2 is that CAKE could *potentially* process packets on multiple CPUs without relying on hardware multi-queue. I'm not quite sure if the stack will actually process packets on more than one CPU without multiple hardware queues, though.

Either way, I suppose some experimentation would be needed to find the best solution.

-Toke

[0] https://lore.kernel.org/netdev/CACS=qq+a0H=e8ylfu95ae7hr0bq9ytcbbn2rfx82ojnppkb...@mail.gmail.com/
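To illustrate the "granular lock" idea for the shaper mentioned in this message: if each hardware queue had its own flow state, the only thing that would have to be shared is the shaper's notion of when the next packet may leave. The following is a user-space Python sketch of that one shared variable behind a small lock. It is not kernel code and not how CAKE is implemented; the class and method names are hypothetical.

```python
import threading


class SharedShaper:
    """One shared 'next send time' guarded by a small lock."""

    def __init__(self, rate_bps):
        self.rate_bps = rate_bps
        self.next_send_ns = 0
        self._lock = threading.Lock()

    def reserve(self, length_bytes, now_ns):
        """Reserve link time for one packet; return when it may be sent."""
        cost_ns = length_bytes * 8 * 1_000_000_000 // self.rate_bps
        with self._lock:
            # Only this read-modify-write needs to be serialised.
            send_at = max(self.next_send_ns, now_ns)
            self.next_send_ns = send_at + cost_ns
        return send_at


def worker(shaper, packets, length_bytes):
    # Stands in for one per-queue instance dequeueing packets.
    for _ in range(packets):
        shaper.reserve(length_bytes, now_ns=0)


shaper = SharedShaper(rate_bps=1_000_000_000)
threads = [threading.Thread(target=worker, args=(shaper, 100, 125)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shaper.next_send_ns)  # total link time reserved by all four workers
```

The critical section is only a compare and an add, which is why an atomic operation might be enough in a real implementation; whether that holds once flow isolation and the tin logic are involved is exactly the open question raised above.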
Re: [Bloat] CAKE in openwrt high CPU
> On 1 Sep, 2020, at 11:04 pm, Sebastian Moeller wrote:
>
>> The challenge is the end users, who only understand the silly ’speed’
>> metric, and feel anything that lowers that number is a ‘bad’ thing. It takes
>> effort to get even technical users to get it.
>
> I repeatedly fall into that trap...

For a lot of users, I rather suspect that setting 40/10 Mbps would give them entirely sufficient speed, and most existing CPE would be able to keep up with those settings even with all of Cake's bells and whistles turned on.

The trouble is that that might be 10% of what the cable company is advertising to them.

 - Jonathan Morton
Re: [Bloat] CAKE in openwrt high CPU
Hi Jonathan,

> On Sep 1, 2020, at 21:31, Jonathan Foulkes wrote:
>
> Hi Sebastian, Cake functions wonderfully, it’s a marvel in terms of goodput.
>
> My comment was more oriented at the metrics process users use to evaluate
> results. Only those who spend time analyzing just how busy an ‘idle’ network
> can be know that there are a lot of processes in constant communications with
> their cloud services.

True. Interestingly, quite a number of speedtests seem to err on the side of too high, probably because that way users are happy to see something close to their contracted rates...

> The challenge is the end users, who only understand the silly ’speed’
> metric, and feel anything that lowers that number is a ‘bad’ thing. It takes
> effort to get even technical users to get it.

I repeatedly fall into that trap...

> But even beyond the basic, the further cuts induced by fairness are the new
> wrinkle in dealing with widely varying speed test results with isolation
> enabled on a busy network.

Yes, but one can try to make lemonade out of it, by running speedtests from two devices while observing something like "sudo mtr -ezb4 -i 0.3 8.8.8.8" not budging much even though the tests come and go; demonstrating the quality of the isolation and that low queueing delay can "happen" even on a busy link.

>
> The high density of devices and constant chatter with cloud services means
> the average home has way more devices and connections than many realize. Keep
> a note of the number of ‘active connections’ displayed on the OpenWRT
> overview page, you might be surprised (well, not you Seb ;) )

Count me in, I just switched over to a Turris Omnia (which I had crowd-funded before I realized IQrouters will be delivered to Germany ;) ) and while playing with its pakon feature I was quite baffled by how many addresses are used even in a short amount of time. (All of this is just a hobby to me, so I keep forgetting stuff regularly, because I do approach things a bit casually at times).
> > As an example, on my network, I average 1,000 active connections all day, it > rarely drops below 700. And it’s just two WFH professionals and 60+ network > devices, not all of which are active at any one time. > I actually run some custom firewall rules to de-prioritize four IoT devices > that generate a LOT of traffic to their services. Two of which power panel > monitors with real-time updates. This is why my bulk tin on egress has such > high traffic. Nice, I think being able to deprioritize stuff is one of the best reasons for using diffserve. > > Since you like to see tc output, here’s the one from my system after nearly a > week. > I run four-layer Cake as we do a lot of Zoom calls and our accounts are set > up to do the appropriate DSCP marking. I saw your nice writeup of how to do that on the OpenWrt forum IIRC. Need to talk to our IT guys at work, whether they are willing to actually configure it in the first place. > > root@IQrouter:~# tc -s qdisc > qdisc noqueue 0: dev lo root refcnt 2 > Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) > backlog 0b 0p requeues 0 > qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 > target 5.0ms interval 100.0ms memory_limit 4Mb ecn > Sent 51311363856 bytes 86785488 pkt (dropped 53, overlimits 0 requeues 9114) > backlog 0b 0p requeues 9114 > maxpacket 12112 drop_overlimit 0 new_flow_count 691740 ecn_mark 0 > new_flows_len 0 old_flows_len 0 > qdisc noqueue 0: dev br-lan root refcnt 2 > Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) > backlog 0b 0p requeues 0 > qdisc noqueue 0: dev eth0.1 root refcnt 2 > Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) > backlog 0b 0p requeues 0 > qdisc cake 8005: dev eth0.2 root refcnt 2 bandwidth 22478Kbit diffserv4 > dual-srchost nat nowash ack-filter split-gso rtt 100.0ms raw overhead 0 mpu > 64 > Sent 6943407136 bytes 35467722 pkt (dropped 51747, overlimits 3912091 > requeues 0) > backlog 0b 0p requeues 0 > memory used: 843816b of 
4Mb > capacity estimate: 22478Kbit > min/max network layer size: 42 /1514 > min/max overhead-adjusted size: 64 /1514 > average network hdr offset: 14 > >Bulk Best EffortVideoVoice > thresh 1404Kbit22478Kbit11239Kbit 5619Kbit > target 12.9ms5.0ms5.0ms5.0ms > interval 107.9ms 100.0ms 100.0ms 100.0ms > pk_delay5.9ms6.4ms3.7ms1.6ms > av_delay426us445us124us188us > sp_delay 13us 13us 12us 8us > backlog0b 0b 0b 0b > pkts 3984407 30899121 474818 161123 > bytes 789740113 5883832402246917562 30556915 > way_inds65175
Re: [Bloat] CAKE in openwrt high CPU
Hi Sebastian,

Cake functions wonderfully, it’s a marvel in terms of goodput.

My comment was more oriented at the metrics users rely on to evaluate results. Only those who spend time analyzing just how busy an ‘idle’ network can be know that there are a lot of processes in constant communication with their cloud services.

The challenge is the end users, who only understand the silly ‘speed’ metric and feel anything that lowers that number is a ‘bad’ thing. It takes effort to get even technical users to get it.

But even beyond the basics, the further cuts induced by fairness are the new wrinkle in dealing with widely varying speed test results with isolation enabled on a busy network.

The high density of devices and constant chatter with cloud services means the average home has way more devices and connections than many realize. Keep a note of the number of ‘active connections’ displayed on the OpenWrt overview page, you might be surprised (well, not you Seb ;) )

As an example, on my network, I average 1,000 active connections all day, and it rarely drops below 700. And it’s just two WFH professionals and 60+ network devices, not all of which are active at any one time.

I actually run some custom firewall rules to de-prioritize four IoT devices that generate a LOT of traffic to their services, two of which power panel monitors with real-time updates. This is why my bulk tin on egress has such high traffic.

Since you like to see tc output, here’s the one from my system after nearly a week. I run four-layer Cake as we do a lot of Zoom calls and our accounts are set up to do the appropriate DSCP marking.
root@IQrouter:~# tc -s qdisc qdisc noqueue 0: dev lo root refcnt 2 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn Sent 51311363856 bytes 86785488 pkt (dropped 53, overlimits 0 requeues 9114) backlog 0b 0p requeues 9114 maxpacket 12112 drop_overlimit 0 new_flow_count 691740 ecn_mark 0 new_flows_len 0 old_flows_len 0 qdisc noqueue 0: dev br-lan root refcnt 2 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 qdisc noqueue 0: dev eth0.1 root refcnt 2 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 qdisc cake 8005: dev eth0.2 root refcnt 2 bandwidth 22478Kbit diffserv4 dual-srchost nat nowash ack-filter split-gso rtt 100.0ms raw overhead 0 mpu 64 Sent 6943407136 bytes 35467722 pkt (dropped 51747, overlimits 3912091 requeues 0) backlog 0b 0p requeues 0 memory used: 843816b of 4Mb capacity estimate: 22478Kbit min/max network layer size: 42 /1514 min/max overhead-adjusted size: 64 /1514 average network hdr offset: 14 Bulk Best EffortVideoVoice thresh 1404Kbit22478Kbit11239Kbit 5619Kbit target 12.9ms5.0ms5.0ms5.0ms interval 107.9ms 100.0ms 100.0ms 100.0ms pk_delay5.9ms6.4ms3.7ms1.6ms av_delay426us445us124us188us sp_delay 13us 13us 12us 8us backlog0b 0b 0b 0b pkts 3984407 30899121 474818 161123 bytes 789740113 5883832402246917562 30556915 way_inds65175 2580935 10645 way_miss 1427 91852915960 1120 way_cols0000 drops 0 2966 5117 marks 0 10500 ack_drop04826300 sp_flows2410 bk_flows0000 un_flows0000 max_len 103543094 3094 590 quantum 300 685 342 300 qdisc ingress : dev eth0.2 parent :fff1 Sent 43188461026 bytes 67870269 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 qdisc noqueue 0: dev br-guest root refcnt 2 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 qdisc noqueue 0: dev wlan1 root refcnt 2 Sent 0 
bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 qdisc noqueue 0: dev wlan0 root refcnt 2 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 qdisc noqueue 0: dev wlan0-1 root refcnt 2 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 qdisc noqueue 0: dev wlan1-1 root refcnt 2 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 qd
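The per-tin "thresh" row in the diffserv4 output above appears to follow fixed fractions of the configured bandwidth. The sketch below reproduces the displayed values for 22478Kbit. The divisors are inferred from this one output, not taken from the CAKE source, so treat them as an assumption; `DIFFSERV4_DIVISORS` and `tin_thresholds` are hypothetical names.

```python
# Divisors inferred from the tc output in this thread (assumption, not
# copied from sch_cake): Bulk = 1/16, Best Effort = 1/1, Video = 1/2,
# Voice = 1/4 of the configured bandwidth.
DIFFSERV4_DIVISORS = {"Bulk": 16, "Best Effort": 1, "Video": 2, "Voice": 4}


def tin_thresholds(bandwidth_kbit, divisors=DIFFSERV4_DIVISORS):
    """Return the per-tin threshold in Kbit, truncated like the tc display."""
    return {tin: bandwidth_kbit // div for tin, div in divisors.items()}


if __name__ == "__main__":
    for tin, kbit in tin_thresholds(22478).items():
        print(f"{tin:12s} {kbit}Kbit")
```

For 22478Kbit this gives 1404, 22478, 11239 and 5619 Kbit, matching the thresh row posted above.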
Re: [Bloat] CAKE in openwrt high CPU
> On 1 Sep, 2020, at 9:45 pm, Toke Høiland-Jørgensen via Bloat wrote:
>
> CAKE takes the global qdisc lock.

Presumably this is a default mechanism because CAKE doesn't handle any locking itself. Obviously it would need to be replaced with at least a lock over CAKE's complete data structures, taking the lock on each entry point and releasing it at each return point, and I assume there is a flag we can set to indicate we do so.

Finer-grained locking might be possible, but CAKE is fairly complex so that might be hard to implement. Locking per CAKE instance would at least allow running ingress and egress on different CPUs.

Is there an example anywhere on how to do this?

 - Jonathan Morton
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat
Re: [Bloat] CAKE in openwrt high CPU
Thanks Toke, we currently are on an MT7621a @880, so a dual-core.

And we are looking for a good quad-core platform that will support 600Mbps or more with Cake enabled, hopefully with AX radios as well.

Jonathan

> On Sep 1, 2020, at 12:11 PM, Toke Høiland-Jørgensen wrote:
>
> Jonathan Foulkes writes:
>
>> Toke, that link returns a 404 for me.
>
> Ah, seems an extra character snuck in at the end - try this:
>
> https://github.com/dtaht/sch_cake/commit/3152477235c934022049fcddc063c45d37ec10e6
>
>> For others, I’ve found that testing cake throughput with isolation options enabled is tricky if there are many competing connections.
>> Like I keep having to tell my customers, fairness algorithms mean no one device will ever gain 100% of the bandwidth so long as there are other open & active connections from other devices.
>>
>> That said, I’d love to find options to increase throughput for single-tin configs.
>
> Yeah, doing something about this is on my list, one way or another. Not sure how much more we can do in terms of overhead, so we may have to go for multi-q (and multi-CPU) support. How many CPU cores does the IQrouter have?
>
> -Toke
Re: [Bloat] CAKE in openwrt high CPU
Sebastian Moeller writes:

> Hi Toke,
>
>> On Sep 1, 2020, at 18:11, Toke Høiland-Jørgensen via Bloat wrote:
>>
>> Jonathan Foulkes writes:
>>
>>> Toke, that link returns a 404 for me.
>>
>> Ah, seems an extra character snuck in at the end - try this:
>>
>> https://github.com/dtaht/sch_cake/commit/3152477235c934022049fcddc063c45d37ec10e6
>>
>>> For others, I’ve found that testing cake throughput with isolation options enabled is tricky if there are many competing connections.
>>> Like I keep having to tell my customers, fairness algorithms mean no one device will ever gain 100% of the bandwidth so long as there are other open & active connections from other devices.
>>>
>>> That said, I’d love to find options to increase throughput for single-tin configs.
>>
>> Yeah, doing something about this is on my list, one way or another. Not sure how much more we can do in terms of overhead, so we may have to go for multi-q (and multi-CPU) support. How many CPU cores does the IQrouter have?
>
> It might be worth looking at how the typical two cake instances distribute across the available CPUs. In some versions of OpenWrt, all of cake's and the ethernet interrupt processing crowded onto a single CPU, leading to "out of CPU" behaviour with 50% idle remaining... I think that using a different RPS scheme might work better.

Well, many home routers don't have any functional RPS at all. Also, it doesn't help since CAKE takes the global qdisc lock.

Both of those issues should be fixed, ideally :)

-Toke
Re: [Bloat] CAKE in openwrt high CPU
Hi Toke,

> On Sep 1, 2020, at 18:11, Toke Høiland-Jørgensen via Bloat wrote:
>
> Jonathan Foulkes writes:
>
>> Toke, that link returns a 404 for me.
>
> Ah, seems an extra character snuck in at the end - try this:
>
> https://github.com/dtaht/sch_cake/commit/3152477235c934022049fcddc063c45d37ec10e6
>
>> For others, I’ve found that testing cake throughput with isolation options enabled is tricky if there are many competing connections.
>> Like I keep having to tell my customers, fairness algorithms mean no one device will ever gain 100% of the bandwidth so long as there are other open & active connections from other devices.
>>
>> That said, I’d love to find options to increase throughput for single-tin configs.
>
> Yeah, doing something about this is on my list, one way or another. Not sure how much more we can do in terms of overhead, so we may have to go for multi-q (and multi-CPU) support. How many CPU cores does the IQrouter have?

It might be worth looking at how the typical two cake instances distribute across the available CPUs. In some versions of OpenWrt, all of cake's and the ethernet interrupt processing crowded onto a single CPU, leading to "out of CPU" behaviour with 50% idle remaining... I think that using a different RPS scheme might work better.

Best Regards
        Sebastian

>
> -Toke
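One way to check whether softirq work is piling up on a single core, as described above, is to compare two snapshots of /proc/stat and compute the softirq share per CPU. This is a minimal sketch, assuming the standard Linux /proc/stat field order (user, nice, system, idle, iowait, irq, softirq, steal, ...); the function names are hypothetical and not part of any OpenWrt tool.

```python
def parse_proc_stat(text):
    """Return {cpu_name: [tick counters]} for the per-CPU lines of /proc/stat."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-CPU lines start with "cpu0", "cpu1", ...; skip the aggregate "cpu".
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            stats[fields[0]] = [int(value) for value in fields[1:]]
    return stats


def softirq_share(before, after):
    """Percent of each CPU's elapsed ticks spent in softirq between snapshots."""
    shares = {}
    for cpu, new in after.items():
        deltas = [n - o for n, o in zip(new, before[cpu])]
        # Sum user..steal only; guest time is already accounted in user.
        total = sum(deltas[:8])
        shares[cpu] = 100.0 * deltas[6] / total if total else 0.0
    return shares


def snapshot(path="/proc/stat"):
    """Read and parse the current counters (Linux only)."""
    with open(path) as handle:
        return parse_proc_stat(handle.read())
```

Taking `snapshot()` twice, a second or so apart during a speed test, and passing both to `softirq_share()` would show one CPU near 100% and the others mostly idle if everything is being processed on a single core, even when the averaged figure looks like 50%.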
Re: [Bloat] CAKE in openwrt high CPU
Hi Jonathan,

> On Sep 1, 2020, at 17:41, Jonathan Foulkes wrote:
>
> Toke, that link returns a 404 for me.
>
> For others, I’ve found that testing cake throughput with isolation options enabled is tricky if there are many competing connections.

Are you talking about the fact that with competing connections, you only see the current isolation quantum's equivalent of the actual rate? In that case, maybe parse the "tc -s qdisc" output to get an idea of how much data/how many packets cake managed to push through in total in each direction, instead of relying on the measured goodput? I am probably barking up the wrong tree here...

> Like I keep having to tell my customers, fairness algorithms mean no one device will ever gain 100% of the bandwidth so long as there are other open & active connections from other devices.

That sounds like solid advice ;) Especially in light of the exceedingly useful "ingress" keyword, which, under load, will drop depending on a flow's "unresponsiveness", such that more responsive flows end up getting a somewhat bigger share of the post-cake throughput...

>
> That said, I’d love to find options to increase throughput for single-tin configs.

With or without isolation options?

Best Regards
        Sebastian

>
> Cheers,
>
> Jonathan
>
>> On Aug 31, 2020, at 7:35 AM, Toke Høiland-Jørgensen via Bloat wrote:
>>
>> Mikael Abrahamsson via Bloat writes:
>>
>>> Hi,
>>>
>>> I migrated to an APU2 (https://www.pcengines.ch/apu2.htm) as residential router, from my previous WRT1200AC (marvell armada 385).
>>>
>>> I was running OpenWrt 18.06 on that one, now I am running latest 19.07.3 on the APU2.
>>>
>>> Before I had 500/100 and I had to use FQ_CODEL because CAKE took too much CPU to be able to do 500/100 on the WRT1200AC. Now I upgraded to 1000/1000 and tried it again, and even the APU2 can only do CAKE up to ~300 megabit/s. With FQ_CODEL I get full speed (configure 900/900 in SQM in OpenWrt).
>>>
>>> Looking in top, I see sirq% sitting at 50% pegged. This is typical what I see when CPU based forwarding is maxed out. From my recollection of running CAKE on earlier versions of openwrt (17.x) I don't remember CAKE using more CPU than FQ_CODEL.
>>>
>>> Anyone know what's up? I'm fine running FQ_CODEL, it solves any bufferbloat but... I thought CAKE supposedly should use less CPU, not more?
>>
>> Hmm, you say CAKE and FQ-Codel - so you're not enabling the shaper (that would be FQ-CoDel+HTB)? An exact config might be useful (or just the output of tc -s qdisc).
>>
>> If you are indeed not shaping, maybe you're hitting the issue fixed by this commit?
>>
>> https://github.com/dtaht/sch_cake/commit/3152477235c934022049fcddc063c45d37ec10e6n
>>
>> -Toke
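The suggestion above, reading totals from "tc -s qdisc" instead of trusting a single speed test's goodput, can be sketched as follows. This is a rough illustration, assuming the "Sent N bytes M pkt" line format shown elsewhere in this thread; `sent_counters` and `average_rate_mbit` are hypothetical helper names, and the two captures are assumed to be taken a known number of seconds apart.

```python
import re

# Matches each qdisc header and the first "Sent ... bytes ... pkt" after it.
QDISC_RE = re.compile(
    r"qdisc\s+(\S+)\s+\S+\s+dev\s+(\S+).*?Sent\s+(\d+)\s+bytes\s+(\d+)\s+pkt",
    re.S,
)


def sent_counters(tc_output):
    """Map (qdisc kind, device) to its cumulative (bytes, packets) counters."""
    return {
        (kind, dev): (int(sent_bytes), int(sent_pkts))
        for kind, dev, sent_bytes, sent_pkts in QDISC_RE.findall(tc_output)
    }


def average_rate_mbit(before, after, seconds):
    """Average rate in Mbit/s per qdisc between two captures."""
    rates = {}
    for key, (bytes_after, _pkts) in after.items():
        if key in before:
            delta_bytes = bytes_after - before[key][0]
            rates[key] = delta_bytes * 8 / seconds / 1e6
    return rates
```

Capturing the output before and after a test run and comparing the cake entries for the WAN interface and its IFB device would show what the shaper pushed through in total in each direction, including traffic from all other devices that the speed test itself never sees.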
Re: [Bloat] CAKE in openwrt high CPU
Jonathan Foulkes writes:

> Toke, that link returns a 404 for me.

Ah, seems an extra character snuck in at the end - try this:

https://github.com/dtaht/sch_cake/commit/3152477235c934022049fcddc063c45d37ec10e6

> For others, I’ve found that testing cake throughput with isolation options enabled is tricky if there are many competing connections.
> Like I keep having to tell my customers, fairness algorithms mean no one device will ever gain 100% of the bandwidth so long as there are other open & active connections from other devices.
>
> That said, I’d love to find options to increase throughput for single-tin configs.

Yeah, doing something about this is on my list, one way or another. Not sure how much more we can do in terms of overhead, so we may have to go for multi-q (and multi-CPU) support. How many CPU cores does the IQrouter have?

-Toke
Re: [Bloat] CAKE in openwrt high CPU
Toke, that link returns a 404 for me.

For others, I’ve found that testing cake throughput with isolation options enabled is tricky if there are many competing connections.
Like I keep having to tell my customers, fairness algorithms mean no one device will ever gain 100% of the bandwidth so long as there are other open & active connections from other devices.

That said, I’d love to find options to increase throughput for single-tin configs.

Cheers,

Jonathan

> On Aug 31, 2020, at 7:35 AM, Toke Høiland-Jørgensen via Bloat wrote:
>
> Mikael Abrahamsson via Bloat writes:
>
>> Hi,
>>
>> I migrated to an APU2 (https://www.pcengines.ch/apu2.htm) as residential router, from my previous WRT1200AC (marvell armada 385).
>>
>> I was running OpenWrt 18.06 on that one, now I am running latest 19.07.3 on the APU2.
>>
>> Before I had 500/100 and I had to use FQ_CODEL because CAKE took too much CPU to be able to do 500/100 on the WRT1200AC. Now I upgraded to 1000/1000 and tried it again, and even the APU2 can only do CAKE up to ~300 megabit/s. With FQ_CODEL I get full speed (configure 900/900 in SQM in OpenWrt).
>>
>> Looking in top, I see sirq% sitting at 50% pegged. This is typical what I see when CPU based forwarding is maxed out. From my recollection of running CAKE on earlier versions of openwrt (17.x) I don't remember CAKE using more CPU than FQ_CODEL.
>>
>> Anyone know what's up? I'm fine running FQ_CODEL, it solves any bufferbloat but... I thought CAKE supposedly should use less CPU, not more?
>
> Hmm, you say CAKE and FQ-Codel - so you're not enabling the shaper (that would be FQ-CoDel+HTB)? An exact config might be useful (or just the output of tc -s qdisc).
>
> If you are indeed not shaping, maybe you're hitting the issue fixed by this commit?
>
> https://github.com/dtaht/sch_cake/commit/3152477235c934022049fcddc063c45d37ec10e6n
>
> -Toke
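The point that no single device can get 100% of the bandwidth while others are active can be illustrated with an idealised max-min fair allocation. This is only a sketch of the principle: CAKE's host isolation approximates this kind of sharing with its own scheduler, which this code does not reproduce, and the host names and demands below are made up for illustration.

```python
def max_min_fair_share(capacity, demands):
    """Idealised max-min fair split of `capacity` among hosts.

    `demands` maps a host name to the rate it would like to use.
    Hosts asking for less than an equal share get their full demand;
    the leftover is split equally among the remaining hosts.
    """
    allocation = {host: 0.0 for host in demands}
    unsatisfied = dict(demands)
    remaining = float(capacity)
    while unsatisfied and remaining > 1e-9:
        share = remaining / len(unsatisfied)
        satisfied = {h: d for h, d in unsatisfied.items() if d <= share}
        if not satisfied:
            # Everyone left wants more than the equal share: cap them all.
            for host in unsatisfied:
                allocation[host] = share
            break
        for host, demand in satisfied.items():
            allocation[host] = demand
            remaining -= demand
            del unsatisfied[host]
    return allocation


if __name__ == "__main__":
    # Hypothetical hosts on a 900 Mbit/s link (rates in Mbit/s).
    demo = {"speedtest_pc": 900, "iot_panel": 20, "zoom_laptop": 4}
    for host, rate in max_min_fair_share(900, demo).items():
        print(f"{host:14s} {rate:6.1f} Mbit/s")
```

With these made-up numbers the speed test host ends up with 876 Mbit/s rather than 900, simply because two other hosts are using 24 Mbit/s between them. With several bulk flows from other hosts, the speed test's share drops much further.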
Re: [Bloat] CAKE in openwrt high CPU
Mikael Abrahamsson writes:

> On Mon, 31 Aug 2020, Toke Høiland-Jørgensen wrote:
>
>> Hmm, you say CAKE and FQ-Codel - so you're not enabling the shaper (that would be FQ-CoDel+HTB)? An exact config might be useful (or just the output of tc -s qdisc).
>
> Yeah, I guess I'm also using HTB to get the 900 megabit/s SQM is looking for.

Ah, right, makes more sense :)

> If I only use FQ_CODEL to get interface speeds my performance is fine.

And what about when you're running CAKE in 'unlimited' mode?

>> If you are indeed not shaping, maybe you're hitting the issue fixed by this commit?
>>
>> https://github.com/dtaht/sch_cake/commit/3152477235c934022049fcddc063c45d37ec10e6n
>
> I enabled it just now to get the config.
>
> qdisc cake 8030: dev eth0 root refcnt 9 bandwidth 900Mbit besteffort triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw overhead 0

Hmm, right, you could try no-split-gso as an option as well; you're pretty close to the point where we turn it off by default, and you're getting pretty large packets (max_len), so your performance may be suffering from the splitting...

-Toke
Re: [Bloat] CAKE in openwrt high CPU
On Mon, 31 Aug 2020, Toke Høiland-Jørgensen wrote:

> Hmm, you say CAKE and FQ-Codel - so you're not enabling the shaper (that would be FQ-CoDel+HTB)? An exact config might be useful (or just the output of tc -s qdisc).

Yeah, I guess I'm also using HTB to get the 900 megabit/s SQM is looking for.

If I only use FQ_CODEL to get interface speeds my performance is fine.

> If you are indeed not shaping, maybe you're hitting the issue fixed by this commit?
>
> https://github.com/dtaht/sch_cake/commit/3152477235c934022049fcddc063c45d37ec10e6n

I enabled it just now to get the config.

qdisc cake 8030: dev eth0 root refcnt 9 bandwidth 900Mbit besteffort triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw overhead 0
 Sent 4346128 bytes 11681 pkt (dropped 0, overlimits 1004 requeues 17)
 backlog 0b 0p requeues 17
 memory used: 33328b of 15140Kb
 capacity estimate: 900Mbit
 min/max network layer size:           42 /    1514
 min/max overhead-adjusted size:       42 /    1514
 average network hdr offset:           14

                  Tin 0
  thresh        900Mbit
  target          5.0ms
  interval      100.0ms
  pk_delay         18us
  av_delay          6us
  sp_delay          4us
  backlog            0b
  pkts            11681
  bytes         4346128
  way_inds           30
  way_miss          735
  way_cols            0
  drops               0
  marks               0
  ack_drop            0
  sp_flows            3
  bk_flows            1
  un_flows            0
  max_len         22710
  quantum          1514

qdisc ingress : dev eth0 parent :fff1
 Sent 4716199 bytes 10592 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
...
qdisc cake 8031: dev ifb4eth0 root refcnt 2 bandwidth 900Mbit besteffort triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw overhead 0
 Sent 4946683 bytes 10592 pkt (dropped 0, overlimits 492 requeues 0)
 backlog 0b 0p requeues 0
 memory used: 35Kb of 15140Kb
 capacity estimate: 900Mbit
 min/max network layer size:           60 /    1514
 min/max overhead-adjusted size:       60 /    1514
 average network hdr offset:           14

                  Tin 0
  thresh        900Mbit
  target          5.0ms
  interval      100.0ms
  pk_delay         19us
  av_delay          6us
  sp_delay          4us
  backlog            0b
  pkts            10592
  bytes         4946683
  way_inds           33
  way_miss          969
  way_cols            0
  drops               0
  marks               0
  ack_drop            0
  sp_flows            2
  bk_flows            1
  un_flows            0
  max_len         21196
  quantum          1514

--
Mikael Abrahamsson    email: swm...@swm.pp.se
Re: [Bloat] CAKE in openwrt high CPU
Mikael Abrahamsson via Bloat writes:

> Hi,
>
> I migrated to an APU2 (https://www.pcengines.ch/apu2.htm) as residential router, from my previous WRT1200AC (marvell armada 385).
>
> I was running OpenWrt 18.06 on that one, now I am running latest 19.07.3 on the APU2.
>
> Before I had 500/100 and I had to use FQ_CODEL because CAKE took too much CPU to be able to do 500/100 on the WRT1200AC. Now I upgraded to 1000/1000 and tried it again, and even the APU2 can only do CAKE up to ~300 megabit/s. With FQ_CODEL I get full speed (configure 900/900 in SQM in OpenWrt).
>
> Looking in top, I see sirq% sitting at 50% pegged. This is typical what I see when CPU based forwarding is maxed out. From my recollection of running CAKE on earlier versions of openwrt (17.x) I don't remember CAKE using more CPU than FQ_CODEL.
>
> Anyone know what's up? I'm fine running FQ_CODEL, it solves any bufferbloat but... I thought CAKE supposedly should use less CPU, not more?

Hmm, you say CAKE and FQ-Codel - so you're not enabling the shaper (that would be FQ-CoDel+HTB)? An exact config might be useful (or just the output of tc -s qdisc).

If you are indeed not shaping, maybe you're hitting the issue fixed by this commit?

https://github.com/dtaht/sch_cake/commit/3152477235c934022049fcddc063c45d37ec10e6n

-Toke
Re: [Bloat] CAKE in openwrt high CPU
Cake reschedules too much compared to the tweaks we have to keep htb fed, at these rates. It was kind of my hope to gain a hw assist in future versions of the apu series. A programmable completion interrupt is available in some versions of that chipset.

On Sun, Aug 30, 2020 at 10:27 AM Mikael Abrahamsson via Bloat wrote:
>
> Hi,
>
> I migrated to an APU2 (https://www.pcengines.ch/apu2.htm) as residential router, from my previous WRT1200AC (marvell armada 385).
>
> I was running OpenWrt 18.06 on that one, now I am running latest 19.07.3 on the APU2.
>
> Before I had 500/100 and I had to use FQ_CODEL because CAKE took too much CPU to be able to do 500/100 on the WRT1200AC. Now I upgraded to 1000/1000 and tried it again, and even the APU2 can only do CAKE up to ~300 megabit/s. With FQ_CODEL I get full speed (configure 900/900 in SQM in OpenWrt).
>
> Looking in top, I see sirq% sitting at 50% pegged. This is typical what I see when CPU based forwarding is maxed out. From my recollection of running CAKE on earlier versions of openwrt (17.x) I don't remember CAKE using more CPU than FQ_CODEL.
>
> Anyone know what's up? I'm fine running FQ_CODEL, it solves any bufferbloat but... I thought CAKE supposedly should use less CPU, not more?
>
> --
> Mikael Abrahamsson    email: swm...@swm.pp.se

--
"For a successful technology, reality must take precedence over public relations, for Mother Nature cannot be fooled" - Richard Feynman

d...@taht.net
CTO, TekLibre, LLC
Tel: 1-831-435-0729