Re: [Bloat] Other CAKE territory (was: CAKE in openwrt high CPU)

2020-09-03 Thread Jonathan Morton
> On 4 Sep, 2020, at 1:14 am, David Collier-Brown  wrote:
> 
> I'm wondering if edge servers with 1Gb NICs are inside the "CAKE stays 
> relevant" territory?  

Edge servers usually have strong enough CPUs and I/O - by which I mean anything 
from AMD K8 and Intel Core 2 onwards with PCIe attached NICs - to run Cake at 
1Gbps without needing special measures.  I should run a test to see how much I 
can shove through an AMD Bobcat these days - not exactly a speed demon.

We're usually seeing problems with the smaller-scale CPUs found in CPE SoCs, 
which are very much geared to take advantage of hardware accelerated packet 
forwarding.  I think in some cases there might actually be insufficient 
internal I/O bandwidth to get 1Gbps out of the NIC, into the CPU, and back out 
to the NIC again, only through the dedicated forwarding path.  That could 
manifest itself as a lot of kernel time spent waiting for the hardware, and can 
only really be solved by redesigning the hardware.

 - Jonathan Morton

___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


[Bloat] Other CAKE territory (was: CAKE in openwrt high CPU)

2020-09-03 Thread David Collier-Brown

On 2020-09-03 10:32 a.m., Toke Høiland-Jørgensen via Bloat wrote:

> Yeah, offloading of some sort is another option, but I consider that
> outside of the "CAKE stays relevant" territory, since that will most
> likely involve an entirely programmable packet scheduler. There was some
> discussion of adding such a qdisc to Linux at LPC[0]. The Eiffel[1]
> algorithm seems promising.
>
> -Toke

I'm wondering if edge servers with 1Gb NICs are inside the "CAKE stays 
relevant" territory?


My main customer/employer has a gazillion of those, currently reporting

qdisc mq 0: root
qdisc pfifo_fast 0: parent :8 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
...

because their OS is just a tiny bit elderly (;-)). We're planning to 
roll forward this quarter to CentOS 8.2, where CAKE is an option.


It strikes me that the self-tuning capacity of CAKE might be valuable 
for a whole /class/ of small rack-mounted machines, but you just 
mentioned the desire for better multi-processor support.


Am I reaching for the moon, or is this something within reach?

--dave

--
David Collier-Brown, | Always do right. This will gratify
System Programmer and Author | some people and astonish the rest
dav...@spamcop.net   |  -- Mark Twain



Re: [Bloat] CAKE in openwrt high CPU

2020-09-03 Thread Jonathan Morton
> On 3 Sep, 2020, at 5:32 pm, Toke Høiland-Jørgensen via Bloat 
>  wrote:
> 
> Yeah, offloading of some sort is another option, but I consider that
> outside of the "CAKE stays relevant" territory, since that will most
> likely involve an entirely programmable packet scheduler.

Offload of *just* shaping could be valuable in itself at higher rates, when 
combined with BQL, as it would avoid having to interact with the CPU-side timer 
infrastructure so much.  It would also not be difficult at all to implement in 
hardware at line rate, even with overhead compensation.  It's the sort of thing 
you could sensibly do with 74-series logic and a lookup table in a cheap SRAM, 
up to millions of PPS, and considerably faster in FPGA or ASIC territory.
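
To make the lookup-table idea concrete, here is a minimal software model in Python. It is only one reading of the paragraph above, not Cake's shaper and not a hardware design: the table maps packet length to serialization time, so the per-packet work is one lookup and one addition. The function names are made up for illustration, and the defaults of 38 bytes overhead and an 84-byte minimum are assumptions taken from what tc-cake documents for its "ethernet" keyword.

```python
NS_PER_SEC = 1_000_000_000


def build_delay_table(rate_bps, overhead_bytes=38, mpu_bytes=84, max_len=1500):
    """Precompute the time (ns) each packet length occupies the link.

    overhead_bytes and mpu_bytes are assumed values for plain Ethernet.
    Integer division truncates, so this is a sketch, not an exact pacer.
    """
    return [
        max(length + overhead_bytes, mpu_bytes) * 8 * NS_PER_SEC // rate_bps
        for length in range(max_len + 1)
    ]


def schedule_departures(packet_lengths, table, start_ns=0):
    """Return the earliest departure time (ns) for each packet in order."""
    departures = []
    next_free_ns = start_ns
    for length in packet_lengths:
        departures.append(next_free_ns)
        # One table lookup and one add per packet.
        next_free_ns += table[length]
    return departures
```

With a 1 Gbit/s table, a 1500-byte packet holds the link for 12304 ns, and anything at or below 46 bytes is charged the 84-byte minimum of 672 ns.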

I think that's what the questions about combining "unlimited Cake" with some 
other shaper are angling towards, though I suspect that the way Cake's shaper 
is integrated is still better than having an external one in software.

With that said, it's also possible that something a bit lighter than Cake might 
be appropriate at cable speeds.  There is background work in this general area 
going on, so don't despair.

 - Jonathan Morton


Re: [Bloat] CAKE in openwrt high CPU

2020-09-03 Thread Toke Høiland-Jørgensen via Bloat


On 3 September 2020 17:31:07 CEST, Luca Muscariello  
wrote:
>On Thu, Sep 3, 2020 at 4:32 PM Toke Høiland-Jørgensen 
>wrote:
>>
>> Luca Muscariello  writes:
>>
>> > On Thu, Sep 3, 2020 at 3:19 PM Mikael Abrahamsson via Bloat
>> >  wrote:
>> >>
>> >> On Tue, 1 Sep 2020, Toke Høiland-Jørgensen wrote:
>> >>
>> >> > Yup, the number of cores is only going to go up, so for CAKE to
>stay
>> >> > relevant it'll need to be able to take advantage of this
>eventually :)
>> >>
>> >> https://www.hardkernel.com/shop/odroid-h2plus/ is an interesting
>platform,
>> >> it has a quad core machine with 2 x 2.5GbE NICs.
>> >>
>> >> When using something like this for routing with HTB+CAKE for
>bidirectional
>> >> shaping below line rate, what would be the main things that would
>need to
>> >> be improved?
>> >
>> > IMO, hardware offloading for shaping, beyond this specific
>platform.
>> > I ignore if there is any roadmap with that objective.
>>
>> Yeah, offloading of some sort is another option, but I consider that
>> outside of the "CAKE stays relevant" territory, since that will most
>> likely involve an entirely programmable packet scheduler. There was
>some
>> discussion of adding such a qdisc to Linux at LPC[0]. The Eiffel[1]
>> algorithm seems promising.
>>
>> -Toke
>>
>> [0] https://linuxplumbersconf.org/event/7/contributions/679/
>> [1] https://www.usenix.org/conference/nsdi19/presentation/saeed
>
>These are all interesting efforts for scheduling but orthogonal to
>shaping
>and not going to help make shaping more scalable.

Eiffel says it can do shaping by way of a global calendar queue... Planning to 
put that to the test :)

-Toke


Re: [Bloat] CAKE in openwrt high CPU

2020-09-03 Thread Luca Muscariello
On Thu, Sep 3, 2020 at 4:32 PM Toke Høiland-Jørgensen  wrote:
>
> Luca Muscariello  writes:
>
> > On Thu, Sep 3, 2020 at 3:19 PM Mikael Abrahamsson via Bloat
> >  wrote:
> >>
> >> On Tue, 1 Sep 2020, Toke Høiland-Jørgensen wrote:
> >>
> >> > Yup, the number of cores is only going to go up, so for CAKE to stay
> >> > relevant it'll need to be able to take advantage of this eventually :)
> >>
> >> https://www.hardkernel.com/shop/odroid-h2plus/ is an interesting platform,
> >> it has a quad core machine with 2 x 2.5GbE NICs.
> >>
> >> When using something like this for routing with HTB+CAKE for bidirectional
> >> shaping below line rate, what would be the main things that would need to
> >> be improved?
> >
> > IMO, hardware offloading for shaping, beyond this specific platform.
> > I ignore if there is any roadmap with that objective.
>
> Yeah, offloading of some sort is another option, but I consider that
> outside of the "CAKE stays relevant" territory, since that will most
> likely involve an entirely programmable packet scheduler. There was some
> discussion of adding such a qdisc to Linux at LPC[0]. The Eiffel[1]
> algorithm seems promising.
>
> -Toke
>
> [0] https://linuxplumbersconf.org/event/7/contributions/679/
> [1] https://www.usenix.org/conference/nsdi19/presentation/saeed

These are all interesting efforts for scheduling but orthogonal to shaping
and not going to help make shaping more scalable.


Re: [Bloat] CAKE in openwrt high CPU

2020-09-03 Thread Toke Høiland-Jørgensen via Bloat
Luca Muscariello  writes:

> On Thu, Sep 3, 2020 at 3:19 PM Mikael Abrahamsson via Bloat
>  wrote:
>>
>> On Tue, 1 Sep 2020, Toke Høiland-Jørgensen wrote:
>>
>> > Yup, the number of cores is only going to go up, so for CAKE to stay
>> > relevant it'll need to be able to take advantage of this eventually :)
>>
>> https://www.hardkernel.com/shop/odroid-h2plus/ is an interesting platform,
>> it has a quad core machine with 2 x 2.5GbE NICs.
>>
>> When using something like this for routing with HTB+CAKE for bidirectional
>> shaping below line rate, what would be the main things that would need to
>> be improved?
>
> IMO, hardware offloading for shaping, beyond this specific platform.
> I ignore if there is any roadmap with that objective.

Yeah, offloading of some sort is another option, but I consider that
outside of the "CAKE stays relevant" territory, since that will most
likely involve an entirely programmable packet scheduler. There was some
discussion of adding such a qdisc to Linux at LPC[0]. The Eiffel[1]
algorithm seems promising.

-Toke

[0] https://linuxplumbersconf.org/event/7/contributions/679/
[1] https://www.usenix.org/conference/nsdi19/presentation/saeed


Re: [Bloat] CAKE in openwrt high CPU

2020-09-03 Thread Sebastian Moeller
Hi Toke,

> On Sep 3, 2020, at 15:29, Toke Høiland-Jørgensen via Bloat 
>  wrote:
> 
> Mikael Abrahamsson  writes:
> 
>> On Mon, 31 Aug 2020, Toke Høiland-Jørgensen wrote:
>> 
>>> And what about when you're running CAKE in 'unlimited' mode?
>> 
>> I tried this:
>> 
>> # tc qdisc add dev eth0 root cake bandwidth 900mbit
> 
> So the difference from before is just the lack of inbound shaping, or?

Good point, so worst-case just half the load to handle, indicating that 
a single CPU is sufficient for gigabit shaping, but not for dual-gigabit 
shaping, no?

Best Regards
Sebastian


> 
> -Toke



Re: [Bloat] CAKE in openwrt high CPU

2020-09-03 Thread Sebastian Moeller
Hi Mikael,



> On Sep 3, 2020, at 15:10, Mikael Abrahamsson via Bloat 
>  wrote:
> 
> On Mon, 31 Aug 2020, Toke Høiland-Jørgensen wrote:
> 
>> And what about when you're running CAKE in 'unlimited' mode?
> 
> I tried this:
> 
> # tc qdisc add dev eth0 root cake bandwidth 900mbit

That still employs the cake shaper, so it is not equivalent to 
'unlimited' mode, I believe.

[PEDANT_MODE]

900 Mbps without explicit overhead will result in a typical maximum TCP/IPv4 
goodput of

900 * ((1500-20-20)/(1500+14)) = 867.899603699 Mbps

but since Ethernet overhead is actually 38 bytes instead of 14, this actually 
occupies 

(900 * ((1500-20-20)/(1500+14))) * ((1500+38)/(1500-20-20)) = 914.266842801 Mbps 
on the Ethernet link

which for small packets will become problematic:

(900 * ((150-20-20)/(150+14))) * ((150+38)/(150-20-20)) = 1031.70731707 Mbps 
gross speed out of the 1000 Mbps that Gigabit Ethernet offers.

In fact, packet sizes below 202 bytes will spend all the "credit" you got from 
reducing the shaper rate to 900 Mbps in the first place:

(900 * ((202-20-20)/(202+14))) * ((202+38)/(202-20-20)) = 1000 Mbps

Maybe tell cake that you run on Ethernet by adding the "ethernet" keyword, which 
will take care of both the per-packet overhead of 38 bytes and the minimum 
packet size on the link of 84 bytes?

Please note that for throughput this does not really matter that much, but 
latency-under-load is not going to be pretty when too many small packets are in 
flight...

[/PEDANT_MODE]
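
For anyone who wants to replay the arithmetic above, here is a minimal Python sketch. It assumes, as the figures above do, that the shaper accounts 14 bytes of overhead per packet while the wire really costs 38 bytes; the function names are hypothetical and not part of cake or tc.

```python
def wire_rate_mbps(shaper_mbps, ip_len, assumed_overhead=14, actual_overhead=38):
    """On-wire rate when the shaper counts ip_len + assumed_overhead bytes
    per packet but each packet really occupies ip_len + actual_overhead."""
    return shaper_mbps * (ip_len + actual_overhead) / (ip_len + assumed_overhead)


def break_even_ip_len(shaper_mbps, link_mbps, assumed_overhead=14, actual_overhead=38):
    """Packet size at which the on-wire rate exactly reaches link_mbps.

    Solves shaper * (s + actual) = link * (s + assumed) for s.
    Packets smaller than s push the wire rate above the link rate.
    """
    return (shaper_mbps * actual_overhead - link_mbps * assumed_overhead) / (
        link_mbps - shaper_mbps
    )


if __name__ == "__main__":
    print(wire_rate_mbps(900, 1500))      # full-size packets
    print(break_even_ip_len(900, 1000))   # break-even packet size
```

Setting assumed_overhead equal to actual_overhead makes the wire rate equal the shaper rate for every packet size, which is the point of telling cake about the real overhead.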


> 
> This seems fine from a performance point of view (not that high sirq%, around 
> 35%) and does seem to limit my upstream traffic correctly. Not sure it helps 
> though, at these speeds the bufferbloat problem is not that obvious and easy 
> to test over the Internet :)

Mmmh, how did you measure the sirq percentage? Some top versions show 
overall percentage with 100% meaning all CPUs, so 35% in a quadcore could mean 
1 fully maxed out CPU (25%) plus an additional 10% spread over the other three, 
or something more benign. Better top (so not busybox's) or htop versions also 
can show the load per CPU which is helpful to pinpoint hotspots...
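
As an illustration of the per-CPU view, here is a minimal Python sketch that computes the softirq share of each CPU from two snapshots of /proc/stat. The function name is hypothetical; the only thing taken as given is the documented field order of the cpuN lines (user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice).

```python
def softirq_share_per_cpu(before, after):
    """Return {cpu_name: percent of time in softirq} between two
    /proc/stat snapshots passed in as strings."""

    def parse(text):
        rows = {}
        for line in text.splitlines():
            fields = line.split()
            # Keep "cpu0", "cpu1", ...; skip the aggregate "cpu" line.
            if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
                rows[fields[0]] = [int(value) for value in fields[1:]]
        return rows

    old_rows, new_rows = parse(before), parse(after)
    shares = {}
    for cpu, new in new_rows.items():
        old = old_rows.get(cpu)
        if old is None:
            continue
        deltas = [n - o for n, o in zip(new, old)]
        # Sum only the first eight fields: guest time is already
        # included in user and nice, so adding it would double count.
        total = sum(deltas[:8])
        # Index 6 is softirq in the field order listed above.
        shares[cpu] = 100.0 * deltas[6] / total if total else 0.0
    return shares
```

Read /proc/stat twice about a second apart and pass both strings in; one core near 100% with the others mostly idle is the hotspot case that a single aggregate percentage can hide.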

Best Regards
Sebastian

> 
> root@OpenWrt:~# tc -s qdisc
> qdisc noqueue 0: dev lo root refcnt 2
> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
> qdisc cake 8034: dev eth0 root refcnt 9 bandwidth 900Mbit diffserv3 
> triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw overhead 0
> Sent 772001 bytes 959703 pkt (dropped 134, overlimits 221223 requeues 179)
> backlog 0b 0p requeues 179
> memory used: 2751976b of 15140Kb
> capacity estimate: 900Mbit
> min/max network layer size:   42 /1514
> min/max overhead-adjusted size:   42 /1514
> average network hdr offset:   14
> 
>   Bulk  Best EffortVoice
>  thresh  56250Kbit  900Mbit  225Mbit
>  target  5.0ms5.0ms5.0ms
>  interval  100.0ms  100.0ms  100.0ms
>  pk_delay  0us 22us232us
>  av_delay  0us  6us  7us
>  sp_delay  0us  4us  5us
>  backlog0b   0b   0b
>  pkts0   959747   90
>  bytes   0   93543739440
>  way_inds0229640
>  way_miss0  2752
>  way_cols000
>  drops   0  1340
>  marks   000
>  ack_drop000
>  sp_flows031
>  bk_flows010
>  un_flows000
>  max_len 068130 3714
>  quantum  1514 1514 1514
> 
> 
> -- 
> Mikael Abrahamsson    email: swm...@swm.pp.se



Re: [Bloat] CAKE in openwrt high CPU

2020-09-03 Thread Toke Høiland-Jørgensen via Bloat
Mikael Abrahamsson  writes:

> On Mon, 31 Aug 2020, Toke Høiland-Jørgensen wrote:
>
>> And what about when you're running CAKE in 'unlimited' mode?
>
> I tried this:
>
> # tc qdisc add dev eth0 root cake bandwidth 900mbit

So the difference from before is just the lack of inbound shaping, or?

-Toke


Re: [Bloat] CAKE in openwrt high CPU

2020-09-03 Thread Toke Høiland-Jørgensen via Bloat
Mikael Abrahamsson  writes:

> On Tue, 1 Sep 2020, Toke Høiland-Jørgensen wrote:
>
>> Yup, the number of cores is only going to go up, so for CAKE to stay 
>> relevant it'll need to be able to take advantage of this eventually :)
>
> https://www.hardkernel.com/shop/odroid-h2plus/ is an interesting platform, 
> it has a quad core machine with 2 x 2.5GbE NICs.
>
> When using something like this for routing with HTB+CAKE for bidirectional 
> shaping below line rate, what would be the main things that would need to 
> be improved?

The aforementioned multi-processor support...

-Toke


Re: [Bloat] CAKE in openwrt high CPU

2020-09-03 Thread Luca Muscariello
On Thu, Sep 3, 2020 at 3:19 PM Mikael Abrahamsson via Bloat
 wrote:
>
> On Tue, 1 Sep 2020, Toke Høiland-Jørgensen wrote:
>
> > Yup, the number of cores is only going to go up, so for CAKE to stay
> > relevant it'll need to be able to take advantage of this eventually :)
>
> https://www.hardkernel.com/shop/odroid-h2plus/ is an interesting platform,
> it has a quad core machine with 2 x 2.5GbE NICs.
>
> When using something like this for routing with HTB+CAKE for bidirectional
> shaping below line rate, what would be the main things that would need to
> be improved?

IMO, hardware offloading for shaping, beyond this specific platform.
I ignore if there is any roadmap with that objective.

>
> --
> Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] CAKE in openwrt high CPU

2020-09-03 Thread Mikael Abrahamsson via Bloat

On Tue, 1 Sep 2020, Toke Høiland-Jørgensen wrote:

> Yup, the number of cores is only going to go up, so for CAKE to stay 
> relevant it'll need to be able to take advantage of this eventually :)


https://www.hardkernel.com/shop/odroid-h2plus/ is an interesting platform, 
it has a quad core machine with 2 x 2.5GbE NICs.


When using something like this for routing with HTB+CAKE for bidirectional 
shaping below line rate, what would be the main things that would need to 
be improved?


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] CAKE in openwrt high CPU

2020-09-03 Thread Mikael Abrahamsson via Bloat

On Mon, 31 Aug 2020, Toke Høiland-Jørgensen wrote:

> And what about when you're running CAKE in 'unlimited' mode?


I tried this:

# tc qdisc add dev eth0 root cake bandwidth 900mbit

This seems fine from a performance point of view (not that high sirq%, 
around 35%) and does seem to limit my upstream traffic correctly. Not sure 
it helps though, at these speeds the bufferbloat problem is not that 
obvious and easy to test over the Internet :)


root@OpenWrt:~# tc -s qdisc
qdisc noqueue 0: dev lo root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc cake 8034: dev eth0 root refcnt 9 bandwidth 900Mbit diffserv3 triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw overhead 0
 Sent 772001 bytes 959703 pkt (dropped 134, overlimits 221223 requeues 179)
 backlog 0b 0p requeues 179
 memory used: 2751976b of 15140Kb
 capacity estimate: 900Mbit
 min/max network layer size:   42 /1514
 min/max overhead-adjusted size:   42 /1514
 average network hdr offset:   14

   Bulk  Best EffortVoice
  thresh  56250Kbit  900Mbit  225Mbit
  target  5.0ms5.0ms5.0ms
  interval  100.0ms  100.0ms  100.0ms
  pk_delay  0us 22us232us
  av_delay  0us  6us  7us
  sp_delay  0us  4us  5us
  backlog0b   0b   0b
  pkts0   959747   90
  bytes   0   93543739440
  way_inds0229640
  way_miss0  2752
  way_cols000
  drops   0  1340
  marks   000
  ack_drop000
  sp_flows031
  bk_flows010
  un_flows000
  max_len 068130 3714
  quantum  1514 1514 1514


--
Mikael Abrahamsson    email: swm...@swm.pp.se