Re: [Bloat] CAKE in openwrt high CPU

2020-09-04 Thread Sebastian Moeller
Hi Mikael,

Thanks! That looks like one fully saturated core, no? I do not know how to 
parse the meter symbols here, so I am not sure which "class" of load the star 
denotes, but I would guess something that includes sirqs. Anyway, the average 
is only ~49% load, while one CPU is clearly pegged already. I assume the htop 
data is from the HGW...

best regards
Sebastian

> On Sep 4, 2020, at 15:37, Mikael Abrahamsson  wrote:
> 
> On Thu, 3 Sep 2020, Sebastian Moeller wrote:
> 
>>  Mmmh, how did you measure the sirq percentage? Some top versions show 
>> overall percentage with 100% meaning all CPUs, so 35% in a quadcore could 
>> mean 1 fully maxed out CPU (25%) plus an additional 10% spread over the 
>> other three, or something more benign. Better top (so not busybox's) or htop 
>> versions also can show the load per CPU which is helpful to pinpoint 
>> hotspots...
> 
> If I run iperf3 with 10 parallel sessions then htop shows this (in the CAKE 
> upstream direction I believe):
> 
>  1 [  0.7%]   Tasks: 19, 0 thr; 2 running
>  2 [100.0%]   Load average: 0.48 0.16 0.05
>  3 [ 44.4%]   Uptime: 10 days, 04:46:37
>  4 [ 54.2%]
>  Mem[36.7M/3.84G]
>  Swp[0K/0K]
> 
> The other direction (-R), typically this:
> 
> 1 [ 13.0%]   Tasks: 19, 0 thr; 2 running
> 2 [ 53.9%]   Load average: 0.54 0.25 0.09
> 3 [ 55.8%]   Uptime: 10 days, 04:47:36
> 4 [ 84.4%]
> 
> Topology is:
> 
> PC - HGW -> Internet
> 
> iperf3 is run on the PC, HGW has CAKE in the -> Internet direction.
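For reference, the test above can be reproduced along these lines (a sketch; the server address is a placeholder, -P 10 gives the ten parallel streams, and -R reverses direction so the traffic flows downstream through the shaper):

```shell
# Run on the PC behind the HGW; 192.0.2.1 is a placeholder (TEST-NET)
# address standing in for an iperf3 server beyond the shaper.
SERVER=192.0.2.1

iperf3 -c "$SERVER" -P 10       # upstream: PC sends through CAKE
iperf3 -c "$SERVER" -P 10 -R    # reverse: server sends, download path
```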
> 
>> Best Regards
>>  Sebastian
>> 
>>> 
>>> root@OpenWrt:~# tc -s qdisc
>>> qdisc noqueue 0: dev lo root refcnt 2
>>> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>> backlog 0b 0p requeues 0
>>> qdisc cake 8034: dev eth0 root refcnt 9 bandwidth 900Mbit diffserv3 
>>> triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw 
>>> overhead 0
>>> Sent 772001 bytes 959703 pkt (dropped 134, overlimits 221223 requeues 
>>> 179)
>>> backlog 0b 0p requeues 179
>>> memory used: 2751976b of 15140Kb
>>> capacity estimate: 900Mbit
>>> min/max network layer size:   42 /1514
>>> min/max overhead-adjusted size:   42 /1514
>>> average network hdr offset:   14
>>> 
>>>                Bulk   Best Effort       Voice
>>>  thresh   56250Kbit       900Mbit     225Mbit
>>>  target       5.0ms         5.0ms       5.0ms
>>>  interval   100.0ms       100.0ms     100.0ms
>>>  pk_delay       0us          22us       232us
>>>  av_delay       0us           6us         7us
>>>  sp_delay       0us           4us         5us
>>>  backlog         0b            0b          0b
>>>  pkts             0        959747          90
>>>  bytes            0   93543739440
>>>  way_inds         0         22964           0
>>>  way_miss         0           275           2
>>>  way_cols         0             0           0
>>>  drops            0           134           0
>>>  marks            0             0           0
>>>  ack_drop         0             0           0
>>>  sp_flows         0             3           1
>>>  bk_flows         0             1           0
>>>  un_flows         0             0           0
>>>  max_len          0         68130        3714
>>>  quantum       1514          1514        1514
>>> 
>>> 
>>> --
>>> Mikael Abrahamsson    email: swm...@swm.pp.se
>> 
> 
> -- 
> Mikael Abrahamsson    email: swm...@swm.pp.se

___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] Other CAKE territory (was: CAKE in openwrt high CPU)

2020-09-04 Thread Mikael Abrahamsson via Bloat

On Fri, 4 Sep 2020, Jonathan Morton wrote:

We're usually seeing problems with the smaller-scale CPUs found in CPE 
SoCs, which are very much geared to take advantage of hardware 
accelerated packet forwarding.  I think in some cases there might 
actually be insufficient internal I/O bandwidth to get 1Gbps out of the 
NIC, into the CPU, and back out to the NIC again, except through the 
dedicated forwarding path.  That could manifest itself as a lot of 
kernel time spent waiting for the hardware, and can only really be 
solved by redesigning the hardware.


There are lots of SoCs where CPU routing tops out at ~100 megabit/s of 
throughput, whilst the HW offload engine is perfectly capable of full gig 
speeds, the MT7621 being one that is actually supported in OpenWrt.
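For reference, the offload path in question is toggled in OpenWrt (18.06 and later) via the firewall config; note that offloaded flows bypass the qdisc layer, so CAKE no longer sees them. A sketch:

```shell
# OpenWrt: enable software flow offloading, plus the hardware path on
# SoCs whose driver supports it (e.g. MT7621's offload engine).
# Offloaded flows bypass qdiscs, so shaping/CAKE stops applying to them.
uci set firewall.@defaults[0].flow_offloading='1'
uci set firewall.@defaults[0].flow_offloading_hw='1'
uci commit firewall
/etc/init.d/firewall restart
```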


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] CAKE in openwrt high CPU

2020-09-04 Thread Mikael Abrahamsson via Bloat

On Thu, 3 Sep 2020, Sebastian Moeller wrote:

	Mmmh, how did you measure the sirq percentage? Some top versions 
show overall percentage with 100% meaning all CPUs, so 35% in a quadcore 
could mean 1 fully maxed out CPU (25%) plus an additional 10% spread 
over the other three, or something more benign. Better top (so not 
busybox's) or htop versions also can show the load per CPU which is 
helpful to pinpoint hotspots...


If I run iperf3 with 10 parallel sessions then htop shows this (in the 
CAKE upstream direction I believe):


  1 [  0.7%]   Tasks: 19, 0 thr; 2 running
  2 [100.0%]   Load average: 0.48 0.16 0.05
  3 [ 44.4%]   Uptime: 10 days, 04:46:37
  4 [ 54.2%]
  Mem[36.7M/3.84G]
  Swp[0K/0K]

The other direction (-R), typically this:

 1 [ 13.0%]   Tasks: 19, 0 thr; 2 running
 2 [ 53.9%]   Load average: 0.54 0.25 0.09
 3 [ 55.8%]   Uptime: 10 days, 04:47:36
 4 [ 84.4%]

Topology is:

PC - HGW -> Internet

iperf3 is run on the PC, HGW has CAKE in the -> Internet direction.


Best Regards
Sebastian



root@OpenWrt:~# tc -s qdisc
qdisc noqueue 0: dev lo root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc cake 8034: dev eth0 root refcnt 9 bandwidth 900Mbit diffserv3 
triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw overhead 0
Sent 772001 bytes 959703 pkt (dropped 134, overlimits 221223 requeues 179)
backlog 0b 0p requeues 179
memory used: 2751976b of 15140Kb
capacity estimate: 900Mbit
min/max network layer size:   42 /1514
min/max overhead-adjusted size:   42 /1514
average network hdr offset:   14

                 Bulk   Best Effort       Voice
  thresh    56250Kbit       900Mbit     225Mbit
  target        5.0ms         5.0ms       5.0ms
  interval    100.0ms       100.0ms     100.0ms
  pk_delay        0us          22us       232us
  av_delay        0us           6us         7us
  sp_delay        0us           4us         5us
  backlog          0b            0b          0b
  pkts              0        959747          90
  bytes             0   93543739440
  way_inds          0         22964           0
  way_miss          0           275           2
  way_cols          0             0           0
  drops             0           134           0
  marks             0             0           0
  ack_drop          0             0           0
  sp_flows          0             3           1
  bk_flows          0             1           0
  un_flows          0             0           0
  max_len           0         68130        3714
  quantum        1514          1514        1514
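The configuration shown in the stats dump above corresponds to a setup command along these lines (a sketch; eth0 is taken from the dump, and every keyword mirrors the options the kernel reports back):

```shell
# Shape egress at 900 Mbit/s with three diffserv tins; "raw" disables
# per-packet overhead compensation (matches "raw overhead 0" above).
tc qdisc replace dev eth0 root cake bandwidth 900mbit diffserv3 \
    triple-isolate nonat nowash no-ack-filter split-gso raw

# Inspect the per-tin counters:
tc -s qdisc show dev eth0
```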


--
Mikael Abrahamsson    email: swm...@swm.pp.se






Re: [Bloat] Other CAKE territory (was: CAKE in openwrt high CPU)

2020-09-04 Thread Toke Høiland-Jørgensen via Bloat
David Collier-Brown  writes:

> On 2020-09-03 10:32 a.m., Toke Høiland-Jørgensen via Bloat wrote
>
>> Yeah, offloading of some sort is another option, but I consider that
>> outside of the "CAKE stays relevant" territory, since that will most
>> likely involve an entirely programmable packet scheduler. There was some
>> discussion of adding such a qdisc to Linux at LPC[0]. The Eiffel[1]
>> algorithm seems promising.
>>
>> -Toke
>
> I'm wondering if edge servers with 1Gb NICs are inside the "CAKE stays 
> relevant" territory?
>
> My main customer/employer has a gazillion of those, currently reporting
>
> qdisc mq 0: root
> qdisc pfifo_fast 0: parent :8 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> ...
>
> because their OS is just a tiny bit elderly (;-)). We're planning to 
> roll forward this quarter to CentOS 8.2, where CAKE is an option.
>
> It strikes me that the self-tuning capacity of CAKE might be valuable 
> for a whole /class/ of small rack-mounted machines, but you just 
> mentioned the desire for better multi-processor support.
>
> Am I reaching for the moon, or is this something within reach?

As Jonathan says, servers mostly have enough CPU that running at 1 Gbps
is not a problem. And especially if you're not shaping, running CAKE in
unlimited mode should not be an issue.

However, do consider what you're trying to achieve here. Most of the
specific features of CAKE are targeting gateway routers. For instance,
for a server you may be better off with sch_fq to also get efficient
pacing support. Depends on what the server is doing...

But please, get rid of pfifo_fast! Anything is better than that! ;)
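Swapping pfifo_fast out is a one-liner on such servers; a sketch, with eth0 as a stand-in interface name:

```shell
# Make fq the default for newly created qdiscs (persist the setting
# via a drop-in file under /etc/sysctl.d/):
sysctl -w net.core.default_qdisc=fq

# Or replace the root qdisc on a specific interface right away:
tc qdisc replace dev eth0 root fq
# ...or CAKE without a shaper, as discussed above:
# tc qdisc replace dev eth0 root cake besteffort
```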

-Toke