Hi all,

I'm coming from the lartc mailing list, here's the original text:

=====

I have multiple routers which connect to multiple upstream providers. I have noticed a high latency shift in ICMP (and generally all connections) when I run b2 upload-file --threads 40, and I can reproduce this.
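Roughly how I reproduce and observe it (the bucket, file and IP below are placeholders for my real ones):

  # terminal 1: continuous latency probe towards a neighbouring router
  ping -i 0.2 <neighbour-router-ip>

  # terminal 2: trigger the upload that causes the latency shift
  b2 upload-file --threads 40 <bucket> ./testfile.bin testfile.bin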

What options do I have to analyze why this happens?

General Info:

Routers are connected to each other with 10G Mellanox ConnectX cards via 10G SFP+ DAC cables through a 10G switch from fs.com
Latency is generally around 0.18 ms between all four routers.
Throughput is 9.4 Gbit/s with 0 retransmissions when tested with iperf3.
2 of the 4 routers are connected upstream with a 1G connection (separate port, same network card). All routers carry the full internet routing tables, i.e. ~80k entries for IPv6 and ~830k entries for IPv4
Conntrack is disabled (-j NOTRACK; see the rule sketch after this list)
Kernel 5.4.60 (custom)
2x Xeon X5670 @ 2.93 Ghz
96 GB RAM
No Swap
CentOS 7
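The conntrack bypass mentioned above is roughly this (simplified; the real rules carry interface / address matches):

  # raw table: skip connection tracking for forwarded and locally generated traffic
  iptables  -t raw -A PREROUTING -j NOTRACK
  iptables  -t raw -A OUTPUT     -j NOTRACK
  ip6tables -t raw -A PREROUTING -j NOTRACK
  ip6tables -t raw -A OUTPUT     -j NOTRACK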

During high latency:

Latency on the routers carrying the traffic flow increases to 12 - 20 ms on all interfaces; moving the stream (by disabling the BGP session) moves the high latency along with it
iperf3 performance plummets to 300 - 400 Mbit/s
CPU load (user / system) is around 0.1%
RAM usage is around 3 - 4 GB
if_packets count is stable (around 8000 pkt/s above the usual rate)


For b2 upload-file with 10 threads I can achieve 60 MB/s consistently; with 40 threads the performance drops to 8 MB/s.

I do not believe that 40 TCP streams should be any problem for a machine of that size.

Thanks for any ideas, help, pointers, or anything else I can verify / check / provide in addition!

=======


So far I have tested:

1) Using the stock kernel 3.10.0-541 -> the issue does not happen
2) Setting up fq_codel on the interfaces:
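Roughly what I applied (defaults only; the exact interface list is from memory):

  # attach fq_codel with default parameters to the physical and the VLAN interfaces
  for dev in eth4 eth5 eth4.2300 eth5.2501 eth5.2502; do
      tc qdisc replace dev "$dev" root fq_codel
  done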

Here is the tc -s qdisc output:

qdisc fq_codel 8005: dev eth4 root refcnt 193 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
 Sent 8374229144 bytes 10936167 pkt (dropped 0, overlimits 0 requeues 6127)
 backlog 0b 0p requeues 6127
  maxpacket 25398 drop_overlimit 0 new_flow_count 15441 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 8008: dev eth5 root refcnt 193 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
 Sent 1072480080 bytes 1012973 pkt (dropped 0, overlimits 0 requeues 735)
 backlog 0b 0p requeues 735
  maxpacket 19682 drop_overlimit 0 new_flow_count 15963 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 8004: dev eth4.2300 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
 Sent 8441021899 bytes 11021070 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 68130 drop_overlimit 0 new_flow_count 257055 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 8006: dev eth5.2501 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
 Sent 571984459 bytes 2148377 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 7570 drop_overlimit 0 new_flow_count 11300 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 8007: dev eth5.2502 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
 Sent 1401322222 bytes 1966724 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 19682 drop_overlimit 0 new_flow_count 76653 ecn_mark 0
  new_flows_len 0 old_flows_len 0


I have no statistics / metrics that would point to a slowdown on the server; CPU / load / network / packets / memory all show normal, very low load. Are there other (hidden) metrics I can collect to analyze this issue further?
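In case it helps, this is roughly what I plan to sample on the routers during the next test window (assuming the Mellanox driver exposes its counters via ethtool -S; eth4 is just an example):

  # once per second: NIC drop/pause counters, softirq budget exhaustion, qdisc drops/backlog
  while sleep 1; do
      date +%T
      ethtool -S eth4 | grep -Ei 'drop|discard|pause|fifo'
      cat /proc/net/softnet_stat    # 2nd column = dropped, 3rd column = time_squeeze
      tc -s qdisc show dev eth4 | grep -E 'dropped|backlog'
  done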

Thanks
Thomas



