On 4 Nov 2020, at 17:10, Toke Høiland-Jørgensen wrote:

Thomas Rosenstein via Bloat <bloat@lists.bufferbloat.net> writes:

Hi all,

I'm coming from the lartc mailing list, here's the original text:

=====

I have multiple routers which connect to multiple upstream providers. I have noticed a high latency shift in ICMP (and generally on all connections)
if I run b2 upload-file --threads 40 (and I can reproduce this)

What options do I have to analyze why this happens?

General Info:

Routers are connected to each other with 10G Mellanox ConnectX
cards via 10G SFP+ DAC cables through a 10G switch from fs.com
Latency is generally around 0.18 ms between all routers (4 of them).
Throughput is 9.4 Gbit/s with 0 retransmissions when tested with iperf3.
2 of the 4 routers are connected upstream with a 1G connection (separate
port, same network card)
All routers have the full internet routing tables, i.e. 80k entries for
IPv6 and 830k entries for IPv4
Conntrack is disabled (-j NOTRACK)
Kernel 5.4.60 (custom)
2x Xeon X5670 @ 2.93 GHz
96 GB RAM
No swap
CentOS 7
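For reference, bypassing conntrack with -j NOTRACK is normally done in the raw table; a minimal sketch of what that looks like (the exact rules here are an assumption — the real config may scope them to specific interfaces or addresses):

```shell
# Hypothetical NOTRACK rules in the raw table, applied to all traffic.
# The actual setup may restrict these with -i/-o or address matches.
iptables -t raw -A PREROUTING -j NOTRACK
iptables -t raw -A OUTPUT -j NOTRACK
```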

During high latency:

Latency on the routers carrying the traffic flow increases to 12 - 20 ms on all interfaces; moving the stream (by disabling the BGP session)
moves the high latency along with it
iperf3 performance plummets to 300 - 400 Mbit/s
CPU load (user / system) is around 0.1%
RAM usage is around 3 - 4 GB
if_packets count is stable (only around 8000 pkt/s higher)

I'm not sure I get your topology. Packets are going from where to where,
and which link is the bottleneck for the transfer you're doing? Are you
measuring the latency along the same path?

Have you tried running 'mtr' to figure out which hop the latency is at?
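Something along these lines, run while the b2 upload is in progress, would show which hop the latency appears at (the destination address is a placeholder — use an address on one of the affected upstream paths):

```shell
# Report-mode mtr: 100 probes per hop, wide output, show IPs as well
# as hostnames. 192.0.2.1 is a placeholder destination.
mtr --report --report-wide --show-ips -c 100 192.0.2.1
```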

I tried to draw the topology, I hope this is okay and explains better what's happening:

https://drive.google.com/file/d/15oAsxiNfsbjB9a855Q_dh6YvFZBDdY5I/view?usp=sharing

There is definitely no bottleneck in any of the links; the maximum on any link is 16k packets/sec and around 300 Mbit/s.
In the iperf3 tests I can easily get up to 9.4 Gbit/s

So it must be something in the kernel tacking on a delay; I could try to do a bisect and build maybe 10 kernels :)
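If it comes to a bisect, git can narrow it down in roughly log2(N) builds rather than testing versions one by one. A rough sketch (the "good" tag is an assumption — substitute whatever kernel was actually running before the upgrade):

```shell
cd linux                      # kernel source tree
git bisect start
git bisect bad v5.4.60        # latency spikes observed on this version
git bisect good v5.4.0        # assumed last-good version; adjust as needed
# git checks out a midpoint commit; build, boot, reproduce the
# b2 upload test, then mark the result and repeat:
#   git bisect good   # latency stayed low
#   git bisect bad    # latency spiked
# until git prints the first bad commit.
```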


Here is the tc -s qdisc output:

This indicates ("dropped 0" and "ecn_mark 0") that there's no
backpressure on the qdisc, so something else is going on.
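One way to confirm that during an episode is to watch the qdisc counters live while reproducing the load (the interface name is a placeholder):

```shell
# If drops/marks stay at 0 while latency climbs, the queueing delay is
# not happening at this qdisc. eth0 is a placeholder for the affected
# interface.
watch -n 1 'tc -s qdisc show dev eth0'
```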

Also, you said the issue goes away if you downgrade the kernel? That
does sound odd...

Yes, indeed. I only recently upgraded the kernel to 5.4.60 and hadn't had the issue before.


-Toke
_______________________________________________
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat
