On 16 Nov 2020, at 13:34, Jesper Dangaard Brouer wrote:

On Wed, 04 Nov 2020 16:23:12 +0100
Thomas Rosenstein via Bloat <bloat@lists.bufferbloat.net> wrote:

[...]
I have multiple routers which connect to multiple upstream providers. I
have noticed a large latency shift in ICMP (and generally in all
connections) if I run b2 upload-file --threads 40 (and I can reproduce this)

What options do I have to analyze why this happens?

General Info:

Routers are connected between each other with 10G Mellanox ConnectX
cards via 10G SFP+ DAC cables via a 10G Switch from fs.com
Latency generally is around 0.18 ms between all routers (4).
Throughput is 9.4 Gbit/s with 0 retransmissions when tested with iperf3.
2 of the 4 routers are connected upstream with a 1G connection (separate
port, same network card)
All routers have the full internet routing tables, i.e. 80k entries for
IPv6 and 830k entries for IPv4
Conntrack is disabled (-j NOTRACK)
Kernel 5.4.60 (custom)
2x Xeon X5670 @ 2.93 GHz

I think I have spotted your problem... This CPU[1] Xeon X5670 is more
than 10 years old! It basically corresponds to the machines I used for
my presentation at LinuxCon 2009, see slides[2]. Only with large frames
and with massive scaling across all CPUs was I able to get close to
10Gbit/s through these machines.  And on top I had to buy low-latency
RAM modules to make it happen.

As you can see on my slides[2], memory bandwidth and PCIe speeds were at
the limit for making it possible on the hardware level.  I had to run
DDR3 memory at 1333MHz and tune the QuickPath Interconnect (QPI) to
6.4GT/s (default 4.8GT/s).

Motherboards of this generation had both PCIe gen-1 and gen-2 slots.
Only the PCIe gen-2 slots had barely enough bandwidth. Maybe you
physically placed the NIC in a PCIe gen-1 slot?
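
One way to check the slot question without opening the chassis is to read
the negotiated PCIe link parameters from sysfs. The following is only a
minimal sketch: the function name pcie_link_info and the interface name
"eth0" are placeholders chosen for illustration, and the exact string format
of the speed attribute varies between kernel versions.

```python
from pathlib import Path


def pcie_link_info(iface, sysfs_root="/sys"):
    """Return negotiated and maximum PCIe link speed/width for a NIC.

    Reads the PCI device attributes that the kernel exposes under
    /sys/class/net/<iface>/device/. Attributes that are missing
    (e.g. for virtual interfaces) are reported as None.
    """
    dev = Path(sysfs_root) / "class" / "net" / iface / "device"
    info = {}
    for attr in ("current_link_speed", "current_link_width",
                 "max_link_speed", "max_link_width"):
        path = dev / attr
        info[attr] = path.read_text().strip() if path.exists() else None
    return info
```

Calling pcie_link_info("eth0") on the router and comparing the current_*
values against the max_* values would show whether the card negotiated a
lower speed or width than it supports, for example because of the slot it
sits in.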

On top of this, you also have a NUMA system, 2x Xeon X5670, which can
result in A LOT of "funny" issues that are really hard to troubleshoot...
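
A first step when chasing NUMA effects is to find out which node the NIC
hangs off and which CPUs are local to it, so that IRQ affinity and RX/TX
queues can be compared against that. A minimal sketch, assuming the usual
sysfs layout; the helper names nic_numa_node and node_cpus and the interface
name "eth0" are made up for illustration.

```python
from pathlib import Path


def nic_numa_node(iface, sysfs_root="/sys"):
    """Return the NUMA node of the NIC's PCI device, or None if unknown."""
    path = Path(sysfs_root) / "class" / "net" / iface / "device" / "numa_node"
    if not path.exists():
        return None
    node = int(path.read_text().strip())
    # The kernel reports -1 when it has no NUMA information for the device.
    return node if node >= 0 else None


def node_cpus(node, sysfs_root="/sys"):
    """Return the CPU list string of a NUMA node, or None if unavailable."""
    path = (Path(sysfs_root) / "devices" / "system" / "node"
            / f"node{node}" / "cpulist")
    return path.read_text().strip() if path.exists() else None
```

With node = nic_numa_node("eth0"), node_cpus(node) gives the CPUs that share
a socket with the NIC. Packets handled on CPUs outside that list have to
cross the QPI link to reach the card.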


Yes, I'm aware of the limits of what to expect, but as we agree, 60 TCP
streams with not even 200 Mbit/s shouldn't overload the PCIe bus or the
CPUs.

Also, don't forget, no issues with Kernel 3.10.

PCI slot is a Gen2, x8, so more than enough bandwidth there luckily ;)
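
As a rough sanity check of the bandwidth numbers, the usable line rate of a
PCIe link can be estimated from the per-lane signalling rate and the 8b/10b
encoding used by Gen1 and Gen2. This is only a back-of-the-envelope sketch:
it ignores packet (TLP/DLLP) overhead, so real throughput is somewhat lower,
and the function name is made up for illustration.

```python
# Per-lane raw signalling rate in GT/s and line-code efficiency.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),  # Gen1: 2.5 GT/s per lane, 8b/10b encoding
    2: (5.0, 8 / 10),  # Gen2: 5.0 GT/s per lane, 8b/10b encoding
}


def pcie_gbit_per_direction(gen, lanes):
    """Estimate usable PCIe bandwidth in Gbit/s, per direction."""
    rate_gt_s, efficiency = PCIE_GENERATIONS[gen]
    return rate_gt_s * efficiency * lanes


if __name__ == "__main__":
    for gen in (1, 2):
        gbit = pcie_gbit_per_direction(gen, lanes=8)
        print(f"PCIe Gen{gen} x8: about {gbit:.0f} Gbit/s per direction")
```

By this estimate a Gen2 x8 slot offers about 32 Gbit/s per direction, while
the same card in a Gen1 x8 slot would get about 16 Gbit/s, before protocol
overhead.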

But yes, they are quite old...


[1] https://ark.intel.com/content/www/us/en/ark/products/47920/intel-xeon-processor-x5670-12m-cache-2-93-ghz-6-40-gt-s-intel-qpi.html

[2] https://people.netfilter.org/hawk/presentations/LinuxCon2009/LinuxCon2009_JesperDangaardBrouer_final.pdf

--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
_______________________________________________
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat
