On 6 Nov 2020, at 15:13, Jesper Dangaard Brouer wrote:

On Fri, 6 Nov 2020 13:53:58 +0100
Jesper Dangaard Brouer <bro...@redhat.com> wrote:

[...]

Could this be related to netlink? I have gobgpd running on these
routers, which injects routes via netlink.
But the churn rate during the tests is minimal, maybe 30-40 routes
every second.

Yes, this could be related.  The internal data structure for FIB
lookups is a fib_trie, which is a compressed patricia tree, related to
the radix tree idea.  Thus, I can imagine that the kernel has to
rebuild/rebalance the tree with all these updates.
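As a very rough illustration of the structure (a much-simplified
sketch, not the real fib_trie code; the real thing lives in
net/ipv4/fib_trie.c as struct key_vector and friends):

    #include <stdint.h>

    /* Each internal node consumes 'bits' bits of the destination at
     * once (level compression), and 'pos' skips bits that all children
     * share (path compression).  Route inserts/deletes can force nodes
     * to grow or shrink, which is the rebuild/rebalance work mentioned
     * above. */
    struct tnode {
            uint32_t key;            /* prefix bits matched so far */
            uint8_t  pos;            /* bit position to branch on */
            uint8_t  bits;           /* node has 2^bits children */
            struct tnode *child[];
    };

    static struct tnode *trie_descend(struct tnode *n, uint32_t dst)
    {
            while (n && n->bits) {
                    uint32_t idx = (dst >> n->pos) & ((1u << n->bits) - 1);
                    n = n->child[idx];
            }
            return n;  /* candidate leaf; caller still verifies the prefix */
    }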

Reading the kernel code: the IPv4 fib_trie code is very well tuned and
fully RCU-ified, meaning the read side is lock-free.  The resize()
function in net/ipv4/fib_trie.c has a max_work limiter to avoid it
using too much time, and the update path also looks lock-free.
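The read side follows the normal RCU pattern, roughly like this (a
simplified sketch of the pattern only, with placeholder names
(trie_root, is_leaf(), index_of()), not the actual fib_table_lookup()
code):

    rcu_read_lock();
    /* Walk the trie; each pointer is fetched with rcu_dereference(),
     * so a concurrent update or resize never blocks the lookup. */
    n = rcu_dereference(trie_root);
    while (n && !is_leaf(n))
            n = rcu_dereference(n->child[index_of(n, dst)]);
    rcu_read_unlock();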

The IPv6 update path looks scarier, as it seems to take a "bh"
spinlock that can block softirqs from running; see net/ipv6/ip6_fib.c
(spin_lock_bh(&f6i->fib6_table->tb6_lock)).
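The write-side pattern there is roughly (simplified; 'table' is just a
placeholder for the fib6 table pointer):

    spin_lock_bh(&table->tb6_lock);   /* also disables BH on this CPU */
    /* ... insert/delete fib6 nodes ... */
    spin_unlock_bh(&table->tb6_lock);

So while a route update holds that lock, softirq processing on that
CPU is deferred, which is exactly the kind of thing that could show up
as latency spikes.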

I'm using ping on IPv4, but I'll try to see if IPv6 makes any difference!


Have you tried using 'perf record' to observe what is happening on the system while these latency incidents occur? (Let me know if you want some cmdline hints.)
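For example, something along these lines would be a generic starting
point (not tuned to this specific case):

    # system-wide profile with call-graphs, started during a latency spike
    perf record -a -g -- sleep 10
    perf report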

Haven't tried this yet. If you have some hints on what events to monitor, I'll take them!


--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
_______________________________________________
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat
