On 6 Nov 2020, at 15:13, Jesper Dangaard Brouer wrote:
> On Fri, 6 Nov 2020 13:53:58 +0100
> Jesper Dangaard Brouer <bro...@redhat.com> wrote:
> [...]
>> Could this be related to netlink? I have gobgpd running on these
>> routers, which injects routes via netlink.
>> But the churn rate during the tests is very minimal, maybe 30 - 40
>> routes every second.
> Yes, this could be related. The internal data-structure for FIB
> lookups is a fib_trie, which is a compressed patricia trie, related
> to the radix tree idea. Thus, I can imagine that the kernel has to
> rebuild/rebalance the trie with all these updates.
>
> Reading the kernel code: the IPv4 fib_trie code is very well tuned,
> fully RCU-ified, meaning the read side is lock-free. The resize()
> function in net/ipv4/fib_trie.c has a max_work limiter to avoid it
> using too much time, and the update path also looks lock-free.
>
> The IPv6 update path looks scarier, as it seems to take a "bh"
> spinlock that can block softirqs from running code in
> net/ipv6/ip6_fib.c (spin_lock_bh(&f6i->fib6_table->tb6_lock)).
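For reference, the kernel exposes the IPv4 trie directly via procfs, so the effect of route churn on the structure can be watched while gobgpd injects routes (a small sketch; these are standard Linux procfs paths):

```shell
# Dump the start of the main IPv4 FIB trie; "+--" lines are internal
# (tnode) levels, leaves show prefixes and their route types.
head -n 20 /proc/net/fib_trie

# Summary statistics (node counts, trie depth, resize activity) --
# comparing snapshots during churn shows the rebalancing at work.
cat /proc/net/fib_triestat
```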
I'm using ping on IPv4, but I'll try to see if IPv6 makes any
difference!
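One way to check whether the IPv6 update path under tb6_lock is actually where the time goes is to histogram its latency with bpftrace (a sketch, not from the thread; assumes bpftrace is installed and that fib6_add, the insert function in net/ipv6/ip6_fib.c, is a probeable symbol on this kernel):

```shell
# Histogram, in microseconds, of time spent in fib6_add(), which
# runs with the tb6_lock bh-spinlock held. Ctrl-C prints the result.
sudo bpftrace -e '
kprobe:fib6_add { @start[tid] = nsecs; }
kretprobe:fib6_add /@start[tid]/ {
    @usecs = hist((nsecs - @start[tid]) / 1000);
    delete(@start[tid]);
}'
```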
> Have you tried to use 'perf record' to observe what is happening on
> the system while these latency incidents happen? (let me know if you
> want some cmdline hints)
Haven't tried this yet. If you have some hints on what events to
monitor, I'll take them!
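As a starting point, a typical perf workflow for this kind of investigation might look like the following (a hypothetical sketch, not the cmdline hints from the thread; exact event names can be verified with `perf list`):

```shell
# System-wide sampling with call-graphs while reproducing the
# latency spikes; 99 Hz keeps the sampling overhead low.
sudo perf record -a -g -F 99 -- sleep 30
sudo perf report --no-children

# Softirq tracepoints can show whether softirq processing is being
# delayed (e.g. by the tb6_lock bh-spinlock in the IPv6 FIB update).
sudo perf record -a -e irq:softirq_entry -e irq:softirq_exit -- sleep 10
sudo perf script | head
```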
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
_______________________________________________
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat