This series contains a fix to the static_calls series (on which it depends), and a really hacky proof-of-concept of runtime-patched branch trees of static_calls to avoid indirect calls / retpolines in the hot path. Rather than providing any generally applicable machinery, the patch just open-codes it for one call site (the pt_prev->func() call in deliver_skb() and __netif_receive_skb_one_core()); it should, however, be possible to make a macro that takes a 'name' parameter and expands to the whole thing. Also, the _update() function could be shared and get something useful from its work_struct, rather than needing a separate copy of the function for every indirect call site.
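To make the idea concrete, here is a rough user-space model of such a branch tree; it is not the code from the patch. All names (handler_fn, handle_a/b/c, slot_target, dynamic_call, demo_dispatch) are made up for illustration. Plain C has no static_call, so an ordinary function pointer stands in for each patched slot; in the kernel those slot calls would be direct calls rewritten at runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the pt_prev->func() signature. */
typedef int (*handler_fn)(int pkt);

/* Hypothetical handlers, standing in for protocol receive functions. */
static int handle_a(int pkt) { return pkt + 1; }
static int handle_b(int pkt) { return pkt + 2; }
static int handle_c(int pkt) { return pkt + 3; }

/*
 * Two "slots".  In the kernel each slot would be a static_call whose
 * target is changed by text patching, so calling it is a direct call.
 * Here a function pointer models the slot's current target.
 */
static handler_fn slot_target[2] = { handle_a, handle_b };

static unsigned long fast_calls, slow_calls;

/* Branch tree: compare against the known targets, else fall back. */
static int dynamic_call(handler_fn func, int pkt)
{
	assert(func != NULL);

	if (func == slot_target[0]) {
		fast_calls++;
		return slot_target[0](pkt);	/* kernel: static_call, slot 0 */
	}
	if (func == slot_target[1]) {
		fast_calls++;
		return slot_target[1](pkt);	/* kernel: static_call, slot 1 */
	}

	slow_calls++;
	return func(pkt);	/* slow path: plain indirect call (retpoline) */
}

/* Two calls hit a slot, one falls through to the indirect call. */
static int demo_dispatch(void)
{
	return dynamic_call(handle_a, 10) +
	       dynamic_call(handle_b, 10) +
	       dynamic_call(handle_c, 10);
}
```

The comparisons themselves are ordinary conditional branches, so only callers whose target is not in any slot pay for the indirect call.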
Performance testing so far has been somewhat inconclusive. I applied this on net-next, hacked up my Kconfig to use out-of-line static calls on x86-64, and ran some 1-byte UDP stream tests with the DUT receiving.

On a single-stream test, I saw the packet rate go up by 7%, from 470Kpps to 504Kpps, with a considerable reduction in variance. However, CPU usage increased by a larger factor: (packet rate / RX cpu) is a much lower-variance measurement, and it went down by 13%. This may be because the code often got into a state where, while patching the calls (and thus sending all callers down the slow path), it continued to gather stats and saw enough calls to trigger another update. As there is no code to detect and skip an update that doesn't change anything, we get into a tight loop of redoing updates. I am working on this and plan to change it to not collect any stats while an update is actually in progress.

On a 4-stream test, the variance I saw was too high to draw any conclusions; the packet rate went down by about 2½%, but this was not statistically significant (and the fastest run I saw was with dynamic calls present).

Edward Cree (2):
  static_call: fix out-of-line static call implementation
  net: core: rather hacky PoC implementation of dynamic calls

 include/linux/static_call.h |   6 +-
 net/core/dev.c              | 222 +++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 221 insertions(+), 7 deletions(-)
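For the update loop described in the single-stream results above, the planned change (no stats collection while an update is in progress) could look roughly like the single-threaded model below. It also skips updates that would not change the patched target, which the posted code does not do. Everything here (struct dyn_state, record_call, update_begin, update_end, the threshold and candidate count) is hypothetical and only illustrates the bookkeeping; real code would need proper synchronisation.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CANDIDATES		3	/* hypothetical: tracked targets */
#define UPDATE_THRESHOLD	4	/* hypothetical: calls before re-evaluating */

struct dyn_state {
	unsigned long hits[NR_CANDIDATES];	/* per-target call counts */
	unsigned long total;			/* calls counted since last reset */
	int current_best;	/* target the fast path is patched for, -1 if none */
	int pending;		/* target being patched in */
	bool updating;		/* true while call sites are being patched */
	unsigned long patches;	/* number of real re-patches performed */
};

static void reset_stats(struct dyn_state *s)
{
	for (int i = 0; i < NR_CANDIDATES; i++)
		s->hits[i] = 0;
	s->total = 0;
}

/* Count one call; returns true when an update should be scheduled. */
bool record_call(struct dyn_state *s, int target)
{
	assert(target >= 0 && target < NR_CANDIDATES);

	/* Calls seen during patching only reflect the forced slow path. */
	if (s->updating)
		return false;

	s->hits[target]++;
	return ++s->total >= UPDATE_THRESHOLD;
}

/* Returns true if patching is actually needed. */
bool update_begin(struct dyn_state *s)
{
	int best = 0;

	for (int i = 1; i < NR_CANDIDATES; i++)
		if (s->hits[i] > s->hits[best])
			best = i;

	if (best == s->current_best) {
		reset_stats(s);		/* nothing would change: skip */
		return false;
	}

	s->pending = best;
	s->updating = true;
	return true;
}

void update_end(struct dyn_state *s)
{
	s->current_best = s->pending;
	s->patches++;
	reset_stats(s);
	s->updating = false;
}
```

With this shape, calls that arrive between update_begin() and update_end() cannot push the counter over the threshold again, so an update cannot immediately retrigger itself.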