On 26/09/2022 at 08:43, Benjamin Gray wrote:
> Implementation of out-of-line static calls for PowerPC 64-bit ELF V2 ABI.
> Static calls patch an indirect branch into a direct branch at runtime.
> Out-of-line specifically has a caller directly call a trampoline, and
> the trampoline gets patched to directly call the target.
>
> Previous version here:
> https://lore.kernel.org/all/20220916062330.430468-1-bg...@linux.ibm.com/
>
> I couldn't see a dedicated ftrace benchmark in the kernel, but my own
> benchmarking showed no significant impact to ftrace activation.

I use the following hack for benchmarking:

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 439e2ab6905e..e7d0d3deb8bf 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -2628,10 +2628,11 @@ void __weak ftrace_replace_code(int mod_flags)
 	bool enable = mod_flags & FTRACE_MODIFY_ENABLE_FL;
 	int schedulable = mod_flags & FTRACE_MODIFY_MAY_SLEEP_FL;
 	int failed;
+	int t0;
 
 	if (unlikely(ftrace_disabled))
 		return;
-
+t0 = mftb();
 	do_for_each_ftrace_rec(pg, rec) {
 
 		if (rec->flags & FTRACE_FL_DISABLED)
@@ -2646,6 +2647,8 @@ void __weak ftrace_replace_code(int mod_flags)
 		if (schedulable)
 			cond_resched();
 	} while_for_each_ftrace_rec();
+t0 = mftb() - t0;
+pr_err("%s: %d\n", __func__, t0);
 }
 
 struct ftrace_rec_iter {

>
> The __patch_memory function is meant to be accessed through the size checking
> patch_memory wrapper. I don't think there's a way to expose the macro without
> also exposing __patch_memory though. I considered making the type an explicit
> macro param, but using the value type seemed more ergonomic.
>
> V2:
> Mostly accounting for feedback from Christophe:
> * Code patching rewritten
>   - Rename to *_memory
>   - Use __always_inline to get the compiler to realise it can
>     collapse all the sub-functions
>   - Pass data directly instead of through a pointer, eliding a redundant
>     load
>   - Flush the last byte of data too (technically redundant if an
>     instruction, but saves a conditional branch + the isync will be the
>     bottleneck).
>   - Handle a non-coherent icache, assume a coherent dcache
>   - Handle when we don't assume a 64 byte icache on 64-bits
>   - Flatten the poke address init and teardown
>   - Check the data size in patch_memory at build time
>     (inline function was suggested, but a macro makes checking
>     based on the data type easier).
>   - It builds now on 32 bit and without strict RWX
> * Static call enabling is no longer configurable
> * Refactored arch_static_call_transform to minimise casting
> * Made the KUnit tests more robust (previously they changed non-volatile
>   registers in the init hook, but that's incorrect because it returns to
>   the KUnit framework before the test case is called).
> * Some other minor refactoring in other patches
>
>
> Benjamin Gray (6):
>   powerpc/code-patching: Implement generic text patching function
>   powerpc/module: Handle caller-saved TOC in module linker
>   powerpc/module: Optimise nearby branches in ELF V2 ABI stub
>   static_call: Move static call selftest to static_call_selftest.c
>   powerpc/64: Add support for out-of-line static calls
>   powerpc/64: Add tests for out-of-line static calls
>
>  arch/powerpc/Kconfig                     |  12 +-
>  arch/powerpc/include/asm/code-patching.h |   8 +
>  arch/powerpc/include/asm/static_call.h   |  80 +++++++-
>  arch/powerpc/kernel/Makefile             |   4 +-
>  arch/powerpc/kernel/module_64.c          |  27 ++-
>  arch/powerpc/kernel/static_call.c        | 151 +++++++++++++-
>  arch/powerpc/kernel/static_call_test.c   | 251 +++++++++++++++++++++++
>  arch/powerpc/kernel/static_call_test.h   |  56 +++++
>  arch/powerpc/lib/code-patching.c         |  90 +++++---
>  kernel/Makefile                          |   1 +
>  kernel/static_call_inline.c              |  43 ----
>  kernel/static_call_selftest.c            |  41 ++++
>  12 files changed, 682 insertions(+), 82 deletions(-)
>  create mode 100644 arch/powerpc/kernel/static_call_test.c
>  create mode 100644 arch/powerpc/kernel/static_call_test.h
>  create mode 100644 kernel/static_call_selftest.c
>
>
> base-commit: 3d7a198cfdb47405cfb4a3ea523876569fe341e6
> --
> 2.37.3
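
For reference, the build-time size check on the value type mentioned in the V2 notes could look roughly like the userspace sketch below. This is only an illustration under assumptions: sketch_patch_memory and sketch_patch_bytes are hypothetical names, not the functions from the series, and the sketch does a plain store with no poke mapping, cache flush or isync. It relies on GNU statement expressions, as kernel code commonly does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for the low-level helper. It only stores the
 * value; the real code would also map the poke address and flush caches.
 */
static inline int sketch_patch_bytes(void *dest, uint64_t val, size_t size)
{
	assert(dest != NULL);

	switch (size) {
	case 1: { uint8_t v = (uint8_t)val; memcpy(dest, &v, sizeof(v)); break; }
	case 2: { uint16_t v = (uint16_t)val; memcpy(dest, &v, sizeof(v)); break; }
	case 4: { uint32_t v = (uint32_t)val; memcpy(dest, &v, sizeof(v)); break; }
	case 8: { uint64_t v = val; memcpy(dest, &v, sizeof(v)); break; }
	default:
		return -1; /* not reachable through the macro below */
	}
	return 0;
}

/*
 * Size-checking wrapper: the size comes from the type of the value
 * argument, so an unsupported width is rejected at build time rather
 * than at run time.
 */
#define sketch_patch_memory(dest, val)					\
	({								\
		_Static_assert(sizeof(val) == 1 || sizeof(val) == 2 ||	\
			       sizeof(val) == 4 || sizeof(val) == 8,	\
			       "unsupported patch size");		\
		sketch_patch_bytes((dest), (uint64_t)(val), sizeof(val)); \
	})
```

With this shape, sketch_patch_memory(&insn, (uint32_t)0x60000000) stores a 4-byte value, while passing a struct or an array as the value fails to compile. Taking the size from the value type is what makes a macro convenient here; an inline function would need the size or type passed explicitly.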