On 20/05/2023 at 20:17, Naveen N Rao wrote:
> Christophe Leroy wrote:
>>
>> On 20/05/2023 at 12:34, Christophe Leroy wrote:
>>>
>>> On 19/05/2023 at 21:26, Naveen N Rao wrote:
>>>>
>>>> Refactor the ftrace code and move to using ftrace_replace_code() to
>>>> help simplify and make the code more maintainable.
>>>>
>>>> - The existing ftrace.c code is moved to a separate file so that ppc64
>>>>   elfv1 and clang -pg-only support can continue. This makes it possible
>>>>   to converge ppc32 and ppc64 support further.
>>>> - Drop code to re-purpose compiler-generated long branches for ftrace
>>>>   use in support of large kernels. We still retain the ftrace stubs at
>>>>   the end of .text, so we now support kernels up to ~64MB.
>>>> - Add ftrace_init_nop() to keep boot-time validations and init separate
>>>>   from runtime.
>>>> - Implement ftrace_replace_code() to simplify overall ftrace setup. This
>>>>   will be especially useful when adding the ability to nop out 'mflr r0'
>>>>   later, and for other subsequent ftrace features.
>>>> - Add support for -fpatchable-function-entry. On ppc64, this needs gcc
>>>>   v13.1 so that the nops are generated at the LEP. This also moves ppc32
>>>>   to using the same two-instruction sequence as ppc64.
>>>>
>>>> This applies atop patches 1-3 of Nick's series for the elfv2 conversion,
>>>> as well as Nick's patch enabling -mprofile-kernel for elfv2 BE:
>>>> - https://lore.kernel.org/all/20230505071850.228734-1-npig...@gmail.com/
>>>> - https://lore.kernel.org/all/20230506011814.8766-1-npig...@gmail.com/
>>>>
>>>> This builds for me and passes a quick test; posting this as an early
>>>> RFC.
>>>>
>>>> Signed-off-by: Naveen N Rao <nav...@kernel.org>
>>>
>>> Looks good, works on PPC32, but I observed some performance degradation:
>>> around 25% more time needed to activate the function tracer and around
>>> 10% more time needed to de-activate it (by writing function/nop into
>>> /sys/kernel/debug/tracing/current_tracer).
>
> Thanks for the test!
>
> I hadn't looked at the performance, though I was expecting it to be
> better. On ppc64, I am actually not seeing much of a difference.
>
>> perf record with your patch applied:
>>
>>     20.59%  echo  [kernel.kallsyms]  [k] ftrace_check_record
>>     15.71%  echo  [kernel.kallsyms]  [k] patch_instruction
>>      6.75%  echo  [kernel.kallsyms]  [k] ftrace_replace_code
>>      4.30%  echo  [kernel.kallsyms]  [k] __ftrace_hash_rec_update
>>      3.96%  echo  [kernel.kallsyms]  [k] __rb_reserve_next.constprop.0
>>      3.20%  echo  [kernel.kallsyms]  [k] ftrace_get_call_inst.isra.0
>>      2.62%  echo  [kernel.kallsyms]  [k] ftrace_get_addr_new
>>      2.44%  echo  [kernel.kallsyms]  [k] ftrace_rec_iter_next
>>      2.15%  echo  [kernel.kallsyms]  [k] function_trace_call
>>      2.09%  echo  [kernel.kallsyms]  [k] rb_commit
>>      1.92%  echo  [kernel.kallsyms]  [k] ring_buffer_unlock_commit
>>      1.69%  echo  [kernel.kallsyms]  [k] ring_buffer_lock_reserve
>>      1.63%  echo  [kernel.kallsyms]  [k] copy_page
>>      1.45%  echo  [kernel.kallsyms]  [k] ftrace_create_branch_inst.constprop.0
>>      1.40%  echo  [kernel.kallsyms]  [k] unmap_page_range
>>      1.34%  echo  [kernel.kallsyms]  [k] mas_next_entry
>>      1.28%  echo  ld-2.23.so         [.] do_lookup_x
>>      1.22%  echo  [kernel.kallsyms]  [k] ftrace_call
>>      1.05%  echo  [kernel.kallsyms]  [k] trace_function
>>      0.99%  echo  [kernel.kallsyms]  [k] ftrace_caller
>>      0.81%  echo  [kernel.kallsyms]  [k] ftrace_rec_iter_record
>>
>> perf record without your patch:
>>
>>     22.58%  echo  [kernel.kallsyms]  [k] patch_instruction
>>     17.85%  echo  [kernel.kallsyms]  [k] ftrace_check_record
>>     11.65%  echo  [kernel.kallsyms]  [k] ftrace_replace_code
>>      6.76%  echo  [kernel.kallsyms]  [k] ftrace_make_call
>>      6.68%  echo  [kernel.kallsyms]  [k] __ftrace_hash_rec_update
>>      3.50%  echo  [kernel.kallsyms]  [k] ftrace_get_addr_curr
>>      3.42%  echo  [kernel.kallsyms]  [k] ftrace_get_addr_new
>>      2.36%  echo  [kernel.kallsyms]  [k] copy_page
>>      1.22%  echo  [kernel.kallsyms]  [k] __rb_reserve_next.constprop.0
>>      1.22%  echo  ld-2.23.so         [.] do_lookup_x
>>      1.06%  echo  [kernel.kallsyms]  [k] ftrace_lookup_ip
>>      0.73%  echo  ld-2.23.so         [.] _dl_relocate_object
>>      0.65%  echo  [kernel.kallsyms]  [k] flush_dcache_icache_page
>>      0.65%  echo  [kernel.kallsyms]  [k] function_trace_call
>
> That suggests ftrace_test_record() as the likely cause. The below change
> does improve performance on ppc64. Can you see if it makes a difference
> on ppc32?
>
> Upstream/before the below change (ftrace activation):
>     0.15266 +- 0.00215 seconds time elapsed  ( +- 1.41% )
> With the below change:
>     0.14170 +- 0.00396 seconds time elapsed  ( +- 2.79% )
>
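(For context, since the patch itself is not quoted above: the shape of loop
these profiles point at is roughly the following. This is only an
illustrative sketch of an arch ftrace_replace_code() built on the generic
ftrace_rec_iter API, not the actual change under discussion;
patch_ftrace_site() is a hypothetical stand-in for the arch-specific
create_branch()/patch_instruction() step. The point is that
ftrace_test_record() is consulted for every dyn_ftrace record on every pass,
which is why ftrace_check_record() dominates the profiles even when most
records need no change.)

#include <linux/ftrace.h>

/*
 * Hypothetical stand-in for the arch-specific instruction patching
 * (create_branch() + patch_instruction() on powerpc). For illustration only.
 */
static int patch_ftrace_site(unsigned long ip, unsigned long target, bool make_call)
{
	return 0;
}

void ftrace_replace_code(int enable)
{
	struct ftrace_rec_iter *iter;
	struct dyn_ftrace *rec;

	for_ftrace_rec_iter(iter) {
		rec = ftrace_rec_iter_record(iter);

		if (rec->flags & FTRACE_FL_DISABLED)
			continue;

		/*
		 * ftrace_test_record() ends up in ftrace_check_record() for
		 * every record, whether or not anything needs patching.
		 */
		switch (ftrace_test_record(rec, enable)) {
		case FTRACE_UPDATE_IGNORE:
			continue;
		case FTRACE_UPDATE_MAKE_CALL:
			/* turn the nop at rec->ip into a branch to the new address */
			patch_ftrace_site(rec->ip, ftrace_get_addr_new(rec), true);
			break;
		case FTRACE_UPDATE_MAKE_NOP:
			/* turn the existing branch at rec->ip back into a nop */
			patch_ftrace_site(rec->ip, ftrace_get_addr_curr(rec), false);
			break;
		case FTRACE_UPDATE_MODIFY_CALL:
			/* redirect the existing branch to the new address */
			patch_ftrace_site(rec->ip, ftrace_get_addr_new(rec), true);
			break;
		}

		/* Commit the new state once the instruction has been patched. */
		ftrace_update_record(rec, enable);
	}
}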
That's better, but still more time than the original implementation:
+20% to activate the function tracer (was +40% with your RFC)
+21% to activate the nop tracer (was +24% with your RFC)

perf record (without strict kernel rwx):

    17.75%  echo  [kernel.kallsyms]  [k] ftrace_check_record
     9.76%  echo  [kernel.kallsyms]  [k] ftrace_replace_code
     6.53%  echo  [kernel.kallsyms]  [k] patch_instruction
     5.21%  echo  [kernel.kallsyms]  [k] __ftrace_hash_rec_update
     4.26%  echo  [kernel.kallsyms]  [k] ftrace_get_addr_curr
     4.18%  echo  [kernel.kallsyms]  [k] ftrace_get_call_inst.isra.0
     3.45%  echo  [kernel.kallsyms]  [k] ftrace_get_addr_new
     3.08%  echo  [kernel.kallsyms]  [k] function_trace_call
     2.20%  echo  [kernel.kallsyms]  [k] __rb_reserve_next.constprop.0
     2.05%  echo  [kernel.kallsyms]  [k] copy_page
     1.91%  echo  [kernel.kallsyms]  [k] ftrace_create_branch_inst.constprop.0
     1.83%  echo  [kernel.kallsyms]  [k] ftrace_rec_iter_next
     1.83%  echo  [kernel.kallsyms]  [k] rb_commit
     1.69%  echo  [kernel.kallsyms]  [k] ring_buffer_lock_reserve
     1.54%  echo  [kernel.kallsyms]  [k] trace_function
     1.39%  echo  [kernel.kallsyms]  [k] __call_rcu_common.constprop.0
     1.25%  echo  ld-2.23.so         [.] do_lookup_x
     1.17%  echo  [kernel.kallsyms]  [k] ftrace_rec_iter_record
     1.03%  echo  [kernel.kallsyms]  [k] unmap_page_range
     0.95%  echo  [kernel.kallsyms]  [k] flush_dcache_icache_page
     0.95%  echo  [kernel.kallsyms]  [k] ftrace_lookup_ip

Christophe