On Sat, 2024-03-16 at 10:10 +0000, Christophe Leroy wrote:
> 
> 
> Le 15/03/2024 à 09:38, Christophe Leroy a écrit :
> > 
> > 
> > Le 15/03/2024 à 03:59, Benjamin Gray a écrit :
> > > The existing patching alias page setup and teardown sections can be
> > > simplified to make use of the new open_patch_window() abstraction.
> > > 
> > > This eliminates the _mm variants of the helpers, consumers no longer
> > > need to check mm_patch_enabled(), and consumers no longer need to worry
> > > about synchronization and flushing beyond the changes they make in the
> > > patching window.
> > 
> > With this patch, the time needed to activate or de-activate function
> > tracer is approx 10% longer on powerpc 8xx.
> 
> With the following changes, the performance is restored:
> 
> diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> index fd6f8576033a..bc92b85913d8 100644
> --- a/arch/powerpc/lib/code-patching.c
> +++ b/arch/powerpc/lib/code-patching.c
> @@ -282,13 +282,13 @@ struct patch_window {
>   * Interrupts must be disabled for the entire duration of the patching. The PIDR
>   * is potentially changed during this time.
>   */
> -static int open_patch_window(void *addr, struct patch_window *ctx)
> +static __always_inline int open_patch_window(void *addr, struct patch_window *ctx)
>  {
>  	unsigned long pfn = get_patch_pfn(addr);
>  
>  	lockdep_assert_irqs_disabled();
>  
> -	ctx->text_poke_addr = (unsigned long)__this_cpu_read(cpu_patching_context.addr);
> +	ctx->text_poke_addr = (unsigned long)__this_cpu_read(cpu_patching_context.addr) & PAGE_MASK;
>  
>  	if (!mm_patch_enabled()) {
>  		ctx->ptep = __this_cpu_read(cpu_patching_context.pte);
> @@ -331,7 +331,7 @@ static int open_patch_window(void *addr, struct patch_window *ctx)
>  	return 0;
>  }
>  
> -static void close_patch_window(struct patch_window *ctx)
> +static __always_inline void close_patch_window(struct patch_window *ctx)
>  {
>  	lockdep_assert_irqs_disabled();
> 
Thanks for checking that. I did restore the page-mask optimisation in a later patch while still developing, but the 64-bit assembly looked slightly worse for it. I didn't check the 32-bit build; I have no way to benchmark it anyway.