On Wed, May 8, 2024 at 7:50 AM Kewen.Lin <li...@linux.ibm.com> wrote: > > Hi, > > As the discussion in PR112980, although the current > implementation for -fpatchable-function-entry* conforms > with the documentation (making N NOPs be consecutive), > it's inefficient for both kernel and userspace livepatching > (see comments in PR for the details). > > So this patch is to change the current implementation by > emitting the "before" NOPs before global entry point and > the "after" NOPs after local entry point. The new behavior > would not keep NOPs to be consecutive, so the documentation > is updated to emphasize this. > > Bootstrapped and regress-tested on powerpc64-linux-gnu > P8/P9 and powerpc64le-linux-gnu P9 and P10. > > Is it ok for trunk? And backporting to active branches > after burn-in time? I guess we should also mention this > change in changes.html? > > BR, > Kewen > ----- > PR target/112980 > > gcc/ChangeLog: > > * config/rs6000/rs6000-logue.cc (rs6000_output_function_prologue): > Adjust the handling on patch area emitting with dual entry, remove > the restriction on "before" NOPs count, not emit "before" NOPs any > more but only emit "after" NOPs. > * config/rs6000/rs6000.cc (rs6000_print_patchable_function_entry): > Adjust by respecting cfun->machine->stop_patch_area_print. > (rs6000_elf_declare_function_name): For ELFv2 with dual entry, set > cfun->machine->stop_patch_area_print as true. > * config/rs6000/rs6000.h (struct machine_function): Remove member > global_entry_emitted, add new member stop_patch_area_print. > * doc/invoke.texi (option -fpatchable-function-entry): Adjust the > documentation for PowerPC ELFv2 dual entry. > > gcc/testsuite/ChangeLog: > > * c-c++-common/patchable_function_entry-default.c: Adjust. > * gcc.target/powerpc/pr99888-4.c: Likewise. > * gcc.target/powerpc/pr99888-5.c: Likewise. > * gcc.target/powerpc/pr99888-6.c: Likewise. > --- > gcc/config/rs6000/rs6000-logue.cc | 40 +++++-------------- > gcc/config/rs6000/rs6000.cc | 15 +++++-- > gcc/config/rs6000/rs6000.h | 10 +++-- > gcc/doc/invoke.texi | 8 ++-- > .../patchable_function_entry-default.c | 3 -- > gcc/testsuite/gcc.target/powerpc/pr99888-4.c | 4 +- > gcc/testsuite/gcc.target/powerpc/pr99888-5.c | 4 +- > gcc/testsuite/gcc.target/powerpc/pr99888-6.c | 4 +- > 8 files changed, 33 insertions(+), 55 deletions(-) > > diff --git a/gcc/config/rs6000/rs6000-logue.cc > b/gcc/config/rs6000/rs6000-logue.cc > index 60ba15a8bc3..0eb019b44b3 100644 > --- a/gcc/config/rs6000/rs6000-logue.cc > +++ b/gcc/config/rs6000/rs6000-logue.cc > @@ -4006,43 +4006,21 @@ rs6000_output_function_prologue (FILE *file) > fprintf (file, "\tadd 2,2,12\n"); > } > > - unsigned short patch_area_size = crtl->patch_area_size; > - unsigned short patch_area_entry = crtl->patch_area_entry; > - /* Need to emit the patching area. */ > - if (patch_area_size > 0) > - { > - cfun->machine->global_entry_emitted = true; > - /* As ELFv2 ABI shows, the allowable bytes between the global > - and local entry points are 0, 4, 8, 16, 32 and 64 when > - there is a local entry point. Considering there are two > - non-prefixed instructions for global entry point prologue > - (8 bytes), the count for patchable nops before local entry > - point would be 2, 6 and 14. It's possible to support those > - other counts of nops by not making a local entry point, but > - we don't have clear use cases for them, so leave them > - unsupported for now. */ > - if (patch_area_entry > 0) > - { > - if (patch_area_entry != 2 > - && patch_area_entry != 6 > - && patch_area_entry != 14) > - error ("unsupported number of nops before function entry > (%u)", > - patch_area_entry); > - rs6000_print_patchable_function_entry (file, patch_area_entry, > - true); > - patch_area_size -= patch_area_entry; > - } > - } > - > fputs ("\t.localentry\t", file); > assemble_name (file, name); > fputs (",.-", file); > assemble_name (file, name); > fputs ("\n", file); > /* Emit the nops after local entry. */ > - if (patch_area_size > 0) > - rs6000_print_patchable_function_entry (file, patch_area_size, > - patch_area_entry == 0); > + unsigned short patch_area_size = crtl->patch_area_size; > + unsigned short patch_area_entry = crtl->patch_area_entry; > + if (patch_area_size > patch_area_entry) > + { > + cfun->machine->stop_patch_area_print = false; > + patch_area_size -= patch_area_entry; > + rs6000_print_patchable_function_entry (file, patch_area_size, > + patch_area_entry == 0); > + } > } > > else if (rs6000_pcrel_p ()) > diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc > index 6ba9df4f02e..7e70741d59f 100644 > --- a/gcc/config/rs6000/rs6000.cc > +++ b/gcc/config/rs6000/rs6000.cc > @@ -15190,12 +15190,14 @@ rs6000_print_patchable_function_entry (FILE *file, > bool record_p) > { > bool global_entry_needed_p = rs6000_global_entry_point_prologue_needed_p > (); > - /* For a function which needs global entry point, we will emit the > - patchable area before and after local entry point under the control of > - cfun->machine->global_entry_emitted, see the handling in function > + /* For a function which needs global entry point, we will only emit the > + patchable area after local entry point under the control of > + !cfun->machine->stop_patch_area_print, see the handling in functions > rs6000_output_function_prologue. */ > - if (!global_entry_needed_p || cfun->machine->global_entry_emitted) > + if (!cfun->machine->stop_patch_area_print) > default_print_patchable_function_entry (file, patch_area_size, record_p); > + else > + gcc_assert (global_entry_needed_p); > } > > > enum rtx_code > @@ -21381,6 +21383,11 @@ rs6000_elf_declare_function_name (FILE *file, const > char *name, tree decl) > fprintf (file, "\t.previous\n"); > } > ASM_OUTPUT_FUNCTION_LABEL (file, name, decl); > + /* At this time, the "before" NOPs have been already emitted, > + let's stop generic code from printing the "after" NOPs and > + emit just after local entry later. */ > + if (rs6000_global_entry_point_prologue_needed_p ()) > + cfun->machine->stop_patch_area_print = true; > } > > static void rs6000_elf_file_end (void) ATTRIBUTE_UNUSED; > diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h > index 68bc45d65ba..73767fdae06 100644 > --- a/gcc/config/rs6000/rs6000.h > +++ b/gcc/config/rs6000/rs6000.h > @@ -2434,10 +2434,12 @@ typedef struct GTY(()) machine_function > bool lr_is_wrapped_separately; > bool toc_is_wrapped_separately; > bool mma_return_type_error; > - /* Indicate global entry is emitted, only useful when the function requires > - global entry. It helps to control the patchable area before and after > - local entry. */ > - bool global_entry_emitted; > + /* With ELFv2 ABI dual entry points being adopted, generic framework > + targetm.asm_out.print_patchable_function_entry would generate "after" > + NOPs before local entry, it is wrong. This flag is to stop it from > + printing patch area before local entry, it is only useful when the > + function requires dual entry points. */ > + bool stop_patch_area_print; > } machine_function; > #endif > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index c584664e168..58e48f7dc55 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -18363,11 +18363,11 @@ If @code{N=0}, no pad location is recorded. > The NOP instructions are inserted at---and maybe before, depending on > @var{M}---the function entry address, even before the prologue. On > PowerPC with the ELFv2 ABI, for a function with dual entry points, > -the local entry point is this function entry address. > +@var{M} NOP instructions are inserted before the global entry point and > +@var{N} - @var{M} NOP instructions are inserted after the local entry > +point, which means the NOP instructions may not be consecutive.
Isn't it @var{M-1} NOP instructions before the global entry? I suppose the existing "... with the function entry point before the @var{M}th NOP. If @var{M} is omitted, it defaults to @code{0} so the function entry points to the address just at the first NOP." wording is self-contradicting in a way since before the 0th NOP (default) to me is the same as before the 1st NOP (M == 1). So maybe that should be _after_ the @var{M}th NOP instead which would be consistent with your ELFv2 docs? Maybe the sentence should be re-worded similar to your ELVv2 one, specifying the number of NOPs before and after the entry point. > -The maximum value of @var{N} and @var{M} is 65535. On PowerPC with the > -ELFv2 ABI, for a function with dual entry points, the supported values > -for @var{M} are 0, 2, 6 and 14. > +The maximum value of @var{N} and @var{M} is 65535. > @end table > > > diff --git a/gcc/testsuite/c-c++-common/patchable_function_entry-default.c > b/gcc/testsuite/c-c++-common/patchable_function_entry-default.c > index 3ccbafc87db..899938b4aa3 100644 > --- a/gcc/testsuite/c-c++-common/patchable_function_entry-default.c > +++ b/gcc/testsuite/c-c++-common/patchable_function_entry-default.c > @@ -1,9 +1,6 @@ > /* { dg-do compile { target { ! { nvptx*-*-* visium-*-* } } } } */ > /* { dg-options "-O2 -fpatchable-function-entry=3,1" } */ > /* { dg-additional-options "-fno-pie" { target sparc*-*-* } } */ > -/* See PR99888, one single preceding nop isn't allowed on powerpc_elfv2, > - so overriding with two preceding nops to make it pass there. */ > -/* { dg-additional-options "-fpatchable-function-entry=3,2" { target > powerpc_elfv2 } } */ > /* { dg-final { scan-assembler-times "nop|NOP|SWYM" 3 { target { ! { > alpha*-*-* riscv*-*-* } } } } } */ > /* { dg-final { scan-assembler-times "bis" 3 { target alpha*-*-* } } } */ > /* { dg-final { scan-assembler-times "nop\n" 3 { target riscv*-*-* } } } */ > diff --git a/gcc/testsuite/gcc.target/powerpc/pr99888-4.c > b/gcc/testsuite/gcc.target/powerpc/pr99888-4.c > index 00a8d4d316e..6f23f2bb939 100644 > --- a/gcc/testsuite/gcc.target/powerpc/pr99888-4.c > +++ b/gcc/testsuite/gcc.target/powerpc/pr99888-4.c > @@ -2,12 +2,10 @@ > /* There is no global entry point prologue with pcrel. */ > /* { dg-options "-mno-pcrel -fpatchable-function-entry=1,1" } */ > > -/* Verify one error emitted for unexpected 1 nop before local > - entry. */ > +/* Verify there is no error with 1 nop before local entry. */ > > extern int a; > > int test (int b) { > return a + b; > } > -/* { dg-error "unsupported number of nops before function entry \\(1\\)" "" > { target *-*-* } .-1 } */ > diff --git a/gcc/testsuite/gcc.target/powerpc/pr99888-5.c > b/gcc/testsuite/gcc.target/powerpc/pr99888-5.c > index 39d3b4465f1..13f192ebd20 100644 > --- a/gcc/testsuite/gcc.target/powerpc/pr99888-5.c > +++ b/gcc/testsuite/gcc.target/powerpc/pr99888-5.c > @@ -2,12 +2,10 @@ > /* There is no global entry point prologue with pcrel. */ > /* { dg-options "-mno-pcrel -fpatchable-function-entry=7,3" } */ > > -/* Verify one error emitted for unexpected 3 nops before local > - entry. */ > +/* Verify no error emitted for 3 nops before local entry. */ > > extern int a; > > int test (int b) { > return a + b; > } > -/* { dg-error "unsupported number of nops before function entry \\(3\\)" "" > { target *-*-* } .-1 } */ > diff --git a/gcc/testsuite/gcc.target/powerpc/pr99888-6.c > b/gcc/testsuite/gcc.target/powerpc/pr99888-6.c > index c6c18dcc7ac..431c69cae9a 100644 > --- a/gcc/testsuite/gcc.target/powerpc/pr99888-6.c > +++ b/gcc/testsuite/gcc.target/powerpc/pr99888-6.c > @@ -2,8 +2,7 @@ > /* There is no global entry point prologue with pcrel. */ > /* { dg-options "-mno-pcrel" } */ > > -/* Verify one error emitted for unexpected 4 nops before local > - entry. */ > +/* Verify no error emitted for 4 nops before local entry. */ > > extern int a; > > @@ -11,4 +10,3 @@ __attribute__ ((patchable_function_entry (20, 4))) > int test (int b) { > return a + b; > } > -/* { dg-error "unsupported number of nops before function entry \\(4\\)" "" > { target *-*-* } .-1 } */ > -- > 2.39.1