Re: PING: [Updated, PATCH] i386: Avoid stack realignment if possible
On Fri, Sep 1, 2017 at 11:48 AM, H.J. Lu wrote:
> On Sun, Aug 13, 2017 at 3:02 PM, H.J. Lu wrote:
>> On Mon, Aug 07, 2017 at 08:58:49AM -0700, H.J. Lu wrote:
>>> On Tue, Jul 25, 2017 at 7:54 AM, Uros Bizjak wrote:
>>> > On Tue, Jul 25, 2017 at 3:52 PM, H.J. Lu wrote:
>>> >> On Fri, Jul 14, 2017 at 4:46 AM, H.J. Lu wrote:
>>> >>> On Fri, Jul 7, 2017 at 5:56 PM, H.J. Lu wrote:
>>> On Fri, Jul 07, 2017 at 09:58:42AM -0700, H.J. Lu wrote:
>>> > On Fri, Dec 20, 2013 at 8:06 AM, Jakub Jelinek wrote:
>>> > > Hi!
>>> > >
>>> > > Honza recently changed the i?86 backend, so that it often doesn't
>>> > > do -maccumulate-outgoing-args by default on x86_64.
>>> > > Unfortunately, on some of the here included testcases this regressed
>>> > > quite a bit the generated code. As AVX vectors are used, the dynamic
>>> > > realignment code needs to assume e.g. that some of them will need to be
>>> > > spilled, and for -mno-accumulate-outgoing-args the code needs to set
>>> > > need_drap early as well. But when emitting the prologue/epilogue,
>>> > > if need_drap is set, we don't perform the optimization for leaf functions
>>> > > which have zero size stack frame, thus we end up with uselessly doing
>>> > > dynamic stack realignment, setting up DRAP that nothing uses and later on
>>> > > restore everything back.
>>> > >
>>> > > This patch improves it, if the DRAP register isn't live at the start of
>>> > > entry bb successor and we aren't going to realign the stack, we don't
>>> > > need DRAP at all, and even if we need DRAP register, that can't be the sole
>>> > > reason for doing stack realignment, the prologue code is able to set up DRAP
>>> > > even without dynamic stack realignment.
>>> > >
>>> > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>>> > >
>>> > > 2013-12-20 Jakub Jelinek
>>> > >
>>> > > PR target/59501
>>> > > * config/i386/i386.c (ix86_save_reg): Don't return true for drap_reg
>>> > > if !crtl->stack_realign_needed.
>>> > > (ix86_finalize_stack_realign_flags): If drap_reg isn't live on entry
>>> > > and stack_realign_needed will be false, clear drap_reg and need_drap.
>>> > > Optimize leaf functions that don't need stack frame even if
>>> > > crtl->need_drap.
>>> > >
>>> > > * gcc.target/i386/pr59501-1.c: New test.
>>> > > * gcc.target/i386/pr59501-1a.c: New test.
>>> > > * gcc.target/i386/pr59501-2.c: New test.
>>> > > * gcc.target/i386/pr59501-2a.c: New test.
>>> > > * gcc.target/i386/pr59501-3.c: New test.
>>> > > * gcc.target/i386/pr59501-3a.c: New test.
>>> > > * gcc.target/i386/pr59501-4.c: New test.
>>> > > * gcc.target/i386/pr59501-4a.c: New test.
>>> > > * gcc.target/i386/pr59501-5.c: New test.
>>> > > * gcc.target/i386/pr59501-6.c: New test.
>>> >
>>> > LGTM, assuming Jakub is OK with the patch.
>>> >
>>> > Thanks,
>>> > Uros.
>>>
>>> Jakub, can you take a look at this:
>>>
>>> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00400.html
>>>
>>
>> Here is the updated patch to fix
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81769
>>
>> OK for trunk?
>>
>> Thanks.
>>
>> H.J.
>> ---
>> ix86_finalize_stack_frame_flags has been extended to eliminate frame
>> pointer when the new stack frame isn't needed with and without
>> -maccumulate-outgoing-args as well as -fomit-frame-pointer. Since stack
>> access with larger alignment may be optimized out, to decide if stack
>> realignment is needed, we need to not only check for stack frame access,
>> but also verify the alignment of stack frame access. Since alignment of
>> memory access via arg_pointer is set up by caller, not by callee, we
>> should find the maximum stack alignment from the stack frame access
>> instructions via stack pointer and frame pointer to avoid stack
>> realignment when stack alignment needed is less than incoming stack
>> boundary.
>>
>> gcc/
>>
>> PR target/59501
>> PR target/81624
>> PR target/81769
>> * config/i386/i386.c (ix86_finalize_stack_frame_flags): Don't
>> realign stack if stack alignment needed is less than incoming
>> stack boundary.
>>
>> gcc/testsuite/
>>
>> PR target/59501
>> PR target/81624
>> PR target/81769
>> * gcc.target/i386/pr59501-4a.c: Remove xfail.
>> * gcc.target/i386/pr81769-1a.c: New test.
>> * gcc.target/i386/pr81769-1b.c:
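
For readers following the description of the updated patch, the alignment check can be pictured with a small standalone model. This is only a sketch under stated assumptions, not the code in config/i386/i386.c: the types `base_reg` and `frame_access` and the functions `max_frame_alignment` and `stack_realign_needed` are hypothetical names invented for illustration, and treating "needed alignment equal to the incoming boundary" as not requiring realignment is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a stack frame access; not a GCC data structure.  */
enum base_reg { BASE_STACK_POINTER, BASE_FRAME_POINTER, BASE_ARG_POINTER };

struct frame_access
{
  enum base_reg base;      /* Register the address is based on.  */
  unsigned int align_bits; /* Alignment the access requires, in bits.  */
};

/* Return the largest alignment required by accesses made through the
   stack pointer or the frame pointer.  Accesses through the argument
   pointer are skipped because their alignment is set up by the caller.  */
unsigned int
max_frame_alignment (const struct frame_access *accesses, size_t n)
{
  unsigned int max_align = 0;

  assert (n == 0 || accesses != NULL);
  for (size_t i = 0; i < n; i++)
    {
      if (accesses[i].base == BASE_ARG_POINTER)
        continue;
      if (accesses[i].align_bits > max_align)
        max_align = accesses[i].align_bits;
    }
  return max_align;
}

/* Realignment is only needed when the remaining frame accesses require
   more alignment than the incoming stack boundary already provides
   (assumption: an equal value needs no realignment).  */
bool
stack_realign_needed (const struct frame_access *accesses, size_t n,
                      unsigned int incoming_boundary_bits)
{
  return max_frame_alignment (accesses, n) > incoming_boundary_bits;
}
```

In this model, a function whose only 256-bit access goes through the argument pointer, with the rest of its frame accesses needing at most 128 bits, would not be realigned when the incoming boundary is 128 bits.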
PING: [Updated, PATCH] i386: Avoid stack realignment if possible
On Sun, Aug 13, 2017 at 3:02 PM, H.J. Lu wrote:
> On Mon, Aug 07, 2017 at 08:58:49AM -0700, H.J. Lu wrote:
>> On Tue, Jul 25, 2017 at 7:54 AM, Uros Bizjak wrote:
>> > On Tue, Jul 25, 2017 at 3:52 PM, H.J. Lu wrote:
>> >> On Fri, Jul 14, 2017 at 4:46 AM, H.J. Lu wrote:
>> >>> On Fri, Jul 7, 2017 at 5:56 PM, H.J. Lu wrote:
>> On Fri, Jul 07, 2017 at 09:58:42AM -0700, H.J. Lu wrote:
>> > On Fri, Dec 20, 2013 at 8:06 AM, Jakub Jelinek wrote:
>> > > Hi!
>> > >
>> > > Honza recently changed the i?86 backend, so that it often doesn't
>> > > do -maccumulate-outgoing-args by default on x86_64.
>> > > Unfortunately, on some of the here included testcases this regressed
>> > > quite a bit the generated code. As AVX vectors are used, the dynamic
>> > > realignment code needs to assume e.g. that some of them will need to be
>> > > spilled, and for -mno-accumulate-outgoing-args the code needs to set
>> > > need_drap early as well. But when emitting the prologue/epilogue,
>> > > if need_drap is set, we don't perform the optimization for leaf functions
>> > > which have zero size stack frame, thus we end up with uselessly doing
>> > > dynamic stack realignment, setting up DRAP that nothing uses and later on
>> > > restore everything back.
>> > >
>> > > This patch improves it, if the DRAP register isn't live at the start of
>> > > entry bb successor and we aren't going to realign the stack, we don't
>> > > need DRAP at all, and even if we need DRAP register, that can't be the sole
>> > > reason for doing stack realignment, the prologue code is able to set up DRAP
>> > > even without dynamic stack realignment.
>> > >
>> > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>> > >
>> > > 2013-12-20 Jakub Jelinek
>> > >
>> > > PR target/59501
>> > > * config/i386/i386.c (ix86_save_reg): Don't return true for drap_reg
>> > > if !crtl->stack_realign_needed.
>> > > (ix86_finalize_stack_realign_flags): If drap_reg isn't live on entry
>> > > and stack_realign_needed will be false, clear drap_reg and need_drap.
>> > > Optimize leaf functions that don't need stack frame even if
>> > > crtl->need_drap.
>> > >
>> > > * gcc.target/i386/pr59501-1.c: New test.
>> > > * gcc.target/i386/pr59501-1a.c: New test.
>> > > * gcc.target/i386/pr59501-2.c: New test.
>> > > * gcc.target/i386/pr59501-2a.c: New test.
>> > > * gcc.target/i386/pr59501-3.c: New test.
>> > > * gcc.target/i386/pr59501-3a.c: New test.
>> > > * gcc.target/i386/pr59501-4.c: New test.
>> > > * gcc.target/i386/pr59501-4a.c: New test.
>> > > * gcc.target/i386/pr59501-5.c: New test.
>> > > * gcc.target/i386/pr59501-6.c: New test.
>> >
>> > LGTM, assuming Jakub is OK with the patch.
>> >
>> > Thanks,
>> > Uros.
>>
>> Jakub, can you take a look at this:
>>
>> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00400.html
>>
>
> Here is the updated patch to fix
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81769
>
> OK for trunk?
>
> Thanks.
>
> H.J.
> ---
> ix86_finalize_stack_frame_flags has been extended to eliminate frame
> pointer when the new stack frame isn't needed with and without
> -maccumulate-outgoing-args as well as -fomit-frame-pointer. Since stack
> access with larger alignment may be optimized out, to decide if stack
> realignment is needed, we need to not only check for stack frame access,
> but also verify the alignment of stack frame access. Since alignment of
> memory access via arg_pointer is set up by caller, not by callee, we
> should find the maximum stack alignment from the stack frame access
> instructions via stack pointer and frame pointer to avoid stack
> realignment when stack alignment needed is less than incoming stack
> boundary.
>
> gcc/
>
> PR target/59501
> PR target/81624
> PR target/81769
> * config/i386/i386.c (ix86_finalize_stack_frame_flags): Don't
> realign stack if stack alignment needed is less than incoming
> stack boundary.
>
> gcc/testsuite/
>
> PR target/59501
> PR target/81624
> PR target/81769
> * gcc.target/i386/pr59501-4a.c: Remove xfail.
> * gcc.target/i386/pr81769-1a.c: New test.
> * gcc.target/i386/pr81769-1b.c: Likewise.
> * gcc.target/i386/pr81769-2.c: Likewise.
> ---
> gcc/config/i386/i386.c | 143 ++---
> gcc/testsuite/gcc.target/i386/pr59501-4a.c | 2 +-
>