Re: PING: [Updated, PATCH] i386: Avoid stack realignment if possible
On Fri, Sep 1, 2017 at 11:48 AM, H.J. Lu wrote:
> On Sun, Aug 13, 2017 at 3:02 PM, H.J. Lu wrote:
>> [...]
PING: [Updated, PATCH] i386: Avoid stack realignment if possible
On Sun, Aug 13, 2017 at 3:02 PM, H.J. Lu wrote:
> [...]
[Updated, PATCH] i386: Avoid stack realignment if possible
On Mon, Aug 07, 2017 at 08:58:49AM -0700, H.J. Lu wrote:
> On Tue, Jul 25, 2017 at 7:54 AM, Uros Bizjak wrote:
> > On Tue, Jul 25, 2017 at 3:52 PM, H.J. Lu wrote:
> >> On Fri, Jul 14, 2017 at 4:46 AM, H.J. Lu wrote:
> >>> On Fri, Jul 7, 2017 at 5:56 PM, H.J. Lu wrote:
> On Fri, Jul 07, 2017 at 09:58:42AM -0700, H.J. Lu wrote:
> > On Fri, Dec 20, 2013 at 8:06 AM, Jakub Jelinek wrote:
> > > Hi!
> > >
> > > Honza recently changed the i?86 backend, so that it often doesn't
> > > do -maccumulate-outgoing-args by default on x86_64.
> > > Unfortunately, on some of the here included testcases this regressed
> > > quite a bit the generated code. As AVX vectors are used, the dynamic
> > > realignment code needs to assume e.g. that some of them will need to be
> > > spilled, and for -mno-accumulate-outgoing-args the code needs to set
> > > need_drap early as well. But when emitting the prologue/epilogue,
> > > if need_drap is set, we don't perform the optimization for leaf functions
> > > which have zero size stack frame, thus we end up with uselessly doing
> > > dynamic stack realignment, setting up DRAP that nothing uses and later on
> > > restore everything back.
> > >
> > > This patch improves it, if the DRAP register isn't live at the start of
> > > entry bb successor and we aren't going to realign the stack, we don't
> > > need DRAP at all, and even if we need DRAP register, that can't be the sole
> > > reason for doing stack realignment, the prologue code is able to set up DRAP
> > > even without dynamic stack realignment.
> > >
> > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > >
> > > 2013-12-20 Jakub Jelinek
> > >
> > >	PR target/59501
> > >	* config/i386/i386.c (ix86_save_reg): Don't return true for drap_reg
> > >	if !crtl->stack_realign_needed.
> > >	(ix86_finalize_stack_realign_flags): If drap_reg isn't live on entry
> > >	and stack_realign_needed will be false, clear drap_reg and need_drap.
> > >	Optimize leaf functions that don't need stack frame even if
> > >	crtl->need_drap.
> > >
> > >	* gcc.target/i386/pr59501-1.c: New test.
> > >	* gcc.target/i386/pr59501-1a.c: New test.
> > >	* gcc.target/i386/pr59501-2.c: New test.
> > >	* gcc.target/i386/pr59501-2a.c: New test.
> > >	* gcc.target/i386/pr59501-3.c: New test.
> > >	* gcc.target/i386/pr59501-3a.c: New test.
> > >	* gcc.target/i386/pr59501-4.c: New test.
> > >	* gcc.target/i386/pr59501-4a.c: New test.
> > >	* gcc.target/i386/pr59501-5.c: New test.
> > >	* gcc.target/i386/pr59501-6.c: New test.
> >
> > LGTM, assuming Jakub is OK with the patch.
> >
> > Thanks,
> > Uros.
>
> Jakub, can you take a look at this:
>
> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00400.html
>

Here is the updated patch to fix

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81769

OK for trunk?

Thanks.

H.J.
---
ix86_finalize_stack_frame_flags has been extended to eliminate frame
pointer when the new stack frame isn't needed with and without
-maccumulate-outgoing-args as well as -fomit-frame-pointer.  Since stack
access with larger alignment may be optimized out, to decide if stack
realignment is needed, we need to not only check for stack frame access,
but also verify the alignment of stack frame access.  Since alignment of
memory access via arg_pointer is set up by caller, not by callee, we
should find the maximum stack alignment from the stack frame access
instructions via stack pointer and frame pointer to avoid stack
realignment when stack alignment needed is less than incoming stack
boundary.

gcc/

	PR target/59501
	PR target/81624
	PR target/81769
	* config/i386/i386.c (ix86_finalize_stack_frame_flags): Don't
	realign stack if stack alignment needed is less than incoming
	stack boundary.

gcc/testsuite/

	PR target/59501
	PR target/81624
	PR target/81769
	* gcc.target/i386/pr59501-4a.c: Remove xfail.
	* gcc.target/i386/pr81769-1a.c: New test.
	* gcc.target/i386/pr81769-1b.c: Likewise.
	* gcc.target/i386/pr81769-2.c: Likewise.
---
 gcc/config/i386/i386.c                     | 143 ++---
 gcc/testsuite/gcc.target/i386/pr59501-4a.c |   2 +-
 gcc/testsuite/gcc.target/i386/pr81769-1a.c |  21 +
 gcc/testsuite/gcc.target/i386/pr81769-1b.c |   7 ++
 gcc/testsuite/gcc.target/i386/pr81769-2.c  |  21 +
 5 files changed, 138 insertions(+), 56 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr81769-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr81769-1b.c
 create mode 100644 gcc/testsu
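
For readers following the commit message, the rule it describes can be modelled roughly as below. This is only a sketch under stated assumptions, not the code in i386.c: the types and the functions max_frame_access_align and needs_stack_realign are hypothetical names invented for illustration, and GCC itself works on RTL insns rather than on an array of records. Treating "needed alignment equal to the incoming boundary" as not requiring realignment is also an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not GCC internals: the register through which a
   stack access is addressed.  */
enum base_reg { BASE_STACK_POINTER, BASE_FRAME_POINTER, BASE_ARG_POINTER };

/* One memory access into the stack and the alignment, in bits, that
   the access requires.  */
struct stack_access {
  enum base_reg base;
  unsigned int align_bits;
};

/* Return the largest alignment required by accesses made through the
   stack pointer or the frame pointer.  Accesses through the argument
   pointer are skipped, because their alignment is set up by the
   caller, not by the callee.  */
unsigned int
max_frame_access_align (const struct stack_access *accesses, size_t n)
{
  assert (accesses != NULL || n == 0);
  unsigned int max_align = 0;
  for (size_t i = 0; i < n; i++)
    if (accesses[i].base != BASE_ARG_POINTER
        && accesses[i].align_bits > max_align)
      max_align = accesses[i].align_bits;
  return max_align;
}

/* Realign only when some remaining frame access needs more alignment
   than the incoming stack boundary already guarantees (assumption:
   an equal requirement needs no realignment).  */
bool
needs_stack_realign (const struct stack_access *accesses, size_t n,
                     unsigned int incoming_boundary_bits)
{
  return max_frame_access_align (accesses, n) > incoming_boundary_bits;
}
```

In this model, a function whose only 256-bit access goes through the argument pointer, with a 128-bit incoming boundary, would not be realigned, while a 256-bit spill addressed through the frame pointer would still force realignment.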