Re: PING: [Updated, PATCH] i386: Avoid stack realignment if possible

2017-09-05 Thread H.J. Lu
On Fri, Sep 1, 2017 at 11:48 AM, H.J. Lu  wrote:
> On Sun, Aug 13, 2017 at 3:02 PM, H.J. Lu  wrote:
>> On Mon, Aug 07, 2017 at 08:58:49AM -0700, H.J. Lu wrote:
>>> On Tue, Jul 25, 2017 at 7:54 AM, Uros Bizjak  wrote:
>>> > On Tue, Jul 25, 2017 at 3:52 PM, H.J. Lu  wrote:
>>> >> On Fri, Jul 14, 2017 at 4:46 AM, H.J. Lu  wrote:
>>> >>> On Fri, Jul 7, 2017 at 5:56 PM, H.J. Lu  wrote:
>>>  On Fri, Jul 07, 2017 at 09:58:42AM -0700, H.J. Lu wrote:
>>> > On Fri, Dec 20, 2013 at 8:06 AM, Jakub Jelinek wrote:
>>> > > Hi!
>>> > >
>>> > > Honza recently changed the i?86 backend, so that it often doesn't
>>> > > do -maccumulate-outgoing-args by default on x86_64.
>>> > > Unfortunately, on some of the testcases included here this regressed
>>> > > the generated code quite a bit.  As AVX vectors are used, the dynamic
>>> > > realignment code needs to assume e.g. that some of them will need to be
>>> > > spilled, and for -mno-accumulate-outgoing-args the code needs to set
>>> > > need_drap early as well.  But when emitting the prologue/epilogue,
>>> > > if need_drap is set, we don't perform the optimization for leaf
>>> > > functions which have a zero size stack frame, thus we end up uselessly
>>> > > doing dynamic stack realignment, setting up a DRAP that nothing uses
>>> > > and later on restoring everything back.
>>> > >
>>> > > This patch improves it: if the DRAP register isn't live at the start of
>>> > > the entry bb successor and we aren't going to realign the stack, we
>>> > > don't need DRAP at all, and even if we need the DRAP register, that
>>> > > can't be the sole reason for doing stack realignment; the prologue code
>>> > > is able to set up DRAP even without dynamic stack realignment.
>>> > >
>>> > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>>> > >
>>> > > 2013-12-20  Jakub Jelinek  
>>> > >
>>> > > PR target/59501
>>> > > * config/i386/i386.c (ix86_save_reg): Don't return true for
>>> > > drap_reg if !crtl->stack_realign_needed.
>>> > > (ix86_finalize_stack_realign_flags): If drap_reg isn't live on
>>> > > entry and stack_realign_needed will be false, clear drap_reg and
>>> > > need_drap.  Optimize leaf functions that don't need stack frame
>>> > > even if crtl->need_drap.
>>> > >
>>> > > * gcc.target/i386/pr59501-1.c: New test.
>>> > > * gcc.target/i386/pr59501-1a.c: New test.
>>> > > * gcc.target/i386/pr59501-2.c: New test.
>>> > > * gcc.target/i386/pr59501-2a.c: New test.
>>> > > * gcc.target/i386/pr59501-3.c: New test.
>>> > > * gcc.target/i386/pr59501-3a.c: New test.
>>> > > * gcc.target/i386/pr59501-4.c: New test.
>>> > > * gcc.target/i386/pr59501-4a.c: New test.
>>> > > * gcc.target/i386/pr59501-5.c: New test.
>>> > > * gcc.target/i386/pr59501-6.c: New test.
>>> >
>>> > LGTM, assuming Jakub is OK with the patch.
>>> >
>>> > Thanks,
>>> > Uros.
>>>
>>> Jakub, can you take a look at this:
>>>
>>> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00400.html
>>>
>>
>> Here is the updated patch to fix
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81769
>>
>> OK for trunk?
>>
>> Thanks.
>>
>> H.J.
>> ---
>> ix86_finalize_stack_frame_flags has been extended to eliminate frame
>> pointer when the new stack frame isn't needed with and without
>> -maccumulate-outgoing-args as well as -fomit-frame-pointer.  Since stack
>> access with larger alignment may be optimized out, to decide if stack
>> realignment is needed, we need to not only check for stack frame access,
>> but also verify the alignment of stack frame access.  Since alignment of
>> memory access via arg_pointer is set up by caller, not by callee, we
>> should find the maximum stack alignment from the stack frame access
>> instructions via stack pointer and frame pointer to avoid stack
>> realignment when stack alignment needed is less than incoming stack
>> boundary.
>>
>> gcc/
>>
>> PR target/59501
>> PR target/81624
>> PR target/81769
>> * config/i386/i386.c (ix86_finalize_stack_frame_flags): Don't
>> realign stack if stack alignment needed is less than incoming
>> stack boundary.
>>
>> gcc/testsuite/
>>
>> PR target/59501
>> PR target/81624
>> PR target/81769
>> * gcc.target/i386/pr59501-4a.c: Remove xfail.
>> * gcc.target/i386/pr81769-1a.c: New test.
>> * gcc.target/i386/pr81769-1b.c: 

PING: [Updated, PATCH] i386: Avoid stack realignment if possible

2017-09-01 Thread H.J. Lu
On Sun, Aug 13, 2017 at 3:02 PM, H.J. Lu  wrote:
> On Mon, Aug 07, 2017 at 08:58:49AM -0700, H.J. Lu wrote:
>> On Tue, Jul 25, 2017 at 7:54 AM, Uros Bizjak  wrote:
>> > On Tue, Jul 25, 2017 at 3:52 PM, H.J. Lu  wrote:
>> >> On Fri, Jul 14, 2017 at 4:46 AM, H.J. Lu  wrote:
>> >>> On Fri, Jul 7, 2017 at 5:56 PM, H.J. Lu  wrote:
>>  On Fri, Jul 07, 2017 at 09:58:42AM -0700, H.J. Lu wrote:
>> > On Fri, Dec 20, 2013 at 8:06 AM, Jakub Jelinek wrote:
>> > > Hi!
>> > >
>> > > Honza recently changed the i?86 backend, so that it often doesn't
>> > > do -maccumulate-outgoing-args by default on x86_64.
>> > > Unfortunately, on some of the testcases included here this regressed
>> > > the generated code quite a bit.  As AVX vectors are used, the dynamic
>> > > realignment code needs to assume e.g. that some of them will need to be
>> > > spilled, and for -mno-accumulate-outgoing-args the code needs to set
>> > > need_drap early as well.  But when emitting the prologue/epilogue,
>> > > if need_drap is set, we don't perform the optimization for leaf
>> > > functions which have a zero size stack frame, thus we end up uselessly
>> > > doing dynamic stack realignment, setting up a DRAP that nothing uses
>> > > and later on restoring everything back.
>> > >
>> > > This patch improves it: if the DRAP register isn't live at the start of
>> > > the entry bb successor and we aren't going to realign the stack, we
>> > > don't need DRAP at all, and even if we need the DRAP register, that
>> > > can't be the sole reason for doing stack realignment; the prologue code
>> > > is able to set up DRAP even without dynamic stack realignment.
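
The rule described above, together with the ChangeLog entry for ix86_save_reg, can be modelled roughly as follows. This is only an illustrative sketch, not the code from i386.c: the struct frame_flags type and the functions finalize_drap and must_save_drap_reg are hypothetical names that stand in for the crtl fields and backend hooks mentioned in the mail.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant crtl fields; not GCC's real type.  */
struct frame_flags
{
  bool stack_realign_needed;  /* Will the prologue realign the stack?  */
  bool need_drap;             /* Was a DRAP requested early on?  */
  bool drap_reg_set;          /* Models "crtl->drap_reg is non-null".  */
};

/* If the stack is not going to be realigned and the DRAP register is not
   live on entry, nothing uses the DRAP, so drop it entirely.  */
static void
finalize_drap (struct frame_flags *f, bool drap_live_on_entry)
{
  if (!f->stack_realign_needed && !drap_live_on_entry)
    {
      f->drap_reg_set = false;
      f->need_drap = false;
    }
}

/* The DRAP register only has to be saved when the stack is really
   realigned; a DRAP on its own is not a reason to save it.  */
static bool
must_save_drap_reg (const struct frame_flags *f)
{
  return f->drap_reg_set && f->stack_realign_needed;
}
```

With stack_realign_needed false and the DRAP register dead on entry, finalize_drap clears both flags, so a leaf function with an empty frame no longer pays for a DRAP setup that nothing uses.
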
>> > >
>> > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>> > >
>> > > 2013-12-20  Jakub Jelinek  
>> > >
>> > > PR target/59501
>> > > * config/i386/i386.c (ix86_save_reg): Don't return true for
>> > > drap_reg if !crtl->stack_realign_needed.
>> > > (ix86_finalize_stack_realign_flags): If drap_reg isn't live on
>> > > entry and stack_realign_needed will be false, clear drap_reg and
>> > > need_drap.  Optimize leaf functions that don't need stack frame
>> > > even if crtl->need_drap.
>> > >
>> > > * gcc.target/i386/pr59501-1.c: New test.
>> > > * gcc.target/i386/pr59501-1a.c: New test.
>> > > * gcc.target/i386/pr59501-2.c: New test.
>> > > * gcc.target/i386/pr59501-2a.c: New test.
>> > > * gcc.target/i386/pr59501-3.c: New test.
>> > > * gcc.target/i386/pr59501-3a.c: New test.
>> > > * gcc.target/i386/pr59501-4.c: New test.
>> > > * gcc.target/i386/pr59501-4a.c: New test.
>> > > * gcc.target/i386/pr59501-5.c: New test.
>> > > * gcc.target/i386/pr59501-6.c: New test.
>> >
>> > LGTM, assuming Jakub is OK with the patch.
>> >
>> > Thanks,
>> > Uros.
>>
>> Jakub, can you take a look at this:
>>
>> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00400.html
>>
>
> Here is the updated patch to fix
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81769
>
> OK for trunk?
>
> Thanks.
>
> H.J.
> ---
> ix86_finalize_stack_frame_flags has been extended to eliminate frame
> pointer when the new stack frame isn't needed with and without
> -maccumulate-outgoing-args as well as -fomit-frame-pointer.  Since stack
> access with larger alignment may be optimized out, to decide if stack
> realignment is needed, we need to not only check for stack frame access,
> but also verify the alignment of stack frame access.  Since alignment of
> memory access via arg_pointer is set up by caller, not by callee, we
> should find the maximum stack alignment from the stack frame access
> instructions via stack pointer and frame pointer to avoid stack
> realignment when stack alignment needed is less than incoming stack
> boundary.
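
The check described in the paragraph above could be sketched like this. It is a simplified model, not the patch itself: the enum base_reg and struct mem_access types and both function names are hypothetical, and alignments are plain bit counts. Accesses through the argument pointer are skipped because their alignment is set up by the caller. The source only says realignment is avoided when the needed alignment is less than the incoming boundary; treating the equal case as "no realignment" is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the base register of one memory access.  */
enum base_reg
{
  BASE_STACK_POINTER,
  BASE_FRAME_POINTER,
  BASE_ARG_POINTER,
  BASE_OTHER
};

/* Hypothetical model of one memory access in the function body.  */
struct mem_access
{
  enum base_reg base;
  unsigned align_bits;  /* Alignment required by this access, in bits.  */
};

/* Largest alignment required by any access that goes through the stack
   pointer or the frame pointer.  Accesses via the argument pointer are
   ignored: the caller, not the callee, establishes their alignment.  */
static unsigned
max_stack_frame_alignment (const struct mem_access *accesses, size_t n)
{
  unsigned max_align = 0;
  for (size_t i = 0; i < n; i++)
    if ((accesses[i].base == BASE_STACK_POINTER
         || accesses[i].base == BASE_FRAME_POINTER)
        && accesses[i].align_bits > max_align)
      max_align = accesses[i].align_bits;
  return max_align;
}

/* Realign only when some remaining stack frame access needs more than
   the incoming stack boundary already guarantees (assumption: an equal
   alignment needs no realignment).  */
static bool
stack_realign_needed (const struct mem_access *accesses, size_t n,
                      unsigned incoming_stack_boundary)
{
  return max_stack_frame_alignment (accesses, n) > incoming_stack_boundary;
}
```

For example, a function whose only 256-bit aligned access goes through the argument pointer, and whose remaining frame accesses need at most 128 bits, would not be realigned under a 128-bit incoming stack boundary in this model.
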
>
> gcc/
>
> PR target/59501
> PR target/81624
> PR target/81769
> * config/i386/i386.c (ix86_finalize_stack_frame_flags): Don't
> realign stack if stack alignment needed is less than incoming
> stack boundary.
>
> gcc/testsuite/
>
> PR target/59501
> PR target/81624
> PR target/81769
> * gcc.target/i386/pr59501-4a.c: Remove xfail.
> * gcc.target/i386/pr81769-1a.c: New test.
> * gcc.target/i386/pr81769-1b.c: Likewise.
> * gcc.target/i386/pr81769-2.c: Likewise.
> ---
>  gcc/config/i386/i386.c | 143 ++---
>  gcc/testsuite/gcc.target/i386/pr59501-4a.c |   2 +-
>