Re: PING: [PATCH] i386: Avoid stack realignment if possible

2017-07-25 Thread Uros Bizjak
On Tue, Jul 25, 2017 at 3:52 PM, H.J. Lu  wrote:
> On Fri, Jul 14, 2017 at 4:46 AM, H.J. Lu  wrote:
>> On Fri, Jul 7, 2017 at 5:56 PM, H.J. Lu  wrote:
>>> On Fri, Jul 07, 2017 at 09:58:42AM -0700, H.J. Lu wrote:
 On Fri, Dec 20, 2013 at 8:06 AM, Jakub Jelinek  wrote:
 > Hi!
 >
 > Honza recently changed the i?86 backend, so that it often doesn't
 > do -maccumulate-outgoing-args by default on x86_64.
 > Unfortunately, on some of the here included testcases this regressed
 > quite a bit the generated code.  As AVX vectors are used, the dynamic
 > realignment code needs to assume e.g. that some of them will need to be
 > spilled, and for -mno-accumulate-outgoing-args the code needs to set
 > need_drap early as well.  But in when emitting the prologue/epilogue,
 > if need_drap is set, we don't perform the optimization for leaf functions
 > which have zero size stack frame, thus we end up with uselessly doing
 > dynamic stack realignment, setting up DRAP that nothing uses and later on
 > restore everything back.
 >
 > This patch improves it, if the DRAP register isn't live at the start of
 > entry bb successor and we aren't going to realign the stack, we don't
 > need DRAP at all, and even if we need DRAP register, that can't be the 
 > sole
 > reason for doing stack realignment, the prologue code is able to set up 
 > DRAP
 > even without dynamic stack realignment.
 >
 > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
 >
 > 2013-12-20  Jakub Jelinek  
 >
 > PR target/59501
 > * config/i386/i386.c (ix86_save_reg): Don't return true for 
 > drap_reg
 > if !crtl->stack_realign_needed.
 > (ix86_finalize_stack_realign_flags): If drap_reg isn't live on 
 > entry
 > and stack_realign_needed will be false, clear drap_reg and 
 > need_drap.
 > Optimize leaf functions that don't need stack frame even if
 > crtl->need_drap.
 >
 > * gcc.target/i386/pr59501-1.c: New test.
 > * gcc.target/i386/pr59501-1a.c: New test.
 > * gcc.target/i386/pr59501-2.c: New test.
 > * gcc.target/i386/pr59501-2a.c: New test.
 > * gcc.target/i386/pr59501-3.c: New test.
 > * gcc.target/i386/pr59501-3a.c: New test.
 > * gcc.target/i386/pr59501-4.c: New test.
 > * gcc.target/i386/pr59501-4a.c: New test.
 > * gcc.target/i386/pr59501-5.c: New test.
 > * gcc.target/i386/pr59501-6.c: New test.

LGTM, assuming Jakub is OK with the patch.

Thanks,
Uros.


 >
 > --- gcc/testsuite/gcc.target/i386/pr59501-4a.c.jj   2013-12-20 
 > 12:19:20.603212859 +0100
 > +++ gcc/testsuite/gcc.target/i386/pr59501-4a.c  2013-12-20 
 > 12:23:33.647881672 +0100
 > @@ -0,0 +1,8 @@
 > +/* PR target/59501 */
 > +/* { dg-do compile { target { ! ia32 } } } */
 > +/* { dg-options "-O2 -mavx -maccumulate-outgoing-args" } */
 > +
 > +#include "pr59501-3a.c"
 > +
 > +/* Verify no dynamic realignment is performed.  */
 > +/* { dg-final { scan-assembler-not "and\[^\n\r]*sp" { xfail *-*-* } } } 
 > */
 >

 Since DRAP isn't used with -maccumulate-outgoing-args, pr59501-4a.c was
 xfailed due to stack frame access via frame pointer instead of DARP.
 This patch finds the maximum stack alignment from the stack frame access
 instructions and avoids stack realignment if stack alignment needed is
 less than incoming stack boundary.

 I am testing this patch.  OK for trunk if there is no regression?


>>>
>>> We need to keep the preferred stack alignment as the minimum stack
>>> alignment. Here is the updated patch.  Tested on x86-64.  OK for
>>> trunk?
>>>
>>> Thanks.
>>
>> Hi Jakub,
>>
>> This patch fixes the xfailed testcase in your patch:
>>
>> https://gcc.gnu.org/ml/gcc-patches/2013-12/msg01767.html
>>
>> Your original patch:
>>
>> https://gcc.gnu.org/ml/gcc-patches/2011-11/msg01058.html
>>
>> assumes that any instructions accessing stack require stack
>> realignment:
>>
>> +  FOR_EACH_BB (bb)
>> +{
>> +  rtx insn;
>> +  FOR_BB_INSNS (bb, insn)
>> +if (NONDEBUG_INSN_P (insn)
>> + && requires_stack_frame_p (insn, prologue_used,
>> +   set_up_by_prologue))
>> +  {
>> + crtl->stack_realign_needed = stack_realign;
>> + crtl->stack_realign_finalized = true;
>> + return;
>> +  }
>> + }
>>
>> This patch checks the actual alignment needed for any instructions
>> accessing stack.  It skips stack realignment if the actual stack alignment
>> needed is less than or equal to incoming stack alignment.
>>
>> Can you take look at it?
>>
>> Thanks.
>>
>
> PING
>
> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00400.html
>
> --
> 

PING: [PATCH] i386: Avoid stack realignment if possible

2017-07-25 Thread H.J. Lu
On Fri, Jul 14, 2017 at 4:46 AM, H.J. Lu  wrote:
> On Fri, Jul 7, 2017 at 5:56 PM, H.J. Lu  wrote:
>> On Fri, Jul 07, 2017 at 09:58:42AM -0700, H.J. Lu wrote:
>>> On Fri, Dec 20, 2013 at 8:06 AM, Jakub Jelinek  wrote:
>>> > Hi!
>>> >
>>> > Honza recently changed the i?86 backend, so that it often doesn't
>>> > do -maccumulate-outgoing-args by default on x86_64.
>>> > Unfortunately, on some of the here included testcases this regressed
>>> > quite a bit the generated code.  As AVX vectors are used, the dynamic
>>> > realignment code needs to assume e.g. that some of them will need to be
>>> > spilled, and for -mno-accumulate-outgoing-args the code needs to set
>>> > need_drap early as well.  But in when emitting the prologue/epilogue,
>>> > if need_drap is set, we don't perform the optimization for leaf functions
>>> > which have zero size stack frame, thus we end up with uselessly doing
>>> > dynamic stack realignment, setting up DRAP that nothing uses and later on
>>> > restore everything back.
>>> >
>>> > This patch improves it, if the DRAP register isn't live at the start of
>>> > entry bb successor and we aren't going to realign the stack, we don't
>>> > need DRAP at all, and even if we need DRAP register, that can't be the 
>>> > sole
>>> > reason for doing stack realignment, the prologue code is able to set up 
>>> > DRAP
>>> > even without dynamic stack realignment.
>>> >
>>> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>>> >
>>> > 2013-12-20  Jakub Jelinek  
>>> >
>>> > PR target/59501
>>> > * config/i386/i386.c (ix86_save_reg): Don't return true for 
>>> > drap_reg
>>> > if !crtl->stack_realign_needed.
>>> > (ix86_finalize_stack_realign_flags): If drap_reg isn't live on 
>>> > entry
>>> > and stack_realign_needed will be false, clear drap_reg and 
>>> > need_drap.
>>> > Optimize leaf functions that don't need stack frame even if
>>> > crtl->need_drap.
>>> >
>>> > * gcc.target/i386/pr59501-1.c: New test.
>>> > * gcc.target/i386/pr59501-1a.c: New test.
>>> > * gcc.target/i386/pr59501-2.c: New test.
>>> > * gcc.target/i386/pr59501-2a.c: New test.
>>> > * gcc.target/i386/pr59501-3.c: New test.
>>> > * gcc.target/i386/pr59501-3a.c: New test.
>>> > * gcc.target/i386/pr59501-4.c: New test.
>>> > * gcc.target/i386/pr59501-4a.c: New test.
>>> > * gcc.target/i386/pr59501-5.c: New test.
>>> > * gcc.target/i386/pr59501-6.c: New test.
>>> >
>>> >
>>> > --- gcc/testsuite/gcc.target/i386/pr59501-4a.c.jj   2013-12-20 
>>> > 12:19:20.603212859 +0100
>>> > +++ gcc/testsuite/gcc.target/i386/pr59501-4a.c  2013-12-20 
>>> > 12:23:33.647881672 +0100
>>> > @@ -0,0 +1,8 @@
>>> > +/* PR target/59501 */
>>> > +/* { dg-do compile { target { ! ia32 } } } */
>>> > +/* { dg-options "-O2 -mavx -maccumulate-outgoing-args" } */
>>> > +
>>> > +#include "pr59501-3a.c"
>>> > +
>>> > +/* Verify no dynamic realignment is performed.  */
>>> > +/* { dg-final { scan-assembler-not "and\[^\n\r]*sp" { xfail *-*-* } } } 
>>> > */
>>> >
>>>
>>> Since DRAP isn't used with -maccumulate-outgoing-args, pr59501-4a.c was
>>> xfailed due to stack frame access via frame pointer instead of DARP.
>>> This patch finds the maximum stack alignment from the stack frame access
>>> instructions and avoids stack realignment if stack alignment needed is
>>> less than incoming stack boundary.
>>>
>>> I am testing this patch.  OK for trunk if there is no regression?
>>>
>>>
>>
>> We need to keep the preferred stack alignment as the minimum stack
>> alignment. Here is the updated patch.  Tested on x86-64.  OK for
>> trunk?
>>
>> Thanks.
>
> Hi Jakub,
>
> This patch fixes the xfailed testcase in your patch:
>
> https://gcc.gnu.org/ml/gcc-patches/2013-12/msg01767.html
>
> Your original patch:
>
> https://gcc.gnu.org/ml/gcc-patches/2011-11/msg01058.html
>
> assumes that any instructions accessing stack require stack
> realignment:
>
> +  FOR_EACH_BB (bb)
> +{
> +  rtx insn;
> +  FOR_BB_INSNS (bb, insn)
> +if (NONDEBUG_INSN_P (insn)
> + && requires_stack_frame_p (insn, prologue_used,
> +   set_up_by_prologue))
> +  {
> + crtl->stack_realign_needed = stack_realign;
> + crtl->stack_realign_finalized = true;
> + return;
> +  }
> + }
>
> This patch checks the actual alignment needed for any instructions
> accessing stack.  It skips stack realignment if the actual stack alignment
> needed is less than or equal to incoming stack alignment.
>
> Can you take look at it?
>
> Thanks.
>

PING

https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00400.html

-- 
H.J.