[Bug tree-optimization/59501] [4.9 Regression] Vector Gather with GCC 4.9 2013-12-08 Snapshot

2017-09-05 Thread hjl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59501

--- Comment #7 from hjl at gcc dot gnu.org  ---
Author: hjl
Date: Tue Sep  5 16:39:24 2017
New Revision: 251718

URL: https://gcc.gnu.org/viewcvs?rev=251718&root=gcc&view=rev
Log:
i386: Avoid stack realignment if possible

ix86_finalize_stack_frame_flags has been extended to eliminate frame
pointer when the new stack frame isn't needed with and without
-maccumulate-outgoing-args as well as -fomit-frame-pointer.  Since stack
access with larger alignment may be optimized out, to decide if stack
realignment is needed, we need to not only check for stack frame access,
but also verify the alignment of stack frame access.  Since alignment of
memory access via arg_pointer is set up by caller, not by callee, we
should find the maximum stack alignment from the stack frame access
instructions via stack pointer and frame pointer to avoid stack
realignment when stack alignment needed is less than incoming stack
boundary.

gcc/

PR target/59501
PR target/81624
PR target/81769
* config/i386/i386.c (ix86_finalize_stack_frame_flags): Don't
realign stack if stack alignment needed is less than incoming
stack boundary.

gcc/testsuite/

PR target/59501
PR target/81624
PR target/81769
* gcc.target/i386/pr59501-4a.c: Remove xfail.
* gcc.target/i386/pr81769-1a.c: New test.
* gcc.target/i386/pr81769-1b.c: Likewise.
* gcc.target/i386/pr81769-2.c: Likewise.

Added:
trunk/gcc/testsuite/gcc.target/i386/pr81769-1a.c
trunk/gcc/testsuite/gcc.target/i386/pr81769-1b.c
trunk/gcc/testsuite/gcc.target/i386/pr81769-2.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/i386.c
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.target/i386/pr59501-4a.c
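
The decision described in the log above can be sketched roughly as follows.
This is a toy model, not the code from i386.c: the types and the names
frame_access, max_frame_access_align and stack_realign_needed are
hypothetical, and treating "needed alignment equal to the incoming boundary"
as not requiring realignment is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one stack-frame memory access; not a GCC type.  */
enum base_reg { BASE_SP, BASE_FP, BASE_ARGP };

struct frame_access
{
  enum base_reg base;   /* register the address is based on */
  unsigned align_bits;  /* alignment the access requires, in bits */
};

/* Return the largest alignment required by accesses made through the
   stack pointer or frame pointer.  Accesses through the argument pointer
   are skipped because the caller, not the callee, set up their alignment.  */
unsigned
max_frame_access_align (const struct frame_access *accesses, size_t n)
{
  unsigned max_align = 0;

  assert (n == 0 || accesses != NULL);
  for (size_t i = 0; i < n; i++)
    if (accesses[i].base != BASE_ARGP
        && accesses[i].align_bits > max_align)
      max_align = accesses[i].align_bits;
  return max_align;
}

/* Realignment is needed only when some surviving frame access wants more
   alignment than the incoming stack boundary already guarantees
   (assumption: equal alignment needs no realignment).  */
bool
stack_realign_needed (const struct frame_access *accesses, size_t n,
                      unsigned incoming_boundary_bits)
{
  return max_frame_access_align (accesses, n) > incoming_boundary_bits;
}
```

In this model a function whose only 256-bit access goes through the argument
pointer, with a 128-bit incoming boundary, would not be realigned.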

[Bug tree-optimization/59501] [4.9 Regression] Vector Gather with GCC 4.9 2013-12-08 Snapshot

2013-12-30 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59501

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Author: jakub
Date: Mon Dec 30 08:53:10 2013
New Revision: 206243

URL: http://gcc.gnu.org/viewcvs?rev=206243&root=gcc&view=rev
Log:
PR target/59501
* config/i386/i386.c (ix86_save_reg): Don't return true for drap_reg
if !crtl->stack_realign_needed.
(ix86_finalize_stack_realign_flags): If drap_reg isn't live on entry
and stack_realign_needed will be false, clear drap_reg and need_drap.
Optimize leaf functions that don't need stack frame even if
crtl->need_drap.

* gcc.target/i386/pr59501-1.c: New test.
* gcc.target/i386/pr59501-1a.c: New test.
* gcc.target/i386/pr59501-2.c: New test.
* gcc.target/i386/pr59501-2a.c: New test.
* gcc.target/i386/pr59501-3.c: New test.
* gcc.target/i386/pr59501-3a.c: New test.
* gcc.target/i386/pr59501-4.c: New test.
* gcc.target/i386/pr59501-4a.c: New test.
* gcc.target/i386/pr59501-5.c: New test.
* gcc.target/i386/pr59501-6.c: New test.

Added:
trunk/gcc/testsuite/gcc.target/i386/pr59501-1.c
trunk/gcc/testsuite/gcc.target/i386/pr59501-1a.c
trunk/gcc/testsuite/gcc.target/i386/pr59501-2.c
trunk/gcc/testsuite/gcc.target/i386/pr59501-2a.c
trunk/gcc/testsuite/gcc.target/i386/pr59501-3.c
trunk/gcc/testsuite/gcc.target/i386/pr59501-3a.c
trunk/gcc/testsuite/gcc.target/i386/pr59501-4.c
trunk/gcc/testsuite/gcc.target/i386/pr59501-4a.c
trunk/gcc/testsuite/gcc.target/i386/pr59501-5.c
trunk/gcc/testsuite/gcc.target/i386/pr59501-6.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/i386.c
trunk/gcc/testsuite/ChangeLog


[Bug tree-optimization/59501] [4.9 Regression] Vector Gather with GCC 4.9 2013-12-08 Snapshot

2013-12-30 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59501

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Fixed.


[Bug tree-optimization/59501] [4.9 Regression] Vector Gather with GCC 4.9 2013-12-08 Snapshot

2013-12-19 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59501

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
      Known to work|                            |4.8.2
   Target Milestone|---                         |4.9.0
            Summary|Vector Gather with GCC 4.9  |[4.9 Regression] Vector
                   |2013-12-08 Snapshot         |Gather with GCC 4.9
                   |                            |2013-12-08 Snapshot


[Bug tree-optimization/59501] [4.9 Regression] Vector Gather with GCC 4.9 2013-12-08 Snapshot

2013-12-19 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59501

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2013-12-19
                 CC|                            |hjl at gcc dot gnu.org,
                   |                            |hubicka at gcc dot gnu.org,
                   |                            |jakub at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
This regressed with r203171.  Before that change, -maccumulate-outgoing-args
was true, but now it isn't.  The change I see in the RTL dumps is that there
is a (dead) load from the r10 register into a pseudo from the expand to the
jump pass; then the RTL is pretty much the same (different insn numbers) until
pro_and_epilogue, which creates all the garbage.
The reason why the load from r10 is created, and supposedly the reason for the
different pro_and_epilogue behavior, is ix86_get_drap_rtx:
  if (ix86_force_drap || !ACCUMULATE_OUTGOING_ARGS)
    crtl->need_drap = true;
But in the function in question, LRA has not spilled anything to the stack, the
stack actually isn't used at all, and neither is the drap reg live at the start
of the function (that would be another reason why we'd need to emit some
setting of the drap reg, but probably wouldn't need to dynamically realign the
stack).


[Bug tree-optimization/59501] [4.9 Regression] Vector Gather with GCC 4.9 2013-12-08 Snapshot

2013-12-19 Thread hjl.tools at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59501

--- Comment #2 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Jakub Jelinek from comment #1)

>   if (ix86_force_drap || !ACCUMULATE_OUTGOING_ARGS)
>     crtl->need_drap = true;

They are needed for -m32.  Otherwise, we got

FAIL: g++.dg/torture/stackalign/eh-fastcall-1.C  -Os -fpic execution test
FAIL: g++.dg/torture/stackalign/eh-global-1.C  -Os -fpic execution test
FAIL: g++.dg/torture/stackalign/eh-inline-1.C  -Os -fpic execution test
FAIL: g++.dg/torture/stackalign/eh-thiscall-1.C  -Os -fpic execution test


[Bug tree-optimization/59501] [4.9 Regression] Vector Gather with GCC 4.9 2013-12-08 Snapshot

2013-12-19 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59501

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to H.J. Lu from comment #2)
> (In reply to Jakub Jelinek from comment #1)
> >
> >   if (ix86_force_drap || !ACCUMULATE_OUTGOING_ARGS)
> >     crtl->need_drap = true;
> >
> They are needed for -m32.  Otherwise, we got
>
> FAIL: g++.dg/torture/stackalign/eh-fastcall-1.C  -Os -fpic execution test
> FAIL: g++.dg/torture/stackalign/eh-global-1.C  -Os -fpic execution test
> FAIL: g++.dg/torture/stackalign/eh-inline-1.C  -Os -fpic execution test
> FAIL: g++.dg/torture/stackalign/eh-thiscall-1.C  -Os -fpic execution test

I'm not saying that ix86_get_drap_rtx should be changed.
But perhaps:
  /* If the only reason for frame_pointer_needed is that we conservatively
     assumed stack realignment might be needed, but in the end nothing that
     needed the stack alignment had been spilled, clear frame_pointer_needed
     and say we don't need stack realignment.  */
  if (stack_realign
      && !crtl->need_drap
      && frame_pointer_needed
      && crtl->is_leaf
      && flag_omit_frame_pointer
      && crtl->sp_is_unchanging
      && !ix86_current_function_calls_tls_descriptor
      && !crtl->accesses_prior_frames
      && !cfun->calls_alloca
      && !crtl->calls_eh_return
      && !(flag_stack_check && STACK_CHECK_MOVING_SP)
      && !ix86_frame_pointer_required ()
      && get_frame_size () == 0
      && ix86_nsaved_sseregs () == 0
      && ix86_varargs_gpr_size + ix86_varargs_fpr_size == 0)
in ix86_finalize_stack_realign_flags could be tweaked, not to bail out always
if we have !crtl->need_drap, because then it will be set pretty much for all
leaf functions.  I wonder if we can e.g. ask DF whether the drap reg is live at
entry, if it isn't live, supposedly we can clear crtl->need_drap or ignore it
for this purpose?  Also, I wonder even if we actually need the drap register we
can't for the leaf functions just avoid the dynamic realignment and simply let
the prologue set the drap reg to the right value.
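
A rough sketch of the tweak suggested above, written as a standalone toy
model rather than as a patch to i386.c.  The struct fn_state, its
drap_live_at_entry field (standing in for whatever DF would report) and the
three function names are hypothetical; only a subset of the quoted
conditions is modelled.

```c
#include <stdbool.h>

/* Hypothetical, simplified snapshot of the state the quoted condition
   inspects.  Field names echo the GCC ones, but this is not GCC code.  */
struct fn_state
{
  bool stack_realign;        /* realignment was conservatively assumed */
  bool need_drap;            /* models crtl->need_drap */
  bool drap_live_at_entry;   /* assumed result of a liveness query */
  bool frame_pointer_needed;
  bool is_leaf;              /* models crtl->is_leaf */
  bool omit_frame_pointer;   /* models flag_omit_frame_pointer */
  bool sp_is_unchanging;     /* models crtl->sp_is_unchanging */
  long frame_size;           /* models get_frame_size () */
  int nsaved_sseregs;        /* models ix86_nsaved_sseregs () */
};

/* Checks shared by both variants; the remaining conditions from the
   quoted code are left out to keep the sketch short.  */
bool
frame_otherwise_unused (const struct fn_state *s)
{
  return s->stack_realign
         && s->frame_pointer_needed
         && s->is_leaf
         && s->omit_frame_pointer
         && s->sp_is_unchanging
         && s->frame_size == 0
         && s->nsaved_sseregs == 0;
}

/* Behaviour of the quoted condition: any need_drap blocks the
   optimization.  */
bool
can_skip_realign_current (const struct fn_state *s)
{
  return !s->need_drap && frame_otherwise_unused (s);
}

/* Suggested behaviour: need_drap only blocks the optimization when the
   drap register is actually live on entry.  */
bool
can_skip_realign_proposed (const struct fn_state *s)
{
  bool drap_matters = s->need_drap && s->drap_live_at_entry;

  return !drap_matters && frame_otherwise_unused (s);
}
```

For a leaf function with need_drap set but a dead drap register, the first
predicate rejects the optimization while the second one allows it.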


[Bug tree-optimization/59501] [4.9 Regression] Vector Gather with GCC 4.9 2013-12-08 Snapshot

2013-12-19 Thread hjl.tools at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59501

--- Comment #4 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Jakub Jelinek from comment #3)
>
> I'm not saying that ix86_get_drap_rtx should be changed.
> But perhaps:
>   /* If the only reason for frame_pointer_needed is that we conservatively
>      assumed stack realignment might be needed, but in the end nothing that
>      needed the stack alignment had been spilled, clear frame_pointer_needed
>      and say we don't need stack realignment.  */
>   if (stack_realign
>       && !crtl->need_drap
>       && frame_pointer_needed
>       && crtl->is_leaf
>       && flag_omit_frame_pointer
>       && crtl->sp_is_unchanging
>       && !ix86_current_function_calls_tls_descriptor
>       && !crtl->accesses_prior_frames
>       && !cfun->calls_alloca
>       && !crtl->calls_eh_return
>       && !(flag_stack_check && STACK_CHECK_MOVING_SP)
>       && !ix86_frame_pointer_required ()
>       && get_frame_size () == 0
>       && ix86_nsaved_sseregs () == 0
>       && ix86_varargs_gpr_size + ix86_varargs_fpr_size == 0)
> in ix86_finalize_stack_realign_flags could be tweaked, not to bail out
> always if we have !crtl->need_drap, because then it will be set pretty much
> for all leaf functions.  I wonder if we can e.g. ask DF whether the drap reg
> is live at entry, if it isn't live, supposedly we can clear crtl->need_drap
> or ignore it
> for this purpose?  Also, I wonder even if we actually need the drap register
> we can't for the leaf functions just avoid the dynamic realignment and
> simply let the prologue set the drap reg to the right value.

It sounds like a good idea.  BTW, I think we have very decent drap
coverage in gcc testsuite, as long as both -m32 and -m64 are
tested.