[Bug tree-optimization/59501] [4.9 Regression] Vector Gather with GCC 4.9 2013-12-08 Snapshot
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59501

--- Comment #7 from hjl at gcc dot gnu.org ---
Author: hjl
Date: Tue Sep 5 16:39:24 2017
New Revision: 251718

URL: https://gcc.gnu.org/viewcvs?rev=251718&root=gcc&view=rev
Log:
i386: Avoid stack realignment if possible

ix86_finalize_stack_frame_flags has been extended to eliminate the frame
pointer when the new stack frame isn't needed, with and without
-maccumulate-outgoing-args as well as -fomit-frame-pointer.  Since stack
accesses with larger alignment may be optimized out, deciding whether
stack realignment is needed requires checking not only for stack frame
accesses but also the alignment of those accesses.  Since the alignment
of memory accesses via arg_pointer is set up by the caller, not by the
callee, we should find the maximum stack alignment from the stack frame
access instructions that go through the stack pointer and frame pointer,
and avoid stack realignment when the stack alignment needed is less than
the incoming stack boundary.

gcc/

    PR target/59501
    PR target/81624
    PR target/81769
    * config/i386/i386.c (ix86_finalize_stack_frame_flags): Don't
    realign stack if stack alignment needed is less than incoming
    stack boundary.

gcc/testsuite/

    PR target/59501
    PR target/81624
    PR target/81769
    * gcc.target/i386/pr59501-4a.c: Remove xfail.
    * gcc.target/i386/pr81769-1a.c: New test.
    * gcc.target/i386/pr81769-1b.c: Likewise.
    * gcc.target/i386/pr81769-2.c: Likewise.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr81769-1a.c
    trunk/gcc/testsuite/gcc.target/i386/pr81769-1b.c
    trunk/gcc/testsuite/gcc.target/i386/pr81769-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/i386/pr59501-4a.c
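The decision rule in this log entry can be illustrated with a small standalone model. This is only a sketch, not the code in i386.c: the types and names (mem_access, base_reg, max_stack_frame_alignment, stack_realign_needed) are hypothetical, and treating "needed alignment equal to the incoming boundary" as not requiring realignment is an assumption on my part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: which register a memory access is based on.  */
enum base_reg
{
  BASE_STACK_POINTER,
  BASE_FRAME_POINTER,
  BASE_ARG_POINTER,   /* alignment here is set up by the caller */
  BASE_OTHER
};

/* Hypothetical record of one memory access and its alignment in bits.  */
struct mem_access
{
  enum base_reg base;
  unsigned align_bits;
};

/* Largest alignment among accesses made through the stack pointer or
   frame pointer.  Accesses through the arg pointer are skipped, since
   the callee does not control their alignment.  */
static unsigned
max_stack_frame_alignment (const struct mem_access *acc, size_t n)
{
  unsigned max_align = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (acc[i].base != BASE_STACK_POINTER
          && acc[i].base != BASE_FRAME_POINTER)
        continue;
      if (acc[i].align_bits > max_align)
        max_align = acc[i].align_bits;
    }
  return max_align;
}

/* Realign only if some remaining stack frame access needs more
   alignment than the caller already guarantees (assumed rule).  */
static bool
stack_realign_needed (const struct mem_access *acc, size_t n,
                      unsigned incoming_boundary_bits)
{
  assert (incoming_boundary_bits != 0);
  return max_stack_frame_alignment (acc, n) > incoming_boundary_bits;
}
```

In this model, a function whose only 256-bit-aligned access goes through the arg pointer would not be realigned on a target with a 128-bit incoming boundary, while one with a 256-bit-aligned spill slot addressed via the frame pointer would be.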
[Bug tree-optimization/59501] [4.9 Regression] Vector Gather with GCC 4.9 2013-12-08 Snapshot
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59501

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Author: jakub
Date: Mon Dec 30 08:53:10 2013
New Revision: 206243

URL: http://gcc.gnu.org/viewcvs?rev=206243&root=gcc&view=rev
Log:
    PR target/59501
    * config/i386/i386.c (ix86_save_reg): Don't return true for drap_reg
    if !crtl->stack_realign_needed.
    (ix86_finalize_stack_realign_flags): If drap_reg isn't live on entry
    and stack_realign_needed will be false, clear drap_reg and need_drap.
    Optimize leaf functions that don't need stack frame even if
    crtl->need_drap.

    * gcc.target/i386/pr59501-1.c: New test.
    * gcc.target/i386/pr59501-1a.c: New test.
    * gcc.target/i386/pr59501-2.c: New test.
    * gcc.target/i386/pr59501-2a.c: New test.
    * gcc.target/i386/pr59501-3.c: New test.
    * gcc.target/i386/pr59501-3a.c: New test.
    * gcc.target/i386/pr59501-4.c: New test.
    * gcc.target/i386/pr59501-4a.c: New test.
    * gcc.target/i386/pr59501-5.c: New test.
    * gcc.target/i386/pr59501-6.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr59501-1.c
    trunk/gcc/testsuite/gcc.target/i386/pr59501-1a.c
    trunk/gcc/testsuite/gcc.target/i386/pr59501-2.c
    trunk/gcc/testsuite/gcc.target/i386/pr59501-2a.c
    trunk/gcc/testsuite/gcc.target/i386/pr59501-3.c
    trunk/gcc/testsuite/gcc.target/i386/pr59501-3a.c
    trunk/gcc/testsuite/gcc.target/i386/pr59501-4.c
    trunk/gcc/testsuite/gcc.target/i386/pr59501-4a.c
    trunk/gcc/testsuite/gcc.target/i386/pr59501-5.c
    trunk/gcc/testsuite/gcc.target/i386/pr59501-6.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/testsuite/ChangeLog
[Bug tree-optimization/59501] [4.9 Regression] Vector Gather with GCC 4.9 2013-12-08 Snapshot
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59501

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Fixed.
[Bug tree-optimization/59501] [4.9 Regression] Vector Gather with GCC 4.9 2013-12-08 Snapshot
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59501

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
      Known to work|                            |4.8.2
   Target Milestone|---                         |4.9.0
            Summary|Vector Gather with GCC 4.9  |[4.9 Regression] Vector
                   |2013-12-08 Snapshot         |Gather with GCC 4.9
                   |                            |2013-12-08 Snapshot
[Bug tree-optimization/59501] [4.9 Regression] Vector Gather with GCC 4.9 2013-12-08 Snapshot
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59501

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2013-12-19
                 CC|                            |hjl at gcc dot gnu.org,
                   |                            |hubicka at gcc dot gnu.org,
                   |                            |jakub at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
This regressed with r203171.  Before that change,
-maccumulate-outgoing-args was enabled, but now it isn't.
The change I see in the RTL dumps is that there is a (dead) load from
the r10 register into a pseudo from the expand pass to the jump pass;
after that the RTL is pretty much the same (different insn numbers)
until pro_and_epilogue, which creates all the garbage.
The reason why the load from r10 is created, and presumably the reason
for the different pro_and_epilogue behavior, is in ix86_get_drap_rtx:

  if (ix86_force_drap || !ACCUMULATE_OUTGOING_ARGS)
    crtl->need_drap = true;

But in the function in question, LRA has not spilled anything to the
stack, the stack actually isn't used at all, and neither is the drap reg
live at the start of the function (that would be another reason why we'd
need to emit some setting of the drap reg, but we probably wouldn't need
to dynamically realign the stack).
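To make the effect of the option change concrete, here is a tiny standalone model of the quoted condition. It is a sketch only: struct drap_model and both functions are hypothetical stand-ins, not GCC's crtl or the real ix86_get_drap_rtx, and the model ignores everything else that function does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two inputs and the per-function flag
   mentioned in the comment.  */
struct drap_model
{
  bool force_drap;                /* models ix86_force_drap */
  bool accumulate_outgoing_args;  /* models ACCUMULATE_OUTGOING_ARGS */
  bool need_drap;                 /* models crtl->need_drap */
};

/* Mirrors only the quoted condition.  */
static void
model_get_drap (struct drap_model *m)
{
  if (m->force_drap || !m->accumulate_outgoing_args)
    m->need_drap = true;
}

/* Convenience wrapper: would need_drap end up set for these inputs?  */
static bool
need_drap_for (bool force_drap, bool accumulate_outgoing_args)
{
  struct drap_model m = { force_drap, accumulate_outgoing_args, false };
  model_get_drap (&m);
  return m.need_drap;
}

/* The two situations described in the comment.  */
static void
check_model (void)
{
  /* Before r203171: outgoing args accumulated, flag stays clear.  */
  assert (!need_drap_for (false, true));
  /* After r203171: not accumulated, flag is set even if the function
     never touches the stack.  */
  assert (need_drap_for (false, false));
}
```

Note that the flag is set without looking at whether the function uses the stack at all, which is the point the comment goes on to make.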
[Bug tree-optimization/59501] [4.9 Regression] Vector Gather with GCC 4.9 2013-12-08 Snapshot
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59501

--- Comment #2 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Jakub Jelinek from comment #1)
>   if (ix86_force_drap || !ACCUMULATE_OUTGOING_ARGS)
>     crtl->need_drap = true;

They are needed for -m32.  Otherwise, we got

FAIL: g++.dg/torture/stackalign/eh-fastcall-1.C -Os -fpic execution test
FAIL: g++.dg/torture/stackalign/eh-global-1.C -Os -fpic execution test
FAIL: g++.dg/torture/stackalign/eh-inline-1.C -Os -fpic execution test
FAIL: g++.dg/torture/stackalign/eh-thiscall-1.C -Os -fpic execution test
[Bug tree-optimization/59501] [4.9 Regression] Vector Gather with GCC 4.9 2013-12-08 Snapshot
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59501

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to H.J. Lu from comment #2)
> (In reply to Jakub Jelinek from comment #1)
> >   if (ix86_force_drap || !ACCUMULATE_OUTGOING_ARGS)
> >     crtl->need_drap = true;
>
> They are needed for -m32.  Otherwise, we got
>
> FAIL: g++.dg/torture/stackalign/eh-fastcall-1.C -Os -fpic execution test
> FAIL: g++.dg/torture/stackalign/eh-global-1.C -Os -fpic execution test
> FAIL: g++.dg/torture/stackalign/eh-inline-1.C -Os -fpic execution test
> FAIL: g++.dg/torture/stackalign/eh-thiscall-1.C -Os -fpic execution test

I'm not saying that ix86_get_drap_rtx should be changed.
But perhaps:

  /* If the only reason for frame_pointer_needed is that we conservatively
     assumed stack realignment might be needed, but in the end nothing that
     needed the stack alignment had been spilled, clear frame_pointer_needed
     and say we don't need stack realignment.  */
  if (stack_realign
      && !crtl->need_drap
      && frame_pointer_needed
      && crtl->is_leaf
      && flag_omit_frame_pointer
      && crtl->sp_is_unchanging
      && !ix86_current_function_calls_tls_descriptor
      && !crtl->accesses_prior_frames
      && !cfun->calls_alloca
      && !crtl->calls_eh_return
      && !(flag_stack_check && STACK_CHECK_MOVING_SP)
      && !ix86_frame_pointer_required ()
      && get_frame_size () == 0
      && ix86_nsaved_sseregs () == 0
      && ix86_varargs_gpr_size + ix86_varargs_fpr_size == 0)

in ix86_finalize_stack_realign_flags could be tweaked so that it does not
always bail out on the !crtl->need_drap check, because need_drap will now
be set for pretty much all leaf functions.  I wonder if we can e.g. ask
DF whether the drap reg is live at entry; if it isn't live, presumably we
can clear crtl->need_drap or ignore it for this purpose?
Also, I wonder whether, even if we actually need the drap register, we
couldn't for leaf functions just avoid the dynamic realignment and simply
let the prologue set the drap reg to the right value.
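The tweak proposed here can be sketched as a standalone predicate. Everything below is a hypothetical model rather than the eventual patch: struct fn_flags flattens the state that the quoted condition reads, drap_live_at_entry stands for the answer to a DF liveness query, and other_checks_pass is an assumed placeholder that lumps together the remaining sub-conditions (no alloca, zero frame size, and so on).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flattened view of the per-function state.  */
struct fn_flags
{
  bool stack_realign;
  bool need_drap;
  bool drap_live_at_entry;    /* assumed result of asking DF */
  bool frame_pointer_needed;
  bool is_leaf;
  bool other_checks_pass;     /* placeholder for the remaining tests */
};

/* need_drap only blocks the optimization when the DRAP register is
   actually live on entry to the function.  */
static bool
can_skip_realign (const struct fn_flags *f)
{
  bool drap_really_needed = f->need_drap && f->drap_live_at_entry;
  return f->stack_realign
         && !drap_really_needed
         && f->frame_pointer_needed
         && f->is_leaf
         && f->other_checks_pass;
}

/* If the conservative assumptions turn out to be unnecessary, clear
   them.  Returns true if anything was cleared.  */
static bool
finalize_realign_flags (struct fn_flags *f)
{
  if (!can_skip_realign (f))
    return false;
  f->need_drap = false;
  f->stack_realign = false;
  f->frame_pointer_needed = false;
  /* Postcondition of this model: no realignment state is left set.  */
  assert (!f->need_drap && !f->stack_realign);
  return true;
}
```

With the original condition, a leaf function that has need_drap set but never reads the DRAP register would keep its frame pointer and realignment; in this model it takes the cleanup path instead.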
[Bug tree-optimization/59501] [4.9 Regression] Vector Gather with GCC 4.9 2013-12-08 Snapshot
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59501

--- Comment #4 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Jakub Jelinek from comment #3)
> I'm not saying that ix86_get_drap_rtx should be changed.
> But perhaps:
>
>   /* If the only reason for frame_pointer_needed is that we conservatively
>      assumed stack realignment might be needed, but in the end nothing that
>      needed the stack alignment had been spilled, clear frame_pointer_needed
>      and say we don't need stack realignment.  */
>   if (stack_realign
>       && !crtl->need_drap
>       && frame_pointer_needed
>       && crtl->is_leaf
>       && flag_omit_frame_pointer
>       && crtl->sp_is_unchanging
>       && !ix86_current_function_calls_tls_descriptor
>       && !crtl->accesses_prior_frames
>       && !cfun->calls_alloca
>       && !crtl->calls_eh_return
>       && !(flag_stack_check && STACK_CHECK_MOVING_SP)
>       && !ix86_frame_pointer_required ()
>       && get_frame_size () == 0
>       && ix86_nsaved_sseregs () == 0
>       && ix86_varargs_gpr_size + ix86_varargs_fpr_size == 0)
>
> in ix86_finalize_stack_realign_flags could be tweaked, not to bail out
> always if we have !crtl->need_drap, because then it will be set pretty
> much for all leaf functions.  I wonder if we can e.g. ask DF whether the
> drap reg is live at entry, if it isn't live, supposedly we can clear
> crtl->need_drap or ignore it for this purpose?
> Also, I wonder even if we actually need the drap register we can't for
> the leaf functions just avoid the dynamic realignment and simply let the
> prologue set the drap reg to the right value.

It sounds like a good idea.  BTW, I think we have very decent drap
coverage in the gcc testsuite, as long as both -m32 and -m64 are tested.