[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #51 from hjl dot tools at gmail dot com 2009-03-13 17:10 --- *** Bug 39449 has been marked as a duplicate of this bug. *** -- hjl dot tools at gmail dot com changed: What|Removed |Added CC||danglin at gcc dot gnu dot ||org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #50 from janis at gcc dot gnu dot org 2009-03-13 17:05 --- Subject: Bug 39137 Author: janis Date: Fri Mar 13 17:05:08 2009 New Revision: 144841 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=144841 Log: 2009-03-13 Jack Howarth PR target/39137 * testsuite/gcc.target/i386/stackalign/longlong-2.c: Skip on darwin. Modified: trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/i386/stackalign/longlong-2.c -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #49 from hjl dot tools at gmail dot com 2009-03-12 13:30 --- (In reply to comment #48) > If it ignores -mpreferred-stack-boundary it shouldn't end up setting > ix86_preferred_stack_boundary to the ignored value. > i386/darwin.h has /* On Darwin, the stack is 128-bit aligned at the point of every call. Failure to ensure this will lead to a crash in the system libraries or dynamic loader. */ #undef STACK_BOUNDARY #define STACK_BOUNDARY 128 #undef MAIN_STACK_BOUNDARY #define MAIN_STACK_BOUNDARY 128 /* Since we'll never want a stack boundary less aligned than 128 bits we need the extra work here otherwise bits of gcc get very grumpy when we ask for lower alignment. We could just reject values less than 128 bits for Darwin, but it's easier to up the alignment if it's below the minimum. */ #undef PREFERRED_STACK_BOUNDARY #define PREFERRED_STACK_BOUNDARY\ MAX (STACK_BOUNDARY, ix86_preferred_stack_boundary) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #48 from rguenth at gcc dot gnu dot org 2009-03-12 09:43 --- If it ignores -mpreferred-stack-boundary it shouldn't end up setting ix86_preferred_stack_boundary to the ignored value. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #47 from Joey dot ye at intel dot com 2009-03-12 06:51 --- (In reply to comment #46) > Created an attachment (id=17444) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17444&action=view) [edit] > gcc.target/i386/stackalign/longlong-2.c for -mnostackalign on darwin10 > /sw/src/fink.build/gcc44-4.3.999-20090311/darwin_objdir/gcc/xgcc > -B/sw/src/fink.build/gcc44-4.3.999-20090311/darwin_objdir/gcc/ > /sw/src/fink.build/gcc44-4.3.999-20090311/gcc-4.4-20090311/gcc/testsuite/gcc.target/i386/stackalign/longlong-2.c > -mstackrealign -O2 -mpreferred-stack-boundary=2 -S -m32 -o longlong-2.s That's because MacOS require stack alignment to 16 byte when making call and ignores -mpreferred-stack-boundary=2. These cases should skipped for MacOS. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #46 from howarth at nitro dot med dot uc dot edu 2009-03-12 00:46 --- Created an attachment (id=17444) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17444&action=view) gcc.target/i386/stackalign/longlong-2.c for -mnostackalign on darwin10 /sw/src/fink.build/gcc44-4.3.999-20090311/darwin_objdir/gcc/xgcc -B/sw/src/fink.build/gcc44-4.3.999-20090311/darwin_objdir/gcc/ /sw/src/fink.build/gcc44-4.3.999-20090311/gcc-4.4-20090311/gcc/testsuite/gcc.target/i386/stackalign/longlong-2.c -mstackrealign -O2 -mpreferred-stack-boundary=2 -S -m32 -o longlong-2.s -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #45 from howarth at nitro dot med dot uc dot edu 2009-03-12 00:45 --- Created an attachment (id=17443) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17443&action=view) gcc.target/i386/stackalign/longlong-2.c for -mstackalign on darwin10 /sw/src/fink.build/gcc44-4.3.999-20090311/darwin_objdir/gcc/xgcc -B/sw/src/fink.build/gcc44-4.3.999-20090311/darwin_objdir/gcc/ /sw/src/fink.build/gcc44-4.3.999-20090311/gcc-4.4-20090311/gcc/testsuite/gcc.target/i386/stackalign/longlong-2.c -mstackrealign -O2 -mpreferred-stack-boundary=2 -S -m32 -o longlong-2.s -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #44 from hjl dot tools at gmail dot com 2009-03-12 00:44 --- Those tests should be skipped on MacOS since it ignores -mpreferred-stack-boundary=2. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #43 from howarth at nitro dot med dot uc dot edu 2009-03-12 00:41 --- On darwin10, I am seeing... Executing on host: /sw/src/fink.build/gcc44-4.3.999-20090311/darwin_objdir/gcc/xgcc -B/sw/src/fink.build/gcc44-4.3.999-20090311/darwin_objdir/gcc/ /sw/src/fink.build/gcc44-4.3.999-20090311/gcc-4.4-20090311/gcc/testsuite/gcc.target/i386/stackalign/longlong-2.c -mstackrealign -O2 -mpreferred-stack-boundary=2 -S -m32 -o longlong-2.s (timeout = 300) PASS: gcc.target/i386/stackalign/longlong-2.c -mstackrealign (test for excess errors) FAIL: gcc.target/i386/stackalign/longlong-2.c scan-assembler-times and[lq]?[^\n]*-8,[^\n]*sp 2 FAIL: gcc.target/i386/stackalign/longlong-2.c scan-assembler-times and[lq]?[^\n]*-16,[^\n]*sp 2 and Executing on host: /sw/src/fink.build/gcc44-4.3.999-20090311/darwin_objdir/gcc/xgcc -B/sw/src/fink.build/gcc44-4.3.999-20090311/darwin_objdir/gcc/ /sw/src/fink.build/gcc44-4.3.999-20090311/gcc-4.4-20090311/gcc/testsuite/gcc.target/i386/stackalign/longlong-2.c -mno-stackrealign -O2 -mpreferred-stack-boundary=2 -S -m32 -o longlong-2.s (timeout = 300) PASS: gcc.target/i386/stackalign/longlong-2.c -mno-stackrealign (test for excess errors) FAIL: gcc.target/i386/stackalign/longlong-2.c scan-assembler-times and[lq]?[^\n]*-8,[^\n]*sp 2 FAIL: gcc.target/i386/stackalign/longlong-2.c scan-assembler-times and[lq]?[^\n]*-16,[^\n]*sp 2 The assembly from each is attached. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #42 from jakub at gcc dot gnu dot org 2009-03-11 21:23 --- Fixed. -- jakub at gcc dot gnu dot org changed: What|Removed |Added Status|NEW |RESOLVED Resolution||FIXED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #41 from jakub at gcc dot gnu dot org 2009-03-11 21:12 --- Subject: Bug 39137 Author: jakub Date: Wed Mar 11 21:12:33 2009 New Revision: 144792 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=144792 Log: PR target/39137 * cfgexpand.c (get_decl_align_unit): Use LOCAL_DECL_ALIGNMENT macro. * defaults.h (LOCAL_DECL_ALIGNMENT): Define if not yet defined. * config/i386/i386.h (LOCAL_DECL_ALIGNMENT): Define. * config/i386/i386.c (ix86_local_alignment): For -m32 -mpreferred-stack-boundary=2 use 32-bit alignment for long long variables on the stack to avoid dynamic realignment. Allow the first argument to be a decl rather than type. * doc/tm.texi (LOCAL_DECL_ALIGNMENT): Document. * gcc.target/i386/stackalign/longlong-1.c: New test. * gcc.target/i386/stackalign/longlong-2.c: New test. Added: trunk/gcc/testsuite/gcc.target/i386/stackalign/longlong-1.c trunk/gcc/testsuite/gcc.target/i386/stackalign/longlong-2.c Modified: trunk/gcc/ChangeLog trunk/gcc/cfgexpand.c trunk/gcc/config/i386/i386.c trunk/gcc/config/i386/i386.h trunk/gcc/defaults.h trunk/gcc/doc/tm.texi trunk/gcc/testsuite/ChangeLog -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #40 from rguenth at gcc dot gnu dot org 2009-03-11 20:04 --- Both patches look ok to me. For 4.5 we might want to consider merging some of the target hooks though. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #39 from hjl dot tools at gmail dot com 2009-03-11 17:17 --- (In reply to comment #38) > Created an attachment (id=17442) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17442&action=view) [edit] > gcc44-pr39137-2.patch > > Alternative patch which does stack realignment in f[2345] rather than just in > f[345]. > That is very nice. Can you add testcases for f[2345]? Thanks. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #38 from jakub at gcc dot gnu dot org 2009-03-11 17:14 --- Created an attachment (id=17442) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17442&action=view) gcc44-pr39137-2.patch Alternative patch which does stack realignment in f[2345] rather than just in f[345]. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #37 from jakub at gcc dot gnu dot org 2009-03-11 16:26 --- Created an attachment (id=17440) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17440&action=view) gcc44-pr39137.patch Patch I'm going to bootstrap/regtest now. I've re-added the TYPE_USER_ALIGN check, because it fixes the f3 function in: /* { dg-options "-Os -m32 -mpreferred-stack-boundary=2" } */ void fn (void *); void f1 (void) { unsigned long long a; fn (&a); } void f2 (void) { unsigned long long a __attribute__((aligned (8))); fn (&a); } void f3 (void) { typedef unsigned long long L __attribute__((aligned (8))); L a; fn (&a); } void f4 (void) { unsigned long long a __attribute__((aligned (16))); fn (&a); } void f5 (void) { typedef unsigned long long L __attribute__((aligned (16))); L a; fn (&a); } To cure even f2, we'd have to invent a new macro (LOCAL_DATA_ALIGNMENT), which would be passed the DECL instead of type (in i386 case both could just call ix86_local_alignment and if the first argument is non-NULL, if could just use DECL_P vs. TYPE_P to find out if it is a decl or type). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #36 from rguenth at gcc dot gnu dot org 2009-03-11 14:47 --- In reply to comment #34, we should be able to fixup alignment in get_decl_align_unit if DECL_USER_ALIGN is set (or change the prototype for LOCAL_ALIGNMENT to take a decl and/or a type). I wonder why we have both LOCAL_ALIGNMENT and STACK_SLOT_ALIGNMENT anyway... but, Jakub, will you test & submit your patch from comment #32? Thanks. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #35 from Joey dot ye at intel dot com 2009-03-04 01:41 --- (In reply to comment #32) > I don't see the reason for && optimize_function_for_size_p (cfun), care to > back > up with benchmarks that forcing dynamic realignment for long long variables > with -mpreferred-stack-boundary=2 improves performance rather than slows > things > down (because of the dynamic realignment)? Checking optimize_function_for_size_p is to avoid prologue/epilogue code size increase when -Os is used, which is initially complained by Jakub. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #34 from jakub at gcc dot gnu dot org 2009-03-03 16:01 --- Yeah, unsigned long long l __attribute__ ((aligned(8))); won't be 64-bit aligned with -m32 -mpreferred-stack-boundary=2, but I think that's not a big deal and isn't a regression from 4.3 and earlier anyway. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #33 from hjl dot tools at gmail dot com 2009-03-03 14:48 --- (In reply to comment #32) > I don't see the reason for && optimize_function_for_size_p (cfun), care to > back > up with benchmarks that forcing dynamic realignment for long long variables > with -mpreferred-stack-boundary=2 improves performance rather than slows > things > down (because of the dynamic realignment)? > > Also, I fail to see why 2 hunks in ix86_local_alignment are needed instead of > just one. The second hunk won't catch !type case, where we have just mode > (but > no need to test type && there, type is always non-NULL). > > I think: > --- i386.c2009-03-02 09:45:43.0 +0100 > +++ i386.c2009-03-03 11:35:21.0 +0100 > @@ -19351,6 +19351,14 @@ unsigned int > ix86_local_alignment (tree type, enum machine_mode mode, >unsigned int align) > { > + /* Don't do dynamic stack realignment for long long objects with > + -mpreferred-stack-boundary=2. */ > + if (!TARGET_64BIT > + && align == 64 > + && ix86_preferred_stack_boundary < 64 > + && (mode == DImode || (type && TYPE_MODE (type) == DImode))) > +align = 32; > + >/* If TYPE is NULL, we are allocating a stack slot for caller-save > register in MODE. We will return the largest alignment of XF > and DF. */ > > should be sufficient. > I am not against this patch. I just want to mention that --- void foo (unsigned long long *); void bar (void) { unsigned long long l __attribute__ ((aligned(8))); foo (&l); } --- won't work with -m32 -Os -mpreferred-stack-boundary=2 when this patch is applied. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #32 from jakub at gcc dot gnu dot org 2009-03-03 10:36 --- I don't see the reason for && optimize_function_for_size_p (cfun), care to back up with benchmarks that forcing dynamic realignment for long long variables with -mpreferred-stack-boundary=2 improves performance rather than slows things down (because of the dynamic realignment)? Also, I fail to see why 2 hunks in ix86_local_alignment are needed instead of just one. The second hunk won't catch !type case, where we have just mode (but no need to test type && there, type is always non-NULL). I think: --- i386.c2009-03-02 09:45:43.0 +0100 +++ i386.c2009-03-03 11:35:21.0 +0100 @@ -19351,6 +19351,14 @@ unsigned int ix86_local_alignment (tree type, enum machine_mode mode, unsigned int align) { + /* Don't do dynamic stack realignment for long long objects with + -mpreferred-stack-boundary=2. */ + if (!TARGET_64BIT + && align == 64 + && ix86_preferred_stack_boundary < 64 + && (mode == DImode || (type && TYPE_MODE (type) == DImode))) +align = 32; + /* If TYPE is NULL, we are allocating a stack slot for caller-save register in MODE. We will return the largest alignment of XF and DF. */ should be sufficient. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #31 from Joey dot ye at intel dot com 2009-02-23 03:15 --- How about this patch? 1. Only reduce DI mode when -Os 2. Ignore TYPE_USER_ALIGN, so that stack realign happens for case in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137#c28, which IMHO is acceptable. Index: config/i386/i386.c === --- config/i386/i386.c (revision 5221) +++ config/i386/i386.c (working copy) @@ -19607,6 +19607,13 @@ ix86_local_alignment (tree type, enum machine_mode mode, unsigned int align) { + /* We don't want to align DImode to 64bit for compilation with + -mpreferred-stack-boundary=2 to not enforce dynamic stack alignment + prologue. */ + if (mode == DImode && !TARGET_64BIT && ix86_preferred_stack_boundary < 64 + && optimize_function_for_size_p (cfun)) +align = 32; + /* If TYPE is NULL, we are allocating a stack slot for caller-save register in MODE. We will return the largest alignment of XF and DF. */ @@ -19616,6 +19623,12 @@ align = GET_MODE_ALIGNMENT (DFmode); return align; } + if (!TARGET_64BIT + && optimize_function_for_size_p (cfun) + && align == 64 + && ix86_preferred_stack_boundary < 64 + && (mode == DImode || (type && TYPE_MODE (type) == DImode))) +align = 32; /* x86-64 ABI requires arrays greater than 16 bytes to be aligned to 16byte boundary. */ -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #30 from hjl dot tools at gmail dot com 2009-02-22 19:28 --- (In reply to comment #29) > I mean aligned(64). > I guess something like this then? > Index: config/i386/i386.c > === > --- config/i386/i386.c (revision 144373) > +++ config/i386/i386.c (working copy) > @@ -19351,6 +19351,12 @@ unsigned int > ix86_local_alignment (tree type, enum machine_mode mode, > unsigned int align) > { > + /* We don't want to align DImode to 64bit for compilation with > + -mpreferred-stack-boundary=2 to not enforce dynamic stack alignment > + prologue. */ > + if (mode == DImode && !TARGET_64BIT && ix86_preferred_stack_boundary < 64) > +align = 32; > + It will always align DI to 4 byte in 32bit mode. Did you mean to replace it with the code below? >/* If TYPE is NULL, we are allocating a stack slot for caller-save > register in MODE. We will return the largest alignment of XF > and DF. */ > @@ -19360,6 +19366,12 @@ ix86_local_alignment (tree type, enum ma > align = GET_MODE_ALIGNMENT (DFmode); >return align; > } > + if (!TARGET_64BIT > + && align == 64 > + && ix86_preferred_stack_boundary < 64 > + && (mode == DImode || (type && TYPE_MODE (type) == DImode)) > + && (!type || !TYPE_USER_ALIGN (type))) > +align = 32; > TYPE_USER_ALIGN isn't set on DImode due to canonical type: Breakpoint 5, ix86_local_alignment (type=0x77ed2900, mode=VOIDmode, align=256) at /export/gnu/src/gcc-work/gcc/gcc/config/i386/i386.c:19433 19433 if (!type) (gdb) call debug_tree (type) constant 64> unit size constant 8> align 64 symtab 0 alias set -1 canonical type 0x77ed2900 precision 64 min max pointer_to_this > (gdb) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #29 from hubicka at gcc dot gnu dot org 2009-02-22 18:46 --- I mean aligned(64). I guess something like this then? Index: config/i386/i386.c === --- config/i386/i386.c (revision 144373) +++ config/i386/i386.c (working copy) @@ -19351,6 +19351,12 @@ unsigned int ix86_local_alignment (tree type, enum machine_mode mode, unsigned int align) { + /* We don't want to align DImode to 64bit for compilation with + -mpreferred-stack-boundary=2 to not enforce dynamic stack alignment + prologue. */ + if (mode == DImode && !TARGET_64BIT && ix86_preferred_stack_boundary < 64) +align = 32; + /* If TYPE is NULL, we are allocating a stack slot for caller-save register in MODE. We will return the largest alignment of XF and DF. */ @@ -19360,6 +19366,12 @@ ix86_local_alignment (tree type, enum ma align = GET_MODE_ALIGNMENT (DFmode); return align; } + if (!TARGET_64BIT + && align == 64 + && ix86_preferred_stack_boundary < 64 + && (mode == DImode || (type && TYPE_MODE (type) == DImode)) + && (!type || !TYPE_USER_ALIGN (type))) +align = 32; /* x86-64 ABI requires arrays greater than 16 bytes to be aligned to 16byte boundary. */ -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #28 from hjl dot tools at gmail dot com 2009-02-21 15:38 --- (In reply to comment #26) > I had somehting along this lines in mind: > Index: config/i386/i386.c > === > *** config/i386/i386.c (revision 144352) > --- config/i386/i386.c (working copy) > *** unsigned int > *** 19332,19337 > --- 19332,19343 > ix86_local_alignment (tree type, enum machine_mode mode, > unsigned int align) > { > + /* We don't want to align DImode to 64bit for compilation with > + -mpreferred-stack-boundary=2 to not enforce dynamic stack alignment > + prologue. */ > + if (mode == DImode && !TARGET_64BIT && ix86_preferred_stack_boundary < 64) > + align = 32; > + > /* If TYPE is NULL, we are allocating a stack slot for caller-save >register in MODE. We will return the largest alignment of XF >and DF. */ > Will it work with --- void foo (unsigned long long *); void bar (void) { unsigned long long l __attribute__ ((aligned(32))); foo (&l); } --- -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #27 from rguenth at gcc dot gnu dot org 2009-02-21 13:05 --- That patch looks reasonable. Care to bootstrap/test it to settle this last P1? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #26 from hubicka at gcc dot gnu dot org 2009-02-21 13:00 --- I had somehting along this lines in mind: Index: config/i386/i386.c === *** config/i386/i386.c (revision 144352) --- config/i386/i386.c (working copy) *** unsigned int *** 19332,19337 --- 19332,19343 ix86_local_alignment (tree type, enum machine_mode mode, unsigned int align) { + /* We don't want to align DImode to 64bit for compilation with + -mpreferred-stack-boundary=2 to not enforce dynamic stack alignment + prologue. */ + if (mode == DImode && !TARGET_64BIT && ix86_preferred_stack_boundary < 64) + align = 32; + /* If TYPE is NULL, we are allocating a stack slot for caller-save register in MODE. We will return the largest alignment of XF and DF. */ this will reduce alignment of long long as stack local variable. Because we want to keep it aligned for DI->DF, I can do that only for -mpreferred-stack-boundary=2 for now and perhaps on mainline we can invent new macro PREFERRED_LOCAL_ALIGN that will return 64 and cfgexpand will allocate 64byte aligned slot but not increase alignment_needed (i.e. control slot alignment via PREFERRED_LOCAL_ALIGN and stack alignment otherwise) and we will increase the alignment when expanding full sized DImode instruction? Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #25 from hjl dot tools at gmail dot com 2009-02-17 17:29 --- Created an attachment (id=17311) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17311&action=view) A patch to add a new -malign-long-long= option Here is a patch to add a new option, -malign-long-long=. -- hjl dot tools at gmail dot com changed: What|Removed |Added Attachment #17277|0 |1 is obsolete|| Attachment #17279|0 |1 is obsolete|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #24 from hjl dot tools at gmail dot com 2009-02-17 15:40 --- (In reply to comment #23) > I guess I can live with a switch, the kernel folks will just need to add it. > Now, should that switch be about disabling all dynamic stack realignments, or > do that for DImode only, or decrease DImode alignment when on stack to > 32-bits? > We shouldn't tell compiler to disable dynamic stack alignment while asking compiler to align variables on stack. We should add a new command line option to align DImode to 4byte on stack for 32bit host. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #23 from jakub at gcc dot gnu dot org 2009-02-17 15:25 --- I guess I can live with a switch, the kernel folks will just need to add it. Now, should that switch be about disabling all dynamic stack realignments, or do that for DImode only, or decrease DImode alignment when on stack to 32-bits? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #22 from hjl dot tools at gmail dot com 2009-02-17 14:52 --- -mpreferred-stack-boundary=2 can also be used to generate psABI conforming code. I think we need a new option to align DImode to 4 byte on stack if we want to change DImode alignment on stack. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #21 from jakub at gcc dot gnu dot org 2009-02-17 09:29 --- Unless you consider that option being -mpreferred-stack-boundary=2. By default stack boundary is bigger and so DImode is aligned naturally, it is only when you want very compat code that you use this option. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #20 from Joey dot ye at intel dot com 2009-02-17 09:18 --- (In reply to comment #19) > Just for the record, here is an unsuccessful attempt to avoid stack > realignment > just because of DImode for -m32 or because of DFmode at -m32 -Os. This patch > unfortunately caused a handful regressions, like 20020220-1.c. Is it OK to enable this patch with a new option? Defaultly not to realign a mode (DImode) to its nature boundary is confusing. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #19 from jakub at gcc dot gnu dot org 2009-02-13 19:15 --- Created an attachment (id=17296) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17296&action=view) Unsuccessful attempt to avoid stack realignment for DImode and for DFmode at -Os Just for the record, here is an unsuccessful attempt to avoid stack realignment just because of DImode for -m32 or because of DFmode at -m32 -Os. This patch unfortunately caused a handful regressions, like 20020220-1.c. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #18 from jakub at gcc dot gnu dot org 2009-02-12 21:17 --- In that case people probably wouldn't use -mpreferred-stack-boundary=2 though... -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #17 from ubizjak at gmail dot com 2009-02-12 18:23 --- (In reply to comment #15) > The DFmode and DImodes are different. Aligning DFmode on stack is very > performance critical, while DImodes on 32bit machine can quite safely be > misaligned (if we ignore their possible use in MMX intrincisc). I would just point out that on 32bit targets we use XMM registers in conversion from long long and unsigned int to FP value. I have found, that when DImode value is stored from intermediate XMM register to memory and then read from this location via fildll, it is very important that DImode memory is aligned to its natural boundary (8 bytes). Misaligned DImode memory probably defeats store forwarding - we are talking about 5-6 _times_ longer execution times on short loops (please grep for "store forwarding" in i386.md for comments on this matter). I think that due to this performance hit, DImode locations should remain to be aligned on 32bit targets, see PR 13958. -- ubizjak at gmail dot com changed: What|Removed |Added BugsThisDependsOn||13958 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #16 from hjl dot tools at gmail dot com 2009-02-12 17:01 --- (In reply to comment #15) > The DFmode and DImodes are different. Aligning DFmode on stack is very > performance critical, while DImodes on 32bit machine can quite safely be > misaligned (if we ignore their possible use in MMX intrincisc). > I think we ought to be able to handle this case without any extra options: > when > function use DFmode variables trigger stack realign at > -mpreferred-stack-boundary=2 and -O2, while do not trigger it with DImode. That may break Ada or others. See comment #11. I still think a new command line option to control DImode alignment is better. It won't be the first command line option which changes ABI. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #15 from hubicka at gcc dot gnu dot org 2009-02-12 16:47 --- The DFmode and DImodes are different. Aligning DFmode on stack is very performance critical, while DImodes on 32bit machine can quite safely be misaligned (if we ignore their possible use in MMX intrincisc). I think we ought to be able to handle this case without any extra options: when function use DFmode variables trigger stack realign at -mpreferred-stack-boundary=2 and -O2, while do not trigger it with DImode. (I would bet that this behaviour would be SPECfp win for -mpreferred-stack-boundary=2 -O2 compared tot he bahviour where we misalign both by default. it would perhaps even be win for 32bit distro build compared to current default of preferred-stack-boundary=3) I will try to look deeper into this tomorrow, but I guess we will need new target macro to declare this? (i.e. macro that would declare for DImode the fact "we like to align this type to given boundary, but we don't want to work too hard, in particular we don't want to trigger stack realignment"). This could be achievable by ADJUST_ALIGNMENT trick if it wasn't breaking Ada unfortunately. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #14 from jakub at gcc dot gnu dot org 2009-02-12 08:03 --- Won't this break Ada again (with -malign-double=2)? I think we should reject -malign-double= for -m64. Alternatively, what about making MAX_STACK_ALIGNMENT a parameter instead, so kernel could use -mmax-stack-boundary=2 -mpreferred-stack-boundary=2? i386.c would ensure this variable is at least as big as ix86_preferred_stack_boundary. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #13 from pinskia at gcc dot gnu dot org 2009-02-11 22:11 --- Is there a reason why this is a P1? I don't see why this should be a P1. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #12 from hjl dot tools at gmail dot com 2009-02-11 14:39 --- (In reply to comment #11) > (In reply to comment #9) > > Created an attachment (id=17279) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17279&action=view) [edit] > > A patch to add a new -malign-double= option > > HJ, there were lots of problems with similar approach, see this revert: > Another reason for the new option. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #11 from ubizjak at gmail dot com 2009-02-11 08:42 --- (In reply to comment #9) > Created an attachment (id=17279) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17279&action=view) [edit] > A patch to add a new -malign-double= option HJ, there were lots of problems with similar approach, see this revert: 2008-03-23 Uros Bizjak Revert: 2008-03-05 H.J. Lu * config/i386/i386-modes.def: Use 4 byte alignment on DI for 32bit host. 2008-03-19 Uros Bizjak PR target/35496 * stor-layout.c (update_alignment_for_field): Set minimum alignment of the underlying type of a MS bitfield layout to the natural alignment of the type. 2008-03-22 Uros Bizjak * config/i386/i386.c (assign_386_stack_local): Align DImode slots to their natural alignment to avoid store forwarding stalls. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #10 from Joey dot ye at intel dot com 2009-02-11 01:03 --- (In reply to comment #9) > Created an attachment (id=17279) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17279&action=view) [edit] > A patch to add a new -malign-double= option This patch looks OK to me. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #9 from hjl dot tools at gmail dot com 2009-02-10 22:29 --- Created an attachment (id=17279) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17279&action=view) A patch to add a new -malign-double= option -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #8 from hjl dot tools at gmail dot com 2009-02-10 21:15 --- (In reply to comment #6) > This would mean -Os vs. -O2 gives different __alignof__(long long) values, I __alignof__(type) isn't that useful. The alignment of double changes depending on 1. If it is on stack. 2. If -malign-double is used. 3. If it is used in structure. 4. If it is passed on stack. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #7 from hjl dot tools at gmail dot com 2009-02-10 21:02 --- (In reply to comment #6) > This would mean -Os vs. -O2 gives different __alignof__(long long) values, I > think that's a bad idea. I think a new option to disable dynamic realignment > or at least do that if estimated stack size is <= 64 bits would be better. > We do stack alignment to satisfy variable alignment requirement. You don't want to disable it blindly. The proper way to avoid stack alignment is to tell compiler not to align variable on stack. We can add a new option, -malign-double=4, to align DI/DF to 4 bytes on stack. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #6 from jakub at gcc dot gnu dot org 2009-02-10 20:48 --- This would mean -Os vs. -O2 gives different __alignof__(long long) values, I think that's a bad idea. I think a new option to disable dynamic realignment or at least do that if estimated stack size is <= 64 bits would be better. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #5 from hjl dot tools at gmail dot com 2009-02-10 20:32 --- Created an attachment (id=17277) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17277&action=view) A patch Does this patch look OK? If yes, I will submit it with a couple of testcases. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #4 from hjl dot tools at gmail dot com 2009-02-09 15:21 --- (In reply to comment #3) > How can #1 cause a problem with -ftree-vectorize (especially when it hasn't I don't believe "-mpreferred-stack-boundary=2 -ftree-vectorize" works well in gcc 4.3. > been problem in 4.3 and earlier)? We'd do realignment for V[1248]* modes, > just > not for DImode/DFmode... > Xuepeng, Joey, can you verify align DImode/DFmode to 4 bytes on stack, when "-Os -mpreferred-stack-boundary=2" is used, won't cause prolems with -ftree-vectorize? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #3 from jakub at gcc dot gnu dot org 2009-02-09 15:06 --- How can #1 cause a problem with -ftree-vectorize (especially when it hasn't been problem in 4.3 and earlier)? We'd do realignment for V[1248]* modes, just not for DImode/DFmode... -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #2 from hjl dot tools at gmail dot com 2009-02-09 14:57 --- (In reply to comment #1) > Confirmed. I think with -Os or even more with -mpreferred-stack-boundary > dynamic stack alignment should _not_ be used. > That will cause core dump on programs with __m128/__m256. We have a few choices for -Os. 1. Don't align DFmode/DImode on stacks. Or 2. Add a new switch to disable DFmode/DImode alignment on stack. #1 may also cause problems for -Os -ftree-vectorize. I think we should add a new switch to disable DFmode/DImode alignment on stack. -- hjl dot tools at gmail dot com changed: What|Removed |Added CC|hjl at gcc dot gnu dot org |hjl dot tools at gmail dot ||com, Joey dot ye at intel ||dot com, xuepeng dot guo at ||intel dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
--- Comment #1 from rguenth at gcc dot gnu dot org 2009-02-09 14:39 --- Confirmed. I think with -Os or even more with -mpreferred-stack-boundary dynamic stack alignment should _not_ be used. -- rguenth at gcc dot gnu dot org changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever Confirmed|0 |1 Keywords||missed-optimization Priority|P3 |P1 Last reconfirmed|-00-00 00:00:00 |2009-02-09 14:39:13 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
[Bug target/39137] [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
-- jakub at gcc dot gnu dot org changed: What|Removed |Added Target Milestone|--- |4.4.0 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137