On i?86, the Linux kernel (and e.g. valgrind) is compiled with -Os -m32 -mpreferred-stack-boundary=2. AFAIK this is used primarily to keep the generated code small. But when compiled with gcc 4.4, lots of functions, at least in valgrind (I haven't checked the kernel, but I assume even more so there), now use dynamic stack realignment.

Short testcase:

void foo (unsigned long long *);
int bar (void)
{
  unsigned long long l;
  foo (&l);
  return 0;
}

The problem is that without -malign-double, long long and double have only 32-bit alignment as aggregate fields, but when used standalone they have 64-bit alignment for performance reasons. For variables in .data/.rodata etc. this is not a problem, but when DImode/DFmode variables are automatic, crtl->stack_alignment_estimated becomes 64 and so the stack is dynamically realigned, which increases generated code size quite a bit and slows things down as well.

Could we not count DFmode/DImode vars into crtl->stack_alignment_estimated on i386 using some target macro, or at least skip them conditionally (on -Os + -mpreferred-stack-boundary=2, or perhaps based on some other option)? I'm pretty sure the kernel people will be very unhappy about this.

-- 
Summary: [4.4 Regression] -mpreferred-stack-boundary=2 causes lots of dynamic realign
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: jakub at gcc dot gnu dot org
GCC target triplet: i?86-*-linux*

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137
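
For anyone who wants to see the alignment difference described in this report on their own target, here is a minimal sketch. It is not part of the testcase above. The names struct probe, field_alignment, standalone_alignment and report_alignment are made up for illustration, and __alignof__ is the GNU extension, which I am assuming reports the preferred alignment rather than the ABI minimum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical probe struct: after a single char, the offset of `l`
   is the alignment the ABI requires for the type as a field.  */
struct probe
{
  char c;
  unsigned long long l;
};

/* Alignment of unsigned long long when used as an aggregate field.  */
size_t
field_alignment (void)
{
  return offsetof (struct probe, l);
}

/* Alignment the compiler prefers for a standalone variable
   (assumption: GNU __alignof__ gives the preferred alignment).  */
size_t
standalone_alignment (void)
{
  return __alignof__ (unsigned long long);
}

/* Print both values in bytes; return nonzero if they differ.  */
int
report_alignment (void)
{
  size_t field = field_alignment ();
  size_t standalone = standalone_alignment ();

  /* The preferred alignment should never be below the field alignment.  */
  assert (field <= standalone);

  printf ("field: %zu, standalone: %zu\n", field, standalone);
  return field != standalone;
}
```

Calling report_alignment () from main in a build with -m32 and without -malign-double should show the two values differing (32-bit vs 64-bit, per the description above). On targets such as x86-64 both values are normally the same and the function returns 0.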