https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53383
--- Comment #21 from Andy Lutomirski <luto at mit dot edu> ---
(In reply to H.J. Lu from comment #20)
> (In reply to Andy Lutomirski from comment #19)
> > I don't think the fix is correct.
> >
> > This works:
> >
> > gcc -mno-sse -mpreferred-stack-boundary=3 ...
> >
> > This does not:
> >
> > gcc -mno-sse -mpreferred-stack-boundary=3 -mincoming-stack-boundary=3 ...
> >
>
> Please provide a testcase.

No code needed:

$ touch foo.c
$ gcc -c -mno-sse -mpreferred-stack-boundary=3 -mincoming-stack-boundary=3 foo.c
foo.c:1:0: error: -mincoming-stack-boundary=3 is not between 4 and 12
$ gcc -c -mno-sse -mpreferred-stack-boundary=3 foo.c

> >
> > This makes no sense, since they should be equivalent.
> >
> > Also, I find the docs to be unclear as to what different values of the
> > incoming and preferred stack boundaries mean.
> >
> > Finally, why is -mno-sse required in order to set a low stack boundary?
> > Couldn't gcc figure out that the existence of a stack variable (SSE,
> > alignas, __attribute__((aligned(32))), etc) should force dynamic stack
> > alignment?
>
> Since the x86-86 psABI says that stack must be 16 byte aligned, if the stack
> isn't 16-byte aligned, the code with SSE insn, which follows the psABI,
> will crash when called with 8-byte aligned stack.

I'm confused here. I agree in principle, but I don't actually think that gcc works this way, or, if it does, it shouldn't. If I compile with -mpreferred-stack-boundary=3 and create an aligned(32) local variable, then gcc will dynamically align the stack and the variable will have correct alignment even if the incoming stack was not 16-byte aligned.

Shouldn't an SSE variable work exactly the same way? That is, if gcc is generating an SSE instruction with a memory reference to an on-stack variable that requires 16-byte alignment (movdqa, for example), wouldn't that variable be effectively aligned(16) or greater and thus trigger dynamic stack alignment?

Sure, the generated SSE code will be less efficient with -mpreferred-stack-boundary=3 (because neither "and $-16,%rsp" nor the required frame pointer is free), but it should still work, right?
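
To make the aligned(32) case concrete, here is a minimal sketch of the kind of check I have in mind. The function name local_is_aligned32 is made up for illustration, and the code assumes GCC or a compatible compiler that accepts __attribute__((aligned(32))) on a local. It only tests at run time that an over-aligned local really lands on a 32-byte boundary. It does not show how gcc gets there.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the bug report): returns 1 if a local
 * declared aligned(32) really sits on a 32-byte boundary, else 0. */
int local_is_aligned32(void)
{
    /* GCC extension: request 32-byte alignment for a stack variable.
     * This is more than a preferred stack boundary of 3 (8 bytes) or
     * 4 (16 bytes) guarantees, so the compiler has to realign the
     * stack itself for the request to be honored. */
    unsigned char buf[64] __attribute__((aligned(32)));

    /* Go through a volatile pointer so the compiler cannot assume the
     * alignment and fold the check below into a constant. */
    unsigned char *volatile p = buf;

    memset(p, 0, sizeof buf);
    return ((uintptr_t)p % 32) == 0;
}
```

If my reading above is right, this should return 1 even when built with -mno-sse -mpreferred-stack-boundary=3, and the disassembly should contain the stack realignment in the prologue. I would expect the same mechanism to be usable for a 16-byte-aligned SSE spill slot.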