https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53383

--- Comment #21 from Andy Lutomirski <luto at mit dot edu> ---
(In reply to H.J. Lu from comment #20)
> (In reply to Andy Lutomirski from comment #19)
> > I don't think the fix is correct.
> > 
> > This works:
> > 
> > gcc -mno-sse -mpreferred-stack-boundary=3 ...
> > 
> > This does not:
> > 
> > gcc -mno-sse -mpreferred-stack-boundary=3 -mincoming-stack-boundary=3 ...
> >
> 
> Please provide a testcase.

No code needed:

$ touch foo.c
$ gcc -c -mno-sse -mpreferred-stack-boundary=3 -mincoming-stack-boundary=3 foo.c
foo.c:1:0: error: -mincoming-stack-boundary=3 is not between 4 and 12
$ gcc -c -mno-sse -mpreferred-stack-boundary=3 foo.c

> 
> > This makes no sense, since they should be equivalent.
> > 
> > Also, I find the docs to be unclear as to what different values of the
> > incoming and preferred stack boundaries mean.
> > 
> > Finally, why is -mno-sse required in order to set a low stack boundary? 
> > Couldn't gcc figure out that the existence of a stack variable (SSE,
> > alignas, __attribute__((aligned(32))), etc) should force dynamic stack
> > alignment? 
> 
> Since the x86-64 psABI says that the stack must be 16-byte aligned, if the
> stack isn't 16-byte aligned, the code with SSE insn, which follows the psABI,
> will crash when called with 8-byte aligned stack.

I'm confused here.  I agree in principle, but I don't actually think that gcc
works this way, or, if it does, it shouldn't.

If I compile with -mpreferred-stack-boundary=3 and create an aligned(32) local
variable, then gcc will dynamically align the stack and the variable will have
correct alignment even if the incoming stack was not 16-byte aligned.
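A minimal sketch of that claim, assuming GCC (or a compatible compiler) and
its __attribute__((aligned)) extension. The names is_aligned and
check_local_alignment are hypothetical, chosen only for illustration. The idea
is to build it with -mpreferred-stack-boundary=3 (plus -mno-sse where gcc
insists on it) and confirm that the over-aligned local still lands on a
32-byte boundary:

```c
#include <stdint.h>

/* Nonzero if p is a multiple of align; align must be a power of two. */
static int is_aligned(const void *p, uintptr_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/*
 * noinline keeps this function in its own frame, so the compiler has to
 * establish the 32-byte alignment of buf here rather than in the caller.
 */
__attribute__((noinline)) int check_local_alignment(void)
{
    char buf[64] __attribute__((aligned(32)));
    volatile char *vp = buf;   /* stop the compiler from discarding buf */

    vp[0] = 1;
    return is_aligned(buf, 32);
}
```

If the compiler realigns the stack dynamically, check_local_alignment()
returns 1 no matter how the incoming stack was aligned.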

Shouldn't an SSE variable work exactly the same way?  That is, if gcc is
generating an SSE instruction with a memory reference to an on-stack variable
that requires 16-byte alignment (movdqa, for example), wouldn't that variable
be effectively aligned(16) or greater and thus trigger dynamic stack alignment?

Sure, the generated SSE code will be less efficient with
-mpreferred-stack-boundary=3 (because neither "and $-16,%rsp" nor the required
frame pointer is free), but it should still work, right?
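For reference, the arithmetic behind "and $-16,%rsp" can be sketched in plain
C. This only illustrates the rounding, not what gcc emits, and align_down is a
hypothetical helper name. Masking with -16 clears the low four bits, which
rounds the stack pointer down to the next 16-byte boundary:

```c
#include <stdint.h>

/*
 * Round addr down to a multiple of align (align must be a power of two).
 * With align == 16 the mask is ~15, the same bit pattern as -16, so this
 * mirrors "and $-16,%rsp".
 */
uintptr_t align_down(uintptr_t addr, uintptr_t align)
{
    return addr & ~(align - 1);
}
```

An 8-byte-aligned incoming value such as 0x7ffc1238 becomes 0x7ffc1230, while
a value that is already 16-byte aligned is left unchanged. The old stack
pointer is no longer recoverable by a fixed offset afterwards, which is why
the frame pointer is needed as well.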
