https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838

Peter Cordes <peter at cordes dot ca> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |peter at cordes dot ca

--- Comment #91 from Peter Cordes <peter at cordes dot ca> ---
This bug should be closed as "resolved fixed".  The "fix" was to change the ABI
doc and break existing hand-written asm and old binaries.  This was
intentional and resulted in some pain, but at this point it's a done deal.

----

My attempt at a summary of the current state of affairs for 32-bit x86 calling
conventions (on Linux and elsewhere):

Yes, the version of the i386 System V ABI used on Linux really did change
between GCC 2.8 and GCC 8.  Those compilers are not ABI-compatible with each
other.  This is a known fact.  Hand-written asm that makes function calls with
a misaligned stack pointer violates the (updated) ABI, and was also knowingly
broken by this change.


(Perhaps unintentionally at first, with stack alignment intended to provide
just a performance benefit, not to be a correctness requirement.  But the
resolution ended up being to standardize on 16-byte alignment, matching x86-64
System V, instead of reverting to the old ABI and breaking compat with new
binaries that had started to rely on 16-byte incoming alignment, or adding
significant overhead to every function that didn't know how both its caller and
callee were compiled, i.e. most functions.  Using MOVUPS instead of MOVAPS
everywhere wouldn't work well because it would mean no folding of memory
operands into ALU instructions: without AVX's VEX encoding, paddd xmm0, [mem]
requires aligned mem.  And existing binaries that rely on incoming 16-byte
alignment weren't doing that.)
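
To illustrate the folding point, here is a minimal sketch (not taken from GCC
or glibc; add4_unaligned is a hypothetical helper name).  When the source
pointer may be misaligned, the load has to be a separate unaligned load
(MOVDQU/MOVUPS) rather than a memory operand of PADDD, unless the code is
built with AVX.  The non-SSE2 branch is only a portability fallback.

```c
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: acc[i] += src[i] for four 32-bit ints, where src may
 * have any alignment (for example, a buffer on a 4-byte-aligned stack). */
static void add4_unaligned(int32_t acc[4], const void *src)
{
#if defined(__SSE2__)
    /* Unaligned loads: without VEX encoding the compiler cannot fold these
     * into the memory operand of PADDD, so each is a separate instruction. */
    __m128i a = _mm_loadu_si128((const __m128i *)acc);
    __m128i b = _mm_loadu_si128((const __m128i *)src);
    _mm_storeu_si128((__m128i *)acc, _mm_add_epi32(a, b));
#else
    /* Scalar fallback for builds without SSE2 (e.g. plain -m32 -march=i686). */
    int32_t tmp[4];
    memcpy(tmp, src, sizeof tmp);
    for (int i = 0; i < 4; i++)
        acc[i] += tmp[i];
#endif
}
```

If the pointer were known to be 16-byte aligned, an aligned load could be used
instead and the compiler would be free to fold it into the ALU instruction.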


An earlier comment also mentioned common arrays: the ABI also requires arrays
larger than 16 bytes to have 16-byte alignment.

----

This was perhaps unnecessary pain for little real benefit: i386 on Linux has
been mostly obsolete for a long time, and the inefficient stack-args calling
convention was never changed.  It's ironic that Linux broke ABI compat for i386
in the name of more efficient SSE usage despite not caring to introduce
anything like Windows fastcall or vectorcall (efficient register-args calling
conventions).

(GCC does have ABI-changing -mregparm=3 and -msseregparm to pass integers in
regs, and pass/return FP values in XMM registers (instead of passing on the
stack / returning in x87 st0).  But no distros have switched over to using that
calling convention for i386 binaries, AFAIK.  The Linux kernel does use regparm
for 32-bit kernel builds.)

Even more ironic, probably a lot of 32-bit code is compiled without -msse2
(because one of the main reasons for using 32-bit code is CPUs too old for
x86-64, which is about the same vintage as SSE2).  SSE usage can still happen
with runtime dispatching in binaries that are compatible with old machines
while still being able to take advantage of new ones.


But in most cases, if you want performance you use x86-64 kernel + user-space,
or maybe x32 user-space (ILP32 in 64-bit mode) to get modern calling
conventions and the benefit of twice as many registers.  x86-64 System V has
mandated 16-byte stack alignment from the start.  (I don't know the history,
but perhaps i386 code-gen started assuming / depending on it for correctness,
not just performance, by accident because of devs being used to x86-64?)

The 32-bit ABI on some other OSes, including i386 *BSD and 32-bit Windows, has
*not* changed; presumably gcc there doesn't rely on incoming stack alignment. 
(It might try to propagate 16-byte alignment for performance benefits, though.)

My understanding is that i386 MacOS still uses a version of i386 System V that
doesn't include the 16-byte stack alignment update, like other *BSDs.


(In reply to Harald van Dijk from comment #90)
> compile
> 
>   void exit(int);
>   int main(void) { exit(0); }
> 
> with GCC 2.8, compile current glibc with GCC 8, and there will be a segfault
> in glibc's __run_exit_handlers because GCC 2.8 never kept the stack
> 16-byte-aligned, but GCC 8 does now generate code which assumes it.
>
> For the moment, I've rebuilt glibc with -mincoming-stack-boundary=2 to handle 
> the problem well enough for my current needs, but it's not a complete 
> solution.

Yes, you need workarounds like this to change modern GCC's ABI back to the
legacy 4-byte stack alignment.
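
A narrower alternative to rebuilding everything with
-mincoming-stack-boundary=2 is GCC's x86 function attribute
force_align_arg_pointer, which makes one function realign the stack in its
prologue.  The sketch below assumes only a few entry points are reachable from
legacy callers; REALIGN_STACK and legacy_entry are hypothetical names, and the
macro expands to nothing on non-x86 targets.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* GCC/clang x86 attribute: realign the stack on entry to this function. */
#define REALIGN_STACK __attribute__((force_align_arg_pointer))
#else
#define REALIGN_STACK /* no-op elsewhere */
#endif

/* Hypothetical entry point that a 4-byte-aligned legacy caller might call.
 * Returns 1 if its over-aligned local really is 16-byte aligned. */
REALIGN_STACK
int legacy_entry(int x)
{
    _Alignas(16) int32_t scratch[4] = { x, x, x, x };
    return ((uintptr_t)(void *)scratch % 16 == 0) && scratch[3] == x;
}
```

Functions called from legacy_entry then see a 16-byte-aligned stack, so only
the boundary functions pay for the realignment.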

Note that you might break atomicity of C11 _Atomic 8-byte objects even outside
structs by doing this, if they split across a cache line (Intel) or possibly
narrower (AMD) boundary.  But only if they were stack allocated.
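
A small sketch of the check involved, assuming a 64-byte cache line (an
assumption; pass a different boundary for other hardware).  crosses_boundary
and stack_atomic_crosses_line are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the object [addr, addr + size) spans a multiple of boundary. */
static int crosses_boundary(uintptr_t addr, size_t size, size_t boundary)
{
    return (addr / boundary) != ((addr + size - 1) / boundary);
}

/* Checks a stack-allocated 8-byte atomic against an assumed 64-byte line. */
static int stack_atomic_crosses_line(void)
{
    _Atomic long long v = 0;
    return crosses_boundary((uintptr_t)(void *)&v, sizeof v, 64);
}
```

With natural 8-byte alignment the object can never cross a 64-byte line; the
risk only appears once the stack slot is merely 4-byte aligned.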
