https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65146
--- Comment #25 from Peter Cordes <peter at cordes dot ca> ---
(In reply to CVS Commits from comment #24)
> The master branch has been updated by Jakub Jelinek <ja...@gcc.gnu.org>:
>
> https://gcc.gnu.org/g:04df5e7de2f3dd652a9cddc1c9adfbdf45947ae6
>
> commit r11-2909-g04df5e7de2f3dd652a9cddc1c9adfbdf45947ae6
> Author: Jakub Jelinek <ja...@redhat.com>
> Date: Thu Aug 27 18:44:40 2020 +0200
>
>     ia32: Fix alignment of _Atomic fields [PR65146]
>
>     For _Atomic fields, lowering the alignment of long long or double etc.
>     fields on ia32 is undesirable, because then one really can't perform atomic
>     operations on those using cmpxchg8b.

Just for the record, the description of this bugfix incorrectly mentioned cmpxchg8b as the problem. lock cmpxchg8b is *always* atomic, even if that means the CPU has to take a bus lock (disastrously expensive, affecting all cores system-wide) instead of just delaying the MESI response for one line exclusively owned in this core's private cache (aka a cache lock).

The correctness problem is __atomic_load_n / __atomic_store_n compiling to actual 8-byte pure loads / pure stores using SSE2 movq, SSE1 movlps, or x87 fild/fistp (bouncing through the stack), such as

    movq %xmm0, (%eax)

That's where correctness depends on Intel's and AMD's atomicity guarantees, which are conditional on alignment.

(And if AVX is supported, the same deal applies to 16-byte loads/stores. Although we can and should use movaps for that, since it bakes alignment checking into the instruction. Intel did recently document that CPUs with AVX guarantee atomicity of 16-byte aligned loads/stores, retroactive to all CPUs with AVX. It's about time, but yay.)

> Not sure about iamcu_alignment change, I know next to nothing about IA MCU,
> but unless it doesn't have cmpxchg8b instruction, it would surprise me if we
> don't want to do it as well.

I had to google iamcu. Apparently it's Pentium-like, but only has soft-FP (so I assume no MMX or SSE, as well as no x87).
If that leaves (lock) cmpxchg8b as its only way to do an 8-byte load/store, there may be no correctness need for the extra alignment, unless cache-line-split locks are still a performance issue. If it's guaranteed unicore as well, we can even omit the lock prefix and cmpxchg8b will still be an atomic RMW (or load or store) wrt. interrupts. (And being unicore would likely mean much less system-wide overhead for a split lock.)
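To make the cmpxchg8b-only case concrete, here is a minimal C sketch, assuming GCC or Clang `__atomic` builtins. On ia32 an 8-byte `__atomic_compare_exchange_n` is expected to become `lock cmpxchg8b` when targeting i586 or later (for plain i386 it may become a libatomic call instead). The helpers `cas_load_u64`, `cas_store_u64` and `is_naturally_aligned` and the struct `example` are hypothetical names for illustration. A load is a CAS that "replaces" a guessed value with itself, and a store is a CAS retry loop, so both are full RMWs: the object must be writable, and even a load needs the cache line in an exclusive state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: before the r11-2909 fix, GCC on ia32 gave this
 * _Atomic long long member only 4-byte alignment (offset 4). */
struct example {
    int tag;
    _Atomic long long counter;
};

/* Expected to hold on x86-64, and on ia32 with a GCC that has the fix. */
_Static_assert(offsetof(struct example, counter) % 8 == 0,
               "_Atomic 8-byte member should be 8-byte aligned");

/* Hypothetical helper; size must be a power of two. */
static bool is_naturally_aligned(const void *p, size_t size)
{
    return ((uintptr_t)p & (size - 1)) == 0;
}

/* Atomic 64-bit load built only from compare-and-swap.
 * Guess 0: if *p is 0, the CAS writes 0 back (no visible change);
 * otherwise it fails and copies the current value into expected.
 * Either way, expected ends up holding the value that was in *p.
 * This is an RMW, so *p must be writable memory. */
static uint64_t cas_load_u64(uint64_t *p)
{
    /* lock cmpxchg8b stays atomic even when misaligned, but a
     * cache-line split forces a bus lock, so this sketch rejects it. */
    assert(is_naturally_aligned(p, sizeof *p));
    uint64_t expected = 0;
    __atomic_compare_exchange_n(p, &expected, 0, false,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return expected;
}

/* Atomic 64-bit store built only from compare-and-swap: retry until the
 * CAS succeeds. Each failed attempt copies the observed value into
 * expected, so the next attempt succeeds unless another thread wrote
 * to *p in between. */
static void cas_store_u64(uint64_t *p, uint64_t desired)
{
    assert(is_naturally_aligned(p, sizeof *p));
    uint64_t expected = 0;
    while (!__atomic_compare_exchange_n(p, &expected, desired, false,
                                        __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST)) {
        /* expected now holds the value observed in *p; retry with it. */
    }
}
```

The alignment assertion is a performance choice, not a correctness one: a misaligned lock cmpxchg8b would still be atomic, just far more expensive when it splits a cache line.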