https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65146

--- Comment #25 from Peter Cordes <peter at cordes dot ca> ---
(In reply to CVS Commits from comment #24)
> The master branch has been updated by Jakub Jelinek <ja...@gcc.gnu.org>:
> 
> https://gcc.gnu.org/g:04df5e7de2f3dd652a9cddc1c9adfbdf45947ae6
> 
> commit r11-2909-g04df5e7de2f3dd652a9cddc1c9adfbdf45947ae6
> Author: Jakub Jelinek <ja...@redhat.com>
> Date:   Thu Aug 27 18:44:40 2020 +0200
> 
>     ia32: Fix alignment of _Atomic fields [PR65146]
>     
>     For _Atomic fields, lowering the alignment of long long or double etc.
>     fields on ia32 is undesirable, because then one really can't perform
>     atomic operations on those using cmpxchg8b.


Just for the record, the description of this bugfix incorrectly mentioned
cmpxchg8b being a problem.  lock cmpxchg8b is *always* atomic, even if the
operand spans a cache-line boundary and the CPU has to take a bus lock
(disastrously expensive, affecting all cores system-wide) instead of just
delaying the MESI response for one line exclusively owned in this core's
private cache (aka a cache lock).

The correctness problem is __atomic_load_n / __atomic_store_n compiling to
actual 8-byte pure loads / pure stores using SSE2 movq, SSE1 movlps, or x87
fild/fistp (bouncing through the stack), such as

  movq  %xmm0, (%eax)

That's where correctness depends on Intel's and AMD's atomicity guarantees,
which are conditional on alignment.
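
To make the alignment dependence concrete, here is a minimal sketch in C11.
The struct and helper names (counter, is_naturally_aligned, counter_set,
counter_get) are made up for illustration and are not from the GCC patch.
The asserts only check the precondition that a pure 8-byte load/store relies
on; whether the compiler actually emits movq, movlps, or fild/fistp depends on
the target and options.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example type.  On ia32, a plain long long member only gets
   4-byte alignment inside a struct; the fix for this PR gives an _Atomic
   member its natural 8-byte alignment instead. */
struct counter {
    char tag;
    _Atomic long long value;
};

/* Nonzero if p is a multiple of size (size is assumed to be a power of 2). */
static int is_naturally_aligned(const void *p, size_t size)
{
    return ((uintptr_t)p % size) == 0;
}

/* An atomic store that may compile to a single 8-byte pure store.  That is
   only guaranteed atomic by the hardware if the object is 8-byte aligned,
   so check the precondition first. */
static void counter_set(struct counter *c, long long v)
{
    assert(is_naturally_aligned(&c->value, sizeof c->value));
    atomic_store(&c->value, v);
}

/* Same precondition for the pure-load side. */
static long long counter_get(struct counter *c)
{
    assert(is_naturally_aligned(&c->value, sizeof c->value));
    return atomic_load(&c->value);
}
```

With a compiler that under-aligns the member, the asserts are what would
fire; the load/store itself would silently become non-atomic.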

(And if AVX is supported, same deal for 16-byte load/store, although we can
and should use movaps for that, which bakes alignment checking into the
instruction.  Intel did recently document that CPUs with AVX guarantee
atomicity of 16-byte aligned loads/stores, retroactive to all CPUs with AVX.
It's about time, but yay.)

>     Not sure about iamcu_alignment change, I know next to nothing about IA
>     MCU, but unless it doesn't have cmpxchg8b instruction, it would surprise
>     me if we don't want to do it as well.


I had to google iamcu.  Apparently it's Pentium-like, but only has soft-FP (so
I assume no MMX or SSE as well as no x87).

If that leaves it no way to do 8-byte load/store except (lock) cmpxchg8b, that
may mean there's no need for alignment, unless cache-line-split lock is still a
performance issue.  If it's guaranteed unicore as well, we can even omit the
lock prefix and cmpxchg8b will still be an atomic RMW (or load or store) wrt.
interrupts.  (And being unicore would likely mean much less system-wide
overhead for a split lock.)
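
As a rough sketch of what "cmpxchg8b as a load or store" means, here is the
idea written with GCC's __atomic_compare_exchange_n builtin.  The helper names
load_via_cas and store_via_cas are hypothetical, and this shows the technique
in portable C rather than the exact code GCC emits for IA MCU.  Note that a
load done this way needs the memory to be writable, since a compare-and-swap
is always an RMW.

```c
#include <stdint.h>

/* Read *p using only a compare-and-swap (hypothetical helper).
   If *p == 0 the CAS succeeds and stores 0 back, leaving the value unchanged.
   Otherwise it fails and copies the current value of *p into expected.
   Either way, expected ends up holding the value that was in *p. */
static int64_t load_via_cas(int64_t *p)
{
    int64_t expected = 0;
    __atomic_compare_exchange_n(p, &expected, expected, 0,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return expected;
}

/* Write desired to *p using only compare-and-swap (hypothetical helper).
   Start from a guess for the old value; every failed CAS refreshes expected
   with the real current value, so the loop retries until it swaps. */
static void store_via_cas(int64_t *p, int64_t desired)
{
    int64_t expected = 0;
    while (!__atomic_compare_exchange_n(p, &expected, desired, 0,
                                        __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST)) {
        /* expected now holds the value observed in *p; try again */
    }
}
```

On a 32-bit x86 target with cmpxchg8b these would be expected to compile to
lock cmpxchg8b (possibly via libatomic, depending on -march); dropping the
lock prefix on a guaranteed-unicore system is a code-generation choice that
this C-level sketch cannot express.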
