On 10/20/22 13:01, Andrea Parri wrote:
On Wed, Oct 12, 2022 at 07:16:20PM +0200, Andrea Parri wrote:
     +Andrea, in case he has time to look at the memory model / ABI
     issues.
+Jeff, who was offering to help when the threads got crossed.  I'd punted on
a lot of this in the hope Andrea could help out, as I'm not really a memory
model guy and this is pretty far down the rabbit hole.  Happy to have the
help if you're offering, though, as what's there is likely a pretty big
performance issue for anyone with a reasonable memory system.
Thanks for linking me to the discussion and the remarks, Palmer.  I'm
happy to help (and to synchronize with Jeff/the community) as much as
possible, building a better understanding of the 'issues' at stake.
Summarizing here some findings from looking at the currently-implemented
and the proposed [1] mappings:

   - Current mapping is missing synchronization, notably

        atomic_compare_exchange_weak_explicit(-, -, -,
                                              memory_order_release,
                                              memory_order_relaxed);

     is unable to provide the (required) release ordering guarantees; for
     reference, I've reported a litmus test illustrating it at the bottom
     of this email, cf. c-cmpxchg.

   - [1] addressed the "memory_order_release" problem/bug mentioned above
     (as well as other quirks of the current mapping I won't detail here),
     but it doesn't address other problems present in the current mapping;
     in particular, both mappings translate the following

        atomic_compare_exchange_weak_explicit(-, -, -,
                                              memory_order_acquire,
                                              memory_order_relaxed);

     to a sequence

        lr.w
        bne
        sc.w.aq

     (without any other synchronization/fences), which contrasts with the
     Unprivileged Spec, Section 10.2 "Load-Reserved / Store-Conditional
     Instructions":

       "Software should not set the 'rl' bit on an LR instruction unless
       the 'aq' bit is also set, nor should software set the 'aq' bit on
       an SC instruction unless the 'rl' bit is also set.  LR.rl and SC.aq
       instructions are not guaranteed to provide any stronger ordering
       than those with both bits clear [...]"
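
For readers less familiar with the C11 side, here is a minimal sketch of
the two calls discussed above in a message-passing pattern.  It is not the
c-cmpxchg litmus test referenced in this email; the names payload, flag,
publish() and try_consume() are hypothetical, and it assumes a single
writer and a single reader.  If the release CAS in publish() does not order
the earlier plain store, or the acquire CAS in try_consume() does not order
the later plain load, a reader on another hart could observe a stale
payload.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
static int payload;      /* plain (non-atomic) data being handed over */
static atomic_int flag;  /* 0 = empty, 1 = published, 2 = consumed */

/* Writer side (single writer assumed): store the payload, then publish
 * it with a CAS that is a release on success and relaxed on failure. */
static bool publish(int value)
{
    int expected = 0;

    payload = value;
    /* The weak CAS may fail spuriously, so retry while flag is still 0. */
    while (!atomic_compare_exchange_weak_explicit(&flag, &expected, 1,
                                                  memory_order_release,
                                                  memory_order_relaxed)) {
        if (expected != 0)
            return false;  /* flag was not in the empty state */
    }
    return true;
}

/* Reader side: claim the published value with a CAS that is an acquire
 * on success and relaxed on failure, then read the payload. */
static bool try_consume(int *out)
{
    int expected = 1;

    while (!atomic_compare_exchange_weak_explicit(&flag, &expected, 2,
                                                  memory_order_acquire,
                                                  memory_order_relaxed)) {
        if (expected != 1)
            return false;  /* nothing published (or already consumed) */
    }
    *out = payload;        /* must observe the writer's store */
    return true;
}
```

In a real test, publish() and try_consume() would run on different
threads; the C11 model then requires that a successful try_consume()
returns the value passed to publish(), which is the guarantee the mapping
has to preserve.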

So it sounds like Christoph's patch is an improvement, but isn't complete.  Given the pain in this space, I'd be hesitant to put in an incomplete fix, even if it moves things in the right direction, as it creates another compatibility headache if we don't get the complete solution in place for gcc-13.


Christoph, thoughts on the case Andrea pointed out?


Jeff

