Hi Kim,

Thanks for the feedback. I'll make these changes.


> Is there a reason to conditionalize use of __atomic_add_fetch for
> clang and continue to use __sync_add_and_fetch for gcc, rather than
> using __atomic_add_fetch unconditionally?

I think I did it that way to avoid affecting the GCC build, but GCC actually compiles both variants to the same instructions, so we can get rid of the #ifdef.
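
To make the before/after concrete, here is a minimal sketch (the function names, and the spelling of FULL_MEM_BARRIER as a seq_cst fence, are mine, not the actual patch):

/* FULL_MEM_BARRIER stands in for the project's full-barrier macro;
   its real definition lives elsewhere.  */
#define FULL_MEM_BARRIER __atomic_thread_fence (__ATOMIC_SEQ_CST)

/* Before: conditionalized on the compiler.  */
static inline int
add_fetch_before (int *p, int v)
{
#ifdef __clang__
  int r = __atomic_add_fetch (p, v, __ATOMIC_RELEASE);
  FULL_MEM_BARRIER;
  return r;
#else
  return __sync_add_and_fetch (p, v);
#endif
}

/* After: the #ifdef goes away and both compilers take the
   __atomic path.  */
static inline int
add_fetch_after (int *p, int v)
{
  int r = __atomic_add_fetch (p, v, __ATOMIC_RELEASE);
  FULL_MEM_BARRIER;
  return r;
}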


> Is __ATOMIC_RELEASE and a following FULL_MEM_BARRIER the right usage
> to get the equivalent of __sync_add_and_fetch?  Or should
> __ATOMIC_SEQ_CST be used?  Or should the latter be actively avoided?
> I remember some discussion and questions in that area, about how to
> implement memory_order_seq_cst on ARM processors, but wasn't paying
> close attention since I don't deal with ARM much.


__ATOMIC_RELEASE followed by the full barrier generates the same instruction sequence as the existing __sync_add_and_fetch.

The __sync_XXX builtins have a full barrier at the end [1], so GCC compiles __sync_add_and_fetch to:

.L2:
        ldxr    w2, [x1]        // Exclusive load (no acquire)
        add     w2, w2, w0
        stlxr   w3, w2, [x1]    // Store release
        cbnz    w3, .L2
        dmb     ish             // Full barrier
        mov     w0, w2
        ret

Whereas with the __atomic builtin and __ATOMIC_SEQ_CST, we get two half-barriers (acquire load, release store) instead:

.L2:
        ldaxr   w2, [x1]        // Load acquire
        add     w2, w2, w0
        stlxr   w3, w2, [x1]    // Store release
        cbnz    w3, .L2
        mov     w0, w2
        ret
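
For reference, a minimal test case that reproduces both listings (the function names are mine, and the flags are illustrative; something like "gcc -O2 -S" on an AArch64 target):

int
add_sync (int *p, int v)
{
  /* Legacy builtin: full-barrier semantics, hence the trailing
     dmb ish in the first listing.  */
  return __sync_add_and_fetch (p, v);
}

int
add_seq_cst (int *p, int v)
{
  /* seq_cst maps onto the acquire/release halves (ldaxr/stlxr)
     shown in the second listing.  */
  return __atomic_add_fetch (p, v, __ATOMIC_SEQ_CST);
}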


Thanks,
Nick


[1] https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html
