Hi,

I have been comparing the stock gcc 5.2 and the Linaro gcc 5.2 (Linaro GCC
5.2-2015.11-1) and have noticed a difference in the code generated for
the __sync intrinsics.

Here is a simple test case:

--- cut here ---
int add_int(int add_value, int *dest)
{
  return __sync_add_and_fetch(dest, add_value);
}
--- cut here ---

Compiling with the stock gcc 5.2 (-S -O3) I get:

---------
add_int:
.L2:
        ldaxr   w2, [x1]
        add     w2, w2, w0
        stlxr   w3, w2, [x1]
        cbnz    w3, .L2
        mov     w0, w2
        ret
---------

Whereas with Linaro gcc 5.2 I get:

---------
add_int:
.L2:
        ldxr    w2, [x1]
        add     w2, w2, w0
        stlxr   w3, w2, [x1]
        cbnz    w3, .L2
        dmb     ish
        mov     w0, w2
        ret
---------

Why the extra (unnecessary?) memory barrier?
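
For comparison, the same operation can be written with the __atomic
builtins and an explicit memory order, which may help show whether the
barrier is tied to the full-barrier semantics of the __sync builtins.
This is only a sketch: the function names add_int_seq_cst and
add_int_acq_rel are made up for illustration, and no compiler output is
shown for them here.

```c
/* Hypothetical comparison functions; the names are illustrative only. */

/* Same add-and-fetch, requesting sequential consistency explicitly. */
int add_int_seq_cst(int add_value, int *dest)
{
  return __atomic_add_fetch(dest, add_value, __ATOMIC_SEQ_CST);
}

/* Same add-and-fetch, requesting only acquire-release ordering. */
int add_int_acq_rel(int add_value, int *dest)
{
  return __atomic_add_fetch(dest, add_value, __ATOMIC_ACQ_REL);
}
```

Compiling these with the same flags (-S -O3) on both toolchains would
show whether the trailing dmb follows the requested memory order or
appears only for the __sync form.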

Also, is it worthwhile putting a prfm before the ldaxr? E.g.

---------
add_int:
        prfm    pstl1strm, [x1]
.L2:
        ldaxr   w2, [x1]
---------

See the following thread

http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/355996.html

All the best,
Ed
_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-toolchain
