Hi, I have been comparing the stock gcc 5.2 and the Linaro 5.2 (Linaro GCC 5.2-2015.11-1) and have noticed a difference with the __sync intrinsics.
Here is the simple test case:

--- cut here ---
int add_int(int add_value, int *dest)
{
  return __sync_add_and_fetch(dest, add_value);
}
--- cut here ---

Compiling with the stock gcc 5.2 (-S -O3) I get

---------
add_int:
.L2:
        ldaxr   w2, [x1]
        add     w2, w2, w0
        stlxr   w3, w2, [x1]
        cbnz    w3, .L2
        mov     w0, w2
        ret
---------

Whereas with Linaro gcc 5.2 I get

---------
add_int:
.L2:
        ldxr    w2, [x1]
        add     w2, w2, w0
        stlxr   w3, w2, [x1]
        cbnz    w3, .L2
        dmb     ish
        mov     w0, w2
        ret
---------

Why the extra (unnecessary?) memory barrier?

Also, is it worthwhile putting a prfm before the ldaxr? E.g.

add_int:
        prfm    pst1strm, [x1]
.L2:
        ldaxr   w2, [x1]

See the following thread:

http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/355996.html

All the best,
Ed
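
For comparison, the same operation can also be written with the newer __atomic builtins. The following is only a sketch for narrowing the question down: add_int_atomic is a made-up name, not part of the original test case, and the choice of __ATOMIC_SEQ_CST is an assumption about the closest equivalent. The GCC manual describes the legacy __sync builtins as being, in most cases, a full barrier, so the two functions need not compile to the same sequence.

```c
/* Original test case: legacy __sync builtin, which the GCC manual
   describes as a full barrier in most cases. */
int add_int(int add_value, int *dest)
{
  return __sync_add_and_fetch(dest, add_value);
}

/* Hypothetical variant for comparison (not in the original test case):
   the __atomic builtin with sequentially consistent ordering. */
int add_int_atomic(int add_value, int *dest)
{
  return __atomic_add_fetch(dest, add_value, __ATOMIC_SEQ_CST);
}
```

Compiling this file with -S -O3 on both toolchains should show whether the extra dmb ish appears only for the __sync variant or for both.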