This is the ARM version of the patches to strengthen memory barriers for the __sync builtins on ARMv8 targets (https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01989.html).
The problem is that the barriers generated for the __sync builtins on ARMv8
targets are too weak. This affects the full and acquire barriers in the __sync
fetch-and-op and compare-and-swap functions and in __sync_lock_test_and_set.
This patch series strengthens the barriers by replacing the initial
load-acquire with a simple load and adding a final memory barrier to prevent
code hoisting.

- Full barriers: __sync_fetch_and_op, __sync_op_and_fetch,
  __sync_*_compare_and_swap

  [load-acquire; code; store-release]
  becomes
  [load; code; store-release; barrier].

- Acquire barriers: __sync_lock_test_and_set

  [load-acquire; code; store]
  becomes
  [load; code; store; barrier]

This patch changes the code generated for the __sync_fetch_and_<op> and
__sync_<op>_and_fetch builtins.

Tested as part of a series for arm-none-linux-gnueabihf with check-gcc.

Ok for trunk?
Matthew

gcc/
2015-06-22  Matthew Wahab  <matthew.wa...@arm.com>

	PR target/65697
	* config/arm/arm.c (arm_split_atomic_op): For ARMv8, replace an
	initial acquire barrier with a final full barrier.
From 3e9f71c04dba20ba66b5c9bae284fcac5fdd91ec Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wa...@arm.com>
Date: Fri, 22 May 2015 13:31:58 +0100
Subject: [PATCH 1/3] [ARM] Strengthen barriers for sync-fetch-op builtin.

Change-Id: I18f5af5ba4b2e74b5866009d3a090e251eff4a45
---
 gcc/config/arm/arm.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e79a369..94118f4 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -27668,6 +27668,8 @@ arm_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
   rtx_code_label *label;
   rtx x;
 
+  bool is_armv8_sync = arm_arch8 && is_mm_sync (model);
+
   bool use_acquire = TARGET_HAVE_LDACQ
 		     && !(is_mm_relaxed (model) || is_mm_consume (model)
 			  || is_mm_release (model));
@@ -27676,6 +27678,11 @@ arm_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
 		     && !(is_mm_relaxed (model) || is_mm_consume (model)
 			  || is_mm_acquire (model));
 
+  /* For ARMv8, a load-acquire is too weak for __sync memory orders.  Instead,
+     a full barrier is emitted after the store-release.  */
+  if (is_armv8_sync)
+    use_acquire = false;
+
   /* Checks whether a barrier is needed and emits one accordingly.  */
   if (!(use_acquire || use_release))
     arm_pre_atomic_barrier (model);
@@ -27746,7 +27753,8 @@ arm_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
   emit_unlikely_jump (gen_cbranchsi4 (x, cond, const0_rtx, label));
 
   /* Checks whether a barrier is needed and emits one accordingly.  */
-  if (!(use_acquire || use_release))
+  if (is_armv8_sync
+      || !(use_acquire || use_release))
     arm_post_atomic_barrier (model);
 }
-- 
1.9.1