[Added PR number and updated patches]

On AArch64, the __sync builtins are implemented using the __atomic
operations and barriers. This makes the __sync builtins inconsistent
with their documentation, which requires stronger barriers than those
needed for the __atomic builtins.
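For reference, the two builtin families look like this at the source
level (a minimal example, not part of the patch; the wrapper names are
invented for illustration):

    long counter;

    /* Documented as a full barrier: no memory reference, to any
       location, may be moved across the operation in either
       direction.  */
    long
    sync_add (long x)
    {
      return __sync_fetch_and_add (&counter, x);
    }

    /* Only required to order the accesses that participate in the
       C++11-style memory model; this weaker semantics is what the
       AArch64 backend currently implements for both families.  */
    long
    atomic_add (long x)
    {
      return __atomic_fetch_add (&counter, x, __ATOMIC_SEQ_CST);
    }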
The difference between the __sync and __atomic builtins is that the
restrictions imposed by a __sync operation's barrier apply to all
memory references, while the restrictions of an __atomic operation's
barrier only need to apply to a subset. This affects AArch64 in
particular because, although its implementation of the __atomic
builtins is correct, the barriers it generates are too weak for the
__sync builtins.

The affected __sync builtins are the __sync_fetch_and_op (and
__sync_op_and_fetch) functions, __sync_compare_and_swap and
__sync_lock_test_and_set. This and a following patch modify the code
generated for these functions to weaken the initial load-acquire to a
simple load and to add a final fence to prevent code hoisting; an
illustrative instruction-level sketch of this change follows the
ChangeLog below. The last patch will add tests for the code generated
by the AArch64 backend for the __sync builtins.

- Full barriers: __sync_fetch_and_op, __sync_op_and_fetch,
  __sync_*_compare_and_swap

    [load-acquire; code; store-release]
  becomes
    [load; code; store-release; fence]

- Acquire barriers: __sync_lock_test_and_set

    [load-acquire; code; store]
  becomes
    [load; code; store; fence]

The code generated for release barriers and for the __atomic builtins
is unchanged.

This patch changes the code generated for the __sync_fetch_and_<op>
and __sync_<op>_and_fetch builtins.

Tested with check-gcc for aarch64-none-linux-gnu.

Ok for trunk?
Matthew

gcc/
2015-05-22  Matthew Wahab  <matthew.wa...@arm.com>

	PR target/65697
	* config/aarch64/aarch64.c (aarch64_emit_post_barrier): New.
	(aarch64_split_atomic_op): Check for __sync memory models, emit
	appropriate initial and final barriers.
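For concreteness, the full-barrier transformation above expands to
something like the following AArch64 sequences for a fetch-and-add
loop. This is an illustrative sketch written from the bracketed
outline above, not captured compiler output; register choices are
arbitrary.

    Before (load-acquire loop, no trailing fence):

	.retry:
	  ldaxr   x1, [x0]        // load-acquire
	  add     x2, x1, x3      // code
	  stlxr   w4, x2, [x0]    // store-release
	  cbnz    w4, .retry

    After this patch (relaxed load, final full fence):

	.retry:
	  ldxr    x1, [x0]        // plain load-exclusive
	  add     x2, x1, x3      // code
	  stlxr   w4, x2, [x0]    // store-release
	  cbnz    w4, .retry
	  dmb     ish             // fence stopping later accesses from
				  // being hoisted above the loop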
From 5a2d546359f78cd3f304a62617f0fc385664374e Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wa...@arm.com>
Date: Fri, 15 May 2015 09:26:28 +0100
Subject: [PATCH 1/3] [AArch64] Strengthen barriers for sync-fetch-op builtin.

Change-Id: I3342a572d672163ffc703e4e51603744680334fc
---
 gcc/config/aarch64/aarch64.c | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 8c25d75..182dbad 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9407,6 +9407,22 @@ aarch64_expand_compare_and_swap (rtx operands[])
   emit_insn (gen_rtx_SET (bval, x));
 }
 
+/* Emit a post-operation barrier.  */
+
+static void
+aarch64_emit_post_barrier (enum memmodel model)
+{
+  const enum memmodel base_model = memmodel_base (model);
+
+  if (is_mm_sync (model)
+      && (base_model == MEMMODEL_ACQUIRE
+          || base_model == MEMMODEL_ACQ_REL
+          || base_model == MEMMODEL_SEQ_CST))
+    {
+      emit_insn (gen_mem_thread_fence (GEN_INT (MEMMODEL_SEQ_CST)));
+    }
+}
+
 /* Split a compare and swap pattern.  */
 
 void
@@ -9469,12 +9485,20 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
 {
   machine_mode mode = GET_MODE (mem);
   machine_mode wmode = (mode == DImode ? DImode : SImode);
+  const enum memmodel model = memmodel_from_int (INTVAL (model_rtx));
+  const bool is_sync = is_mm_sync (model);
+  rtx load_model_rtx = model_rtx;
   rtx_code_label *label;
   rtx x;
 
   label = gen_label_rtx ();
   emit_label (label);
 
+  /* A __sync operation will emit a final fence to stop code hoisting, so the
+     load can be relaxed.  */
+  if (is_sync)
+    load_model_rtx = GEN_INT (MEMMODEL_RELAXED);
+
   if (new_out)
     new_out = gen_lowpart (wmode, new_out);
   if (old_out)
@@ -9483,7 +9507,7 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
     old_out = new_out;
   value = simplify_gen_subreg (wmode, value, mode, 0);
 
-  aarch64_emit_load_exclusive (mode, old_out, mem, model_rtx);
+  aarch64_emit_load_exclusive (mode, old_out, mem, load_model_rtx);
 
   switch (code)
     {
@@ -9519,6 +9543,10 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
   x = gen_rtx_IF_THEN_ELSE (VOIDmode, x, gen_rtx_LABEL_REF (Pmode, label),
                             pc_rtx);
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
+
+  /* Emit any fence needed for a __sync operation.  */
+  if (is_sync)
+    aarch64_emit_post_barrier (model);
 }
 
 static void
-- 
1.9.1
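A note for readers unfamiliar with the memmodel checks used above: the
__sync builtins are expanded with the MEMMODEL_SYNC_* model variants,
which set a flag bit on top of the base C++11 model; is_mm_sync tests
that bit and memmodel_base strips it. Roughly, as a simplified
standalone sketch (renamed identifiers, not the actual GCC
definitions):

    /* Simplified sketch of the memmodel encoding assumed by the
       patch: base models in the low bits, a flag distinguishing
       __sync-originated operations above them.  */
    enum memmodel_sketch
    {
      MM_RELAXED = 0, MM_CONSUME, MM_ACQUIRE, MM_RELEASE,
      MM_ACQ_REL, MM_SEQ_CST,
      MM_SYNC = 8,                      /* __sync flag bit */
      MM_SYNC_SEQ_CST = MM_SEQ_CST | MM_SYNC
    };

    /* Does this model come from a __sync builtin?  */
    static inline int
    mm_is_sync (enum memmodel_sketch m)
    {
      return (m & MM_SYNC) != 0;
    }

    /* Strip the __sync flag, leaving the base model.  */
    static inline enum memmodel_sketch
    mm_base (enum memmodel_sketch m)
    {
      return (enum memmodel_sketch) (m & (MM_SYNC - 1));
    }

With that encoding, aarch64_emit_post_barrier fires only for models
that carry the sync flag and whose base model includes acquire
semantics, which is exactly the set of __sync operations whose final
fence the cover letter describes.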