On 12/2/20 7:57 PM, Jason Merrill wrote:
> On 12/1/20 1:28 PM, Bernd Edlinger wrote:
>> On 11/24/20 11:10 PM, Jason Merrill wrote:
>>> On 11/22/20 3:05 AM, Bernd Edlinger wrote:
>>>> Hi,
>>>>
>>>> this avoids the need to use -fno-threadsafe-statics on
>>>> arm-none-eabi or working around that problem by supplying
>>>> a dummy __sync_synchronize function, which might
>>>> just lead to silent code failure of the worst kind
>>>> (non-reproducible, racy) at runtime, as was pointed out
>>>> in previous discussions here.
>>>>
>>>> When the atomic access involves a call to __sync_synchronize
>>>> it is better to call __cxa_guard_acquire unconditionally,
>>>> since it handles the atomics too, or is a non-threaded
>>>> implementation when there is no gthread support for this target.
>>>>
>>>> This also fixes a bug for the ARM EABI big-endian target:
>>>> previously the wrong bit was checked.
>>>
>>> Instead of a new target macro, can't you follow
>>> fold_builtin_atomic_always_lock_free/can_atomic_load_p?
>>
>> Yes, thanks, that should work too.
>> Would you like this better?
>
>> +is_atomic_expensive_p (machine_mode mode)
>> +{
>> +  if (!flag_inline_atomics)
>> +    return false;
>
> Why not true?
>
Ooops... Yes, I ought to return true here.  I must have made
a mistake when I tested the last version of this patch,
sorry for the confusion.

>> +  if (!can_compare_and_swap_p (mode, false) || !can_atomic_load_p (mode))
>> +    return false;
>
> This also seems backwards; I'd think we want to return false if either of
> those tests are true.  Or maybe just can_atomic_load_p, and not bother about
> compare-and-swap.
>

Yes, you are right.  Unfortunately can_atomic_load_p is too weak,
since it does not cover the memory barrier.  And
can_compare_and_swap_p (..., false) is actually a bit too strong,
but if it returns true, we should be able to use any atomic
without needing a library call.

>> +  tree type = targetm.cxx.guard_mask_bit ()
>> +	      ? TREE_TYPE (guard) : char_type_node;
>> +
>> +  if (is_atomic_expensive_p (TYPE_MODE (type)))
>> +    guard = integer_zero_node;
>> +  else
>> +    guard = build_atomic_load_type (guard, MEMMODEL_ACQUIRE, type);
>
> It should still work to load a single byte, it just needs to be the
> least-significant byte.  And this isn't an EABI issue; it looks like the
> non-EABI code is also broken for big-endian targets, both the atomic load and
> the normal load in get_guard_bits.
>

I think the non-EABI code is always using bit 0 in the first byte,
via the endian-neutral

#define _GLIBCXX_GUARD_BIT __guard_test_bit (0, 1)

Only ARM EABI uses bit 0 in byte 3 if big-endian, and bit 0 in
byte 0 otherwise.

For all other targets, when _GLIBCXX_USE_FUTEX is defined,
__cxa_guard_XXX accesses the value as int* while the memory is a
64-bit long, so I could imagine that is an aliasing violation.
But nothing that needs to be fixed immediately.

Attached is the corrected patch.
Tested again on arm-none-eabi with arm-sim.

Is it OK for trunk?

Thanks
Bernd.

> Jason
>
From c1a6dcaa906113cba0dc88e36460a65aba35ec38 Mon Sep 17 00:00:00 2001
From: Bernd Edlinger <bernd.edlin...@hotmail.de>
Date: Tue, 1 Dec 2020 18:54:48 +0100
Subject: [PATCH] Avoid atomic for guard acquire when that is expensive

When the atomic access involves a call to __sync_synchronize
it is better to call __cxa_guard_acquire unconditionally,
since it handles the atomics too, or is a non-threaded
implementation when there is no gthread support for this target.

This also fixes a bug for the ARM EABI big-endian target:
previously the wrong bit was checked.

2020-11-22  Bernd Edlinger  <bernd.edlin...@hotmail.de>

	* decl2.c (is_atomic_expensive_p): New helper function.
	(build_atomic_load_byte): Rename to...
	(build_atomic_load_type): ... this, and add new parameter TYPE.
	(get_guard_cond): Skip the atomic here if that is expensive.
	Use the correct type for the atomic load on certain targets.
---
 gcc/cp/decl2.c | 33 +++++++++++++++++++++++++++++----
 1 file changed, 29 insertions(+), 4 deletions(-)

diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index 1bc7b7e..c7ddcc9 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -48,6 +48,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "intl.h"
 #include "c-family/c-ada-spec.h"
 #include "asan.h"
+#include "optabs-query.h"
 
 /* Id for dumping the raw trees.  */
 int raw_dump_id;
@@ -3297,18 +3298,34 @@ get_guard (tree decl)
   return guard;
 }
 
+/* Returns true if accessing the GUARD atomic is expensive,
+   i.e. involves a call to __sync_synchronize or similar.
+   In this case let __cxa_guard_acquire handle the atomics.  */
+
+static bool
+is_atomic_expensive_p (machine_mode mode)
+{
+  if (!flag_inline_atomics)
+    return true;
+
+  if (!can_compare_and_swap_p (mode, false) || !can_atomic_load_p (mode))
+    return true;
+
+  return false;
+}
+
 /* Return an atomic load of src with the appropriate memory model.  */
 
 static tree
-build_atomic_load_byte (tree src, HOST_WIDE_INT model)
+build_atomic_load_type (tree src, HOST_WIDE_INT model, tree type)
 {
-  tree ptr_type = build_pointer_type (char_type_node);
+  tree ptr_type = build_pointer_type (type);
   tree mem_model = build_int_cst (integer_type_node, model);
   tree t, addr, val;
   unsigned int size;
   int fncode;
 
-  size = tree_to_uhwi (TYPE_SIZE_UNIT (char_type_node));
+  size = tree_to_uhwi (TYPE_SIZE_UNIT (type));
   fncode = BUILT_IN_ATOMIC_LOAD_N + exact_log2 (size) + 1;
   t = builtin_decl_implicit ((enum built_in_function) fncode);
 
@@ -3351,7 +3368,15 @@ get_guard_cond (tree guard, bool thread_safe)
   if (!thread_safe)
     guard = get_guard_bits (guard);
   else
-    guard = build_atomic_load_byte (guard, MEMMODEL_ACQUIRE);
+    {
+      tree type = targetm.cxx.guard_mask_bit ()
+		  ? TREE_TYPE (guard) : char_type_node;
+
+      if (is_atomic_expensive_p (TYPE_MODE (type)))
+	guard = integer_zero_node;
+      else
+	guard = build_atomic_load_type (guard, MEMMODEL_ACQUIRE, type);
+    }
 
   /* Mask off all but the low bit.  */
   if (targetm.cxx.guard_mask_bit ())
-- 
1.9.1
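For reference, the code shape this patch controls can be modeled in plain C.  The names below (slow_path_init, get_value) are illustrative stand-ins for __cxa_guard_acquire/__cxa_guard_release and the compiler-generated guard check, not the real runtime ABI; the __atomic builtins are GCC's own.  When is_atomic_expensive_p returns true, get_guard_cond folds the guard to integer_zero_node, so the inline "already initialized?" test is always false and the slow path is called unconditionally:

```c
#include <assert.h>

static unsigned char guard;	/* stands in for the guard's LSB.  */
static int value;

/* Stand-in for the slow path: in real code this is
   __cxa_guard_acquire/__cxa_guard_release, which take the
   initialization lock and handle the atomics themselves (or are a
   trivial non-threaded implementation without gthread support).  */
static void
slow_path_init (void)
{
  if (guard == 0)
    {
      value = 42;
      __atomic_store_n (&guard, 1, __ATOMIC_RELEASE);
    }
}

static int
get_value (int atomic_is_expensive)
{
  if (atomic_is_expensive
      /* is_atomic_expensive_p true: the guard condition is folded to
	 "0 == 0", so the slow path is entered unconditionally.  */
      || __atomic_load_n (&guard, __ATOMIC_ACQUIRE) == 0)
    slow_path_init ();
  return value;
}
```

The design point is that the inline acquire load is only a fast-path optimization; skipping it is always correct, just slower, which is why falling back to the unconditional call is safe on targets where the load would expand to a __sync_synchronize library call.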