The code coverage support uses counters to determine which edges in the control
flow graph were executed.  If a counter overflows, then the code coverage
information is invalid.  Therefore the counter type should be a 64-bit integer.
In multi-threaded applications, it is important that the counter increments are
atomic.  This is not the case by default.  The user can enable atomic counter
increments through the -fprofile-update=atomic and
-fprofile-update=prefer-atomic options.
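
For illustration, a minimal user-level sketch of the two update variants for a
64-bit counter (not the generated instrumentation code; the counter name and
type are placeholders):

  #include <stdint.h>

  static uint64_t counter;   /* 64-bit profile counter */

  static void update_single (void)
  {
    counter = counter + 1;                               /* default, non-atomic */
  }

  static void update_atomic (void)
  {
    __atomic_fetch_add (&counter, 1, __ATOMIC_RELAXED);  /* -fprofile-update=atomic */
  }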

If the target supports 64-bit atomic operations, then everything is fine.  If
not and -fprofile-update=prefer-atomic was chosen by the user, then non-atomic
counter increments will be used.  However, if the target does not support the
required atomic operations and -fprofile-update=atomic was chosen by the user,
then a warning is issued and non-atomic operations are used as a forced
fallback.  This is probably not what a user wants.  There is still hardware on
the market which does not have atomic operations and is used for multi-threaded
applications.  A user who selects -fprofile-update=atomic wants consistent
code coverage data and not random data.

This patch removes the fallback to non-atomic operations for
-fprofile-update=atomic if the target platform supports libatomic.  To
mitigate potential performance issues, an optimization for systems which
only support 32-bit atomic operations is provided.  Here, the edge
counter increments are done like this:

  low = __atomic_add_fetch_4 (&counter.low, 1, MEMMODEL_RELAXED);
  high_inc = low == 0 ? 1 : 0;
  __atomic_add_fetch_4 (&counter.high, high_inc, MEMMODEL_RELAXED);
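
Expressed as a self-contained C sketch (assuming the 64-bit counter is split
into two adjacent 32-bit halves, low half first; the struct and function names
are placeholders), the split update amounts to:

  #include <stdint.h>

  struct counter64 { uint32_t low; uint32_t high; };

  static void split_increment (struct counter64 *c)
  {
    /* Bump the low half; if it wrapped around to zero, carry into the high
       half.  Both operations are relaxed 32-bit atomics.  */
    uint32_t low = __atomic_add_fetch (&c->low, 1, __ATOMIC_RELAXED);
    uint32_t high_inc = (low == 0) ? 1 : 0;
    __atomic_add_fetch (&c->high, high_inc, __ATOMIC_RELAXED);
  }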

In gimple_gen_time_profiler() this split operation cannot be used, since the
updated counter value is also required.  Here, a library call is emitted.  This
is not a performance issue since the update is only done if counters[0] == 0.
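
For reference, a hedged sketch of that case (the counter array and function
name are placeholders, not the actual gcov internals): the full 64-bit result
is consumed, so a single 64-bit atomic operation is emitted, which on targets
without 64-bit hardware atomics becomes a libatomic call such as
__atomic_add_fetch_8:

  #include <stdint.h>

  static uint64_t counters[1];

  static uint64_t time_profiler_update (void)
  {
    /* One 64-bit atomic add; expands to a library call if the target lacks
       64-bit hardware atomics.  */
    return __atomic_add_fetch (&counters[0], 1, __ATOMIC_RELAXED);
  }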

gcc/c-family/ChangeLog:

        * c-cppbuiltin.cc (c_cpp_builtins):  Define
        __LIBGCC_HAVE_LIBATOMIC for libgcov.

gcc/ChangeLog:

        * doc/invoke.texi (-fprofile-update): Clarify default method.  Document
        the atomic method behaviour.
        * tree-profile.cc (enum counter_update_method): New.
        (counter_update): Likewise.
        (gen_counter_update): Use counter_update_method.  Split the
        atomic counter update in two 32-bit atomic operations if
        necessary.
        (tree_profiling): Select counter_update_method.

libgcc/ChangeLog:

        * libgcov.h (GCOV_SUPPORTS_ATOMIC): Always define it.
        Also set it to 1 if __LIBGCC_HAVE_LIBATOMIC is defined.
---
 gcc/c-family/c-cppbuiltin.cc |  2 +
 gcc/doc/invoke.texi          | 19 ++++++-
 gcc/tree-profile.cc          | 99 +++++++++++++++++++++++++++++++++---
 libgcc/libgcov.h             | 10 ++--
 4 files changed, 114 insertions(+), 16 deletions(-)

diff --git a/gcc/c-family/c-cppbuiltin.cc b/gcc/c-family/c-cppbuiltin.cc
index cdf9850cb19e..e8576737fafb 100644
--- a/gcc/c-family/c-cppbuiltin.cc
+++ b/gcc/c-family/c-cppbuiltin.cc
@@ -1538,6 +1538,8 @@ c_cpp_builtins (cpp_reader *pfile)
       /* For libgcov.  */
       builtin_define_with_int_value ("__LIBGCC_VTABLE_USES_DESCRIPTORS__",
                                     TARGET_VTABLE_USES_DESCRIPTORS);
+      builtin_define_with_int_value ("__LIBGCC_HAVE_LIBATOMIC",
+                                    TARGET_HAVE_LIBATOMIC);
     }
 
   /* For use in assembly language.  */
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index de40f62e219c..8fe3c86ad419 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -16603,7 +16603,24 @@ while the second one prevents profile corruption by emitting thread-safe code.
 Using @samp{prefer-atomic} would be transformed either to @samp{atomic},
 when supported by a target, or to @samp{single} otherwise.  The GCC driver
 automatically selects @samp{prefer-atomic} when @option{-pthread}
-is present in the command line.
+is present in the command line; otherwise, the default method is @samp{single}.
+
+If @samp{atomic} is selected, then the profile information is updated using
+atomic operations on a best-effort basis.  Ideally, the profile information is
+updated through atomic operations in hardware.  If the target platform does not
+support the required atomic operations in hardware but @file{libatomic}
+is available, then the profile information is updated through calls to
+@file{libatomic}.  If the target platform neither supports the required atomic
+operations in hardware nor @file{libatomic}, then the profile information is
+not atomically updated and a warning is issued.  In this case, the obtained
+profiling information may be corrupt for multi-threaded applications.
+
+For performance reasons, if 64-bit counters are used for the profiling
+information and the target platform only supports 32-bit atomic operations in
+hardware, then the performance-critical profiling updates are done using two
+32-bit atomic operations for each counter update.  If a signal interrupts these
+two operations updating a counter, then the profiling information may be in an
+inconsistent state.
 
 @opindex fprofile-filter-files
 @item -fprofile-filter-files=@var{regex}
diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
index 24805ff905c7..12255f06f992 100644
--- a/gcc/tree-profile.cc
+++ b/gcc/tree-profile.cc
@@ -73,6 +73,41 @@ static GTY(()) tree ic_tuple_var;
 static GTY(()) tree ic_tuple_counters_field;
 static GTY(()) tree ic_tuple_callee_field;
 
+/* Types of counter update methods.
+
+   By default, the counter updates are done for a single threaded system
+   (COUNTER_UPDATE_SINGLE_THREAD).
+
+   If the user selected atomic profile counter updates
+   (-fprofile-update=atomic), then the counter updates will be done atomically
+   on a best-effort basis.  One of three methods to do the counter updates is
+   selected according to the target capabilities.
+
+   Ideally, the counter updates are done through atomic operations in hardware
+   (COUNTER_UPDATE_ATOMIC_BUILTIN).
+
+   If the target supports only 32-bit atomic increments and gcov_type_node is a
+   64-bit integer type, then for the profile edge counters the increment is
+   performed through two separate 32-bit atomic increments
+   (COUNTER_UPDATE_ATOMIC_SPLIT or COUNTER_UPDATE_ATOMIC_PARTIAL).  If the
+   target supports libatomic (TARGET_HAVE_LIBATOMIC), then other counter
+   updates are carried out by libatomic calls (COUNTER_UPDATE_ATOMIC_SPLIT).
+   If the target does not support libatomic, then the other counter updates are
+   not done atomically (COUNTER_UPDATE_ATOMIC_PARTIAL) and a warning is
+   issued.
+
+   If the target does not support atomic operations in hardware but it
+   supports libatomic, then all updates are carried out by libatomic calls
+   (COUNTER_UPDATE_ATOMIC_BUILTIN).  */
+enum counter_update_method {
+  COUNTER_UPDATE_SINGLE_THREAD,
+  COUNTER_UPDATE_ATOMIC_BUILTIN,
+  COUNTER_UPDATE_ATOMIC_SPLIT,
+  COUNTER_UPDATE_ATOMIC_PARTIAL
+};
+
+static counter_update_method counter_update = COUNTER_UPDATE_SINGLE_THREAD;
+
 /* Do initialization work for the edge profiler.  */
 
 /* Add code:
@@ -269,7 +304,8 @@ gen_counter_update (gimple_stmt_iterator *gsi, tree counter, tree result,
   tree one = build_int_cst (type, 1);
   tree relaxed = build_int_cst (integer_type_node, MEMMODEL_RELAXED);
 
-  if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
+  if (counter_update == COUNTER_UPDATE_ATOMIC_BUILTIN ||
+      (result && counter_update == COUNTER_UPDATE_ATOMIC_SPLIT))
     {
       /* __atomic_fetch_add (&counter, 1, MEMMODEL_RELAXED); */
       tree f = builtin_decl_explicit (TYPE_PRECISION (type) > 32
@@ -278,6 +314,38 @@ gen_counter_update (gimple_stmt_iterator *gsi, tree counter, tree result,
       gcall *call = gimple_build_call (f, 3, addr, one, relaxed);
       gen_assign_counter_update (gsi, call, f, result, name);
     }
+  else if (!result && (counter_update == COUNTER_UPDATE_ATOMIC_SPLIT ||
+                      counter_update == COUNTER_UPDATE_ATOMIC_PARTIAL))
+    {
+      /* low = __atomic_add_fetch_4 (addr, 1, MEMMODEL_RELAXED);
+        high_inc = low == 0 ? 1 : 0;
+        __atomic_add_fetch_4 (addr_high, high_inc, MEMMODEL_RELAXED); */
+      tree zero32 = build_zero_cst (uint32_type_node);
+      tree one32 = build_one_cst (uint32_type_node);
+      tree addr_high = make_temp_ssa_name (TREE_TYPE (addr), NULL, name);
+      tree four = build_int_cst (size_type_node, 4);
+      gassign *assign1 = gimple_build_assign (addr_high, POINTER_PLUS_EXPR,
+                                             addr, four);
+      gsi_insert_after (gsi, assign1, GSI_NEW_STMT);
+      if (WORDS_BIG_ENDIAN)
+       std::swap (addr, addr_high);
+      tree f = builtin_decl_explicit (BUILT_IN_ATOMIC_ADD_FETCH_4);
+      gcall *call1 = gimple_build_call (f, 3, addr, one, relaxed);
+      tree low = make_temp_ssa_name (uint32_type_node, NULL, name);
+      gimple_call_set_lhs (call1, low);
+      gsi_insert_after (gsi, call1, GSI_NEW_STMT);
+      tree is_zero = make_temp_ssa_name (boolean_type_node, NULL, name);
+      gassign *assign2 = gimple_build_assign (is_zero, EQ_EXPR, low,
+                                             zero32);
+      gsi_insert_after (gsi, assign2, GSI_NEW_STMT);
+      tree high_inc = make_temp_ssa_name (uint32_type_node, NULL, name);
+      gassign *assign3 = gimple_build_assign (high_inc, COND_EXPR,
+                                             is_zero, one32, zero32);
+      gsi_insert_after (gsi, assign3, GSI_NEW_STMT);
+      gcall *call2 = gimple_build_call (f, 3, addr_high, high_inc,
+                                       relaxed);
+      gsi_insert_after (gsi, call2, GSI_NEW_STMT);
+    }
   else
     {
       tree tmp1 = make_temp_ssa_name (type, NULL, name);
@@ -689,15 +757,20 @@ tree_profiling (void)
   struct cgraph_node *node;
 
   /* Verify whether we can utilize atomic update operations.  */
-  bool can_support_atomic = false;
+  bool can_support_atomic = TARGET_HAVE_LIBATOMIC;
   unsigned HOST_WIDE_INT gcov_type_size
     = tree_to_uhwi (TYPE_SIZE_UNIT (get_gcov_type ()));
-  if (gcov_type_size == 4)
-    can_support_atomic
-      = HAVE_sync_compare_and_swapsi || HAVE_atomic_compare_and_swapsi;
-  else if (gcov_type_size == 8)
-    can_support_atomic
-      = HAVE_sync_compare_and_swapdi || HAVE_atomic_compare_and_swapdi;
+  bool have_atomic_4
+    = HAVE_sync_compare_and_swapsi || HAVE_atomic_compare_and_swapsi;
+  bool have_atomic_8
+    = HAVE_sync_compare_and_swapdi || HAVE_atomic_compare_and_swapdi;
+  if (!can_support_atomic)
+    {
+      if (gcov_type_size == 4)
+       can_support_atomic = have_atomic_4;
+      else if (gcov_type_size == 8)
+       can_support_atomic = have_atomic_8;
+    }
 
   if (flag_profile_update == PROFILE_UPDATE_ATOMIC
       && !can_support_atomic)
@@ -710,6 +783,16 @@ tree_profiling (void)
     flag_profile_update = can_support_atomic
       ? PROFILE_UPDATE_ATOMIC : PROFILE_UPDATE_SINGLE;
 
+  if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
+    {
+      if (gcov_type_size == 8 && !have_atomic_8 && have_atomic_4)
+       counter_update = COUNTER_UPDATE_ATOMIC_SPLIT;
+      else
+       counter_update = COUNTER_UPDATE_ATOMIC_BUILTIN;
+    }
+  else if (gcov_type_size == 8 && have_atomic_4)
+      counter_update = COUNTER_UPDATE_ATOMIC_PARTIAL;
+
   /* This is a small-ipa pass that gets called only once, from
      cgraphunit.cc:ipa_passes().  */
   gcc_assert (symtab->state == IPA_SSA);
diff --git a/libgcc/libgcov.h b/libgcc/libgcov.h
index 763118ea5b52..d04c070d0cfa 100644
--- a/libgcc/libgcov.h
+++ b/libgcc/libgcov.h
@@ -95,18 +95,14 @@ typedef unsigned gcov_type_unsigned __attribute__ ((mode (QI)));
 #define GCOV_LOCKED_WITH_LOCKING 0
 #endif
 
-#ifndef GCOV_SUPPORTS_ATOMIC
 /* Detect whether target can support atomic update of profilers.  */
-#if __SIZEOF_LONG_LONG__ == 4 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4
-#define GCOV_SUPPORTS_ATOMIC 1
-#else
-#if __SIZEOF_LONG_LONG__ == 8 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
+#if (__SIZEOF_LONG_LONG__ == 4 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4) || \
+    (__SIZEOF_LONG_LONG__ == 8 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8) || \
+    defined (__LIBGCC_HAVE_LIBATOMIC)
 #define GCOV_SUPPORTS_ATOMIC 1
 #else
 #define GCOV_SUPPORTS_ATOMIC 0
 #endif
-#endif
-#endif
 
 /* In libgcov we need these functions to be extern, so prefix them with
    __gcov.  In libgcov they must also be hidden so that the instance in
-- 
2.35.3
