Re: [PATCH] gcov: Fix -fprofile-update=atomic

2022-12-16 Thread Sebastian Huber




On 16.12.22 13:09, Richard Biener wrote:

On Fri, Dec 16, 2022 at 11:39 AM Sebastian Huber
  wrote:

On 16.12.22 10:47, Richard Biener wrote:

No, if you select -fprofile-update=atomic, then the updates shall be
atomic from my point of view. If a fallback is acceptable, then you can
use -fprofile-update=prefer-atomic. Using the fallback in
-fprofile-update=atomic is a bug which prevents the use of gcov for
multi-threaded applications on the lower end targets which do not have
atomic operations in hardware.

Ah, OK.  So the fallback there is libatomic calls as well then?  Note
not all targets support libatomic, for those the failure mode is likely
a link error (which might be fine, but we eventually might want to
amend documentation to indicate this failure mode).

It seems these library calls caused issues in the past:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77466

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77378

Hmm, those are testsuite-isms in some way but of course
users would run into the same issue, needing explicit
-latomic (where available).  I suppose target specs could
automatically add that for -fprofile-update=atomic but this
would need to be specified at link time as well then.


Why can't we provide libatomic for all targets? Is gthr.h not always 
available?





One option could be to emit calls to a new libgcov function:

__gcov_inc_counter(counter) -> updated value

This function could use a __gthread_mutex_t mutex for updates. This ends
up probably with quite a bad performance.

But that's eventually what libatomic will do as well as fallback.


For libatomic, hosts can implement a protect_start() and protect_end() 
function. On RTEMS, this is implemented like this:


#include 

static inline UWORD
protect_start (void *ptr)
{
  return _Libatomic_Protect_start (ptr);
}

static inline void
protect_end (void *ptr, UWORD isr_level)
{
  _Libatomic_Protect_end (ptr, isr_level);
}

On single-core systems, this is mapped to interrupt disable/enable. This 
is quite efficient (compared to a mutex).




I don't have a good idea here.

Do you have to explicitely link -latomic on RISCV?


For RTEMS, libatomic is always added to the linker command line:

gcc/config/rtems.h:"%{qrtems:--start-group -lrtemsbsp -lrtemscpu 
-latomic -lc -lgcc --end-group}"


For riscv, there seems to be also a linux special case:

gcc/config/riscv/linux.h:  " %{pthread:" LD_AS_NEEDED_OPTION " -latomic 
" LD_NO_AS_NEEDED_OPTION "}"
gcc/config/riscv/linux.h:#define LIB_SPEC GNU_USER_TARGET_LIB_SPEC " 
-latomic "



--
embedded brains GmbH
Herr Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.hu...@embedded-brains.de
phone: +49-89-18 94 741 - 16
fax:   +49-89-18 94 741 - 08

Registergericht: Amtsgericht München
Registernummer: HRB 157899
Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
Unsere Datenschutzerklärung finden Sie hier:
https://embedded-brains.de/datenschutzerklaerung/


Re: [PATCH] gcov: Fix -fprofile-update=atomic

2022-12-16 Thread Richard Biener via Gcc-patches
On Fri, Dec 16, 2022 at 11:39 AM Sebastian Huber
 wrote:
>
> On 16.12.22 10:47, Richard Biener wrote:
> >> No, if you select -fprofile-update=atomic, then the updates shall be
> >> atomic from my point of view. If a fallback is acceptable, then you can
> >> use -fprofile-update=prefer-atomic. Using the fallback in
> >> -fprofile-update=atomic is a bug which prevents the use of gcov for
> >> multi-threaded applications on the lower end targets which do not have
> >> atomic operations in hardware.
> > Ah, OK.  So the fallback there is libatomic calls as well then?  Note
> > not all targets support libatomic, for those the failure mode is likely
> > a link error (which might be fine, but we eventually might want to
> > amend documentation to indicate this failure mode).
>
> It seems these library calls caused issues in the past:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77466
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77378

Hmm, those are testsuite-isms in some way but of course
users would run into the same issue, needing explicit
-latomic (where available).  I suppose target specs could
automatically add that for -fprofile-update=atomic but this
would need to be specified at link time as well then.

> One option could be to emit calls to a new libgcov function:
>
> __gcov_inc_counter(counter) -> updated value
>
> This function could use a __gthread_mutex_t mutex for updates. This ends
> up probably with quite a bad performance.

But that's eventually what libatomic will do as well as fallback.

I don't have a good idea here.

Do you have to explicitely link -latomic on RISCV?

Richard.

> --
> embedded brains GmbH
> Herr Sebastian HUBER
> Dornierstr. 4
> 82178 Puchheim
> Germany
> email: sebastian.hu...@embedded-brains.de
> phone: +49-89-18 94 741 - 16
> fax:   +49-89-18 94 741 - 08
>
> Registergericht: Amtsgericht München
> Registernummer: HRB 157899
> Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
> Unsere Datenschutzerklärung finden Sie hier:
> https://embedded-brains.de/datenschutzerklaerung/


Re: [PATCH] gcov: Fix -fprofile-update=atomic

2022-12-16 Thread Sebastian Huber

On 16.12.22 10:47, Richard Biener wrote:

No, if you select -fprofile-update=atomic, then the updates shall be
atomic from my point of view. If a fallback is acceptable, then you can
use -fprofile-update=prefer-atomic. Using the fallback in
-fprofile-update=atomic is a bug which prevents the use of gcov for
multi-threaded applications on the lower end targets which do not have
atomic operations in hardware.

Ah, OK.  So the fallback there is libatomic calls as well then?  Note
not all targets support libatomic, for those the failure mode is likely
a link error (which might be fine, but we eventually might want to
amend documentation to indicate this failure mode).


It seems these library calls caused issues in the past:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77466

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77378

One option could be to emit calls to a new libgcov function:

__gcov_inc_counter(counter) -> updated value

This function could use a __gthread_mutex_t mutex for updates. This ends 
up probably with quite a bad performance.


--
embedded brains GmbH
Herr Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.hu...@embedded-brains.de
phone: +49-89-18 94 741 - 16
fax:   +49-89-18 94 741 - 08

Registergericht: Amtsgericht München
Registernummer: HRB 157899
Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
Unsere Datenschutzerklärung finden Sie hier:
https://embedded-brains.de/datenschutzerklaerung/


Re: [PATCH] gcov: Fix -fprofile-update=atomic

2022-12-16 Thread Richard Biener via Gcc-patches
On Thu, Dec 15, 2022 at 9:34 AM Sebastian Huber
 wrote:
>
> On 13/12/2022 15:30, Richard Biener wrote:
> > On Fri, Dec 9, 2022 at 2:56 PM Sebastian Huber
> >   wrote:
> >> The code coverage support uses counters to determine which edges in the 
> >> control
> >> flow graph were executed.  If a counter overflows, then the code coverage
> >> information is invalid.  Therefore the counter type should be a 64-bit 
> >> integer.
> >> In multithreaded applications, it is important that the counter increments 
> >> are
> >> atomic.  This is not the case by default.  The user can enable atomic 
> >> counter
> >> increments through the -fprofile-update=atomic and
> >> -fprofile-update=prefer-atomic options.
> >>
> >> If the hardware supports 64-bit atomic operations, then everything is 
> >> fine.  If
> >> not and -fprofile-update=prefer-atomic was chosen by the user, then 
> >> non-atomic
> >> counter increments will be used.  However, if the hardware does not 
> >> support the
> >> required atomic operations and -fprofile-atomic=update was chosen by the 
> >> user,
> >> then a warning was issued and as a forced fall-back to non-atomic 
> >> operations
> >> was done.  This is probably not what a user wants.  There is still 
> >> hardware on
> >> the market which does not have atomic operations and is used for 
> >> multithreaded
> >> applications.  A user which selects -fprofile-update=atomic wants 
> >> consistent
> >> code coverage data and not random data.
> >>
> >> This patch removes the fall-back to non-atomic operations for
> >> -fprofile-update=atomic.  If atomic operations in hardware are not 
> >> available,
> >> then a library call to libatomic is emitted.  To mitigate potential 
> >> performance
> >> issues an optimization for systems which only support 32-bit atomic 
> >> operations
> >> is provided.  Here, the edge counter increments are done like this:
> >>
> >>low = __atomic_add_fetch_4 (, 1, MEMMODEL_RELAXED);
> >>high_inc = low == 0 ? 1 : 0;
> >>__atomic_add_fetch_4 (, high_inc, MEMMODEL_RELAXED);
> > You check for compare_and_swapsi and the old code checks
> > TYPE_PRECISION (gcov_type_node) > 32 to determine whether 8 byte or 4 byte
> > gcov_type is used.  But you do not seem to handle the case where
> > neither SImode nor DImode atomic operations are available?  Can we instead
> > do
> >
> >if (gcov_type_size == 4)
> >  can_support_atomic4
> >= HAVE_sync_compare_and_swapsi || HAVE_atomic_compare_and_swapsi;
> >else if (gcov_type_size == 8)
> >  can_support_atomic8
> >= HAVE_sync_compare_and_swapdi || HAVE_atomic_compare_and_swapdi;
> >
> >if (flag_profile_update == PROFILE_UPDATE_ATOMIC
> >&& !can_support_atomic4 && !can_support_atomic8)
> >  {
> >warning (0, "target does not support atomic profile update, "
> > "single mode is selected");
> >flag_profile_update = PROFILE_UPDATE_SINGLE;
> >  }
> >
> > thus retain the diagnostic & fallback for this case?
>
> No, if you select -fprofile-update=atomic, then the updates shall be
> atomic from my point of view. If a fallback is acceptable, then you can
> use -fprofile-update=prefer-atomic. Using the fallback in
> -fprofile-update=atomic is a bug which prevents the use of gcov for
> multi-threaded applications on the lower end targets which do not have
> atomic operations in hardware.

Ah, OK.  So the fallback there is libatomic calls as well then?  Note
not all targets support libatomic, for those the failure mode is likely
a link error (which might be fine, but we eventually might want to
amend documentation to indicate this failure mode).

> > The existing
> > code also suggests
> > there might be targets with HImode or TImode counters?
>
> Sorry, I don't know.

can you conditionalize the split_atomic_increment init on
gcov_type_size == 8 please?

The patch is missing adjusments to the -fprofile-update documentation.
I think the prefer-atomic vs. atomic needs more explanation with this
change.

otherwise the patch looks OK to me.

Thanks,
Richard.

> >
> > In another mail you mentioned that for gen_time_profiler this doesn't
> > work but its
> > code relies on flag_profile_update as well.  So do we need to split the flag
> > somehow, or continue using the PROFILE_UPDATE_SINGLE fallback when
> > we are doing more than just edge profiling?
>
> If atomic operations are not available in hardware, then with this patch
> calls to libatomic are emitted. In gen_time_profiler() this is not an
> issue at all, since the atomic increment is only done if counters[0] ==
> 0 (if I read the code correctly).
>
> For example for
>
> void f(void) {}
>
> we get on riscv:
>
> gcc --coverage -O2 -S -o - test.c -fprofile-update=atomic
>
>  lui a4,%hi(__gcov0.f)
>  li  a3,1
>  addia4,a4,%lo(__gcov0.f)
>  amoadd.w a5,a3,0(a4)
>  lui a4,%hi(__gcov0.f+4)
>  addia5,a5,1
>  seqza5,a5
>  addi

Re: [PATCH] gcov: Fix -fprofile-update=atomic

2022-12-15 Thread Sebastian Huber

On 13/12/2022 15:30, Richard Biener wrote:

On Fri, Dec 9, 2022 at 2:56 PM Sebastian Huber
  wrote:

The code coverage support uses counters to determine which edges in the control
flow graph were executed.  If a counter overflows, then the code coverage
information is invalid.  Therefore the counter type should be a 64-bit integer.
In multithreaded applications, it is important that the counter increments are
atomic.  This is not the case by default.  The user can enable atomic counter
increments through the -fprofile-update=atomic and
-fprofile-update=prefer-atomic options.

If the hardware supports 64-bit atomic operations, then everything is fine.  If
not and -fprofile-update=prefer-atomic was chosen by the user, then non-atomic
counter increments will be used.  However, if the hardware does not support the
required atomic operations and -fprofile-atomic=update was chosen by the user,
then a warning was issued and as a forced fall-back to non-atomic operations
was done.  This is probably not what a user wants.  There is still hardware on
the market which does not have atomic operations and is used for multithreaded
applications.  A user which selects -fprofile-update=atomic wants consistent
code coverage data and not random data.

This patch removes the fall-back to non-atomic operations for
-fprofile-update=atomic.  If atomic operations in hardware are not available,
then a library call to libatomic is emitted.  To mitigate potential performance
issues an optimization for systems which only support 32-bit atomic operations
is provided.  Here, the edge counter increments are done like this:

   low = __atomic_add_fetch_4 (, 1, MEMMODEL_RELAXED);
   high_inc = low == 0 ? 1 : 0;
   __atomic_add_fetch_4 (, high_inc, MEMMODEL_RELAXED);

You check for compare_and_swapsi and the old code checks
TYPE_PRECISION (gcov_type_node) > 32 to determine whether 8 byte or 4 byte
gcov_type is used.  But you do not seem to handle the case where
neither SImode nor DImode atomic operations are available?  Can we instead
do

   if (gcov_type_size == 4)
 can_support_atomic4
   = HAVE_sync_compare_and_swapsi || HAVE_atomic_compare_and_swapsi;
   else if (gcov_type_size == 8)
 can_support_atomic8
   = HAVE_sync_compare_and_swapdi || HAVE_atomic_compare_and_swapdi;

   if (flag_profile_update == PROFILE_UPDATE_ATOMIC
   && !can_support_atomic4 && !can_support_atomic8)
 {
   warning (0, "target does not support atomic profile update, "
"single mode is selected");
   flag_profile_update = PROFILE_UPDATE_SINGLE;
 }

thus retain the diagnostic & fallback for this case?  


No, if you select -fprofile-update=atomic, then the updates shall be 
atomic from my point of view. If a fallback is acceptable, then you can 
use -fprofile-update=prefer-atomic. Using the fallback in 
-fprofile-update=atomic is a bug which prevents the use of gcov for 
multi-threaded applications on the lower end targets which do not have 
atomic operations in hardware.



The existing
code also suggests
there might be targets with HImode or TImode counters?


Sorry, I don't know.



In another mail you mentioned that for gen_time_profiler this doesn't
work but its
code relies on flag_profile_update as well.  So do we need to split the flag
somehow, or continue using the PROFILE_UPDATE_SINGLE fallback when
we are doing more than just edge profiling?


If atomic operations are not available in hardware, then with this patch 
calls to libatomic are emitted. In gen_time_profiler() this is not an 
issue at all, since the atomic increment is only done if counters[0] == 
0 (if I read the code correctly).


For example for

void f(void) {}

we get on riscv:

gcc --coverage -O2 -S -o - test.c -fprofile-update=atomic

lui a4,%hi(__gcov0.f)
li  a3,1
addia4,a4,%lo(__gcov0.f)
amoadd.w a5,a3,0(a4)
lui a4,%hi(__gcov0.f+4)
addia5,a5,1
seqza5,a5
addia4,a4,%lo(__gcov0.f+4)
amoadd.w zero,a5,0(a4)
ret

gcc --coverage -O2 -S -o - test.c -fprofile-update=atomic -mbig-endian

lui a4,%hi(__gcov0.f+4)
li  a3,1
addia4,a4,%lo(__gcov0.f+4)
amoadd.w a5,a3,0(a4)
lui a4,%hi(__gcov0.f)
addia5,a5,1
seqza5,a5
addia4,a4,%lo(__gcov0.f)
amoadd.w zero,a5,0(a4)
ret

gcc --coverage -O2 -S -o - test.c -fprofile-update=atomic -march=rv64g 
-mabi=lp64


lui a5,%hi(__gcov0.f)
li  a4,1
addia5,a5,%lo(__gcov0.f)
amoadd.d zero,a4,0(a5)
ret

gcc --coverage -O2 -S -o - test.c -fprofile-update=atomic -march=rv32id

lui a0,%hi(__gcov0.f)
li  a3,0
li  a1,1
li  a2,0
addia0,a0,%lo(__gcov0.f)
tail__atomic_fetch_add_8

gcc --coverage -O2 -S -o - test.c -fprofile-update=prefer-atomic 
-march=rv32id


lui 

Re: [PATCH] gcov: Fix -fprofile-update=atomic

2022-12-13 Thread Richard Biener via Gcc-patches
On Fri, Dec 9, 2022 at 2:56 PM Sebastian Huber
 wrote:
>
> The code coverage support uses counters to determine which edges in the 
> control
> flow graph were executed.  If a counter overflows, then the code coverage
> information is invalid.  Therefore the counter type should be a 64-bit 
> integer.
> In multithreaded applications, it is important that the counter increments are
> atomic.  This is not the case by default.  The user can enable atomic counter
> increments through the -fprofile-update=atomic and
> -fprofile-update=prefer-atomic options.
>
> If the hardware supports 64-bit atomic operations, then everything is fine.  
> If
> not and -fprofile-update=prefer-atomic was chosen by the user, then non-atomic
> counter increments will be used.  However, if the hardware does not support 
> the
> required atomic operations and -fprofile-atomic=update was chosen by the user,
> then a warning was issued and as a forced fall-back to non-atomic operations
> was done.  This is probably not what a user wants.  There is still hardware on
> the market which does not have atomic operations and is used for multithreaded
> applications.  A user which selects -fprofile-update=atomic wants consistent
> code coverage data and not random data.
>
> This patch removes the fall-back to non-atomic operations for
> -fprofile-update=atomic.  If atomic operations in hardware are not available,
> then a library call to libatomic is emitted.  To mitigate potential 
> performance
> issues an optimization for systems which only support 32-bit atomic operations
> is provided.  Here, the edge counter increments are done like this:
>
>   low = __atomic_add_fetch_4 (, 1, MEMMODEL_RELAXED);
>   high_inc = low == 0 ? 1 : 0;
>   __atomic_add_fetch_4 (, high_inc, MEMMODEL_RELAXED);

You check for compare_and_swapsi and the old code checks
TYPE_PRECISION (gcov_type_node) > 32 to determine whether 8 byte or 4 byte
gcov_type is used.  But you do not seem to handle the case where
neither SImode nor DImode atomic operations are available?  Can we instead
do

  if (gcov_type_size == 4)
can_support_atomic4
  = HAVE_sync_compare_and_swapsi || HAVE_atomic_compare_and_swapsi;
  else if (gcov_type_size == 8)
can_support_atomic8
  = HAVE_sync_compare_and_swapdi || HAVE_atomic_compare_and_swapdi;

  if (flag_profile_update == PROFILE_UPDATE_ATOMIC
  && !can_support_atomic4 && !can_support_atomic8)
{
  warning (0, "target does not support atomic profile update, "
   "single mode is selected");
  flag_profile_update = PROFILE_UPDATE_SINGLE;
}

thus retain the diagnostic & fallback for this case?  The existing
code also suggests
there might be targets with HImode or TImode counters?

In another mail you mentioned that for gen_time_profiler this doesn't
work but its
code relies on flag_profile_update as well.  So do we need to split the flag
somehow, or continue using the PROFILE_UPDATE_SINGLE fallback when
we are doing more than just edge profiling?

Thanks,
Richard.

> gcc/ChangeLog:
>
> * tree-profile.cc (split_atomic_increment): New.
> (gimple_gen_edge_profiler): Split the atomic edge counter increment in
> two 32-bit atomic operations if necessary.
> (tree_profiling): Remove profile update warning and fall-back.  Set
> split_atomic_increment if necessary.
> ---
>  gcc/tree-profile.cc | 81 +
>  1 file changed, 59 insertions(+), 22 deletions(-)
>
> diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
> index 2beb49241f2..1d326dde59a 100644
> --- a/gcc/tree-profile.cc
> +++ b/gcc/tree-profile.cc
> @@ -73,6 +73,17 @@ static GTY(()) tree ic_tuple_var;
>  static GTY(()) tree ic_tuple_counters_field;
>  static GTY(()) tree ic_tuple_callee_field;
>
> +/* If the user selected atomic profile counter updates
> +   (-fprofile-update=atomic), then the counter updates will be done 
> atomically.
> +   Ideally, this is done through atomic operations in hardware.  If the
> +   hardware supports only 32-bit atomic increments and gcov_type_node is a
> +   64-bit integer type, then for the profile edge counters the increment is
> +   performed through two separate 32-bit atomic increments.  This case is
> +   indicated by the split_atomic_increment variable begin true.  If the
> +   hardware does not support atomic operations at all, then a library call to
> +   libatomic is emitted.  */
> +static bool split_atomic_increment;
> +
>  /* Do initialization work for the edge profiler.  */
>
>  /* Add code:
> @@ -242,30 +253,59 @@ gimple_init_gcov_profiler (void)
>  void
>  gimple_gen_edge_profiler (int edgeno, edge e)
>  {
> -  tree one;
> -
> -  one = build_int_cst (gcov_type_node, 1);
> +  const char *name = "PROF_edge_counter";
> +  tree ref = tree_coverage_counter_ref (GCOV_COUNTER_ARCS, edgeno);
> +  tree one = build_int_cst (gcov_type_node, 1);
>
>if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
>  {
> -  /*