On 1/20/21 11:39 PM, Jan Hubicka wrote:
On 1/20/21 5:00 PM, Jan Hubicka wrote:
There are two thinks that I would like to discuss first
   1) I think the option is stil used for value profiling of divisors

It's not. Right now the only usage lives in get_nth_most_common_value which
is an entry point being used for stringops, indirect call and divmod histograms.

I see, you replaced the implementation here as well (not only for
indirect calls).

The code path these 2 is shared for some ttime.

I am attaching testcase that demonstrates the
reproducibility issue on divmod tracking.

Script cmd compiles t.c and trains it twice with same inputs but
different order.  On GCC10 in tumbleweed I get:

run 1
run 2
         cmpl    $257, %esi

So the first build does not specialize mod function for 257 while second
does.

cmd11 script does the same for trunk:

run 1
         cmpl    $257, %esi
run 2

This also made me to notice that the following is currently broken:

hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build3/gcc$ ./xgcc -B ./ -O2 
-fprofile-generate t.c
hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build3/gcc$ ./a.out 10 10
hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build3/gcc$ ./xgcc -B ./ -O2 
-fprofile-use t.c -S

I know one can create a counter example :) Anyway, I'm suggesting the following 
patch to restore
the behavior.

Plus I'm planning to send one more patch that will ignore time profile when 
-fprofile-reproduce != serial.

int
__attribute__((noipa))
foo()
{
  return 2;
}

int
__attribute__((noipa))
bar()
{
  return 3;
}

int main(int argc, char **argv)
{
  if (argc == 1)
  {
    foo();
    bar();
  }
  else
  {
    bar();
    foo();
  }

  return 0;
}

$ rm *gcda ; gcc foo.c -fprofile-generate -O2 && ./a.out && gcc foo.c -fprofile-use -O2 
&& nm a.out -n | egrep "foo|bar"
0000000000401140 T foo
0000000000401150 T bar

$ rm *gcda ; gcc foo.c -fprofile-generate -O2 && ./a.out 1 && gcc foo.c -fprofile-use -O2 
&& nm a.out -n | egrep "foo|bar"
0000000000401140 T bar
0000000000401150 T foo


This is becuase we now call the dump file a-t.gcda instead of t.gcda
that seems unintentional change that probably can break some user
scripts.

This one is related to auxiliary file renaming done by Alexander Oliva.


I think this testcase is not so unrealistic especially in the context of
value profiling for divmod or stringops size.

   2) it is not clear to me how the new counter establishes
   reproducibility for indiect calls that have more than 32 targets during
   the train run.

We cannot ensure, but I would say that it's very unlikely to happen.
In case of Firefox, there are definitely other reasons why the build is not 
reproducible.
I would expect arc counters to differ in between multiple training runs.

Even in the context of indirect call profiling, I do not think the
situation is limited to firefox.   With virtual function calls and large
object hiearchy one can end up with many possible targets quite easily.
The code pattern seems quite sane here.  Basic datastructures in firefox
has refcounting and have virutal call to decrement the count.  Also
there is get_id method associated with almost everything and it is what
we get currently wrong (the get_id benchmarks).  I will try to debug
why.

While with tracking 32 values instead of 4 we made the problem less
frequent (at the cost of adding new issues with malloc recursion),
reproducibility is still not guaranteed.

If it's really a problem we can come up with other approaches:
- GCOV run-time control over # of tracked values (32 right now)
- similarly we can save more values in .gcda files

For divmod profiling you can easilly have very large number of values,
so I do not think we can count on fact that it is always possible to
track them all.

In general I see the point of reproducible builds, but it is also fragile
feature. If we commit to it (which I think we did), we should support it
carefully and not just patch around most frequent issues.

I would expect that relatively soon we will see need for reproducible
builds of multthreaded programs (such as when building clang, or if we
add threads to gcc). In this case the 32bit targets are going to be even
less useful.  I think we can only accept the profile if no value got
dropped.

Yes. Can you please explain the point with 32bit targets?

I'm testing the following patch, is it fine after testing?
Thanks,
Martin


Honza

I'm sending updated version of the patch.
Martin

 From ffbd2fe90dd3bfceecfeefb679572ca2f6b5f333 Mon Sep 17 00:00:00 2001
From: Martin Liska <mli...@suse.cz>
Date: Tue, 19 Jan 2021 15:43:31 +0100
Subject: [PATCH] Drop profile reproducibility flag as it is not used.

gcc/ChangeLog:

        PR gcov-profile/98739
        * common.opt: Drop -fprofile-reproducibility flag.
        * coretypes.h (enum profile_reproducibility): Drop the enum.
        * doc/invoke.texi: Remove documentation.
        * value-prof.c (get_nth_most_common_value): Drop usage of
        flag_profile_reproducible.
---
  gcc/common.opt      | 16 ++--------------
  gcc/coretypes.h     |  7 -------
  gcc/doc/invoke.texi | 30 ------------------------------
  gcc/value-prof.c    | 15 +++------------
  4 files changed, 5 insertions(+), 63 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index bde1711870d..863e9681aea 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2236,21 +2236,9 @@ fprofile-exclude-files=
  Common Joined RejectNegative Var(flag_profile_exclude_files)
  Instrument only functions from files whose name does not match any of the 
regular expressions (separated by semi-colons).
-Enum
-Name(profile_reproducibility) Type(enum profile_reproducibility) 
UnknownError(unknown profile reproducibility method %qs)
-
-EnumValue
-Enum(profile_reproducibility) String(serial) 
Value(PROFILE_REPRODUCIBILITY_SERIAL)
-
-EnumValue
-Enum(profile_reproducibility) String(parallel-runs) 
Value(PROFILE_REPRODUCIBILITY_PARALLEL_RUNS)
-
-EnumValue
-Enum(profile_reproducibility) String(multithreaded) 
Value(PROFILE_REPRODUCIBILITY_MULTITHREADED)
-
  fprofile-reproducible
-Common Joined RejectNegative Var(flag_profile_reproducible) 
Enum(profile_reproducibility) Init(PROFILE_REPRODUCIBILITY_SERIAL)
--fprofile-reproducible=[serial|parallel-runs|multithreaded]    Control level 
of reproducibility of profile gathered by -fprofile-generate.
+Common Ignore Joined RejectNegative
+Does nothing. Preserved for backward compatibility.
Enum
  Name(profile_update) Type(enum profile_update) UnknownError(unknown profile 
update method %qs)
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index 406572e947d..3a67061a483 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -212,13 +212,6 @@ enum profile_update {
    PROFILE_UPDATE_PREFER_ATOMIC
  };
-/* Type of profile reproducibility methods. */
-enum profile_reproducibility {
-    PROFILE_REPRODUCIBILITY_SERIAL,
-    PROFILE_REPRODUCIBILITY_PARALLEL_RUNS,
-    PROFILE_REPRODUCIBILITY_MULTITHREADED
-};
-
  /* Type of -fstack-protector-*.  */
  enum stack_protector {
    SPCT_FLAG_DEFAULT = 1,
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 5f4a06625eb..c66ba2a62ad 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -582,7 +582,6 @@ Objective-C and Objective-C++ Dialects}.
  -fprofile-note=@var{path} -fprofile-prefix-path=@var{path} @gol
  -fprofile-update=@var{method} -fprofile-filter-files=@var{regex} @gol
  -fprofile-exclude-files=@var{regex} @gol
--fprofile-reproducible=@r{[}multithreaded@r{|}parallel-runs@r{|}serial@r{]} 
@gol
  -fsanitize=@var{style}  -fsanitize-recover  -fsanitize-recover=@var{style} 
@gol
  -fasan-shadow-offset=@var{number}  -fsanitize-sections=@var{s1},@var{s2},... 
@gol
  -fsanitize-undefined-trap-on-error  -fbounds-check @gol
@@ -14640,35 +14639,6 @@ any of the regular expressions (separated by 
semi-colons).
  For example, @option{-fprofile-exclude-files=/usr/.*} will prevent 
instrumentation
  of all files that are located in the @file{/usr/} folder.
-@item -fprofile-reproducible=@r{[}multithreaded@r{|}parallel-runs@r{|}serial@r{]}
-@opindex fprofile-reproducible
-Control level of reproducibility of profile gathered by
-@code{-fprofile-generate}.  This makes it possible to rebuild program
-with same outcome which is useful, for example, for distribution
-packages.
-
-With @option{-fprofile-reproducible=serial} the profile gathered by
-@option{-fprofile-generate} is reproducible provided the trained program
-behaves the same at each invocation of the train run, it is not
-multi-threaded and profile data streaming is always done in the same
-order.  Note that profile streaming happens at the end of program run but
-also before @code{fork} function is invoked.
-
-Note that it is quite common that execution counts of some part of
-programs depends, for example, on length of temporary file names or
-memory space randomization (that may affect hash-table collision rate).
-Such non-reproducible part of programs may be annotated by
-@code{no_instrument_function} function attribute. @command{gcov-dump} with
-@option{-l} can be used to dump gathered data and verify that they are
-indeed reproducible.
-
-With @option{-fprofile-reproducible=parallel-runs} collected profile
-stays reproducible regardless the order of streaming of the data into
-gcda files.  This setting makes it possible to run multiple instances of
-instrumented program in parallel (such as with @code{make -j}). This
-reduces quality of gathered data, in particular of indirect call
-profiling.
-
  @item -fsanitize=address
  @opindex fsanitize=address
  Enable AddressSanitizer, a fast memory error detector.
diff --git a/gcc/value-prof.c b/gcc/value-prof.c
index 4c916f4994f..1b174006941 100644
--- a/gcc/value-prof.c
+++ b/gcc/value-prof.c
@@ -745,13 +745,10 @@ gimple_divmod_fixed_value (gassign *stmt, tree value, 
profile_probability prob,
Counters have the following meaning. - abs (counters[0]) is the number of executions
+   counters[0] is the number of executions
     for i in 0 ... TOPN-1
       counters[2 * i + 1] is target
-     abs (counters[2 * i + 2]) is corresponding hitrate counter.
-
-   Value of counters[0] negative when counter became
-   full during merging and some values are lost.  */
+     counters[2 * i + 2] is corresponding hitrate counter.  */
bool
  get_nth_most_common_value (gimple *stmt, const char *counter_type,
@@ -765,17 +762,11 @@ get_nth_most_common_value (gimple *stmt, const char 
*counter_type,
    *count = 0;
    *value = 0;
- gcov_type read_all = abs_hwi (hist->hvalue.counters[0]);
+  gcov_type read_all = hist->hvalue.counters[0];
gcov_type v = hist->hvalue.counters[2 * n + 2];
    gcov_type c = hist->hvalue.counters[2 * n + 3];
- if (hist->hvalue.counters[0] < 0
-      && (flag_profile_reproducible == PROFILE_REPRODUCIBILITY_PARALLEL_RUNS
-         || (flag_profile_reproducible
-             == PROFILE_REPRODUCIBILITY_MULTITHREADED)))
-    return false;
-
    /* Indirect calls can't be verified.  */
    if (stmt
        && check_counter (stmt, counter_type, &c, &read_all,
--
2.30.0



>From 22bbf5342f2b73fad6c0286541bba6699c617380 Mon Sep 17 00:00:00 2001
From: Martin Liska <mli...@suse.cz>
Date: Thu, 21 Jan 2021 09:02:31 +0100
Subject: [PATCH 1/2] Restore -fprofile-reproducibility flag.

gcc/ChangeLog:

	PR gcov-profile/98739
	* common.opt: Add missing equal symbol.
	* value-prof.c (get_nth_most_common_value): Fix comment
	and skip TOP N counter properly when -fprofile-reproducibility
	is not serial.
---
 gcc/common.opt   |  2 +-
 gcc/value-prof.c | 18 ++++++++----------
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index bde1711870d..a8a2b67a99d 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2248,7 +2248,7 @@ Enum(profile_reproducibility) String(parallel-runs) Value(PROFILE_REPRODUCIBILIT
 EnumValue
 Enum(profile_reproducibility) String(multithreaded) Value(PROFILE_REPRODUCIBILITY_MULTITHREADED)
 
-fprofile-reproducible
+fprofile-reproducible=
 Common Joined RejectNegative Var(flag_profile_reproducible) Enum(profile_reproducibility) Init(PROFILE_REPRODUCIBILITY_SERIAL)
 -fprofile-reproducible=[serial|parallel-runs|multithreaded]	Control level of reproducibility of profile gathered by -fprofile-generate.
 
diff --git a/gcc/value-prof.c b/gcc/value-prof.c
index 4c916f4994f..fafe9d8d0f1 100644
--- a/gcc/value-prof.c
+++ b/gcc/value-prof.c
@@ -747,11 +747,8 @@ gimple_divmod_fixed_value (gassign *stmt, tree value, profile_probability prob,
 
    abs (counters[0]) is the number of executions
    for i in 0 ... TOPN-1
-     counters[2 * i + 1] is target
-     abs (counters[2 * i + 2]) is corresponding hitrate counter.
-
-   Value of counters[0] negative when counter became
-   full during merging and some values are lost.  */
+     counters[2 * i + 2] is target
+     counters[2 * i + 3] is corresponding hitrate counter.  */
 
 bool
 get_nth_most_common_value (gimple *stmt, const char *counter_type,
@@ -765,15 +762,16 @@ get_nth_most_common_value (gimple *stmt, const char *counter_type,
   *count = 0;
   *value = 0;
 
-  gcov_type read_all = abs_hwi (hist->hvalue.counters[0]);
+  gcov_type read_all = hist->hvalue.counters[0];
+  gcov_type covered = 0;
+  for (unsigned i = 0; i < counters; ++i)
+    covered += hist->hvalue.counters[2 * i + 3];
 
   gcov_type v = hist->hvalue.counters[2 * n + 2];
   gcov_type c = hist->hvalue.counters[2 * n + 3];
 
-  if (hist->hvalue.counters[0] < 0
-      && (flag_profile_reproducible == PROFILE_REPRODUCIBILITY_PARALLEL_RUNS
-	  || (flag_profile_reproducible
-	      == PROFILE_REPRODUCIBILITY_MULTITHREADED)))
+  if (read_all != covered
+      && flag_profile_reproducible != PROFILE_REPRODUCIBILITY_SERIAL)
     return false;
 
   /* Indirect calls can't be verified.  */
-- 
2.30.0

Reply via email to