> On 1/21/21 9:25 AM, Richard Biener wrote:
> > On Wed, Jan 20, 2021 at 5:25 PM Martin Liška <mli...@suse.cz> wrote:
> > > 
> > > On 1/20/21 5:00 PM, Jan Hubicka wrote:
> > > > There are two thinks that I would like to discuss first
> > > >    1) I think the option is stil used for value profiling of divisors
> > > 
> > > It's not. Right now the only usage lives in get_nth_most_common_value 
> > > which
> > > is an entry point being used for stringops, indirect call and divmod 
> > > histograms.
> > > 
> > > >    2) it is not clear to me how the new counter establishes
> > > >    reproducibility for indiect calls that have more than 32 targets 
> > > > during
> > > >    the train run.
> > > 
> > > We cannot ensure, but I would say that it's very unlikely to happen.
> > > In case of Firefox, there are definitely other reasons why the build is 
> > > not reproducible.
> > > I would expect arc counters to differ in between multiple training runs.
> > > 
> > > If it's really a problem we can come up with other approaches:
> > > - GCOV run-time control over # of tracked values (32 right now)
> > > - similarly we can save more values in .gcda files
> > > 
> > > I'm sending updated version of the patch.
> > 
> > So the discussion tells me that we want the option and of course want to 
> > have
> > it work.
> 
> Yep, I've just sent patch for that.
> 
> > In the patch I see the =multithreaded enum part was not documented
> > (I don't see how it differs from =parallel-runs?), so I wonder if we really 
> > need
> > three states.
> 
> It's a reserved option value Honza though will be useful for the future (:

Not exactly - I intended it to work already in gcc10 as follows.

 - With =serial one can trust all counters coming from gcda file
   (I looked again in details to gcc10 implementation and I think the
   handling of sign bit is correct, contrary to my previous claim)
 - With =paralel-runs we can use the sign bit trick to signalize that
   merging was lossy
 - With =multithreaded we can only use the counter if sum of individual
   targets matches the total number of executions (so we know no target
   got lost).

Basically =serial means that you get reproducible profile only if the
events (profile counter invocation, profile streaming) come in precisely
same order in both train runs. (Such as profiledbootstrap running with
make -j1)

With =parallel-run you get reproducible profile under the assumption
that train run consist of multiple invocations (or does forking), each
invocation is reproducible but streaming happens in random order
(profiledbootstrap with make -j16).

With =multithreaded you get reproducible profile if the profile
counter invocations match in both train runs, but they can happen in any
order (profiledbootstrap with make -j16  once we make gcc multithreaded,
or build of clang with FDO).

> 
> > 
> > That said, the option handling is indeed broken at the moment.  While
> > the implementation is not perfect, does it have some pieces that help?
> > 
> > That said, why not simply fix option handling by adding the missing =
> > to the option?  Using -fprofile-reproducibleserial etc. work but before
> > adding backwards compatibility aliases I'd say we change w/o them
> > for GCC 11 and only if there are complaints introduce them (and eventually
> > backport the option handling fix to GCC 10).
> 
> Yes, I would like to backport the option fix for GCC 10. But apparently, 
> there's
> nobody using the option.

I think easy way to get users of this option is to make profile not
reproducible by default and modify packages to use right reproducibility
option when reproducible builds are intended.  It is not feature that
comes for free and I think most users of PGO does not care, so I think
it should be opt in.

In general getting profile reroducible one needs to make train
reproducible that is hard when you look at details (such as /tmp/ file
name generation issue in gcc) and may lead to need for user to annotate
such code.

This will become more common problem for multithreaded profiles where
one needs to annotate locking and busy waiting loops in them for example
(or the scheduler responsible for executing paralle tasks).

I can see this to be practically achievable but we probably want to
produce some guidelines for doing that and probably teach gcov-tool to
compare profiles and say to which degree they match (i.e. which
functions match for each of levels of reproducibility).

The problem is that profiles are continuous and the errors too, but
optimizaitons looks for certain thresholds, so small errors may lead to
code changes, so I think our current method of looking at relatively few
packages and patching errors when they appear is not very good long term
strategy... Especially if it makes us to drop useful transformations by
default with -fprofile-use and no additional option.

Honza
> 
> Martin
> 
> > 
> > Richard.
> > 
> > > Martin
> 

Reply via email to