> >
> > Really appreciate for your detailed explanation.  BTW, My previous patch
> > for PGO build on exchange2 takes this similar method by setting each cloned
> > node to 1/10th of the frequency several month agao :)
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-June/546926.html

I was looking again into the patch and I think we could adjust it for
mainline.  I wanted to discuss this with Martin Jambor (added to CC)
after he returns from vacation.

I think the idea of scaling all copies by number of copies is a
reasonable thing to do (as discussed in earlier mail) since generally we
do not have information about where the hot recursion levels lie.
By master theorem it is easy to see that this is actually very subtle
problem and our profile info is not precise enough to be useful here.
Making all copies even is thus a good conservative solution.  I did not
quite understand this point earlier, so the patch did not seem to make
much sense to me and I saw it as an odd exchange only hack. I apologize
for that.

What I think is wrong about the patch is that the scale is not set
accoridng to number of actual copies produced, but according to the
--param controlling maximal recursion depth (since with exchage we
manage to meet it).  We do not need to cap recursion depth so low as
patch does (as shown by the factorial example that seems easy enough so
we should handle it eventualy by default). 

I think we need to work out how to scale by actual number of clones.
Problem is that we clone one by one and each time we update the
profile.  So we do not really know how many times we will clone at that
time.  I was thinking of simply keeping the chain of recursively cloned
nodes and each time rescale profile of all of them so it stays the same.
It would cause some roundoff issues and also will affect the ipa-cp
decisions itself since it will use the intermediate values.
This is probably not to bad, but perhaps Martin may have better idea.

Honza
> 
> Does it seem likely that we'll reach a resolution on this soon?
> I take the point that the patch that introduced the exchange regression
> [https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551757.html]
> was just uncovering a latent issue, but still, this is a large regression
> in an important benchmark to be carrying around.  For those of us doing
> regular benchmark CI, the longer the performance trough gets, the harder
> it is to spot other unrelated regressions in the “properly optimised” code.
> 
> So if we don't have a specific plan for fixing the regression soon,
> I think we should consider reverting the patch until we have something
> that avoids the exchange regression, even though the patch wasn't
> technically wrong.

The problem here is that IRA needs to be lied to to get good results.
One direction of attack is to ignore this problem, get IPA-CP to work by
default and convince inliner to not inline on large loop depths. This
seems to get good performance on x86-64, I did not test ARM.

We may fix IRA issue, but Vladimir does not seem to know what to do
according to the PR log.

We may lie to compiler again and produce wrong profile.

diff --git a/gcc/predict.def b/gcc/predict.def
index 2d8d4958b6d..d8e502152b8 100644
--- a/gcc/predict.def
+++ b/gcc/predict.def
@@ -177,7 +177,7 @@ DEF_PREDICTOR (PRED_LOOP_GUARD, "loop guard", HITRATE (73), 
0)
 
 /* Same but for loops containing recursion.  */
 DEF_PREDICTOR (PRED_LOOP_GUARD_WITH_RECURSION, "loop guard with recursion",
-              HITRATE (85), 0)
+              HITRATE (95), 0)
 
 /* The following predictors are used in Fortran. */
 
Gets perofrmance back for me.
I do not really like this alternative.  One option I would still like to
look into is that IRA seems to work better with profile feedback than
with currently guessed profile. It work even better with the wrongly
guessed profile with the patch above, but perhaps we could fix profile
estimate somewhere else to get profile more realistic and IRA happier.

Honza
> 
> Thanks,
> Richard

Reply via email to