> > > > Really appreciate for your detailed explanation. BTW, My previous patch > > for PGO build on exchange2 takes this similar method by setting each cloned > > node to 1/10th of the frequency several month agao :) > > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-June/546926.html
I was looking again into the patch and I think we could adjust it for mainline. I wanted to discuss this with Martin Jambor (added to CC) after he returns from vacation. I think the idea of scaling all copies by number of copies is a reasonable thing to do (as discussed in earlier mail) since generally we do not have information about where the hot recursion levels lie. By master theorem it is easy to see that this is actually very subtle problem and our profile info is not precise enough to be useful here. Making all copies even is thus a good conservative solution. I did not quite understand this point earlier, so the patch did not seem to make much sense to me and I saw it as an odd exchange only hack. I apologize for that. What I think is wrong about the patch is that the scale is not set accoridng to number of actual copies produced, but according to the --param controlling maximal recursion depth (since with exchage we manage to meet it). We do not need to cap recursion depth so low as patch does (as shown by the factorial example that seems easy enough so we should handle it eventualy by default). I think we need to work out how to scale by actual number of clones. Problem is that we clone one by one and each time we update the profile. So we do not really know how many times we will clone at that time. I was thinking of simply keeping the chain of recursively cloned nodes and each time rescale profile of all of them so it stays the same. It would cause some roundoff issues and also will affect the ipa-cp decisions itself since it will use the intermediate values. This is probably not to bad, but perhaps Martin may have better idea. Honza > > Does it seem likely that we'll reach a resolution on this soon? > I take the point that the patch that introduced the exchange regression > [https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551757.html] > was just uncovering a latent issue, but still, this is a large regression > in an important benchmark to be carrying around. For those of us doing > regular benchmark CI, the longer the performance trough gets, the harder > it is to spot other unrelated regressions in the “properly optimised” code. > > So if we don't have a specific plan for fixing the regression soon, > I think we should consider reverting the patch until we have something > that avoids the exchange regression, even though the patch wasn't > technically wrong. The problem here is that IRA needs to be lied to to get good results. One direction of attack is to ignore this problem, get IPA-CP to work by default and convince inliner to not inline on large loop depths. This seems to get good performance on x86-64, I did not test ARM. We may fix IRA issue, but Vladimir does not seem to know what to do according to the PR log. We may lie to compiler again and produce wrong profile. diff --git a/gcc/predict.def b/gcc/predict.def index 2d8d4958b6d..d8e502152b8 100644 --- a/gcc/predict.def +++ b/gcc/predict.def @@ -177,7 +177,7 @@ DEF_PREDICTOR (PRED_LOOP_GUARD, "loop guard", HITRATE (73), 0) /* Same but for loops containing recursion. */ DEF_PREDICTOR (PRED_LOOP_GUARD_WITH_RECURSION, "loop guard with recursion", - HITRATE (85), 0) + HITRATE (95), 0) /* The following predictors are used in Fortran. */ Gets perofrmance back for me. I do not really like this alternative. One option I would still like to look into is that IRA seems to work better with profile feedback than with currently guessed profile. It work even better with the wrongly guessed profile with the patch above, but perhaps we could fix profile estimate somewhere else to get profile more realistic and IRA happier. Honza > > Thanks, > Richard