On Thu, 4 Aug 2022, Aldy Hernandez wrote:

> On Wed, Aug 3, 2022 at 11:53 AM Richard Biener <rguent...@suse.de> wrote:
> >
> > I've tried to understand how the greedy search works seeing the
> > bitmap dances and the split into resolve_phi.  I've summarized
> > the intent of the algorithm as
> >
> >       // For further greedy searching we want to remove interesting
> >       // names defined in BB but add ones on the PHI edges for the
> >       // respective edges.
> >
> > but the implementation differs in detail.  In particular when
> > there is more than one interesting PHI in BB it seems to only consider
> > the first for translating defs across edges.  It also only applies
> > the loop crossing restriction when there is an interesting PHI.
> > I've also noticed that while the set of interesting names is rolled
> > back, m_imports just keeps growing - is that a bug?
> 
> I've never quite liked how I was handling PHIs.  I had some
> improvements in this space, especially the problem with only
> considering the first def across a PHI, but I ran out of time last
> cycle, and we were already into stage3.
> 
> The loop crossing restriction I inherited from the original
> implementation, though I suppose I restricted it further by only
> looking at interesting PHIs.  In practice I don't think it mattered,
> because we cap loop crossing violations in cancel_invalid_paths in the
> registry.  Does anything break if you keep the restriction across the
> board?

Probably not - I've also spotted all these "late" invalidates all over
the place, some of them should be done earlier to limit the exponential
search.  I plan to go find them and divide them into things that can
be checked locally per BB (good to do at greedy search time) and those
that need the full path (we should simply not register such path).

I'll keep this as is for now.

> Good spotting on the imports growing.  Instinctively that seems like a bug.
> 
> [As a suggestion, check the total threadable paths as a sanity check
> if you make any changes in this space.  What I've been doing is
> counting "Jumps threaded" in the *.stat dump file across the .ii files
> in a bootstrap, and making sure you're not getting the same # of
> threads because we missed one in the backward threader which then got
> picked up by DOM.  I divide up the threads by ethread, thread,
> threadfull, DOM, etc, to make sure I'm not just shuffling threading
> opportunities around.]

I'll do this experiment and will roll this fix into the patch if it
works out.

> But yeah, we shouldn't be growing imports unnecessarily, if nothing
> else because of the bitmap explosion you're noticing in other places.
> In practice, I'm not so sure it slows the solver itself down:
> 
> // They are hints for the solver.  Adding more elements doesn't slow
> // us down, because we don't solve anything that doesn't appear in the
> // path.  On the other hand, not having enough imports will limit what
> // we can solve.
> 
> But that comment may be old ;-).

I hoped so, yes.

> >
> > The following preserves the loop crossing restriction to the case
> > of interesting PHIs but merges resolve_phi back, changing interesting
> > as outlined with the intent above.  It should get more threading
> > cases when there are multiple interesting PHI defs in a block.
> > It might be a bit faster due to less bitmap operations but in the
> > end the main intent was to make what happens more obvious.
> 
> Sweet!
> 
> >
> > Bootstrap and regtest pending on x86_64-unknown-linux-gnu.
> >
> > Aldy - you wrote the existing implementation, is the following OK?
> 
> I'm tickled pink you're cleaning and improving this.  Looks good.

OK, I'll re-test and push after doing the m_imports experiment.

Thanks,
Richard.

Reply via email to