On 05/30/2016 03:16 AM, Richard Biener wrote:
> Ok, but the placement (and number) of threading passes then no longer
> depends on DOM/VRP passes, and as you placed the threading passes _before_
> those passes, the threading itself does not benefit from DOM/VRP but only
> from previous optimization passes.
Right. Note that the number of passes is actually the same as we had
before; they're just occurring outside DOM/VRP.
The backwards threader's only dependency on DOM/VRP was to propagate
constants into PHI nodes and to propagate away copies. That dependency
was removed.
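To make the PHI-node dependency concrete, here is a hand-written C illustration of the kind of opportunity the backwards threader looks for. It is not GCC output, and the function names are made up. In SSA form, `flag` becomes a PHI of the constants 1 and 0 at the join point, so the outcome of the second `if` is known along each incoming edge, and each edge can be redirected straight to the known successor.

```c
#include <assert.h>

/* Before threading: 'flag' is merged at the join point.  In SSA form it
   would be roughly flag_3 = PHI <1(bb2), 0(bb3)>, so the second test
   depends only on which edge reached the join.  */
int before_threading (int a)
{
  int flag;
  int r;

  if (a > 0)
    flag = 1;
  else
    flag = 0;

  /* Code that does not modify 'flag'.  */
  r = a * 2;

  if (flag)
    r += 100;
  else
    r -= 100;
  return r;
}

/* After threading (done by hand here): each arm goes directly to the
   known outcome of the second test, at the cost of duplicating the
   statements between the join and that test.  */
int after_threading (int a)
{
  int r;

  if (a > 0)
    {
      r = a * 2;
      r += 100;
    }
  else
    {
      r = a * 2;
      r -= 100;
    }
  return r;
}

/* Sanity check: both versions compute the same result.  */
void check_threading_equivalence (int lo, int hi)
{
  for (int a = lo; a <= hi; a++)
    assert (before_threading (a) == after_threading (a));
}
```

The only facts needed are the PHI arguments on each incoming edge, which is why having constants propagated into PHI nodes mattered to the threader.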
> I see this as an opportunity to remove some of them ;) I now see in the main
> optimization pipeline:
> NEXT_PASS (pass_fre);
> NEXT_PASS (pass_thread_jumps);
> NEXT_PASS (pass_vrp, true /* warn_array_bounds_p */);
> This position makes sense: FRE has removed redundancies and fully
> copy- and constant-propagated the IL.
> NEXT_PASS (pass_sra);
> /* The dom pass will also resolve all __builtin_constant_p calls
>    that are still there to 0. This has to be done after some
>    propagations have already run, but before some more dead code
>    is removed, and this place fits nicely. Remember this when
>    trying to move or duplicate pass_dominator somewhere earlier. */
> NEXT_PASS (pass_thread_jumps);
> NEXT_PASS (pass_dominator, true /* may_peel_loop_headers_p */);
> This position, OTOH, doesn't make much sense, as IL cleanup is missing
> after SRA and previous opts. After the loop optimizations we now have
We should look at this one closely. The backwards threader doesn't
depend on IL cleanups. It should do its job regardless of the state of
the IL.
> NEXT_PASS (pass_tracer);
> NEXT_PASS (pass_thread_jumps);
> NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
> NEXT_PASS (pass_strlen);
> NEXT_PASS (pass_thread_jumps);
> NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);
> I don't think we want two threading passes so close together. It makes some
> sense to have one _after_ DOM but before VRP (DOM cleaned up the IL).
That one is, IMHO, the least useful, though I haven't done any significant
analysis of this specific instance to be sure. The step you saw was meant
to largely preserve behavior. Further cleanups are definitely possible.
The most common cases I've seen where DOM/VRP make transformations that
then expose something useful to the backward threader come from those
pesky context-sensitive equivalences.
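As a hypothetical illustration (hand-written, not taken from GCC or its testsuite), a context-sensitive equivalence is a fact that holds only on some paths: on the true edge of `if (x == 5)`, `x` is known to be 5 even though nothing in the IL says so. Until that fact is used to rewrite `x` on that path, `y` is not a constant there and the second test cannot be resolved per edge. Once it is, `y` is 6 on one path and 0 on the other, and the second test can be threaded away.

```c
#include <assert.h>

/* Hypothetical example.  On the true arm of the first test, x == 5
   holds only because of the condition (a context-sensitive fact), so
   y == 6 there; on the other path y == 0.  Without using that fact,
   y = x + 1 is not a known constant and the second test stays.  */
int before_equiv (int x)
{
  int y;

  if (x == 5)
    y = x + 1;  /* x is 5 only on this path.  */
  else
    y = 0;

  if (y == 6)
    return 1;
  return 2;
}

/* Hand-threaded version once the equivalence has been exposed: the
   second test is decided on each incoming path.  */
int after_equiv (int x)
{
  if (x == 5)
    return 1;   /* y == 6 is known on this path.  */
  return 2;     /* y == 0 is known on this path.  */
}

/* Sanity check: both versions agree over a range of inputs.  */
void check_equiv_threading (int lo, int hi)
{
  for (int x = lo; x <= hi; x++)
    assert (before_equiv (x) == after_equiv (x));
}
```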
We (primarily Andrew, but Aldy and I are also involved) are looking
at ways to more generally expose range information created for these
situations. Exposing range information and getting it more precise by
allowing "unnecessary copies" or some such would eliminate those cases
where DOM/VRP expose new opportunities for the backwards jump threader.
> So that would leave two of your four passes and open up the opportunity
> to re-add one during early opts, also after FRE. That one should be
> throttled down to operate in "-Os" mode, though.
I'll take a look at them, but with some personal stuff and PTO it'll
likely be a few weeks before I've got anything useful.
> So, can you see what removing the two threading passes that don't make
> sense to me does to your statistics? And check whether a -Os-like
> threading pass can be done early?
Interesting you should mention doing threading early -- that was one of
the secondary motivations behind getting the backwards threading bits
out into their own pass, I just failed to mention it.
Essentially, for that case we want to limit the backwards substitution to
a single step within a single block (which is trivially easy). That
would allow us to run a very cheap threader during early optimizations.
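One possible reading of the "single step within a single block" restriction, sketched as a toy model in C. Everything here (`struct toy_block`, `toy_find_threads`, `MAX_PREDS`) is hypothetical and is not GCC's data structures or API. A block ends in `if (v != 0)`, where `v` is defined by a PHI node in that same block; the analysis only inspects the PHI argument on each incoming edge and never walks further back into the predecessors, so its cost is linear in the number of incoming edges.

```c
#include <assert.h>

#define MAX_PREDS 8

/* Hypothetical toy block: ends in "if (v != 0) goto true_succ; else
   goto false_succ;" where v is defined by a PHI in this block.
   phi_is_const[i] / phi_const[i] describe the PHI argument arriving on
   incoming edge i.  */
struct toy_block
{
  int npreds;
  int phi_is_const[MAX_PREDS];
  long phi_const[MAX_PREDS];
  int true_succ;
  int false_succ;
};

/* Single-step, single-block analysis: for each incoming edge, store in
   target[i] the successor that edge could be threaded to, or -1 when
   the PHI argument on that edge is not a known constant.  Returns the
   number of threadable edges.  */
int toy_find_threads (const struct toy_block *bb, int target[])
{
  int count = 0;

  assert (bb->npreds >= 0 && bb->npreds <= MAX_PREDS);
  for (int i = 0; i < bb->npreds; i++)
    {
      if (bb->phi_is_const[i])
        {
          target[i] = bb->phi_const[i] != 0 ? bb->true_succ : bb->false_succ;
          count++;
        }
      else
        target[i] = -1;
    }
  return count;
}
```

A real threader would still have to duplicate any other statements in the block for each threaded edge, so an -Os-like throttle would presumably limit how large that block may be.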
Jeff