On 05/30/2016 03:16 AM, Richard Biener wrote:

Ok, but the placement (and number of) threading passes then no longer depends
on DOM/VRP passes - and as you placed the threading passes _before_ those
passes the threading itself does not benefit from DOM/VRP but only from
previous optimization passes.

Right. Note that the number of passes is now actually the same as we had before; they just occur outside DOM/VRP.

The backwards threader's only dependency on DOM/VRP was to propagate constants into PHI nodes and to propagate away copies. That dependency was removed.
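
As a rough source-level illustration of why a constant reaching a PHI argument matters to the threader, consider the sketch below. The function names are hypothetical, and the "threaded" version is written by hand to show the effect; it is not GCC output.

```c
#include <assert.h>

/* Original shape: 'flag' is merged from two paths (a PHI in SSA form),
   and one incoming value is the constant 1.  */
int original(int a, int b)
{
    int flag;
    if (a > 0)
        flag = 1;          /* constant PHI argument on this path */
    else
        flag = b;          /* unknown value on this path */

    if (flag != 0)         /* outcome is known when arriving from 'a > 0' */
        return 10;
    return 20;
}

/* Hand-threaded shape: the 'a > 0' path goes straight to the known
   destination and skips the second test.  */
int threaded(int a, int b)
{
    if (a > 0)
        return 10;
    if (b != 0)
        return 10;
    return 20;
}

/* Checks that both shapes agree on a small grid of inputs.  */
void check_threading_equivalence(void)
{
    for (int a = -2; a <= 2; a++)
        for (int b = -2; b <= 2; b++)
            assert(original(a, b) == threaded(a, b));
}
```

The point is only that the second test is decidable on one incoming path as soon as the constant is visible in the merge; how the constant gets there is a separate question.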



I see this as an opportunity to remove some of them ;)  I now see in the main
optimization pipeline

      NEXT_PASS (pass_fre);
      NEXT_PASS (pass_thread_jumps);
      NEXT_PASS (pass_vrp, true /* warn_array_bounds_p */);

This position makes sense: FRE has removed redundancies and fully copy/constant
propagated the IL.

      NEXT_PASS (pass_sra);
      /* The dom pass will also resolve all __builtin_constant_p calls
         that are still there to 0.  This has to be done after some
         propagations have already run, but before some more dead code
         is removed, and this place fits nicely.  Remember this when
         trying to move or duplicate pass_dominator somewhere earlier.  */
      NEXT_PASS (pass_thread_jumps);
      NEXT_PASS (pass_dominator, true /* may_peel_loop_headers_p */);

This position, OTOH, doesn't make much sense, as IL cleanup is missing
after SRA and previous opts.  After loop we now have

We should look at this one closely. The backwards threader doesn't depend on IL cleanups. It should do its job regardless of the state of the IL.



      NEXT_PASS (pass_tracer);
      NEXT_PASS (pass_thread_jumps);
      NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
      NEXT_PASS (pass_strlen);
      NEXT_PASS (pass_thread_jumps);
      NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);

I don't think we want two threadings so close together.  It makes some sense
to have a threading _after_ DOM but before VRP (DOM cleaned up the IL).

That one is, IMHO, the least useful. I haven't done any significant analysis of this specific instance though to be sure. The step you saw was meant to largely preserve behavior. Further cleanups are definitely possible.

The most common case I've seen where DOM/VRP make transformations that then expose something useful to the backwards threader comes from those pesky context-sensitive equivalences.
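
One plausible shape of such a case, sketched by hand at the source level (the function names are made up, and this is an assumed example rather than one taken from the statistics discussed here): on the true arm of `x == y` the two names are equivalent, so `x - y` folds to 0 there, which in turn puts a constant into the merge that feeds the second test.

```c
#include <assert.h>

/* Before: the second test looks opaque until the equivalence
   x == y is applied inside the first arm.  */
int equiv_original(int x, int y)
{
    int t;
    if (x == y)
        t = x - y;      /* on this arm only, x is equivalent to y, so t is 0 */
    else
        t = 1;

    if (t == 0)         /* decidable per incoming path once t is constant */
        return 100;
    return 200;
}

/* After (hand-written): both incoming paths have a known outcome,
   so each one goes directly to its destination.  */
int equiv_threaded(int x, int y)
{
    if (x == y)
        return 100;
    return 200;
}

/* Checks that both shapes agree on a small grid of inputs.  */
void check_equiv_threading(void)
{
    for (int x = -3; x <= 3; x++)
        for (int y = -3; y <= 3; y++)
            assert(equiv_original(x, y) == equiv_threaded(x, y));
}
```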

We (primarily Andrew, but Aldy and myself are also involved) are looking at ways to more generally expose range information created for these situations. Exposing range information and getting it more precise by allowing "unnecessary copies" or some such would eliminate those cases where DOM/VRP expose new opportunities for the backwards jump threader.





So that would leave two from your four passes and expose the opportunity
to re-add one during early-opts, also after FRE.  That one should be
throttled down to operate in "-Os" mode though.

I'll take a look at them, but with some personal stuff and PTO it'll likely be a few weeks before I've got anything useful.


So, can you see what removing the two threading passes that don't make
sense to me does to your statistics?  And check whether a -Os-like threading
can be done early?

Interesting you should mention doing threading early -- that was one of the secondary motivations behind getting the backwards threading bits out into their own pass, I just failed to mention it.

Essentially we want to limit the backwards substitution to a single step within a single block for that case (which is trivially easy). That would allow us to run a very cheap threader during early optimizations.
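
To make the "single step within a single block" restriction concrete, here is a toy model of one possible reading of it. Everything in it (the `toy_phi` and `toy_block` structs, `thread_edge`, `count_threadable_edges`) is hypothetical and bears no relation to GCC's actual data structures or to the real pass. The lookup consults only a PHI in the same block as the condition and never walks into predecessors or follows chains of definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy IR, not GCC's data structures.  */
enum { MAX_PREDS = 4 };

struct toy_phi {
    int result;                 /* name defined by the PHI */
    int nargs;                  /* one argument per incoming edge */
    int is_const[MAX_PREDS];    /* nonzero if that argument is a constant */
    int value[MAX_PREDS];       /* the constant, when is_const is set */
};

struct toy_block {
    const struct toy_phi *phi;  /* at most one PHI in this toy model */
    int cond_name;              /* block ends with: if (cond_name == cond_const) */
    int cond_const;
};

enum toy_outcome { TOY_UNKNOWN = -1, TOY_TAKES_FALSE = 0, TOY_TAKES_TRUE = 1 };

/* Single-step lookup: the condition operand must be defined by a PHI in
   this very block, and the argument for 'edge' must already be a constant.
   Anything else is reported as unknown; no further searching is done.  */
enum toy_outcome thread_edge(const struct toy_block *bb, int edge)
{
    const struct toy_phi *phi = bb->phi;

    if (phi == NULL || phi->result != bb->cond_name)
        return TOY_UNKNOWN;
    assert(phi->nargs <= MAX_PREDS);
    if (edge < 0 || edge >= phi->nargs || !phi->is_const[edge])
        return TOY_UNKNOWN;
    return phi->value[edge] == bb->cond_const ? TOY_TAKES_TRUE
                                              : TOY_TAKES_FALSE;
}

/* Counts incoming edges whose branch outcome is known after one step.  */
int count_threadable_edges(const struct toy_block *bb)
{
    int n = 0;

    if (bb->phi == NULL)
        return 0;
    for (int e = 0; e < bb->phi->nargs; e++)
        if (thread_edge(bb, e) != TOY_UNKNOWN)
            n++;
    return n;
}
```

The cost per block is bounded by the number of incoming edges, which is the property that would make something like this cheap enough for early optimizations.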

Jeff
