Hi Revital,

Revital Eres <revital.e...@linaro.org> writes:
> The attached patch adds register pressure estimation of the partial schedule.

My main comment is that we shouldn't need to track separate liveness
sets for each loop here, since we're only looking at one basic block.
I.e., rather than operate on the per-loop LOOP_DATA (loop)->regs_{ref,live},
we should be able to use a single pair of bitmaps.

Also, the code goes to a lot of trouble over this case:

+  /* Add to the set of out live regs all the registers defined in bb
+     which have uses outside of it (those registers where eliminated in
+     the above calculation).  Eliminate from this set the definitions
+     that exist in the epilog and with no uses inside the basic-block
+     as these definitions will be eliminated from the bb and thus should
+     not be considered for estimating register pressure in the bb.  */

But how often does it occur in practice?  It's not necessarily the case
that the instruction will be eliminated, because things like volatility
might require us to keep it.  It's probably more accurate to say that we
can treat these as unused defs.

There's an argument to say that we should only consider registers
that are used in the loop.  If the pressure is high because of
registers that are live across the loop but not used within it,
then it's reasonable to force code outside the loop to spill some
of those.  That would suggest starting with the intersection of
DF_LR_OUT (bb) and DF_LR_BB_INFO (bb)->use.  Starting with that set
also has the advantage of handling the above case for free.

(Values that are live across a loop but not used in it are common in
our friend the popular embedded benchmark, which typically has a single
function of the form:

  A: ...set up...
  B: for (i = 0; i < num_runs; i++)
  C:   ...benchmark...
  D: ...record time...

Some values are live from A->D, but those values shouldn't affect
an SMSable loop somewhere in C.)
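
To make the suggested starting set concrete, here is a minimal sketch
using plain 64-bit masks in place of GCC bitmaps.  The type and function
names (regset, initial_live_set) are hypothetical and only illustrate
the intersection; they are not part of the patch or of the df API.

```c
#include <stdint.h>

/* Toy model: bit N stands for register N.  This is a plain mask, not a
   GCC bitmap, and the names here are made up for illustration.  */
typedef uint64_t regset;

/* Registers to treat as live at the end of the block when estimating
   pressure: those that are live out of the block AND used inside it.
   Values that merely pass through the block drop out, and so do
   registers defined in the block whose only uses are outside it.  */
static regset
initial_live_set (regset lr_out, regset bb_use)
{
  return lr_out & bb_use;
}
```

For example, with lr_out = 0x7 (r0, r1, r2) and bb_use = 0xA (r1, r3),
the starting set is 0x2: r0 is only live across the block, r2 is defined
in the block but used only after it, and r3 is not live out, so only r1
remains.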

We talked earlier about making the main pressure-estimation code
process the loop twice, but I see instead you've gone for two
separate passes, one to calculate LR out, then the main pass.
I think with the changes above, running the same loop twice is
going to be easier and no less efficient.  We could even add
code to skip the second iteration if it would start with the
same lr_out as the first iteration.
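
As a rough sketch of what I mean by running the same loop twice, again
with plain masks rather than GCC bitmaps: the struct and function names
below are hypothetical, and the sketch assumes that whatever is live at
the top of the body after the first scan is also live at the bottom via
the back edge.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t regset;   /* bit N stands for register N (toy model) */

/* Hypothetical instruction record: registers defined and used.  */
struct toy_insn
{
  regset defs;
  regset uses;
};

static int
count_regs (regset x)
{
  int n = 0;
  for (; x != 0; x &= x - 1)
    n++;
  return n;
}

/* Scan the body backwards.  On entry *LIVE is the set live after the
   last instruction; on exit it is the set live before the first.
   Return the largest number of simultaneously live registers seen.  */
static int
scan_backwards (const struct toy_insn *insns, size_t n, regset *live)
{
  int max_pressure = count_regs (*live);
  for (size_t i = n; i-- > 0;)
    {
      /* Count a def at its own instruction even if it is never used.  */
      int p = count_regs (*live | insns[i].defs);
      if (p > max_pressure)
        max_pressure = p;
      *live = (*live & ~insns[i].defs) | insns[i].uses;
      p = count_regs (*live);
      if (p > max_pressure)
        max_pressure = p;
    }
  return max_pressure;
}

/* Run the scan once from START, then again with the live-in set folded
   into the starting set, skipping the second scan if nothing changed.
   *PASSES records how many scans were done.  */
static int
estimate_pressure (const struct toy_insn *insns, size_t n,
                   regset start, int *passes)
{
  regset live = start;
  int pressure = scan_backwards (insns, n, &live);
  *passes = 1;

  regset second_start = start | live;
  if (second_start != start)
    {
      live = second_start;
      pressure = scan_backwards (insns, n, &live);
      *passes = 2;
    }
  return pressure;
}
```

The point of the design is that both scans share one routine and one
live set, so there is no separate LR-out pass to keep in sync with the
main one.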

Richard
