On 08/30/2016 06:31 AM, Michael Matz wrote:
Hi,

On Fri, 26 Aug 2016, Bernd Schmidt wrote:

And that comment puzzles me. Surely prologue and epilogue are executed only
once currently, so how does frequency come into it? Again - please provide an
example.

int some_global;
int foo (void) {
  if (!some_global) {
    call_this();
    call_that();
    x = some + large / computation;
    while (x--) { much_more_code(); }
    some_global = 1;
  }
  return some_global;
}

Now you need the full pro/epilogue (saving/restoring all call clobbered
regs presumably used by the large computation and loop) for only one call
of that function (the first), and then never again.  But you do need a
part of it always, assuming the architecture is right and this is a shared
library: namely the setup of the PIC pointer for the access to
some_global.


A prologue does many things, and in some cases many of them aren't
necessary for all calls (indeed that's often the case for functions
containing early-outs), so much so that the full prologue including
unnecessary parts needs more time than the functional body of a particular
call of the function.  It's obvious that it would be better to move those parts
of the prologue to places where they actually are needed:
[ ... ]
Right. Essentially Segher's patch introduces the concept of prologue components that are independent of each other and which can be shrink-wrapped separately. The degree of independence is highly target specific, of course.

I think one of the questions (and I haven't looked through the whole thread yet to see if it's answered) is why the basic shrink-wrapping algorithm can't be applied to each of the prologue components -- though you may have answered it with your loop example below.
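
To make the idea of independent components concrete, here is a toy C model of the foo() example above. It is only a sketch: the component names, the counters and the ensure() helper are all hypothetical, and it says nothing about how the patch itself decides placement. A compiler makes that decision statically; the run-time bookkeeping here exists only so the executions can be counted.

```c
/* Hypothetical prologue components; real ones are target specific. */
enum {
    COMP_PIC_REG   = 1u << 0,  /* set up the PIC pointer */
    COMP_REG_SAVES = 1u << 1   /* save/restore registers used by the large body */
};

static int some_global;
static unsigned long pic_setups;  /* times the PIC component ran */
static unsigned long reg_saves;   /* times the register-save component ran */

/* Run whichever components this path needs and has not yet set up. */
static unsigned ensure(unsigned active, unsigned needed)
{
    unsigned missing = needed & ~active;
    if (missing & COMP_PIC_REG)
        pic_setups++;
    if (missing & COMP_REG_SAVES)
        reg_saves++;
    return active | missing;
}

static int foo_model(void)
{
    unsigned active = 0;

    /* Every path reads some_global, so every call needs the PIC pointer. */
    active = ensure(active, COMP_PIC_REG);
    if (!some_global) {
        /* Only the cold first-call path needs the register saves. */
        active = ensure(active, COMP_REG_SAVES);
        some_global = 1;  /* stands in for the large computation and loop */
    }
    (void)active;
    return some_global;
}
```

Calling foo_model() five times from a fresh start leaves pic_setups at 5 and reg_saves at 1, which is the split described for foo(): one component is needed on every call, the other only on the first.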



int f2 (void) {
  setup_pic_reg();
  if (!some_global) {
    save_call_clobbers();
    code_from_above();
    restore_call_clobbers();
  }
  retreg = some_global;
  restore_pic_reg();
}

That includes moving parts of the prologue into loops:

int g() {
  int sum = 0;
  for (int i = 0; i < NUM; i++) {
    sum += i;
    if (sum >= CUTOFF) {
      some_long_winded_expression();
      that_eventually_calls_abort();
    }
  }
  return sum;
}

Here all parts of the prologue that somehow make it possible to call other
functions are necessary only when the program will abort eventually: hence
they are necessary at one call of g() at most.  Again it's sensible to move
those parts inside the unlikely condition, even though it's inside a loop.
Thanks. I'd been wondering about when it'd be useful to push prologue code into a loop nest when I saw the patches fly by, but didn't think about it too much. I haven't looked at the shrink-wrapping literature in years, but I don't recall it covering any cases where sinking into a loop nest was profitable.
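
A rough way to see the loop case in numbers is to count how often the call-enabling part of the prologue would run under each placement. The sketch below is a model of g() under stated assumptions, not compiler output: NUM and CUTOFF are made-up values chosen so the cutoff is never reached, the two counters are hypothetical, and an early return stands in for the path that ends in abort().

```c
enum { NUM = 100, CUTOFF = 1000000 };  /* hypothetical values */

static long at_entry;      /* runs of the component if placed at function entry */
static long in_cold_path;  /* runs of the component if sunk into the branch */

static int g_model(void)
{
    int sum = 0;

    at_entry++;  /* placement 1: paid once per call, needed or not */
    for (int i = 0; i < NUM; i++) {
        sum += i;
        if (sum >= CUTOFF) {
            in_cold_path++;  /* placement 2: paid only on the path that calls */
            return -1;       /* stands in for the path that ends in abort() */
        }
    }
    return sum;
}
```

With these values each call returns 4950 without entering the branch, so 1000 calls leave at_entry at 1000 and in_cold_path at 0. The branch sits inside the loop, yet it is entered at most once per call because control never comes back from it.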

Jeff
