On 08/30/2016 06:31 AM, Michael Matz wrote:
Hi,
On Fri, 26 Aug 2016, Bernd Schmidt wrote:
And that comment puzzles me. Surely prologue and epilogue are executed only
once currently, so how does frequency come into it? Again - please provide an
example.
int some_global;
int foo (void) {
  if (!some_global) {
    call_this();
    call_that();
    int x = some + large / computation;
    while (x--) { much_more_code(); }
    some_global = 1;
  }
  return some_global;
}
Now you need the full pro/epilogue (saving/restoring all call-clobbered
regs presumably used by the large computation and loop) for only one call
of that function (the first), and then never again. But part of it is
always needed, assuming the right architecture and that this is a shared
library: namely the setup of the PIC pointer for the access to
some_global.
A prologue does many things, and in some cases many of them aren't
necessary for all calls (indeed that's often the case for functions
containing early-outs), so much so that the full prologue, including the
unnecessary parts, needs more time than the functional body of a
particular call of the function. It's obvious that it would be better to
move those parts of the prologue to places where they actually are needed:
[ ... ]
Right. Essentially Segher's patch introduces the concept of prologue
components that are independent of each other and which can be
shrink-wrapped separately. The degree of independence is highly target
specific, of course.
I think one of the questions (and I haven't looked through the whole
thread yet to see if it's answered) is why the basic shrink-wrapping
algorithm can't be applied to each of the prologue components -- though
you may have answered it with your loop example below.
int f2 (void) {
  setup_pic_reg();
  if (!some_global) {
    save_call_clobbers();
    code_from_above();
    restore_call_clobbers();
  }
  retreg = some_global;
  restore_pic_reg();
}
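
As a rough illustration of the difference between foo and f2, the sketch
below counts how often each prologue component would run over many calls.
The struct cost counters, the run() helper and both function names are
made up for this example and are not GCC code; the large computation is
elided.

```c
/* Hypothetical counters standing in for the dynamic cost of two
   prologue components.  Nothing here is a GCC interface. */
struct cost {
  long pic_setups;  /* PIC register setup, needed on every path */
  long reg_saves;   /* register save/restore, needed only on the cold path */
};

int some_global;

/* Whole prologue at function entry: every call pays for both components. */
int foo_full_prologue (struct cost *c)
{
  c->pic_setups++;
  c->reg_saves++;            /* paid even when the cold path is skipped */
  if (!some_global) {
    /* ... large computation and loop elided ... */
    some_global = 1;
  }
  return some_global;
}

/* Components placed separately: the register save moves into the branch. */
int foo_separate (struct cost *c)
{
  c->pic_setups++;           /* still needed to read some_global */
  if (!some_global) {
    c->reg_saves++;          /* paid only on the first call */
    /* ... large computation and loop elided ... */
    some_global = 1;
  }
  return some_global;
}

/* Reset some_global, call fn `calls` times and return the counters. */
struct cost run (int (*fn) (struct cost *), long calls)
{
  struct cost c = { 0, 0 };
  some_global = 0;
  for (long i = 0; i < calls; i++)
    fn (&c);
  return c;
}
```

With run (foo_full_prologue, 1000) both counters end at 1000; with
run (foo_separate, 1000) the PIC setup still runs 1000 times, but the
register save runs once.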
That includes moving parts of the prologue into loops:
int g() {
  int sum = 0;
  for (int i = 0; i < NUM; i++) {
    sum += i;
    if (sum >= CUTOFF) {
      some_long_winded_expression();
      that_eventually_calls_abort();
    }
  }
  return sum;
}
Here all parts of the prologue that somehow make it possible to call other
functions are necessary only when the program will eventually abort: hence
they are necessary for at most one call of g(). Again it's sensible to move
those parts inside the unlikely condition, even though it's inside a loop.
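
To make the role of frequency concrete, here is a minimal cost-model
sketch. It assumes each basic block carries an estimated execution
frequency relative to one entry of the function, and it compares placing a
single prologue component at function entry against placing it directly
around the blocks that need it. The struct block type and the function
names are hypothetical, and the model deliberately ignores the correctness
constraints a real placement has to satisfy (the component must be in
place on every path reaching a use, and must not be set up twice).

```c
/* Hypothetical description of one basic block for this sketch. */
struct block {
  double freq;           /* estimated executions per call of the function */
  int needs_component;   /* nonzero if the block needs the component */
};

/* Cost of placing the component at entry: paid once per call. */
double cost_at_entry (void)
{
  return 1.0;
}

/* Cost of placing the component around each block that needs it:
   paid every time such a block executes. */
double cost_at_uses (const struct block *blocks, int n)
{
  double cost = 0.0;
  for (int i = 0; i < n; i++)
    if (blocks[i].needs_component)
      cost += blocks[i].freq;
  return cost;
}

/* Sinking pays off when the needing blocks run less often than the
   function is entered, whether or not they sit inside a loop. */
int sink_is_profitable (const struct block *blocks, int n)
{
  return cost_at_uses (blocks, n) < cost_at_entry ();
}
```

For a function shaped like g(), with a loop body at frequency 100 that
does not need the component and an abort path at frequency 0.001 that
does, sink_is_profitable() returns 1. If the loop body itself needed the
component, the cost at the uses would be 100 and it would return 0.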
Thanks. I'd been wondering about when it'd be useful to push prologue
code into a loop nest when I saw the patches fly by, but didn't think
about it too much. I haven't looked at the shrink-wrapping literature
in years, but I don't recall it having any concept that there were cases
where sinking into a loop nest was profitable.
Jeff