On Thu, Sep 22, 2011 at 12:58:51AM +0930, Alan Modra wrote:
> I spent a little time today looking at why shrink wrap is failing to
> help on PowerPC, and it turns out that the optimization simply doesn't
> trigger that often due to prologue clobbered regs.  PowerPC uses r0 as
> a temp in the prologue to save LR to the stack, and unfortunately r0
> seems to often be live across the candidate edge chosen for
> shrink-wrapping, ie. where the prologue will be inserted.  I suppose
> it's no surprise that r0 is often live; rs6000.h:REG_ALLOC_ORDER makes
> r0 the first gpr to be used.
> 
> As a quick hack, I'm going to try a different REG_ALLOC_ORDER but I
> suspect the real fix will require register renaming in the prologue.

Hi Bernd,
Rearranging the rs6000 register allocation order did in fact make many
more shrink-wrap opportunities available.  So did your
http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01499.html patch.  The two
together worked so well that gcc won't bootstrap now.

The problem is that shrink wrapping followed by basic block reordering
breaks dwarf unwind info, triggering "internal compiler error: in
maybe_record_trace_start at dwarf2cfi.c:2243".  From your emails on
the list, I gather you've seen this yourself.

The bootstrap breakage happens on libmudflap/mf-hooks1.c, compiling
__wrap_malloc.  Eliding some detail, this function starts off as

void *__wrap_malloc (size_t c)
{
  if (__mf_starting_p)
    return __real_malloc (c);

The "if" is bb2, the sibling call bb3, and shrink wrap rather nicely
puts the prologue for the rest of the function in bb4.  A great
example of shrink wrap doing as it should, if you ignore the fact that
optimizing for startup isn't so clever.  However, bb-reorder inverts
the "if" and moves the sibling call past other blocks in the function.
That's wrong, because the dwarf unwind info for the prologue is not
applicable for the sibling call block:  The prologue hasn't been
executed for that block.  (The unwinder sequentially executes all
unwind opcodes from the start of the function to find the unwind state
at any instruction address.)  Exactly the same sort of problem is
generated by your "unconverted_simple_returns" code.
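
To make the sequential-execution point concrete, here is a toy sketch
of an unwinder walking a function's CFI.  It is not gcc or libgcc
code.  The four opcodes only mirror DW_CFA_advance_loc,
DW_CFA_def_cfa_offset, DW_CFA_remember_state and DW_CFA_restore_state;
the addresses, the 32-byte frame, and the names cfa_offset_at,
naive_cfi and fixed_cfi are all invented for illustration.  The
assumed layout after bb-reorder is bb2 at 0..8, the bb4 prologue and
body at 8..40, and the sibling call bb3 at 40..48.  Epilogue CFI is
left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>

/* Toy subset of the DWARF call-frame opcodes (hypothetical names).  */
enum cfi_op { ADVANCE_LOC, DEF_CFA_OFFSET, REMEMBER_STATE, RESTORE_STATE };

struct cfi_insn
{
  enum cfi_op op;
  int arg;   /* Bytes to advance, or new CFA offset; else unused.  */
};

/* Return the CFA offset a sequential unwinder computes for PC (a byte
   offset from the function start) by running the ops in order.  */
static int
cfa_offset_at (const struct cfi_insn *prog, size_t n, int pc)
{
  int loc = 0, cfa = 0, saved[8], depth = 0;

  for (size_t i = 0; i < n; i++)
    switch (prog[i].op)
      {
      case ADVANCE_LOC:
        /* Ops after this point describe addresses beyond PC.  */
        if (loc + prog[i].arg > pc)
          return cfa;
        loc += prog[i].arg;
        break;
      case DEF_CFA_OFFSET:
        cfa = prog[i].arg;
        break;
      case REMEMBER_STATE:
        if (depth < 8)
          saved[depth++] = cfa;
        break;
      case RESTORE_STATE:
        if (depth > 0)
          cfa = saved[--depth];
        break;
      }
  return cfa;
}

/* CFI as emitted for the original block order: nothing tells the
   unwinder that bb3 (now at 40..48) runs without a frame.  */
static const struct cfi_insn naive_cfi[] = {
  { ADVANCE_LOC, 12 },      /* bb2, then the frame-allocating insn.  */
  { DEF_CFA_OFFSET, 32 },   /* Prologue has allocated 32 bytes.  */
};

/* One possible repair: save the no-frame state before the prologue
   and restore it at the start of the moved bb3.  */
static const struct cfi_insn fixed_cfi[] = {
  { REMEMBER_STATE, 0 },
  { ADVANCE_LOC, 12 },
  { DEF_CFA_OFFSET, 32 },
  { ADVANCE_LOC, 28 },      /* Rest of bb4, up to address 40.  */
  { RESTORE_STATE, 0 },     /* bb3: prologue never executed.  */
};
```

With naive_cfi, asking for address 44 (inside the moved sibling call
block) yields a CFA offset of 32 even though no frame exists there;
with fixed_cfi the same query yields 0, while address 20 inside bb4
still yields 32.  Whether remember/restore pairs are the right fix
for gcc here is an open question, not something this sketch settles.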

What should I do here?  bb-reorder could be disabled for these blocks,
but that won't help unconverted_simple_returns.  I'm willing to spend
some time fixing this, but don't want to start if you already have
partial or full solutions.  Another thing I'd like to work on is
stopping ifcvt transformations from killing shrink wrap opportunities.
We have one in CPU2006 povray Ray_In_Bound that ought to give 5% (a
figure from shrink wrapping by hand), but that function currently
only gets shrink wrapped with -fno-if-conversion.

-- 
Alan Modra
Australia Development Lab, IBM
