On Thu, Sep 22, 2011 at 12:58:51AM +0930, Alan Modra wrote:
> I spent a little time today looking at why shrink wrap is failing to
> help on PowerPC, and it turns out that the optimization simply doesn't
> trigger that often due to prologue clobbered regs. PowerPC uses r0 as
> a temp in the prologue to save LR to the stack, and unfortunately r0
> seems to often be live across the candidate edge chosen for
> shrink-wrapping, ie. where the prologue will be inserted. I suppose
> it's no surprise that r0 is often live; rs6000.h:REG_ALLOC_ORDER makes
> r0 the first gpr to be used.
>
> As a quick hack, I'm going to try a different REG_ALLOC_ORDER but I
> suspect the real fix will require register renaming in the prologue.
Hi Bernd,

Rearranging the rs6000 register allocation order did in fact help a lot in making more opportunities available for shrink-wrapping. So did your patch at http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01499.html. The two together worked so well that gcc no longer bootstraps.

The problem is that shrink-wrapping followed by basic block reordering breaks the DWARF unwind info, triggering "internal compiler error: in maybe_record_trace_start at dwarf2cfi.c:2243". From your emails on the list, I gather you've seen this yourself.

The bootstrap breakage happens in libmudflap/mf-hooks1.c, compiling __wrap_malloc. Eliding some detail, this function starts off as

  void *
  __wrap_malloc (size_t c)
  {
    if (__mf_starting_p)
      return __real_malloc (c);

The "if" is bb2 and the sibling call bb3, and shrink-wrap rather nicely puts the prologue for the rest of the function in bb4. That's a great example of shrink-wrap doing what it should, if you ignore the fact that optimizing for startup isn't so clever. However, bb-reorder inverts the "if" and moves the sibling call past other blocks in the function. That's wrong, because the DWARF unwind info for the prologue does not apply to the sibling call block: the prologue hasn't been executed on the path to that block. (The unwinder executes all unwind opcodes sequentially from the start of the function to find the unwind state at any instruction address, so once the sibling call is placed after the prologue it picks up the prologue's opcodes even though control never passed through them.) Exactly the same sort of problem is generated by your "unconverted_simple_returns" code.

What should I do here? bb-reorder could be disabled for these blocks, but that won't help unconverted_simple_returns. I'm willing to spend some time fixing this, but don't want to start if you already have a partial or full solution.

Another thing I'd like to work on is stopping ifcvt transformations from killing shrink-wrap opportunities. We have one such case in CPU2006 povray, in Ray_In_Bound, that ought to give 5% (a figure from shrink-wrapping by hand), but currently it only gets shrink-wrapped with -fno-if-conversion.
-- 
Alan Modra
Australia Development Lab, IBM