split_live_ranges_for_shrink_wrap and !SHRINK_WRAPPING_ENABLED?

Senthil Kumar Selvaraj Mon, 18 Apr 2016 01:54:03 -0700

Hi,

  While tracking down a performance regression for the AVR target from
  4.9.x to trunk, I noticed that failing the SHRINK_WRAPPING_ENABLED
  check in ira.c led to noticeably worse code for the following
  case. That check prevents live range splitting of pseudos containing
  formal args, and between 4.9.x and now, the check was modified from
  just flag_shrink_wrap to now flag_shrink_wrap && targetm.have_simple_return.


#include <stdint.h>

extern int bar(uint32_t, uint16_t);
extern int baz(void);

int foo(uint8_t x, uint32_t y, uint16_t z)
{
   uint8_t status;
   switch(x)
   {
      case 0:
        status = bar(y, z); 
        if (status == 0)
           status = baz();
        break;

   }
   return status;
}

If the live range splitting of pseudos containing formal args is not
done in IRA, then reload uses callee-saved registers (SI r12), resulting in
extra saves and restores in the prologue/epilogue.

<snip>
(insn 3 2 5 2 (set (reg/v:SI 12 r12 [orig:47 y ] [47])
        (reg:SI 20 r20 [ y ])) ../ctrl_access.c:7 95 {*movsi}
     (nil))
(note 5 3 8 2 NOTE_INSN_FUNCTION_BEG)
(insn 8 5 9 2 (set (cc0)
        (compare (reg:QI 24 r24 [ x ])
            (const_int 0 [0]))) ../ctrl_access.c:9 404 {*cmpqi}
     (nil))
(jump_insn 9 8 13 2 (set (pc)
        (if_then_else (ne (cc0)
                (const_int 0 [0]))
            (label_ref:HI 25)
            (pc))) ../ctrl_access.c:9 428 {branch}
     (int_list:REG_BR_PROB 6102 (nil))
 -> 25)
(note 13 9 14 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
(insn 14 13 15 3 (set (reg:HI 20 r20)
        (reg/v:HI 18 r18 [orig:48 z ] [48])) ../ctrl_access.c:12 83 {*movhi}
     (nil))
(insn 15 14 16 3 (set (reg:SI 22 r22)
        (reg/v:SI 12 r12 [orig:47 y ] [47])) ../ctrl_access.c:12 95 {*movsi}
     (nil))
</snip>

leading to code like
<snip>
push r12
push r13
push r14
push r15
/* prologue: function */
/* frame size = 0 */
/* stack size = 4 */
.L__stack_usage = 4
movw r12,r20
movw r14,r22
cpse r24,__zero_reg__
rjmp .L2
movw r20,r18
movw r24,r14
movw r22,r12
call bar
</snip>

Whereas in 4.9.2, live range splitting allows reload to generate far
better RTL, like so

<snip>
(insn 8 5 9 2 (set (cc0)
        (compare (reg:QI 24 r24 [ x ])
            (const_int 0 [0]))) ../ctrl_access.c:9 404 {*cmpqi}
     (nil))
(jump_insn 9 8 13 2 (set (pc)
        (if_then_else (ne (cc0)
                (const_int 0 [0]))
            (label_ref:HI 25)
            (pc))) ../ctrl_access.c:9 428 {branch}
     (int_list:REG_BR_PROB 6102 (nil))
 -> 25)
(note 13 9 42 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
(insn 42 13 14 3 (set (reg/v:SI 22 r22 [orig:48 y ] [48])
        (reg/v:SI 20 r20 [orig:48 y ] [48])) 95 {*movsi}
     (nil))
(insn 14 42 16 3 (set (reg:HI 20 r20)
        (reg/v:HI 18 r18 [orig:49 z ] [49])) ../ctrl_access.c:12 83 {*movhi}
     (nil))
</snip>

leading to code like
<snip>
cpse r24,__zero_reg__
rjmp .L2
movw r24,r22
movw r22,r20
movw r20,r18
call bar
</snip>

In either case, shrink wrapping isn't done in the end, but the live
range splitting ends up helping reload, AFAICT.

I verified that adding simple_return pattern to avr.md fixes the
regression, but the code size increase if shrink wrapping
really does happen would badly hurt a flash constrained target like the
AVR. I also noticed that targets like sh and arm (for thumb) disable it when
optimizing for size (using the condition part in simple_return pattern)
- understandable.

What do you guys think is the right way to deal with this? Should I look
into why range splitting helps reload and have reload itself handle this?

Regards
Senthil

split_live_ranges_for_shrink_wrap and !SHRINK_WRAPPING_ENABLED?

Reply via email to