Hi, While tracking down a performance regression for the AVR target from 4.9.x to trunk, I noticed that failing the SHRINK_WRAPPING_ENABLED check in ira.c led to noticeably worse code for the following case. That check prevents live range splitting of pseudos containing formal args, and between 4.9.x and now, the check was modified from just flag_shrink_wrap to now flag_shrink_wrap && targetm.have_simple_return.
#include <stdint.h> extern int bar(uint32_t, uint16_t); extern int baz(void); int foo(uint8_t x, uint32_t y, uint16_t z) { uint8_t status; switch(x) { case 0: status = bar(y, z); if (status == 0) status = baz(); break; } return status; } If the live range splitting of pseudos containing formal args is not done in IRA, then reload uses callee-saved registers (SI r12), resulting in extra saves and restores in the prologue/epilogue. <snip> (insn 3 2 5 2 (set (reg/v:SI 12 r12 [orig:47 y ] [47]) (reg:SI 20 r20 [ y ])) ../ctrl_access.c:7 95 {*movsi} (nil)) (note 5 3 8 2 NOTE_INSN_FUNCTION_BEG) (insn 8 5 9 2 (set (cc0) (compare (reg:QI 24 r24 [ x ]) (const_int 0 [0]))) ../ctrl_access.c:9 404 {*cmpqi} (nil)) (jump_insn 9 8 13 2 (set (pc) (if_then_else (ne (cc0) (const_int 0 [0])) (label_ref:HI 25) (pc))) ../ctrl_access.c:9 428 {branch} (int_list:REG_BR_PROB 6102 (nil)) -> 25) (note 13 9 14 3 [bb 3] NOTE_INSN_BASIC_BLOCK) (insn 14 13 15 3 (set (reg:HI 20 r20) (reg/v:HI 18 r18 [orig:48 z ] [48])) ../ctrl_access.c:12 83 {*movhi} (nil)) (insn 15 14 16 3 (set (reg:SI 22 r22) (reg/v:SI 12 r12 [orig:47 y ] [47])) ../ctrl_access.c:12 95 {*movsi} (nil)) </snip> leading to code like <snip> push r12 push r13 push r14 push r15 /* prologue: function */ /* frame size = 0 */ /* stack size = 4 */ .L__stack_usage = 4 movw r12,r20 movw r14,r22 cpse r24,__zero_reg__ rjmp .L2 movw r20,r18 movw r24,r14 movw r22,r12 call bar </snip> Whereas in 4.9.2, live range splitting allows reload to generate far better RTL, like so <snip> (insn 8 5 9 2 (set (cc0) (compare (reg:QI 24 r24 [ x ]) (const_int 0 [0]))) ../ctrl_access.c:9 404 {*cmpqi} (nil)) (jump_insn 9 8 13 2 (set (pc) (if_then_else (ne (cc0) (const_int 0 [0])) (label_ref:HI 25) (pc))) ../ctrl_access.c:9 428 {branch} (int_list:REG_BR_PROB 6102 (nil)) -> 25) (note 13 9 42 3 [bb 3] NOTE_INSN_BASIC_BLOCK) (insn 42 13 14 3 (set (reg/v:SI 22 r22 [orig:48 y ] [48]) (reg/v:SI 20 r20 [orig:48 y ] [48])) 95 {*movsi} (nil)) (insn 14 42 16 3 (set (reg:HI 20 r20) (reg/v:HI 18 r18 [orig:49 z ] [49])) ../ctrl_access.c:12 83 {*movhi} (nil)) </snip> leading to code like <snip> cpse r24,__zero_reg__ rjmp .L2 movw r24,r22 movw r22,r20 movw r20,r18 call bar </snip> In either case, shrink wrapping isn't done in the end, but the live range splitting ends up helping reload, AFAICT. I verified that adding simple_return pattern to avr.md fixes the regression, but the code size increase if shrink wrapping really does happen would badly hurt a flash constrained target like the AVR. I also noticed that targets like sh and arm (for thumb) disable it when optimizing for size (using the condition part in simple_return pattern) - understandable. What do you guys think is the right way to deal with this? Should I look into why range splitting helps reload and have reload itself handle this? Regards Senthil