2016-04-18 11:53 GMT+03:00 Senthil Kumar Selvaraj <senthil_kumar.selva...@atmel.com>: > Hi, > > While tracking down a performance regression for the AVR target from > 4.9.x to trunk, I noticed that failing the SHRINK_WRAPPING_ENABLED > check in ira.c led to noticeably worse code for the following > case. That check prevents live range splitting of pseudos containing > formal args, and between 4.9.x and now, the check was modified from > just flag_shrink_wrap to now flag_shrink_wrap && targetm.have_simple_return. > > #include <stdint.h> > > extern int bar(uint32_t, uint16_t); > extern int baz(void); > > int foo(uint8_t x, uint32_t y, uint16_t z) > { > uint8_t status; > switch(x) > { > case 0: > status = bar(y, z); > if (status == 0) > status = baz(); > break;01 > > } > return status; > } > > If the live range splitting of pseudos containing formal args is not > done in IRA, then reload uses callee-saved registers (SI r12), resulting in > extra saves and restores in the prologue/epilogue. > > <snip> > (insn 3 2 5 2 (set (reg/v:SI 12 r12 [orig:47 y ] [47]) > (reg:SI 20 r20 [ y ])) ../ctrl_access.c:7 95 {*movsi} > (nil)) > (note 5 3 8 2 NOTE_INSN_FUNCTION_BEG) > (insn 8 5 9 2 (set (cc0) > (compare (reg:QI 24 r24 [ x ]) > (const_int 0 [0]))) ../ctrl_access.c:9 404 {*cmpqi} > (nil)) > (jump_insn 9 8 13 2 (set (pc) > (if_then_else (ne (cc0) > (const_int 0 [0])) > (label_ref:HI 25) > (pc))) ../ctrl_access.c:9 428 {branch} > (int_list:REG_BR_PROB 6102 (nil)) > -> 25) > (note 13 9 14 3 [bb 3] NOTE_INSN_BASIC_BLOCK) > (insn 14 13 15 3 (set (reg:HI 20 r20) > (reg/v:HI 18 r18 [orig:48 z ] [48])) ../ctrl_access.c:12 83 {*movhi} > (nil)) > (insn 15 14 16 3 (set (reg:SI 22 r22) > (reg/v:SI 12 r12 [orig:47 y ] [47])) ../ctrl_access.c:12 95 {*movsi} > (nil)) > </snip> > > leading to code like > <snip> > push r12 > push r13 > push r14 > push r15 > /* prologue: function */ > /* frame size = 0 */ > /* stack size = 4 */ > .L__stack_usage = 4 > movw r12,r20 > movw r14,r22 > cpse r24,__zero_reg__ > rjmp .L2 > movw r20,r18 > movw r24,r14 > movw r22,r12 > call bar > </snip> > > Whereas in 4.9.2, live range splitting allows reload to generate far > better RTL, like so > > <snip> > (insn 8 5 9 2 (set (cc0) > (compare (reg:QI 24 r24 [ x ]) > (const_int 0 [0]))) ../ctrl_access.c:9 404 {*cmpqi} > (nil)) > (jump_insn 9 8 13 2 (set (pc) > (if_then_else (ne (cc0) > (const_int 0 [0])) > (label_ref:HI 25) > (pc))) ../ctrl_access.c:9 428 {branch} > (int_list:REG_BR_PROB 6102 (nil)) > -> 25) > (note 13 9 42 3 [bb 3] NOTE_INSN_BASIC_BLOCK) > (insn 42 13 14 3 (set (reg/v:SI 22 r22 [orig:48 y ] [48]) > (reg/v:SI 20 r20 [orig:48 y ] [48])) 95 {*movsi} > (nil)) > (insn 14 42 16 3 (set (reg:HI 20 r20) > (reg/v:HI 18 r18 [orig:49 z ] [49])) ../ctrl_access.c:12 83 {*movhi} > (nil)) > </snip> > > leading to code like > <snip> > cpse r24,__zero_reg__ > rjmp .L2 > movw r24,r22 > movw r22,r20 > movw r20,r18 > call bar > </snip> > > In either case, shrink wrapping isn't done in the end, but the live > range splitting ends up helping reload, AFAICT. > > I verified that adding simple_return pattern to avr.md fixes the > regression, but the code size increase if shrink wrapping > really does happen would badly hurt a flash constrained target like the > AVR. I also noticed that targets like sh and arm (for thumb) disable it when > optimizing for size (using the condition part in simple_return pattern) > - understandable. > > What do you guys think is the right way to deal with this? Should I look > into why range splitting helps reload and have reload itself handle this?
It's a place for experiments. Generally, live range splitting must be very useful for targets with constrained register classes and for targets with small amount of registers. Denis.