Denis Chertykov wrote:
> 2011/10/18 Georg-Johann Lay <a...@gjlay.de>:
>> Denis Chertykov wrote:
>>> 2011/10/18 Georg-Johann Lay <a...@gjlay.de>:
>>>> This patch makes some tweaks to addhi3, such as adding a QI scratch
>>>> register.
>>>>
>>>> The original *addhi3 insn is still there and located prior to the new
>>>> addhi3_clobber insn because addhi3 is special to reload (thanks Denis
>>>> for this note), so that there is a version with and a version without
>>>> scratch register.
>>>>
>>>> Patch passes without regressions.
>>>>
>>> Which improvements are added by this patch?
>>>
>>> Denis.
>> If the addhi3 is expanded early, the addition happens with a QI scratch,
>> which avoids the reload of the constant if the target register is in NO_LD.
>> It also reduces register pressure, as only a QI register is needed and not
>> a reload of the constant to HI.
>>
>> Otherwise, there might be sequences like
>>
>>     ldi r31, 2     ; *reload_inhi
>>     mov r12, r31
>>     clr r13
>>
>>     add r14, r12   ; *addhi3
>>     adc r15, r13
>>
>> which now will be
>>
>>     ldi r31, 2     ; addhi3_clobber
>>     add r14, r31
>>     adc r15, __zero_reg__
>>
>> Similar applies if the reload of the constant happens to LD regs:
>>
>>     ldi r30, 2     ; *movhi
>>     clr r31
>>
>>     add r14, r30   ; *addhi3
>>     adc r15, r31
>>
>> will become
>>
>>     ldi r30, 2     ; addhi3_clobber
>>     add r14, r30
>>     adc r15, __zero_reg__
>>
>> For *addhi3 insns the register pressure is not reduced, but the insn
>> sequence might be smarter if peep2 comes up with a QI scratch or if it
>> detects a *reload_inhi insn just prior to the addition (and the reg that
>> holds the reloaded constant dies after the addition).
>>
>> As *addhi3 is special to reload, there is still an "ordinary" addhi insn
>> without scratch. This is easier because, e.g., prologue and epilogue
>> generation generate add insns (not by means of the addhi3 expander but by
>> explicit gen_rtx_PLUS). Yet the addhi3 expander factors out the situations
>> when an addhi3 insn is to be generated via the addhi3 expander late in the
>> compilation process.
>
> Please provide any real world example.
>
> Denis.

Consider avr-libc (under the assumption that it is "real world" code).
In avr-libc's build directory, and with the patch integrated:

    $ cd avr/lib/avr4
    $ make clean && make CFLAGS='-save-temps -dp -Os'
    $ grep -A 2 'addhi3_clobber\/2' *.s > out-nopeep2.txt
    $ grep 'addhi3_clobber\/2' *.s | wc -l
    33

(See the attachment for out-nopeep2.txt.)

This shows that the insns are already there before peep2, and thus no reload
of a 16-bit constant is needed; an 8-bit scratch is sufficient.

Alternatively, the implementation could omit the expansion to addhi3_clobber
in the addhi3 expander and instead rely completely on peep2. However, that
does not reduce register pressure, because a 16-bit register will still be
allocated; the peep2 just prints things smarter and needs just a QI scratch
to call avr_out_plus_clobber.

For +/-1, the addition with SEC/ADD/ADC or SEC/SBC/SBC leaves cc0 in a mess.
As most loops use +/-1 on the counter variable, LDI/SUB/SBC is not shorter
but better because it sets cc0.

So do you like this patch? Or would you prefer a patch that is neutral with
respect to the register allocator and just uses peep2 to print things
smarter?

Johann

dtoa_prf.s: ldi r31,3 ; , ; 338 addhi3_clobber/2 [length = 3]
dtoa_prf.s- add r12,r31 ; s,
dtoa_prf.s- adc r13,__zero_reg__ ; s
--
dtoa_prf.s: ldi r31,3 ; , ; 447 addhi3_clobber/2 [length = 3]
dtoa_prf.s- add r12,r31 ; s,
dtoa_prf.s- adc r13,__zero_reg__ ; s
--
fgets.s: ldi r31,1 ; , ; 70 addhi3_clobber/2 [length = 3]
fgets.s- sub r14,r31 ; ivtmp.9,
fgets.s- sbc r15,__zero_reg__ ; ivtmp.9
--
realloc.s: ldi r17,2 ; , ; 80 addhi3_clobber/2 [length = 3]
realloc.s- add r12,r17 ; tmp83,
realloc.s- adc r13,__zero_reg__ ;
--
realloc.s: ldi r18,2 ; , ; 85 addhi3_clobber/2 [length = 3]
realloc.s- add r12,r18 ; tmp84,
realloc.s- adc r13,__zero_reg__ ;
--
strtod.s: ldi r31,1 ; , ; 101 addhi3_clobber/2 [length = 3]
strtod.s- sub r14,r31 ; D.2581,
strtod.s- sbc r15,__zero_reg__ ; D.2581
--
strtod.s: ldi r18,2 ; , ; 110 addhi3_clobber/2 [length = 3]
strtod.s- add r14,r18 ; nptr,
strtod.s- adc r15,__zero_reg__ ; nptr
--
strtod.s: ldi r21,7 ; , ; 120 addhi3_clobber/2 [length = 3]
strtod.s- add r14,r21 ; nptr,
strtod.s- adc r15,__zero_reg__ ; nptr
--
strtod.s: ldi r31,255 ; , ; 175 addhi3_clobber/2 [length = 3]
strtod.s- sub r14,r31 ; exp,
strtod.s- sbc r15,r31 ; exp,
--
strtod.s: ldi r18,1 ; , ; 185 addhi3_clobber/2 [length = 3]
strtod.s- sub r14,r18 ; exp,
strtod.s- sbc r15,__zero_reg__ ; exp
--
strtod.s: ldi r31,24 ; , ; 376 addhi3_clobber/2 [length = 3]
strtod.s- sub r8,r31 ; D.2735,
strtod.s- sbc r9,__zero_reg__ ; D.2735
--
strtol.s: ldi r31,2 ; , ; 128 addhi3_clobber/2 [length = 3]
strtol.s- add r6,r31 ; nptr,
strtol.s- adc r7,__zero_reg__ ; nptr
--
strtol.s: ldi r31,1 ; , ; 242 addhi3_clobber/2 [length = 3]
strtol.s- sub r6,r31 ; tmp117,
strtol.s- sbc r7,__zero_reg__ ;
--
strtol.s: ldi r31,2 ; , ; 252 addhi3_clobber/2 [length = 3]
strtol.s- sub r6,r31 ; tmp119,
strtol.s- sbc r7,__zero_reg__ ;
--
strtoul.s: ldi r31,2 ; , ; 126 addhi3_clobber/2 [length = 3]
strtoul.s- add r14,r31 ; nptr,
strtoul.s- adc r15,__zero_reg__ ; nptr
--
strtoul.s: ldi r31,1 ; , ; 229 addhi3_clobber/2 [length = 3]
strtoul.s- sub r14,r31 ; tmp113,
strtoul.s- sbc r15,__zero_reg__ ;
--
strtoul.s: ldi r31,2 ; , ; 239 addhi3_clobber/2 [length = 3]
strtoul.s- sub r14,r31 ; tmp115,
strtoul.s- sbc r15,__zero_reg__ ;
--
vfprintf.s: ldi r24,4 ; , ; 399 addhi3_clobber/2 [length = 3]
vfprintf.s- add r4,r24 ; ap,
vfprintf.s- adc r5,__zero_reg__ ; ap
--
vfprintf.s: ldi r21,10 ; , ; 850 addhi3_clobber/2 [length = 3]
vfprintf.s- sub r10,r21 ; exp,
vfprintf.s- sbc r11,__zero_reg__ ; exp
--
vfprintf.s: ldi r30,2 ; , ; 882 addhi3_clobber/2 [length = 3]
vfprintf.s- add r4,r30 ; ap,
vfprintf.s- adc r5,__zero_reg__ ; ap
--
vfprintf.s: ldi r31,2 ; , ; 892 addhi3_clobber/2 [length = 3]
vfprintf.s- add r4,r31 ; ap,
vfprintf.s- adc r5,__zero_reg__ ; ap
--
vfprintf.s: ldi r31,2 ; , ; 919 addhi3_clobber/2 [length = 3]
vfprintf.s- add r4,r31 ; ap,
vfprintf.s- adc r5,__zero_reg__ ; ap
--
vfprintf.s: ldi r31,1 ; , ; 987 addhi3_clobber/2 [length = 3]
vfprintf.s- sub r8,r31 ; size,
vfprintf.s- sbc r9,__zero_reg__ ; size
--
vfprintf.s: ldi r18,4 ; , ; 1012 addhi3_clobber/2 [length = 3]
vfprintf.s- add r4,r18 ; ap,
vfprintf.s- adc r5,__zero_reg__ ; ap
--
vfprintf.s: ldi r31,2 ; , ; 1019 addhi3_clobber/2 [length = 3]
vfprintf.s- add r4,r31 ; ap,
vfprintf.s- adc r5,__zero_reg__ ; ap
--
vfprintf.s: ldi r30,4 ; , ; 1109 addhi3_clobber/2 [length = 3]
vfprintf.s- add r4,r30 ; ap,
vfprintf.s- adc r5,__zero_reg__ ; ap
--
vfprintf.s: ldi r31,2 ; , ; 1116 addhi3_clobber/2 [length = 3]
vfprintf.s- add r4,r31 ; ap,
vfprintf.s- adc r5,__zero_reg__ ; ap
--
vfscanf.s: ldi r27,1 ; , ; 213 addhi3_clobber/2 [length = 3]
vfscanf.s- sub r10,r27 ; width,
vfscanf.s- sbc r11,__zero_reg__ ; width
--
vfscanf.s: ldi r25,255 ; , ; 163 addhi3_clobber/2 [length = 3]
vfscanf.s- sub r12,r25 ; exp,
vfscanf.s- sbc r13,r25 ; exp,
--
vfscanf.s: ldi r30,1 ; , ; 173 addhi3_clobber/2 [length = 3]
vfscanf.s- sub r12,r30 ; exp,
vfscanf.s- sbc r13,__zero_reg__ ; exp
--
vfscanf.s: ldi r25,24 ; , ; 354 addhi3_clobber/2 [length = 3]
vfscanf.s- sub r6,r25 ; D.3471,
vfscanf.s- sbc r7,__zero_reg__ ; D.3471
--
vfscanf.s: ldi r31,1 ; , ; 235 addhi3_clobber/2 [length = 3]
vfscanf.s- sub r12,r31 ; width,
vfscanf.s- sbc r13,__zero_reg__ ; width
--
vfscanf.s: ldi r31,1 ; , ; 334 addhi3_clobber/2 [length = 3]
vfscanf.s- sub r12,r31 ; width,
vfscanf.s- sbc r13,__zero_reg__ ; width
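
Editorial note: the three-instruction sequences in this listing can be sanity-checked with a small arithmetic model. The sketch below is not part of the patch and says nothing about how GCC selects between the forms; it only models the 8-bit add/adc and sub/sbc semantics with a carry or borrow flag. The function names (`add_adc`, `sub_sbc`, `split`, `join`) are made up for illustration. It shows why one 8-bit scratch holding the low byte of the constant is enough, and why `ldi r31,255 / sub / sbc` with the same register in both instructions amounts to adding 1 (subtracting 0xFFFF modulo 2^16).

```python
def split(value):
    """Split a 16-bit value into (low byte, high byte)."""
    return value & 0xFF, (value >> 8) & 0xFF


def join(lo, hi):
    """Combine (low byte, high byte) into a 16-bit value."""
    return (hi << 8) | lo


def add_adc(lo, hi, scratch, hi_operand=0):
    """Model: add lo,scratch ; adc hi,hi_operand  (hi_operand 0 = __zero_reg__)."""
    total = lo + scratch
    carry = total >> 8                      # carry out of the low byte
    return total & 0xFF, (hi + hi_operand + carry) & 0xFF


def sub_sbc(lo, hi, scratch, hi_operand=0):
    """Model: sub lo,scratch ; sbc hi,hi_operand  (hi_operand 0 = __zero_reg__)."""
    borrow = 1 if lo < scratch else 0       # borrow out of the low byte
    return (lo - scratch) & 0xFF, (hi - hi_operand - borrow) & 0xFF


def check_all():
    """Exhaustively compare the modelled sequences with 16-bit arithmetic."""
    for x in range(0x10000):
        lo, hi = split(x)
        # ldi s,2 ; add lo,s ; adc hi,__zero_reg__   ->  x + 2
        assert join(*add_adc(lo, hi, 2)) == (x + 2) & 0xFFFF
        # ldi s,1 ; sub lo,s ; sbc hi,__zero_reg__   ->  x - 1
        assert join(*sub_sbc(lo, hi, 1)) == (x - 1) & 0xFFFF
        # ldi s,255 ; sub lo,s ; sbc hi,s            ->  x - 0xFFFF == x + 1
        assert join(*sub_sbc(lo, hi, 255, 255)) == (x + 1) & 0xFFFF
    return True
```

Calling `check_all()` walks all 65536 register values, so it covers the carry and borrow cases across the byte boundary.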