Denis Chertykov wrote:
> 2011/10/18 Georg-Johann Lay <a...@gjlay.de>:
>> Denis Chertykov wrote:
>>> 2011/10/18 Georg-Johann Lay <a...@gjlay.de>:
>>>> This patch makes some tweaks to addhi3, such as adding a QI scratch register.
>>>>
>>>> The original *addhi3 insn is still there and is located before the new
>>>> addhi3_clobber insn because addhi3 is special to reload (thanks Denis for
>>>> this note), so there is one version with and one version without a scratch
>>>> register.
>>>>
>>>> Patch passes without regressions.
>>>>
>>> What improvements does this patch add?
>>>
>>> Denis.
>> If addhi3 is expanded early, the addition is done with a QI scratch, which
>> avoids a reload of the constant if the target register is in NO_LD. It also
>> reduces register pressure, as only a QI register is needed instead of a
>> reload of the constant into an HI register.
>>
>> Otherwise, there might be sequences like
>>
>> ldi r31, 2    ; *reload_inhi
>> mov r12, r31
>> clr r13
>>
>> add r14, r12  ; *addhi3
>> adc r15, r13
>>
>> which now will be
>>
>> ldi r31, 2    ; addhi3_clobber
>> add r14, r31
>> adc r15, __zero_reg__
>>
>> The same applies if the constant is reloaded into LD regs:
>>
>> ldi r30, 2    ; *movhi
>> clr r31
>>
>> add r14, r30  ; *addhi3
>> adc r15, r31
>>
>> will become
>>
>> ldi r30, 2    ; addhi3_clobber
>> add r14, r30
>> adc r15, __zero_reg__
>>
>> For *addhi3 insns the register pressure is not reduced but the insn sequence
>> might be smarter if peep2 comes up with a QI scratch or if it detects a
>> *reload_inhi insn just prior to the addition (and the reg that holds the
>> reloaded constant dies after the addition).
>>
>> As *addhi3 is special to reload, there is still an "ordinary" addhi3 insn
>> without scratch. This is easier because, e.g., prologue and epilogue
>> generation generate add insns (not by means of the addhi3 expander but by
>> explicit gen_rtx_PLUS). Yet the addhi3 expander factors out the situations
>> when an addhi3 insn is to be generated via the addhi3 expander late in the
>> compilation process.
> 
> Please provide any real world example.
> 
> Denis.

Consider avr-libc (under the assumption that it is "real world" code):

In avr-libc's build directory, and with the patch integrated:

$ cd avr/lib/avr4
$ make clean && make CFLAGS='-save-temps -dp -Os'
$ grep -A 2 'addhi3_clobber\/2' *.s > out-nopeep2.txt (see attachment)
$ grep 'addhi3_clobber\/2' *.s | wc -l
33
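
The count can also be taken per insn pattern instead of grepping for one name.
The following is only a rough sketch in Python: the regular expression is
inferred from the annotated lines in the attached listing (insn number, pattern
name, "[length = N]"), and the names DP_ANNOTATION and tally_insns are made up
for this illustration.

```python
import re
from collections import Counter

# Assumed shape of the comment that -dp appends to an instruction, e.g.
#   ";  338 addhi3_clobber/2        [length = 3]"
# This is inferred from the listing in this mail, not from the GCC docs.
DP_ANNOTATION = re.compile(r";\s+(\d+)\s+(\S+)\s+\[length\s*=\s*(\d+)\]")

def tally_insns(asm_text):
    """Count how often each insn pattern name occurs in annotated asm."""
    counts = Counter()
    for line in asm_text.splitlines():
        match = DP_ANNOTATION.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts

sample = """\
dtoa_prf.s:     ldi r31,3        ; ,     ;  338 addhi3_clobber/2        [length = 3]
dtoa_prf.s-     add r12,r31      ;  s,
dtoa_prf.s-     adc r13,__zero_reg__     ;  s
fgets.s:        ldi r31,1        ; ,     ;  70  addhi3_clobber/2        [length = 3]
"""

print(tally_insns(sample)["addhi3_clobber/2"])
```

Only the first line of each three-line sequence carries the annotation, so each
addition is counted once.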

This shows that the insns are already there before peep2 and thus no reload of
a 16-bit constant is needed; an 8-bit scratch is sufficient.
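
To illustrate why the 8-bit scratch is enough for constants in 0..255, here is
a small Python model of the LDI/ADD/ADC sequence. It is only a sketch of the
arithmetic, not of the patch itself; the helper names add8 and addhi_const8 are
made up for this illustration.

```python
def add8(a, b, carry_in=0):
    """8-bit addition with carry; returns (result, carry_out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8

def addhi_const8(lo, hi, k):
    """Model  ldi tmp,k / add lo,tmp / adc hi,__zero_reg__  for 0 <= k <= 255."""
    tmp = k & 0xFF                 # ldi tmp, k  (the QI scratch)
    lo, carry = add8(lo, tmp)      # add lo, tmp
    hi, _ = add8(hi, 0, carry)     # adc hi, __zero_reg__
    return lo, hi

# 0x00FF + 2 = 0x0101: the carry out of the low byte reaches the high byte.
print(addhi_const8(0xFF, 0x00, 2))
```

The high byte of the constant is zero, so __zero_reg__ can stand in for it and
no second register has to be loaded.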

Alternatively, the implementation could omit the expansion to addhi3_clobber in
the addhi3 expander and instead rely completely on peep2. However, that does
not reduce register pressure because a 16-bit register will still be allocated;
the peep2 just prints things smarter and needs only a QI scratch to call
avr_out_plus_clobber.

For +/-1, the addition with SEC/ADD/ADC or SEC/SBC/SBC, respectively, leaves
cc0 in a mess. As most loops use +/-1 on the counter variable, LDI/SUB/SBC is
not shorter but better because it sets cc0.
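
One way to see the difference is to model the Z flag. On AVR, SUB sets Z from
its own result, whereas SBC can only keep Z set: it clears Z on a non-zero
result and leaves it unchanged otherwise. The Python sketch below models just
that; it is my reading of the cc0 remark, not code from the patch, and all
function names are made up for this illustration.

```python
def sub8(rd, rr, carry_in=0, z_in=None):
    """Model AVR SUB (z_in is None) or SBC (z_in is the incoming Z flag)."""
    diff = rd - rr - carry_in
    result = diff & 0xFF
    carry_out = 1 if diff < 0 else 0
    z = result == 0
    if z_in is not None:
        z = z and z_in             # SBC never sets Z, it only preserves it
    return result, carry_out, z

def ldi_sub_sbc(lo, hi, s):
    """Subtract the 16-bit constant s:  sub lo,lo8(s) / sbc hi,hi8(s)."""
    lo, c, z = sub8(lo, s & 0xFF)
    hi, c, z = sub8(hi, s >> 8, c, z)
    return lo, hi, z

def sec_sbc_sbc(lo, hi, stale_z):
    """Subtract 1:  sec / sbc lo,__zero_reg__ / sbc hi,__zero_reg__."""
    lo, c, z = sub8(lo, 0, 1, stale_z)   # Z depends on whatever came before
    hi, c, z = sub8(hi, 0, c, z)
    return lo, hi, z

print(ldi_sub_sbc(0x01, 0x00, 1))        # counter 1 -> 0, Z reflects 16 bits
print(sec_sbc_sbc(0x01, 0x00, False))    # same result, but Z is stale
```

Subtracting 0xFFFF with the first sequence adds 1, which matches the listing
entries that load 255 and use the same register for both SUB and SBC.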

So do you like this patch?
Or do you prefer a patch that is neutral with respect to the register allocator
and just uses peep2 to print things smarter?

Johann


dtoa_prf.s:     ldi r31,3        ; ,     ;  338 addhi3_clobber/2        [length = 3]
dtoa_prf.s-     add r12,r31      ;  s,
dtoa_prf.s-     adc r13,__zero_reg__     ;  s
--
dtoa_prf.s:     ldi r31,3        ; ,     ;  447 addhi3_clobber/2        [length = 3]
dtoa_prf.s-     add r12,r31      ;  s,
dtoa_prf.s-     adc r13,__zero_reg__     ;  s
--
fgets.s:        ldi r31,1        ; ,     ;  70  addhi3_clobber/2        [length = 3]
fgets.s-        sub r14,r31      ;  ivtmp.9,
fgets.s-        sbc r15,__zero_reg__     ;  ivtmp.9
--
realloc.s:      ldi r17,2        ; ,     ;  80  addhi3_clobber/2        [length = 3]
realloc.s-      add r12,r17      ;  tmp83,
realloc.s-      adc r13,__zero_reg__     ;
--
realloc.s:      ldi r18,2        ; ,     ;  85  addhi3_clobber/2        [length = 3]
realloc.s-      add r12,r18      ;  tmp84,
realloc.s-      adc r13,__zero_reg__     ;
--
strtod.s:       ldi r31,1        ; ,     ;  101 addhi3_clobber/2        [length = 3]
strtod.s-       sub r14,r31      ;  D.2581,
strtod.s-       sbc r15,__zero_reg__     ;  D.2581
--
strtod.s:       ldi r18,2        ; ,     ;  110 addhi3_clobber/2        [length = 3]
strtod.s-       add r14,r18      ;  nptr,
strtod.s-       adc r15,__zero_reg__     ;  nptr
--
strtod.s:       ldi r21,7        ; ,     ;  120 addhi3_clobber/2        [length = 3]
strtod.s-       add r14,r21      ;  nptr,
strtod.s-       adc r15,__zero_reg__     ;  nptr
--
strtod.s:       ldi r31,255      ; ,     ;  175 addhi3_clobber/2        [length = 3]
strtod.s-       sub r14,r31      ;  exp,
strtod.s-       sbc r15,r31      ;  exp,
--
strtod.s:       ldi r18,1        ; ,     ;  185 addhi3_clobber/2        [length = 3]
strtod.s-       sub r14,r18      ;  exp,
strtod.s-       sbc r15,__zero_reg__     ;  exp
--
strtod.s:       ldi r31,24       ; ,     ;  376 addhi3_clobber/2        [length = 3]
strtod.s-       sub r8,r31       ;  D.2735,
strtod.s-       sbc r9,__zero_reg__      ;  D.2735
--
strtol.s:       ldi r31,2        ; ,     ;  128 addhi3_clobber/2        [length = 3]
strtol.s-       add r6,r31       ;  nptr,
strtol.s-       adc r7,__zero_reg__      ;  nptr
--
strtol.s:       ldi r31,1        ; ,     ;  242 addhi3_clobber/2        [length = 3]
strtol.s-       sub r6,r31       ;  tmp117,
strtol.s-       sbc r7,__zero_reg__      ;
--
strtol.s:       ldi r31,2        ; ,     ;  252 addhi3_clobber/2        [length = 3]
strtol.s-       sub r6,r31       ;  tmp119,
strtol.s-       sbc r7,__zero_reg__      ;
--
strtoul.s:      ldi r31,2        ; ,     ;  126 addhi3_clobber/2        [length = 3]
strtoul.s-      add r14,r31      ;  nptr,
strtoul.s-      adc r15,__zero_reg__     ;  nptr
--
strtoul.s:      ldi r31,1        ; ,     ;  229 addhi3_clobber/2        [length = 3]
strtoul.s-      sub r14,r31      ;  tmp113,
strtoul.s-      sbc r15,__zero_reg__     ;
--
strtoul.s:      ldi r31,2        ; ,     ;  239 addhi3_clobber/2        [length = 3]
strtoul.s-      sub r14,r31      ;  tmp115,
strtoul.s-      sbc r15,__zero_reg__     ;
--
vfprintf.s:     ldi r24,4        ; ,     ;  399 addhi3_clobber/2        [length = 3]
vfprintf.s-     add r4,r24       ;  ap,
vfprintf.s-     adc r5,__zero_reg__      ;  ap
--
vfprintf.s:     ldi r21,10       ; ,     ;  850 addhi3_clobber/2        [length = 3]
vfprintf.s-     sub r10,r21      ;  exp,
vfprintf.s-     sbc r11,__zero_reg__     ;  exp
--
vfprintf.s:     ldi r30,2        ; ,     ;  882 addhi3_clobber/2        [length = 3]
vfprintf.s-     add r4,r30       ;  ap,
vfprintf.s-     adc r5,__zero_reg__      ;  ap
--
vfprintf.s:     ldi r31,2        ; ,     ;  892 addhi3_clobber/2        [length = 3]
vfprintf.s-     add r4,r31       ;  ap,
vfprintf.s-     adc r5,__zero_reg__      ;  ap
--
vfprintf.s:     ldi r31,2        ; ,     ;  919 addhi3_clobber/2        [length = 3]
vfprintf.s-     add r4,r31       ;  ap,
vfprintf.s-     adc r5,__zero_reg__      ;  ap
--
vfprintf.s:     ldi r31,1        ; ,     ;  987 addhi3_clobber/2        [length = 3]
vfprintf.s-     sub r8,r31       ;  size,
vfprintf.s-     sbc r9,__zero_reg__      ;  size
--
vfprintf.s:     ldi r18,4        ; ,     ;  1012        addhi3_clobber/2        [length = 3]
vfprintf.s-     add r4,r18       ;  ap,
vfprintf.s-     adc r5,__zero_reg__      ;  ap
--
vfprintf.s:     ldi r31,2        ; ,     ;  1019        addhi3_clobber/2        [length = 3]
vfprintf.s-     add r4,r31       ;  ap,
vfprintf.s-     adc r5,__zero_reg__      ;  ap
--
vfprintf.s:     ldi r30,4        ; ,     ;  1109        addhi3_clobber/2        [length = 3]
vfprintf.s-     add r4,r30       ;  ap,
vfprintf.s-     adc r5,__zero_reg__      ;  ap
--
vfprintf.s:     ldi r31,2        ; ,     ;  1116        addhi3_clobber/2        [length = 3]
vfprintf.s-     add r4,r31       ;  ap,
vfprintf.s-     adc r5,__zero_reg__      ;  ap
--
vfscanf.s:      ldi r27,1        ; ,     ;  213 addhi3_clobber/2        [length = 3]
vfscanf.s-      sub r10,r27      ;  width,
vfscanf.s-      sbc r11,__zero_reg__     ;  width
--
vfscanf.s:      ldi r25,255      ; ,     ;  163 addhi3_clobber/2        [length = 3]
vfscanf.s-      sub r12,r25      ;  exp,
vfscanf.s-      sbc r13,r25      ;  exp,
--
vfscanf.s:      ldi r30,1        ; ,     ;  173 addhi3_clobber/2        [length = 3]
vfscanf.s-      sub r12,r30      ;  exp,
vfscanf.s-      sbc r13,__zero_reg__     ;  exp
--
vfscanf.s:      ldi r25,24       ; ,     ;  354 addhi3_clobber/2        [length = 3]
vfscanf.s-      sub r6,r25       ;  D.3471,
vfscanf.s-      sbc r7,__zero_reg__      ;  D.3471
--
vfscanf.s:      ldi r31,1        ; ,     ;  235 addhi3_clobber/2        [length = 3]
vfscanf.s-      sub r12,r31      ;  width,
vfscanf.s-      sbc r13,__zero_reg__     ;  width
--
vfscanf.s:      ldi r31,1        ; ,     ;  334 addhi3_clobber/2        [length = 3]
vfscanf.s-      sub r12,r31      ;  width,
vfscanf.s-      sbc r13,__zero_reg__     ;  width
