On 02.11.23 at 12:54, Roger Sayle wrote:
This patch provides non-looping implementations for more SImode (32-bit)
and PSImode (24-bit) shifts on AVR. For most cases, these are shorter
and faster than using a loop, but for a few (controlled by optimize_size)
Maybe this should also adjust the insn costs, like in avr_rtx_costs_1?
Depending on what you are outputting, avr_asm_len() might be more
convenient.
What I am not sure about are the test cases that expect exact instruction
sequences; might those become annoying to maintain in the future?
Johann
they are a little larger but significantly faster. The approach is to
perform byte-based shifts by 1, 2 or 3 bytes, followed by bit-based shifts
(effectively in a narrower type) for the remaining bits, beyond 8, 16 or 24.
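The decomposition described above can be sketched in portable C (my own
illustration, not code from the patch; the helper name shift_right_32 is
invented for this example):

```c
#include <stdint.h>

/* Sketch of the decomposition: a 32-bit logical right shift by n is
   split into a whole-byte part (n / 8 bytes, which on AVR becomes
   plain register moves) followed by a residual bit-based shift of the
   remaining n % 8 bits, effectively in a narrower type.  */
static uint32_t shift_right_32 (uint32_t x, unsigned n)
{
  uint32_t r = x >> (8 * (n / 8));  /* byte-based part: register moves  */
  return r >> (n % 8);              /* bit-based part: short bit shifts */
}
```

For n = 26 this yields a shift by 24 (three byte moves) followed by two
single-bit shifts, matching the sequence shown below.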
For example, the simple test case below (inspired by PR 112268):
unsigned long foo(unsigned long x)
{
  return x >> 26;
}
gcc -O2 currently generates:
foo:    ldi r18,26
1:      lsr r25
        ror r24
        ror r23
        ror r22
        dec r18
        brne 1b
        ret
which is 8 instructions, and takes ~158 cycles.
With this patch, we now generate:
foo:    mov r22,r25
        clr r23
        clr r24
        clr r25
        lsr r22
        lsr r22
        ret
which is 7 instructions, and takes ~7 cycles.
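To see why the shorter sequence computes the same result, here is a
host-side C walk-through of those six instructions (an illustrative
sketch of mine, not compiler code; local variables mirror the AVR
registers, with r22 holding the low byte):

```c
#include <stdint.h>

/* Simulate the patched instruction sequence for x >> 26 on a 32-bit
   value held little-endian in r22..r25 (r22 = low byte).  */
static uint32_t foo_simulated (uint32_t x)
{
  uint8_t r22 = x, r23 = x >> 8, r24 = x >> 16, r25 = x >> 24;
  r22 = r25;   /* mov r22,r25 -- byte-based shift by 24      */
  r23 = 0;     /* clr r23                                    */
  r24 = 0;     /* clr r24                                    */
  r25 = 0;     /* clr r25                                    */
  r22 >>= 1;   /* lsr r22 -- first of two residual bit shifts */
  r22 >>= 1;   /* lsr r22 -- completes the shift by 26        */
  return (uint32_t) r22 | ((uint32_t) r23 << 8)
         | ((uint32_t) r24 << 16) | ((uint32_t) r25 << 24);
}
```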
One complication is that the modified functions sometimes use spaces instead
of TABs, with occasional mistakes in GNU-style formatting, so I've fixed
these indentation/whitespace issues. There's no change in the code for the
cases previously handled/special-cased, with the exception of ashrqi3 reg,5
where with -Os a (4-instruction) loop is shorter than the five single-bit
shifts of a fully unrolled implementation.
This patch has been (partially) tested with a cross-compiler to avr-elf
hosted on x86_64, without a simulator, where the compile-only tests in
the gcc testsuite show no regressions. If someone could test this more
thoroughly, that would be great.
2023-11-02 Roger Sayle <[email protected]>
gcc/ChangeLog
* config/avr/avr.cc (ashlqi3_out): Fix indentation whitespace.
(ashlhi3_out): Likewise.
(avr_out_ashlpsi3): Likewise. Handle shifts by 9 and 17-22.
(ashlsi3_out): Fix formatting. Handle shifts by 9 and 25-30.
(ashrqi3_out): Use loop for shifts by 5 when optimizing for size.
Fix indentation whitespace.
(ashrhi3_out): Likewise.
(avr_out_ashrpsi3): Likewise. Handle shifts by 17.
(ashrsi3_out): Fix indentation. Handle shifts by 17 and 25.
(lshrqi3_out): Fix whitespace.
(lshrhi3_out): Likewise.
(avr_out_lshrpsi3): Likewise. Handle shifts by 9 and 17-22.
(lshrsi3_out): Fix indentation. Handle shifts by 9, 17, 18 and 25-30.
gcc/testsuite/ChangeLog
* gcc.target/avr/ashlsi-1.c: New test case.
* gcc.target/avr/ashlsi-2.c: Likewise.
* gcc.target/avr/ashrsi-1.c: Likewise.
* gcc.target/avr/ashrsi-2.c: Likewise.
* gcc.target/avr/lshrsi-1.c: Likewise.
* gcc.target/avr/lshrsi-2.c: Likewise.
Thanks in advance,
Roger
--