On 02.11.23 at 12:54, Roger Sayle wrote:

This patch provides non-looping implementations for more SImode (32-bit)
and PSImode (24-bit) shifts on AVR.  For most cases, these are shorter
and faster than using a loop, but for a few (controlled by optimize_size)

Maybe this should also adjust the insn costs, like in avr_rtx_costs_1?

Depending on what you are outputting, avr_asm_len() might be more
convenient.

What I am not sure about are the test cases that expect exact instruction
sequences; might those become annoying to maintain in the future?

Johann


they are a little larger but significantly faster.  The approach is to
perform byte-based shifts by 1, 2 or 3 bytes, followed by bit-based shifts
(effectively in a narrower type) for the remaining bits, beyond 8, 16 or 24.

For example, the simple test case below (inspired by PR 112268):

unsigned long foo(unsigned long x)
{
   return x >> 26;
}

gcc -O2 currently generates:

foo:    ldi r18,26
1:      lsr r25
         ror r24
         ror r23
         ror r22
         dec r18
         brne 1b
         ret

which is 8 instructions, and takes ~158 cycles.
With this patch, we now generate:

foo:    mov r22,r25
         clr r23
         clr r24
         clr r25
         lsr r22
         lsr r22
         ret

which is 7 instructions, and takes ~7 cycles.

One complication is that the modified functions sometimes use spaces instead
of TABs, with occasional mistakes in GNU-style formatting, so I've fixed
these indentation/whitespace issues.  There's no change in the code for the
previously handled/special-cased shifts, with the exception of ashrqi3
reg,5, where with -Os a (4-instruction) loop is shorter than the five
single-bit shifts of a fully unrolled implementation.

This patch has been (partially) tested with a cross-compiler to avr-elf
hosted on x86_64, without a simulator, where the compile-only tests in
the gcc testsuite show no regressions.  If someone could test this more
thoroughly that would be great.


2023-11-02  Roger Sayle  <ro...@nextmovesoftware.com>

gcc/ChangeLog
         * config/avr/avr.cc (ashlqi3_out): Fix indentation whitespace.
         (ashlhi3_out): Likewise.
         (avr_out_ashlpsi3): Likewise.  Handle shifts by 9 and 17-22.
         (ashlsi3_out): Fix formatting.  Handle shifts by 9 and 25-30.
         (ashrqi3_out): Use loop for shifts by 5 when optimizing for size.
         Fix indentation whitespace.
         (ashrhi3_out): Likewise.
         (avr_out_ashrpsi3): Likewise.  Handle shifts by 17.
         (ashrsi3_out): Fix indentation.  Handle shifts by 17 and 25.
         (lshrqi3_out): Fix whitespace.
         (lshrhi3_out): Likewise.
         (avr_out_lshrpsi3): Likewise.  Handle shifts by 9 and 17-22.
         (lshrsi3_out): Fix indentation.  Handle shifts by 9, 17, 18 and 25-30.

gcc/testsuite/ChangeLog
         * gcc.target/avr/ashlsi-1.c: New test case.
         * gcc.target/avr/ashlsi-2.c: Likewise.
         * gcc.target/avr/ashrsi-1.c: Likewise.
         * gcc.target/avr/ashrsi-2.c: Likewise.
         * gcc.target/avr/lshrsi-1.c: Likewise.
         * gcc.target/avr/lshrsi-2.c: Likewise.


Thanks in advance,
Roger