On 12/5/23 06:59, Roger Sayle wrote:

This patch improves the code generated for bitfield sign extensions on
ARC cpus without a barrel shifter.


Compiling the following test case:

int foo(int x) { return (x<<27)>>27; }

with -O2 -mcpu=em, generates two loops:

foo:    mov     lp_count,27
         lp      2f
         add     r0,r0,r0
         nop
2:      # end single insn loop
         mov     lp_count,27
         lp      2f
         asr     r0,r0
         nop
2:      # end single insn loop
         j_s     [blink]


and the closely related test case:

struct S { int a : 5; };
int bar (struct S *p) { return p->a; }

generates the slightly better:

bar:    ldb_s   r0,[r0]
         mov_s   r2,0    ;3
         add3    r0,r2,r0
         sexb_s  r0,r0
         asr_s   r0,r0
         asr_s   r0,r0
         j_s.d   [blink]
         asr_s   r0,r0

which uses 6 instructions to perform this particular sign extension.
It turns out that sign extensions can always be implemented using at
most three instructions on ARC (without a barrel shifter) using the
idiom ((x&mask)^msb)-msb [as described in section "2-5 Sign Extension"
of Henry Warren's book "Hacker's Delight"].  Using this, the sign
extensions above on ARC's EM both become:

         bmsk_s  r0,r0,4
         xor     r0,r0,16
         sub     r0,r0,16

which takes about 3 cycles, compared to the ~112 cycles for the loops
in foo.
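
For reference, the mask/xor/subtract idiom can be sketched in C as below.
This is only an illustration, not code from the patch: the helper name
sext_low_bits and its width parameter are hypothetical, and it assumes
1 <= width <= 31. For the 5-bit field in foo and bar, mask is 31 and msb
(the field's sign bit) is 16.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the patch: sign-extend the low
   `width` bits of x using ((x & mask) ^ msb) - msb.
   Assumes 1 <= width <= 31, so every intermediate fits in int32_t. */
static int32_t sext_low_bits(uint32_t x, unsigned width)
{
    uint32_t mask = (1u << width) - 1u;   /* low `width` bits set */
    uint32_t msb  = 1u << (width - 1);    /* sign bit of the field */

    /* The xor flips the field's sign bit; subtracting msb then maps
       [0, msb) to [-msb, 0) and [msb, 2*msb) to [0, msb). */
    return (int32_t)((x & mask) ^ msb) - (int32_t)msb;
}
```

With width 5, sext_low_bits(x, 5) should agree with (x<<27)>>27 on
targets where >> on a negative int is an arithmetic shift.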


Tested with a cross-compiler to arc-linux hosted on x86_64,
with no new (compile-only) regressions from make -k check.
Ok for mainline if this passes Claudiu's nightly testing?


2023-12-05  Roger Sayle  <ro...@nextmovesoftware.com>

gcc/ChangeLog
         * config/arc/arc.md (*extvsi_n_0): New define_insn_and_split to
         implement SImode sign extract using an AND, XOR and MINUS sequence.

Note that with this sequence in place (assuming it moves forward on the ARC), you may be able to build a better generalized signed bitfield extraction.

Rather than a shift-left followed by an arithmetic shift right, you can instead do a logical shift right to get the field into the LSBs, then use this pattern to implement the sign extension from the MSB of the field.

Given it saves a potentially very expensive shift, it may be worth exploring for the ARC.
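
A rough C sketch of that generalized extraction, purely as an illustration: the function name extract_signed_field and its pos/width parameters are hypothetical, and it assumes 1 <= width <= 31 and pos + width <= 32. It is not taken from the ARC or H8 patches.

```c
#include <stdint.h>

/* Hypothetical helper: extract the signed bitfield of `width` bits that
   starts at bit `pos` of x.  Assumes 1 <= width <= 31 and
   pos + width <= 32. */
static int32_t extract_signed_field(uint32_t x, unsigned pos, unsigned width)
{
    uint32_t mask  = (1u << width) - 1u;
    uint32_t msb   = 1u << (width - 1);

    /* One logical shift right moves the field into the LSBs ... */
    uint32_t field = (x >> pos) & mask;

    /* ... then the xor/subtract pair sign-extends from the field's MSB,
       replacing the shift-left / arithmetic-shift-right pair. */
    return (int32_t)(field ^ msb) - (int32_t)msb;
}
```

For example, extract_signed_field(0x00000F80u, 7, 5) reads the all-ones 5-bit field at bits 7..11 and yields -1. Whether the single shift plus three logical/arithmetic operations beats two shifts depends on how expensive shifts are on the target.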

I've done this for the H8. Not every sequence is better, but many are. There are improvements that could be made, but we're probably capturing the vast majority of the benefit in the patch I'm currently testing.

Anyway just thought I'd point out the natural follow-on from this effort.

Jeff
