On 12/5/23 06:59, Roger Sayle wrote:
This patch improves the code generated for bitfield sign extensions on
ARC cpus without a barrel shifter.
Compiling the following test case:
int foo(int x) { return (x<<27)>>27; }
with -O2 -mcpu=em, generates two loops:
foo:    mov     lp_count,27
        lp      2f
        add     r0,r0,r0
        nop
2:      # end single insn loop
        mov     lp_count,27
        lp      2f
        asr     r0,r0
        nop
2:      # end single insn loop
        j_s     [blink]
and the closely related test case:
struct S { int a : 5; };
int bar (struct S *p) { return p->a; }
generates the slightly better:
bar:    ldb_s   r0,[r0]
        mov_s   r2,0    ;3
        add3    r0,r2,r0
        sexb_s  r0,r0
        asr_s   r0,r0
        asr_s   r0,r0
        j_s.d   [blink]
        asr_s   r0,r0
which uses 6 instructions to perform this particular sign extension.
It turns out that sign extensions can always be implemented using at
most three instructions on ARC (without a barrel shifter) using the
idiom ((x&mask)^msb)-msb [as described in section "2-5 Sign Extension"
of Henry Warren's book "Hacker's Delight"]. Using this, the sign
extensions above on ARC's EM both become:
        bmsk_s  r0,r0,4
        xor     r0,r0,32
        sub     r0,r0,32
which takes about 3 cycles, compared to the ~112 cycles for the loops
in foo.
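For readers who want to check the arithmetic, the idiom can be sketched in portable C as below. This is only an illustration of the ((x&mask)^msb)-msb identity, not code from the patch: the helper name sign_extend_low_bits and its width parameter are made up for this example, and it assumes 1 <= width <= 31 and a two's complement target.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the patch: sign-extend the low
   `width` bits of x using the ((x & mask) ^ msb) - msb idiom.
   Assumes 1 <= width <= 31.  */
static int32_t sign_extend_low_bits(uint32_t x, unsigned width)
{
    uint32_t mask = (UINT32_C(1) << width) - 1;  /* low `width` bits set */
    uint32_t msb  = UINT32_C(1) << (width - 1);  /* sign bit of the field */

    /* Unsigned arithmetic, so the subtraction wraps modulo 2^32.  */
    uint32_t t = ((x & mask) ^ msb) - msb;

    /* Converting an out-of-range value to int32_t is
       implementation-defined in C; GCC reduces it modulo 2^32.  */
    return (int32_t)t;
}
```

The AND clears everything above the field, the XOR flips the field's sign bit, and the subtraction then borrows through the upper bits exactly when the original sign bit was set.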
Tested with a cross-compiler to arc-linux hosted on x86_64,
with no new (compile-only) regressions from make -k check.
Ok for mainline if this passes Claudiu's nightly testing?
2023-12-05  Roger Sayle  <ro...@nextmovesoftware.com>

gcc/ChangeLog
        * config/arc/arc.md (*extvsi_n_0): New define_insn_and_split to
        implement SImode sign extract using an AND, XOR and MINUS sequence.

Note with this sequence in place (assuming it moves forward on the ARC),
you may be able to build a better generalized signed bitfield extraction.
Rather than a shift-left followed by an arithmetic shift right, you can
instead do a logical shift right to get the field into the LSBs, then
use this pattern to implement the sign extension from the MSB of the
field.
Given it saves a potentially very expensive shift, it may be worth
exploring for the ARC.
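A rough C-level sketch of that idea, purely for illustration: the function name extract_signed_field and the pos/width parameters are hypothetical and do not correspond to anything in the ARC or H8 ports. It assumes 1 <= width <= 31 and pos + width <= 32.

```c
#include <stdint.h>

/* Hypothetical helper: extract the signed `width`-bit field whose
   least significant bit is bit `pos` of x.
   Assumes 1 <= width <= 31 and pos + width <= 32.  */
static int32_t extract_signed_field(uint32_t x, unsigned pos, unsigned width)
{
    uint32_t mask = (UINT32_C(1) << width) - 1;  /* low `width` bits set */
    uint32_t msb  = UINT32_C(1) << (width - 1);  /* sign bit of the field */

    /* One logical shift right brings the field down to the LSBs.  */
    uint32_t field = x >> pos;

    /* AND/XOR/SUB then sign-extends from the field's MSB; the
       conversion to int32_t relies on GCC's modulo behaviour.  */
    return (int32_t)(((field & mask) ^ msb) - msb);
}
```

Compared with shifting left by 32-pos-width and then arithmetic-shifting right by 32-width, this form needs a single shift by pos, which is where the saving would come from on a target without a barrel shifter.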
I've done this for the H8. Not every sequence is better, but many are.
There are improvements that could be made, but we're probably capturing
the vast majority of the benefit in the patch I'm currently testing.
Anyway just thought I'd point out the natural follow-on from this effort.
Jeff