On 1/10/22 11:58, Roger Sayle wrote:
One of the unusual target features of the Nvidia PTX ISA is that it
doesn't provide QI mode (byte sized) operations or registers.
[ FWIW: I recently happened to check this, and it actually supports
.u8/.s8/.b8 regs, but indeed just for very few operations: ld/st/cvt. ]
Somewhat
conventionally, 8-bit quantities are read from/written to memory using
special instructions, but stored internally using SImode (32-bit) registers.
GCC's middle-end accomodates targets without QImode optabs, by widening
operations until suitable support is found, and with the current nvptx
backend this means 16-bit HImode operations. The inconvenience is that
nvptx is also a TARGET_TRULY_NOOP_TRUNCATION=false target, meaning that
additional instructions are required to convert between the SImode
registers used to hold QImode values, and the HImode registers used to
operate on them (and back again). This results in a large amount of
shuffling and type conversion in code dealing with bytes, i.e. using
char or Boolean types.
This patch improves the situation by providing expanders in the nvptx
machine description to perform QImode operations natively in SImode
instead of HImode. An alternate implementation might be to provide
some form of target hook to specify which fallback modes to use during
RTL expansion, but I think this requirement is unusual, and a solution
entirely in the nvptx backend doesn't disturb/affect other targets.
The improvements can be quite dramatic, as shown in the example below:
int foo(int x, int y) { return (x==21) && (y==69); }
previously with -O2 required 15 instructions:
mov.u32 %r26, %ar0;
mov.u32 %r27, %ar1;
setp.eq.u32 %r31, %r26, 21;
selp.u32 %r30, 1, 0, %r31;
mov.u32 %r29, %r30;
setp.eq.u32 %r34, %r27, 69;
selp.u32 %r33, 1, 0, %r34;
mov.u32 %r32, %r33;
cvt.u16.u8 %r39, %r29;
mov.u16 %r36, %r39;
cvt.u16.u8 %r39, %r32;
mov.u16 %r37, %r39;
and.b16 %r35, %r36, %r37;
cvt.u32.u16 %r38, %r35;
cvt.u32.u8 %value, %r38;
with this patch, now requires only 7 instructions:
mov.u32 %r26, %ar0;
mov.u32 %r27, %ar1;
setp.eq.u32 %r31, %r26, 21;
setp.eq.u32 %r34, %r27, 69;
selp.u32 %r37, 1, 0, %r31;
selp.u32 %r38, 1, 0, %r34;
and.b32 %value, %r37, %r38;
This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
(including newlib) with a make and make -k check with no new failures.
Ok for mainline?
LGTM, applied.
Thanks,
- Tom
2022-01-10 Roger Sayle <ro...@nextmovesoftware.com>
gcc/ChangeLog
* config/nvptx/nvptx.md (cmp<mode>): Renamed from *cmp<mode>.
(setcc<mode>_from_bi): Additionally support QImode.
(extendbi<mode>2): Additionally support QImode.
(zero_extendbi<mode>2): Additionally support QImode.
(any_sbinary, any_ubinary, any_sunary, any_uunary): New code
iterators for signed and unsigned, binary and unary operations.
(<sbinary>qi3, <ubinary>qi3, <sunary>qi2, <uunary>qi2): New
expanders to perform QImode operations using SImode instructions.
(cstoreqi4): New define_expand.
(*ext_truncsi2_qi): New define_insn.
(*zext_truncsi2_qi): New define_insn.
gcc/testsuite/ChangeLog
* gcc.target/nvptx/bool-1.c: New test case.
Thanks in advance,
Roger
--