We produce inefficient code for some synthesized SImode conditional set
operations (i.e. ones that are not directly implemented in hardware) on
RV64. For example, a piece of C code like this:
int
sleu (unsigned int x, unsigned int y)
{
  return x <= y;
}
gets compiled (at `-O2') to this:
sleu:
        sgtu    a0,a0,a1        # 9     [c=4 l=4]  *sgtu_disi
        xori    a0,a0,1         # 10    [c=4 l=4]  *xorsi3_internal/1
        andi    a0,a0,1         # 16    [c=4 l=4]  anddi3/1
        ret                     # 25    [c=0 l=4]  simple_return
or (at `-O1') to this:
sleu:
        sgtu    a0,a0,a1        # 9     [c=4 l=4]  *sgtu_disi
        xori    a0,a0,1         # 10    [c=4 l=4]  *xorsi3_internal/1
        sext.w  a0,a0           # 16    [c=4 l=4]  extendsidi2/0
        ret                     # 24    [c=0 l=4]  simple_return
This is because the middle end expands a SLEU operation, which RISC-V
hardware lacks, into a sequence of a SImode SGTU operation followed by
an explicit SImode XORI operation with immediate 1. And while the SGTU
machine instruction (an alias of SLTU with the input operands swapped)
gives a properly sign-extended 32-bit result, which is valid as either a
SImode or a DImode operand, the middle end does not see that through a
SImode XORI operation. This is because we tell the middle end that the
RISC-V target (unlike MIPS) may hold values in DImode integer registers
that are valid for SImode operations even if not properly sign-extended.
However, the RISC-V psABI requires that 32-bit function arguments and
results passed in 64-bit integer registers be properly sign-extended, so
the sign extension is explicitly done at the conclusion of the function.
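The redundancy can be illustrated with a small C model of the `-O1' sequence. This is a sketch rather than anything taken from GCC. The helpers `sext_w', `sgtu', `xori', `sleu_model' and `check_boundaries' are hypothetical names that mimic RV64 instructions on 64-bit register values. As the psABI requires, the model assumes that both arguments arrive sign-extended. It checks that the SGTU/XORI result already equals its own sign extension, so the trailing `sext.w' can never change it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers modelling RV64 instructions on 64-bit register
   values; they are illustrative only and not taken from GCC.  */

/* sext.w: sign-extend the low 32 bits of a register.  */
static uint64_t sext_w (uint64_t r)
{
  uint64_t lo = r & 0xffffffffu;
  return (lo & 0x80000000u) ? (lo | 0xffffffff00000000u) : lo;
}

/* sgtu rd, rs1, rs2 (alias of sltu rd, rs2, rs1): full 64-bit compare.  */
static uint64_t sgtu (uint64_t a, uint64_t b) { return a > b; }

/* xori rd, rs1, imm (only non-negative immediates in this model).  */
static uint64_t xori (uint64_t a, uint64_t imm) { return a ^ imm; }

/* The -O1 sequence for sleu without its final sext.w.  Per the psABI,
   the 32-bit arguments arrive sign-extended in 64-bit registers.  */
static uint64_t sleu_model (uint32_t x, uint32_t y)
{
  return xori (sgtu (sext_w (x), sext_w (y)), 1);
}

/* True if the model computes x <= y and a trailing sext.w would leave
   the result unchanged, i.e. the sign extension is redundant.  */
static int sext_is_redundant (uint32_t x, uint32_t y)
{
  uint64_t r = sleu_model (x, y);
  return r == (uint64_t) (x <= y) && sext_w (r) == r;
}

/* Exercise every pair drawn from a few values around the sign bit.  */
static void check_boundaries (void)
{
  static const uint32_t v[] = { 0, 1, 0x7fffffffu, 0x80000000u, 0xffffffffu };
  for (unsigned i = 0; i < sizeof v / sizeof v[0]; i++)
    for (unsigned j = 0; j < sizeof v / sizeof v[0]; j++)
      assert (sext_is_redundant (v[i], v[j]));
}
```

For instance, `sleu_model (0x7fffffffu, 0x80000000u)' yields 1 even though the second register then holds 0xffffffff80000000. Sign-extending two 32-bit values preserves their unsigned order, which is what makes a 64-bit SGTU compare valid for these SImode inputs.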
Fix this by making the backend use a sequence of a DImode SGTU operation
followed by a SImode SEQZ operation instead. The middle end knows that
the latter operation produces a properly sign-extended 32-bit result.
Combine therefore gets rid of the sign-extension operation that follows,
folding it into the very same XORI machine operation and resulting in:
sleu:
        sgtu    a0,a0,a1        # 9     [c=4 l=4]  *sgtu_didi
        xori    a0,a0,1         # 16    [c=4 l=4]  xordi3/1
        ret                     # 25    [c=0 l=4]  simple_return
instead (although the SEQZ machine instruction, an alias of SLTIU against
immediate 1, would do equally well and is actually retained at `-O0').
The remaining synthesized operations of this kind, i.e. `SLE', `SGEU',
and `SGE', are handled analogously.
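The same reasoning can be sketched for all four operations. The model below is again hypothetical and is not the code in `riscv_emit_int_order_test'. It assumes that each operation is computed as the inverse 64-bit comparison on the sign-extended argument registers followed by SEQZ, and it checks each result against the corresponding C operator. SEQZ (SLTIU against 1) only ever yields 0 or 1, so its result is trivially a properly sign-extended 32-bit value. The names `sltu', `slt', `seqz', `check_pair' and `check_boundaries' are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register-level helpers; not GCC code.  */
#define SIGN_BIT (UINT64_C (1) << 63)

/* sext.w: sign-extend the low 32 bits of a register.  */
static uint64_t sext_w (uint64_t r)
{
  uint64_t lo = r & 0xffffffffu;
  return (lo & 0x80000000u) ? (lo | 0xffffffff00000000u) : lo;
}

/* sltu rd, rs1, rs2: unsigned 64-bit less-than.  */
static uint64_t sltu (uint64_t a, uint64_t b) { return a < b; }

/* slt rd, rs1, rs2: signed 64-bit less-than, modelled by flipping the
   sign bits and comparing unsigned, which avoids implementation-defined
   conversions in C.  */
static uint64_t slt (uint64_t a, uint64_t b)
{
  return (a ^ SIGN_BIT) < (b ^ SIGN_BIT);
}

/* seqz rd, rs (alias of sltiu rd, rs, 1): always yields 0 or 1.  */
static uint64_t seqz (uint64_t a) { return a < 1; }

/* Model SLE, SLEU, SGE and SGEU as an inverse 64-bit comparison on the
   sign-extended argument registers followed by SEQZ, and compare each
   with the C operator.  Returns 1 if all four agree.  */
static int check_pair (int32_t x, int32_t y)
{
  uint32_t ux = (uint32_t) x, uy = (uint32_t) y;
  uint64_t rx = sext_w (ux), ry = sext_w (uy);  /* psABI sign extension */

  return seqz (slt (ry, rx)) == (uint64_t) (x <= y)        /* SLE  */
         && seqz (sltu (ry, rx)) == (uint64_t) (ux <= uy)  /* SLEU */
         && seqz (slt (rx, ry)) == (uint64_t) (x >= y)     /* SGE  */
         && seqz (sltu (rx, ry)) == (uint64_t) (ux >= uy); /* SGEU */
}

/* Exercise every pair drawn from a few boundary values.  */
static void check_boundaries (void)
{
  static const int32_t v[] = { INT32_MIN, -1, 0, 1, INT32_MAX };
  for (unsigned i = 0; i < sizeof v / sizeof v[0]; i++)
    for (unsigned j = 0; j < sizeof v / sizeof v[0]; j++)
      assert (check_pair (v[i], v[j]));
}
```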
gcc/
* config/riscv/riscv.cc (riscv_emit_int_order_test): Use EQ 0
rather than XOR 1 for LE and LEU operations.
gcc/testsuite/
* gcc.target/riscv/sge.c: New test.
* gcc.target/riscv/sgeu.c: New test.
* gcc.target/riscv/sle.c: New test.
* gcc.target/riscv/sleu.c: New test.
---
On Mon, 28 Nov 2022, Jeff Law wrote:
> > > > I have noticed it went nowhere. Can you please check what compilation
> > > > options lead to this discrepancy so that we can have the fix included in
> > > > GCC 13? I'd like to understand what's going on here.
> > > FWIW, I don't see the redundant sign extension with this testcase at -O2 on
> > > the trunk. Is it possible the patch has been made redundant over the last few
> > > months?
> > Maybe at -O2, but the test cases continue to fail in my configuration for
> > other optimisation levels:
> >
> > FAIL: gcc.target/riscv/sge.c -O1 scan-assembler-not sext\\.w
> > FAIL: gcc.target/riscv/sge.c -Og -g scan-assembler-not sext\\.w
> > FAIL: gcc.target/riscv/sgeu.c -O1 scan-assembler-not sext\\.w
> > FAIL: gcc.target/riscv/sgeu.c -Og -g scan-assembler-not sext\\.w
> > FAIL: gcc.target/riscv/sle.c -O1 scan-assembler-not sext\\.w
> > FAIL: gcc.target/riscv/sle.c -Og -g scan-assembler-not sext\\.w
> > FAIL: gcc.target/riscv/sleu.c -O1 scan-assembler-not sext\\.w
> > FAIL: gcc.target/riscv/sleu.c -Og -g scan-assembler-not sext\\.w
>
> I may have been running an rv32 toolchain... So I'll start over and ensure
> that I'm running rv64 :-)
>
>
> With the trunk, I get code like Kito (AND with 0x1 mask)
Right, I have examined assembly produced at -O2 and this is what happens
here as well:
--- sleu-O1.s 2022-11-28 16:31:18.520538342 +
+++ sleu-O2.s 2022-11-28 16:30:27.054241372 +
@@ -10,7 +10,7 @@
 sleu:
         sgtu    a0,a0,a1
         xori    a0,a0,1
-        sext.w  a0,a0
+        andi    a0,a0,1
         ret
         .size   sleu, .-sleu
         .section        .note.GNU-stack,"",@progbits
following Kito's observations. This is why the tests incorrectly pass at
some optimisation levels even though the code produced is still
suboptimal and only trivially different.
> The key difference is Roger's patch:
>
> commit c23a9c87cc62bd177fd0d4db6ad34b34e1b9a31f
> Author: Roger Sayle
> Date: Wed Aug 3 08:55:35 2022 +0100
>
> Some additional zero-extension related optimizations in simplify-rtx.
>
> T