| Issue |
179584
|
| Summary |
[RISC-V] suboptimal big integer comparisons
|
| Labels |
new issue
|
| Assignees |
|
| Reporter |
tom-rein
|
When comparing integers larger than register width on RISC-V targets with Zicond, there are useless extra `czero.eqz` ([godbolt](https://godbolt.org/z/hqW4qGbKP))
```C++
using I = std::size_t;
constexpr int digits = std::numeric_limits<I>::digits;
using L = unsigned _BitInt(2 * digits);
bool test_lt(L x, L y) {
return x < y;
}
```
```
test_lt(unsigned _BitInt(128), unsigned _BitInt(128)):
xor a4, a1, a3
sltu a1, a1, a3
sltu a0, a0, a2
czero.eqz a1, a1, a4
czero.nez a0, a0, a4
or a0, a0, a1
ret
```
a1 is of course always already zero when a4 is zero.
When there is a branch based on the comparison, the branch and the `or` can be combined, because the operands of the `or` are never both nonzero we can use a `beq` / `bne` instead of `or` + `beqz` / `bnez`.
There is an alternative way to do branch free compares with just base integer instructions:
```C++
bool test_lt2(I x0, I x1, I y0, I y1) {
return ((y1 < x1) - (x1 < y1)) < (x0 < y0);
}
```
```
test_lt2(unsigned _BitInt(128), unsigned _BitInt(128)):
sltu a4, a1, a3
sltu a1, a3, a1
sub a1, a1, a4
sltu a0, a0, a2
slt a0, a1, a0
ret
```
This has a similar critical path length as the `czero` version, except the `czero.nez` may be fused, making it potentially faster on some targets.
The size is only two bytes larger if the xor can be compressed.
For greater or equal comparison currently there is just an extra `xori 1` at the end, but there could be more parallelism if something like this gets generated:
```
test_ge3(unsigned _BitInt(128), unsigned _BitInt(128)):
sltu a2, a0, a2
sltu a0, a1, a3
xori a0, a0, 1
xor a1, a1, a3
czero.nez a2, a2, a1
czero.nez a0, a0, a2
ret
```
The alternative base integer version could use this:
```
test_ge2(unsigned _BitInt(128), unsigned _BitInt(128)):
sltu a0, a0, a2
sltu a2, a3, a1
sltu a1, a1, a3
addi a0, a0, -1
sub a1, a1, a2
slt a0, a0, a1
ret
```
The `addi` can be compressed unlike the `xori`.
With my experiments I also noticed that some boolean combinations fail to use conditional zeros:
```C++
bool test_lt3(I x0, I x1, I y0, I y1) {
bool b0 = x0 < y0;
bool b1 = x1 < y1;
if (x1 != y1) b0 = false;
b1 |= b0;
return b1;
}
```
```
test_lt3(unsigned long, unsigned long, unsigned long, unsigned long):
sltu a0, a0, a2
xor a2, a1, a3
seqz a2, a2
and a0, a0, a2
sltu a1, a1, a3
or a0, a0, a1
ret
```
Here `seqz` + `and` is used instead of `czero.nez`. I think lowerSelectToBinOp() in RISCVISelLowering.cpp is where this comes from.
There seems to also be a general issue, where e.g. a conditional or gets turned into a select, but the information that one argument is already conditionally zero gets lost, like in this example:
```C++
bool test_lt4(I x0, I x1, I y0, I y1) {
bool b0 = x0 < y0;
bool b1 = x1 < y1;
if (x1 == y1) b1 |= b0;
return b1;
}
```
```
test_lt4(unsigned long, unsigned long, unsigned long, unsigned long):
sltu a4, a1, a3
xor a1, a1, a3
sltu a0, a0, a2
czero.eqz a2, a4, a1
czero.nez a0, a0, a1
or a0, a0, a2
ret
```
The above case can be pattern matched in the lowering pass, but the general case with assumes probably needs the select to somehow preserve this information.
_______________________________________________
llvm-bugs mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs