| Issue | 169691 |
| --- | --- |
| Summary | Sub-optimal codegen when adding with carry |
| Labels | new issue |
| Assignees | |
| Reporter | andrepd |

Consider the following Rust code (no need to know Rust, it's very straightforward).
```rust
pub fn f1(q: &mut [u64; 8], implicit: u64) {
let mut carry = false;
for i in 0 .. q.len() {
let (r, o) = u64::carrying_add(q[i], implicit, carry); // Add q[i] + implicit + carry, return tuple with "result modulo 2^64" and "new carry"
q[i] = r;
carry = o;
}
}
```
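For readers unfamiliar with `u64::carrying_add`, its semantics can be sketched with two stable `overflowing_add` calls. This is only an illustration: `carrying_add_sketch` and `f1_sketch` are hypothetical names, not part of the original report, and the standard library method (which may require a nightly toolchain depending on the Rust version) is not necessarily implemented this way.

```rust
/// Hypothetical stand-in for `u64::carrying_add`, built from stable
/// `overflowing_add`. Returns (a + b + carry_in mod 2^64, carry out).
fn carrying_add_sketch(a: u64, b: u64, carry_in: bool) -> (u64, bool) {
    let (s1, c1) = a.overflowing_add(b);
    let (s2, c2) = s1.overflowing_add(carry_in as u64);
    // At most one of the two additions can overflow, so OR merges the carries.
    (s2, c1 | c2)
}

/// Same loop as `f1`, expressed with the stand-in above.
pub fn f1_sketch(q: &mut [u64; 8], implicit: u64) {
    let mut carry = false;
    for limb in q.iter_mut() {
        let (r, o) = carrying_add_sketch(*limb, implicit, carry);
        *limb = r;
        carry = o;
    }
}
```

With `q = [u64::MAX, 0, 0, 0, 0, 0, 0, 0]` and `implicit = 1`, the first limb wraps to 0 and carries into the second, giving `[0, 2, 1, 1, 1, 1, 1, 1]`.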
On x86, this gets compiled to what you would expect.
```asm
add qword ptr [rdi], rsi
adc qword ptr [rdi + 8], rsi
adc qword ptr [rdi + 16], rsi
adc qword ptr [rdi + 24], rsi
adc qword ptr [rdi + 32], rsi
adc qword ptr [rdi + 40], rsi
adc qword ptr [rdi + 48], rsi
adc qword ptr [rdi + 56], rsi
ret
```
However, if you take `carry` as a parameter rather than initialising it to zero:
```rust
pub fn f2(q: &mut [u64; 8], implicit: u64, mut carry: bool) {
for i in 0 .. q.len() {
let (r, o) = u64::carrying_add(q[i], implicit, carry);
q[i] = r;
carry = o;
}
}
```
I would expect it to be compiled to:
```asm
add dl, -1 ; Set carry flag to 0 if dl == 0, to 1 if dl == 1
adc qword ptr [rdi], rsi
adc qword ptr [rdi + 8], rsi
adc qword ptr [rdi + 16], rsi
adc qword ptr [rdi + 24], rsi
adc qword ptr [rdi + 32], rsi
adc qword ptr [rdi + 40], rsi
adc qword ptr [rdi + 48], rsi
adc qword ptr [rdi + 56], rsi
ret
```
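The reason `add dl, -1` can seed the carry flag is that an 8-bit add of `0xFF` overflows exactly when the other operand is nonzero. A minimal sketch of that arithmetic follows, assuming the `bool` arrives in the register as 0 or 1; the function name is hypothetical and only models the flag, not the actual instruction.

```rust
/// Models the carry flag produced by `add dl, -1` for a bool held as 0 or 1.
/// Assumption: unsigned 8-bit overflow corresponds to the x86 carry flag.
fn carry_flag_after_add_minus_one(carry: bool) -> bool {
    let dl = carry as u8;
    // 0xFF is -1 in 8-bit two's complement.
    let (_sum, cf) = dl.overflowing_add(0xFF);
    cf
}
```

So the flag ends up equal to the incoming `carry`, which is what the following `adc` chain needs.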
However, LLVM (via rustc) instead emits a very roundabout sequence:
```asm
mov rax, qword ptr [rdi]
add rax, rsi
setb cl
mov edx, edx
add rdx, rax
setb al
or al, cl
mov qword ptr [rdi], rdx
add al, -1
adc qword ptr [rdi + 8], rsi
adc qword ptr [rdi + 16], rsi
adc qword ptr [rdi + 24], rsi
adc qword ptr [rdi + 32], rsi
adc qword ptr [rdi + 40], rsi
adc qword ptr [rdi + 48], rsi
adc qword ptr [rdi + 56], rsi
```
Godbolt link illustrating this: https://godbolt.org/z/9M1Tr4cGj
The equivalent in C++: https://godbolt.org/z/sKz5vrEjr . In this one, you can switch the compiler to gcc to see that it gets it right.
All my attempts to nudge codegen into the optimised form (e.g. by explicitly writing something like `let (dummy, o) = u64::carrying_add(0, u64::MAX, carry); carry = o`) have failed. This interests me because the optimised codegen cuts an inner loop in a program of mine in half! :)
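For concreteness, the kind of nudge described above can be sketched as follows. This is a reconstruction of the idea, not the reporter's exact code: the names `carrying_add_sketch`, `f2_sketch`, and `f2_nudged` are hypothetical, and the helper uses stable `overflowing_add` in place of `u64::carrying_add`. It only shows that the dummy addition preserves the result; it makes no claim about the code generated for it.

```rust
/// Hypothetical stand-in for `u64::carrying_add` using stable APIs.
fn carrying_add_sketch(a: u64, b: u64, carry_in: bool) -> (u64, bool) {
    let (s1, c1) = a.overflowing_add(b);
    let (s2, c2) = s1.overflowing_add(carry_in as u64);
    (s2, c1 | c2)
}

/// Straightforward loop corresponding to `f2`.
pub fn f2_sketch(q: &mut [u64; 8], implicit: u64, mut carry: bool) {
    for limb in q.iter_mut() {
        let (r, o) = carrying_add_sketch(*limb, implicit, carry);
        *limb = r;
        carry = o;
    }
}

/// Attempted nudge: re-derive the carry through a dummy addition
/// 0 + u64::MAX + carry, whose carry-out equals the incoming `carry`.
pub fn f2_nudged(q: &mut [u64; 8], implicit: u64, carry: bool) {
    let (_dummy, mut carry) = carrying_add_sketch(0, u64::MAX, carry);
    for limb in q.iter_mut() {
        let (r, o) = carrying_add_sketch(*limb, implicit, carry);
        *limb = r;
        carry = o;
    }
}
```

Both functions compute the same result for either value of `carry`, so the nudge is a pure hint to the optimiser rather than a change in behaviour.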