Issue 169691
Summary Sub-optimal codegen when adding with carry
Labels new issue
Assignees
Reporter andrepd
    Consider the following Rust code (no need to know Rust, it's very straightforward).

```rust
pub fn f1(q: &mut [u64; 8], implicit: u64) {
    let mut carry = false;
    for i in 0 .. q.len() {
        let (r, o) = u64::carrying_add(q[i], implicit, carry);  // Add q[i] + implicit + carry, return tuple with "result modulo 2^64" and "new carry"
        q[i] = r;
        carry = o;
    }
}
```

On x86, this gets compiled to what you would expect.

```asm
add     qword ptr [rdi], rsi
adc     qword ptr [rdi + 8], rsi
adc     qword ptr [rdi + 16], rsi
adc qword ptr [rdi + 24], rsi
adc     qword ptr [rdi + 32], rsi
adc qword ptr [rdi + 40], rsi
adc     qword ptr [rdi + 48], rsi
adc     qword ptr [rdi + 56], rsi
ret
```

However, if you take the `carry` as a parameter, rather than initialising it to zero

```rust
pub fn f2(q: &mut [u64; 8], implicit: u64, mut carry: bool) {
    for i in 0 .. q.len() {
 let (r, o) = u64::carrying_add(q[i], implicit, carry);
        q[i] = r;
        carry = o;
    }
}
```

I would expect it to be compiled to 

```asm
add     dl, -1  ; Set carry flag to 0 if 
adc     qword ptr [rdi], rsi
adc     qword ptr [rdi + 8], rsi
adc     qword ptr [rdi + 16], rsi
adc     qword ptr [rdi + 24], rsi
adc     qword ptr [rdi + 32], rsi
adc     qword ptr [rdi + 40], rsi
adc     qword ptr [rdi + 48], rsi
adc     qword ptr [rdi + 56], rsi
ret
```

However llvm (via rustc) writes it instead in a very roundabout way:

```asm
mov     rax, qword ptr [rdi]
add     rax, rsi
setb    cl
mov     edx, edx
add     rdx, rax
setb al
or      al, cl
mov     qword ptr [rdi], rdx
add     al, -1
adc qword ptr [rdi + 8], rsi
adc     qword ptr [rdi + 16], rsi
adc     qword ptr [rdi + 24], rsi
adc     qword ptr [rdi + 32], rsi
adc     qword ptr [rdi + 40], rsi
adc     qword ptr [rdi + 48], rsi
adc     qword ptr [rdi + 56], rsi
```

This is a godbolt illustrating this. https://godbolt.org/z/9M1Tr4cGj

This is the equivalent godbolt, but in C++: https://godbolt.org/z/sKz5vrEjr . If you look at this one, you can switch to gcc to see how it gets it right.

All my attempts to nudge codegen into the optimsed form (e.g. by explicitly writing something like `let (dummy, o) = u64::carrying_add(0, u64::MAX, carry); carry = 0`) have failed. This interests me because the optimised codegen cuts an inner loop in a program of mine in half! :)
_______________________________________________
llvm-bugs mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs

Reply via email to