Issue 52866
Summary: Missed horizontal reduction in armv8
Labels: new issue
Assignees:
Reporter: uncleasm
The following code was compiled with `-O2` or `-O3` using clang-13.0.0.0:

```
#include <cstdint>
#include <algorithm>

using veci = int32_t __attribute__((vector_size(16)));

int32_t maxv(veci a) {
    return std::max(std::max(a[0], a[1]), std::max(a[2],a[3]));
}
```

It compiles to:
```
maxv(int __vector(4)):                           // @maxv(int __vector(4))
        mov     w8, v0.s[1]
        fmov    w11, s0
        mov     w9, v0.s[2]
        mov     w10, v0.s[3]
        cmp     w11, w8
        csel    w8, w8, w11, lt
        cmp     w9, w10
        csel    w9, w10, w9, lt
        cmp     w8, w9
        csel    w0, w9, w8, lt
        ret
```
whereas it could be compiled to:

```
maxv(int __vector(4)):                           // @maxv(int __vector(4))
        smaxv   s0, v0.4s
        fmov    w0, s0
        ret
```
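Until the backend recognizes this pattern, one possible workaround is to request the reduction explicitly. The sketch below assumes an AArch64 target with `<arm_neon.h>`, where the ACLE intrinsic `vmaxvq_s32` computes a signed maximum across all four lanes and is expected to map to `smaxv`. `maxv_explicit` is a hypothetical name, and other targets fall back to the scalar expression from the report.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

using veci = int32_t __attribute__((vector_size(16)));

// Hypothetical helper: horizontal signed max of four int32 lanes.
int32_t maxv_explicit(veci a) {
#if defined(__aarch64__)
    int32x4_t v;
    std::memcpy(&v, &a, sizeof v);  // reinterpret the GCC-style vector as a NEON register
    return vmaxvq_s32(v);           // across-vector signed max (SMAXV on AArch64)
#else
    // Portable fallback: the same scalar expression as maxv() in the report.
    return std::max(std::max(a[0], a[1]), std::max(a[2], a[3]));
#endif
}
```

The `memcpy` keeps the conversion between the two vector types well defined; compilers typically turn it into a no-op register move.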

In contrast, the x64 backend (with `-msse4`) performs the cross-lane comparison using shuffles, a technique that would be available on armv8 as well (e.g. `b = vextq_s32(a,a,2); a = vmaxq_s32(a,b); b = vextq_s32(a,a,1); a = vmaxq_s32(a,b);`).
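To illustrate why the shuffle sequence produces the same result as the scalar code, the sketch below emulates the `vextq_s32`/`vmaxq_s32` steps lane by lane with `std::array`, so it runs on any target. `Lanes`, `ext`, `lane_max`, `hmax_shuffle` and `hmax_ref` are hypothetical names; the emulation only mirrors the lane semantics of the intrinsics and is not how a backend would actually emit them.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Portable stand-in for a 4 x int32 NEON register (int32x4_t).
using Lanes = std::array<int32_t, 4>;

// Emulates vextq_s32(a, b, n): lanes a[n..3] followed by b[0..n-1].
// With a == b this rotates the vector by n lanes.
Lanes ext(const Lanes& a, const Lanes& b, int n) {
    Lanes r{};
    for (int i = 0; i < 4; ++i) {
        int j = i + n;
        r[i] = j < 4 ? a[j] : b[j - 4];
    }
    return r;
}

// Emulates vmaxq_s32(a, b): lane-wise signed maximum.
Lanes lane_max(const Lanes& a, const Lanes& b) {
    Lanes r{};
    for (int i = 0; i < 4; ++i) r[i] = std::max(a[i], b[i]);
    return r;
}

// Shuffle-based horizontal max: two rotate+max steps for four lanes.
int32_t hmax_shuffle(Lanes a) {
    Lanes b = ext(a, a, 2);  // b = {a2, a3, a0, a1}
    a = lane_max(a, b);      // a = {max(a0,a2), max(a1,a3), max(a0,a2), max(a1,a3)}
    b = ext(a, a, 1);        // rotate by one lane
    a = lane_max(a, b);      // every lane now holds the overall maximum
    return a[0];
}

// Scalar reference matching maxv() from the report.
int32_t hmax_ref(const Lanes& a) {
    return std::max(std::max(a[0], a[1]), std::max(a[2], a[3]));
}
```

For example, `hmax_shuffle({3, -7, 42, 5})` returns 42, the same value as `hmax_ref`. Each rotate-and-max step halves the number of distinct candidates, so four lanes need log2(4) = 2 steps.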

Another case highlighting the missed vectorised comparison is:

```
#include <cmath>
using vecf = float __attribute__((vector_size(16)));

bool isfinite_ref(vecf a) {
    return std::isfinite(a[0]) &
           std::isfinite(a[1]) &
           std::isfinite(a[2]) &
           std::isfinite(a[3]);
}
```

This compiles to very verbose assembly on armv8, whereas the Intel SSE2 version checks all four lanes in parallel.
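A lane-parallel formulation of the same check can be sketched with the vector extension from the report: reinterpret each float as its bit pattern, test whether the exponent field is all ones (exactly the infinity/NaN case), and AND the per-lane results together. This is a hedged illustration of the kind of code a vectorizer could produce, not a description of what the SSE2 backend actually emits; `isfinite_lanes` is a hypothetical name, and IEEE 754 binary32 `float` is assumed.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

using vecf = float __attribute__((vector_size(16)));
using veci = int32_t __attribute__((vector_size(16)));

// Scalar reference from the report.
bool isfinite_ref(vecf a) {
    return std::isfinite(a[0]) &
           std::isfinite(a[1]) &
           std::isfinite(a[2]) &
           std::isfinite(a[3]);
}

// Hypothetical lane-parallel version (assumes IEEE 754 binary32 floats).
bool isfinite_lanes(vecf a) {
    veci bits;
    std::memcpy(&bits, &a, sizeof bits);          // view each lane's bit pattern
    const int32_t exp_mask = 0x7f800000;          // exponent field of a binary32
    veci finite = (bits & exp_mask) != exp_mask;  // per lane: -1 if finite, 0 if inf/NaN
    // Horizontal AND of the lane masks: nonzero only if every lane is finite.
    return (finite[0] & finite[1] & finite[2] & finite[3]) != 0;
}
```

Only the last line is a horizontal reduction; the mask and compare operate on all four lanes at once.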
_______________________________________________
llvm-bugs mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs