Issue 181455
Summary [X86] Vector 8-bit `icmp ugt + blend` with constant should use saturation arithmetic to avoid compare
Labels new issue
Assignees
Reporter WalterKruger
Due to gaps in x86 vector compare support, unsigned vector compares are implemented by checking whether one of the operands is equal to the maximum/minimum of the two (e.g. `a >= b` => `max(a, b) == a`). This method is often paired with `blendv`, which performs a conditional selection:

```asm
selectIfGreater:
        movdqa  xmm3, xmm0
        movdqa  xmm0, xmmword ptr [rip + .LCPI0]
        pminub  xmm0, xmm2
        pcmpeqb xmm0, xmm2
        pblendvb        xmm3, xmm1, xmm0
        movdqa  xmm0, xmm3
        ret
```

https://godbolt.org/z/4Gvoqrj1P

Blend only checks the most significant bit of each "mask" element, so a single unsigned saturating add/sub can emulate the compare, which saves one instruction. The method differs slightly depending on the value of the compare constant `C`, where `x` is the vector being compared:

```
(C < 127): blendv(a, b, addSat(x, 127 - C))
(C > 127): blendv(a, b, subSat(x, C - 127))
```
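The identity can be checked exhaustively with a scalar model of one byte lane. The following is only a sketch: the helper names (`add_sat_u8`, `sub_sat_u8`, `blendv_u8`, `ugt_mask_u8`, `count_mismatches`) are hypothetical and stand in for the per-byte behaviour of `paddusb`, `psubusb` and `pblendvb`. The `C == 127` case, which the two rules above do not list, is assumed to need no arithmetic at all, since the top bit of `x` is already the result of `x > 127`.

```c
#include <stdint.h>

/* Scalar stand-ins for one byte lane (hypothetical helper names). */
static uint8_t add_sat_u8(uint8_t a, uint8_t b) {
    unsigned s = (unsigned)a + b;
    return (uint8_t)(s > 255u ? 255u : s);   /* clamp at 255 */
}

static uint8_t sub_sat_u8(uint8_t a, uint8_t b) {
    return (uint8_t)(a > b ? a - b : 0);     /* clamp at 0 */
}

/* blendv selects b when the top bit of the mask byte is set, else a. */
static uint8_t blendv_u8(uint8_t a, uint8_t b, uint8_t mask) {
    return (mask & 0x80u) ? b : a;
}

/* Produces a byte whose top bit is set exactly when x > c (unsigned). */
static uint8_t ugt_mask_u8(uint8_t x, uint8_t c) {
    if (c < 127) return add_sat_u8(x, (uint8_t)(127 - c));
    if (c > 127) return sub_sat_u8(x, (uint8_t)(c - 127));
    return x;  /* c == 127: assumption, top bit of x is already the answer */
}

/* Counts the (x, c) pairs for which the trick disagrees with a real x > c. */
static int count_mismatches(void) {
    int bad = 0;
    for (unsigned c = 0; c < 256; ++c) {
        for (unsigned x = 0; x < 256; ++x) {
            uint8_t mask = ugt_mask_u8((uint8_t)x, (uint8_t)c);
            uint8_t got  = blendv_u8(0x11, 0x22, mask);
            uint8_t want = (x > c) ? 0x22 : 0x11;
            bad += (got != want);
        }
    }
    return bad;
}
```

Only the top bit of the mask is meaningful here; the remaining bits are not a valid all-ones/all-zeros compare result, so the mask is suitable for `blendv` but not for uses such as `pand`.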

This appears to be beneficial only for 8-bit elements, since that is the only element size with both a blendv of matching granularity and saturating arithmetic. (A slightly modified version can still benefit 64-bit elements: #181454)