| Issue |
174214
|
| Summary |
Improving performance of 2xi64 icmp slt/sgt/ult/ugt on SSE2 using 2xi64 sub
|
| Labels |
new issue
|
| Assignees |
|
| Reporter |
johnplatts
|
Here are links to LLVM IR snippets demonstrating the two alternatives for 2xi64 icmp sgt on SSE2:
- https://alive2.llvm.org/ce/z/gtqKGW
- https://godbolt.org/z/1fa5zczna
Here is a more efficient implementation (at least according to llvm-mca) of 2xi64 icmp slt on SSE2:
```
SSE2_2xI64_CompareGt_2: # @SSE2_2xI64_CompareGt_2
movdqa xmm2, xmm0
pcmpgtd xmm2, xmm1
# xmm2 == {(int32_t)a[0] > (int32_t)b[0],
# (int32_t)(a[0] >> 32) > (int32_t)(b[0] >> 32),
# (int32_t)a[1] > (int32_t)b[1],
# (int32_t)(a[1] >> 32) > (int32_t)(b[1] >> 32)}
# (as 4xi32 vector)
movdqa xmm3, xmm0
pcmpeqd xmm3, xmm1
# xmm3 == {(int32_t)a[0] == (int32_t)b[0],
# (int32_t)(a[0] >> 32) == (int32_t)(b[0] >> 32),
# (int32_t)a[1] == (int32_t)b[1],
# (int32_t)(a[1] >> 32) == (int32_t)(b[1] >> 32)}
# (as 4xi32 vector)
psubq xmm1, xmm0
# xmm2 == {b[0] - a[0], b[1] - a[1]} (as 2xi64 vector)
# If (a[0] >> 32) == (b[0] >> 32) && a[0] <= b[0] is true, then
# (xmm1[0] >> 32) == 0 will be true (if a, b, and xmm1 are treated as
# 2xi64 vectors).
# If (a[0] >> 32) == (b[0] >> 32) && a[0] > b[0] is true, then
# (xmm1[0] >> 32) == -1 will be true (if a, b, and xmm1 are treated as
# 2xi64 vectors).
pand xmm1, xmm3
# If xmm1, a, and b are treated as 2xi64 vectors:
# (xmm1[0] >> 32) == (((a[0] >> 32) == (b[0] >> 32) && a[0] > b[0]) ?
# -1 : 0)
# (xmm1[1] >> 32) == (((a[1] >> 32) == (b[1] >> 32) && a[1] > b[1]) ?
# -1 : 0)
# (xmm2[0] >> 32) == (((a[0] >> 32) > (b[0] >> 32)) ? -1 : 0)
# (xmm2[1] >> 32) == (((a[1] >> 32) > (b[1] >> 32)) ? -1 : 0)
por xmm1, xmm2
# If xmm1, a, and b are treated as 2xi64 vectors:
# (xmm1[0] >> 32) == ((a[0] > b[0]) ? -1 : 0)
# (xmm1[1] >> 32) == ((a[1] > b[1]) ? -1 : 0)
pshufd xmm0, xmm1, 245 # xmm0 = xmm1[1,1,3,3]
# xmm0 == { a[0] > b[0] ? -1 : 0, a[1] > b[1] ? -1 : 0 }
# (as 2xi64 vector)
ret
```
If `(a >> 32) == (b >> 32)` is true, then `-2**32 + 1 <= b - a <= 2**32 - 1` will be true, making `(b - a) >> 32` equal to either -1 or 0 if `(a >> 32)` is equal to `(b >> 32)` (with right shifts being arithmetic right shifts and `a` and `b` being 64-bit signed integers).
_______________________________________________
llvm-bugs mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs