| Field | Value |
|---|---|
| Issue | 164200 |
| Summary | [X86] 8-bit vector multiplication should use shift and add method for more constants |
| Labels | new issue |
| Assignees | |
| Reporter | WalterKruger |

Vector multiplication by most 8-bit constants is currently implemented by widening the elements to 16 bits:
```asm
multiplyBy10_clang:
movdqa xmm1, xmm0
punpckhbw xmm1, xmm1
movdqa xmm2, xmmword ptr [rip + .LCPI0_0]
pmullw xmm1, xmm2
movdqa xmm3, xmmword ptr [rip + .LCPI0_1]
pand xmm1, xmm3
punpcklbw xmm0, xmm0
pmullw xmm0, xmm2
pand xmm0, xmm3
packuswb xmm0, xmm1
ret
```
However, a short sequence of shifts and adds is often more efficient, both in code size and in dependency-chain length. For example, `x * 10 = (x << 3) + (x << 1)`:
```asm
multiplyBy10_shiftAndAdd:
movdqa xmm1, xmm0
paddb xmm0, xmm0
psllw xmm1, 3
pand xmm1, xmmword ptr [rip + .LCPI0_0]
paddb xmm0, xmm1
ret
```
This method is already implemented, but only for constants that are close to a power of two. Notably, GCC always uses this method (although its sequences are often suboptimal).
https://godbolt.org/z/naKxr6z6a
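
As a sanity check on the `x * 10` sequence, here is a minimal scalar sketch in C that models one 16-bit lane holding two packed 8-bit elements. It is an illustration only, not the code LLVM or GCC emits. The helper names (`add_bytes`, `mul10_lane`, `count_mul10_mismatches`) are hypothetical, and the mask value `0xF8` per byte is an assumption about the contents of `.LCPI0_0`, which the listing does not show. The model suggests why the `pand` is needed: `psllw` shifts whole 16-bit lanes, so the top three bits of the low byte would otherwise leak into the high byte.

```c
#include <stdint.h>

/* Models paddb on one 16-bit lane: each byte is added independently and
   wraps modulo 256, with no carry from the low byte into the high byte. */
static uint16_t add_bytes(uint16_t a, uint16_t b)
{
    uint16_t lo = (uint16_t)((a + b) & 0x00FFu);
    uint16_t hi = (uint16_t)(((a & 0xFF00u) + (b & 0xFF00u)) & 0xFF00u);
    return (uint16_t)(hi | lo);
}

/* Models the shift-and-add sequence for x * 10 on one 16-bit lane. */
static uint16_t mul10_lane(uint16_t x)
{
    /* paddb x, x: per-byte x << 1. */
    uint16_t x2 = add_bytes(x, x);
    /* psllw 3 shifts the whole lane; pand with 0xF8 per byte (assumed mask)
       clears the three bits that crossed from the low byte into the high byte. */
    uint16_t x8 = (uint16_t)((uint16_t)(x << 3) & 0xF8F8u);
    /* paddb: per-byte (x << 1) + (x << 3). */
    return add_bytes(x2, x8);
}

/* Compares the model against a plain per-byte multiply for every lane value. */
static int count_mul10_mismatches(void)
{
    int mismatches = 0;
    for (uint32_t v = 0; v <= 0xFFFFu; ++v) {
        uint16_t r = mul10_lane((uint16_t)v);
        uint8_t lo = (uint8_t)(v & 0xFFu);
        uint8_t hi = (uint8_t)(v >> 8);
        if ((uint8_t)(r & 0xFFu) != (uint8_t)(lo * 10u)) ++mismatches;
        if ((uint8_t)(r >> 8) != (uint8_t)(hi * 10u)) ++mismatches;
    }
    return mismatches;
}
```

Under these assumptions, `count_mul10_mismatches()` should return 0, meaning the masked shift-and-add agrees with an 8-bit wrapping multiply for both bytes of every possible lane value.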