Issue 164200
Summary [X86] 8-bit vector multiplication should use shift and add method for more constants
Labels new issue
Assignees
Reporter WalterKruger
Vector multiplication by most 8-bit constants is currently implemented by widening the elements to 16 bits:
```asm
multiplyBy10_clang:
        movdqa  xmm1, xmm0
        punpckhbw       xmm1, xmm1
        movdqa  xmm2, xmmword ptr [rip + .LCPI0_0]
        pmullw  xmm1, xmm2
        movdqa  xmm3, xmmword ptr [rip + .LCPI0_1]
        pand    xmm1, xmm3
        punpcklbw       xmm0, xmm0
        pmullw  xmm0, xmm2
        pand    xmm0, xmm3
        packuswb xmm0, xmm1
        ret
```

However, a short sequence of shifts and adds is often more efficient, in terms of both code size and dependency-chain length. For example, `x * 10 = (x << 3) + (x << 1)`:
```asm
multiplyBy10_shiftAndAdd:
        movdqa  xmm1, xmm0
        paddb   xmm0, xmm0
        psllw   xmm1, 3
        pand    xmm1, xmmword ptr [rip + .LCPI0_0]
        paddb   xmm0, xmm1
        ret
```
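
The sequence above works because SSE2 has no 8-bit shift, so `psllw` shifts 16-bit lanes and the `pand` clears the bits that leak from each low byte into its neighbour. The C sketch below models a single 16-bit lane in scalar code to check that reasoning; it is not the code LLVM would emit. The helper names are hypothetical, and the per-byte mask value (`0xF8` for a shift of 3) is an assumption about what `.LCPI0_0` holds.

```c
#include <stdint.h>

/* Models psllw + pand on one 16-bit lane holding two packed bytes:
   shift the whole word, then clear the low n bits of each byte, which
   is where bits from the neighbouring byte would otherwise land. */
static uint16_t shl_bytes_in_word(uint16_t w, unsigned n) {
    uint8_t byte_mask = (uint8_t)(0xFFu << n);        /* 0xF8 when n == 3 (assumed constant) */
    uint16_t mask = (uint16_t)(byte_mask * 0x0101u);  /* replicate the mask into both bytes */
    return (uint16_t)((uint16_t)(w << n) & mask);
}

/* Models paddb on one 16-bit lane: each byte wraps independently,
   with no carry from the low byte into the high byte. */
static uint16_t add_bytes_in_word(uint16_t a, uint16_t b) {
    uint8_t lo = (uint8_t)((a & 0xFF) + (b & 0xFF));
    uint8_t hi = (uint8_t)((a >> 8) + (b >> 8));
    return (uint16_t)((hi << 8) | lo);
}

/* x * 10 = (x << 3) + (x << 1), applied to both bytes of the lane. */
static uint16_t mul10_bytes_in_word(uint16_t w) {
    uint16_t x2 = add_bytes_in_word(w, w);   /* paddb xmm0, xmm0   -> x * 2 */
    uint16_t x8 = shl_bytes_in_word(w, 3);   /* psllw 3, then pand -> x * 8 */
    return add_bytes_in_word(x2, x8);        /* paddb xmm0, xmm1   -> x * 10 */
}

/* Returns 1 if every possible pair of bytes matches the wrapped scalar product. */
static int mul10_matches_reference(void) {
    for (unsigned w = 0; w <= 0xFFFFu; ++w) {
        uint8_t lo = (uint8_t)((w & 0xFF) * 10u);
        uint8_t hi = (uint8_t)((w >> 8) * 10u);
        if (mul10_bytes_in_word((uint16_t)w) != (uint16_t)((hi << 8) | lo))
            return 0;
    }
    return 1;
}
```

Checking all 65536 byte pairs covers every way bits can cross the byte boundary inside a lane, so the same argument carries over to each 16-bit lane of the full vector.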

This method is already implemented, but only for constants that are close to a power of two. Notably, GCC always uses this method (although its sequences are often suboptimal).

https://godbolt.org/z/naKxr6z6a