https://bugs.llvm.org/show_bug.cgi?id=39709
Bug ID: 39709
Summary: [X86] Suboptimal code in vXi8 vector multiply reduction
Product: libraries
Version: trunk
Hardware: PC
OS: Windows NT
Status: NEW
Severity: enhancement
Priority: P
Component: Backend: X86
Assignee: unassignedb...@nondot.org
Reporter: craig.top...@gmail.com
CC: craig.top...@gmail.com, llvm-bugs@lists.llvm.org,
llvm-...@redking.me.uk, spatel+l...@rotateright.com
Multiplying vXi8 vectors requires widening the elements to 16 bits so we can use
a vXi16 pmullw, then shrinking the results back to i8. As of r347240 we use
punpcklbw/punpckhbw to do the expansion, which leaves the upper byte of each
16-bit element undefined, and after the two pmullw we use an AND+PACKUSWB to
merge the low and high unpacked halves back together.
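A minimal scalar model of that lowering may help show why the AND is needed. It is a sketch, not LLVM code: each 128-bit register is modeled as a C struct of lanes, the helper names (unpack_self, pmullw, packus_lane, and_packus, mul_v16i8) are hypothetical, and the undefined upper byte is stood in for by unpacking the register with itself.

```c
#include <assert.h>
#include <stdint.h>

/* One 128-bit XMM register viewed as 16 x u8 or 8 x u16 lanes. */
typedef struct { uint8_t b[16]; } v16u8;
typedef struct { uint16_t w[8]; } v8u16;

/* PUNPCKLBW x, x (off = 0) or PUNPCKHBW x, x (off = 8): each word gets the
   source byte in its low half and a copy of it in the high half. The high
   half stands in for LLVM's undef: it never affects the final low byte. */
static v8u16 unpack_self(v16u8 x, int off) {
    v8u16 r;
    for (int i = 0; i < 8; i++) {
        uint8_t v = x.b[off + i];
        r.w[i] = (uint16_t)(v | (v << 8));
    }
    return r;
}

/* PMULLW: keep the low 16 bits of each 16x16-bit product. */
static v8u16 pmullw(v8u16 a, v8u16 b) {
    v8u16 r;
    for (int i = 0; i < 8; i++)
        r.w[i] = (uint16_t)((uint32_t)a.w[i] * b.w[i]);
    return r;
}

/* One PACKUSWB lane: reinterpret the word as i16, saturate to [0, 255]. */
static uint8_t packus_lane(uint16_t w) {
    int32_t s = (w & 0x8000) ? (int32_t)w - 0x10000 : (int32_t)w;
    return (uint8_t)(s < 0 ? 0 : s > 255 ? 255 : s);
}

/* PAND with 0x00FF in every word, then PACKUSWB lo, hi. The mask keeps
   every word in [0, 255], so saturation never fires and the pack truncates. */
static v16u8 and_packus(v8u16 lo, v8u16 hi) {
    v16u8 r;
    for (int i = 0; i < 8; i++) {
        r.b[i]     = packus_lane((uint16_t)(lo.w[i] & 0x00FF));
        r.b[8 + i] = packus_lane((uint16_t)(hi.w[i] & 0x00FF));
    }
    return r;
}

/* v16i8 multiply: unpack both inputs, two PMULLW, then AND+PACKUSWB. */
static v16u8 mul_v16i8(v16u8 a, v16u8 b) {
    v8u16 lo = pmullw(unpack_self(a, 0), unpack_self(b, 0));
    v8u16 hi = pmullw(unpack_self(a, 8), unpack_self(b, 8));
    return and_packus(lo, hi);
}
```

Without the AND, any lane whose 16-bit product exceeds 255 would saturate to 255 in PACKUSWB instead of wrapping like an i8 multiply.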
When we're doing a horizontal reduction, we end up packing after each step and
then unpacking again at the start of the next step. It would be great if we
could combine these redundant size changes away.
Some of the packs and unpacks are separated by the shuffles that move higher
elements down into lower elements for the reduction. We should see if we can
widen those element-movement shuffles as well.
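A rough scalar sketch of the two reduction shapes for a 16-byte vector follows. It is an illustration under stated assumptions, not LLVM's lowering: lanes are modeled as plain arrays, the function names are hypothetical, and reduce_widened is one reading of "combine these size changes away" plus widening the element-movement shuffles. It relies on the low byte of a product depending only on the low bytes of its inputs.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: product of all 16 bytes modulo 256 (i8 multiply wraps). */
static uint8_t reduce_ref(const uint8_t x[16]) {
    uint32_t p = 1;
    for (int i = 0; i < 16; i++)
        p = (p * x[i]) & 0xFF;
    return (uint8_t)p;
}

/* Shape described in the report: every step shuffles bytes, widens to
   16 bits, multiplies, then masks and packs back to bytes. */
static uint8_t reduce_pack_each_step(const uint8_t in[16]) {
    uint8_t v[16];
    for (int i = 0; i < 16; i++) v[i] = in[i];
    for (int half = 8; half >= 1; half /= 2) {
        uint8_t s[16] = {0};
        for (int i = 0; i < half; i++) s[i] = v[i + half]; /* byte shuffle: upper -> lower */
        for (int i = 0; i < half; i++) {
            uint16_t a = v[i], b = s[i];                    /* unpack to 16 bits */
            uint16_t prod = (uint16_t)((uint32_t)a * b);    /* pmullw */
            v[i] = (uint8_t)(prod & 0x00FF);                /* and + packuswb */
        }
    }
    return v[0];
}

/* Hypothetical combined shape: widen once, shuffle and multiply 16-bit
   lanes, truncate once. Bits above the low byte of each lane become junk,
   but they never feed into the low byte of later products. */
static uint8_t reduce_widened(const uint8_t in[16]) {
    uint16_t w[16];
    for (int i = 0; i < 16; i++) w[i] = in[i];              /* widen once */
    for (int half = 8; half >= 1; half /= 2) {
        uint16_t s[16] = {0};
        for (int i = 0; i < half; i++) s[i] = w[i + half];  /* word shuffle: upper -> lower */
        for (int i = 0; i < half; i++)
            w[i] = (uint16_t)((uint32_t)w[i] * s[i]);       /* pmullw, no mask or pack */
    }
    return (uint8_t)(w[0] & 0x00FF);                        /* truncate once */
}
```

Because both versions move the same element indices, a byte shuffle mask applies unchanged to the widened 16-bit lanes; only the element width of the shuffle changes.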
These patterns can be seen in vector-reduce-mul.ll.
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs