https://bugs.llvm.org/show_bug.cgi?id=39709

            Bug ID: 39709
           Summary: [X86] Suboptimal code in vXi8 vector multiply
                    reduction
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Windows NT
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Backend: X86
          Assignee: unassignedb...@nondot.org
          Reporter: craig.top...@gmail.com
                CC: craig.top...@gmail.com, llvm-bugs@lists.llvm.org,
                    llvm-...@redking.me.uk, spatel+l...@rotateright.com

Multiplying vXi8 vectors requires widening the elements to 16 bits so we can
use a vXi16 pmullw, then shrinking the results back to i8. As of r347240 we use
punpcklbw/punpckhbw to do the expansion, which leaves the upper half of each
widened element undef, and we use an AND+PACKUS to merge the high and low
unpacked values back together after the two pmullw.
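
As a rough illustration of that sequence, here is a scalar model in Python of
one v16i8 multiply. It is only a sketch of the lane semantics, not the code the
backend emits: the helper names are made up for this example, and the undef
upper bytes are modelled as random garbage to show that they cannot affect the
low 8 bits of each product.

```python
import random

def unpack_bytes(v, rng):
    # Model punpcklbw/punpckhbw with an undef second operand: each i8 ends up
    # in the low byte of a 16-bit lane, and the high byte is arbitrary garbage.
    return [(rng.randrange(256) << 8) | b for b in v]

def pmullw(a, b):
    # Lane-wise 16-bit multiply that keeps the low 16 bits of each product.
    return [(x * y) & 0xFFFF for x, y in zip(a, b)]

def and_packus(lo, hi):
    # AND each lane with 0x00FF, then pack with unsigned saturation.
    # After the mask every lane is <= 255, so the saturation never triggers.
    return [min(x & 0x00FF, 255) for x in lo + hi]

def mul_v16i8(a, b, rng):
    # Hypothetical helper: split into low/high halves, widen, multiply, merge.
    half = len(a) // 2
    lo = pmullw(unpack_bytes(a[:half], rng), unpack_bytes(b[:half], rng))
    hi = pmullw(unpack_bytes(a[half:], rng), unpack_bytes(b[half:], rng))
    return and_packus(lo, hi)
```

The AND is what makes the garbage harmless: (a + 256*x) * (b + 256*y) is
congruent to a*b modulo 256, so the low byte of each 16-bit product is the
correct i8 result regardless of x and y.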

When we're doing a horizontal reduction we end up packing after each step and
then unpacking at the start of the next step. It would be great if we could
combine these size changes away.
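
A small Python model of the two strategies, assuming a power-of-two element
count and treating each reduction step as "multiply the low half by the high
half". The function names and the round-trip counter are illustrative only;
this is not taken from the backend or from the test file.

```python
def reduce_mul_repacking(v):
    # Current behaviour (modelled): every step widens, multiplies in 16-bit
    # lanes, then narrows back to i8 before the next step.
    v = list(v)
    round_trips = 0
    while len(v) > 1:
        half = len(v) // 2
        lo, hi = v[:half], v[half:]                       # shuffle high half down
        wide = [(x * y) & 0xFFFF for x, y in zip(lo, hi)]  # unpack + pmullw
        v = [w & 0xFF for w in wide]                      # AND + PACKUS
        round_trips += 1
    return v[0], round_trips

def reduce_mul_widened(v):
    # Desired behaviour (modelled): widen once, stay in 16-bit lanes for the
    # whole reduction, and narrow only the final result.
    wide = list(v)
    while len(wide) > 1:
        half = len(wide) // 2
        wide = [(x * y) & 0xFFFF for x, y in zip(wide[:half], wide[half:])]
    return wide[0] & 0xFF
```

Both functions return the same i8 product, but for 16 elements the first one
performs four widen/narrow round trips where the second needs a single widen
and a single narrow.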

Some of the packs and unpacks are separated by the shuffles that move data from
higher elements to lower elements for the reduction. We should see if we can
widen those element-movement shuffles as well.

These things can be seen in vector-reduce-mul.ll

-- 
You are receiving this mail because:
You are on the CC list for the bug.
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
