https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105966

            Bug ID: 105966
           Summary: x86: operations on certain few-element vectors yield
                    very inefficient code
           Product: gcc
           Version: 12.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jbeulich at suse dot com
  Target Milestone: ---

Fused multiply-add operations on vectors with more than one element, but too
few elements to fill the narrowest register width available for the operation,
are decomposed into scalar FMA insns. While this may be roughly on par with
vector code for small element counts, it certainly generates absurd code for,
e.g., AVX512-FP16 with 128- or 256-bit vectors when AVX512VL is not enabled. It
would be far more efficient to zero-extend the vectors to 512 bits (so that the
unused elements cannot raise exceptions), emit the FMA insn on %zmm registers,
and then use just the low part of the result. (The same likely applies to plain
addition, subtraction, and multiplication.)
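A source-level sketch of the suggested zero-extend, compute, truncate strategy,
written with GCC vector extensions. Assumptions: float stands in for _Float16
(so it builds without AVX512-FP16), a 2-element vector stands in for the narrow
case, and the names v2sf, v16sf, widen, narrow and fma_via_wide are
hypothetical. It only illustrates the data flow; whether GCC actually emits a
single %zmm instruction for the wide expression depends on the target flags
(e.g. -mavx512f) and on -ffp-contract.

```c
#include <string.h>

/* Hypothetical types. Narrow vector: 2 floats (64 bits), less than one %xmm
   register. Wide vector: 16 floats (512 bits), the width of a %zmm register.
   float is used in place of _Float16 purely so the sketch is portable. */
typedef float v2sf  __attribute__((vector_size(8)));
typedef float v16sf __attribute__((vector_size(64)));

/* Zero-extend: copy the narrow vector into the low lanes of an all-zero wide
   vector. The unused lanes hold 0.0f, and 0 * 0 + 0 raises no FP exception. */
v16sf widen(v2sf x)
{
    v16sf w = {0};              /* all 16 lanes start as 0.0f */
    memcpy(&w, &x, sizeof x);   /* lanes 0 and 1 receive x[0] and x[1] */
    return w;
}

/* Truncate: keep only the low lanes of the wide result. */
v2sf narrow(v16sf w)
{
    v2sf x;
    memcpy(&x, &w, sizeof x);
    return x;
}

/* a * b + c done as one full-width vector expression on the zero-extended
   inputs, instead of one scalar operation per element. */
v2sf fma_via_wide(v2sf a, v2sf b, v2sf c)
{
    v16sf r = widen(a) * widen(b) + widen(c);
    return narrow(r);
}
```

For example, fma_via_wide((v2sf){1.5f, -2.0f}, (v2sf){4.0f, 3.0f},
(v2sf){0.5f, 1.0f}) yields {6.5f, -5.0f}, while lanes 2 to 15 of the wide
intermediate stay 0.0f.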

If needed, the example code from bug 105965 can be reused to observe the odd
behavior easily.
