https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105966

Bug ID: 105966
Summary: x86: operations on certain few-element vectors yield very inefficient code
Product: gcc
Version: 12.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jbeulich at suse dot com
Target Milestone: ---

Such operations on vectors with more than one element, but too few elements to fill the minimal available register width, are decomposed into scalar FMA insns. While this may be on par for small element counts, it certainly generates absurd code for e.g. AVX512-FP16 with, say, 128- or 256-bit vectors when AVX512VL is not enabled.

It would be far more efficient to zero-extend the vectors to 512 bits (to avoid exceptions on the unused elements), emit the FMA insn on %zmm registers, and then use just the low part of the result. (The same likely applies to e.g. plain addition, subtraction, and multiplication.)

If necessary, the example code from bug 105965 can be re-used to easily see the odd behavior.
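For illustration only, here is a source-level sketch of the suggested transformation. It is not the reproducer from bug 105965. It assumes GNU C vector extensions and uses `float` rather than `_Float16`, so that it builds without AVX512-FP16 support. The widths (64 bits widened to 128 bits, standing in for 128/256 bits widened to 512 bits) and the names `v2sf`, `v4sf`, `fma_narrow` and `fma_widened` are hypothetical. Whether the multiply-add is actually contracted into an FMA insn depends on the target options and on `-ffp-contract`.

```c
/* Hypothetical types: a "few-element" vector and the next wider one. */
typedef float v2sf __attribute__((vector_size(8)));
typedef float v4sf __attribute__((vector_size(16)));

/* The operation as written by the user, on the narrow vector. */
v2sf fma_narrow(v2sf a, v2sf b, v2sf c)
{
    return a * b + c;
}

/* The same operation, widened by hand: zero-extend the inputs,
   operate on the wide vector, then keep only the low part. */
v2sf fma_widened(v2sf a, v2sf b, v2sf c)
{
    /* The upper lanes are zero, so they compute 0 * 0 + 0 and
       cannot raise floating-point exceptions. */
    v4sf wa = { a[0], a[1], 0.0f, 0.0f };
    v4sf wb = { b[0], b[1], 0.0f, 0.0f };
    v4sf wc = { c[0], c[1], 0.0f, 0.0f };

    v4sf wr = wa * wb + wc;

    /* Only the low two lanes carry the result. */
    v2sf r = { wr[0], wr[1] };
    return r;
}
```

Both functions should return the same values for the low lanes; the point of the report is that the compiler could perform the widening itself instead of splitting the operation into scalar insns.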