https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109892
Bug ID: 109892
Summary: SLP failure with explicit fma
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: amonakov at gcc dot gnu.org
Target Milestone: ---

At -O2 -mfma (x86) or -O3 (arm64) we fail to SLP-vectorize 'f', but succeed in 'g':

double f(double x[], long n)
{
    double r0 = 0, r1 = 0;
    for (; n; x += 2, n--) {
        r0 = __builtin_fma(x[0], x[0], r0);
        r1 = __builtin_fma(x[1], x[1], r1);
    }
    return r0 + r1;
}

static double muladd(double x, double y, double z)
{
    return x * y + z;
}

double g(double x[], long n)
{
    double r0 = 0, r1 = 0;
    for (; n; x += 2, n--) {
        r0 = muladd(x[0], x[0], r0);
        r1 = muladd(x[1], x[1], r1);
    }
    return r0 + r1;
}

It seems we are calling vectorizable_reduction for __builtin_fma even though it would not participate in a reduction when vectorizing for 16-byte vectors?