https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109892

            Bug ID: 109892
           Summary: SLP failure with explicit fma
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: amonakov at gcc dot gnu.org
  Target Milestone: ---

At -O2 -mfma (x86) or -O3 (arm64), we fail to SLP-vectorize 'f' but succeed
for 'g':

double f(double x[], long n)
{
    double r0 = 0, r1 = 0;
    for (; n; x += 2, n--) {
        r0 = __builtin_fma(x[0], x[0], r0);
        r1 = __builtin_fma(x[1], x[1], r1);
    }
    return r0 + r1;
}
static double muladd(double x, double y, double z)
{
    return x * y + z;
}
double g(double x[], long n)
{
    double r0 = 0, r1 = 0;
    for (; n; x += 2, n--) {
        r0 = muladd(x[0], x[0], r0);
        r1 = muladd(x[1], x[1], r1);
    }
    return r0 + r1;
}

It seems we are calling vectorizable_reduction for __builtin_fma even though it
would not participate in a reduction when vectorizing for 16-byte vectors?
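
For reference, here is a minimal hand-written sketch of the two-lane form one
would expect SLP to reach for both functions with 16-byte vectors. It is not
taken from any compiler dump: the name 'h' and the 'v2df' typedef are
hypothetical, and it relies on the GCC/Clang generic vector extension. Like
'g', it writes the multiply-add as x * y + z, so whether a fused vector FMA is
actually emitted depends on the target flags and the -ffp-contract setting.

```c
#include <string.h>

/* Two doubles in one 16-byte vector (GCC/Clang generic vector extension).
   The typedef name is an assumption made for this sketch. */
typedef double v2df __attribute__((vector_size(16)));

/* Hypothetical hand-vectorized counterpart of 'f' and 'g': lane 0 plays the
   role of r0 and lane 1 the role of r1. */
double h(const double x[], long n)
{
    v2df acc = {0.0, 0.0};
    for (; n; x += 2, n--) {
        v2df v;
        memcpy(&v, x, sizeof v);   /* unaligned load of x[0] and x[1] */
        acc = v * v + acc;         /* per-lane multiply-add; may be contracted */
    }
    /* Same final scalar add as 'return r0 + r1' in the original functions. */
    return acc[0] + acc[1];
}
```

Each lane only ever combines with itself across iterations, which is the
property that makes the pair of accumulators a natural fit for one vector
register.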
