https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80015

            Bug ID: 80015
           Summary: auto vectorization leave scalar code even if it is
                    unreachable
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vanyacpp at gmail dot com
  Target Milestone: ---

Consider these two versions of dot_product, selected by enabling either line
(1) or line (2):

#include <cstdlib>

float dot_product(float const* a,
                  float const* b,
                  size_t n)
{
    a = (float const*)__builtin_assume_aligned(a, 16);
    b = (float const*)__builtin_assume_aligned(b, 16);

    if ((n % 4) != 0)
        return 0.;                    // (1)
//      __builtin_unreachable();      // (2)

    float result = 0.f;

    for (size_t i = 0; i != n; ++i)
        result += a[i] * b[i];

    return result;
}

The code should be compiled with flags -O3 -ffast-math.

In version (1) the function returns 0. when n is not a multiple of 4; in
version (2) __builtin_unreachable() is invoked instead. Version (2) is
optimized to the point where only packed operations are used. In version (1)
scalar operations are still emitted, even though the loop can only be reached
when n is a multiple of 4.

The expected behavior is that gcc does not emit scalar operations in either
version.
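
For easier side-by-side comparison, the two versions can also be put into one
translation unit. This is only a sketch of the testcase above: the function
names dot_product_return and dot_product_unreachable are hypothetical and do
not appear in the original code; the bodies are otherwise the same as (1) and
(2).

```cpp
#include <cstddef>

// Version (1): early return when n is not a multiple of 4.
// (Hypothetical name, used only to tell the two versions apart.)
float dot_product_return(float const* a, float const* b, std::size_t n)
{
    a = static_cast<float const*>(__builtin_assume_aligned(a, 16));
    b = static_cast<float const*>(__builtin_assume_aligned(b, 16));

    if ((n % 4) != 0)
        return 0.f;

    float result = 0.f;
    for (std::size_t i = 0; i != n; ++i)
        result += a[i] * b[i];
    return result;
}

// Version (2): the compiler is told that n % 4 != 0 cannot happen.
// Calling this with such an n is undefined behavior.
// (Hypothetical name, used only to tell the two versions apart.)
float dot_product_unreachable(float const* a, float const* b, std::size_t n)
{
    a = static_cast<float const*>(__builtin_assume_aligned(a, 16));
    b = static_cast<float const*>(__builtin_assume_aligned(b, 16));

    if ((n % 4) != 0)
        __builtin_unreachable();

    float result = 0.f;
    for (std::size_t i = 0; i != n; ++i)
        result += a[i] * b[i];
    return result;
}
```

Compiling this with g++ -O3 -ffast-math -S and comparing the assembly of the
two functions should show the difference described above.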
