https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91142

            Bug ID: 91142
           Summary: Incorrect aligned vector load instruction emitted
                    because of vinserti32x4 elision
           Product: gcc
           Version: 9.1.0
            Status: UNCONFIRMED
          Keywords: wrong-code
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kretz at kde dot org
  Target Milestone: ---
            Target: x86_64-*-*, i?86-*-*

Testcase (cf. https://godbolt.org/z/xBEtqT):

#include <x86intrin.h>

alignas(32) long mem[100] = {};

__m128i f()
{
  __m128i r{};
  __builtin_memcpy(&r, &mem[1], sizeof(r));
  return r;
}

__m512i g()
{
  return _mm512_inserti32x4(__m512i(), f(), 0);
}

Compile with `-O2 -march=knl` or `-O2 -march=skylake-avx512`. With GCC 9.1.0,
`g()` is incorrectly translated to an aligned load, even though `f()` is
correctly translated to an unaligned load. The issue is not present on GCC
trunk. GCC 8 and earlier predate PR85480, which introduced the optimization
that elides the vinserti32x4 instruction.
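
For illustration only (this is not part of the original testcase), here is a minimal sketch of why an aligned load is wrong here. `mem` is 32-byte aligned, so `&mem[1]` sits `sizeof(long)` bytes past that boundary and is not 16-byte aligned. An aligned 128-bit load (for example `vmovdqa`) would need a 16-byte-aligned address. The helper names `misalignment` and `aligned_load_is_safe` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Same declaration as in the testcase above.
alignas(32) long mem[100] = {};

// Hypothetical helper: how many bytes p lies past the previous
// `boundary`-byte boundary (0 means p is aligned to `boundary`).
static std::size_t misalignment(const void* p, std::size_t boundary)
{
  return reinterpret_cast<std::uintptr_t>(p) % boundary;
}

// Hypothetical helper: true only if a 16-byte aligned load from
// &mem[1] would be valid, i.e. the address is a multiple of 16.
static bool aligned_load_is_safe()
{
  return misalignment(&mem[1], 16) == 0;
}
```

On the targets named in this report, `misalignment(&mem[1], 16)` equals `sizeof(long)` (8 on x86_64, 4 on i?86 with the usual ABIs), so `aligned_load_is_safe()` returns false and only an unaligned load is valid for `&mem[1]`.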
