https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90073
Bug ID: 90073
Summary: Very slow code for AVX2
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: rcc.dark at gmail dot com
Target Milestone: ---

Created attachment 46155
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46155&action=edit
Source code and assembly output

Hi all, the following code generates very poor assembly for AVX2 targets (GCC 8.2, 8.3 and trunk; compiler flags -O3 -mavx2).

---
#include <x86intrin.h>

using data = long long __attribute__((vector_size(64)));

void f(data& a, const data& x1, const data& x2) {
    a ^= x1 ^ x2;
}
---

GCC generates 128-bit loads and stores, which leads to store-to-load forwarding (STLF) stalls. Possibly a duplicate of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80689

ICC generates the expected code. I attach the GCC and ICC outputs, but you can also check them here: https://godbolt.org/z/bwtGUE
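For comparison, the same operation can be written against two explicit 32-byte halves, which matches the 256-bit register width available with AVX2. This is only a minimal sketch and is not part of the attached source: the names `half` and `f_split` are hypothetical, and whether this form avoids the 128-bit accesses on a given GCC version has not been verified here.

```cpp
#include <cstring>

// 64-byte vector type from the report, plus a hypothetical 32-byte "half"
// type that corresponds to one 256-bit AVX2 register.
using data = long long __attribute__((vector_size(64)));
using half = long long __attribute__((vector_size(32)));

// Hypothetical variant of f(): processes the 64-byte vectors as two
// independent 32-byte halves. memcpy is used for the partial loads and
// stores so that no aliasing or alignment assumptions are introduced.
void f_split(data& a, const data& x1, const data& x2) {
    for (int i = 0; i < 2; ++i) {
        const std::size_t off = sizeof(half) * i;
        half va, v1, v2;
        std::memcpy(&va, reinterpret_cast<const char*>(&a) + off, sizeof(half));
        std::memcpy(&v1, reinterpret_cast<const char*>(&x1) + off, sizeof(half));
        std::memcpy(&v2, reinterpret_cast<const char*>(&x2) + off, sizeof(half));
        va ^= v1 ^ v2;  // element-wise XOR on four 64-bit lanes
        std::memcpy(reinterpret_cast<char*>(&a) + off, &va, sizeof(half));
    }
}
```

Comparing the assembly of `f` and `f_split` under the same flags (-O3 -mavx2) is one way to see whether the 128-bit splitting is tied to the 64-byte vector type itself.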