https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119464
Bug ID: 119464
Summary: VEC_PERM_EXPR not optimized to pslldq instruction for
AVX2 and AVX512BW
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: mkretz at gcc dot gnu.org
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*
Test case (https://compiler-explorer.com/z/Pro5W6e4f):
---
typedef unsigned long long V2 __attribute__((vector_size(16)));
typedef unsigned long long V4 __attribute__((vector_size(32)));
typedef unsigned long long V8 __attribute__((vector_size(64)));
V2 shift(V2 x)
{ return __builtin_shufflevector(x, V2(), 2, 0); }
V4 shift(V4 x)
{ return __builtin_shufflevector(x, V4(), 4, 0, 4, 2); }
V8 shift(V8 x)
{ return __builtin_shufflevector(x, V8(), 8, 0, 8, 2, 8, 4, 8, 6); }
---
Clang translates this to the expected
shift(unsigned long long vector[2]):
  vpslldq xmm0, xmm0, 8
  ret
shift(unsigned long long vector[4]):
  vpslldq ymm0, ymm0, 8
  ret
shift(unsigned long long vector[8]):
  vpslldq zmm0, zmm0, 8
  ret
GCC only recognizes vpslldq for vector_size(16); the corresponding patterns
for vector_size(32) (AVX2) and vector_size(64) (AVX512BW) are missing.
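
For illustration, the property that the three permutations share can be
sketched as a small index check. This is not GCC code: the helper name
is_lanewise_shift_left and its parameters are hypothetical, and it assumes
the caller already knows that the second shuffle operand is all zeros. It
only shows that each selector above is a left shift by one 64-bit element
within every 128-bit lane, which is what vpslldq with an immediate of 8 does.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of GCC). `idx` holds the
// __builtin_shufflevector indices for two n-element operands, where the
// second operand (indices n .. 2n-1) is assumed to be all zeros.
// Returns true if the selector is a left shift by `shift` elements inside
// each lane of `elems_per_lane` elements, with zeros shifted in.
bool is_lanewise_shift_left(const std::vector<int>& idx,
                            std::size_t elems_per_lane,
                            std::size_t shift)
{
    const std::size_t n = idx.size();
    if (elems_per_lane == 0 || n % elems_per_lane != 0 || shift > elems_per_lane)
        return false;

    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t pos = i % elems_per_lane;  // position within the lane
        const std::size_t lane_base = i - pos;       // first element of the lane
        if (pos < shift) {
            // Low positions must take any element of the zero operand.
            if (idx[i] < static_cast<int>(n) || idx[i] >= static_cast<int>(2 * n))
                return false;
        } else {
            // Remaining positions must take the element `shift` slots lower
            // in the same lane of the first operand.
            if (idx[i] != static_cast<int>(lane_base + pos - shift))
                return false;
        }
    }
    return true;
}
```

With two 64-bit elements per 128-bit lane and a shift of one element, the
check accepts all three selectors from the test case. It rejects a selector
such as {4, 0, 1, 2}, which moves an element across the lane boundary and so
cannot be expressed as a single vpslldq.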