https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113090
Bug ID: 113090
Summary: Suboptimal vector permutation for 64-bit vector.
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: liuhongt at gcc dot gnu.org
Target Milestone: ---

When working on PR113079, the loop vectorizer tries to reduce a sum of v2si with a permutation. For the following testcase:

typedef int v2si __attribute__((vector_size(8)));

v2si
foo (v2si a, v2si b)
{
  return __builtin_shufflevector (a, b, 1, 2);
}

the x86 backend generates

foo(int __vector(2), int __vector(2)):
        vpshufb xmm0, xmm0, XMMWORD PTR .LC0[rip]
        vpshufb xmm1, xmm1, XMMWORD PTR .LC1[rip]
        vpor    xmm0, xmm0, xmm1

But it can be better with

        .cfi_startproc
        punpcklqdq      %xmm1, %xmm0
        pshufd  $153, %xmm0, %xmm0
        ret
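
To check that the proposed two-instruction sequence really computes the same result as __builtin_shufflevector (a, b, 1, 2), i.e. { a[1], b[0] }, here is a minimal scalar sketch in plain C. It is not GCC code: the helper names model_punpcklqdq, model_pshufd and model_foo are hypothetical, and the upper 64 bits of the incoming registers are assumed to be zero purely for illustration (their contents do not affect the low 64-bit result).

```c
#include <stdint.h>

/* Scalar model of PUNPCKLQDQ src, dst (AT&T operand order): the low
   64 bits of dst (lanes 0-1) are kept, and the low 64 bits of src
   are placed in lanes 2-3 of dst.  */
void model_punpcklqdq(int32_t dst[4], const int32_t src[4])
{
    dst[2] = src[0];
    dst[3] = src[1];
}

/* Scalar model of PSHUFD with an 8-bit immediate: result lane i takes
   the source lane selected by bits [2i+1:2i] of imm8.  A temporary is
   used so that dst and src may be the same array.  */
void model_pshufd(int32_t dst[4], const int32_t src[4], unsigned imm8)
{
    int32_t tmp[4];
    for (int i = 0; i < 4; i++)
        tmp[i] = src[(imm8 >> (2 * i)) & 3];
    for (int i = 0; i < 4; i++)
        dst[i] = tmp[i];
}

/* Hypothetical model of the proposed code for foo: a arrives in the
   low half of xmm0 and b in the low half of xmm1.  The upper lanes are
   set to 0 here only as an assumption for this sketch.  */
void model_foo(const int32_t a[2], const int32_t b[2], int32_t out[2])
{
    int32_t xmm0[4] = { a[0], a[1], 0, 0 };
    int32_t xmm1[4] = { b[0], b[1], 0, 0 };

    model_punpcklqdq(xmm0, xmm1);   /* xmm0 = { a0, a1, b0, b1 } */
    model_pshufd(xmm0, xmm0, 153);  /* 153 = 0b10011001: lanes 1,2,1,2 */

    out[0] = xmm0[0];               /* a1 */
    out[1] = xmm0[1];               /* b0 */
}
```

The immediate 153 is 0x99, which decodes to the lane selectors 1, 2, 1, 2 (from lane 0 upward), so the low 64 bits of the result hold { a[1], b[0] }, matching shuffle indices 1 and 2 over the concatenation of a and b.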