https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104946
Bug ID: 104946
Summary: [12 regression] Suboptimal gimple folding for blendvpd under sse4.1
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: crazylht at gmail dot com
Target Milestone: --

While working on PR104666, I found the following:

$ cat test.c
typedef double __m128d __attribute__((__vector_size__(16), __may_alias__));

__m128d sse4_1_blendvpd (__m128d a, __m128d b, __m128d c) __attribute__((__target__("sse4.1")));

__m128d
generic_blendvpd (__m128d a, __m128d b, __m128d c)
{
  return __builtin_ia32_blendvpd (a, b, c);
}

$ gcc -O2 -msse4.1 -mno-sse4.2

generic_blendvpd:
        movq    rax, xmm2
        movapd  xmm3, xmm0
        test    rax, rax
        jns     .L3
        movapd  xmm0, xmm1
.L3:
        pextrq  rax, xmm2, 1
        unpckhpd        xmm3, xmm3
        test    rax, rax
        jns     .L5
        unpckhpd        xmm1, xmm1
        movapd  xmm3, xmm1
.L5:
        unpcklpd        xmm0, xmm3
        ret

This happens because pcmpgtq is an SSE4.2 instruction. Without SSE4.2, vec_cmpv2di is lowered to scalar operations (one sign test per 64-bit lane, as above), and those are never combined back into a single blendvpd.

With SSE4.2 enabled, GCC generates the optimal code:

generic_blendvpd:
        movapd  xmm3, xmm0
        movdqa  xmm0, xmm2
        blendvpd        xmm3, xmm1, xmm0
        movapd  xmm0, xmm3
        ret
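For reference, blendvpd selects, for each 64-bit lane, the element of b when the sign bit of the corresponding lane of the mask c is set, and the element of a otherwise. The portable C sketch below models that per-lane behavior with a signed 64-bit compare against zero, which is roughly what the scalar-lowered vec_cmpv2di turns into without SSE4.2 (the movq/pextrq + test/jns pairs in the -mno-sse4.2 output). `blendvpd_ref` is a hypothetical helper written for illustration; it is not a GCC builtin or API, and it assumes IEEE-754 doubles that are 64 bits wide.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar reference model of blendvpd (not a GCC API).
 * For each of the two 64-bit lanes, pick b[i] if the sign bit of the
 * mask lane c[i] is set, otherwise keep a[i]. */
static void blendvpd_ref(const double a[2], const double b[2],
                         const double c[2], double out[2])
{
    assert(a && b && c && out);
    for (int i = 0; i < 2; i++) {
        int64_t mask;
        /* Reinterpret the mask lane's bits as a signed 64-bit integer,
         * similar to `movq rax, xmm2` / `pextrq rax, xmm2, 1`. */
        memcpy(&mask, &c[i], sizeof mask);
        /* Signed compare against zero, similar to `test rax, rax; jns`:
         * a negative value means the sign bit is set. */
        out[i] = (mask < 0) ? b[i] : a[i];
    }
}
```

Note that a mask lane of -0.0 selects b, because only the sign bit matters. A floating-point comparison such as `c[i] < 0.0` would be false for -0.0, which is why the lowered code tests the integer sign instead.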