https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
So

__m128d f(__m128d a, __m128d b)
{
  return _mm_mul_pd(_mm_shuffle_pd(a, a, 0), _mm_shuffle_pd(b, b, 0));
}

is expanded as

  _3 = VEC_PERM_EXPR <b_2(D), b_2(D), { 0, 0 }>;
  _5 = VEC_PERM_EXPR <a_4(D), a_4(D), { 0, 0 }>;
  _6 = _3 * _5;
  return _6;

but vector lowering's ssa_uniform_vector_p doesn't yet handle VEC_PERM_EXPRs
with an all-zero permute.  Hacking that in (not fixing the fallout) produces

  <bb 2> [local count: 1073741824]:
  _7 = BIT_FIELD_REF <b_2(D), 64, 0>;
  _8 = BIT_FIELD_REF <a_4(D), 64, 0>;
  _9 = _7 * _8;
  _6 = {_9, _9};

and

f:
.LFB534:
        .cfi_startproc
        mulsd   %xmm1, %xmm0
        unpcklpd        %xmm0, %xmm0
        ret