https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94908
--- Comment #8 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Uroš Bizjak from comment #7)
> Created attachment 54607 [details]
> Proposed patch
>
> Patch in testing.
>
> Attached patch produces (-O2 -msse4.1):
>
> f:
>         subq    $24, %rsp
>         xorl    %eax, %eax
>         vmovaps %xmm0, (%rsp)
>         call    g
>         vmovaps (%rsp), %xmm1
>         addq    $24, %rsp
>         vinsertps       $64, %xmm0, %xmm1, %xmm0
>         ret

I'm thinking of something like the pattern below, so that it can be matched
both by expand_vselect_vconcat in ix86_expand_vec_perm_const_1 and
(theoretically) by patterns created by pass_combine.

+(define_insn_and_split "*sse4_1_insertps_1"
+  [(set (match_operand:VI4F_128 0 "register_operand")
+       (vec_select:VI4F_128
+         (vec_concat:<ssedoublevecmode>
+           (match_operand:VI4F_128 1 "register_operand")
+           (match_operand:VI4F_128 2 "register_operand"))
+         (match_parallel 3 "insertps_parallel"
+           [(match_operand 4 "const_int_operand")])))]
+  "TARGET_SSE4_1 && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"