https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94908

--- Comment #8 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Uroš Bizjak from comment #7)
> Created attachment 54607 [details]
> Proposed patch
> 
> Patch in testing.
> 
> Attached patch produces (-O2 -msse4.1):
> 
> f:
>         subq    $24, %rsp
>         xorl    %eax, %eax
>         vmovaps %xmm0, (%rsp)
>         call    g
>         vmovaps (%rsp), %xmm1
>         addq    $24, %rsp
>         vinsertps       $64, %xmm0, %xmm1, %xmm0
>         ret

I'm thinking of something like the pattern below, so that it can be matched
both by expand_vselect_vconcat in ix86_expand_vec_perm_const_1 and
(theoretically) by patterns created by pass_combine.

+(define_insn_and_split "*sse4_1_insertps_1"
+  [(set (match_operand:VI4F_128 0 "register_operand")
+       (vec_select:VI4F_128
+         (vec_concat:<ssedoublevecmode>
+           (match_operand:VI4F_128 1 "register_operand")
+           (match_operand:VI4F_128 2 "register_operand"))
+         (match_parallel 3 "insertps_parallel"
+           [(match_operand 4 "const_int_operand")])))]
+  "TARGET_SSE4_1 && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"

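To illustrate the shape such a pattern would match, here is a rough C model,
not GCC code. It assumes indices 0-3 of the parallel select from operand 1 and
4-7 from operand 2, and that an insertps-style permutation keeps operand 1 in
place except for one element taken from operand 2. The helper names
model_vselect_vconcat and insertps_imm_from_perm are hypothetical, and the
real insertps_parallel predicate may accept more cases (zero masks, for
instance).

```c
#include <stdbool.h>

/* Hypothetical model, not GCC code:
   vec_select (vec_concat (op1, op2), parallel [perm[0..3]])
   for 4-element vectors.  Indices 0-3 pick from op1, 4-7 from op2.  */
static void
model_vselect_vconcat (const float op1[4], const float op2[4],
                       const int perm[4], float out[4])
{
  for (int i = 0; i < 4; i++)
    out[i] = perm[i] < 4 ? op1[perm[i]] : op2[perm[i] - 4];
}

/* Hypothetical check (an assumption about what an insertps-style
   parallel looks like): exactly one element comes from op2, and every
   other element is op1's element at the same position.  On success,
   store the insertps immediate: source index in bits 7:6, destination
   index in bits 5:4, zero mask left clear.  */
static bool
insertps_imm_from_perm (const int perm[4], int *imm)
{
  int dst = -1;

  for (int i = 0; i < 4; i++)
    {
      if (perm[i] < 0 || perm[i] > 7)
        return false;           /* Index outside the concatenation.  */
      if (perm[i] >= 4)
        {
          if (dst >= 0)
            return false;       /* More than one element from op2.  */
          dst = i;
        }
      else if (perm[i] != i)
        return false;           /* op1 elements must stay in place.  */
    }

  if (dst < 0)
    return false;               /* Nothing is inserted from op2.  */

  *imm = ((perm[dst] - 4) << 6) | (dst << 4);
  return true;
}
```

Under these assumptions, the permutation {5, 1, 2, 3} (element 1 of operand 2
inserted into element 0 of operand 1) gives immediate 64, which lines up with
the vinsertps $64 in the quoted output.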