[Bug target/119368] immintrin code running slower with gcc than clang

hubicka at gcc dot gnu.org via Gcc-bugs Mon, 24 Mar 2025 12:52:49 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119368


--- Comment #5 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Thinking of it more, I think enabling memory alternatives in

(define_insn "sse4_1_<code>v4hiv4si2<mask_name>"
  [(set (match_operand:V4SI 0 "register_operand" "=Yr,*x,v")
    (any_extend:V4SI
      (vec_select:V4HI
        (match_operand:V8HI 1 "register_operand" "Yr,*x,v")
        (parallel [(const_int 0) (const_int 1)
               (const_int 2) (const_int 3)]))))]
  "TARGET_SSE4_1 && <mask_avx512vl_condition>"
  "%vpmov<extsuffix>wd\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
  [(set_attr "isa" "noavx,noavx,avx")
   (set_attr "type" "ssemov")
   (set_attr "prefix_extra" "1")
   (set_attr "prefix" "orig,orig,maybe_evex")
   (set_attr "mode" "TI")])

would also let LRA spill the source to memory, so it is a good idea to include
them in the pattern regardless of whether the combiner is able to simplify a
vec_select of a MEM.
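
For reference, a minimal C sketch of the kind of source that reaches this
pattern, assuming the usual mapping of _mm_cvtepu16_epi32 to pmovzxwd. The
helper name widen4_u16_to_u32 is hypothetical and not taken from the bug
report, and the scalar fallback is only there so the sketch builds on non-x86
hosts. Whether the 64-bit load is actually folded into the instruction depends
on the compiler version and flags.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <emmintrin.h>   /* SSE2: _mm_loadl_epi64, _mm_storeu_si128 */
#include <smmintrin.h>   /* SSE4.1: _mm_cvtepu16_epi32 */

/* Hypothetical helper: zero-extend four uint16_t values to four uint32_t. */
__attribute__((target("sse4.1")))
static void widen4_u16_to_u32(const uint16_t *src, uint32_t *dst)
{
    /* 64-bit load into the low half of an XMM register. */
    __m128i lo = _mm_loadl_epi64((const __m128i *)src);
    /* pmovzxwd: only the low four 16-bit lanes are read, which is what the
       vec_select of elements 0..3 in the pattern expresses.  With a memory
       alternative the load above could be the instruction's operand. */
    __m128i wide = _mm_cvtepu16_epi32(lo);
    _mm_storeu_si128((__m128i *)dst, wide);
}
#else
/* Scalar fallback with the same semantics, for non-x86 hosts only. */
static void widen4_u16_to_u32(const uint16_t *src, uint32_t *dst)
{
    for (int i = 0; i < 4; i++)
        dst[i] = src[i];
}
#endif
```

Comparing the assembly from gcc and clang at -O2 -msse4.1 for a function like
this is one way to see whether the load ends up as a separate movq or as the
memory operand of pmovzxwd.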
