On Wed, Jun 14, 2023 at 1:58 PM Jan Beulich via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > ... in vec_dupv4sf / *vec_dupv4si. The respective broadcast insns are > never longer (yet sometimes shorter) than the corresponding VSHUFPS / > VPSHUFD, due to the immediate operand of the shuffle insns balancing the > need for VEX3 in the broadcast ones. When EVEX encoding is required the > broadcast insns are always shorter. > > Add two new alternatives each, one covering the AVX2 case and one > covering AVX512. I think you can just change assemble output for this first alternative when TARGET_AVX2, use vbroadcastss, else use vshufps since vbroadcastss only accept register operand when TARGET_AVX2. And no need to support 2 extra alternatives which doesn't make sense just make RA more confused about the same meaning of different alternatives. > > gcc/ > > * config/i386/sse.md (vec_dupv4sf): New AVX2 and AVX512F > alternatives using vbroadcastss. > (*vec_dupv4si): New AVX2 and AVX512F alternatives using > vpbroadcastd. > --- > I'm working from the assumption that the isa attributes to the original > 1st and 2nd alternatives don't need further restricting (to sse2_noavx2 > or avx_noavx2 as applicable), as the new earlier alternatives cover all > operand forms already when at least AVX2 is enabled. > > Isn't prefix_extra use bogus here? What extra prefix does vbroadcastss > use? (Same further down in *vec_dupv4si and avx2_vbroadcasti128_<mode> > and elsewhere.) Not sure about this part. I grep prefix_extra, seems only used by znver.md/znver4.md for schedule, and only for comi instructions(?the reservation name seems so). > > Is use of Yv for the source operand really necessary in *vec_dupv4si? > I.e. would scalar integer values be put in XMM{16...31} when AVX512VL Yes, You can look at ix86_hard_regno_mode_ok, EXT_REX_SSE_REGNO is allowed for scalar mode, but not for 128/256-bit vector modes.
20204 if (TARGET_AVX512F 20205 && (VALID_AVX512F_REG_OR_XI_MODE (mode) 20206 || VALID_AVX512F_SCALAR_MODE (mode))) 20207 return true; > isn't enabled? If so (*movsi_internal / *movdi_internal suggest they > might), wouldn't *vec_dupv2di need to use Yv as well in its 3rd > alternative (or just m, as Yv is already covered by the 2nd one)? I guess xm is more suitable since we still want to allocate operands[1] to register when sse3_noavx. It didn't hit any error since for avx and above, alternative 1(2rd one) is always matched than alternative 2. > > --- a/gcc/config/i386/sse.md > +++ b/gcc/config/i386/sse.md > @@ -25798,38 +25798,42 @@ > (const_int 1)))]) > > (define_insn "vec_dupv4sf" > - [(set (match_operand:V4SF 0 "register_operand" "=v,v,x") > + [(set (match_operand:V4SF 0 "register_operand" "=Yv,v,v,v,x") > (vec_duplicate:V4SF > - (match_operand:SF 1 "nonimmediate_operand" "Yv,m,0")))] > + (match_operand:SF 1 "nonimmediate_operand" "v,vm,Yv,m,0")))] > "TARGET_SSE" > "@ > + vbroadcastss\t{%1, %0|%0, %1} > + vbroadcastss\t{%1, %g0|%g0, %1} > vshufps\t{$0, %1, %1, %0|%0, %1, %1, 0} > vbroadcastss\t{%1, %0|%0, %1} > shufps\t{$0, %0, %0|%0, %0, 0}" > - [(set_attr "isa" "avx,avx,noavx") > - (set_attr "type" "sseshuf1,ssemov,sseshuf1") > - (set_attr "length_immediate" "1,0,1") > - (set_attr "prefix_extra" "0,1,*") > - (set_attr "prefix" "maybe_evex,maybe_evex,orig") > - (set_attr "mode" "V4SF")]) > + [(set_attr "isa" "avx2,avx512f,avx,avx,noavx") > + (set_attr "type" "ssemov,ssemov,sseshuf1,ssemov,sseshuf1") > + (set_attr "length_immediate" "0,0,1,0,1") > + (set_attr "prefix_extra" "*,*,0,1,*") > + (set_attr "prefix" "maybe_evex,evex,maybe_evex,maybe_evex,orig") > + (set_attr "mode" "V4SF,V16SF,V4SF,V4SF,V4SF")]) > > (define_insn "*vec_dupv4si" > - [(set (match_operand:V4SI 0 "register_operand" "=v,v,x") > + [(set (match_operand:V4SI 0 "register_operand" "=Yv,v,v,v,x") > (vec_duplicate:V4SI > - (match_operand:SI 1 "nonimmediate_operand" "Yv,m,0")))] > + (match_operand:SI 1 "nonimmediate_operand" "vm,vm,Yv,m,0")))] > "TARGET_SSE" > "@ > + vpbroadcastd\t{%1, %0|%0, %1} > + vpbroadcastd\t{%1, %g0|%g0, %1} > %vpshufd\t{$0, %1, %0|%0, %1, 0} > vbroadcastss\t{%1, %0|%0, %1} > shufps\t{$0, %0, %0|%0, %0, 0}" > - [(set_attr "isa" "sse2,avx,noavx") > - (set_attr "type" "sselog1,ssemov,sselog1") > - (set_attr "length_immediate" "1,0,1") > - (set_attr "prefix_extra" "0,1,*") > - (set_attr "prefix" "maybe_vex,maybe_evex,orig") > - (set_attr "mode" "TI,V4SF,V4SF") > + [(set_attr "isa" "avx2,avx512f,sse2,avx,noavx") > + (set_attr "type" "ssemov,ssemov,sselog1,ssemov,sselog1") > + (set_attr "length_immediate" "0,0,1,0,1") > + (set_attr "prefix_extra" "*,*,0,1,*") > + (set_attr "prefix" "maybe_evex,evex,maybe_vex,maybe_evex,orig") > + (set_attr "mode" "TI,XI,TI,V4SF,V4SF") > (set (attr "preferred_for_speed") > - (cond [(eq_attr "alternative" "1") > + (cond [(eq_attr "alternative" "3") > (symbol_ref "!TARGET_INTER_UNIT_MOVES_TO_VEC") > ] > (symbol_ref "true")))]) -- BR, Hongtao