Richard Biener <rguent...@suse.de> writes: > On Wed, 18 Aug 2021, Hongtao Liu wrote: > >> On Wed, Aug 18, 2021 at 7:37 PM Hongtao Liu <crazy...@gmail.com> wrote: >> > >> > On Wed, Aug 18, 2021 at 7:30 PM Hongtao Liu <crazy...@gmail.com> wrote: >> > > >> > > On Wed, Aug 18, 2021 at 6:28 PM Richard Biener <rguent...@suse.de> wrote: >> > > > >> > > > On Wed, 18 Aug 2021, Richard Biener wrote: >> > > > >> > > > > >> > > > > So in the end I seem to be able to combine AVX & AVX512 arriving >> > > > > at the following which passes basic testing. I will now see to >> > > > > teach the vectorizer the required "promotion" to handle >> > > > > mask_gather_loadv4dfv4si and mask_gather_loadv4sfv4di. >> > > > > >> > > > > Meanwhile, do you see any hole in the below? If not I'll >> > > > > do mask_scatter_store accordingly (that will be simpler since >> > > > > AVX doesn't have scatter). >> > > > >> > > > There seems to be one more complication ... we have >> > > > >> > > > (define_expand "avx2_gatherdi<mode>" >> > > > [(parallel [(set (match_operand:VEC_GATHER_MODE 0 "register_operand") >> > > > (unspec:VEC_GATHER_MODE >> > > > [(match_operand:<VEC_GATHER_SRCDI> 1 >> > > > "register_operand") >> > > > (mem:<ssescalarmode> >> > > > (match_par_dup 6 >> > > > [(match_operand 2 "vsib_address_operand") >> > > > (match_operand:<VEC_GATHER_IDXDI> >> > > > 3 "register_operand") >> > > > >> > > > but VEC_GATHER_IDXDI is >> > > > >> > > > (define_mode_attr VEC_GATHER_IDXDI >> > > > [(V2DI "V2DI") (V4DI "V4DI") (V8DI "V8DI") >> > > > (V2DF "V2DI") (V4DF "V4DI") (V8DF "V8DI") >> > > > (V4SI "V2DI") (V8SI "V4DI") (V16SI "V8DI") >> > > > (V4SF "V2DI") (V8SF "V4DI") (V16SF "V8DI")]) >> > > > >> > > > I'd have expected (V4SF "V4DI") for example, or (V8SF "V8DI"). >> > > VEX.128 version: For dword indices, the instruction will gather four >> > > single-precision floating-point values. For >> > > qword indices, the instruction will gather two values and zero the >> > > upper 64 bits of the destination. >> > > VEX.256 version: For dword indices, the instruction will gather eight >> > > single-precision floating-point values. For >> > > qword indices, the instruction will gather four values and zero the >> > > upper 128 bits of the destination. >> > > >> > > So, for expander name, it should be v2sfv2di and v4sfv4di for IDXDI >> > > under avx2, and v8sfv8di under avx512. >> > > >> > > ----cut pattern---- >> > > (define_insn "*avx2_gatherdi<VEC_GATHER_MODE:mode>_2" >> > > [(set (match_operand:VEC_GATHER_MODE 0 "register_operand" "=&x") >> > > (unspec:VEC_GATHER_MODE >> > > [(pc) >> > > (match_operator:<ssescalarmode> 6 "vsib_mem_operator" >> > > [(unspec:P >> > > [(match_operand:P 2 "vsib_address_operand" "Tv") >> > > (match_operand:<VEC_GATHER_IDXDI> 3 "register_operand" "x") >> > > (match_operand:SI 5 "const1248_operand" "n")] >> > > UNSPEC_VSIBADDR)]) >> > > (mem:BLK (scratch)) >> > > (match_operand:<VEC_GATHER_SRCDI> 4 "register_operand" "1")] >> > > UNSPEC_GATHER)) >> > > (clobber (match_scratch:VEC_GATHER_MODE 1 "=&x"))] >> > > "TARGET_AVX2" >> > > { >> > > if (<VEC_GATHER_MODE:MODE>mode != <VEC_GATHER_SRCDI>mode) >> > > return "%M2v<sseintprefix>gatherq<ssemodesuffix>\t{%4, %6, >> > > %x0|%x0, %6, %4}"; >> > > ----cut end--------------- >> > > We are using the trick of the operand modifier %x0 to force print xmm. >> > (define_mode_attr VEC_GATHER_SRCDI >> > [(V2DI "V2DI") (V4DI "V4DI") (V8DI "V8DI") >> > (V2DF "V2DF") (V4DF "V4DF") (V8DF "V8DF") >> > (V4SI "V4SI") (V8SI "V4SI") (V16SI "V8SI") >> > (V4SF "V4SF") (V8SF "V4SF") (V16SF "V8SF")]) >> > >> > (define_insn "*avx2_gathersi<VEC_GATHER_MODE:mode>" >> > [(set (match_operand:VEC_GATHER_MODE 0 "register_operand" "=&x") >> > (unspec:VEC_GATHER_MODE >> > [(match_operand:VEC_GATHER_MODE 2 "register_operand" "0") >> > (match_operator:<ssescalarmode> 7 "vsib_mem_operator" >> > [(unspec:P >> > [(match_operand:P 3 "vsib_address_operand" "Tv") >> > (match_operand:<VEC_GATHER_IDXSI> 4 "register_operand" "x") >> > (match_operand:SI 6 "const1248_operand" "n")] >> > UNSPEC_VSIBADDR)]) >> > (mem:BLK (scratch)) >> > (match_operand:VEC_GATHER_MODE 5 "register_operand" "1")] >> > UNSPEC_GATHER)) >> > (clobber (match_scratch:VEC_GATHER_MODE 1 "=&x"))] >> > "TARGET_AVX2" >> > "%M3v<sseintprefix>gatherd<ssemodesuffix>\t{%1, %7, %0|%0, %7, %1}" >> > >> > Or only print operands[1] which has mode VEC_GATHER_SRCDI as index. >> >> Typo should be operands[2] in the below one >> (define_insn "*avx2_gatherdi<VEC_GATHER_MODE:mode>" >> [(set (match_operand:VEC_GATHER_MODE 0 "register_operand" "=&x") >> (unspec:VEC_GATHER_MODE >> [(match_operand:<VEC_GATHER_SRCDI> 2 "register_operand" "0") >> (match_operator:<ssescalarmode> 7 "vsib_mem_operator" >> [(unspec:P >> [(match_operand:P 3 "vsib_address_operand" "Tv") >> (match_operand:<VEC_GATHER_IDXDI> 4 "register_operand" "x") >> (match_operand:SI 6 "const1248_operand" "n")] >> UNSPEC_VSIBADDR)]) >> (mem:BLK (scratch)) >> (match_operand:<VEC_GATHER_SRCDI> 5 "register_operand" "1")] >> UNSPEC_GATHER)) >> (clobber (match_scratch:VEC_GATHER_MODE 1 "=&x"))] >> "TARGET_AVX2" >> "%M3v<sseintprefix>gatherq<ssemodesuffix>\t{%5, %7, %2|%2, %7, %5}" > > It's somewhat of a mess. When we expose > > mask_gather_loadv8sfv8di > > for example then the vectorizer will with -mavx512f generate a > vector mask for v8sf which will be a AVX2 style mask. But in > the end this will use gen_avx512f_gatherdiv16sf and require a > AVX512 style mask.
I think it would be OK/sensible to use the larger of the index or result vectors to determine the mask, if that helps. There just wasn't any need to make a distinction for SVE, since there the mask type is determined entirely by the number of elements. Thanks, Richard