Re: [PATCH] move x86 to use gather/scatter internal functions

Richard Sandiford via Gcc-patches Wed, 18 Aug 2021 07:51:26 -0700

Richard Biener <rguent...@suse.de> writes:
> On Wed, 18 Aug 2021, Hongtao Liu wrote:
>
>> On Wed, Aug 18, 2021 at 7:37 PM Hongtao Liu <crazy...@gmail.com> wrote:
>> >
>> > On Wed, Aug 18, 2021 at 7:30 PM Hongtao Liu <crazy...@gmail.com> wrote:
>> > >
>> > > On Wed, Aug 18, 2021 at 6:28 PM Richard Biener <rguent...@suse.de> wrote:
>> > > >
>> > > > On Wed, 18 Aug 2021, Richard Biener wrote:
>> > > >
>> > > > >
>> > > > > So in the end I seem to be able to combine AVX & AVX512 arriving
>> > > > > at the following which passes basic testing.  I will now see to
>> > > > > teach the vectorizer the required "promotion" to handle
>> > > > > mask_gather_loadv4dfv4si and mask_gather_loadv4sfv4di.
>> > > > >
>> > > > > Meanwhile, do you see any hole in the below?  If not I'll
>> > > > > do mask_scatter_store accordingly (that will be simpler since
>> > > > > AVX doesn't have scatter).
>> > > >
>> > > > There seems to be one more complication ... we have
>> > > >
>> > > > (define_expand "avx2_gatherdi<mode>"
>> > > >   [(parallel [(set (match_operand:VEC_GATHER_MODE 0 "register_operand")
>> > > >                    (unspec:VEC_GATHER_MODE
>> > > >                      [(match_operand:<VEC_GATHER_SRCDI> 1
>> > > > "register_operand")
>> > > >                       (mem:<ssescalarmode>
>> > > >                         (match_par_dup 6
>> > > >                           [(match_operand 2 "vsib_address_operand")
>> > > >                            (match_operand:<VEC_GATHER_IDXDI>
>> > > >                               3 "register_operand")
>> > > >
>> > > > but VEC_GATHER_IDXDI is
>> > > >
>> > > > (define_mode_attr VEC_GATHER_IDXDI
>> > > >                       [(V2DI "V2DI") (V4DI "V4DI") (V8DI "V8DI")
>> > > >                        (V2DF "V2DI") (V4DF "V4DI") (V8DF "V8DI")
>> > > >                        (V4SI "V2DI") (V8SI "V4DI") (V16SI "V8DI")
>> > > >                        (V4SF "V2DI") (V8SF "V4DI") (V16SF "V8DI")])
>> > > >
>> > > > I'd have expected (V4SF "V4DI") for example, or (V8SF "V8DI").
>> > > VEX.128 version: For dword indices, the instruction will gather four
>> > > single-precision floating-point values. For
>> > > qword indices, the instruction will gather two values and zero the
>> > > upper 64 bits of the destination.
>> > > VEX.256 version: For dword indices, the instruction will gather eight
>> > > single-precision floating-point values. For
>> > > qword indices, the instruction will gather four values and zero the
>> > > upper 128 bits of the destination.
>> > >
>> > > So, for expander name, it should be v2sfv2di and v4sfv4di for IDXDI
>> > > under avx2, and v8sfv8di under avx512.
>> > >
>> > > ----cut pattern----
>> > > (define_insn "*avx2_gatherdi<VEC_GATHER_MODE:mode>_2"
>> > >   [(set (match_operand:VEC_GATHER_MODE 0 "register_operand" "=&x")
>> > > (unspec:VEC_GATHER_MODE
>> > >   [(pc)
>> > >    (match_operator:<ssescalarmode> 6 "vsib_mem_operator"
>> > >      [(unspec:P
>> > > [(match_operand:P 2 "vsib_address_operand" "Tv")
>> > > (match_operand:<VEC_GATHER_IDXDI> 3 "register_operand" "x")
>> > > (match_operand:SI 5 "const1248_operand" "n")]
>> > > UNSPEC_VSIBADDR)])
>> > >    (mem:BLK (scratch))
>> > >    (match_operand:<VEC_GATHER_SRCDI> 4 "register_operand" "1")]
>> > >   UNSPEC_GATHER))
>> > >    (clobber (match_scratch:VEC_GATHER_MODE 1 "=&x"))]
>> > >   "TARGET_AVX2"
>> > > {
>> > >   if (<VEC_GATHER_MODE:MODE>mode != <VEC_GATHER_SRCDI>mode)
>> > >     return "%M2v<sseintprefix>gatherq<ssemodesuffix>\t{%4, %6,
>> > > %x0|%x0, %6, %4}";
>> > > ----cut end---------------
>> > > We are using the trick of the operand modifier %x0 to force print xmm.
>> > (define_mode_attr VEC_GATHER_SRCDI
>> >       [(V2DI "V2DI") (V4DI "V4DI") (V8DI "V8DI")
>> >        (V2DF "V2DF") (V4DF "V4DF") (V8DF "V8DF")
>> >        (V4SI "V4SI") (V8SI "V4SI") (V16SI "V8SI")
>> >        (V4SF "V4SF") (V8SF "V4SF") (V16SF "V8SF")])
>> >
>> > (define_insn "*avx2_gathersi<VEC_GATHER_MODE:mode>"
>> >   [(set (match_operand:VEC_GATHER_MODE 0 "register_operand" "=&x")
>> > (unspec:VEC_GATHER_MODE
>> >   [(match_operand:VEC_GATHER_MODE 2 "register_operand" "0")
>> >    (match_operator:<ssescalarmode> 7 "vsib_mem_operator"
>> >      [(unspec:P
>> > [(match_operand:P 3 "vsib_address_operand" "Tv")
>> > (match_operand:<VEC_GATHER_IDXSI> 4 "register_operand" "x")
>> > (match_operand:SI 6 "const1248_operand" "n")]
>> > UNSPEC_VSIBADDR)])
>> >    (mem:BLK (scratch))
>> >    (match_operand:VEC_GATHER_MODE 5 "register_operand" "1")]
>> >   UNSPEC_GATHER))
>> >    (clobber (match_scratch:VEC_GATHER_MODE 1 "=&x"))]
>> >   "TARGET_AVX2"
>> >   "%M3v<sseintprefix>gatherd<ssemodesuffix>\t{%1, %7, %0|%0, %7, %1}"
>> >
>> > Or only print operands[1] which has mode VEC_GATHER_SRCDI as index.
>> 
>> Typo should be operands[2] in the below one
>> (define_insn "*avx2_gatherdi<VEC_GATHER_MODE:mode>"
>>   [(set (match_operand:VEC_GATHER_MODE 0 "register_operand" "=&x")
>> (unspec:VEC_GATHER_MODE
>>   [(match_operand:<VEC_GATHER_SRCDI> 2 "register_operand" "0")
>>    (match_operator:<ssescalarmode> 7 "vsib_mem_operator"
>>      [(unspec:P
>> [(match_operand:P 3 "vsib_address_operand" "Tv")
>> (match_operand:<VEC_GATHER_IDXDI> 4 "register_operand" "x")
>> (match_operand:SI 6 "const1248_operand" "n")]
>> UNSPEC_VSIBADDR)])
>>    (mem:BLK (scratch))
>>    (match_operand:<VEC_GATHER_SRCDI> 5 "register_operand" "1")]
>>   UNSPEC_GATHER))
>>    (clobber (match_scratch:VEC_GATHER_MODE 1 "=&x"))]
>>   "TARGET_AVX2"
>>   "%M3v<sseintprefix>gatherq<ssemodesuffix>\t{%5, %7, %2|%2, %7, %5}"
>
> It's somewhat of a mess.  When we expose
>
>   mask_gather_loadv8sfv8di
>
> for example then the vectorizer will with -mavx512f generate a
> vector mask for v8sf which will be a AVX2 style mask.  But in
> the end this will use gen_avx512f_gatherdiv16sf and require a
> AVX512 style mask.


I think it would be OK/sensible to use the larger of the index or
result vectors to determine the mask, if that helps.  There just
wasn't any need to make a distinction for SVE, since there the
mask type is determined entirely by the number of elements.

Thanks,
Richard

Re: [PATCH] move x86 to use gather/scatter internal functions

Reply via email to