Richard Biener <rguent...@suse.de> writes:
> On Wed, 18 Aug 2021, Richard Sandiford wrote:
>> I think it would be OK/sensible to use the larger of the index or
>> result vectors to determine the mask, if that helps.  There just
>> wasn't any need to make a distinction for SVE, since there the
>> mask type is determined entirely by the number of elements.
>
> I don't think that would help - for a V4SFmode gather with V4DImode
> indices the x86 ISA expects the AVX2 mask to be V4SImode, that is,
> the mask corresponds to the data mode, not to the index mode.

Ah, OK.  So the widest type determines the ISA and then the data
type determines the mask type within that ISA?
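
I.e., for the V4SF case above and the V8SF case below, something like
(guessing at the details of the AVX512 case):

  V4SF data + V4DI indices, AVX2 only      -> VEX gather,      V4SImode mask
  V8SF data + V8DI indices, -mavx512f only -> EVEX.512 gather, k-register mask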

> The real issue is that we're asking for a mask mode just using
> the data mode but not the instruction.  So with V8SFmode data
> mode and -mavx512f -mno-avx512vl it appears we want a AVX2
> mask mode but in reality the gather instruction we can use
> uses EVEX.512 encoding when the index mode is V8DImode
> and thus is available w/ -mavx512f.
>
> So it appears that we'd need to pass another mode to the
> get_mask_mode target hook that determines the instruction
> encoding ...
>
> I'm also not sure if this wouldn't open up a can of worms
> with the mask computation stmts which would not know which
> (maybe there are two) context they are used in.  We're
> fundamentally using two different vector sizes and ISA subsets
> here :/

Yeah.

> Maybe vinfo->vector_mode should fully determine whether
> we're using EVEX or VEX encoding in the loop and thus we should
> reject attempts to do VEX encoding vectorization in EVEX mode?
> I suppose this is how you don't get a mix of SVE and ADVSIMD
> vectorization in one loop?

Yeah, that's right.  It means that for -msve-vector-bits=128
we effectively have two V16QIs: the “real” AdvSIMD V16QI and
the SVE VNx16QI.  I can't imagine that being popular for x86 :-)
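
(The mechanism is just that the aarch64 autovectorize_vector_modes hook
returns both VNx16QImode and V16QImode, and the vectorizer then tries each
entry as vinfo->vector_mode in turn, so any individual vectorization
attempt is all-SVE or all-Advanced SIMD; the two never mix within a loop.)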

> But since VEX/EVEX use the same modes (but SVE/ADVSIMD do not)
> this will get interesting.
>
> Oh, and -mavx512bw would also be prone to use (different size again)
> AVX2 vectors at the same time.  Maybe the "solution" is to simply
> not expose the EVEX modes to the vectorizer when there's such
> a possibility, thus only when all F, VL and BW are available?
> At least when thinking about relaxing the vector sizes we can
> deal with.
>
> I've opened this can of worms with
>
> diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
> index 97745a830a2..db938f79c9c 100644
> --- a/gcc/tree-vect-data-refs.c
> +++ b/gcc/tree-vect-data-refs.c
> @@ -3749,13 +3749,15 @@ vect_gather_scatter_fn_p (vec_info *vinfo, bool read_p, bool masked_p,
>  
>    for (;;)
>      {
> -      tree offset_vectype = get_vectype_for_scalar_type (vinfo, offset_type);
> -      if (!offset_vectype)
> -       return false;
> -
> +      tree offset_vectype
> +       = get_related_vectype_for_scalar_type (vinfo->vector_mode, offset_type,
> +                                              TYPE_VECTOR_SUBPARTS (vectype));
>
> to even get the mixed size optabs to be considered.  If we were to
> expose the real instructions instead we'd have
> mask_gather_loadv16sfv8di but as said the semantics of such
> optab entry would need to be defined.  That's the way the current
> builtin_gather target hook code works, so it seems that would be
> a working approach, side-stepping the mixed size problems.
> Here we'd define that excess elements on either the data or the
> index operand are ignored (on the output operand they are zeroed).

It seems a bit ugly to expose such a low-level target-specific detail
at the gimple level though.  Also, if we expose the v16sf at the gimple
level then we would need to construct a v16sf vector for scatter stores
and initialise all the elements (at least until we have a don't care
VEC_PERM_EXPR encoding).
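
E.g. the scatter above would presumably end up as something like this
(hand-waving, and with the internal-fn operand order from memory):

  padded_1 = {data_v8sf_2, { 0.0, ... }};   // build a fully-initialised v16sf
  .MASK_SCATTER_STORE (base_3, indices_v8di_4, 4, padded_1, mask_5);

where the upper half of padded_1 is pure dead weight.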

I'm not sure this would solve the problem with the mixture of mask
computation stmts that you mentioned above.  E.g. what if the input
to the v16sf scatter store above is a COND_EXPR that shares some mask
subexpressions with the scatter store mask?  This COND_EXPR would be a
v8sf VEC_COND_EXPR and so would presumably use the get_mask_mode
corresponding to v8sf rather than v16sf.  The mask computation stmts
would still need to cope with a mixture of AVX512 and AVX2 masks.
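
I.e. roughly:

  mask_1 = a_2 > b_3;                       // mask mode chosen for the v8sf
                                            // VEC_COND_EXPR (AVX2-style vector mask)
  val_4 = VEC_COND_EXPR <mask_1, x_5, y_6>; // v8sf
  ...
  mask_7 = mask_1 & mask_8;                 // shares mask_1 with the VEC_COND_EXPR
  .MASK_SCATTER_STORE (base_9, indices_v8di_10, 4, padded_v16sf_11, mask_7);
                                            // but here we'd want the AVX512 k-mask
                                            // form of the same condition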

Even if x86 doesn't support that combination yet (because it instead
requires all vector sizes to be equal), it seems like something that
is likely to be supported in future.

Thanks,
Richard
