On Wed, Oct 23, 2019 at 1:00 PM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> This patch is the first of a series that tries to remove two
> assumptions:
>
> (1) that all vectors involved in vectorisation must be the same size
>
> (2) that there is only one vector mode for a given element mode and
>     number of elements
>
> Relaxing (1) helps with targets that support multiple vector sizes or
> that require the number of elements to stay the same.  E.g. if we're
> vectorising code that operates on narrow and wide elements, and the
> narrow elements use 64-bit vectors, then on AArch64 it would normally
> be better to use 128-bit vectors rather than pairs of 64-bit vectors
> for the wide elements.
>
> Relaxing (2) makes it possible for -msve-vector-bits=128 to produce
> fixed-length code for SVE.  It also allows unpacked/half-size SVE
> vectors to work with -msve-vector-bits=256.
>
> The patch adds a new hook that targets can use to control how we
> move from one vector mode to another.  The hook takes a starting vector
> mode, a new element mode, and (optionally) a new number of elements.
> The flexibility needed for (1) comes in when the number of elements
> isn't specified.
>
> All callers in this patch specify the number of elements, but a later
> vectoriser patch doesn't.  I won't be posting the vectoriser patch
> for a few days, hence the RFC/A tag.
>
> Tested individually on aarch64-linux-gnu and as a series on
> x86_64-linux-gnu.  OK to install?  Or if not yet, does the idea
> look OK?

In isolation the idea looks good but maybe a bit limited?  I see
how it works for the same-size case but if you consider x86
where we have SSE, AVX256 and AVX512 what would it return
for related_vector_mode (V4SImode, SImode, 0)?  Or is this
kind of query not intended (where the component modes match
but nunits is zero)?  How do you get from SVE fixed 128bit
to NEON fixed 128bit then?  Or is it just used to stay in the
same register set for different component modes?
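
To make the question concrete, here is a minimal, self-contained sketch of two
of the nunits == 0 policies described in the hook documentation quoted below.
It is a toy model, not GCC code: ToyVectorMode, toy_related_mode_same_size and
toy_related_mode_same_nunits are hypothetical names invented for illustration,
modes are reduced to (element size, element count) pairs, and the
VECTOR_MODE_P / targetm.vector_mode_supported_p checks that the real default
implementation performs are left out.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for a vector machine mode: just an element
// width in bytes and a number of elements.
struct ToyVectorMode {
  unsigned element_bytes;
  unsigned nunits;
  unsigned size_bytes() const { return element_bytes * nunits; }
};

// Models the "same size" policy that the posted default implementation
// uses when nunits is zero: keep the total vector size of VECTOR_MODE.
// Assumption: every (element size, count) pair is a supported mode.
std::optional<ToyVectorMode>
toy_related_mode_same_size(ToyVectorMode vector_mode,
                           unsigned element_bytes,
                           unsigned nunits = 0)
{
  assert(element_bytes != 0);
  if (nunits == 0) {
    // Mirrors the multiple_p test: fail if the element size does not
    // divide the vector size exactly.
    if (vector_mode.size_bytes() % element_bytes != 0)
      return std::nullopt;
    nunits = vector_mode.size_bytes() / element_bytes;
  }
  return ToyVectorMode{element_bytes, nunits};
}

// Models the alternative "same number of elements" policy that the
// documentation mentions for targets with a fixed element count.
std::optional<ToyVectorMode>
toy_related_mode_same_nunits(ToyVectorMode vector_mode,
                             unsigned element_bytes,
                             unsigned nunits = 0)
{
  assert(element_bytes != 0);
  if (nunits == 0)
    nunits = vector_mode.nunits;
  return ToyVectorMode{element_bytes, nunits};
}
```

In this model, asking the same-size policy for 4-byte elements starting from a
vector of four 4-byte elements (the analogue of V4SImode, SImode, 0) simply
gives back four 4-byte elements. Starting from eight 1-byte elements and asking
for 4-byte elements, the same-size policy gives two elements (8 bytes) while
the same-nunits policy gives eight elements (32 bytes). Whether a real target
would choose either of these is exactly what the hook leaves open.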

As said, it looks good but I'd like to see the followups.

Note that I delayed thinking about relaxing the single-vector-size
constraint in the vectorizer until after we're SLP-only, because
it looked easier to do there.  I also remember patches from the
RISC-V folks relaxing this a bit.

Thanks,
Richard.

> I'll post some follow-up patches too.
>
> Richard
>
>
> 2019-10-23  Richard Sandiford  <richard.sandif...@arm.com>
>
> gcc/
>         * target.def (related_mode): New hook.
>         * doc/tm.texi.in (TARGET_VECTORIZE_RELATED_MODE): New hook.
>         * doc/tm.texi: Regenerate.
>         * targhooks.h (default_vectorize_related_mode): Declare.
>         * targhooks.c (default_vectorize_related_mode): New function.
>         * machmode.h (related_vector_mode): Declare.
>         * stor-layout.c (related_vector_mode): New function.
>         * expmed.c (extract_bit_field_1): Use it instead of mode_for_vector.
>         * optabs-query.c (qimode_for_vec_perm): Likewise.
>         * tree-vect-stmts.c (get_group_load_store_type): Likewise.
>         (vectorizable_store, vectorizable_load): Likewise.
>
> Index: gcc/target.def
> ===================================================================
> --- gcc/target.def      2019-09-30 17:20:57.370607986 +0100
> +++ gcc/target.def      2019-10-23 11:33:01.568510253 +0100
> @@ -1909,6 +1909,33 @@ for autovectorization.  The default impl
>   (vector_sizes *sizes, bool all),
>   default_autovectorize_vector_sizes)
>
> +DEFHOOK
> +(related_mode,
> + "If a piece of code is using vector mode @var{vector_mode} and also wants\n\
> +to operate on elements of mode @var{element_mode}, return the vector mode\n\
> +it should use for those elements.  If @var{nunits} is nonzero, ensure that\n\
> +the mode has exactly @var{nunits} elements, otherwise pick whichever vector\n\
> +size pairs the most naturally with @var{vector_mode}.  Return an empty\n\
> +@code{opt_machine_mode} if there is no supported vector mode with the\n\
> +required properties.\n\
> +\n\
> +There is no prescribed way of handling the case in which @var{nunits}\n\
> +is zero.  One common choice is to pick a vector mode with the same size\n\
> +as @var{vector_mode}; this is the natural choice if the target has a\n\
> +fixed vector size.  Another option is to choose a vector mode with the\n\
> +same number of elements as @var{vector_mode}; this is the natural choice\n\
> +if the target has a fixed number of elements.  Alternatively, the hook\n\
> +might choose a middle ground, such as trying to keep the number of\n\
> +elements as similar as possible while applying maximum and minimum\n\
> +vector sizes.\n\
> +\n\
> +The default implementation uses @code{mode_for_vector} to find the\n\
> +requested mode, returning a mode with the same size as @var{vector_mode}\n\
> +when @var{nunits} is zero.  This is the correct behavior for most targets.",
> + opt_machine_mode,
> + (machine_mode vector_mode, scalar_mode element_mode, poly_uint64 nunits),
> + default_vectorize_related_mode)
> +
>  /* Function to get a target mode for a vector mask.  */
>  DEFHOOK
>  (get_mask_mode,
> Index: gcc/doc/tm.texi.in
> ===================================================================
> --- gcc/doc/tm.texi.in  2019-09-30 17:20:57.350608130 +0100
> +++ gcc/doc/tm.texi.in  2019-10-23 11:33:01.564510281 +0100
> @@ -4181,6 +4181,8 @@ address;  but often a machine-dependent
>
>  @hook TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
>
> +@hook TARGET_VECTORIZE_RELATED_MODE
> +
>  @hook TARGET_VECTORIZE_GET_MASK_MODE
>
>  @hook TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE
> Index: gcc/doc/tm.texi
> ===================================================================
> --- gcc/doc/tm.texi     2019-09-30 17:20:57.350608130 +0100
> +++ gcc/doc/tm.texi     2019-10-23 11:33:01.560510309 +0100
> @@ -6029,6 +6029,30 @@ The hook does not need to do anything if
>  for autovectorization.  The default implementation does nothing.
>  @end deftypefn
>
> +@deftypefn {Target Hook} opt_machine_mode TARGET_VECTORIZE_RELATED_MODE (machine_mode @var{vector_mode}, scalar_mode @var{element_mode}, poly_uint64 @var{nunits})
> +If a piece of code is using vector mode @var{vector_mode} and also wants
> +to operate on elements of mode @var{element_mode}, return the vector mode
> +it should use for those elements.  If @var{nunits} is nonzero, ensure that
> +the mode has exactly @var{nunits} elements, otherwise pick whichever vector
> +size pairs the most naturally with @var{vector_mode}.  Return an empty
> +@code{opt_machine_mode} if there is no supported vector mode with the
> +required properties.
> +
> +There is no prescribed way of handling the case in which @var{nunits}
> +is zero.  One common choice is to pick a vector mode with the same size
> +as @var{vector_mode}; this is the natural choice if the target has a
> +fixed vector size.  Another option is to choose a vector mode with the
> +same number of elements as @var{vector_mode}; this is the natural choice
> +if the target has a fixed number of elements.  Alternatively, the hook
> +might choose a middle ground, such as trying to keep the number of
> +elements as similar as possible while applying maximum and minimum
> +vector sizes.
> +
> +The default implementation uses @code{mode_for_vector} to find the
> +requested mode, returning a mode with the same size as @var{vector_mode}
> +when @var{nunits} is zero.  This is the correct behavior for most targets.
> +@end deftypefn
> +
>  @deftypefn {Target Hook} opt_machine_mode TARGET_VECTORIZE_GET_MASK_MODE (poly_uint64 @var{nunits}, poly_uint64 @var{length})
>  A vector mask is a value that holds one boolean result for every element
>  in a vector.  This hook returns the machine mode that should be used to
> Index: gcc/targhooks.h
> ===================================================================
> --- gcc/targhooks.h     2019-09-30 17:19:45.051128625 +0100
> +++ gcc/targhooks.h     2019-10-23 11:33:01.568510253 +0100
> @@ -114,6 +114,9 @@ default_builtin_support_vector_misalignm
>  extern machine_mode default_preferred_simd_mode (scalar_mode mode);
>  extern machine_mode default_split_reduction (machine_mode);
>  extern void default_autovectorize_vector_sizes (vector_sizes *, bool);
> +extern opt_machine_mode default_vectorize_related_mode (machine_mode,
> +                                                       scalar_mode,
> +                                                       poly_uint64);
>  extern opt_machine_mode default_get_mask_mode (poly_uint64, poly_uint64);
>  extern bool default_empty_mask_is_expensive (unsigned);
>  extern void *default_init_cost (class loop *);
> Index: gcc/targhooks.c
> ===================================================================
> --- gcc/targhooks.c     2019-10-20 13:58:01.283640189 +0100
> +++ gcc/targhooks.c     2019-10-23 11:33:01.568510253 +0100
> @@ -1307,6 +1307,25 @@ default_autovectorize_vector_sizes (vect
>  {
>  }
>
> +/* The default implementation of TARGET_VECTORIZE_RELATED_MODE.  */
> +
> +opt_machine_mode
> +default_vectorize_related_mode (machine_mode vector_mode,
> +                               scalar_mode element_mode,
> +                               poly_uint64 nunits)
> +{
> +  machine_mode result_mode;
> +  if ((maybe_ne (nunits, 0U)
> +       || multiple_p (GET_MODE_SIZE (vector_mode),
> +                     GET_MODE_SIZE (element_mode), &nunits))
> +      && mode_for_vector (element_mode, nunits).exists (&result_mode)
> +      && VECTOR_MODE_P (result_mode)
> +      && targetm.vector_mode_supported_p (result_mode))
> +    return result_mode;
> +
> +  return opt_machine_mode ();
> +}
> +
>  /* By default a vector of integers is used as a mask.  */
>
>  opt_machine_mode
> Index: gcc/machmode.h
> ===================================================================
> --- gcc/machmode.h      2019-10-07 09:38:45.627684403 +0100
> +++ gcc/machmode.h      2019-10-23 11:33:01.564510281 +0100
> @@ -880,6 +880,8 @@ extern opt_scalar_int_mode int_mode_for_
>  extern opt_machine_mode bitwise_mode_for_mode (machine_mode);
>  extern opt_machine_mode mode_for_vector (scalar_mode, poly_uint64);
>  extern opt_machine_mode mode_for_int_vector (unsigned int, poly_uint64);
> +extern opt_machine_mode related_vector_mode (machine_mode, scalar_mode,
> +                                            poly_uint64 = 0);
>
>  /* Return the integer vector equivalent of MODE, if one exists.  In other
>     words, return the mode for an integer vector that has the same number
> Index: gcc/stor-layout.c
> ===================================================================
> --- gcc/stor-layout.c   2019-09-18 10:43:11.562335822 +0100
> +++ gcc/stor-layout.c   2019-10-23 11:33:01.564510281 +0100
> @@ -530,6 +530,26 @@ mode_for_int_vector (unsigned int int_bi
>    return opt_machine_mode ();
>  }
>
> +/* If a piece of code is using vector mode VECTOR_MODE and also wants
> +   to operate on elements of mode ELEMENT_MODE, return the vector mode
> +   it should use for those elements.  If NUNITS is nonzero, ensure that
> +   the mode has exactly NUNITS elements, otherwise pick whichever vector
> +   size pairs the most naturally with VECTOR_MODE; this may mean choosing
> +   a mode with a different size and/or number of elements, depending on
> +   what the target prefers.  Return an empty opt_machine_mode if there
> +   is no supported vector mode with the required properties.
> +
> +   Unlike mode_for_vector, any returned mode is guaranteed to satisfy
> +   both VECTOR_MODE_P and targetm.vector_mode_supported_p.  */
> +
> +opt_machine_mode
> +related_vector_mode (machine_mode vector_mode, scalar_mode element_mode,
> +                    poly_uint64 nunits)
> +{
> +  gcc_assert (VECTOR_MODE_P (vector_mode));
> +  return targetm.vectorize.related_mode (vector_mode, element_mode, nunits);
> +}
> +
>  /* Return the alignment of MODE. This will be bounded by 1 and
>     BIGGEST_ALIGNMENT.  */
>
> Index: gcc/expmed.c
> ===================================================================
> --- gcc/expmed.c        2019-09-10 17:18:39.992121613 +0100
> +++ gcc/expmed.c        2019-10-23 11:33:01.564510281 +0100
> @@ -1641,12 +1641,10 @@ extract_bit_field_1 (rtx str_rtx, poly_u
>           poly_uint64 nunits;
>           if (!multiple_p (GET_MODE_BITSIZE (GET_MODE (op0)),
>                            GET_MODE_UNIT_BITSIZE (tmode), &nunits)
> -             || !mode_for_vector (inner_mode, nunits).exists (&new_mode)
> -             || !VECTOR_MODE_P (new_mode)
> +             || !related_vector_mode (tmode, inner_mode,
> +                                      nunits).exists (&new_mode)
>               || maybe_ne (GET_MODE_SIZE (new_mode),
> -                          GET_MODE_SIZE (GET_MODE (op0)))
> -             || GET_MODE_INNER (new_mode) != GET_MODE_INNER (tmode)
> -             || !targetm.vector_mode_supported_p (new_mode))
> +                          GET_MODE_SIZE (GET_MODE (op0))))
>             new_mode = VOIDmode;
>         }
>        poly_uint64 pos;
> Index: gcc/optabs-query.c
> ===================================================================
> --- gcc/optabs-query.c  2019-07-10 19:41:26.387898094 +0100
> +++ gcc/optabs-query.c  2019-10-23 11:33:01.564510281 +0100
> @@ -354,11 +354,8 @@ can_conditionally_move_p (machine_mode m
>  opt_machine_mode
>  qimode_for_vec_perm (machine_mode mode)
>  {
> -  machine_mode qimode;
> -  if (GET_MODE_INNER (mode) != QImode
> -      && mode_for_vector (QImode, GET_MODE_SIZE (mode)).exists (&qimode)
> -      && VECTOR_MODE_P (qimode))
> -    return qimode;
> +  if (GET_MODE_INNER (mode) != QImode)
> +    return related_vector_mode (mode, QImode, GET_MODE_SIZE (mode));
>    return opt_machine_mode ();
>  }
>
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c       2019-10-23 11:30:54.953408377 +0100
> +++ gcc/tree-vect-stmts.c       2019-10-23 11:33:01.572510226 +0100
> @@ -2276,9 +2276,8 @@ get_group_load_store_type (stmt_vec_info
>                   || alignment_support_scheme == dr_unaligned_supported)
>               && known_eq (nunits, (group_size - gap) * 2)
>               && known_eq (nunits, group_size)
> -             && mode_for_vector (elmode, (group_size - gap)).exists (&vmode)
> -             && VECTOR_MODE_P (vmode)
> -             && targetm.vector_mode_supported_p (vmode)
> +             && related_vector_mode (TYPE_MODE (vectype), elmode,
> +                                     group_size - gap).exists (&vmode)
>               && (convert_optab_handler (vec_init_optab,
>                                          TYPE_MODE (vectype), vmode)
>                   != CODE_FOR_nothing))
> @@ -7736,9 +7735,8 @@ vectorizable_store (stmt_vec_info stmt_i
>                  of vector elts directly.  */
>               scalar_mode elmode = SCALAR_TYPE_MODE (elem_type);
>               machine_mode vmode;
> -             if (!mode_for_vector (elmode, group_size).exists (&vmode)
> -                 || !VECTOR_MODE_P (vmode)
> -                 || !targetm.vector_mode_supported_p (vmode)
> +             if (!related_vector_mode (TYPE_MODE (vectype), elmode,
> +                                       group_size).exists (&vmode)
>                   || (convert_optab_handler (vec_extract_optab,
>                                              TYPE_MODE (vectype), vmode)
>                       == CODE_FOR_nothing))
> @@ -7755,9 +7753,8 @@ vectorizable_store (stmt_vec_info stmt_i
>                      element extracts from the original vector type and
>                      element size stores.  */
>                   if (int_mode_for_size (lsize, 0).exists (&elmode)
> -                     && mode_for_vector (elmode, lnunits).exists (&vmode)
> -                     && VECTOR_MODE_P (vmode)
> -                     && targetm.vector_mode_supported_p (vmode)
> +                     && related_vector_mode (TYPE_MODE (vectype), elmode,
> +                                             lnunits).exists (&vmode)
>                       && (convert_optab_handler (vec_extract_optab,
>                                                  vmode, elmode)
>                           != CODE_FOR_nothing))
> @@ -8838,9 +8835,8 @@ vectorizable_load (stmt_vec_info stmt_in
>                  vector elts directly.  */
>               scalar_mode elmode = SCALAR_TYPE_MODE (TREE_TYPE (vectype));
>               machine_mode vmode;
> -             if (mode_for_vector (elmode, group_size).exists (&vmode)
> -                 && VECTOR_MODE_P (vmode)
> -                 && targetm.vector_mode_supported_p (vmode)
> +             if (related_vector_mode (TYPE_MODE (vectype), elmode,
> +                                      group_size).exists (&vmode)
>                   && (convert_optab_handler (vec_init_optab,
>                                              TYPE_MODE (vectype), vmode)
>                       != CODE_FOR_nothing))
> @@ -8864,9 +8860,8 @@ vectorizable_load (stmt_vec_info stmt_in
>                   /* If we can't construct such a vector fall back to
>                      element loads of the original vector type.  */
>                   if (int_mode_for_size (lsize, 0).exists (&elmode)
> -                     && mode_for_vector (elmode, lnunits).exists (&vmode)
> -                     && VECTOR_MODE_P (vmode)
> -                     && targetm.vector_mode_supported_p (vmode)
> +                     && related_vector_mode (TYPE_MODE (vectype), elmode,
> +                                             lnunits).exists (&vmode)
>                       && (convert_optab_handler (vec_init_optab, vmode, elmode)
>                           != CODE_FOR_nothing))
>                     {
