Steve Ellcey <sell...@marvell.com> writes:
> Here is an updated version of the GCC patch to enable SIMD functions on
> Aarch64.  There are a number of changes from the last patch.
>
> I reduced the shared code changes, there is still one change in shared code
> (omp-simd-clone.c) to call targetm.simd_clone.adjust from expand_simd_clones
> but it now uses the same argument as the existing call.  This new call allows
> Aarch64 to add the aarch64_vector_pcs attribute to SIMD clone definitions
> which in turn ensures they use the correct ABI.  Previously this target
> function was only called on declarations, not definitions.  This change 
> affects
> the x86 target so I modified ix86_simd_clone_adjust to return and do nothing
> when called with a definition.  This means there is no change in behaviour
> on x86.  I did a build and GCC testsuite run on x86 to verify this.
>
> Most of the changes from the previous patch are in the
> aarch64_simd_clone_compute_vecsize_and_simdlen function.
>
> The previous version was heavily based on the x86 function, this one has
> changes to address the issues that were raised in the earlier patch
> and so it no longer looks like the x86 version.  I use types instead of modes
> to check for what we can/cannot vectorize and I (try to) differentiate
> between vectors that we are not currently handling (but could later) and
> those that won't ever be handled.
>
> I have also added a testsuite patch to fix regressions in the gcc.dg/gomp
> and g++.dg/gomp tests.  There are no regressions with this patch applied.
>
> Steve Ellcey
> sell...@marvell.com
>
> 2018-12-19  Steve Ellcey  <sell...@cavium.com>
>
>       * config/aarch64/aarch64.c (cgraph.h): New include.
>       (supported_simd_type): New function.
>       (currently_supported_simd_type): Ditto.
>       (aarch64_simd_clone_compute_vecsize_and_simdlen): Ditto.
>       (aarch64_simd_clone_adjust): Ditto.
>       (aarch64_simd_clone_usable): Ditto.
>       (TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN): New macro.
>       (TARGET_SIMD_CLONE_ADJUST): Ditto.
>       (TARGET_SIMD_CLONE_USABLE): Ditto.
>       * config/i386/i386.c (ix86_simd_clone_adjust): Add definition check.
>       * omp-simd-clone.c (expand_simd_clones): Add targetm.simd_clone.adjust
>       call.
>
> 2018-12-19  Steve Ellcey  <sell...@cavium.com>
>
>       * g++.dg/gomp/declare-simd-1.C: Add aarch64 specific
>       warning checks and assembler scans.
>       * g++.dg/gomp/declare-simd-3.C: Ditto.
>       * g++.dg/gomp/declare-simd-4.C: Ditto.
>       * g++.dg/gomp/declare-simd-7.C: Ditto.
>       * gcc.dg/gomp/declare-simd-1.c: Ditto.
>       * gcc.dg/gomp/declare-simd-3.c: Ditto.

Sorry, hadn't realised this was still unreviewed.

> @@ -18064,6 +18066,138 @@ aarch64_estimated_poly_value (poly_int64 val)
>    return val.coeffs[0] + val.coeffs[1] * over_128 / 128;
>  }
>  
> +
> +/* Return true for types that could be supported as SIMD return or
> +   argument types.  */
> +
> +static bool supported_simd_type (tree t)
> +{
> +  return (FLOAT_TYPE_P (t) || INTEGRAL_TYPE_P (t));

We should also check that the size is 1, 2, 4 or 8 bytes.

> +}
> +
> +/* Return true for types that currently are supported as SIMD return
> +   or argument types.  */
> +
> +static bool currently_supported_simd_type (tree t)
> +{
> +  if (COMPLEX_FLOAT_TYPE_P (t))
> +    return false;
> +
> +  return supported_simd_type (t);
> +}
> +
> +/* Implement TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN.  */
> +
> +static int
> +aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
> +                                     struct cgraph_simd_clone *clonei,
> +                                     tree base_type,
> +                                     int num ATTRIBUTE_UNUSED)
> +{
> +  const char *wmsg;
> +  int vsize;
> +  tree t, ret_type, arg_type;
> +
> +  if (!TARGET_SIMD)
> +    return 0;
> +
> +  if (clonei->simdlen
> +      && (clonei->simdlen < 2
> +       || clonei->simdlen > 1024
> +       || (clonei->simdlen & (clonei->simdlen - 1)) != 0))
> +    {
> +      warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
> +               "unsupported simdlen %d", clonei->simdlen);
> +      return 0;
> +    }
> +
> +  ret_type = TREE_TYPE (TREE_TYPE (node->decl));
> +  if (TREE_CODE (ret_type) != VOID_TYPE
> +      && !currently_supported_simd_type (ret_type))
> +    {
> +      if (supported_simd_type (ret_type))
> +     wmsg = G_("GCC does not currently support return type %qT for simd");
> +      else
> +     wmsg = G_("unsupported return type return type %qT for simd");

Typo: doubled "return type".

Maybe s/for simd/for %<simd%> functions/ in both messages.
Since that will blow the line limit...

> +      warning_at (DECL_SOURCE_LOCATION (node->decl), 0, wmsg, ret_type);
> +      return 0;
> +    }

...it's probably worth just calling warning_at in each arm of the "if".
We'll then get proper format checking in bootstraps.

> +  for (t = DECL_ARGUMENTS (node->decl); t; t = DECL_CHAIN (t))
> +    {
> +      arg_type = TREE_TYPE (t);
> +      if (POINTER_TYPE_P (arg_type))
> +     arg_type = TREE_TYPE (arg_type);
> +      if (!currently_supported_simd_type (arg_type))
> +     {
> +       if (supported_simd_type (arg_type))
> +         wmsg = G_("GCC does not currently support argument type %qT "
> +                   "for simd");
> +       else
> +         wmsg = G_("unsupported argument type %qT for simd");
> +       warning_at (DECL_SOURCE_LOCATION (node->decl), 0, wmsg, arg_type);
> +       return 0;
> +     }
> +    }

The ABI supports all argument types, so this should only check
current_supported_simd_type and should always use the "GCC does not..."
message.

Could you explain why the POINTER_TYPE_P handling is correct?
Think it's worth a comment.  Dropping it is also fine if that's easier.

(Just in case: the point about PBV in my previous reply was that if
an argument isn't a natural vector element, the ABI says that you
should implicitly convert it to a vector of pointers instead, which is
how it can handle any argument type.  I didn't mean that we should add
a pointer check here.)

> +  if (clonei->simdlen == 0)
> +    {
> +      if (SCALAR_INT_MODE_P (TYPE_MODE (base_type)))
> +     clonei->simdlen = clonei->vecsize_int;
> +      else
> +     clonei->simdlen = clonei->vecsize_float;
> +      clonei->simdlen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type));
> +      return 1;
> +    }

I should have noticed this last time, but base_type is the CDT in the
Intel ABI.  That isn't always right for the AArch64 ABI.

I think for now currently_supported_simd_type should take base_type
as a second parameter and check that the given type has the same size.

> +  /* Restrict ourselves to vectors that fit in a single register  */
> +
> +  gcc_assert (tree_fits_shwi_p (TYPE_SIZE (base_type)));
> +  vsize = clonei->simdlen * tree_to_shwi (TYPE_SIZE (base_type));
> +  if (vsize > 128)
> +    {
> +       warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
> +                   "GCC does not currently support simdlen %d for type %qT",
> +                   clonei->simdlen, base_type);
> +       return 0;
> +    }

nit: block contents indented too far.

> +  return 0;
> +}

Doesn't this mean that we always silently fail for an explicit and
correct simdlen?  If so, that suggests we might not have enough
testsuite coverage :-)

Thanks,
Richard

Reply via email to