Steve Ellcey <sell...@marvell.com> writes: > Here is an updated version of the GCC patch to enable SIMD functions on > Aarch64. There are a number of changes from the last patch. > > I reduced the shared code changes, there is still one change in shared code > (omp-simd-clone.c) to call targetm.simd_clone.adjust from expand_simd_clones > but it now uses the same argument as the existing call. This new call allows > Aarch64 to add the aarch64_vector_pcs attribute to SIMD clone definitions > which in turn ensures they use the correct ABI. Previously this target > function was only called on declarations, not definitions. This change > affects > the x86 target so I modified ix86_simd_clone_adjust to return and do nothing > when called with a definition. This means there is no change in behaviour > on x86. I did a build and GCC testsuite run on x86 to verify this. > > Most of the changes from the previous patch are in the > aarch64_simd_clone_compute_vecsize_and_simdlen function. > > The previous version was heavily based on the x86 function, this one has > changes to address the issues that were raised in the earlier patch > and so it no longer looks like the x86 version. I use types instead of modes > to check for what we can/cannot vectorize and I (try to) differentiate > between vectors that we are not currently handling (but could later) and > those that won't ever be handled. > > I have also added a testsuite patch to fix regressions in the gcc.dg/gomp > and g++.dg/gomp tests. There are no regressions with this patch applied. > > Steve Ellcey > sell...@marvell.com > > 2018-12-19 Steve Ellcey <sell...@cavium.com> > > * config/aarch64/aarch64.c (cgraph.h): New include. > (supported_simd_type): New function. > (currently_supported_simd_type): Ditto. > (aarch64_simd_clone_compute_vecsize_and_simdlen): Ditto. > (aarch64_simd_clone_adjust): Ditto. > (aarch64_simd_clone_usable): Ditto. > (TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN): New macro. > (TARGET_SIMD_CLONE_ADJUST): Ditto. > (TARGET_SIMD_CLONE_USABLE): Ditto. > * config/i386/i386.c (ix86_simd_clone_adjust): Add definition check. > * omp-simd-clone.c (expand_simd_clones): Add targetm.simd_clone.adjust > call. > > 2018-12-19 Steve Ellcey <sell...@cavium.com> > > * g++.dg/gomp/declare-simd-1.C: Add aarch64 specific > warning checks and assembler scans. > * g++.dg/gomp/declare-simd-3.C: Ditto. > * g++.dg/gomp/declare-simd-4.C: Ditto. > * g++.dg/gomp/declare-simd-7.C: Ditto. > * gcc.dg/gomp/declare-simd-1.c: Ditto. > * gcc.dg/gomp/declare-simd-3.c: Ditto.
Sorry, hadn't realised this was still unreviewed. > @@ -18064,6 +18066,138 @@ aarch64_estimated_poly_value (poly_int64 val) > return val.coeffs[0] + val.coeffs[1] * over_128 / 128; > } > > + > +/* Return true for types that could be supported as SIMD return or > + argument types. */ > + > +static bool supported_simd_type (tree t) > +{ > + return (FLOAT_TYPE_P (t) || INTEGRAL_TYPE_P (t)); We should also check that the size is 1, 2, 4 or 8 bytes. > +} > + > +/* Return true for types that currently are supported as SIMD return > + or argument types. */ > + > +static bool currently_supported_simd_type (tree t) > +{ > + if (COMPLEX_FLOAT_TYPE_P (t)) > + return false; > + > + return supported_simd_type (t); > +} > + > +/* Implement TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN. */ > + > +static int > +aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node, > + struct cgraph_simd_clone *clonei, > + tree base_type, > + int num ATTRIBUTE_UNUSED) > +{ > + const char *wmsg; > + int vsize; > + tree t, ret_type, arg_type; > + > + if (!TARGET_SIMD) > + return 0; > + > + if (clonei->simdlen > + && (clonei->simdlen < 2 > + || clonei->simdlen > 1024 > + || (clonei->simdlen & (clonei->simdlen - 1)) != 0)) > + { > + warning_at (DECL_SOURCE_LOCATION (node->decl), 0, > + "unsupported simdlen %d", clonei->simdlen); > + return 0; > + } > + > + ret_type = TREE_TYPE (TREE_TYPE (node->decl)); > + if (TREE_CODE (ret_type) != VOID_TYPE > + && !currently_supported_simd_type (ret_type)) > + { > + if (supported_simd_type (ret_type)) > + wmsg = G_("GCC does not currently support return type %qT for simd"); > + else > + wmsg = G_("unsupported return type return type %qT for simd"); Typo: doubled "return type". Maybe s/for simd/for %<simd%> functions/ in both messages. Since that will blow the line limit... > + warning_at (DECL_SOURCE_LOCATION (node->decl), 0, wmsg, ret_type); > + return 0; > + } ...it's probably worth just calling warning_at in each arm of the "if". We'll then get proper format checking in bootstraps. > + for (t = DECL_ARGUMENTS (node->decl); t; t = DECL_CHAIN (t)) > + { > + arg_type = TREE_TYPE (t); > + if (POINTER_TYPE_P (arg_type)) > + arg_type = TREE_TYPE (arg_type); > + if (!currently_supported_simd_type (arg_type)) > + { > + if (supported_simd_type (arg_type)) > + wmsg = G_("GCC does not currently support argument type %qT " > + "for simd"); > + else > + wmsg = G_("unsupported argument type %qT for simd"); > + warning_at (DECL_SOURCE_LOCATION (node->decl), 0, wmsg, arg_type); > + return 0; > + } > + } The ABI supports all argument types, so this should only check current_supported_simd_type and should always use the "GCC does not..." message. Could you explain why the POINTER_TYPE_P handling is correct? Think it's worth a comment. Dropping it is also fine if that's easier. (Just in case: the point about PBV in my previous reply was that if an argument isn't a natural vector element, the ABI says that you should implicitly convert it to a vector of pointers instead, which is how it can handle any argument type. I didn't mean that we should add a pointer check here.) > + if (clonei->simdlen == 0) > + { > + if (SCALAR_INT_MODE_P (TYPE_MODE (base_type))) > + clonei->simdlen = clonei->vecsize_int; > + else > + clonei->simdlen = clonei->vecsize_float; > + clonei->simdlen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type)); > + return 1; > + } I should have noticed this last time, but base_type is the CDT in the Intel ABI. That isn't always right for the AArch64 ABI. I think for now currently_supported_simd_type should take base_type as a second parameter and check that the given type has the same size. > + /* Restrict ourselves to vectors that fit in a single register */ > + > + gcc_assert (tree_fits_shwi_p (TYPE_SIZE (base_type))); > + vsize = clonei->simdlen * tree_to_shwi (TYPE_SIZE (base_type)); > + if (vsize > 128) > + { > + warning_at (DECL_SOURCE_LOCATION (node->decl), 0, > + "GCC does not currently support simdlen %d for type %qT", > + clonei->simdlen, base_type); > + return 0; > + } nit: block contents indented too far. > + return 0; > +} Doesn't this mean that we always silently fail for an explicit and correct simdlen? If so, that suggests we might not have enough testsuite coverage :-) Thanks, Richard