> On 11 Jul 2024, at 15:41, Richard Sandiford <richard.sandif...@arm.com> wrote: > > External email: Use caution opening links or attachments > > > Kyrylo Tkachov <ktkac...@nvidia.com> writes: >> Hi Victor, >> >>> On 10 Jul 2024, at 16:05, Victor Do Nascimento >>> <victor.donascime...@arm.com> wrote: >>> >>> External email: Use caution opening links or attachments >>> >>> >>> Given recent changes to the dot_prod standard pattern name, this patch >>> fixes the aarch64 back-end by implementing the following changes: >>> >>> 1. Add 2nd mode to all (u|s|us)dot_prod patterns in .md files. >>> 2. Rewrite initialization and function expansion mechanism for simd >>> builtins. >>> 3. Fix all direct calls to back-end `dot_prod' patterns in SVE >>> builtins. >>> >>> Finally, given that it is now possible for the compiler to >>> differentiate between the two- and four-way dot product, we add a test >>> to ensure that autovectorization picks up on dot-product patterns >>> where the result is twice the width of the operands. >>> >>> gcc/ChangeLog: >>> >>> * config/aarch64/aarch64-builtins.cc (enum aarch64_builtins): >>> New AARCH64_BUILTIN_* enum values: SDOTV8QI, SDOTV16QI, >>> UDOTV8QI, UDOTV16QI, USDOTV8QI, USDOTV16QI. >>> (aarch64_init_builtin_dotprod_functions): New. >>> (aarch64_init_simd_builtins): Add call to >>> `aarch64_init_builtin_dotprod_functions'. >>> (aarch64_general_gimple_fold_builtin): Add DOT_PROD_EXPR >>> handling. >>> * config/aarch64/aarch64-simd-builtins.def: Remove macro >>> expansion-based initialization and expansion >>> of (u|s|us)dot_prod builtins. >>> * config/aarch64/aarch64-simd.md >>> (<sur>dot_prod<vsi2qi><vczle><vczbe>): Deleted. >>> (<sur>dot_prod<mode><vsi2qi><vczle><vczbe>): New. >>> (usdot_prod<vsi2qi><vczle><vczbe>): Deleted. >>> (usdot_prod<mode><vsi2qi><vczle><vczbe>): New. >>> (<su>sadv16qi): Adjust call to gen_udot_prod take second mode. >>> (popcount<mode2>): fix use of `udot_prod_optab'. >>> * config/aarch64/aarch64-sve-builtins-base.cc >>> (svdot_impl::expand): s/direct/convert/ in >>> `convert_optab_handler_for_sign' function call. >>> (svusdot_impl::expand): add second mode argument in call to >>> `code_for_dot_prod'. >>> * config/aarch64/aarch64-sve-builtins.cc >>> (function_expander::convert_optab_handler_for_sign): New class >>> method. >>> * config/aarch64/aarch64-sve-builtins.h >>> (class function_expander): Add prototype for new >>> `convert_optab_handler_for_sign' method. >>> * gcc/config/aarch64/aarch64-sve.md >>> (<sur>dot_prod<vsi2qi>): Deleted. >>> (<sur>dot_prod<mode><vsi2qi>): New. >>> (@<sur>dot_prod<vsi2qi>): Deleted. >>> (@<sur>dot_prod<mode><vsi2qi>): New. >>> (<su>sad<vsi2qi>): Adjust call to gen_udot_prod take second mode. >>> * gcc/config/aarch64/aarch64-sve2.md >>> (@aarch64_sve_<sur>dotvnx4sivnx8hi): Deleted. >>> (<sur>dot_prodvnx4sivnx8hi): New. >>> >>> gcc/testsuite/ChangeLog: >>> * gcc.target/aarch64/sme/vect-dotprod-twoway.c (udot2): New. >>> --- >>> gcc/config/aarch64/aarch64-builtins.cc | 71 +++++++++++++++++++ >>> gcc/config/aarch64/aarch64-simd-builtins.def | 4 -- >>> gcc/config/aarch64/aarch64-simd.md | 9 +-- >>> .../aarch64/aarch64-sve-builtins-base.cc | 13 ++-- >>> gcc/config/aarch64/aarch64-sve-builtins.cc | 17 +++++ >>> gcc/config/aarch64/aarch64-sve-builtins.h | 3 + >>> gcc/config/aarch64/aarch64-sve.md | 6 +- >>> gcc/config/aarch64/aarch64-sve2.md | 2 +- >>> gcc/config/aarch64/iterators.md | 1 + >>> .../aarch64/sme/vect-dotprod-twoway.c | 25 +++++++ >>> 10 files changed, 133 insertions(+), 18 deletions(-) >>> create mode 100644 >>> gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c >>> >>> diff --git a/gcc/config/aarch64/aarch64-builtins.cc >>> b/gcc/config/aarch64/aarch64-builtins.cc >>> index 30669f8aa18..6c7c86d0e6e 100644 >>> --- a/gcc/config/aarch64/aarch64-builtins.cc >>> +++ b/gcc/config/aarch64/aarch64-builtins.cc >>> @@ -783,6 +783,12 @@ enum aarch64_builtins >>> AARCH64_SIMD_PATTERN_START = AARCH64_SIMD_BUILTIN_LANE_CHECK + 1, >>> AARCH64_SIMD_BUILTIN_MAX = AARCH64_SIMD_PATTERN_START >>> + ARRAY_SIZE (aarch64_simd_builtin_data) - 1, >>> + AARCH64_BUILTIN_SDOTV8QI, >>> + AARCH64_BUILTIN_SDOTV16QI, >>> + AARCH64_BUILTIN_UDOTV8QI, >>> + AARCH64_BUILTIN_UDOTV16QI, >>> + AARCH64_BUILTIN_USDOTV8QI, >>> + AARCH64_BUILTIN_USDOTV16QI, >>> AARCH64_CRC32_BUILTIN_BASE, >>> AARCH64_CRC32_BUILTINS >>> AARCH64_CRC32_BUILTIN_MAX, >>> @@ -1642,6 +1648,60 @@ handle_arm_neon_h (void) >>> aarch64_init_simd_intrinsics (); >>> } >>> >>> +void >>> +aarch64_init_builtin_dotprod_functions (void) >>> +{ >>> + tree fndecl = NULL; >>> + tree ftype = NULL; >>> + >>> + tree uv8qi = aarch64_simd_builtin_type (V8QImode, qualifier_unsigned); >>> + tree sv8qi = aarch64_simd_builtin_type (V8QImode, qualifier_none); >>> + tree uv16qi = aarch64_simd_builtin_type (V16QImode, qualifier_unsigned); >>> + tree sv16qi = aarch64_simd_builtin_type (V16QImode, qualifier_none); >>> + tree uv2si = aarch64_simd_builtin_type (V2SImode, qualifier_unsigned); >>> + tree sv2si = aarch64_simd_builtin_type (V2SImode, qualifier_none); >>> + tree uv4si = aarch64_simd_builtin_type (V4SImode, qualifier_unsigned); >>> + tree sv4si = aarch64_simd_builtin_type (V4SImode, qualifier_none); >>> + >>> + struct builtin_decls_data >>> + { >>> + tree out_type_node; >>> + tree in_type1_node; >>> + tree in_type2_node; >>> + const char *builtin_name; >>> + int function_code; >>> + }; >>> + >>> +#define NAME(A) "__builtin_aarch64_" #A >>> +#define ENUM(B) AARCH64_BUILTIN_##B >>> + >>> + builtin_decls_data bdda[] = >>> + { >>> + { sv2si, sv8qi, sv8qi, NAME (sdot_prodv8qi), ENUM (SDOTV8QI) >>> }, >>> + { uv2si, uv8qi, uv8qi, NAME (udot_prodv8qi_uuuu), ENUM (UDOTV8QI) >>> }, >>> + { sv2si, uv8qi, sv8qi, NAME (usdot_prodv8qi_suss), ENUM (USDOTV8QI) >>> }, >>> + { sv4si, sv16qi, sv16qi, NAME (sdot_prodv16qi), ENUM (SDOTV16QI) >>> }, >>> + { uv4si, uv16qi, uv16qi, NAME (udot_prodv16qi_uuuu), ENUM (UDOTV16QI) >>> }, >>> + { sv4si, uv16qi, sv16qi, NAME (usdot_prodv16qi_suss), ENUM >>> (USDOTV16QI) }, >>> + }; >>> + >>> +#undef NAME >>> +#undef ENUM >>> + >>> + builtin_decls_data *bdd = bdda; >>> + builtin_decls_data *bdd_end = bdd + (ARRAY_SIZE (bdda)); >>> + >>> + for (; bdd < bdd_end; bdd++) >>> + { >>> + ftype = build_function_type_list (bdd->out_type_node, >>> bdd->in_type1_node, >>> + bdd->in_type2_node, >>> bdd->out_type_node, >>> + NULL_TREE); >>> + fndecl = aarch64_general_add_builtin (bdd->builtin_name, >>> + ftype, bdd->function_code); >>> + aarch64_builtin_decls[bdd->function_code] = fndecl; >>> + } >>> +} >>> + >>> static void >>> aarch64_init_simd_builtins (void) >>> { >>> @@ -1654,6 +1714,8 @@ aarch64_init_simd_builtins (void) >>> aarch64_init_simd_builtin_scalar_types (); >>> >>> aarch64_init_simd_builtin_functions (false); >>> + aarch64_init_builtin_dotprod_functions (); >>> + >> >> Perhaps we should take this opportunity to instead migrate the dot-product >> intrinsics to the simulate_builtin_function_decl framework instead so that >> they get created as part of “#pragma GCC aarch64 “arm_neon.h””. >> >> That’s the direction of travel we want with these builtins so I’d rather not >> complicate the legacy builtin handling code here. >> I think it shouldn’t be much more work than this patch as you’ve already got >> the various static bookkeeping data on hand. > > To avoid mission creep, it might be simpler to change: > > BUILTIN_VB (TERNOP, sdot_prod, 10, NONE) > BUILTIN_VB (TERNOPU, udot_prod, 10, NONE) > BUILTIN_VB (TERNOP_SUSS, usdot_prod, 10, NONE) > > to: > > BUILTIN_VB (TERNOP, sdot_prod, 0, NONE) > BUILTIN_VB (TERNOPU, udot_prod, 0, NONE) > BUILTIN_VB (TERNOP_SUSS, usdot_prod, 0, NONE) > > so that the internal names are aarch64_udot_prodv8qi etc., and then add: > > constexpr insn_code CODE_FOR_aarch64_udot_prodv8qi > = CODE_FOR_udot_prodv2siv8qi; > > etc. to aarch64-builtins.cc. I agree that moving to the pragma approach > would be a good thing long-term, but at heart this patch is meant to be > a renaming exercise.
Ok, that’d be fine by me. Thanks, Kyrill > > Thanks, > Richard