> On 11 Jul 2024, at 15:41, Richard Sandiford <richard.sandif...@arm.com> wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
> Kyrylo Tkachov <ktkac...@nvidia.com> writes:
>> Hi Victor,
>> 
>>> On 10 Jul 2024, at 16:05, Victor Do Nascimento 
>>> <victor.donascime...@arm.com> wrote:
>>> 
>>> External email: Use caution opening links or attachments
>>> 
>>> 
>>> Given recent changes to the dot_prod standard pattern name, this patch
>>> fixes the aarch64 back-end by implementing the following changes:
>>> 
>>> 1. Add 2nd mode to all (u|s|us)dot_prod patterns in .md files.
>>> 2. Rewrite initialization and function expansion mechanism for simd
>>> builtins.
>>> 3. Fix all direct calls to back-end `dot_prod' patterns in SVE
>>> builtins.
>>> 
>>> Finally, given that it is now possible for the compiler to
>>> differentiate between the two- and four-way dot product, we add a test
>>> to ensure that autovectorization picks up on dot-product patterns
>>> where the result is twice the width of the operands.
>>> 
>>> gcc/ChangeLog:
>>> 
>>>       * config/aarch64/aarch64-builtins.cc (enum aarch64_builtins):
>>>       New AARCH64_BUILTIN_* enum values: SDOTV8QI, SDOTV16QI,
>>>       UDOTV8QI, UDOTV16QI, USDOTV8QI, USDOTV16QI.
>>>       (aarch64_init_builtin_dotprod_functions): New.
>>>       (aarch64_init_simd_builtins): Add call to
>>>       `aarch64_init_builtin_dotprod_functions'.
>>>       (aarch64_general_gimple_fold_builtin): Add DOT_PROD_EXPR
>>>       handling.
>>>       * config/aarch64/aarch64-simd-builtins.def: Remove macro
>>>       expansion-based initialization and expansion
>>>       of (u|s|us)dot_prod builtins.
>>>       * config/aarch64/aarch64-simd.md
>>>       (<sur>dot_prod<vsi2qi><vczle><vczbe>): Deleted.
>>>       (<sur>dot_prod<mode><vsi2qi><vczle><vczbe>): New.
>>>       (usdot_prod<vsi2qi><vczle><vczbe>): Deleted.
>>>       (usdot_prod<mode><vsi2qi><vczle><vczbe>): New.
>>>       (<su>sadv16qi): Adjust call to gen_udot_prod take second mode.
>>>       (popcount<mode2>): fix use of `udot_prod_optab'.
>>>       * config/aarch64/aarch64-sve-builtins-base.cc
>>>       (svdot_impl::expand): s/direct/convert/ in
>>>       `convert_optab_handler_for_sign' function call.
>>>       (svusdot_impl::expand): add second mode argument in call to
>>>       `code_for_dot_prod'.
>>>       * config/aarch64/aarch64-sve-builtins.cc
>>>       (function_expander::convert_optab_handler_for_sign): New class
>>>       method.
>>>       * config/aarch64/aarch64-sve-builtins.h
>>>       (class function_expander): Add prototype for new
>>>       `convert_optab_handler_for_sign' method.
>>>       * gcc/config/aarch64/aarch64-sve.md
>>>       (<sur>dot_prod<vsi2qi>): Deleted.
>>>       (<sur>dot_prod<mode><vsi2qi>): New.
>>>       (@<sur>dot_prod<vsi2qi>): Deleted.
>>>       (@<sur>dot_prod<mode><vsi2qi>): New.
>>>       (<su>sad<vsi2qi>): Adjust call to gen_udot_prod take second mode.
>>>       * gcc/config/aarch64/aarch64-sve2.md
>>>       (@aarch64_sve_<sur>dotvnx4sivnx8hi): Deleted.
>>>       (<sur>dot_prodvnx4sivnx8hi): New.
>>> 
>>> gcc/testsuite/ChangeLog:
>>>       * gcc.target/aarch64/sme/vect-dotprod-twoway.c (udot2): New.
>>> ---
>>> gcc/config/aarch64/aarch64-builtins.cc        | 71 +++++++++++++++++++
>>> gcc/config/aarch64/aarch64-simd-builtins.def  |  4 --
>>> gcc/config/aarch64/aarch64-simd.md            |  9 +--
>>> .../aarch64/aarch64-sve-builtins-base.cc      | 13 ++--
>>> gcc/config/aarch64/aarch64-sve-builtins.cc    | 17 +++++
>>> gcc/config/aarch64/aarch64-sve-builtins.h     |  3 +
>>> gcc/config/aarch64/aarch64-sve.md             |  6 +-
>>> gcc/config/aarch64/aarch64-sve2.md            |  2 +-
>>> gcc/config/aarch64/iterators.md               |  1 +
>>> .../aarch64/sme/vect-dotprod-twoway.c         | 25 +++++++
>>> 10 files changed, 133 insertions(+), 18 deletions(-)
>>> create mode 100644 
>>> gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c
>>> 
>>> diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
>>> b/gcc/config/aarch64/aarch64-builtins.cc
>>> index 30669f8aa18..6c7c86d0e6e 100644
>>> --- a/gcc/config/aarch64/aarch64-builtins.cc
>>> +++ b/gcc/config/aarch64/aarch64-builtins.cc
>>> @@ -783,6 +783,12 @@ enum aarch64_builtins
>>>  AARCH64_SIMD_PATTERN_START = AARCH64_SIMD_BUILTIN_LANE_CHECK + 1,
>>>  AARCH64_SIMD_BUILTIN_MAX = AARCH64_SIMD_PATTERN_START
>>>                             + ARRAY_SIZE (aarch64_simd_builtin_data) - 1,
>>> +  AARCH64_BUILTIN_SDOTV8QI,
>>> +  AARCH64_BUILTIN_SDOTV16QI,
>>> +  AARCH64_BUILTIN_UDOTV8QI,
>>> +  AARCH64_BUILTIN_UDOTV16QI,
>>> +  AARCH64_BUILTIN_USDOTV8QI,
>>> +  AARCH64_BUILTIN_USDOTV16QI,
>>>  AARCH64_CRC32_BUILTIN_BASE,
>>>  AARCH64_CRC32_BUILTINS
>>>  AARCH64_CRC32_BUILTIN_MAX,
>>> @@ -1642,6 +1648,60 @@ handle_arm_neon_h (void)
>>>  aarch64_init_simd_intrinsics ();
>>> }
>>> 
>>> +void
>>> +aarch64_init_builtin_dotprod_functions (void)
>>> +{
>>> +  tree fndecl = NULL;
>>> +  tree ftype = NULL;
>>> +
>>> +  tree uv8qi = aarch64_simd_builtin_type (V8QImode, qualifier_unsigned);
>>> +  tree sv8qi = aarch64_simd_builtin_type (V8QImode, qualifier_none);
>>> +  tree uv16qi = aarch64_simd_builtin_type (V16QImode, qualifier_unsigned);
>>> +  tree sv16qi = aarch64_simd_builtin_type (V16QImode, qualifier_none);
>>> +  tree uv2si = aarch64_simd_builtin_type (V2SImode, qualifier_unsigned);
>>> +  tree sv2si = aarch64_simd_builtin_type (V2SImode, qualifier_none);
>>> +  tree uv4si = aarch64_simd_builtin_type (V4SImode, qualifier_unsigned);
>>> +  tree sv4si = aarch64_simd_builtin_type (V4SImode, qualifier_none);
>>> +
>>> +  struct builtin_decls_data
>>> +  {
>>> +    tree out_type_node;
>>> +    tree in_type1_node;
>>> +    tree in_type2_node;
>>> +    const char *builtin_name;
>>> +    int function_code;
>>> +  };
>>> +
>>> +#define NAME(A) "__builtin_aarch64_" #A
>>> +#define ENUM(B) AARCH64_BUILTIN_##B
>>> +
>>> +  builtin_decls_data bdda[] =
>>> +  {
>>> +    { sv2si, sv8qi,  sv8qi,  NAME (sdot_prodv8qi),       ENUM (SDOTV8QI)   
>>> },
>>> +    { uv2si, uv8qi,  uv8qi,  NAME (udot_prodv8qi_uuuu),   ENUM (UDOTV8QI)  
>>>  },
>>> +    { sv2si, uv8qi,  sv8qi,  NAME (usdot_prodv8qi_suss),  ENUM (USDOTV8QI) 
>>>  },
>>> +    { sv4si, sv16qi, sv16qi, NAME (sdot_prodv16qi),      ENUM (SDOTV16QI)  
>>> },
>>> +    { uv4si, uv16qi, uv16qi, NAME (udot_prodv16qi_uuuu),  ENUM (UDOTV16QI) 
>>>  },
>>> +    { sv4si, uv16qi, sv16qi, NAME (usdot_prodv16qi_suss), ENUM 
>>> (USDOTV16QI) },
>>> +  };
>>> +
>>> +#undef NAME
>>> +#undef ENUM
>>> +
>>> +  builtin_decls_data *bdd = bdda;
>>> +  builtin_decls_data *bdd_end = bdd + (ARRAY_SIZE (bdda));
>>> +
>>> +  for (; bdd < bdd_end; bdd++)
>>> +  {
>>> +    ftype = build_function_type_list (bdd->out_type_node, 
>>> bdd->in_type1_node,
>>> +                                     bdd->in_type2_node, 
>>> bdd->out_type_node,
>>> +                                     NULL_TREE);
>>> +    fndecl = aarch64_general_add_builtin (bdd->builtin_name,
>>> +                                         ftype, bdd->function_code);
>>> +    aarch64_builtin_decls[bdd->function_code] = fndecl;
>>> +  }
>>> +}
>>> +
>>> static void
>>> aarch64_init_simd_builtins (void)
>>> {
>>> @@ -1654,6 +1714,8 @@ aarch64_init_simd_builtins (void)
>>>  aarch64_init_simd_builtin_scalar_types ();
>>> 
>>>  aarch64_init_simd_builtin_functions (false);
>>> +  aarch64_init_builtin_dotprod_functions ();
>>> +
>> 
>> Perhaps we should take this opportunity to instead migrate the dot-product 
>> intrinsics to the simulate_builtin_function_decl framework instead so that 
>> they get created as part of “#pragma GCC aarch64 “arm_neon.h””.
>> 
>> That’s the direction of travel we want with these builtins so I’d rather not 
>> complicate the legacy builtin handling code here.
>> I think it shouldn’t be much more work than this patch as you’ve already got 
>> the various static bookkeeping data on hand.
> 
> To avoid mission creep, it might be simpler to change:
> 
>  BUILTIN_VB (TERNOP, sdot_prod, 10, NONE)
>  BUILTIN_VB (TERNOPU, udot_prod, 10, NONE)
>  BUILTIN_VB (TERNOP_SUSS, usdot_prod, 10, NONE)
> 
> to:
> 
>  BUILTIN_VB (TERNOP, sdot_prod, 0, NONE)
>  BUILTIN_VB (TERNOPU, udot_prod, 0, NONE)
>  BUILTIN_VB (TERNOP_SUSS, usdot_prod, 0, NONE)
> 
> so that the internal names are aarch64_udot_prodv8qi etc., and then add:
> 
> constexpr insn_code CODE_FOR_aarch64_udot_prodv8qi
>    = CODE_FOR_udot_prodv2siv8qi;
> 
> etc. to aarch64-builtins.cc.  I agree that moving to the pragma approach
> would be a good thing long-term, but at heart this patch is meant to be
> a renaming exercise.

Ok, that’d be fine by me.
Thanks,
Kyrill


> 
> Thanks,
> Richard

Reply via email to