Re: [PATCH v6 1/2] aarch64: Add AdvSIMD faminmax intrinsics

Richard Sandiford Thu, 29 Aug 2024 03:36:34 -0700

Kyrylo Tkachov <ktkac...@nvidia.com> writes:
> Hi Saurabh,
>
>> On 29 Aug 2024, at 09:51, saurabh....@arm.com wrote:
>> 
>> External email: Use caution opening links or attachments
>> 
>> 
>> The AArch64 FEAT_FAMINMAX extension is optional from Armv9.2-a and
>> mandatory from Armv9.5-a. It introduces instructions for computing the
>> floating point absolute maximum and minimum of the two vectors element-wise.
>> 
>> This patch introduces AdvSIMD faminmax intrinsics. The intrinsics of
>> this extension are implemented as the following builtin functions:
>> * vamax_f16
>> * vamaxq_f16
>> * vamax_f32
>> * vamaxq_f32
>> * vamaxq_f64
>> * vamin_f16
>> * vaminq_f16
>> * vamin_f32
>> * vaminq_f32
>> * vaminq_f64
>> 
>> We are defining a new way to add AArch64 AdvSIMD intrinsics by listing
>> all the intrinsics in a .def file and then using that .def file to
>> initialise various data structures. This would lead to more concise code
>> and easier addition of the new AdvSIMD intrinsics in future.
>> 
>> The faminmax intrinsics are defined using the new approach
>> 
>> gcc/ChangeLog:
>> 
>>        * config/aarch64/aarch64-builtins.cc
>>        (ENTRY): Macro to parse the contents of
>> aarch64-simd-pragma-builtins.def.
>>        (enum aarch64_builtins): New enum values for faminmax builtins
>> via aarch64-simd-pragma-builtins.def.
>>        (aarch64_init_pragma_builtins): New function to define pragma 
>> builtins.
>>        (handle_arm_neon_h): Modify to call
>> aarch64_init_pragma_builtins.
>>        (aarch64_general_check_builtin_call): Modify to check whether
>> required flag is being used for pragma builtins.
>>        (aarch64_expand_pragma_builtin): New function to emit
>> instructions of pragma builtins.
>>        (aarch64_general_expand_builtin): Modify to call
>> aarch64_expand_pragma_builtin.
>>        * config/aarch64/aarch64-option-extensions.def
>>        (AARCH64_OPT_EXTENSION): Introduce new flag for this
>> extension.
>>        * config/aarch64/aarch64-simd.md
>>        (@aarch64_<faminmax_uns_op><mode>): Instruction pattern for
>> faminmax intrinsics.
>>        * config/aarch64/aarch64.h
>>        (TARGET_FAMINMAX): Introduce new flag for this extension.
>>        * config/aarch64/iterators.md: New iterators and unspecs.
>>        * config/arm/types.md: Introduce neon_fp_aminmax<q> attributes.
>>        * doc/invoke.texi: Document extension in AArch64 Options.
>>        * config/aarch64/aarch64-simd-pragma-builtins.def: New file to
>>          list pragma builtins.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>        * gcc.target/aarch64/simd/faminmax-builtins-no-flag.c: New test.
>>        * gcc.target/aarch64/simd/faminmax-builtins.c: New test.
>
> Sorry for the back-and-forth, but I just realized, why can’t we reuse the 
> existing aarch64_init_simd_intrinsics code in aarch64-builtins.cc 
> <http://aarch64-builtins.cc/>?
> It seems that it already handles most of the registration code, except that 
> it doesn’t handle checking of arch extensions.
> I think we should aim to refactor this a bit so that we can use that 
> functionality.
> But I appreciate that this would extend the scope of this patch a bit too 
> much.
> So I’m okay with this going in now, but it would be good to clean this area 
> somewhat in a separate patch so that we can rely on just 
> aarch64-simd-builtins.def, potentially augmented with extension information.


Yeah, I agree we should consolidate this more.  I think the .def file
is a good long-term direction though, and it would be good to move the
existing builtins over to that.

As it stands, neither approach is a superset of the other.  The new
approach has the architecture requirements and unspec number, but the
old approach makes it easier to handle simplifications.  Neither approach
makes it particularly easy to encode the type signature.

So one of the tasks for the follow-on would be to find a good way of
expressing the type signature and a good way of handling things like
vget_high simplification (maybe via callbacks?).

On the patch itself:

>> diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
>> b/gcc/config/aarch64/aarch64-builtins.cc
>> index eb878b933fe..61df394b881 100644
>> --- a/gcc/config/aarch64/aarch64-builtins.cc
>> +++ b/gcc/config/aarch64/aarch64-builtins.cc
>> @@ -757,6 +757,10 @@ typedef struct
>>  #define VAR1(T, N, MAP, FLAG, A) \
>>    AARCH64_SIMD_BUILTIN_##T##_##N##A,
>>  
>> +#undef ENTRY
>> +#define ENTRY(N, M, U, F) \
>> +  AARCH64_##N,
>> +
>>  enum aarch64_builtins
>>  {
>>    AARCH64_BUILTIN_MIN,
>> @@ -829,6 +833,10 @@ enum aarch64_builtins
>>    AARCH64_RBIT,
>>    AARCH64_RBITL,
>>    AARCH64_RBITLL,
>> +  /* Pragma builtins.  */
>> +  AARCH64_PRAGMA_BUILTIN_START,
>> +#include "aarch64-simd-pragma-builtins.def"
>> +  AARCH64_PRAGMA_BUILTIN_END,
>>    /* System register builtins.  */
>>    AARCH64_RSR,
>>    AARCH64_RSRP,
>> @@ -947,6 +955,7 @@ const char *aarch64_scalar_builtin_types[] = {
>>  
>>  extern GTY(()) aarch64_simd_type_info aarch64_simd_types[];
>>  
>> +#undef ENTRY
>>  #define ENTRY(E, M, Q, G)  \
>>    {E, "__" #E, #G "__" #E, NULL_TREE, NULL_TREE, E_##M##mode, 
>> qualifier_##Q},
>>  struct aarch64_simd_type_info aarch64_simd_types [] = {
>> @@ -1547,6 +1556,39 @@ aarch64_init_simd_builtin_functions (bool 
>> called_from_pragma)
>>      }
>>  }
>>  
>> +/* Initialize pragma builtins.  */
>> +
>> +typedef struct
>> +{
>> +  const char *name;
>> +  machine_mode mode;
>> +  int unspec;
>> +  aarch64_feature_flags required_extensions;
>> +} pragma_builtins_data;

The typedef struct { ... } thing is a holdover from C days.  New code
should just use structs directly.  The type name should have an "aarch64_"
prefix, to avoid accidental clashes with any future target-independent code.

So:

struct aarch64_pragma_builtins_data
{
  ...
};

>> +
>> +#undef ENTRY
>> +#define ENTRY(N, M, U, F) \
>> +  {#N, E_##M##mode, U, F},
>> +
>> +pragma_builtins_data pragma_builtins[] = {

Similarly, the variable name should have an "aarch64_" prefix.  It would
be good to make the array static.

>> +#include "aarch64-simd-pragma-builtins.def"
>> +};
>> +
>> +static void
>> +aarch64_init_pragma_builtins ()
>> +{
>> +  for (size_t i = 0; i < ARRAY_SIZE (pragma_builtins); ++i)
>> +    {
>> +      pragma_builtins_data data = pragma_builtins[i];
>> +      tree type = aarch64_simd_builtin_type (data.mode, qualifier_none);
>> +      tree fntype = build_function_type_list (type, type, type, NULL_TREE);
>> +      unsigned int code = AARCH64_PRAGMA_BUILTIN_START + i + 1;
>> +      const char *name = data.name;
>> +      aarch64_builtin_decls[code]
>> +    = aarch64_general_simulate_builtin (name, fntype, code);
>> +    }
>> +}
>> +
>>  /* Register the tuple type that contains NUM_VECTORS of the AdvSIMD type
>>     indexed by TYPE_INDEX.  */
>>  static void
>> @@ -1640,6 +1682,7 @@ handle_arm_neon_h (void)
>>  
>>    aarch64_init_simd_builtin_functions (true);
>>    aarch64_init_simd_intrinsics ();
>> +  aarch64_init_pragma_builtins ();
>>  }
>>  
>>  static void
>> @@ -2326,6 +2369,16 @@ aarch64_general_check_builtin_call (location_t 
>> location, vec<location_t>,
>>      return aarch64_check_required_extensions (location, decl,
>>                                            AARCH64_FL_MEMTAG);
>>  
>> +  if (code > AARCH64_PRAGMA_BUILTIN_START
>> +      && code < AARCH64_PRAGMA_BUILTIN_END)
>> +    {
>> +      unsigned int pragma_builtins_idx =
>> +    code - (AARCH64_PRAGMA_BUILTIN_START + 1);
>> +      aarch64_feature_flags flag =
>> +    pragma_builtins[pragma_builtins_idx].required_extensions;

It would be good to have a helper for this, say:

static const aarch64_pragma_builtins_data *
aarch64_get_pragma_builtin (int code)

that returns a pointer to the entry, or null if CODE isn't a pragma
builtin.  Then the code above could just be:

  if (auto *builtin = aarch64_get_pragma_builtin (int code))
    {
      auto flags = builtin->required_extensions;
      ...
    }

("flags" rather than "flag", since in general there can be multiple
requirements.)

>> +      return aarch64_check_required_extensions (location, decl, flag);
>> +    }
>> +
>>    return true;
>>  }
>>  
>> @@ -3189,6 +3242,29 @@ aarch64_expand_builtin_data_intrinsic (unsigned int 
>> fcode, tree exp, rtx target)
>>    return ops[0].value;
>>  }
>>  
>> +static rtx
>> +aarch64_expand_pragma_builtin (unsigned int fcode, tree exp, rtx target)

With the change above, this function could take the
const aarch64_pragma_builtins_data * (or const aarch64_pragma_builtins_data &)
as an extra argument.  Then...

>> +{
>> +  machine_mode mode = TYPE_MODE (TREE_TYPE (exp));
>> +  rtx op0 = force_reg (mode, expand_normal (CALL_EXPR_ARG (exp, 0)));
>> +  rtx op1 = force_reg (mode, expand_normal (CALL_EXPR_ARG (exp, 1)));
>> +
>> +  int unspec;
>> +  if (fcode >= AARCH64_vamax_f16 && fcode <= AARCH64_vamaxq_f64)
>> +    unspec = UNSPEC_FAMAX;
>> +  else if (fcode >= AARCH64_vamin_f16 && fcode <= AARCH64_vaminq_f64)
>> +    unspec = UNSPEC_FAMIN;
>> +  else
>> +    gcc_unreachable ();
>> +
>> +  enum insn_code icode = code_for_aarch64 (unspec, mode);

...we can get the code and mode from there.

>> +  rtx pat = GEN_FCN (icode) (target, op0, op1);

It probably doesn't matter for these builtins, but it's better to use
the expand_insn interface where possible, since it coerces the input and
output operands to the required form.  See existing uses in the same file
for examples.

Thanks for doing this!

Richard

Re: [PATCH v6 1/2] aarch64: Add AdvSIMD faminmax intrinsics

Reply via email to