> -----Original Message-----
> From: Richard Sandiford <richard.sandif...@arm.com>
> Sent: Tuesday, December 14, 2021 12:38 PM
> To: Tamar Christina <tamar.christ...@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Earnshaw
> <richard.earns...@arm.com>; Marcus Shawcroft
> <marcus.shawcr...@arm.com>; Kyrylo Tkachov <kyrylo.tkac...@arm.com>
> Subject: Re: [PATCH]AArch64 Fix the AAPCs for new partial and full SIMD
> structure types [PR103094]
> 
> Tamar Christina <tamar.christ...@arm.com> writes:
> > Hi All,
> >
> > The new partial and full vector types added to AArch64, e.g.
> >
> > int8x8x2_t with mode V2x8QI are incorrectly being defined as being
> > short vectors and not being composite types.
> >
> > This causes the layout code to incorrectly conclude that the registers
> > are packed. i.e. for V2x8QI it thinks those 16-bytes are in the same 
> > registers.
> >
> > Because of this the code under !aarch64_composite_type_p is
> > unreachable but also lacked any extra checks to see that nregs is what we
> expected it to be.
> >
> > I have also updated aarch64_advsimd_full_struct_mode_p and
> > aarch64_advsimd_partial_struct_mode_p to only consider vector types as
> > struct modes.  Otherwise types such as OImode and friends would
> > qualify leading to incorrect results.
> 
> How easy would it be to fix the bug without doing this last bit?
> The idea was that OI, CI and XI should continue to be structure modes until
> we remove them.  aarch64_advsimd_partial_struct_mode_p
> and aarch64_advsimd_full_struct_mode_p are meant to be convenience
> wrappers and so they shouldn't make different decisions from the
> underlying aarch64_classify_vector_mode.

It can be done by moving the check higher in callers of these functions, but 
the problem is that
With an e.g. an OImode there's no real indication of how many registers are 
used to create the
IOmode. It could be 4, 6, 8 as it's just a bag of bits.

My concern is that these functions are misleading without this, with any of 
these opaque
types returning true for both of these functions it becomes harder to make 
decisions between
the two, in particular because we still expand to these modes for certain 
structures.

> 
> >
> > This patch fixes up the issues and we now generate correct code.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> >
> >
> > gcc/ChangeLog:
> >
> >     PR target/103094
> >     * config/aarch64/aarch64.c (aarch64_function_value,
> aarch64_layout_arg):
> >     Fix unreachable code for partial vectors and re-order switch to
> perform
> >     the simplest test first.
> >     (aarch64_short_vector_p): Mark as not short vectors.
> >     (aarch64_composite_type_p): Mark as composite types.
> >     (aarch64_advsimd_partial_struct_mode_p,
> >     aarch64_advsimd_full_struct_mode_p): Restrict to actual SIMD types.
> >
> > gcc/testsuite/ChangeLog:
> >
> >     PR target/103094
> >     * gcc.target/aarch64/pr103094.c: New test.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/config/aarch64/aarch64.c
> > b/gcc/config/aarch64/aarch64.c index
> >
> fdf05505846721b02059df494d6395ae9423a8ef..d9104ddac3cdd44f7c2290b872
> 5d
> > 05be4fd6468f 100644
> > --- a/gcc/config/aarch64/aarch64.c
> > +++ b/gcc/config/aarch64/aarch64.c
> > @@ -3055,15 +3055,17 @@ aarch64_advsimd_struct_mode_p
> (machine_mode
> > mode)  static bool  aarch64_advsimd_partial_struct_mode_p
> > (machine_mode mode)  {
> > -  return (aarch64_classify_vector_mode (mode)
> > -     == (VEC_ADVSIMD | VEC_STRUCT | VEC_PARTIAL));
> > +  return VECTOR_MODE_P (mode)
> > +    && (aarch64_classify_vector_mode (mode)
> > +           == (VEC_ADVSIMD | VEC_STRUCT | VEC_PARTIAL));
> >  }
> >
> >  /* Return true if MODE is an Advanced SIMD Q-register structure mode.
> > */  static bool  aarch64_advsimd_full_struct_mode_p (machine_mode
> > mode)  {
> > -  return (aarch64_classify_vector_mode (mode) == (VEC_ADVSIMD |
> > VEC_STRUCT));
> > +  return VECTOR_MODE_P (mode)
> > +    && (aarch64_classify_vector_mode (mode) == (VEC_ADVSIMD |
> > +VEC_STRUCT));
> >  }
> >
> >  /* Return true if MODE is any of the data vector modes, including @@
> > -6468,17 +6470,21 @@ aarch64_function_value (const_tree type,
> const_tree func,
> >                                            NULL, false))
> >      {
> >        gcc_assert (!sve_p);
> > -      if (!aarch64_composite_type_p (type, mode))
> > +      if (aarch64_advsimd_full_struct_mode_p (mode))
> > +   {
> > +     gcc_assert (known_eq (exact_div (GET_MODE_SIZE (mode), 16),
> count));
> > +     return gen_rtx_REG (mode, V0_REGNUM);
> > +   }
> > +      else if (aarch64_advsimd_partial_struct_mode_p (mode))
> > +   {
> > +     gcc_assert (known_eq (exact_div (GET_MODE_SIZE (mode), 8),
> count));
> > +     return gen_rtx_REG (mode, V0_REGNUM);
> > +   }
> > +      else if (!aarch64_composite_type_p (type, mode))
> >     {
> >       gcc_assert (count == 1 && mode == ag_mode);
> >       return gen_rtx_REG (mode, V0_REGNUM);
> >     }
> > -      else if (aarch64_advsimd_full_struct_mode_p (mode)
> > -          && known_eq (GET_MODE_SIZE (ag_mode), 16))
> > -   return gen_rtx_REG (mode, V0_REGNUM);
> > -      else if (aarch64_advsimd_partial_struct_mode_p (mode)
> > -          && known_eq (GET_MODE_SIZE (ag_mode), 8))
> > -   return gen_rtx_REG (mode, V0_REGNUM);
> >        else
> >     {
> >       int i;
> > @@ -6745,6 +6751,7 @@ aarch64_layout_arg (cumulative_args_t pcum_v,
> const function_arg_info &arg)
> >      /* No frontends can create types with variable-sized modes, so we
> >         shouldn't be asked to pass or return them.  */
> >      size = GET_MODE_SIZE (mode).to_constant ();
> > +
> >    size = ROUND_UP (size, UNITS_PER_WORD);
> >
> >    allocate_ncrn = (type) ? !(FLOAT_TYPE_P (type)) : !FLOAT_MODE_P
> > (mode); @@ -6769,17 +6776,21 @@ aarch64_layout_arg
> (cumulative_args_t pcum_v, const function_arg_info &arg)
> >        if (nvrn + nregs <= NUM_FP_ARG_REGS)
> >     {
> >       pcum->aapcs_nextnvrn = nvrn + nregs;
> > -     if (!aarch64_composite_type_p (type, mode))
> > +     if (aarch64_advsimd_full_struct_mode_p (mode))
> > +       {
> > +         gcc_assert (nregs == size / 16);
> > +         pcum->aapcs_reg = gen_rtx_REG (mode, V0_REGNUM + nvrn);
> > +       }
> > +     else if (aarch64_advsimd_partial_struct_mode_p (mode))
> > +       {
> > +         gcc_assert (nregs == size / 8);
> > +         pcum->aapcs_reg = gen_rtx_REG (mode, V0_REGNUM + nvrn);
> > +       }
> > +     else if (!aarch64_composite_type_p (type, mode))
> >         {
> >           gcc_assert (nregs == 1);
> >           pcum->aapcs_reg = gen_rtx_REG (mode, V0_REGNUM + nvrn);
> >         }
> > -     else if (aarch64_advsimd_full_struct_mode_p (mode)
> > -              && known_eq (GET_MODE_SIZE (pcum-
> >aapcs_vfp_rmode), 16))
> > -       pcum->aapcs_reg = gen_rtx_REG (mode, V0_REGNUM + nvrn);
> > -     else if (aarch64_advsimd_partial_struct_mode_p (mode)
> > -              && known_eq (GET_MODE_SIZE (pcum-
> >aapcs_vfp_rmode), 8))
> > -       pcum->aapcs_reg = gen_rtx_REG (mode, V0_REGNUM + nvrn);
> >       else
> >         {
> >           rtx par;
> > @@ -19285,6 +19296,13 @@ aarch64_short_vector_p (const_tree type,
> >        else
> >     size = GET_MODE_SIZE (mode);
> >      }
> > +
> > +  /* If a Advanced SIMD partial or full aggregate vector type we aren't a
> short
> > +     type.  */
> > +  if (aarch64_advsimd_partial_struct_mode_p (mode)
> > +      || aarch64_advsimd_full_struct_mode_p (mode))
> > +    return false;
> > +
> >    if (known_eq (size, 8) || known_eq (size, 16))
> >      {
> >        /* 64-bit and 128-bit vectors should only acquire an SVE mode
> > if
> 
> I think the bug here is that we trust the mode even if we're given a
> conflicting type.  In principle it would be OK to use, say, V4SI for an array 
> of 4
> ints, but that shouldn't suddenly make aarch64_short_vector_p true.
> 
> Unfortunately that ship has sailed, so we e.g. treat:
> 
>   struct wrapper { int32x4_t x; int :0; };
> 
> as a short vector too.
> 
> So it feels like this a case of limiting the contagion and that the check 
> should
> go in here:
> 
>   else if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
>          || GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT)
>     {
>       /* Rely only on the type, not the mode, when processing SVE types.  */
>       if (type && aarch64_some_values_include_pst_objects_p (type))
>       /* Leave later code to report an error if SVE is disabled.  */
>       gcc_assert (!TARGET_SVE || aarch64_sve_mode_p (mode));
>       else
>       size = GET_MODE_SIZE (mode);
>     }
> 
> where we needed similar protection for SVE.  E.g. we could change the inner
> else to:

Indeed, I did see that for SVE we use the types instead of the modes, but the
types are not passed to all functions. So this would get these to return a 
different
nregs than what e.g. aarch64_layout_arg calculates itself.  Of course I can 
remove
the asserts but I think they're useful in catching issues like these.

I can also just change all that code to use type instead.

> 
>       else if (!aarch64_advsimd_struct_mode_p (mode))
> 
> or keep it is an early-out (but within the outer “else if”) if that seems 
> clearer.
> 
> > @@ -19316,6 +19334,12 @@ static bool
> >  aarch64_composite_type_p (const_tree type,
> >                       machine_mode mode)
> >  {
> > +  /* If a Advanced SIMD partial or full aggregate vector type we are a
> > +     composite type.  */
> > +  if (aarch64_advsimd_partial_struct_mode_p (mode)
> > +      || aarch64_advsimd_full_struct_mode_p (mode))
> > +    return true;
> > +
> 
> Isn't this naturally true after the fix to aarch64_short_vector_p?
> It would be good to avoid adding new “mode only” tests if we can help it.

Yes but you can call this function directly and it should still return the right
value for the new struct modes. 

> 
> Also, the old code didn't handle OI, CI or XI specially here, so doing
> something different now might be dangerous.

This shouldn't change the handling of OI mode and friends though. Since they 
would
all return false here and fall through to the old code.  It's only problematic 
if these new
convenience functions don't exclude OI and other non-vector modes.

So this should only change the behaviour for actual structure modes.  But as 
you say,
 I can look at the types, though my concern is that there's technically nothing 
stopping
an expand pattern from expanding to OImode with a structure "type", in which 
case
inspecting the type will change the behavior whereas the mode is a bit safer 
until we
remove the other modes entirely.

But happy to rewrite it to use the type instead if that's preferred. 

Cheers,
Tamar

> 
> Thanks,
> Richard
> 
> >    if (aarch64_short_vector_p (type, mode))
> >      return false;
> >
> > diff --git a/gcc/testsuite/gcc.target/aarch64/pr103094.c
> > b/gcc/testsuite/gcc.target/aarch64/pr103094.c
> > new file mode 100644
> > index
> >
> 0000000000000000000000000000000000000000..441e602928ce8ac4e9890a137
> 6ac
> > bc25671e284d
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/pr103094.c
> > @@ -0,0 +1,21 @@
> > +/* { dg-do compile } */
> > +/* { dg-additional-options "-fdump-rtl-expand -w" } */
> > +
> > +#include <arm_neon.h>
> > +
> > +void foo (uint8x8x2_t cols_01_23, uint8x8x2_t cols_45_67, uint16_t*
> > +outptr0) {
> > +  uint16x4x4_t cols_01_23_45_67 = { {
> > +    vreinterpret_u16_u8(cols_01_23.val[0]),
> > +    vreinterpret_u16_u8(cols_01_23.val[1]),
> > +    vreinterpret_u16_u8(cols_45_67.val[0]),
> > +    vreinterpret_u16_u8(cols_45_67.val[1])
> > +  } };
> > +
> > +  vst4_lane_u16(outptr0, cols_01_23_45_67, 0); }
> > +
> > +/* Check that we expand to v0 and v2 from the function arguments.  */
> > +/* { dg-final { scan-rtl-dump {\(reg:V2x8QI \d+ v0 \[ cols_01_23
> > +\]\)} expand } } */
> > +/* { dg-final { scan-rtl-dump {\(reg:V2x8QI \d+ v2 \[ cols_45_67
> > +\]\)} expand } } */
> > +

Reply via email to