Thanks for your comments, Richard.

From: Richard Sandiford <richard.sandif...@arm.com>
Date: Friday, May 12, 2023 at 1:02 AM
To: Tejas Belagod <tejas.bela...@arm.com>
Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>, Tejas Belagod 
<tejas.bela...@arm.com>
Subject: Re: [PATCH] [PR96339] AArch64: Optimise svlast[ab]
Tejas Belagod <tejas.bela...@arm.com> writes:
> From: Tejas Belagod <tbela...@arm.com>
>
>   This PR optimizes an SVE intrinsics sequence where
>     svlasta (svptrue_pat_b8 (SV_VL1), x)
>   a scalar is selected based on a constant predicate and a variable vector.
>   This sequence is optimized to return the correspoding element of a NEON
>   vector. For eg.
>     svlasta (svptrue_pat_b8 (SV_VL1), x)
>   returns
>     umov    w0, v0.b[1]
>   Likewise,
>     svlastb (svptrue_pat_b8 (SV_VL1), x)
>   returns
>      umov    w0, v0.b[0]
>   This optimization only works provided the constant predicate maps to a range
>   that is within the bounds of a 128-bit NEON register.
>
> gcc/ChangeLog:
>
>        PR target/96339
>        * config/aarch64/aarch64-sve-builtins-base.cc (svlast_impl::fold): 
> Fold sve
>        calls that have a constant input predicate vector.
>        (svlast_impl::is_lasta): Query to check if intrinsic is svlasta.
>        (svlast_impl::is_lastb): Query to check if intrinsic is svlastb.
>        (svlast_impl::vect_all_same): Check if all vector elements are equal.
>
> gcc/testsuite/ChangeLog:
>
>        PR target/96339
>        * gcc.target/aarch64/sve/acle/general-c/svlast.c: New.
>        * gcc.target/aarch64/sve/acle/general-c/svlast128_run.c: New.
>        * gcc.target/aarch64/sve/acle/general-c/svlast256_run.c: New.
>        * gcc.target/aarch64/sve/pcs/return_4.c (caller_bf16): Fix asm
>        to expect optimized code for function body.
>        * gcc.target/aarch64/sve/pcs/return_4_128.c (caller_bf16): Likewise.
>        * gcc.target/aarch64/sve/pcs/return_4_256.c (caller_bf16): Likewise.
>        * gcc.target/aarch64/sve/pcs/return_4_512.c (caller_bf16): Likewise.
>        * gcc.target/aarch64/sve/pcs/return_4_1024.c (caller_bf16): Likewise.
>        * gcc.target/aarch64/sve/pcs/return_4_2048.c (caller_bf16): Likewise.
>        * gcc.target/aarch64/sve/pcs/return_5.c (caller_bf16): Likewise.
>        * gcc.target/aarch64/sve/pcs/return_5_128.c (caller_bf16): Likewise.
>        * gcc.target/aarch64/sve/pcs/return_5_256.c (caller_bf16): Likewise.
>        * gcc.target/aarch64/sve/pcs/return_5_512.c (caller_bf16): Likewise.
>        * gcc.target/aarch64/sve/pcs/return_5_1024.c (caller_bf16): Likewise.
>        * gcc.target/aarch64/sve/pcs/return_5_2048.c (caller_bf16): Likewise.
> ---
>  .../aarch64/aarch64-sve-builtins-base.cc      | 124 +++++++
>  .../aarch64/sve/acle/general-c/svlast.c       |  63 ++++
>  .../sve/acle/general-c/svlast128_run.c        | 313 +++++++++++++++++
>  .../sve/acle/general-c/svlast256_run.c        | 314 ++++++++++++++++++
>  .../gcc.target/aarch64/sve/pcs/return_4.c     |   2 -
>  .../aarch64/sve/pcs/return_4_1024.c           |   2 -
>  .../gcc.target/aarch64/sve/pcs/return_4_128.c |   2 -
>  .../aarch64/sve/pcs/return_4_2048.c           |   2 -
>  .../gcc.target/aarch64/sve/pcs/return_4_256.c |   2 -
>  .../gcc.target/aarch64/sve/pcs/return_4_512.c |   2 -
>  .../gcc.target/aarch64/sve/pcs/return_5.c     |   2 -
>  .../aarch64/sve/pcs/return_5_1024.c           |   2 -
>  .../gcc.target/aarch64/sve/pcs/return_5_128.c |   2 -
>  .../aarch64/sve/pcs/return_5_2048.c           |   2 -
>  .../gcc.target/aarch64/sve/pcs/return_5_256.c |   2 -
>  .../gcc.target/aarch64/sve/pcs/return_5_512.c |   2 -
>  16 files changed, 814 insertions(+), 24 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/svlast.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/svlast128_run.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/svlast256_run.c
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index cd9cace3c9b..db2b4dcaac9 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -1056,6 +1056,130 @@ class svlast_impl : public quiet<function_base>
>  public:
>    CONSTEXPR svlast_impl (int unspec) : m_unspec (unspec) {}
>
> +  bool is_lasta () const { return m_unspec == UNSPEC_LASTA; }
> +  bool is_lastb () const { return m_unspec == UNSPEC_LASTB; }
> +
> +  bool vect_all_same (tree v , int step) const

Nit: stray space after "v".

> +  {
> +    int i;
> +    int nelts = vector_cst_encoded_nelts (v);
> +    int first_el = 0;
> +
> +    for (i = first_el; i < nelts; i += step)
> +      if (VECTOR_CST_ENCODED_ELT (v, i) != VECTOR_CST_ENCODED_ELT (v, 
> first_el))

I think this should use !operand_equal_p (..., ..., 0).

Oops! I wonder why I thought VECTOR_CST_ENCODED_ELT returned a constant! Thanks 
for spotting that. Also, should the flags here be OEP_ONLY_CONST ?


> +     return false;
> +
> +    return true;
> +  }
> +
> +  /* Fold a svlast{a/b} call with constant predicate to a BIT_FIELD_REF.
> +     BIT_FIELD_REF lowers to a NEON element extract, so we have to make sure
> +     the index of the element being accessed is in the range of a NEON vector
> +     width.  */

s/NEON/Advanced SIMD/.  Same in later comments

> +  gimple *fold (gimple_folder & f) const override
> +  {
> +    tree pred = gimple_call_arg (f.call, 0);
> +    tree val = gimple_call_arg (f.call, 1);
> +
> +    if (TREE_CODE (pred) == VECTOR_CST)
> +      {
> +     HOST_WIDE_INT pos;
> +     unsigned int const_vg;
> +     int i = 0;
> +     int step = f.type_suffix (0).element_bytes;
> +     int step_1 = gcd (step, VECTOR_CST_NPATTERNS (pred));
> +     int npats = VECTOR_CST_NPATTERNS (pred);
> +     unsigned HOST_WIDE_INT nelts = vector_cst_encoded_nelts (pred);
> +     tree b = NULL_TREE;
> +     bool const_vl = aarch64_sve_vg.is_constant (&const_vg);

I think this might be left over from previous versions, but:
const_vg isn't used and const_vl is only used once, so I think it
would be better to remove them.

> +
> +     /* We can optimize 2 cases common to variable and fixed-length cases
> +        without a linear search of the predicate vector:
> +        1.  LASTA if predicate is all true, return element 0.
> +        2.  LASTA if predicate all false, return element 0.  */
> +     if (is_lasta () && vect_all_same (pred, step_1))
> +       {
> +         b = build3 (BIT_FIELD_REF, TREE_TYPE (f.lhs), val,
> +                     bitsize_int (step * BITS_PER_UNIT), bitsize_int (0));
> +         return gimple_build_assign (f.lhs, b);
> +       }
> +
> +     /* Handle the all-false case for LASTB where SVE VL == 128b -
> +        return the highest numbered element.  */
> +     if (is_lastb () && known_eq (BYTES_PER_SVE_VECTOR, 16)
> +         && vect_all_same (pred, step_1)
> +         && integer_zerop (VECTOR_CST_ENCODED_ELT (pred, 0)))

Formatting nit: one condition per line once one line isn't enough.

> +       {
> +         b = build3 (BIT_FIELD_REF, TREE_TYPE (f.lhs), val,
> +                     bitsize_int (step * BITS_PER_UNIT),
> +                     bitsize_int ((16 - step) * BITS_PER_UNIT));
> +
> +         return gimple_build_assign (f.lhs, b);
> +       }
> +
> +     /* If VECTOR_CST_NELTS_PER_PATTERN (pred) == 2 and every multiple of
> +        'step_1' in
> +        [VECTOR_CST_NPATTERNS .. VECTOR_CST_ENCODED_NELTS - 1]
> +        is zero, then we can treat the vector as VECTOR_CST_NPATTERNS
> +        elements followed by all inactive elements.  */
> +     if (!const_vl && VECTOR_CST_NELTS_PER_PATTERN (pred) == 2)

Following on from the above, maybe use:

  !VECTOR_CST_NELTS (pred).is_constant ()

instead of !const_vl here.

I have a horrible suspicion that I'm contradicting our earlier discussion
here, sorry, but: I think we have to return null if NELTS_PER_PATTERN != 2.

IIUC, the NPATTERNS .. ENCODED_ELTS represent the repeated part of the encoded 
constant. This means the repetition occurs if NELTS_PER_PATTERN == 2, IOW the 
base1 repeats in the encoding. This loop is checking this condition and looks 
for a 1 in the repeated part of the NELTS_PER_PATTERN == 2 in a VL vector. 
Please correct me if I’m misunderstanding here.



OK with those changes, thanks.  I'm going to have to take your word
for the tests being right :)

I’ve manually inspected the assembler tests and compared runtime tests with the 
optimization switched off which looked Ok.

Thanks,
Tejas.

Richard

> +       for (i = npats; i < nelts; i += step_1)
> +         {
> +           /* If there are active elements in the repeated pattern of
> +              a variable-length vector, then return NULL as there is no way
> +              to be sure statically if this falls within the NEON range.  */
> +           if (!integer_zerop (VECTOR_CST_ENCODED_ELT (pred, i)))
> +             return NULL;
> +         }
> +
> +     /* If we're here, it means either:
> +        1. The vector is variable-length and there's no active element in the
> +        repeated part of the pattern, or
> +        2. The vector is fixed-length.
> +        Fall-through to a linear search.  */
> +
> +     /* Restrict the scope of search to NPATS if vector is
> +        variable-length.  */
> +     if (!VECTOR_CST_NELTS (pred).is_constant (&nelts))
> +       nelts = npats;
> +
> +     /* Fall through to finding the last active element linearly for
> +        for all cases where the last active element is known to be
> +        within a statically-determinable range.  */
> +     i = MAX ((int)nelts - step, 0);
> +     for (; i >= 0; i -= step)
> +       if (!integer_zerop (VECTOR_CST_ELT (pred, i)))
> +         break;
> +
> +     if (is_lastb ())
> +       {
> +         /* For LASTB, the element is the last active element.  */
> +         pos = i;
> +       }
> +     else
> +       {
> +         /* For LASTA, the element is one after last active element.  */
> +         pos = i + step;
> +
> +         /* If last active element is
> +            last element, wrap-around and return first NEON element.  */
> +         if (known_ge (pos, BYTES_PER_SVE_VECTOR))
> +           pos = 0;
> +       }
> +
> +     /* Out of NEON range.  */
> +     if (pos < 0 || pos > 15)
> +       return NULL;
> +
> +     b = build3 (BIT_FIELD_REF, TREE_TYPE (f.lhs), val,
> +                 bitsize_int (step * BITS_PER_UNIT),
> +                 bitsize_int (pos * BITS_PER_UNIT));
> +
> +     return gimple_build_assign (f.lhs, b);
> +      }
> +    return NULL;
> +  }
> +
>    rtx
>    expand (function_expander &e) const override
>    {
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/svlast.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/svlast.c
> new file mode 100644
> index 00000000000..fdbe5e309af
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/svlast.c
> @@ -0,0 +1,63 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -msve-vector-bits=256" } */
> +
> +#include <stdint.h>
> +#include "arm_sve.h"
> +
> +#define NAME(name, size, pat, sign, ab) \
> +  name ## _ ## size ## _ ## pat ## _ ## sign ## _ ## ab
> +
> +#define NAMEF(name, size, pat, sign, ab) \
> +  name ## _ ## size ## _ ## pat ## _ ## sign ## _ ## ab ## _false
> +
> +#define SVTYPE(size, sign) \
> +  sv ## sign ## int ## size ## _t
> +
> +#define STYPE(size, sign) sign ## int ## size ##_t
> +
> +#define SVELAST_DEF(size, pat, sign, ab, su) \
> +  STYPE (size, sign) __attribute__((noinline)) \
> +  NAME (foo, size, pat, sign, ab) (SVTYPE (size, sign) x) \
> +  { \
> +    return svlast ## ab (svptrue_pat_b ## size (pat), x); \
> +  } \
> +  STYPE (size, sign) __attribute__((noinline)) \
> +  NAMEF (foo, size, pat, sign, ab) (SVTYPE (size, sign) x) \
> +  { \
> +    return svlast ## ab (svpfalse (), x); \
> +  }
> +
> +#define ALL_PATS(SIZE, SIGN, AB, SU) \
> +  SVELAST_DEF (SIZE, SV_VL1, SIGN, AB, SU) \
> +  SVELAST_DEF (SIZE, SV_VL2, SIGN, AB, SU) \
> +  SVELAST_DEF (SIZE, SV_VL3, SIGN, AB, SU) \
> +  SVELAST_DEF (SIZE, SV_VL4, SIGN, AB, SU) \
> +  SVELAST_DEF (SIZE, SV_VL5, SIGN, AB, SU) \
> +  SVELAST_DEF (SIZE, SV_VL6, SIGN, AB, SU) \
> +  SVELAST_DEF (SIZE, SV_VL7, SIGN, AB, SU) \
> +  SVELAST_DEF (SIZE, SV_VL8, SIGN, AB, SU) \
> +  SVELAST_DEF (SIZE, SV_VL16, SIGN, AB, SU)
> +
> +#define ALL_SIGN(SIZE, AB) \
> +  ALL_PATS (SIZE, , AB, s) \
> +  ALL_PATS (SIZE, u, AB, u)
> +
> +#define ALL_SIZE(AB) \
> +  ALL_SIGN (8, AB) \
> +  ALL_SIGN (16, AB) \
> +  ALL_SIGN (32, AB) \
> +  ALL_SIGN (64, AB)
> +
> +#define ALL_POS() \
> +  ALL_SIZE (a) \
> +  ALL_SIZE (b)
> +
> +
> +ALL_POS()
> +
> +/* { dg-final { scan-assembler-times {\tumov\tw[0-9]+, v[0-9]+\.b} 52 } } */
> +/* { dg-final { scan-assembler-times {\tumov\tw[0-9]+, v[0-9]+\.h} 50 } } */
> +/* { dg-final { scan-assembler-times {\tumov\tw[0-9]+, v[0-9]+\.s} 12 } } */
> +/* { dg-final { scan-assembler-times {\tfmov\tw0, s0} 24 } } */
> +/* { dg-final { scan-assembler-times {\tumov\tx[0-9]+, v[0-9]+\.d} 4 } } */
> +/* { dg-final { scan-assembler-times {\tfmov\tx0, d0} 32 } } */
> diff --git 
> a/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/svlast128_run.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/svlast128_run.c
> new file mode 100644
> index 00000000000..5e1e9303d7b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/svlast128_run.c
> @@ -0,0 +1,313 @@
> +/* { dg-do run { target aarch64_sve128_hw } } */
> +/* { dg-options "-O3 -msve-vector-bits=128 -std=gnu99" } */
> +
> +#include "svlast.c"
> +
> +int
> +main (void)
> +{
> +  int8_t res_8_SV_VL1__a = 1;
> +  int8_t res_8_SV_VL2__a = 2;
> +  int8_t res_8_SV_VL3__a = 3;
> +  int8_t res_8_SV_VL4__a = 4;
> +  int8_t res_8_SV_VL5__a = 5;
> +  int8_t res_8_SV_VL6__a = 6;
> +  int8_t res_8_SV_VL7__a = 7;
> +  int8_t res_8_SV_VL8__a = 8;
> +  int8_t res_8_SV_VL16__a = 0;
> +  uint8_t res_8_SV_VL1_u_a = 1;
> +  uint8_t res_8_SV_VL2_u_a = 2;
> +  uint8_t res_8_SV_VL3_u_a = 3;
> +  uint8_t res_8_SV_VL4_u_a = 4;
> +  uint8_t res_8_SV_VL5_u_a = 5;
> +  uint8_t res_8_SV_VL6_u_a = 6;
> +  uint8_t res_8_SV_VL7_u_a = 7;
> +  uint8_t res_8_SV_VL8_u_a = 8;
> +  uint8_t res_8_SV_VL16_u_a = 0;
> +  int16_t res_16_SV_VL1__a = 1;
> +  int16_t res_16_SV_VL2__a = 2;
> +  int16_t res_16_SV_VL3__a = 3;
> +  int16_t res_16_SV_VL4__a = 4;
> +  int16_t res_16_SV_VL5__a = 5;
> +  int16_t res_16_SV_VL6__a = 6;
> +  int16_t res_16_SV_VL7__a = 7;
> +  int16_t res_16_SV_VL8__a = 0;
> +  int16_t res_16_SV_VL16__a = 0;
> +  uint16_t res_16_SV_VL1_u_a = 1;
> +  uint16_t res_16_SV_VL2_u_a = 2;
> +  uint16_t res_16_SV_VL3_u_a = 3;
> +  uint16_t res_16_SV_VL4_u_a = 4;
> +  uint16_t res_16_SV_VL5_u_a = 5;
> +  uint16_t res_16_SV_VL6_u_a = 6;
> +  uint16_t res_16_SV_VL7_u_a = 7;
> +  uint16_t res_16_SV_VL8_u_a = 0;
> +  uint16_t res_16_SV_VL16_u_a = 0;
> +  int32_t res_32_SV_VL1__a = 1;
> +  int32_t res_32_SV_VL2__a = 2;
> +  int32_t res_32_SV_VL3__a = 3;
> +  int32_t res_32_SV_VL4__a = 0;
> +  int32_t res_32_SV_VL5__a = 0;
> +  int32_t res_32_SV_VL6__a = 0;
> +  int32_t res_32_SV_VL7__a = 0;
> +  int32_t res_32_SV_VL8__a = 0;
> +  int32_t res_32_SV_VL16__a = 0;
> +  uint32_t res_32_SV_VL1_u_a = 1;
> +  uint32_t res_32_SV_VL2_u_a = 2;
> +  uint32_t res_32_SV_VL3_u_a = 3;
> +  uint32_t res_32_SV_VL4_u_a = 0;
> +  uint32_t res_32_SV_VL5_u_a = 0;
> +  uint32_t res_32_SV_VL6_u_a = 0;
> +  uint32_t res_32_SV_VL7_u_a = 0;
> +  uint32_t res_32_SV_VL8_u_a = 0;
> +  uint32_t res_32_SV_VL16_u_a = 0;
> +  int64_t res_64_SV_VL1__a = 1;
> +  int64_t res_64_SV_VL2__a = 0;
> +  int64_t res_64_SV_VL3__a = 0;
> +  int64_t res_64_SV_VL4__a = 0;
> +  int64_t res_64_SV_VL5__a = 0;
> +  int64_t res_64_SV_VL6__a = 0;
> +  int64_t res_64_SV_VL7__a = 0;
> +  int64_t res_64_SV_VL8__a = 0;
> +  int64_t res_64_SV_VL16__a = 0;
> +  uint64_t res_64_SV_VL1_u_a = 1;
> +  uint64_t res_64_SV_VL2_u_a = 0;
> +  uint64_t res_64_SV_VL3_u_a = 0;
> +  uint64_t res_64_SV_VL4_u_a = 0;
> +  uint64_t res_64_SV_VL5_u_a = 0;
> +  uint64_t res_64_SV_VL6_u_a = 0;
> +  uint64_t res_64_SV_VL7_u_a = 0;
> +  uint64_t res_64_SV_VL8_u_a = 0;
> +  uint64_t res_64_SV_VL16_u_a = 0;
> +  int8_t res_8_SV_VL1__b = 0;
> +  int8_t res_8_SV_VL2__b = 1;
> +  int8_t res_8_SV_VL3__b = 2;
> +  int8_t res_8_SV_VL4__b = 3;
> +  int8_t res_8_SV_VL5__b = 4;
> +  int8_t res_8_SV_VL6__b = 5;
> +  int8_t res_8_SV_VL7__b = 6;
> +  int8_t res_8_SV_VL8__b = 7;
> +  int8_t res_8_SV_VL16__b = 15;
> +  uint8_t res_8_SV_VL1_u_b = 0;
> +  uint8_t res_8_SV_VL2_u_b = 1;
> +  uint8_t res_8_SV_VL3_u_b = 2;
> +  uint8_t res_8_SV_VL4_u_b = 3;
> +  uint8_t res_8_SV_VL5_u_b = 4;
> +  uint8_t res_8_SV_VL6_u_b = 5;
> +  uint8_t res_8_SV_VL7_u_b = 6;
> +  uint8_t res_8_SV_VL8_u_b = 7;
> +  uint8_t res_8_SV_VL16_u_b = 15;
> +  int16_t res_16_SV_VL1__b = 0;
> +  int16_t res_16_SV_VL2__b = 1;
> +  int16_t res_16_SV_VL3__b = 2;
> +  int16_t res_16_SV_VL4__b = 3;
> +  int16_t res_16_SV_VL5__b = 4;
> +  int16_t res_16_SV_VL6__b = 5;
> +  int16_t res_16_SV_VL7__b = 6;
> +  int16_t res_16_SV_VL8__b = 7;
> +  int16_t res_16_SV_VL16__b = 7;
> +  uint16_t res_16_SV_VL1_u_b = 0;
> +  uint16_t res_16_SV_VL2_u_b = 1;
> +  uint16_t res_16_SV_VL3_u_b = 2;
> +  uint16_t res_16_SV_VL4_u_b = 3;
> +  uint16_t res_16_SV_VL5_u_b = 4;
> +  uint16_t res_16_SV_VL6_u_b = 5;
> +  uint16_t res_16_SV_VL7_u_b = 6;
> +  uint16_t res_16_SV_VL8_u_b = 7;
> +  uint16_t res_16_SV_VL16_u_b = 7;
> +  int32_t res_32_SV_VL1__b = 0;
> +  int32_t res_32_SV_VL2__b = 1;
> +  int32_t res_32_SV_VL3__b = 2;
> +  int32_t res_32_SV_VL4__b = 3;
> +  int32_t res_32_SV_VL5__b = 3;
> +  int32_t res_32_SV_VL6__b = 3;
> +  int32_t res_32_SV_VL7__b = 3;
> +  int32_t res_32_SV_VL8__b = 3;
> +  int32_t res_32_SV_VL16__b = 3;
> +  uint32_t res_32_SV_VL1_u_b = 0;
> +  uint32_t res_32_SV_VL2_u_b = 1;
> +  uint32_t res_32_SV_VL3_u_b = 2;
> +  uint32_t res_32_SV_VL4_u_b = 3;
> +  uint32_t res_32_SV_VL5_u_b = 3;
> +  uint32_t res_32_SV_VL6_u_b = 3;
> +  uint32_t res_32_SV_VL7_u_b = 3;
> +  uint32_t res_32_SV_VL8_u_b = 3;
> +  uint32_t res_32_SV_VL16_u_b = 3;
> +  int64_t res_64_SV_VL1__b = 0;
> +  int64_t res_64_SV_VL2__b = 1;
> +  int64_t res_64_SV_VL3__b = 1;
> +  int64_t res_64_SV_VL4__b = 1;
> +  int64_t res_64_SV_VL5__b = 1;
> +  int64_t res_64_SV_VL6__b = 1;
> +  int64_t res_64_SV_VL7__b = 1;
> +  int64_t res_64_SV_VL8__b = 1;
> +  int64_t res_64_SV_VL16__b = 1;
> +  uint64_t res_64_SV_VL1_u_b = 0;
> +  uint64_t res_64_SV_VL2_u_b = 1;
> +  uint64_t res_64_SV_VL3_u_b = 1;
> +  uint64_t res_64_SV_VL4_u_b = 1;
> +  uint64_t res_64_SV_VL5_u_b = 1;
> +  uint64_t res_64_SV_VL6_u_b = 1;
> +  uint64_t res_64_SV_VL7_u_b = 1;
> +  uint64_t res_64_SV_VL8_u_b = 1;
> +  uint64_t res_64_SV_VL16_u_b = 1;
> +
> +  int8_t res_8_SV_VL1__a_false = 0;
> +  int8_t res_8_SV_VL2__a_false = 0;
> +  int8_t res_8_SV_VL3__a_false = 0;
> +  int8_t res_8_SV_VL4__a_false = 0;
> +  int8_t res_8_SV_VL5__a_false = 0;
> +  int8_t res_8_SV_VL6__a_false = 0;
> +  int8_t res_8_SV_VL7__a_false = 0;
> +  int8_t res_8_SV_VL8__a_false = 0;
> +  int8_t res_8_SV_VL16__a_false = 0;
> +  uint8_t res_8_SV_VL1_u_a_false = 0;
> +  uint8_t res_8_SV_VL2_u_a_false = 0;
> +  uint8_t res_8_SV_VL3_u_a_false = 0;
> +  uint8_t res_8_SV_VL4_u_a_false = 0;
> +  uint8_t res_8_SV_VL5_u_a_false = 0;
> +  uint8_t res_8_SV_VL6_u_a_false = 0;
> +  uint8_t res_8_SV_VL7_u_a_false = 0;
> +  uint8_t res_8_SV_VL8_u_a_false = 0;
> +  uint8_t res_8_SV_VL16_u_a_false = 0;
> +  int16_t res_16_SV_VL1__a_false = 0;
> +  int16_t res_16_SV_VL2__a_false = 0;
> +  int16_t res_16_SV_VL3__a_false = 0;
> +  int16_t res_16_SV_VL4__a_false = 0;
> +  int16_t res_16_SV_VL5__a_false = 0;
> +  int16_t res_16_SV_VL6__a_false = 0;
> +  int16_t res_16_SV_VL7__a_false = 0;
> +  int16_t res_16_SV_VL8__a_false = 0;
> +  int16_t res_16_SV_VL16__a_false = 0;
> +  uint16_t res_16_SV_VL1_u_a_false = 0;
> +  uint16_t res_16_SV_VL2_u_a_false = 0;
> +  uint16_t res_16_SV_VL3_u_a_false = 0;
> +  uint16_t res_16_SV_VL4_u_a_false = 0;
> +  uint16_t res_16_SV_VL5_u_a_false = 0;
> +  uint16_t res_16_SV_VL6_u_a_false = 0;
> +  uint16_t res_16_SV_VL7_u_a_false = 0;
> +  uint16_t res_16_SV_VL8_u_a_false = 0;
> +  uint16_t res_16_SV_VL16_u_a_false = 0;
> +  int32_t res_32_SV_VL1__a_false = 0;
> +  int32_t res_32_SV_VL2__a_false = 0;
> +  int32_t res_32_SV_VL3__a_false = 0;
> +  int32_t res_32_SV_VL4__a_false = 0;
> +  int32_t res_32_SV_VL5__a_false = 0;
> +  int32_t res_32_SV_VL6__a_false = 0;
> +  int32_t res_32_SV_VL7__a_false = 0;
> +  int32_t res_32_SV_VL8__a_false = 0;
> +  int32_t res_32_SV_VL16__a_false = 0;
> +  uint32_t res_32_SV_VL1_u_a_false = 0;
> +  uint32_t res_32_SV_VL2_u_a_false = 0;
> +  uint32_t res_32_SV_VL3_u_a_false = 0;
> +  uint32_t res_32_SV_VL4_u_a_false = 0;
> +  uint32_t res_32_SV_VL5_u_a_false = 0;
> +  uint32_t res_32_SV_VL6_u_a_false = 0;
> +  uint32_t res_32_SV_VL7_u_a_false = 0;
> +  uint32_t res_32_SV_VL8_u_a_false = 0;
> +  uint32_t res_32_SV_VL16_u_a_false = 0;
> +  int64_t res_64_SV_VL1__a_false = 0;
> +  int64_t res_64_SV_VL2__a_false = 0;
> +  int64_t res_64_SV_VL3__a_false = 0;
> +  int64_t res_64_SV_VL4__a_false = 0;
> +  int64_t res_64_SV_VL5__a_false = 0;
> +  int64_t res_64_SV_VL6__a_false = 0;
> +  int64_t res_64_SV_VL7__a_false = 0;
> +  int64_t res_64_SV_VL8__a_false = 0;
> +  int64_t res_64_SV_VL16__a_false = 0;
> +  uint64_t res_64_SV_VL1_u_a_false = 0;
> +  uint64_t res_64_SV_VL2_u_a_false = 0;
> +  uint64_t res_64_SV_VL3_u_a_false = 0;
> +  uint64_t res_64_SV_VL4_u_a_false = 0;
> +  uint64_t res_64_SV_VL5_u_a_false = 0;
> +  uint64_t res_64_SV_VL6_u_a_false = 0;
> +  uint64_t res_64_SV_VL7_u_a_false = 0;
> +  uint64_t res_64_SV_VL8_u_a_false = 0;
> +  uint64_t res_64_SV_VL16_u_a_false = 0;
> +  int8_t res_8_SV_VL1__b_false = 15;
> +  int8_t res_8_SV_VL2__b_false = 15;
> +  int8_t res_8_SV_VL3__b_false = 15;
> +  int8_t res_8_SV_VL4__b_false = 15;
> +  int8_t res_8_SV_VL5__b_false = 15;
> +  int8_t res_8_SV_VL6__b_false = 15;
> +  int8_t res_8_SV_VL7__b_false = 15;
> +  int8_t res_8_SV_VL8__b_false = 15;
> +  int8_t res_8_SV_VL16__b_false = 15;
> +  uint8_t res_8_SV_VL1_u_b_false = 15;
> +  uint8_t res_8_SV_VL2_u_b_false = 15;
> +  uint8_t res_8_SV_VL3_u_b_false = 15;
> +  uint8_t res_8_SV_VL4_u_b_false = 15;
> +  uint8_t res_8_SV_VL5_u_b_false = 15;
> +  uint8_t res_8_SV_VL6_u_b_false = 15;
> +  uint8_t res_8_SV_VL7_u_b_false = 15;
> +  uint8_t res_8_SV_VL8_u_b_false = 15;
> +  uint8_t res_8_SV_VL16_u_b_false = 15;
> +  int16_t res_16_SV_VL1__b_false = 7;
> +  int16_t res_16_SV_VL2__b_false = 7;
> +  int16_t res_16_SV_VL3__b_false = 7;
> +  int16_t res_16_SV_VL4__b_false = 7;
> +  int16_t res_16_SV_VL5__b_false = 7;
> +  int16_t res_16_SV_VL6__b_false = 7;
> +  int16_t res_16_SV_VL7__b_false = 7;
> +  int16_t res_16_SV_VL8__b_false = 7;
> +  int16_t res_16_SV_VL16__b_false = 7;
> +  uint16_t res_16_SV_VL1_u_b_false = 7;
> +  uint16_t res_16_SV_VL2_u_b_false = 7;
> +  uint16_t res_16_SV_VL3_u_b_false = 7;
> +  uint16_t res_16_SV_VL4_u_b_false = 7;
> +  uint16_t res_16_SV_VL5_u_b_false = 7;
> +  uint16_t res_16_SV_VL6_u_b_false = 7;
> +  uint16_t res_16_SV_VL7_u_b_false = 7;
> +  uint16_t res_16_SV_VL8_u_b_false = 7;
> +  uint16_t res_16_SV_VL16_u_b_false = 7;
> +  int32_t res_32_SV_VL1__b_false = 3;
> +  int32_t res_32_SV_VL2__b_false = 3;
> +  int32_t res_32_SV_VL3__b_false = 3;
> +  int32_t res_32_SV_VL4__b_false = 3;
> +  int32_t res_32_SV_VL5__b_false = 3;
> +  int32_t res_32_SV_VL6__b_false = 3;
> +  int32_t res_32_SV_VL7__b_false = 3;
> +  int32_t res_32_SV_VL8__b_false = 3;
> +  int32_t res_32_SV_VL16__b_false = 3;
> +  uint32_t res_32_SV_VL1_u_b_false = 3;
> +  uint32_t res_32_SV_VL2_u_b_false = 3;
> +  uint32_t res_32_SV_VL3_u_b_false = 3;
> +  uint32_t res_32_SV_VL4_u_b_false = 3;
> +  uint32_t res_32_SV_VL5_u_b_false = 3;
> +  uint32_t res_32_SV_VL6_u_b_false = 3;
> +  uint32_t res_32_SV_VL7_u_b_false = 3;
> +  uint32_t res_32_SV_VL8_u_b_false = 3;
> +  uint32_t res_32_SV_VL16_u_b_false = 3;
> +  int64_t res_64_SV_VL1__b_false = 1;
> +  int64_t res_64_SV_VL2__b_false = 1;
> +  int64_t res_64_SV_VL3__b_false = 1;
> +  int64_t res_64_SV_VL4__b_false = 1;
> +  int64_t res_64_SV_VL5__b_false = 1;
> +  int64_t res_64_SV_VL6__b_false = 1;
> +  int64_t res_64_SV_VL7__b_false = 1;
> +  int64_t res_64_SV_VL8__b_false = 1;
> +  int64_t res_64_SV_VL16__b_false = 1;
> +  uint64_t res_64_SV_VL1_u_b_false = 1;
> +  uint64_t res_64_SV_VL2_u_b_false = 1;
> +  uint64_t res_64_SV_VL3_u_b_false = 1;
> +  uint64_t res_64_SV_VL4_u_b_false = 1;
> +  uint64_t res_64_SV_VL5_u_b_false = 1;
> +  uint64_t res_64_SV_VL6_u_b_false = 1;
> +  uint64_t res_64_SV_VL7_u_b_false = 1;
> +  uint64_t res_64_SV_VL8_u_b_false = 1;
> +  uint64_t res_64_SV_VL16_u_b_false = 1;
> +
> +#undef SVELAST_DEF
> +#define SVELAST_DEF(size, pat, sign, ab, su) \
> +     if (NAME (foo, size, pat, sign, ab) \
> +          (svindex_ ## su ## size (0, 1)) != \
> +             NAME (res, size, pat, sign, ab)) \
> +       __builtin_abort (); \
> +     if (NAMEF (foo, size, pat, sign, ab) \
> +          (svindex_ ## su ## size (0, 1)) != \
> +             NAMEF (res, size, pat, sign, ab)) \
> +       __builtin_abort ();
> +
> +  ALL_POS ()
> +
> +  return 0;
> +}
> diff --git 
> a/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/svlast256_run.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/svlast256_run.c
> new file mode 100644
> index 00000000000..f6ba7ea7d89
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/svlast256_run.c
> @@ -0,0 +1,314 @@
> +/* { dg-do run { target aarch64_sve256_hw } } */
> +/* { dg-options "-O3 -msve-vector-bits=256 -std=gnu99" } */
> +
> +#include "svlast.c"
> +
> +int
> +main (void)
> +{
> +  int8_t res_8_SV_VL1__a = 1;
> +  int8_t res_8_SV_VL2__a = 2;
> +  int8_t res_8_SV_VL3__a = 3;
> +  int8_t res_8_SV_VL4__a = 4;
> +  int8_t res_8_SV_VL5__a = 5;
> +  int8_t res_8_SV_VL6__a = 6;
> +  int8_t res_8_SV_VL7__a = 7;
> +  int8_t res_8_SV_VL8__a = 8;
> +  int8_t res_8_SV_VL16__a = 16;
> +  uint8_t res_8_SV_VL1_u_a = 1;
> +  uint8_t res_8_SV_VL2_u_a = 2;
> +  uint8_t res_8_SV_VL3_u_a = 3;
> +  uint8_t res_8_SV_VL4_u_a = 4;
> +  uint8_t res_8_SV_VL5_u_a = 5;
> +  uint8_t res_8_SV_VL6_u_a = 6;
> +  uint8_t res_8_SV_VL7_u_a = 7;
> +  uint8_t res_8_SV_VL8_u_a = 8;
> +  uint8_t res_8_SV_VL16_u_a = 16;
> +  int16_t res_16_SV_VL1__a = 1;
> +  int16_t res_16_SV_VL2__a = 2;
> +  int16_t res_16_SV_VL3__a = 3;
> +  int16_t res_16_SV_VL4__a = 4;
> +  int16_t res_16_SV_VL5__a = 5;
> +  int16_t res_16_SV_VL6__a = 6;
> +  int16_t res_16_SV_VL7__a = 7;
> +  int16_t res_16_SV_VL8__a = 8;
> +  int16_t res_16_SV_VL16__a = 0;
> +  uint16_t res_16_SV_VL1_u_a = 1;
> +  uint16_t res_16_SV_VL2_u_a = 2;
> +  uint16_t res_16_SV_VL3_u_a = 3;
> +  uint16_t res_16_SV_VL4_u_a = 4;
> +  uint16_t res_16_SV_VL5_u_a = 5;
> +  uint16_t res_16_SV_VL6_u_a = 6;
> +  uint16_t res_16_SV_VL7_u_a = 7;
> +  uint16_t res_16_SV_VL8_u_a = 8;
> +  uint16_t res_16_SV_VL16_u_a = 0;
> +  int32_t res_32_SV_VL1__a = 1;
> +  int32_t res_32_SV_VL2__a = 2;
> +  int32_t res_32_SV_VL3__a = 3;
> +  int32_t res_32_SV_VL4__a = 4;
> +  int32_t res_32_SV_VL5__a = 5;
> +  int32_t res_32_SV_VL6__a = 6;
> +  int32_t res_32_SV_VL7__a = 7;
> +  int32_t res_32_SV_VL8__a = 0;
> +  int32_t res_32_SV_VL16__a = 0;
> +  uint32_t res_32_SV_VL1_u_a = 1;
> +  uint32_t res_32_SV_VL2_u_a = 2;
> +  uint32_t res_32_SV_VL3_u_a = 3;
> +  uint32_t res_32_SV_VL4_u_a = 4;
> +  uint32_t res_32_SV_VL5_u_a = 5;
> +  uint32_t res_32_SV_VL6_u_a = 6;
> +  uint32_t res_32_SV_VL7_u_a = 7;
> +  uint32_t res_32_SV_VL8_u_a = 0;
> +  uint32_t res_32_SV_VL16_u_a = 0;
> +  int64_t res_64_SV_VL1__a = 1;
> +  int64_t res_64_SV_VL2__a = 2;
> +  int64_t res_64_SV_VL3__a = 3;
> +  int64_t res_64_SV_VL4__a = 0;
> +  int64_t res_64_SV_VL5__a = 0;
> +  int64_t res_64_SV_VL6__a = 0;
> +  int64_t res_64_SV_VL7__a = 0;
> +  int64_t res_64_SV_VL8__a = 0;
> +  int64_t res_64_SV_VL16__a = 0;
> +  uint64_t res_64_SV_VL1_u_a = 1;
> +  uint64_t res_64_SV_VL2_u_a = 2;
> +  uint64_t res_64_SV_VL3_u_a = 3;
> +  uint64_t res_64_SV_VL4_u_a = 0;
> +  uint64_t res_64_SV_VL5_u_a = 0;
> +  uint64_t res_64_SV_VL6_u_a = 0;
> +  uint64_t res_64_SV_VL7_u_a = 0;
> +  uint64_t res_64_SV_VL8_u_a = 0;
> +  uint64_t res_64_SV_VL16_u_a = 0;
> +  int8_t res_8_SV_VL1__b = 0;
> +  int8_t res_8_SV_VL2__b = 1;
> +  int8_t res_8_SV_VL3__b = 2;
> +  int8_t res_8_SV_VL4__b = 3;
> +  int8_t res_8_SV_VL5__b = 4;
> +  int8_t res_8_SV_VL6__b = 5;
> +  int8_t res_8_SV_VL7__b = 6;
> +  int8_t res_8_SV_VL8__b = 7;
> +  int8_t res_8_SV_VL16__b = 15;
> +  uint8_t res_8_SV_VL1_u_b = 0;
> +  uint8_t res_8_SV_VL2_u_b = 1;
> +  uint8_t res_8_SV_VL3_u_b = 2;
> +  uint8_t res_8_SV_VL4_u_b = 3;
> +  uint8_t res_8_SV_VL5_u_b = 4;
> +  uint8_t res_8_SV_VL6_u_b = 5;
> +  uint8_t res_8_SV_VL7_u_b = 6;
> +  uint8_t res_8_SV_VL8_u_b = 7;
> +  uint8_t res_8_SV_VL16_u_b = 15;
> +  int16_t res_16_SV_VL1__b = 0;
> +  int16_t res_16_SV_VL2__b = 1;
> +  int16_t res_16_SV_VL3__b = 2;
> +  int16_t res_16_SV_VL4__b = 3;
> +  int16_t res_16_SV_VL5__b = 4;
> +  int16_t res_16_SV_VL6__b = 5;
> +  int16_t res_16_SV_VL7__b = 6;
> +  int16_t res_16_SV_VL8__b = 7;
> +  int16_t res_16_SV_VL16__b = 15;
> +  uint16_t res_16_SV_VL1_u_b = 0;
> +  uint16_t res_16_SV_VL2_u_b = 1;
> +  uint16_t res_16_SV_VL3_u_b = 2;
> +  uint16_t res_16_SV_VL4_u_b = 3;
> +  uint16_t res_16_SV_VL5_u_b = 4;
> +  uint16_t res_16_SV_VL6_u_b = 5;
> +  uint16_t res_16_SV_VL7_u_b = 6;
> +  uint16_t res_16_SV_VL8_u_b = 7;
> +  uint16_t res_16_SV_VL16_u_b = 15;
> +  int32_t res_32_SV_VL1__b = 0;
> +  int32_t res_32_SV_VL2__b = 1;
> +  int32_t res_32_SV_VL3__b = 2;
> +  int32_t res_32_SV_VL4__b = 3;
> +  int32_t res_32_SV_VL5__b = 4;
> +  int32_t res_32_SV_VL6__b = 5;
> +  int32_t res_32_SV_VL7__b = 6;
> +  int32_t res_32_SV_VL8__b = 7;
> +  int32_t res_32_SV_VL16__b = 7;
> +  uint32_t res_32_SV_VL1_u_b = 0;
> +  uint32_t res_32_SV_VL2_u_b = 1;
> +  uint32_t res_32_SV_VL3_u_b = 2;
> +  uint32_t res_32_SV_VL4_u_b = 3;
> +  uint32_t res_32_SV_VL5_u_b = 4;
> +  uint32_t res_32_SV_VL6_u_b = 5;
> +  uint32_t res_32_SV_VL7_u_b = 6;
> +  uint32_t res_32_SV_VL8_u_b = 7;
> +  uint32_t res_32_SV_VL16_u_b = 7;
> +  int64_t res_64_SV_VL1__b = 0;
> +  int64_t res_64_SV_VL2__b = 1;
> +  int64_t res_64_SV_VL3__b = 2;
> +  int64_t res_64_SV_VL4__b = 3;
> +  int64_t res_64_SV_VL5__b = 3;
> +  int64_t res_64_SV_VL6__b = 3;
> +  int64_t res_64_SV_VL7__b = 3;
> +  int64_t res_64_SV_VL8__b = 3;
> +  int64_t res_64_SV_VL16__b = 3;
> +  uint64_t res_64_SV_VL1_u_b = 0;
> +  uint64_t res_64_SV_VL2_u_b = 1;
> +  uint64_t res_64_SV_VL3_u_b = 2;
> +  uint64_t res_64_SV_VL4_u_b = 3;
> +  uint64_t res_64_SV_VL5_u_b = 3;
> +  uint64_t res_64_SV_VL6_u_b = 3;
> +  uint64_t res_64_SV_VL7_u_b = 3;
> +  uint64_t res_64_SV_VL8_u_b = 3;
> +  uint64_t res_64_SV_VL16_u_b = 3;
> +
> +  int8_t res_8_SV_VL1__a_false = 0;
> +  int8_t res_8_SV_VL2__a_false = 0;
> +  int8_t res_8_SV_VL3__a_false = 0;
> +  int8_t res_8_SV_VL4__a_false = 0;
> +  int8_t res_8_SV_VL5__a_false = 0;
> +  int8_t res_8_SV_VL6__a_false = 0;
> +  int8_t res_8_SV_VL7__a_false = 0;
> +  int8_t res_8_SV_VL8__a_false = 0;
> +  int8_t res_8_SV_VL16__a_false = 0;
> +  uint8_t res_8_SV_VL1_u_a_false = 0;
> +  uint8_t res_8_SV_VL2_u_a_false = 0;
> +  uint8_t res_8_SV_VL3_u_a_false = 0;
> +  uint8_t res_8_SV_VL4_u_a_false = 0;
> +  uint8_t res_8_SV_VL5_u_a_false = 0;
> +  uint8_t res_8_SV_VL6_u_a_false = 0;
> +  uint8_t res_8_SV_VL7_u_a_false = 0;
> +  uint8_t res_8_SV_VL8_u_a_false = 0;
> +  uint8_t res_8_SV_VL16_u_a_false = 0;
> +  int16_t res_16_SV_VL1__a_false = 0;
> +  int16_t res_16_SV_VL2__a_false = 0;
> +  int16_t res_16_SV_VL3__a_false = 0;
> +  int16_t res_16_SV_VL4__a_false = 0;
> +  int16_t res_16_SV_VL5__a_false = 0;
> +  int16_t res_16_SV_VL6__a_false = 0;
> +  int16_t res_16_SV_VL7__a_false = 0;
> +  int16_t res_16_SV_VL8__a_false = 0;
> +  int16_t res_16_SV_VL16__a_false = 0;
> +  uint16_t res_16_SV_VL1_u_a_false = 0;
> +  uint16_t res_16_SV_VL2_u_a_false = 0;
> +  uint16_t res_16_SV_VL3_u_a_false = 0;
> +  uint16_t res_16_SV_VL4_u_a_false = 0;
> +  uint16_t res_16_SV_VL5_u_a_false = 0;
> +  uint16_t res_16_SV_VL6_u_a_false = 0;
> +  uint16_t res_16_SV_VL7_u_a_false = 0;
> +  uint16_t res_16_SV_VL8_u_a_false = 0;
> +  uint16_t res_16_SV_VL16_u_a_false = 0;
> +  int32_t res_32_SV_VL1__a_false = 0;
> +  int32_t res_32_SV_VL2__a_false = 0;
> +  int32_t res_32_SV_VL3__a_false = 0;
> +  int32_t res_32_SV_VL4__a_false = 0;
> +  int32_t res_32_SV_VL5__a_false = 0;
> +  int32_t res_32_SV_VL6__a_false = 0;
> +  int32_t res_32_SV_VL7__a_false = 0;
> +  int32_t res_32_SV_VL8__a_false = 0;
> +  int32_t res_32_SV_VL16__a_false = 0;
> +  uint32_t res_32_SV_VL1_u_a_false = 0;
> +  uint32_t res_32_SV_VL2_u_a_false = 0;
> +  uint32_t res_32_SV_VL3_u_a_false = 0;
> +  uint32_t res_32_SV_VL4_u_a_false = 0;
> +  uint32_t res_32_SV_VL5_u_a_false = 0;
> +  uint32_t res_32_SV_VL6_u_a_false = 0;
> +  uint32_t res_32_SV_VL7_u_a_false = 0;
> +  uint32_t res_32_SV_VL8_u_a_false = 0;
> +  uint32_t res_32_SV_VL16_u_a_false = 0;
> +  int64_t res_64_SV_VL1__a_false = 0;
> +  int64_t res_64_SV_VL2__a_false = 0;
> +  int64_t res_64_SV_VL3__a_false = 0;
> +  int64_t res_64_SV_VL4__a_false = 0;
> +  int64_t res_64_SV_VL5__a_false = 0;
> +  int64_t res_64_SV_VL6__a_false = 0;
> +  int64_t res_64_SV_VL7__a_false = 0;
> +  int64_t res_64_SV_VL8__a_false = 0;
> +  int64_t res_64_SV_VL16__a_false = 0;
> +  uint64_t res_64_SV_VL1_u_a_false = 0;
> +  uint64_t res_64_SV_VL2_u_a_false = 0;
> +  uint64_t res_64_SV_VL3_u_a_false = 0;
> +  uint64_t res_64_SV_VL4_u_a_false = 0;
> +  uint64_t res_64_SV_VL5_u_a_false = 0;
> +  uint64_t res_64_SV_VL6_u_a_false = 0;
> +  uint64_t res_64_SV_VL7_u_a_false = 0;
> +  uint64_t res_64_SV_VL8_u_a_false = 0;
> +  uint64_t res_64_SV_VL16_u_a_false = 0;
> +  int8_t res_8_SV_VL1__b_false = 31;
> +  int8_t res_8_SV_VL2__b_false = 31;
> +  int8_t res_8_SV_VL3__b_false = 31;
> +  int8_t res_8_SV_VL4__b_false = 31;
> +  int8_t res_8_SV_VL5__b_false = 31;
> +  int8_t res_8_SV_VL6__b_false = 31;
> +  int8_t res_8_SV_VL7__b_false = 31;
> +  int8_t res_8_SV_VL8__b_false = 31;
> +  int8_t res_8_SV_VL16__b_false = 31;
> +  uint8_t res_8_SV_VL1_u_b_false = 31;
> +  uint8_t res_8_SV_VL2_u_b_false = 31;
> +  uint8_t res_8_SV_VL3_u_b_false = 31;
> +  uint8_t res_8_SV_VL4_u_b_false = 31;
> +  uint8_t res_8_SV_VL5_u_b_false = 31;
> +  uint8_t res_8_SV_VL6_u_b_false = 31;
> +  uint8_t res_8_SV_VL7_u_b_false = 31;
> +  uint8_t res_8_SV_VL8_u_b_false = 31;
> +  uint8_t res_8_SV_VL16_u_b_false = 31;
> +  int16_t res_16_SV_VL1__b_false = 15;
> +  int16_t res_16_SV_VL2__b_false = 15;
> +  int16_t res_16_SV_VL3__b_false = 15;
> +  int16_t res_16_SV_VL4__b_false = 15;
> +  int16_t res_16_SV_VL5__b_false = 15;
> +  int16_t res_16_SV_VL6__b_false = 15;
> +  int16_t res_16_SV_VL7__b_false = 15;
> +  int16_t res_16_SV_VL8__b_false = 15;
> +  int16_t res_16_SV_VL16__b_false = 15;
> +  uint16_t res_16_SV_VL1_u_b_false = 15;
> +  uint16_t res_16_SV_VL2_u_b_false = 15;
> +  uint16_t res_16_SV_VL3_u_b_false = 15;
> +  uint16_t res_16_SV_VL4_u_b_false = 15;
> +  uint16_t res_16_SV_VL5_u_b_false = 15;
> +  uint16_t res_16_SV_VL6_u_b_false = 15;
> +  uint16_t res_16_SV_VL7_u_b_false = 15;
> +  uint16_t res_16_SV_VL8_u_b_false = 15;
> +  uint16_t res_16_SV_VL16_u_b_false = 15;
> +  int32_t res_32_SV_VL1__b_false = 7;
> +  int32_t res_32_SV_VL2__b_false = 7;
> +  int32_t res_32_SV_VL3__b_false = 7;
> +  int32_t res_32_SV_VL4__b_false = 7;
> +  int32_t res_32_SV_VL5__b_false = 7;
> +  int32_t res_32_SV_VL6__b_false = 7;
> +  int32_t res_32_SV_VL7__b_false = 7;
> +  int32_t res_32_SV_VL8__b_false = 7;
> +  int32_t res_32_SV_VL16__b_false = 7;
> +  uint32_t res_32_SV_VL1_u_b_false = 7;
> +  uint32_t res_32_SV_VL2_u_b_false = 7;
> +  uint32_t res_32_SV_VL3_u_b_false = 7;
> +  uint32_t res_32_SV_VL4_u_b_false = 7;
> +  uint32_t res_32_SV_VL5_u_b_false = 7;
> +  uint32_t res_32_SV_VL6_u_b_false = 7;
> +  uint32_t res_32_SV_VL7_u_b_false = 7;
> +  uint32_t res_32_SV_VL8_u_b_false = 7;
> +  uint32_t res_32_SV_VL16_u_b_false = 7;
> +  int64_t res_64_SV_VL1__b_false = 3;
> +  int64_t res_64_SV_VL2__b_false = 3;
> +  int64_t res_64_SV_VL3__b_false = 3;
> +  int64_t res_64_SV_VL4__b_false = 3;
> +  int64_t res_64_SV_VL5__b_false = 3;
> +  int64_t res_64_SV_VL6__b_false = 3;
> +  int64_t res_64_SV_VL7__b_false = 3;
> +  int64_t res_64_SV_VL8__b_false = 3;
> +  int64_t res_64_SV_VL16__b_false = 3;
> +  uint64_t res_64_SV_VL1_u_b_false = 3;
> +  uint64_t res_64_SV_VL2_u_b_false = 3;
> +  uint64_t res_64_SV_VL3_u_b_false = 3;
> +  uint64_t res_64_SV_VL4_u_b_false = 3;
> +  uint64_t res_64_SV_VL5_u_b_false = 3;
> +  uint64_t res_64_SV_VL6_u_b_false = 3;
> +  uint64_t res_64_SV_VL7_u_b_false = 3;
> +  uint64_t res_64_SV_VL8_u_b_false = 3;
> +  uint64_t res_64_SV_VL16_u_b_false = 3;
> +
> +
> +#undef SVELAST_DEF
> +#define SVELAST_DEF(size, pat, sign, ab, su) \
> +     if (NAME (foo, size, pat, sign, ab) \
> +          (svindex_ ## su ## size (0 ,1)) != \
> +             NAME (res, size, pat, sign, ab)) \
> +       __builtin_abort (); \
> +     if (NAMEF (foo, size, pat, sign, ab) \
> +          (svindex_ ## su ## size (0 ,1)) != \
> +             NAMEF (res, size, pat, sign, ab)) \
> +       __builtin_abort ();
> +
> +  ALL_POS ()
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4.c
> index 1e38371842f..91fdd3c202e 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4.c
> @@ -186,8 +186,6 @@ CALLER (f16, __SVFloat16_t)
>  ** caller_bf16:
>  **   ...
>  **   bl      callee_bf16
> -**   ptrue   (p[0-7])\.b, all
> -**   lasta   h0, \1, z0\.h
>  **   ldp     x29, x30, \[sp\], 16
>  **   ret
>  */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_1024.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_1024.c
> index 491c35af221..7d824caae1b 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_1024.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_1024.c
> @@ -186,8 +186,6 @@ CALLER (f16, __SVFloat16_t)
>  ** caller_bf16:
>  **   ...
>  **   bl      callee_bf16
> -**   ptrue   (p[0-7])\.b, vl128
> -**   lasta   h0, \1, z0\.h
>  **   ldp     x29, x30, \[sp\], 16
>  **   ret
>  */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_128.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_128.c
> index eebb913273a..e0aa3a5fa68 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_128.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_128.c
> @@ -186,8 +186,6 @@ CALLER (f16, __SVFloat16_t)
>  ** caller_bf16:
>  **   ...
>  **   bl      callee_bf16
> -**   ptrue   (p[0-7])\.b, vl16
> -**   lasta   h0, \1, z0\.h
>  **   ldp     x29, x30, \[sp\], 16
>  **   ret
>  */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_2048.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_2048.c
> index 73c3b2ec045..3238015d9eb 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_2048.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_2048.c
> @@ -186,8 +186,6 @@ CALLER (f16, __SVFloat16_t)
>  ** caller_bf16:
>  **   ...
>  **   bl      callee_bf16
> -**   ptrue   (p[0-7])\.b, vl256
> -**   lasta   h0, \1, z0\.h
>  **   ldp     x29, x30, \[sp\], 16
>  **   ret
>  */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_256.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_256.c
> index 29744c81402..50861098934 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_256.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_256.c
> @@ -186,8 +186,6 @@ CALLER (f16, __SVFloat16_t)
>  ** caller_bf16:
>  **   ...
>  **   bl      callee_bf16
> -**   ptrue   (p[0-7])\.b, vl32
> -**   lasta   h0, \1, z0\.h
>  **   ldp     x29, x30, \[sp\], 16
>  **   ret
>  */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_512.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_512.c
> index cf25c31bcbf..300dacce955 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_512.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_512.c
> @@ -186,8 +186,6 @@ CALLER (f16, __SVFloat16_t)
>  ** caller_bf16:
>  **   ...
>  **   bl      callee_bf16
> -**   ptrue   (p[0-7])\.b, vl64
> -**   lasta   h0, \1, z0\.h
>  **   ldp     x29, x30, \[sp\], 16
>  **   ret
>  */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5.c
> index 9ad3e227654..0a840a38384 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5.c
> @@ -186,8 +186,6 @@ CALLER (f16, svfloat16_t)
>  ** caller_bf16:
>  **   ...
>  **   bl      callee_bf16
> -**   ptrue   (p[0-7])\.b, all
> -**   lasta   h0, \1, z0\.h
>  **   ldp     x29, x30, \[sp\], 16
>  **   ret
>  */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_1024.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_1024.c
> index d573e5fc69c..18cefbff1e6 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_1024.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_1024.c
> @@ -186,8 +186,6 @@ CALLER (f16, svfloat16_t)
>  ** caller_bf16:
>  **   ...
>  **   bl      callee_bf16
> -**   ptrue   (p[0-7])\.b, vl128
> -**   lasta   h0, \1, z0\.h
>  **   ldp     x29, x30, \[sp\], 16
>  **   ret
>  */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_128.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_128.c
> index 200b0eb8242..c622ed55674 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_128.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_128.c
> @@ -186,8 +186,6 @@ CALLER (f16, svfloat16_t)
>  ** caller_bf16:
>  **   ...
>  **   bl      callee_bf16
> -**   ptrue   (p[0-7])\.b, vl16
> -**   lasta   h0, \1, z0\.h
>  **   ldp     x29, x30, \[sp\], 16
>  **   ret
>  */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_2048.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_2048.c
> index f6f8858fd47..3286280687d 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_2048.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_2048.c
> @@ -186,8 +186,6 @@ CALLER (f16, svfloat16_t)
>  ** caller_bf16:
>  **   ...
>  **   bl      callee_bf16
> -**   ptrue   (p[0-7])\.b, vl256
> -**   lasta   h0, \1, z0\.h
>  **   ldp     x29, x30, \[sp\], 16
>  **   ret
>  */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_256.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_256.c
> index e62f59cc885..3c6afa2fdf1 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_256.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_256.c
> @@ -186,8 +186,6 @@ CALLER (f16, svfloat16_t)
>  ** caller_bf16:
>  **   ...
>  **   bl      callee_bf16
> -**   ptrue   (p[0-7])\.b, vl32
> -**   lasta   h0, \1, z0\.h
>  **   ldp     x29, x30, \[sp\], 16
>  **   ret
>  */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_512.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_512.c
> index 483558cb576..bb7d3ebf9d4 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_512.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_512.c
> @@ -186,8 +186,6 @@ CALLER (f16, svfloat16_t)
>  ** caller_bf16:
>  **   ...
>  **   bl      callee_bf16
> -**   ptrue   (p[0-7])\.b, vl64
> -**   lasta   h0, \1, z0\.h
>  **   ldp     x29, x30, \[sp\], 16
>  **   ret
>  */

Reply via email to