> -----Original Message-----
> From: Yury Khrustalev <yury.khrusta...@arm.com>
> Sent: Wednesday, July 16, 2025 1:14 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford <richard.sandif...@arm.com>; Tamar Christina
> <tamar.christ...@arm.com>; Mark Rutland <mark.rutl...@arm.com>
> Subject: [PATCH v2] aarch64: Adapt unwinder to linux's SME signal behaviour
> 
> From: Richard Sandiford <richard.sandif...@arm.com>
> 
> SME uses a lazy save system to manage ZA.  The idea is that,
> if a function with ZA state wants to call a "normal" function,
> it can leave its state in ZA and instead set up a lazy save buffer.
> If, unexpectedly, that normal function contains a nested use of ZA,
> that nested use of ZA must commit the lazy save first.
> 
> This lazy save system uses a special system register called TPIDR2_EL0.
> See:
> 
>   https://github.com/ARM-software/abi-
> aa/blob/main/aapcs64/aapcs64.rst#66the-za-lazy-saving-scheme
> 
> for details.
> 
> The ABI specifies that, on entry to an exception handler, the following
> things must be true:
> 
> * PSTATE.SM must be 0 (the processor must be in non-streaming mode)
> 
> * PSTATE.ZA must be 0 (ZA must be off)
> 
> * TPIDR2_EL0 must be 0 (there must be no uncommitted lazy save)
> 
> This is normally done by making _Unwind_RaiseException & friends
> commit any lazy save before they unwind.  This also has the side
> effect of ensuring that TPIDR2_EL0 is never left pointing to a
> lazy save buffer that has been unwound.
> 
> However, things get more complicated with signals.  If:
> 
> (a) a signal is raised while ZA is dormant (that is, while there is an
>     uncommitted lazy save);
> 
> (b) the signal handler throws an exception; and
> 
> (c) that exception is caught outside the signal handler
> 
> something must ensure that the lazy save from (a) is committed.
> 
> This would be simple if the signal handler was entered with ZA and
> TPIDR2_EL0 intact.  However, for various good reasons that are out
> of scope here, this is not done.  Instead, Linux now clears both
> TPIDR2_EL0 and PSTATE.ZA before entering a signal handler, see:
> 
>   https://lore.kernel.org/all/20250417190113.3778111-1-
> mark.rutl...@arm.com/
> 
> for details.
> 
> Therefore, it is the unwinder that must simulate a commit of the lazy
> save from (a).  It can do this by reading the previous values of
> TPIDR2_EL0 and ZA from the sigcontext.
> 
> The SME-related sigcontext structures were only added to linux's
> asm/sigcontext.h relatively recently and we can't rely on GCC being
> built against such recent kernel header files.  The patch therefore uses
> defines relevant macros if they are not defined and provide types that
> comply with ABI layout of the corresponding linux types.
> 
> The patch includes some ugly casting in an attempt to support big-endian
> ILP32, even though SME on big-endian ILP32 linux should never be a thing.
> We can remove it if we also remove ILP32 support from GCC.
> 
> Co-authored-by: Yury Khrustalev <yury.khrusta...@arm.com>
> 
> gcc/
>       * doc/sourcebuild.texi (aarch64_sme_hw): Document.
> 
> gcc/testsuite/
>       * lib/target-supports.exp (add_options_for_aarch64_sme)
>       (check_effective_target_aarch64_sme_hw): New procedures.
>       * g++.target/aarch64/sme/sme_throw_1.C: New test.
>       * g++.target/aarch64/sme/sme_throw_2.C: Likewise.
> 
> libgcc/
>       * config/aarch64/linux-unwind.h (aarch64_fallback_frame_state):
>       If a signal was raised while there was an uncommitted lazy save,
>       commit the save as part of the unwind process.
> 
> ---
> 
> base commit: 2ae2203da59
> 
> The MAGIC constants and type definitions are used under the copyright
> exception granted in the Linux-syscall-note.
> 
> Changes in v2:
>  - Added ifdef guards for code related to ILP32
>  - Fixed code style as per GNU guidelines.
>  - v1: https://inbox.sourceware.org/gcc-patches/20250619133948.3104505-1-
> yury.khrusta...@arm.com/
> 

OK thanks, Also ok for backport after a bit of stew on trunk.

Thanks,
Tamar

> ---
>  gcc/doc/sourcebuild.texi                      |   3 +
>  .../g++.target/aarch64/sme/sme_throw_1.C      |  55 +++++++++
>  .../g++.target/aarch64/sme/sme_throw_2.C      |   4 +
>  gcc/testsuite/lib/target-supports.exp         |  23 ++++
>  libgcc/config/aarch64/linux-unwind.h          | 108 +++++++++++++++++-
>  5 files changed, 192 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.target/aarch64/sme/sme_throw_1.C
>  create mode 100644 gcc/testsuite/g++.target/aarch64/sme/sme_throw_2.C
> 
> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> index 85fb810d96c..a9193040b37 100644
> --- a/gcc/doc/sourcebuild.texi
> +++ b/gcc/doc/sourcebuild.texi
> @@ -2379,6 +2379,9 @@ whether it does so by default).
>  @item aarch64_sve2p1_hw
>  AArch64 target that is able to generate and execute SVE2.1 code (regardless 
> of
>  whether it does so by default).
> +@item aarch64_sme_hw
> +AArch64 target that is able to generate and execute SME code (regardless of
> +whether it does so by default).
> 
>  @item aarch64_fjcvtzs_hw
>  AArch64 target that is able to generate and execute armv8.3-a FJCVTZS
> diff --git a/gcc/testsuite/g++.target/aarch64/sme/sme_throw_1.C
> b/gcc/testsuite/g++.target/aarch64/sme/sme_throw_1.C
> new file mode 100644
> index 00000000000..76f1e8b8ee7
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/aarch64/sme/sme_throw_1.C
> @@ -0,0 +1,55 @@
> +/* { dg-do run { target { aarch64*-linux-gnu* && aarch64_sme_hw } } } */
> +
> +#include <signal.h>
> +#include <arm_sme.h>
> +
> +static bool caught;
> +
> +[[gnu::noipa]] void thrower(int)
> +{
> +  throw 1;
> +}
> +
> +[[gnu::noipa]] void bar()
> +{
> +  *(volatile int *)0 = 0;
> +}
> +
> +[[gnu::noipa]] void foo()
> +{
> +  try
> +    {
> +      bar();
> +    }
> +  catch (int)
> +    {
> +      caught = true;
> +    }
> +}
> +
> +__arm_new("za") __arm_locally_streaming void sme_user()
> +{
> +  svbool_t all = svptrue_b8();
> +  for (unsigned int i = 0; i < svcntb(); ++i)
> +    {
> +      svint8_t expected = svindex_s8(i + 1, i);
> +      svwrite_hor_za8_m(0, i, all, expected);
> +    }
> +  foo();
> +  for (unsigned int i = 0; i < svcntb(); ++i)
> +    {
> +      svint8_t expected = svindex_s8(i + 1, i);
> +      svint8_t actual = svread_hor_za8_m(svdup_s8(0), all, 0, i);
> +      if (svptest_any(all, svcmpne(all, expected, actual)))
> +     __builtin_abort();
> +    }
> +  if (!caught)
> +    __builtin_abort();
> +}
> +
> +int main()
> +{
> +  signal(SIGSEGV, thrower);
> +  sme_user();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/g++.target/aarch64/sme/sme_throw_2.C
> b/gcc/testsuite/g++.target/aarch64/sme/sme_throw_2.C
> new file mode 100644
> index 00000000000..db3197c7c07
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/aarch64/sme/sme_throw_2.C
> @@ -0,0 +1,4 @@
> +/* { dg-do run { target { aarch64*-linux-gnu* && aarch64_sme_hw } } } */
> +/* { dg-options "-O2" } */
> +
> +#include "sme_throw_1.C"
> diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-
> supports.exp
> index 4486a6ac99b..65d2e67a85b 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -5800,6 +5800,13 @@ proc add_options_for_aarch64_sve { flags } {
>      return "$flags -march=armv8.2-a+sve"
>  }
> 
> +proc add_options_for_aarch64_sme { flags } {
> +    if { ![istarget aarch64*-*-*] || [check_effective_target_aarch64_sme] } {
> +     return "$flags"
> +    }
> +    return "$flags -march=armv9-a+sme"
> +}
> +
>  # Return 1 if this is an ARM target supporting the FP16 alternative
>  # format.  Some multilibs may be incompatible with the options needed.  Also
>  # set et_arm_fp16_alternative_flags to the best options to add.
> @@ -6539,6 +6546,22 @@ foreach N { 128 256 512 1024 2048 } {
>      }]
>  }
> 
> +# Return true if this is an AArch64 target that can run SME code.
> +
> +proc check_effective_target_aarch64_sme_hw { } {
> +    if { ![istarget aarch64*-*-*] } {
> +     return 0
> +    }
> +    return [check_runtime aarch64_sme_hw_available {
> +     int
> +     main (void)
> +     {
> +       asm volatile ("rdsvl x0, #1");
> +       return 0;
> +     }
> +    } [add_options_for_aarch64_sme ""]]
> +}
> +
>  proc check_effective_target_arm_neonv2_hw { } {
>      return [check_runtime arm_neon_hwv2_available {
>       #include "arm_neon.h"
> diff --git a/libgcc/config/aarch64/linux-unwind.h 
> b/libgcc/config/aarch64/linux-
> unwind.h
> index e41ca6a6a6e..f5b73a0777f 100644
> --- a/libgcc/config/aarch64/linux-unwind.h
> +++ b/libgcc/config/aarch64/linux-unwind.h
> @@ -27,7 +27,7 @@
> 
>  #include <signal.h>
>  #include <sys/ucontext.h>
> -
> +#include <stdint.h>
> 
>  /* Since insns are always stored LE, on a BE system the opcodes will
>     be loaded byte-reversed.  Therefore, define two sets of opcodes,
> @@ -43,6 +43,22 @@
> 
>  #define MD_FALLBACK_FRAME_STATE_FOR aarch64_fallback_frame_state
> 
> +#ifndef FPSIMD_MAGIC
> +#define FPSIMD_MAGIC 0x46508001
> +#endif
> +
> +#ifndef TPIDR2_MAGIC
> +#define TPIDR2_MAGIC 0x54504902
> +#endif
> +
> +#ifndef ZA_MAGIC
> +#define ZA_MAGIC 0x54366345
> +#endif
> +
> +#ifndef EXTRA_MAGIC
> +#define EXTRA_MAGIC 0x45585401
> +#endif
> +
>  static _Unwind_Reason_Code
>  aarch64_fallback_frame_state (struct _Unwind_Context *context,
>                             _Unwind_FrameState * fs)
> @@ -58,6 +74,21 @@ aarch64_fallback_frame_state (struct _Unwind_Context
> *context,
>      ucontext_t uc;
>    };
> 
> +  struct tpidr2_block
> +  {
> +    uint64_t za_save_buffer;
> +    uint16_t num_za_save_slices;
> +    uint8_t reserved[6];
> +  };
> +
> +  struct za_block
> +  {
> +    struct _aarch64_ctx head;
> +    uint16_t vl;
> +    uint16_t reserved[3];
> +    uint64_t data;
> +  };
> +
>    struct rt_sigframe *rt_;
>    _Unwind_Ptr new_cfa;
>    unsigned *pc = context->ra;
> @@ -103,11 +134,15 @@ aarch64_fallback_frame_state (struct
> _Unwind_Context *context,
>       field can be used to skip over unrecognized context extensions.
>       The end of the context sequence is marked by a context with magic
>       0 or size 0.  */
> +  struct tpidr2_block *tpidr2 = 0;
> +  struct za_block *za_ctx = 0;
> +
>    for (extension_marker = (struct _aarch64_ctx *) &sc->__reserved;
>         extension_marker->magic;
>         extension_marker = (struct _aarch64_ctx *)
>         ((unsigned char *) extension_marker + extension_marker->size))
>      {
> +    restart:
>        if (extension_marker->magic == FPSIMD_MAGIC)
>       {
>         struct fpsimd_context *ctx =
> @@ -139,12 +174,83 @@ aarch64_fallback_frame_state (struct
> _Unwind_Context *context,
>             fs->regs.reg[AARCH64_DWARF_V0 + i].loc.offset = offset;
>           }
>       }
> +      else if (extension_marker->magic == TPIDR2_MAGIC)
> +     {
> +       /* A TPIDR2 context.
> +
> +          All the casting is to support big-endian ILP32.  We could read
> +          directly into TPIDR2 otherwise.  */
> +       struct { struct _aarch64_ctx h; uint64_t tpidr2; } *ctx
> +               = (void *)extension_marker;
> +#if defined (__ILP32__)
> +       tpidr2 = (struct tpidr2_block *) (uintptr_t) ctx->tpidr2;
> +#else
> +       tpidr2 = (struct tpidr2_block *) ctx->tpidr2;
> +#endif
> +     }
> +      else if (extension_marker->magic == ZA_MAGIC)
> +     /* A ZA context.  We interpret this later.  */
> +     za_ctx = (void *)extension_marker;
> +      else if (extension_marker->magic == EXTRA_MAGIC)
> +     {
> +       /* Extra context.  The ABI guarantees that the next _aarch64_ctx
> +          in the current list will be the zero terminator, so we can simply
> +          switch to the new list and continue from there.  The new list is
> +          also zero-terminated.
> +
> +          As above, the casting is to support big-endian ILP32.  */
> +       struct { struct _aarch64_ctx h; uint64_t next; } *ctx
> +               = (void *)extension_marker;
> +#if defined (__ILP32__)
> +       extension_marker = (struct _aarch64_ctx *) (uintptr_t) ctx->next;
> +#else
> +       extension_marker = (struct _aarch64_ctx *) ctx->next;
> +#endif
> +       goto restart;
> +     }
>        else
>       {
>         /* There is context provided that we do not recognize!  */
>       }
>      }
> 
> +  /* Signal handlers are entered with ZA in the off state (TPIDR2_ELO==0 and
> +     PSTATE.ZA==0).  The normal process when transitioning from ZA being
> +     dormant to ZA being off is to commit the lazy save; see the AAPCS64
> +     for details.  However, this is not done when entering a signal handler.
> +     Instead, linux saves the old contents of ZA and TPIDR2_EL0 to the
> +     sigcontext without interpreting them further.
> +
> +     Therefore, if a signal handler throws an exception to code outside the
> +     signal handler, the unwinder must commit the lazy save after the fact.
> +     Committing a lazy save means:
> +
> +     (1) Storing the contents of ZA into the buffer provided by TPIDR2_EL0.
> +     (2) Setting TPIDR2_EL0 to zero.
> +     (3) Turning ZA off.
> +
> +     (2) and (3) have already been done by the call to 
> __libgcc_arm_za_disable.
> +     (1) involves copying data from the ZA sigcontext entry to the
> +     corresponding lazy save buffer.  */
> +  if (tpidr2 && za_ctx && tpidr2->za_save_buffer)
> +    {
> +      /* There is a 16-bit vector length (measured in bytes) at ZA_CTX + 8.
> +      The data itself starts at ZA_CTX + 16.
> +      As above, the casting is to support big-endian ILP32.  */
> +      uint16_t vl = za_ctx->vl;
> +#if defined (__ILP32__)
> +      void *save_buffer = (void *) (uintptr_t) tpidr2->za_save_buffer;
> +      const void *za_buffer = (void *) (uintptr_t) &za_ctx->data;
> +#else
> +      void *save_buffer = (void *) tpidr2->za_save_buffer;
> +      const void *za_buffer = (void *) &za_ctx->data;
> +#endif
> +      uint64_t num_slices = tpidr2->num_za_save_slices;
> +      if (num_slices > vl)
> +     num_slices = vl;
> +      memcpy (save_buffer, za_buffer, num_slices * vl);
> +    }
> +
>    fs->regs.how[31] = REG_SAVED_OFFSET;
>    fs->regs.reg[31].loc.offset = (_Unwind_Ptr) & (sc->sp) - new_cfa;
> 
> --
> 2.39.5

Reply via email to