This patch adds support for switching to the appropriate SME mode
for each call.  Switching to streaming mode requires an SMSTART SM
instruction and switching to non-streaming mode requires an SMSTOP SM
instruction.  If the call is being made from streaming-compatible code,
these switches are conditional on the current mode being the opposite
of the one that the call needs.
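
For example (a minimal sketch, assuming the ACLE __arm_streaming keyword
attribute; the sequence in the comment is schematic rather than literal
output), a call from a non-streaming function to a streaming function
brackets the BL with a mode switch:

```c
/* Hypothetical example using the ACLE streaming attribute.  */
void streaming_callee (void) __arm_streaming;

void non_streaming_caller (void)
{
  /* Expected call sequence, schematically:
       smstart sm
       bl      streaming_callee
       smstop  sm  */
  streaming_callee ();
}
```

A call in the opposite direction (streaming caller, non-streaming callee)
instead uses SMSTOP SM before the BL and SMSTART SM after it.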

Since changing PSTATE.SM changes the vector length and effectively
changes the ISA, the code to do the switching has to be emitted late.
The patch does this using a new pass that runs immediately before late
prologue/epilogue insertion.  (It doesn't use md_reorg because later
additions need the CFG.)

If a streaming-compatible function needs to switch mode for a call,
it must restore the original mode afterwards.  The old mode must
therefore be available immediately after the call.  The easiest
way of ensuring this is to force the use of a hard frame pointer
and ensure that the old state is saved at an in-range offset
from there.
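
As an illustrative sketch (assuming the ACLE __arm_streaming and
__arm_streaming_compatible keyword attributes; the codegen in the comment
is schematic, not the exact output of this patch), a streaming-compatible
caller might look like:

```c
/* Hypothetical sketch of a streaming-compatible function that must
   switch mode for a call and restore the old mode afterwards.  */
void streaming_callee (void) __arm_streaming;

void sc_fn (void) __arm_streaming_compatible
{
  /* Schematic codegen:

     prologue:
       bl    __arm_sme_state       // low bit of x0 = current PSTATE.SM
       str   x0, [x29, #<old_svcr_offset>]   // FP-relative save slot

     call site:
       ldr   x16, [x29, #<old_svcr_offset>]
       tbnz  x16, #0, 1f           // already streaming?  skip SMSTART
       smstart sm
     1:
       bl    streaming_callee
       ldr   x16, [x29, #<old_svcr_offset>]
       tbnz  x16, #0, 2f           // restore the original mode
       smstop  sm
     2:                                                             */
  streaming_callee ();
}
```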

Changing modes clobbers the Z and P registers, so we need to
save and restore live Z and P state around each mode switch.
However, mode switches are not expected to be performance
critical, so it seemed better to err on the side of being
correct rather than trying to optimise the save and restore
with surrounding code.
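
For example (an illustrative sketch using the SVE ACLE; the spill
behaviour described in the comment is what the patch arranges, not
literal output):

```c
#include <arm_sve.h>

void streaming_callee (void) __arm_streaming;

svint32_t keep_live (svint32_t x)
{
  /* SMSTART SM and SMSTOP SM clobber all Z and P registers, so X,
     which is live across the call, must be saved (as a full SVE
     vector, to a stack slot allocated on demand) before the first
     mode switch and restored after the second.  */
  streaming_callee ();
  return x;
}
```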

gcc/
        * config/aarch64/aarch64-passes.def
        (pass_switch_pstate_sm): New pass.
        * config/aarch64/aarch64-sme.md: New file.
        * config/aarch64/aarch64.md: Include it.
        (*tb<optab><mode>1): Rename to...
        (@aarch64_tb<optab><mode>): ...this.
        (call, call_value, sibcall, sibcall_value): Don't require operand 2
        to be a CONST_INT.
        * config/aarch64/aarch64-protos.h (aarch64_emit_call_insn): Return
        the insn.
        (make_pass_switch_pstate_sm): Declare.
        * config/aarch64/aarch64.h (TARGET_STREAMING_COMPATIBLE): New macro.
        (CALL_USED_REGISTERS): Mark VG as call-preserved.
        (aarch64_frame::old_svcr_offset): New member variable.
        (machine_function::call_switches_pstate_sm): Likewise.
        (CUMULATIVE_ARGS::num_sme_mode_switch_args): Likewise.
        (CUMULATIVE_ARGS::sme_mode_switch_args): Likewise.
        * config/aarch64/aarch64.cc: Include tree-pass.h and cfgbuild.h.
        (aarch64_cfun_incoming_pstate_sm): New function.
        (aarch64_call_switches_pstate_sm): Likewise.
        (aarch64_reg_save_mode): Return DImode for VG_REGNUM.
        (aarch64_callee_isa_mode): New function.
        (aarch64_insn_callee_isa_mode): Likewise.
        (aarch64_guard_switch_pstate_sm): Likewise.
        (aarch64_switch_pstate_sm): Likewise.
        (aarch64_sme_mode_switch_regs): New class.
        (aarch64_record_sme_mode_switch_args): New function.
        (aarch64_finish_sme_mode_switch_args): Likewise.
        (aarch64_function_arg): Handle the end marker by returning a
        PARALLEL that contains the ABI cookie that we used previously
        alongside the result of aarch64_finish_sme_mode_switch_args.
        (aarch64_init_cumulative_args): Initialize num_sme_mode_switch_args.
        (aarch64_function_arg_advance): If a call would switch SM state,
        record all argument registers that would need to be saved around
        the mode switch.
        (aarch64_need_old_pstate_sm): New function.
        (aarch64_layout_frame): Decide whether the frame needs to store the
        incoming value of PSTATE.SM and allocate a save slot for it if so.
        If a function switches SME state, arrange to save the old value
        of the DWARF VG register.  Handle the case where this is the only
        register save slot above the FP.
        (aarch64_save_callee_saves): Handle saves of the DWARF VG register.
        (aarch64_get_separate_components): Prevent such saves from being
        shrink-wrapped.
        (aarch64_old_svcr_mem): New function.
        (aarch64_read_old_svcr): Likewise.
        (aarch64_guard_switch_pstate_sm): Likewise.
        (aarch64_expand_prologue): Handle saves of the DWARF VG register.
        Initialize any SVCR save slot.
        (aarch64_expand_call): Allow the cookie to be a PARALLEL that contains
        both the UNSPEC_CALLEE_ABI value and a list of registers that need
        to be preserved across a change to PSTATE.SM.  If the call does
        involve such a change to PSTATE.SM, record the registers that
        would be clobbered by this process.  Also emit an instruction
        to mark the temporary change in VG.  Update call_switches_pstate_sm.
        (aarch64_emit_call_insn): Return the emitted instruction.
        (aarch64_frame_pointer_required): New function.
        (aarch64_conditional_register_usage): Prevent VG_REGNUM from being
        treated as a register operand.
        (aarch64_switch_pstate_sm_for_call): New function.
        (pass_data_switch_pstate_sm): New pass variable.
        (pass_switch_pstate_sm): New pass class.
        (make_pass_switch_pstate_sm): New function.
        (TARGET_FRAME_POINTER_REQUIRED): Define.
        * config/aarch64/t-aarch64 (s-check-sve-md): Add aarch64-sme.md.

gcc/testsuite/
        * gcc.target/aarch64/sme/call_sm_switch_1.c: New test.
        * gcc.target/aarch64/sme/call_sm_switch_2.c: Likewise.
        * gcc.target/aarch64/sme/call_sm_switch_3.c: Likewise.
        * gcc.target/aarch64/sme/call_sm_switch_4.c: Likewise.
        * gcc.target/aarch64/sme/call_sm_switch_5.c: Likewise.
        * gcc.target/aarch64/sme/call_sm_switch_6.c: Likewise.
        * gcc.target/aarch64/sme/call_sm_switch_7.c: Likewise.
        * gcc.target/aarch64/sme/call_sm_switch_8.c: Likewise.
        * gcc.target/aarch64/sme/call_sm_switch_9.c: Likewise.
        * gcc.target/aarch64/sme/call_sm_switch_10.c: Likewise.
---
 gcc/config/aarch64/aarch64-passes.def         |   1 +
 gcc/config/aarch64/aarch64-protos.h           |   3 +-
 gcc/config/aarch64/aarch64-sme.md             | 171 ++++
 gcc/config/aarch64/aarch64.cc                 | 883 +++++++++++++++++-
 gcc/config/aarch64/aarch64.h                  |  25 +-
 gcc/config/aarch64/aarch64.md                 |  13 +-
 gcc/config/aarch64/t-aarch64                  |   5 +-
 .../gcc.target/aarch64/sme/call_sm_switch_1.c | 233 +++++
 .../aarch64/sme/call_sm_switch_10.c           |  37 +
 .../gcc.target/aarch64/sme/call_sm_switch_2.c |  43 +
 .../gcc.target/aarch64/sme/call_sm_switch_3.c | 166 ++++
 .../gcc.target/aarch64/sme/call_sm_switch_4.c |  43 +
 .../gcc.target/aarch64/sme/call_sm_switch_5.c | 318 +++++++
 .../gcc.target/aarch64/sme/call_sm_switch_6.c |  45 +
 .../gcc.target/aarch64/sme/call_sm_switch_7.c | 516 ++++++++++
 .../gcc.target/aarch64/sme/call_sm_switch_8.c |  87 ++
 .../gcc.target/aarch64/sme/call_sm_switch_9.c | 103 ++
 17 files changed, 2668 insertions(+), 24 deletions(-)
 create mode 100644 gcc/config/aarch64/aarch64-sme.md
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_10.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_5.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_6.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_7.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_9.c

diff --git a/gcc/config/aarch64/aarch64-passes.def b/gcc/config/aarch64/aarch64-passes.def
index 6ace797b738..662a13fd5e6 100644
--- a/gcc/config/aarch64/aarch64-passes.def
+++ b/gcc/config/aarch64/aarch64-passes.def
@@ -20,6 +20,7 @@
 
 INSERT_PASS_AFTER (pass_regrename, 1, pass_fma_steering);
 INSERT_PASS_BEFORE (pass_reorder_blocks, 1, pass_track_speculation);
+INSERT_PASS_BEFORE (pass_late_thread_prologue_and_epilogue, 1, pass_switch_pstate_sm);
 INSERT_PASS_AFTER (pass_machine_reorg, 1, pass_tag_collision_avoidance);
 INSERT_PASS_BEFORE (pass_shorten_branches, 1, pass_insert_bti);
 INSERT_PASS_AFTER (pass_if_after_combine, 1, pass_cc_fusion);
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index abc94e482af..d3a2c693f85 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -913,7 +913,7 @@ void aarch64_sve_expand_vector_init (rtx, rtx);
 void aarch64_init_cumulative_args (CUMULATIVE_ARGS *, const_tree, rtx,
                                   const_tree, unsigned, bool = false);
 void aarch64_init_expanders (void);
-void aarch64_emit_call_insn (rtx);
+rtx_call_insn *aarch64_emit_call_insn (rtx);
 void aarch64_register_pragmas (void);
 void aarch64_relayout_simd_types (void);
 void aarch64_reset_previous_fndecl (void);
@@ -1054,6 +1054,7 @@ rtl_opt_pass *make_pass_track_speculation (gcc::context *);
 rtl_opt_pass *make_pass_tag_collision_avoidance (gcc::context *);
 rtl_opt_pass *make_pass_insert_bti (gcc::context *ctxt);
 rtl_opt_pass *make_pass_cc_fusion (gcc::context *ctxt);
+rtl_opt_pass *make_pass_switch_pstate_sm (gcc::context *ctxt);
 
 poly_uint64 aarch64_regmode_natural_size (machine_mode);
 
diff --git a/gcc/config/aarch64/aarch64-sme.md b/gcc/config/aarch64/aarch64-sme.md
new file mode 100644
index 00000000000..52427b4f17a
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-sme.md
@@ -0,0 +1,171 @@
+;; Machine description for AArch64 SME.
+;; Copyright (C) 2023 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; The file is organised into the following sections (search for the full
+;; line):
+;;
+;; == State management
+;; ---- Test current state
+;; ---- PSTATE.SM management
+
+;; =========================================================================
+;; == State management
+;; =========================================================================
+;;
+;; Many of the instructions in this section are only valid when SME is
+;; present.  However, they don't have a TARGET_SME condition since
+;; (a) they are only emitted under direct control of aarch64 code and
+;; (b) they are sometimes used conditionally, particularly in streaming-
+;; compatible code.
+;;
+;; =========================================================================
+
+;; -------------------------------------------------------------------------
+;; ---- Test current state
+;; -------------------------------------------------------------------------
+
+(define_c_enum "unspec" [
+  UNSPEC_OLD_VG_SAVED
+  UNSPEC_UPDATE_VG
+  UNSPEC_GET_SME_STATE
+  UNSPEC_READ_SVCR
+])
+
+;; A marker instruction to say that the old value of the DWARF VG register
+;; has been saved to the stack, for CFI purposes.  Operand 0 is the old
+;; value of the register and operand 1 is the save slot.
+(define_insn "aarch64_old_vg_saved"
+  [(set (reg:DI VG_REGNUM)
+       (unspec:DI [(match_operand 0)
+                   (match_operand 1)] UNSPEC_OLD_VG_SAVED))]
+  ""
+  ""
+  [(set_attr "type" "no_insn")]
+)
+
+;; A marker to indicate places where a call temporarily changes VG.
+(define_insn "aarch64_update_vg"
+  [(set (reg:DI VG_REGNUM)
+       (unspec:DI [(reg:DI VG_REGNUM)] UNSPEC_UPDATE_VG))]
+  ""
+  ""
+  [(set_attr "type" "no_insn")]
+)
+
+(define_insn "aarch64_get_sme_state"
+  [(set (reg:TI R0_REGNUM)
+       (unspec_volatile:TI [(const_int 0)] UNSPEC_GET_SME_STATE))
+   (clobber (reg:DI R16_REGNUM))
+   (clobber (reg:DI R17_REGNUM))
+   (clobber (reg:DI R18_REGNUM))
+   (clobber (reg:DI R30_REGNUM))
+   (clobber (reg:CC CC_REGNUM))]
+  ""
+  "bl\t__arm_sme_state"
+)
+
+(define_insn "aarch64_read_svcr"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+       (unspec_volatile:DI [(const_int 0)] UNSPEC_READ_SVCR))]
+  ""
+  "mrs\t%0, svcr"
+)
+
+;; -------------------------------------------------------------------------
+;; ---- PSTATE.SM management
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - SMSTART SM
+;; - SMSTOP SM
+;; -------------------------------------------------------------------------
+
+(define_c_enum "unspec" [
+  UNSPEC_SMSTART_SM
+  UNSPEC_SMSTOP_SM
+])
+
+;; Turn on streaming mode.  This clobbers all SVE state.
+;;
+;; Depend on VG_REGNUM to ensure that the VG save slot has already been
+;; initialized.
+(define_insn "aarch64_smstart_sm"
+  [(unspec_volatile [(const_int 0)] UNSPEC_SMSTART_SM)
+   (use (reg:DI VG_REGNUM))
+   (clobber (reg:V4x16QI V0_REGNUM))
+   (clobber (reg:V4x16QI V4_REGNUM))
+   (clobber (reg:V4x16QI V8_REGNUM))
+   (clobber (reg:V4x16QI V12_REGNUM))
+   (clobber (reg:V4x16QI V16_REGNUM))
+   (clobber (reg:V4x16QI V20_REGNUM))
+   (clobber (reg:V4x16QI V24_REGNUM))
+   (clobber (reg:V4x16QI V28_REGNUM))
+   (clobber (reg:VNx16BI P0_REGNUM))
+   (clobber (reg:VNx16BI P1_REGNUM))
+   (clobber (reg:VNx16BI P2_REGNUM))
+   (clobber (reg:VNx16BI P3_REGNUM))
+   (clobber (reg:VNx16BI P4_REGNUM))
+   (clobber (reg:VNx16BI P5_REGNUM))
+   (clobber (reg:VNx16BI P6_REGNUM))
+   (clobber (reg:VNx16BI P7_REGNUM))
+   (clobber (reg:VNx16BI P8_REGNUM))
+   (clobber (reg:VNx16BI P9_REGNUM))
+   (clobber (reg:VNx16BI P10_REGNUM))
+   (clobber (reg:VNx16BI P11_REGNUM))
+   (clobber (reg:VNx16BI P12_REGNUM))
+   (clobber (reg:VNx16BI P13_REGNUM))
+   (clobber (reg:VNx16BI P14_REGNUM))
+   (clobber (reg:VNx16BI P15_REGNUM))]
+  ""
+  "smstart\tsm"
+)
+
+;; Turn off streaming mode.  This clobbers all SVE state.
+;;
+;; Depend on VG_REGNUM to ensure that the VG save slot has already been
+;; initialized.
+(define_insn "aarch64_smstop_sm"
+  [(unspec_volatile [(const_int 0)] UNSPEC_SMSTOP_SM)
+   (use (reg:DI VG_REGNUM))
+   (clobber (reg:V4x16QI V0_REGNUM))
+   (clobber (reg:V4x16QI V4_REGNUM))
+   (clobber (reg:V4x16QI V8_REGNUM))
+   (clobber (reg:V4x16QI V12_REGNUM))
+   (clobber (reg:V4x16QI V16_REGNUM))
+   (clobber (reg:V4x16QI V20_REGNUM))
+   (clobber (reg:V4x16QI V24_REGNUM))
+   (clobber (reg:V4x16QI V28_REGNUM))
+   (clobber (reg:VNx16BI P0_REGNUM))
+   (clobber (reg:VNx16BI P1_REGNUM))
+   (clobber (reg:VNx16BI P2_REGNUM))
+   (clobber (reg:VNx16BI P3_REGNUM))
+   (clobber (reg:VNx16BI P4_REGNUM))
+   (clobber (reg:VNx16BI P5_REGNUM))
+   (clobber (reg:VNx16BI P6_REGNUM))
+   (clobber (reg:VNx16BI P7_REGNUM))
+   (clobber (reg:VNx16BI P8_REGNUM))
+   (clobber (reg:VNx16BI P9_REGNUM))
+   (clobber (reg:VNx16BI P10_REGNUM))
+   (clobber (reg:VNx16BI P11_REGNUM))
+   (clobber (reg:VNx16BI P12_REGNUM))
+   (clobber (reg:VNx16BI P13_REGNUM))
+   (clobber (reg:VNx16BI P14_REGNUM))
+   (clobber (reg:VNx16BI P15_REGNUM))]
+  ""
+  "smstop\tsm"
+)
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index af9f3876532..6d5e9056c65 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -85,6 +85,8 @@
 #include "config/arm/aarch-common.h"
 #include "config/arm/aarch-common-protos.h"
 #include "ssa.h"
+#include "tree-pass.h"
+#include "cfgbuild.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -4165,6 +4167,26 @@ aarch64_fndecl_isa_mode (const_tree fndecl)
   return aarch64_fndecl_pstate_sm (fndecl);
 }
 
+/* Return the state of PSTATE.SM on entry to the current function.
+   This might be different from the state of PSTATE.SM in the function
+   body.  */
+
+static aarch64_feature_flags
+aarch64_cfun_incoming_pstate_sm ()
+{
+  return aarch64_fntype_pstate_sm (TREE_TYPE (cfun->decl));
+}
+
+/* Return true if a call from the current function to a function with
+   ISA mode CALLEE_MODE would involve a change to PSTATE.SM around
+   the BL instruction.  */
+
+static bool
+aarch64_call_switches_pstate_sm (aarch64_feature_flags callee_mode)
+{
+  return (callee_mode & ~AARCH64_ISA_MODE & AARCH64_FL_SM_STATE) != 0;
+}
+
 /* Implement TARGET_COMPATIBLE_VECTOR_TYPES_P.  */
 
 static bool
@@ -4188,7 +4210,7 @@ aarch64_emit_cfi_for_reg_p (unsigned int regno)
 static machine_mode
 aarch64_reg_save_mode (unsigned int regno)
 {
-  if (GP_REGNUM_P (regno))
+  if (GP_REGNUM_P (regno) || regno == VG_REGNUM)
     return DImode;
 
   if (FP_REGNUM_P (regno))
@@ -4247,6 +4269,16 @@ aarch64_callee_abi (rtx cookie)
   return function_abis[UINTVAL (cookie) >> AARCH64_NUM_ISA_MODES];
 }
 
+/* COOKIE is a CONST_INT from an UNSPEC_CALLEE_ABI rtx.  Return the
+   required ISA mode on entry to the callee, which is also the ISA
+   mode on return from the callee.  */
+
+static aarch64_feature_flags
+aarch64_callee_isa_mode (rtx cookie)
+{
+  return UINTVAL (cookie) & AARCH64_FL_ISA_MODES;
+}
+
 /* INSN is a call instruction.  Return the CONST_INT stored in its
    UNSPEC_CALLEE_ABI rtx.  */
 
@@ -4269,6 +4301,15 @@ aarch64_insn_callee_abi (const rtx_insn *insn)
   return aarch64_callee_abi (aarch64_insn_callee_cookie (insn));
 }
 
+/* INSN is a call instruction.  Return the required ISA mode on entry to
+   the callee, which is also the ISA mode on return from the callee.  */
+
+static aarch64_feature_flags
+aarch64_insn_callee_isa_mode (const rtx_insn *insn)
+{
+  return aarch64_callee_isa_mode (aarch64_insn_callee_cookie (insn));
+}
+
 /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  The callee only saves
    the lower 64 bits of a 128-bit register.  Tell the compiler the callee
    clobbers the top 64 bits when restoring the bottom 64 bits.  */
@@ -6482,6 +6523,437 @@ aarch64_sub_sp (rtx temp1, rtx temp2, poly_int64 delta, bool frame_related_p,
                      temp1, temp2, frame_related_p, emit_move_imm);
 }
 
+/* A streaming-compatible function needs to switch temporarily to the known
+   PSTATE.SM mode described by LOCAL_MODE.  The low bit of OLD_SVCR contains
+   the runtime state of PSTATE.SM in the streaming-compatible code, before
+   the start of the switch to LOCAL_MODE.
+
+   Emit instructions to branch around the mode switch if PSTATE.SM already
+   matches LOCAL_MODE.  Return the label that the branch jumps to.  */
+
+static rtx_insn *
+aarch64_guard_switch_pstate_sm (rtx old_svcr, aarch64_feature_flags local_mode)
+{
+  local_mode &= AARCH64_FL_SM_STATE;
+  gcc_assert (local_mode != 0);
+  auto already_ok_cond = (local_mode & AARCH64_FL_SM_ON ? NE : EQ);
+  auto *label = gen_label_rtx ();
+  auto *jump = emit_jump_insn (gen_aarch64_tb (already_ok_cond, DImode, DImode,
+                                              old_svcr, const0_rtx, label));
+  JUMP_LABEL (jump) = label;
+  return label;
+}
+
+/* Emit code to switch from the PSTATE.SM state in OLD_MODE to the PSTATE.SM
+   state in NEW_MODE.  This is known to involve either an SMSTART SM or
+   an SMSTOP SM.  */
+
+static void
+aarch64_switch_pstate_sm (aarch64_feature_flags old_mode,
+                         aarch64_feature_flags new_mode)
+{
+  old_mode &= AARCH64_FL_SM_STATE;
+  new_mode &= AARCH64_FL_SM_STATE;
+  gcc_assert (old_mode != new_mode);
+
+  if ((new_mode & AARCH64_FL_SM_ON)
+      || (new_mode == 0 && (old_mode & AARCH64_FL_SM_OFF)))
+    emit_insn (gen_aarch64_smstart_sm ());
+  else
+    emit_insn (gen_aarch64_smstop_sm ());
+}
+
+/* As a side-effect, SMSTART SM and SMSTOP SM clobber the contents of all
+   FP and predicate registers.  This class emits code to preserve any
+   necessary registers around the mode switch.
+
+   The class uses four approaches to saving and restoring contents, enumerated
+   by group_type:
+
+   - GPR: save and restore the contents of FP registers using GPRs.
+     This is used if the FP register contains no more than 64 significant
+     bits.  The registers used are FIRST_GPR onwards.
+
+   - MEM_128: save and restore 128-bit SIMD registers using memory.
+
+   - MEM_SVE_PRED: save and restore full SVE predicate registers using memory.
+
+   - MEM_SVE_DATA: save and restore full SVE vector registers using memory.
+
+   The save slots within each memory group are consecutive, with the
+   MEM_SVE_PRED slots occupying a region below the MEM_SVE_DATA slots.
+
+   There will only be two mode switches for each use of SME, so they should
+   not be particularly performance-sensitive.  It's also rare for SIMD, SVE
+   or predicate registers to be live across mode switches.  We therefore
+   don't preallocate the save slots but instead allocate them locally on
+   demand.  This makes the code emitted by the class self-contained.  */
+
+class aarch64_sme_mode_switch_regs
+{
+public:
+  static const unsigned int FIRST_GPR = R10_REGNUM;
+
+  void add_reg (machine_mode, unsigned int);
+  void add_call_args (rtx_call_insn *);
+  void add_call_result (rtx_call_insn *);
+
+  void emit_prologue ();
+  void emit_epilogue ();
+
+  /* The number of GPRs needed to save FP registers, starting from
+     FIRST_GPR.  */
+  unsigned int num_gprs () { return m_group_count[GPR]; }
+
+private:
+  enum sequence { PROLOGUE, EPILOGUE };
+  enum group_type { GPR, MEM_128, MEM_SVE_PRED, MEM_SVE_DATA, NUM_GROUPS };
+
+  /* Information about the save location for one FP, SIMD, SVE data, or
+     SVE predicate register.  */
+  struct save_location {
+    /* The register to be saved.  */
+    rtx reg;
+
+    /* Which group the save location belongs to.  */
+    group_type group;
+
+    /* A zero-based index of the register within the group.  */
+    unsigned int index;
+  };
+
+  unsigned int sve_data_headroom ();
+  rtx get_slot_mem (machine_mode, poly_int64);
+  void emit_stack_adjust (sequence, poly_int64);
+  void emit_mem_move (sequence, const save_location &, poly_int64);
+
+  void emit_gpr_moves (sequence);
+  void emit_mem_128_moves (sequence);
+  void emit_sve_sp_adjust (sequence);
+  void emit_sve_pred_moves (sequence);
+  void emit_sve_data_moves (sequence);
+
+  /* All save locations, in no particular order.  */
+  auto_vec<save_location, 12> m_save_locations;
+
+  /* The number of registers in each group.  */
+  unsigned int m_group_count[NUM_GROUPS] = {};
+};
+
+/* Record that (reg:MODE REGNO) needs to be preserved around the mode
+   switch.  */
+
+void
+aarch64_sme_mode_switch_regs::add_reg (machine_mode mode, unsigned int regno)
+{
+  if (!FP_REGNUM_P (regno) && !PR_REGNUM_P (regno))
+    return;
+
+  unsigned int end_regno = end_hard_regno (mode, regno);
+  unsigned int vec_flags = aarch64_classify_vector_mode (mode);
+  gcc_assert ((vec_flags & VEC_STRUCT) || end_regno == regno + 1);
+  for (; regno < end_regno; regno++)
+    {
+      machine_mode submode = mode;
+      if (vec_flags & VEC_STRUCT)
+       {
+         if (vec_flags & VEC_SVE_DATA)
+           submode = SVE_BYTE_MODE;
+         else if (vec_flags & VEC_PARTIAL)
+           submode = V8QImode;
+         else
+           submode = V16QImode;
+       }
+      save_location loc;
+      loc.reg = gen_rtx_REG (submode, regno);
+      if (vec_flags == VEC_SVE_PRED)
+       {
+         gcc_assert (PR_REGNUM_P (regno));
+         loc.group = MEM_SVE_PRED;
+       }
+      else
+       {
+         gcc_assert (FP_REGNUM_P (regno));
+         if (known_le (GET_MODE_SIZE (submode), 8))
+           loc.group = GPR;
+         else if (known_eq (GET_MODE_SIZE (submode), 16))
+           loc.group = MEM_128;
+         else
+           loc.group = MEM_SVE_DATA;
+       }
+      loc.index = m_group_count[loc.group]++;
+      m_save_locations.quick_push (loc);
+    }
+}
+
+/* Record that the arguments to CALL_INSN need to be preserved around
+   the mode switch.  */
+
+void
+aarch64_sme_mode_switch_regs::add_call_args (rtx_call_insn *call_insn)
+{
+  for (rtx node = CALL_INSN_FUNCTION_USAGE (call_insn);
+       node; node = XEXP (node, 1))
+    {
+      rtx item = XEXP (node, 0);
+      if (GET_CODE (item) != USE)
+       continue;
+      item = XEXP (item, 0);
+      if (!REG_P (item))
+       continue;
+      add_reg (GET_MODE (item), REGNO (item));
+    }
+}
+
+/* Record that the return value from CALL_INSN (if any) needs to be
+   preserved around the mode switch.  */
+
+void
+aarch64_sme_mode_switch_regs::add_call_result (rtx_call_insn *call_insn)
+{
+  rtx pat = PATTERN (call_insn);
+  gcc_assert (GET_CODE (pat) == PARALLEL);
+  pat = XVECEXP (pat, 0, 0);
+  if (GET_CODE (pat) == CALL)
+    return;
+  rtx dest = SET_DEST (pat);
+  if (GET_CODE (dest) == PARALLEL)
+    for (int i = 0; i < XVECLEN (dest, 0); ++i)
+      {
+       rtx x = XVECEXP (dest, 0, i);
+       gcc_assert (GET_CODE (x) == EXPR_LIST);
+       rtx reg = XEXP (x, 0);
+       add_reg (GET_MODE (reg), REGNO (reg));
+      }
+  else
+    add_reg (GET_MODE (dest), REGNO (dest));
+}
+
+/* Emit code to save registers before the mode switch.  */
+
+void
+aarch64_sme_mode_switch_regs::emit_prologue ()
+{
+  emit_sve_sp_adjust (PROLOGUE);
+  emit_sve_pred_moves (PROLOGUE);
+  emit_sve_data_moves (PROLOGUE);
+  emit_mem_128_moves (PROLOGUE);
+  emit_gpr_moves (PROLOGUE);
+}
+
+/* Emit code to restore registers after the mode switch.  */
+
+void
+aarch64_sme_mode_switch_regs::emit_epilogue ()
+{
+  emit_gpr_moves (EPILOGUE);
+  emit_mem_128_moves (EPILOGUE);
+  emit_sve_pred_moves (EPILOGUE);
+  emit_sve_data_moves (EPILOGUE);
+  emit_sve_sp_adjust (EPILOGUE);
+}
+
+/* The SVE predicate registers are stored below the SVE data registers,
+   with the predicate save area being padded to a data-register-sized
+   boundary.  Return the size of this padded area as a whole number
+   of data register slots.  */
+
+unsigned int
+aarch64_sme_mode_switch_regs::sve_data_headroom ()
+{
+  return CEIL (m_group_count[MEM_SVE_PRED], 8);
+}
+
+/* Return a memory reference of mode MODE to OFFSET bytes from the
+   stack pointer.  */
+
+rtx
+aarch64_sme_mode_switch_regs::get_slot_mem (machine_mode mode,
+                                           poly_int64 offset)
+{
+  rtx addr = plus_constant (Pmode, stack_pointer_rtx, offset);
+  return gen_rtx_MEM (mode, addr);
+}
+
+/* Allocate or deallocate SIZE bytes of stack space: SEQ decides which.  */
+
+void
+aarch64_sme_mode_switch_regs::emit_stack_adjust (sequence seq,
+                                                poly_int64 size)
+{
+  if (seq == PROLOGUE)
+    size = -size;
+  emit_insn (gen_rtx_SET (stack_pointer_rtx,
+                         plus_constant (Pmode, stack_pointer_rtx, size)));
+}
+
+/* Save or restore the register in LOC, whose slot is OFFSET bytes from
+   the stack pointer.  SEQ chooses between saving and restoring.  */
+
+void
+aarch64_sme_mode_switch_regs::emit_mem_move (sequence seq,
+                                            const save_location &loc,
+                                            poly_int64 offset)
+{
+  rtx mem = get_slot_mem (GET_MODE (loc.reg), offset);
+  if (seq == PROLOGUE)
+    emit_move_insn (mem, loc.reg);
+  else
+    emit_move_insn (loc.reg, mem);
+}
+
+/* Emit instructions to save or restore the GPR group.  SEQ chooses between
+   saving and restoring.  */
+
+void
+aarch64_sme_mode_switch_regs::emit_gpr_moves (sequence seq)
+{
+  for (auto &loc : m_save_locations)
+    if (loc.group == GPR)
+      {
+       gcc_assert (loc.index < 8);
+       rtx gpr = gen_rtx_REG (GET_MODE (loc.reg), FIRST_GPR + loc.index);
+       if (seq == PROLOGUE)
+         emit_move_insn (gpr, loc.reg);
+       else
+         emit_move_insn (loc.reg, gpr);
+      }
+}
+
+/* Emit instructions to save or restore the MEM_128 group.  SEQ chooses
+   between saving and restoring.  */
+
+void
+aarch64_sme_mode_switch_regs::emit_mem_128_moves (sequence seq)
+{
+  HOST_WIDE_INT count = m_group_count[MEM_128];
+  if (count == 0)
+    return;
+
+  auto sp = stack_pointer_rtx;
+  auto sp_adjust = (seq == PROLOGUE ? -count : count) * 16;
+
+  /* Pick a common mode that supports LDR & STR with pre/post-modification
+     and LDP & STP with pre/post-modification.  */
+  auto mode = TFmode;
+
+  /* An instruction pattern that should be emitted at the end.  */
+  rtx last_pat = NULL_RTX;
+
+  /* A previous MEM_128 location that hasn't been handled yet.  */
+  save_location *prev_loc = nullptr;
+
+  /* Look for LDP/STPs and record any leftover LDR/STR in PREV_LOC.  */
+  for (auto &loc : m_save_locations)
+    if (loc.group == MEM_128)
+      {
+       if (!prev_loc)
+         {
+           prev_loc = &loc;
+           continue;
+         }
+       gcc_assert (loc.index == prev_loc->index + 1);
+
+       /* The offset of the base of the save area from the current
+          stack pointer.  */
+       HOST_WIDE_INT bias = 0;
+       if (prev_loc->index == 0 && seq == PROLOGUE)
+         bias = sp_adjust;
+
+       /* Get the two sets in the LDP/STP.  */
+       rtx ops[] = {
+         gen_rtx_REG (mode, REGNO (prev_loc->reg)),
+         get_slot_mem (mode, prev_loc->index * 16 + bias),
+         gen_rtx_REG (mode, REGNO (loc.reg)),
+         get_slot_mem (mode, loc.index * 16 + bias)
+       };
+       unsigned int lhs = (seq == PROLOGUE);
+       rtx set1 = gen_rtx_SET (ops[lhs], ops[1 - lhs]);
+       rtx set2 = gen_rtx_SET (ops[lhs + 2], ops[3 - lhs]);
+
+       /* Combine the sets with any stack allocation/deallocation.  */
+       rtvec vec;
+       if (prev_loc->index == 0)
+         {
+           rtx plus_sp = plus_constant (Pmode, sp, sp_adjust);
+           vec = gen_rtvec (3, gen_rtx_SET (sp, plus_sp), set1, set2);
+         }
+       else
+         vec = gen_rtvec (2, set1, set2);
+       rtx pat = gen_rtx_PARALLEL (VOIDmode, vec);
+
+       /* Queue a deallocation to the end, otherwise emit the
+          instruction now.  */
+       if (seq == EPILOGUE && prev_loc->index == 0)
+         last_pat = pat;
+       else
+         emit_insn (pat);
+       prev_loc = nullptr;
+      }
+
+  /* Handle any leftover LDR/STR.  */
+  if (prev_loc)
+    {
+      rtx reg = gen_rtx_REG (mode, REGNO (prev_loc->reg));
+      rtx addr;
+      if (prev_loc->index != 0)
+       addr = plus_constant (Pmode, sp, prev_loc->index * 16);
+      else if (seq == PROLOGUE)
+       {
+         rtx allocate = plus_constant (Pmode, sp, -count * 16);
+         addr = gen_rtx_PRE_MODIFY (Pmode, sp, allocate);
+       }
+      else
+       {
+         rtx deallocate = plus_constant (Pmode, sp, count * 16);
+         addr = gen_rtx_POST_MODIFY (Pmode, sp, deallocate);
+       }
+      rtx mem = gen_rtx_MEM (mode, addr);
+      if (seq == PROLOGUE)
+       emit_move_insn (mem, reg);
+      else
+       emit_move_insn (reg, mem);
+    }
+
+  if (last_pat)
+    emit_insn (last_pat);
+}
+
+/* Allocate or deallocate the stack space needed by the SVE groups.
+   SEQ chooses between allocating and deallocating.  */
+
+void
+aarch64_sme_mode_switch_regs::emit_sve_sp_adjust (sequence seq)
+{
+  if (unsigned int count = m_group_count[MEM_SVE_DATA] + sve_data_headroom ())
+    emit_stack_adjust (seq, count * BYTES_PER_SVE_VECTOR);
+}
+
+/* Save or restore the MEM_SVE_DATA group.  SEQ chooses between saving
+   and restoring.  */
+
+void
+aarch64_sme_mode_switch_regs::emit_sve_data_moves (sequence seq)
+{
+  for (auto &loc : m_save_locations)
+    if (loc.group == MEM_SVE_DATA)
+      {
+       auto index = loc.index + sve_data_headroom ();
+       emit_mem_move (seq, loc, index * BYTES_PER_SVE_VECTOR);
+      }
+}
+
+/* Save or restore the MEM_SVE_PRED group.  SEQ chooses between saving
+   and restoring.  */
+
+void
+aarch64_sme_mode_switch_regs::emit_sve_pred_moves (sequence seq)
+{
+  for (auto &loc : m_save_locations)
+    if (loc.group == MEM_SVE_PRED)
+      emit_mem_move (seq, loc, loc.index * BYTES_PER_SVE_PRED);
+}
+
 /* Set DEST to (vec_series BASE STEP).  */
 
 static void
@@ -8180,6 +8652,40 @@ on_stack:
   return;
 }
 
+/* Add the current argument register to the set of those that need
+   to be saved and restored around a change to PSTATE.SM.  */
+
+static void
+aarch64_record_sme_mode_switch_args (CUMULATIVE_ARGS *pcum)
+{
+  subrtx_var_iterator::array_type array;
+  FOR_EACH_SUBRTX_VAR (iter, array, pcum->aapcs_reg, NONCONST)
+    {
+      rtx x = *iter;
+      if (REG_P (x) && (FP_REGNUM_P (REGNO (x)) || PR_REGNUM_P (REGNO (x))))
+       {
+         unsigned int i = pcum->num_sme_mode_switch_args++;
+         gcc_assert (i < ARRAY_SIZE (pcum->sme_mode_switch_args));
+         pcum->sme_mode_switch_args[i] = x;
+       }
+    }
+}
+
+/* Return a parallel that contains all the registers that need to be
+   saved around a change to PSTATE.SM.  Return const0_rtx if there is
+   no such mode switch, or if no registers need to be saved.  */
+
+static rtx
+aarch64_finish_sme_mode_switch_args (CUMULATIVE_ARGS *pcum)
+{
+  if (!pcum->num_sme_mode_switch_args)
+    return const0_rtx;
+
+  auto argvec = gen_rtvec_v (pcum->num_sme_mode_switch_args,
+                            pcum->sme_mode_switch_args);
+  return gen_rtx_PARALLEL (VOIDmode, argvec);
+}
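As a rough illustration (a hypothetical standalone model, not part of the patch), the
bookkeeping in aarch64_record_sme_mode_switch_args amounts to appending to a
fixed-size array with an overflow check, capped at the same 12 entries as the
sme_mode_switch_args field added to CUMULATIVE_ARGS below:

```c
#include <assert.h>

/* Hypothetical model: each FP/predicate argument register is appended
   to a fixed-size list whose capacity (12, matching the
   sme_mode_switch_args array) must not be exceeded.  */
#define MAX_SWITCH_ARGS 12

struct switch_args
{
  unsigned int num;
  unsigned int regnos[MAX_SWITCH_ARGS];
};

static void
record_switch_arg (struct switch_args *args, unsigned int regno)
{
  assert (args->num < MAX_SWITCH_ARGS);
  args->regnos[args->num++] = regno;
}
```

The real code asserts rather than growing dynamically because the PCS bounds
the number of FP/predicate argument registers, so the fixed cap can never
legitimately overflow.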
+
 /* Implement TARGET_FUNCTION_ARG.  */
 
 static rtx
@@ -8191,7 +8697,13 @@ aarch64_function_arg (cumulative_args_t pcum_v, const function_arg_info &arg)
              || pcum->pcs_variant == ARM_PCS_SVE);
 
   if (arg.end_marker_p ())
-    return aarch64_gen_callee_cookie (pcum->isa_mode, pcum->pcs_variant);
+    {
+      rtx abi_cookie = aarch64_gen_callee_cookie (pcum->isa_mode,
+                                                 pcum->pcs_variant);
+      rtx sme_mode_switch_args = aarch64_finish_sme_mode_switch_args (pcum);
+      return gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, abi_cookie,
+                                                   sme_mode_switch_args));
+    }
 
   aarch64_layout_arg (pcum_v, arg);
   return pcum->aapcs_reg;
@@ -8226,6 +8738,7 @@ aarch64_init_cumulative_args (CUMULATIVE_ARGS *pcum,
   pcum->aapcs_stack_words = 0;
   pcum->aapcs_stack_size = 0;
   pcum->silent_p = silent_p;
+  pcum->num_sme_mode_switch_args = 0;
 
   if (!silent_p
       && !TARGET_FLOAT
@@ -8266,6 +8779,10 @@ aarch64_function_arg_advance (cumulative_args_t pcum_v,
       aarch64_layout_arg (pcum_v, arg);
       gcc_assert ((pcum->aapcs_reg != NULL_RTX)
                  != (pcum->aapcs_stack_words != 0));
+      if (pcum->aapcs_reg
+         && aarch64_call_switches_pstate_sm (pcum->isa_mode))
+       aarch64_record_sme_mode_switch_args (pcum);
+
       pcum->aapcs_arg_processed = false;
       pcum->aapcs_ncrn = pcum->aapcs_nextncrn;
       pcum->aapcs_nvrn = pcum->aapcs_nextnvrn;
@@ -8720,6 +9237,30 @@ aarch64_save_regs_above_locals_p ()
   return crtl->stack_protect_guard;
 }
 
+/* Return true if the current function needs to record the incoming
+   value of PSTATE.SM.  */
+static bool
+aarch64_need_old_pstate_sm ()
+{
+  /* Exit early if the incoming value of PSTATE.SM is known at
+     compile time.  */
+  if (aarch64_cfun_incoming_pstate_sm () != 0)
+    return false;
+
+  if (cfun->machine->call_switches_pstate_sm)
+    for (auto insn = get_insns (); insn; insn = NEXT_INSN (insn))
+      if (auto *call = dyn_cast<rtx_call_insn *> (insn))
+       if (!SIBLING_CALL_P (call))
+         {
+           /* Return true if there is a call to a non-streaming-compatible
+              function.  */
+           auto callee_isa_mode = aarch64_insn_callee_isa_mode (call);
+           if (aarch64_call_switches_pstate_sm (callee_isa_mode))
+             return true;
+         }
+  return false;
+}
+
 /* Mark the registers that need to be saved by the callee and calculate
    the size of the callee-saved registers area and frame record (both FP
    and LR may be omitted).  */
@@ -8753,6 +9294,7 @@ aarch64_layout_frame (void)
   /* First mark all the registers that really need to be saved...  */
   for (regno = 0; regno <= LAST_SAVED_REGNUM; regno++)
     frame.reg_offset[regno] = SLOT_NOT_REQUIRED;
+  frame.old_svcr_offset = SLOT_NOT_REQUIRED;
 
   /* ... that includes the eh data registers (if needed)...  */
   if (crtl->calls_eh_return)
@@ -8905,6 +9447,21 @@ aarch64_layout_frame (void)
     if (known_eq (frame.reg_offset[regno], SLOT_REQUIRED))
       allocate_gpr_slot (regno);
 
+  if (aarch64_need_old_pstate_sm ())
+    {
+      frame.old_svcr_offset = offset;
+      offset += UNITS_PER_WORD;
+    }
+
+  /* If the current function changes the SVE vector length, ensure that the
+     old value of the DWARF VG register is saved and available in the CFI,
+     so that outer frames with VL-sized offsets can be processed correctly.  */
+  if (cfun->machine->call_switches_pstate_sm)
+    {
+      frame.reg_offset[VG_REGNUM] = offset;
+      offset += UNITS_PER_WORD;
+    }
+
   poly_int64 max_int_offset = offset;
   offset = aligned_upper_bound (offset, STACK_BOUNDARY / BITS_PER_UNIT);
   bool has_align_gap = maybe_ne (offset, max_int_offset);
@@ -8942,8 +9499,6 @@ aarch64_layout_frame (void)
       if (push_regs.size () > 1)
        frame.wb_push_candidate2 = push_regs[1];
     }
-  else
-    gcc_assert (known_eq (saved_regs_size, below_hard_fp_saved_regs_size));
 
   /* With stack-clash, a register must be saved in non-leaf functions.
      The saving of the bottommost register counts as an implicit probe,
@@ -9051,7 +9606,8 @@ aarch64_layout_frame (void)
       frame.initial_adjust = frame.frame_size - frame.bytes_below_saved_regs;
       frame.final_adjust = frame.bytes_below_saved_regs;
     }
-  else if (frame.bytes_above_hard_fp.is_constant (&const_above_fp)
+  else if (frame.wb_push_candidate1 != INVALID_REGNUM
+          && frame.bytes_above_hard_fp.is_constant (&const_above_fp)
           && const_above_fp < max_push_offset)
     {
       /* Frame with large area below the saved registers, or with SVE saves,
@@ -9486,7 +10042,13 @@ aarch64_save_callee_saves (poly_int64 bytes_below_sp,
 
       machine_mode mode = aarch64_reg_save_mode (regno);
       rtx reg = gen_rtx_REG (mode, regno);
+      rtx move_src = reg;
       offset = frame.reg_offset[regno] - bytes_below_sp;
+      if (regno == VG_REGNUM)
+       {
+         move_src = gen_rtx_REG (DImode, IP0_REGNUM);
+         emit_move_insn (move_src, gen_int_mode (aarch64_sve_vg, DImode));
+       }
       rtx base_rtx = stack_pointer_rtx;
       poly_int64 sp_offset = offset;
 
@@ -9494,7 +10056,7 @@ aarch64_save_callee_saves (poly_int64 bytes_below_sp,
       if (mode == VNx2DImode && BYTES_BIG_ENDIAN)
        aarch64_adjust_sve_callee_save_base (mode, base_rtx, anchor_reg,
                                             offset, ptrue);
-      else if (GP_REGNUM_P (regno)
+      else if (GP_REGNUM_P (REGNO (reg))
               && (!offset.is_constant (&const_offset) || const_offset >= 512))
        {
          poly_int64 fp_offset = frame.bytes_below_hard_fp - bytes_below_sp;
@@ -9517,6 +10079,7 @@ aarch64_save_callee_saves (poly_int64 bytes_below_sp,
 
       unsigned int regno2;
       if (!aarch64_sve_mode_p (mode)
+         && reg == move_src
          && i + 1 < regs.size ()
          && (regno2 = regs[i + 1], !skip_save_p (regno2))
          && known_eq (GET_MODE_SIZE (mode),
@@ -9548,17 +10111,24 @@ aarch64_save_callee_saves (poly_int64 bytes_below_sp,
        }
       else if (mode == VNx2DImode && BYTES_BIG_ENDIAN)
        {
-         insn = emit_insn (gen_aarch64_pred_mov (mode, mem, ptrue, reg));
+         insn = emit_insn (gen_aarch64_pred_mov (mode, mem, ptrue, move_src));
          need_cfa_note_p = true;
        }
       else if (aarch64_sve_mode_p (mode))
-       insn = emit_insn (gen_rtx_SET (mem, reg));
+       insn = emit_insn (gen_rtx_SET (mem, move_src));
       else
-       insn = emit_move_insn (mem, reg);
+       insn = emit_move_insn (mem, move_src);
 
       RTX_FRAME_RELATED_P (insn) = frame_related_p;
       if (frame_related_p && need_cfa_note_p)
        aarch64_add_cfa_expression (insn, reg, stack_pointer_rtx, sp_offset);
+      else if (frame_related_p && move_src != reg)
+       add_reg_note (insn, REG_FRAME_RELATED_EXPR, gen_rtx_SET (mem, reg));
+
+      /* Emit a fake instruction to indicate that the VG save slot has
+        been initialized.  */
+      if (regno == VG_REGNUM)
+       emit_insn (gen_aarch64_old_vg_saved (move_src, mem));
     }
 }
 
@@ -9781,6 +10351,10 @@ aarch64_get_separate_components (void)
        bitmap_clear_bit (components, frame.hard_fp_save_and_probe);
     }
 
+  /* The VG save sequence needs a temporary GPR.  Punt for now on trying
+     to find one.  */
+  bitmap_clear_bit (components, VG_REGNUM);
+
   return components;
 }
 
@@ -10276,6 +10850,47 @@ aarch64_epilogue_uses (int regno)
   return 0;
 }
 
+/* The current function's frame has a save slot for the incoming state
+   of SVCR.  Return a legitimate memory for the slot, based on the hard
+   frame pointer.  */
+
+static rtx
+aarch64_old_svcr_mem ()
+{
+  gcc_assert (frame_pointer_needed
+             && known_ge (cfun->machine->frame.old_svcr_offset, 0));
+  rtx base = hard_frame_pointer_rtx;
+  poly_int64 offset = (0
+                      /* hard fp -> bottom of frame.  */
+                      - cfun->machine->frame.bytes_below_hard_fp
+                      /* bottom of frame -> save slot.  */
+                      + cfun->machine->frame.old_svcr_offset);
+  return gen_frame_mem (DImode, plus_constant (Pmode, base, offset));
+}
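The address arithmetic above can be checked with a small standalone model
(a hypothetical helper using the same sign conventions as the code: both
quantities are measured upwards from the bottom of the frame):

```c
#include <assert.h>

/* Hypothetical model of the offset computed in aarch64_old_svcr_mem:
   the hard frame pointer sits bytes_below_hard_fp bytes above the
   bottom of the frame and the SVCR slot sits old_svcr_offset bytes
   above the bottom, so the FP-relative offset is their difference.  */
static long
old_svcr_fp_offset (long bytes_below_hard_fp, long old_svcr_offset)
{
  return 0 - bytes_below_hard_fp + old_svcr_offset;
}
```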
+
+/* The current function's frame has a save slot for the incoming state
+   of SVCR.  Load the slot into register REGNO and return the register.  */
+
+static rtx
+aarch64_read_old_svcr (unsigned int regno)
+{
+  rtx svcr = gen_rtx_REG (DImode, regno);
+  emit_move_insn (svcr, aarch64_old_svcr_mem ());
+  return svcr;
+}
+
+/* Like the rtx version of aarch64_guard_switch_pstate_sm, but first
+   load the incoming value of SVCR from its save slot into temporary
+   register REGNO.  */
+
+static rtx_insn *
+aarch64_guard_switch_pstate_sm (unsigned int regno,
+                               aarch64_feature_flags local_mode)
+{
+  rtx old_svcr = aarch64_read_old_svcr (regno);
+  return aarch64_guard_switch_pstate_sm (old_svcr, local_mode);
+}
+
 /* AArch64 stack frames generated by this compiler look like:
 
        +-------------------------------+
@@ -10490,6 +11105,12 @@ aarch64_expand_prologue (void)
 
   aarch64_save_callee_saves (bytes_below_sp, frame.saved_gprs, true,
                             emit_frame_chain);
+  if (maybe_ge (frame.reg_offset[VG_REGNUM], 0))
+    {
+      unsigned int saved_regs[] = { VG_REGNUM };
+      aarch64_save_callee_saves (bytes_below_sp, saved_regs, true,
+                                emit_frame_chain);
+    }
   if (maybe_ne (sve_callee_adjust, 0))
     {
       gcc_assert (!flag_stack_clash_protection
@@ -10511,6 +11132,40 @@ aarch64_expand_prologue (void)
                                          !frame_pointer_needed, true);
   if (emit_frame_chain && maybe_ne (final_adjust, 0))
     aarch64_emit_stack_tie (hard_frame_pointer_rtx);
+
+  /* Save the incoming value of PSTATE.SM, if required.  */
+  if (known_ge (frame.old_svcr_offset, 0))
+    {
+      rtx mem = aarch64_old_svcr_mem ();
+      MEM_VOLATILE_P (mem) = 1;
+      if (TARGET_SME)
+       {
+         rtx reg = gen_rtx_REG (DImode, IP0_REGNUM);
+         emit_insn (gen_aarch64_read_svcr (reg));
+         emit_move_insn (mem, reg);
+       }
+      else
+       {
+         rtx old_r0 = NULL_RTX, old_r1 = NULL_RTX;
+         auto &args = crtl->args.info;
+         if (args.aapcs_ncrn > 0)
+           {
+             old_r0 = gen_rtx_REG (DImode, PROBE_STACK_FIRST_REGNUM);
+             emit_move_insn (old_r0, gen_rtx_REG (DImode, R0_REGNUM));
+           }
+         if (args.aapcs_ncrn > 1)
+           {
+             old_r1 = gen_rtx_REG (DImode, PROBE_STACK_SECOND_REGNUM);
+             emit_move_insn (old_r1, gen_rtx_REG (DImode, R1_REGNUM));
+           }
+         emit_insn (gen_aarch64_get_sme_state ());
+         emit_move_insn (mem, gen_rtx_REG (DImode, R0_REGNUM));
+         if (old_r0)
+           emit_move_insn (gen_rtx_REG (DImode, R0_REGNUM), old_r0);
+         if (old_r1)
+           emit_move_insn (gen_rtx_REG (DImode, R1_REGNUM), old_r1);
+       }
+    }
 }
 
 /* Return TRUE if we can use a simple_return insn.
@@ -11758,17 +12413,33 @@ aarch64_start_call_args (cumulative_args_t ca_v)
    RESULT is the register in which the result is returned.  It's NULL for
    "call" and "sibcall".
    MEM is the location of the function call.
-   CALLEE_ABI is a const_int that gives the arm_pcs of the callee.
+   COOKIE is either:
+     - a const_int that gives the argument to the call's UNSPEC_CALLEE_ABI.
+     - a PARALLEL that contains such a const_int as its first element.
+       The second element is a PARALLEL that lists all the argument
+       registers that need to be saved and restored around a change
+       in PSTATE.SM, or const0_rtx if no such switch is needed.
    SIBCALL indicates whether this function call is normal call or sibling call.
    It will generate different pattern accordingly.  */
 
 void
-aarch64_expand_call (rtx result, rtx mem, rtx callee_abi, bool sibcall)
+aarch64_expand_call (rtx result, rtx mem, rtx cookie, bool sibcall)
 {
   rtx call, callee, tmp;
   rtvec vec;
   machine_mode mode;
 
+  rtx callee_abi = cookie;
+  rtx sme_mode_switch_args = const0_rtx;
+  if (GET_CODE (cookie) == PARALLEL)
+    {
+      callee_abi = XVECEXP (cookie, 0, 0);
+      sme_mode_switch_args = XVECEXP (cookie, 0, 1);
+    }
+
+  gcc_assert (CONST_INT_P (callee_abi));
+  auto callee_isa_mode = aarch64_callee_isa_mode (callee_abi);
+
   gcc_assert (MEM_P (mem));
   callee = XEXP (mem, 0);
   mode = GET_MODE (callee);
@@ -11793,26 +12464,75 @@ aarch64_expand_call (rtx result, rtx mem, rtx callee_abi, bool sibcall)
   else
     tmp = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (Pmode, LR_REGNUM));
 
-  gcc_assert (CONST_INT_P (callee_abi));
   callee_abi = gen_rtx_UNSPEC (DImode, gen_rtvec (1, callee_abi),
                               UNSPEC_CALLEE_ABI);
 
   vec = gen_rtvec (3, call, callee_abi, tmp);
   call = gen_rtx_PARALLEL (VOIDmode, vec);
 
-  aarch64_emit_call_insn (call);
+  auto call_insn = aarch64_emit_call_insn (call);
+
+  /* Check whether the call requires a change to PSTATE.SM.  We can't
+     emit the instructions to change PSTATE.SM yet, since they involve
+     a change in vector length and a change in instruction set, which
+     cannot be represented in RTL.
+
+     For now, just record which registers will be clobbered and used
+     by the changes to PSTATE.SM.  */
+  if (!sibcall && aarch64_call_switches_pstate_sm (callee_isa_mode))
+    {
+      aarch64_sme_mode_switch_regs args_switch;
+      if (sme_mode_switch_args != const0_rtx)
+       {
+         unsigned int num_args = XVECLEN (sme_mode_switch_args, 0);
+         for (unsigned int i = 0; i < num_args; ++i)
+           {
+             rtx x = XVECEXP (sme_mode_switch_args, 0, i);
+             args_switch.add_reg (GET_MODE (x), REGNO (x));
+           }
+       }
+
+      aarch64_sme_mode_switch_regs result_switch;
+      if (result)
+       result_switch.add_call_result (call_insn);
+
+      unsigned int num_gprs = MAX (args_switch.num_gprs (),
+                                  result_switch.num_gprs ());
+      for (unsigned int i = 0; i < num_gprs; ++i)
+       clobber_reg (&CALL_INSN_FUNCTION_USAGE (call_insn),
+                    gen_rtx_REG (DImode, args_switch.FIRST_GPR + i));
+
+      for (int regno = V0_REGNUM; regno < V0_REGNUM + 32; regno += 4)
+       clobber_reg (&CALL_INSN_FUNCTION_USAGE (call_insn),
+                    gen_rtx_REG (V4x16QImode, regno));
+
+      for (int regno = P0_REGNUM; regno < P0_REGNUM + 16; regno += 1)
+       clobber_reg (&CALL_INSN_FUNCTION_USAGE (call_insn),
+                    gen_rtx_REG (VNx16BImode, regno));
+
+      /* Ensure that the VG save slot has been initialized.  Also emit
+        an instruction to model the effect of the temporary clobber
+        of VG, so that the prologue/epilogue pass sees the need to
+        save the old value.  */
+      use_reg (&CALL_INSN_FUNCTION_USAGE (call_insn),
+              gen_rtx_REG (DImode, VG_REGNUM));
+      emit_insn_before (gen_aarch64_update_vg (), call_insn);
+
+      cfun->machine->call_switches_pstate_sm = true;
+    }
 }
 
 /* Emit call insn with PAT and do aarch64-specific handling.  */
 
-void
+rtx_call_insn *
 aarch64_emit_call_insn (rtx pat)
 {
-  rtx insn = emit_call_insn (pat);
+  auto insn = emit_call_insn (pat);
 
   rtx *fusage = &CALL_INSN_FUNCTION_USAGE (insn);
   clobber_reg (fusage, gen_rtx_REG (word_mode, IP0_REGNUM));
   clobber_reg (fusage, gen_rtx_REG (word_mode, IP1_REGNUM));
+  return as_a<rtx_call_insn *> (insn);
 }
 
 machine_mode
@@ -13224,6 +13944,16 @@ aarch64_secondary_memory_needed (machine_mode mode, reg_class_t class1,
   return false;
 }
 
+/* Implement TARGET_FRAME_POINTER_REQUIRED.  */
+
+static bool
+aarch64_frame_pointer_required ()
+{
+  /* If the function needs to record the incoming value of PSTATE.SM,
+     make sure that the slot is accessible from the frame pointer.  */
+  return aarch64_need_old_pstate_sm ();
+}
+
 static bool
 aarch64_can_eliminate (const int from ATTRIBUTE_UNUSED, const int to)
 {
@@ -20805,7 +21535,8 @@ aarch64_conditional_register_usage (void)
        call_used_regs[i] = 1;
       }
 
-  /* Only allow the FFR and FFRT to be accessed via special patterns.  */
+  /* Only allow these registers to be accessed via special patterns.  */
+  CLEAR_HARD_REG_BIT (operand_reg_set, VG_REGNUM);
   CLEAR_HARD_REG_BIT (operand_reg_set, FFR_REGNUM);
   CLEAR_HARD_REG_BIT (operand_reg_set, FFRT_REGNUM);
 
@@ -28376,6 +29107,123 @@ aarch64_pars_overlap_p (rtx par1, rtx par2)
   return false;
 }
 
+/* If CALL involves a change in PSTATE.SM, emit the instructions needed
+   to switch to the new mode and the instructions needed to restore the
+   original mode.  Return true if something changed.  */
+static bool
+aarch64_switch_pstate_sm_for_call (rtx_call_insn *call)
+{
+  /* Mode switches for sibling calls are handled via the epilogue.  */
+  if (SIBLING_CALL_P (call))
+    return false;
+
+  auto callee_isa_mode = aarch64_insn_callee_isa_mode (call);
+  if (!aarch64_call_switches_pstate_sm (callee_isa_mode))
+    return false;
+
+  /* Switch mode before the call, preserving any argument registers
+     across the switch.  */
+  start_sequence ();
+  rtx_insn *args_guard_label = nullptr;
+  if (TARGET_STREAMING_COMPATIBLE)
+    args_guard_label = aarch64_guard_switch_pstate_sm (IP0_REGNUM,
+                                                      callee_isa_mode);
+  aarch64_sme_mode_switch_regs args_switch;
+  args_switch.add_call_args (call);
+  args_switch.emit_prologue ();
+  aarch64_switch_pstate_sm (AARCH64_ISA_MODE, callee_isa_mode);
+  args_switch.emit_epilogue ();
+  if (args_guard_label)
+    emit_label (args_guard_label);
+  auto args_seq = get_insns ();
+  end_sequence ();
+  emit_insn_before (args_seq, call);
+
+  if (find_reg_note (call, REG_NORETURN, NULL_RTX))
+    return true;
+
+  /* Switch mode after the call, preserving any return registers across
+     the switch.  */
+  start_sequence ();
+  rtx_insn *return_guard_label = nullptr;
+  if (TARGET_STREAMING_COMPATIBLE)
+    return_guard_label = aarch64_guard_switch_pstate_sm (IP0_REGNUM,
+                                                        callee_isa_mode);
+  aarch64_sme_mode_switch_regs return_switch;
+  return_switch.add_call_result (call);
+  return_switch.emit_prologue ();
+  aarch64_switch_pstate_sm (callee_isa_mode, AARCH64_ISA_MODE);
+  return_switch.emit_epilogue ();
+  if (return_guard_label)
+    emit_label (return_guard_label);
+  auto result_seq = get_insns ();
+  end_sequence ();
+  emit_insn_after (result_seq, call);
+  return true;
+}
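In outline, the decision this function makes for each call can be modelled as
follows (a hypothetical sketch; the 0/1 arguments stand for the streaming bit
of the caller's and callee's ISA modes). For streaming-compatible callers the
chosen instruction is additionally guarded at run time on bit 0 of the saved
SVCR value, and the opposite switch is emitted after the call returns:

```c
#include <assert.h>

enum sm_switch { SM_NONE, SM_START, SM_STOP };

/* Hypothetical model: given the caller's streaming state at the call
   and the state the callee requires, return the PSTATE.SM switch
   needed before the call (SMSTART SM, SMSTOP SM, or nothing).  */
static enum sm_switch
pre_call_switch (int caller_sm, int callee_sm)
{
  if (caller_sm == callee_sm)
    return SM_NONE;
  return callee_sm ? SM_START : SM_STOP;
}
```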
+
+namespace {
+
+const pass_data pass_data_switch_pstate_sm =
+{
+  RTL_PASS, // type
+  "smstarts", // name
+  OPTGROUP_NONE, // optinfo_flags
+  TV_NONE, // tv_id
+  0, // properties_required
+  0, // properties_provided
+  0, // properties_destroyed
+  0, // todo_flags_start
+  TODO_df_finish, // todo_flags_finish
+};
+
+class pass_switch_pstate_sm : public rtl_opt_pass
+{
+public:
+  pass_switch_pstate_sm (gcc::context *ctxt)
+    : rtl_opt_pass (pass_data_switch_pstate_sm, ctxt)
+  {}
+
+  // opt_pass methods:
+  bool gate (function *) override final;
+  unsigned int execute (function *) override final;
+};
+
+bool
+pass_switch_pstate_sm::gate (function *)
+{
+  return cfun->machine->call_switches_pstate_sm;
+}
+
+/* Emit any instructions needed to switch PSTATE.SM.  */
+unsigned int
+pass_switch_pstate_sm::execute (function *fn)
+{
+  basic_block bb;
+
+  auto_sbitmap blocks (last_basic_block_for_fn (cfun));
+  bitmap_clear (blocks);
+  FOR_EACH_BB_FN (bb, fn)
+    {
+      rtx_insn *insn;
+      FOR_BB_INSNS (bb, insn)
+       if (auto *call = dyn_cast<rtx_call_insn *> (insn))
+         if (aarch64_switch_pstate_sm_for_call (call))
+           bitmap_set_bit (blocks, bb->index);
+    }
+  find_many_sub_basic_blocks (blocks);
+  clear_aux_for_blocks ();
+  return 0;
+}
+
+}
+
+rtl_opt_pass *
+make_pass_switch_pstate_sm (gcc::context *ctxt)
+{
+  return new pass_switch_pstate_sm (ctxt);
+}
+
 /* Target-specific selftests.  */
 
 #if CHECKING_P
@@ -28563,6 +29411,9 @@ aarch64_run_selftests (void)
 #undef TARGET_CALLEE_COPIES
 #define TARGET_CALLEE_COPIES hook_bool_CUMULATIVE_ARGS_arg_info_false
 
+#undef TARGET_FRAME_POINTER_REQUIRED
+#define TARGET_FRAME_POINTER_REQUIRED aarch64_frame_pointer_required
+
 #undef TARGET_CAN_ELIMINATE
 #define TARGET_CAN_ELIMINATE aarch64_can_eliminate
 
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 0ea8b2d3524..693acde7eb9 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -256,6 +256,10 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF;
 /* The current function is a normal non-streaming function.  */
 #define TARGET_NON_STREAMING (AARCH64_ISA_SM_OFF)
 
+/* The current function has a streaming-compatible body.  */
+#define TARGET_STREAMING_COMPATIBLE \
+  ((aarch64_isa_flags & AARCH64_FL_SM_STATE) == 0)
+
 /* Crypto is an optional extension to AdvSIMD.  */
 #define TARGET_CRYPTO (AARCH64_ISA_CRYPTO)
 
@@ -477,7 +481,7 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF;
     0, 0, 0, 0,   0, 0, 0, 0,  /* V8 - V15 */          \
     1, 1, 1, 1,   1, 1, 1, 1,   /* V16 - V23 */         \
     1, 1, 1, 1,   1, 1, 1, 1,   /* V24 - V31 */         \
-    1, 1, 1, 1,                        /* SFP, AP, CC, VG */   \
+    1, 1, 1, 0,                        /* SFP, AP, CC, VG */   \
     1, 1, 1, 1,   1, 1, 1, 1,  /* P0 - P7 */           \
     1, 1, 1, 1,   1, 1, 1, 1,  /* P8 - P15 */          \
     1, 1                       /* FFR and FFRT */      \
@@ -814,6 +818,13 @@ struct GTY (()) aarch64_frame
   vec<unsigned, va_gc_atomic> *saved_fprs;
   vec<unsigned, va_gc_atomic> *saved_prs;
 
+  /* The offset from the base of the frame of a 64-bit slot whose low
+     bit contains the incoming value of PSTATE.SM.  This slot must be
+     within reach of the hard frame pointer.
+
+     The offset is -1 if such a slot isn't needed.  */
+  poly_int64 old_svcr_offset;
+
   /* The number of extra stack bytes taken up by register varargs.
      This area is allocated by the callee at the very top of the
      frame.  This value is rounded up to a multiple of
@@ -922,6 +933,12 @@ typedef struct GTY (()) machine_function
   /* One entry for each general purpose register.  */
   rtx call_via[SP_REGNUM];
   bool label_is_assembled;
+
+  /* True if we've expanded at least one call to a function that changes
+     PSTATE.SM.  This should only be used for saving compile time: false
+     guarantees that no such mode switch exists.  */
+  bool call_switches_pstate_sm;
+
   /* A set of all decls that have been passed to a vld1 intrinsic in the
      current function.  This is used to help guide the vector cost model.  */
   hash_set<tree> *vector_load_decls;
@@ -990,6 +1007,12 @@ typedef struct
                                   stack arg area so far.  */
   bool silent_p;               /* True if we should act silently, rather than
                                   raise an error for invalid calls.  */
+
+  /* A list of registers that need to be saved and restored around a
+     change to PSTATE.SM.  An auto_vec would be more convenient, but those
+     can't be copied.  */
+  unsigned int num_sme_mode_switch_args;
+  rtx sme_mode_switch_args[12];
 } CUMULATIVE_ARGS;
 #endif
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 9585879a1b1..9b586b5170b 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -956,7 +956,7 @@ (define_expand "tbranch_<code><mode>3"
                                         operands[1]);
 })
 
-(define_insn "*tb<optab><ALLI:mode><GPI:mode>1"
+(define_insn "@aarch64_tb<optab><ALLI:mode><GPI:mode>"
   [(set (pc) (if_then_else
              (EQL (zero_extract:GPI (match_operand:ALLI 0 "register_operand" "r")
                                     (const_int 1)
@@ -1043,7 +1043,7 @@ (define_expand "call"
   [(parallel
      [(call (match_operand 0 "memory_operand")
            (match_operand 1 "general_operand"))
-      (unspec:DI [(match_operand 2 "const_int_operand")] UNSPEC_CALLEE_ABI)
+      (unspec:DI [(match_operand 2)] UNSPEC_CALLEE_ABI)
       (clobber (reg:DI LR_REGNUM))])]
   ""
   "
@@ -1070,7 +1070,7 @@ (define_expand "call_value"
      [(set (match_operand 0 "")
           (call (match_operand 1 "memory_operand")
                 (match_operand 2 "general_operand")))
-     (unspec:DI [(match_operand 3 "const_int_operand")] UNSPEC_CALLEE_ABI)
+     (unspec:DI [(match_operand 3)] UNSPEC_CALLEE_ABI)
      (clobber (reg:DI LR_REGNUM))])]
   ""
   "
@@ -1097,7 +1097,7 @@ (define_expand "sibcall"
   [(parallel
      [(call (match_operand 0 "memory_operand")
            (match_operand 1 "general_operand"))
-      (unspec:DI [(match_operand 2 "const_int_operand")] UNSPEC_CALLEE_ABI)
+      (unspec:DI [(match_operand 2)] UNSPEC_CALLEE_ABI)
       (return)])]
   ""
   {
@@ -1111,7 +1111,7 @@ (define_expand "sibcall_value"
      [(set (match_operand 0 "")
           (call (match_operand 1 "memory_operand")
                 (match_operand 2 "general_operand")))
-      (unspec:DI [(match_operand 3 "const_int_operand")] UNSPEC_CALLEE_ABI)
+      (unspec:DI [(match_operand 3)] UNSPEC_CALLEE_ABI)
       (return)])]
   ""
   {
@@ -8048,3 +8048,6 @@ (define_insn "patchable_area"
 
 ;; SVE2.
 (include "aarch64-sve2.md")
+
+;; SME and extensions
+(include "aarch64-sme.md")
diff --git a/gcc/config/aarch64/t-aarch64 b/gcc/config/aarch64/t-aarch64
index a4e0aa03274..cff56dc9f55 100644
--- a/gcc/config/aarch64/t-aarch64
+++ b/gcc/config/aarch64/t-aarch64
@@ -186,9 +186,12 @@ MULTILIB_DIRNAMES   = $(subst $(comma), ,$(TM_MULTILIB_CONFIG))
 insn-conditions.md: s-check-sve-md
 s-check-sve-md: $(srcdir)/config/aarch64/check-sve-md.awk \
                $(srcdir)/config/aarch64/aarch64-sve.md \
-               $(srcdir)/config/aarch64/aarch64-sve2.md
+               $(srcdir)/config/aarch64/aarch64-sve2.md \
+               $(srcdir)/config/aarch64/aarch64-sme.md
        $(AWK) -f $(srcdir)/config/aarch64/check-sve-md.awk \
          $(srcdir)/config/aarch64/aarch64-sve.md
        $(AWK) -f $(srcdir)/config/aarch64/check-sve-md.awk \
          $(srcdir)/config/aarch64/aarch64-sve2.md
+       $(AWK) -f $(srcdir)/config/aarch64/check-sve-md.awk \
+         $(srcdir)/config/aarch64/aarch64-sme.md
        $(STAMP) s-check-sve-md
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_1.c b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_1.c
new file mode 100644
index 00000000000..a2de55773af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_1.c
@@ -0,0 +1,233 @@
+// { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls" }
+// { dg-final { check-function-bodies "**" "" } }
+
+void ns_callee ();
+void s_callee () [[arm::streaming]];
+void sc_callee () [[arm::streaming_compatible]];
+
+void ns_callee_stack (int, int, int, int, int, int, int, int, int);
+
+struct callbacks {
+  void (*ns_ptr) ();
+  void (*s_ptr) () [[arm::streaming]];
+  void (*sc_ptr) () [[arm::streaming_compatible]];
+};
+
+/*
+** n_caller:   { target lp64 }
+**     stp     x30, (x19|x2[0-8]), \[sp, #?-96\]!
+**     cntd    x16
+**     str     x16, \[sp, #?16\]
+**     stp     d8, d9, \[sp, #?32\]
+**     stp     d10, d11, \[sp, #?48\]
+**     stp     d12, d13, \[sp, #?64\]
+**     stp     d14, d15, \[sp, #?80\]
+**     mov     \1, x0
+**     bl      ns_callee
+**     smstart sm
+**     bl      s_callee
+**     smstop  sm
+**     bl      sc_callee
+**     ldr     (x[0-9]+), \[\1\]
+**     blr     \2
+**     ldr     (x[0-9]+), \[\1, #?8\]
+**     smstart sm
+**     blr     \3
+**     smstop  sm
+**     ldr     (x[0-9]+), \[\1, #?16\]
+**     blr     \4
+**     ldp     d8, d9, \[sp, #?32\]
+**     ldp     d10, d11, \[sp, #?48\]
+**     ldp     d12, d13, \[sp, #?64\]
+**     ldp     d14, d15, \[sp, #?80\]
+**     ldp     x30, \1, \[sp\], #?96
+**     ret
+*/
+void
+n_caller (struct callbacks *c)
+{
+  ns_callee ();
+  s_callee ();
+  sc_callee ();
+
+  c->ns_ptr ();
+  c->s_ptr ();
+  c->sc_ptr ();
+}
+
+/*
+** s_caller:   { target lp64 }
+**     stp     x30, (x19|x2[0-8]), \[sp, #?-96\]!
+**     cntd    x16
+**     str     x16, \[sp, #?16\]
+**     stp     d8, d9, \[sp, #?32\]
+**     stp     d10, d11, \[sp, #?48\]
+**     stp     d12, d13, \[sp, #?64\]
+**     stp     d14, d15, \[sp, #?80\]
+**     mov     \1, x0
+**     smstop  sm
+**     bl      ns_callee
+**     smstart sm
+**     bl      s_callee
+**     bl      sc_callee
+**     ldr     (x[0-9]+), \[\1\]
+**     smstop  sm
+**     blr     \2
+**     smstart sm
+**     ldr     (x[0-9]+), \[\1, #?8\]
+**     blr     \3
+**     ldr     (x[0-9]+), \[\1, #?16\]
+**     blr     \4
+**     ldp     d8, d9, \[sp, #?32\]
+**     ldp     d10, d11, \[sp, #?48\]
+**     ldp     d12, d13, \[sp, #?64\]
+**     ldp     d14, d15, \[sp, #?80\]
+**     ldp     x30, \1, \[sp\], #?96
+**     ret
+*/
+void
+s_caller (struct callbacks *c) [[arm::streaming]]
+{
+  ns_callee ();
+  s_callee ();
+  sc_callee ();
+
+  c->ns_ptr ();
+  c->s_ptr ();
+  c->sc_ptr ();
+}
+
+/*
+** sc_caller_sme:
+**     stp     x29, x30, \[sp, #?-96\]!
+**     mov     x29, sp
+**     cntd    x16
+**     str     x16, \[sp, #?24\]
+**     stp     d8, d9, \[sp, #?32\]
+**     stp     d10, d11, \[sp, #?48\]
+**     stp     d12, d13, \[sp, #?64\]
+**     stp     d14, d15, \[sp, #?80\]
+**     mrs     x16, svcr
+**     str     x16, \[x29, #?16\]
+**     ldr     x16, \[x29, #?16\]
+**     tbz     x16, 0, .*
+**     smstop  sm
+**     bl      ns_callee
+**     ldr     x16, \[x29, #?16\]
+**     tbz     x16, 0, .*
+**     smstart sm
+**     ldr     x16, \[x29, #?16\]
+**     tbnz    x16, 0, .*
+**     smstart sm
+**     bl      s_callee
+**     ldr     x16, \[x29, #?16\]
+**     tbnz    x16, 0, .*
+**     smstop  sm
+**     bl      sc_callee
+**     ldp     d8, d9, \[sp, #?32\]
+**     ldp     d10, d11, \[sp, #?48\]
+**     ldp     d12, d13, \[sp, #?64\]
+**     ldp     d14, d15, \[sp, #?80\]
+**     ldp     x29, x30, \[sp\], #?96
+**     ret
+*/
+void
+sc_caller_sme () [[arm::streaming_compatible]]
+{
+  ns_callee ();
+  s_callee ();
+  sc_callee ();
+}
+
+#pragma GCC target "+nosme"
+
+/*
+** sc_caller:
+**     stp     x29, x30, \[sp, #?-96\]!
+**     mov     x29, sp
+**     cntd    x16
+**     str     x16, \[sp, #?24\]
+**     stp     d8, d9, \[sp, #?32\]
+**     stp     d10, d11, \[sp, #?48\]
+**     stp     d12, d13, \[sp, #?64\]
+**     stp     d14, d15, \[sp, #?80\]
+**     bl      __arm_sme_state
+**     str     x0, \[x29, #?16\]
+**     ...
+**     bl      sc_callee
+**     ldp     d8, d9, \[sp, #?32\]
+**     ldp     d10, d11, \[sp, #?48\]
+**     ldp     d12, d13, \[sp, #?64\]
+**     ldp     d14, d15, \[sp, #?80\]
+**     ldp     x29, x30, \[sp\], #?96
+**     ret
+*/
+void
+sc_caller () [[arm::streaming_compatible]]
+{
+  ns_callee ();
+  sc_callee ();
+}
+
+/*
+** sc_caller_x0:
+**     ...
+**     mov     x10, x0
+**     bl      __arm_sme_state
+**     ...
+**     str     wzr, \[x10\]
+**     ...
+*/
+void
+sc_caller_x0 (int *ptr) [[arm::streaming_compatible]]
+{
+  *ptr = 0;
+  ns_callee ();
+  sc_callee ();
+}
+
+/*
+** sc_caller_x1:
+**     ...
+**     mov     x10, x0
+**     mov     x11, x1
+**     bl      __arm_sme_state
+**     ...
+**     str     w11, \[x10\]
+**     ...
+*/
+void
+sc_caller_x1 (int *ptr, int a) [[arm::streaming_compatible]]
+{
+  *ptr = a;
+  ns_callee ();
+  sc_callee ();
+}
+
+/*
+** sc_caller_stack:
+**     sub     sp, sp, #112
+**     stp     x29, x30, \[sp, #?16\]
+**     add     x29, sp, #?16
+**     ...
+**     stp     d8, d9, \[sp, #?48\]
+**     ...
+**     bl      __arm_sme_state
+**     str     x0, \[x29, #?16\]
+**     ...
+**     bl      ns_callee_stack
+**     ldr     x16, \[x29, #?16\]
+**     tbz     x16, 0, .*
+**     smstart sm
+**     ...
+*/
+void
+sc_caller_stack () [[arm::streaming_compatible]]
+{
+  ns_callee_stack (0, 0, 0, 0, 0, 0, 0, 0, 0);
+}
+
+/* { dg-final { scan-assembler {n_caller:(?:(?!ret).)*\.cfi_offset 46, -80\n} } } */
+/* { dg-final { scan-assembler {s_caller:(?:(?!ret).)*\.cfi_offset 46, -80\n} } } */
+/* { dg-final { scan-assembler {sc_caller_sme:(?:(?!ret).)*\.cfi_offset 46, -72\n} } } */
+/* { dg-final { scan-assembler {sc_caller:(?:(?!ret).)*\.cfi_offset 46, -72\n} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_10.c b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_10.c
new file mode 100644
index 00000000000..49c5e4a6acb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_10.c
@@ -0,0 +1,37 @@
+// { dg-options "" }
+
+#pragma GCC target "+nosme"
+
+void ns_callee ();
+void s_callee () [[arm::streaming]];
+void sc_callee () [[arm::streaming_compatible]];
+
+struct callbacks {
+  void (*ns_ptr) ();
+  void (*s_ptr) () [[arm::streaming]];
+  void (*sc_ptr) () [[arm::streaming_compatible]];
+};
+
+void
+n_caller (struct callbacks *c)
+{
+  ns_callee ();
+  s_callee (); // { dg-error "calling a streaming function requires the ISA extension 'sme'" }
+  sc_callee ();
+
+  c->ns_ptr ();
+  c->s_ptr (); // { dg-error "calling a streaming function requires the ISA extension 'sme'" }
+  c->sc_ptr ();
+}
+
+void
+sc_caller_sme (struct callbacks *c) [[arm::streaming_compatible]]
+{
+  ns_callee ();
+  s_callee (); // { dg-error "calling a streaming function requires the ISA extension 'sme'" }
+  sc_callee ();
+
+  c->ns_ptr ();
+  c->s_ptr (); // { dg-error "calling a streaming function requires the ISA extension 'sme'" }
+  c->sc_ptr ();
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_2.c b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_2.c
new file mode 100644
index 00000000000..890fcbc5b1a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_2.c
@@ -0,0 +1,43 @@
+// { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls" }
+
+void ns_callee ();
+void s_callee () [[arm::streaming]];
+void sc_callee () [[arm::streaming_compatible]];
+
+struct callbacks {
+  void (*ns_ptr) ();
+  void (*s_ptr) () [[arm::streaming]];
+  void (*sc_ptr) () [[arm::streaming_compatible]];
+};
+
+void
+n_caller (struct callbacks *c)
+{
+  ns_callee ();
+  sc_callee ();
+
+  c->ns_ptr ();
+  c->sc_ptr ();
+}
+
+void
+s_caller (struct callbacks *c) [[arm::streaming]]
+{
+  s_callee ();
+  sc_callee ();
+
+  c->s_ptr ();
+  c->sc_ptr ();
+}
+
+void
+sc_caller (struct callbacks *c) [[arm::streaming_compatible]]
+{
+  sc_callee ();
+
+  c->sc_ptr ();
+}
+
+// { dg-final { scan-assembler-not {[dpqz][0-9]+,} } }
+// { dg-final { scan-assembler-not {smstart\tsm} } }
+// { dg-final { scan-assembler-not {smstop\tsm} } }
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_3.c b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_3.c
new file mode 100644
index 00000000000..ed999d08560
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_3.c
@@ -0,0 +1,166 @@
+// { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls" }
+// { dg-final { check-function-bodies "**" "" } }
+
+__attribute__((aarch64_vector_pcs)) void ns_callee ();
+__attribute__((aarch64_vector_pcs)) void s_callee () [[arm::streaming]];
+__attribute__((aarch64_vector_pcs)) void sc_callee () [[arm::streaming_compatible]];
+
+struct callbacks {
+  __attribute__((aarch64_vector_pcs)) void (*ns_ptr) ();
+  __attribute__((aarch64_vector_pcs)) void (*s_ptr) () [[arm::streaming]];
+  __attribute__((aarch64_vector_pcs)) void (*sc_ptr) () [[arm::streaming_compatible]];
+};
+
+/*
+** n_caller:   { target lp64 }
+**     stp     x30, (x19|x2[0-8]), \[sp, #?-288\]!
+**     cntd    x16
+**     str     x16, \[sp, #?16\]
+**     stp     q8, q9, \[sp, #?32\]
+**     stp     q10, q11, \[sp, #?64\]
+**     stp     q12, q13, \[sp, #?96\]
+**     stp     q14, q15, \[sp, #?128\]
+**     stp     q16, q17, \[sp, #?160\]
+**     stp     q18, q19, \[sp, #?192\]
+**     stp     q20, q21, \[sp, #?224\]
+**     stp     q22, q23, \[sp, #?256\]
+**     mov     \1, x0
+**     bl      ns_callee
+**     smstart sm
+**     bl      s_callee
+**     smstop  sm
+**     bl      sc_callee
+**     ldr     (x[0-9]+), \[\1\]
+**     blr     \2
+**     ldr     (x[0-9]+), \[\1, #?8\]
+**     smstart sm
+**     blr     \3
+**     smstop  sm
+**     ldr     (x[0-9]+), \[\1, #?16\]
+**     blr     \4
+**     ldp     q8, q9, \[sp, #?32\]
+**     ldp     q10, q11, \[sp, #?64\]
+**     ldp     q12, q13, \[sp, #?96\]
+**     ldp     q14, q15, \[sp, #?128\]
+**     ldp     q16, q17, \[sp, #?160\]
+**     ldp     q18, q19, \[sp, #?192\]
+**     ldp     q20, q21, \[sp, #?224\]
+**     ldp     q22, q23, \[sp, #?256\]
+**     ldp     x30, \1, \[sp\], #?288
+**     ret
+*/
+void __attribute__((aarch64_vector_pcs))
+n_caller (struct callbacks *c)
+{
+  ns_callee ();
+  s_callee ();
+  sc_callee ();
+
+  c->ns_ptr ();
+  c->s_ptr ();
+  c->sc_ptr ();
+}
+
+/*
+** s_caller:   { target lp64 }
+**     stp     x30, (x19|x2[0-8]), \[sp, #?-288\]!
+**     cntd    x16
+**     str     x16, \[sp, #?16\]
+**     stp     q8, q9, \[sp, #?32\]
+**     stp     q10, q11, \[sp, #?64\]
+**     stp     q12, q13, \[sp, #?96\]
+**     stp     q14, q15, \[sp, #?128\]
+**     stp     q16, q17, \[sp, #?160\]
+**     stp     q18, q19, \[sp, #?192\]
+**     stp     q20, q21, \[sp, #?224\]
+**     stp     q22, q23, \[sp, #?256\]
+**     mov     \1, x0
+**     smstop  sm
+**     bl      ns_callee
+**     smstart sm
+**     bl      s_callee
+**     bl      sc_callee
+**     ldr     (x[0-9]+), \[\1\]
+**     smstop  sm
+**     blr     \2
+**     smstart sm
+**     ldr     (x[0-9]+), \[\1, #?8\]
+**     blr     \3
+**     ldr     (x[0-9]+), \[\1, #?16\]
+**     blr     \4
+**     ldp     q8, q9, \[sp, #?32\]
+**     ldp     q10, q11, \[sp, #?64\]
+**     ldp     q12, q13, \[sp, #?96\]
+**     ldp     q14, q15, \[sp, #?128\]
+**     ldp     q16, q17, \[sp, #?160\]
+**     ldp     q18, q19, \[sp, #?192\]
+**     ldp     q20, q21, \[sp, #?224\]
+**     ldp     q22, q23, \[sp, #?256\]
+**     ldp     x30, \1, \[sp\], #?288
+**     ret
+*/
+void __attribute__((aarch64_vector_pcs))
+s_caller (struct callbacks *c) [[arm::streaming]]
+{
+  ns_callee ();
+  s_callee ();
+  sc_callee ();
+
+  c->ns_ptr ();
+  c->s_ptr ();
+  c->sc_ptr ();
+}
+
+/*
+** sc_caller:
+**     stp     x29, x30, \[sp, #?-288\]!
+**     mov     x29, sp
+**     cntd    x16
+**     str     x16, \[sp, #?24\]
+**     stp     q8, q9, \[sp, #?32\]
+**     stp     q10, q11, \[sp, #?64\]
+**     stp     q12, q13, \[sp, #?96\]
+**     stp     q14, q15, \[sp, #?128\]
+**     stp     q16, q17, \[sp, #?160\]
+**     stp     q18, q19, \[sp, #?192\]
+**     stp     q20, q21, \[sp, #?224\]
+**     stp     q22, q23, \[sp, #?256\]
+**     mrs     x16, svcr
+**     str     x16, \[x29, #?16\]
+**     ldr     x16, \[x29, #?16\]
+**     tbz     x16, 0, .*
+**     smstop  sm
+**     bl      ns_callee
+**     ldr     x16, \[x29, #?16\]
+**     tbz     x16, 0, .*
+**     smstart sm
+**     ldr     x16, \[x29, #?16\]
+**     tbnz    x16, 0, .*
+**     smstart sm
+**     bl      s_callee
+**     ldr     x16, \[x29, #?16\]
+**     tbnz    x16, 0, .*
+**     smstop  sm
+**     bl      sc_callee
+**     ldp     q8, q9, \[sp, #?32\]
+**     ldp     q10, q11, \[sp, #?64\]
+**     ldp     q12, q13, \[sp, #?96\]
+**     ldp     q14, q15, \[sp, #?128\]
+**     ldp     q16, q17, \[sp, #?160\]
+**     ldp     q18, q19, \[sp, #?192\]
+**     ldp     q20, q21, \[sp, #?224\]
+**     ldp     q22, q23, \[sp, #?256\]
+**     ldp     x29, x30, \[sp\], #?288
+**     ret
+*/
+void __attribute__((aarch64_vector_pcs))
+sc_caller () [[arm::streaming_compatible]]
+{
+  ns_callee ();
+  s_callee ();
+  sc_callee ();
+}
+
+/* { dg-final { scan-assembler {n_caller:(?:(?!ret).)*\.cfi_offset 46, -272\n} } } */
+/* { dg-final { scan-assembler {s_caller:(?:(?!ret).)*\.cfi_offset 46, -272\n} } } */
+/* { dg-final { scan-assembler {sc_caller:(?:(?!ret).)*\.cfi_offset 46, -264\n} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_4.c b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_4.c
new file mode 100644
index 00000000000..f93a67f974a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_4.c
@@ -0,0 +1,43 @@
+// { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls" }
+
+__attribute__((aarch64_vector_pcs)) void ns_callee ();
+__attribute__((aarch64_vector_pcs)) void s_callee () [[arm::streaming]];
+__attribute__((aarch64_vector_pcs)) void sc_callee () [[arm::streaming_compatible]];
+
+struct callbacks {
+  __attribute__((aarch64_vector_pcs)) void (*ns_ptr) ();
+  __attribute__((aarch64_vector_pcs)) void (*s_ptr) () [[arm::streaming]];
+  __attribute__((aarch64_vector_pcs)) void (*sc_ptr) () [[arm::streaming_compatible]];
+};
+
+void __attribute__((aarch64_vector_pcs))
+n_caller (struct callbacks *c)
+{
+  ns_callee ();
+  sc_callee ();
+
+  c->ns_ptr ();
+  c->sc_ptr ();
+}
+
+void __attribute__((aarch64_vector_pcs))
+s_caller (struct callbacks *c) [[arm::streaming]]
+{
+  s_callee ();
+  sc_callee ();
+
+  c->s_ptr ();
+  c->sc_ptr ();
+}
+
+void __attribute__((aarch64_vector_pcs))
+sc_caller (struct callbacks *c) [[arm::streaming_compatible]]
+{
+  sc_callee ();
+
+  c->sc_ptr ();
+}
+
+// { dg-final { scan-assembler-not {[dpqz][0-9]+,} } }
+// { dg-final { scan-assembler-not {smstart\tsm} } }
+// { dg-final { scan-assembler-not {smstop\tsm} } }
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_5.c b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_5.c
new file mode 100644
index 00000000000..be9b5cc0410
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_5.c
@@ -0,0 +1,318 @@
+// { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls" }
+// { dg-final { check-function-bodies "**" "" } }
+
+#include <arm_sve.h>
+
+svbool_t ns_callee ();
+svbool_t s_callee () [[arm::streaming]];
+svbool_t sc_callee () [[arm::streaming_compatible]];
+
+struct callbacks {
+  svbool_t (*ns_ptr) ();
+  svbool_t (*s_ptr) () [[arm::streaming]];
+  svbool_t (*sc_ptr) () [[arm::streaming_compatible]];
+};
+
+/*
+** n_caller:   { target lp64 }
+**     stp     x30, (x19|x2[0-8]), \[sp, #?-32\]!
+**     cntd    x16
+**     str     x16, \[sp, #?16\]
+**     addvl   sp, sp, #-18
+**     str     p4, \[sp\]
+**     str     p5, \[sp, #1, mul vl\]
+**     str     p6, \[sp, #2, mul vl\]
+**     str     p7, \[sp, #3, mul vl\]
+**     str     p8, \[sp, #4, mul vl\]
+**     str     p9, \[sp, #5, mul vl\]
+**     str     p10, \[sp, #6, mul vl\]
+**     str     p11, \[sp, #7, mul vl\]
+**     str     p12, \[sp, #8, mul vl\]
+**     str     p13, \[sp, #9, mul vl\]
+**     str     p14, \[sp, #10, mul vl\]
+**     str     p15, \[sp, #11, mul vl\]
+**     str     z8, \[sp, #2, mul vl\]
+**     str     z9, \[sp, #3, mul vl\]
+**     str     z10, \[sp, #4, mul vl\]
+**     str     z11, \[sp, #5, mul vl\]
+**     str     z12, \[sp, #6, mul vl\]
+**     str     z13, \[sp, #7, mul vl\]
+**     str     z14, \[sp, #8, mul vl\]
+**     str     z15, \[sp, #9, mul vl\]
+**     str     z16, \[sp, #10, mul vl\]
+**     str     z17, \[sp, #11, mul vl\]
+**     str     z18, \[sp, #12, mul vl\]
+**     str     z19, \[sp, #13, mul vl\]
+**     str     z20, \[sp, #14, mul vl\]
+**     str     z21, \[sp, #15, mul vl\]
+**     str     z22, \[sp, #16, mul vl\]
+**     str     z23, \[sp, #17, mul vl\]
+**     mov     \1, x0
+**     bl      ns_callee
+**     smstart sm
+**     bl      s_callee
+**     addvl   sp, sp, #-1
+**     str     p0, \[sp\]
+**     smstop  sm
+**     ldr     p0, \[sp\]
+**     addvl   sp, sp, #1
+**     bl      sc_callee
+**     ldr     (x[0-9]+), \[\1\]
+**     blr     \2
+**     ldr     (x[0-9]+), \[\1, #?8\]
+**     smstart sm
+**     blr     \3
+**     addvl   sp, sp, #-1
+**     str     p0, \[sp\]
+**     smstop  sm
+**     ldr     p0, \[sp\]
+**     addvl   sp, sp, #1
+**     ldr     (x[0-9]+), \[\1, #?16\]
+**     blr     \4
+**     ldr     z8, \[sp, #2, mul vl\]
+**     ldr     z9, \[sp, #3, mul vl\]
+**     ldr     z10, \[sp, #4, mul vl\]
+**     ldr     z11, \[sp, #5, mul vl\]
+**     ldr     z12, \[sp, #6, mul vl\]
+**     ldr     z13, \[sp, #7, mul vl\]
+**     ldr     z14, \[sp, #8, mul vl\]
+**     ldr     z15, \[sp, #9, mul vl\]
+**     ldr     z16, \[sp, #10, mul vl\]
+**     ldr     z17, \[sp, #11, mul vl\]
+**     ldr     z18, \[sp, #12, mul vl\]
+**     ldr     z19, \[sp, #13, mul vl\]
+**     ldr     z20, \[sp, #14, mul vl\]
+**     ldr     z21, \[sp, #15, mul vl\]
+**     ldr     z22, \[sp, #16, mul vl\]
+**     ldr     z23, \[sp, #17, mul vl\]
+**     ldr     p4, \[sp\]
+**     ldr     p5, \[sp, #1, mul vl\]
+**     ldr     p6, \[sp, #2, mul vl\]
+**     ldr     p7, \[sp, #3, mul vl\]
+**     ldr     p8, \[sp, #4, mul vl\]
+**     ldr     p9, \[sp, #5, mul vl\]
+**     ldr     p10, \[sp, #6, mul vl\]
+**     ldr     p11, \[sp, #7, mul vl\]
+**     ldr     p12, \[sp, #8, mul vl\]
+**     ldr     p13, \[sp, #9, mul vl\]
+**     ldr     p14, \[sp, #10, mul vl\]
+**     ldr     p15, \[sp, #11, mul vl\]
+**     addvl   sp, sp, #18
+**     ldp     x30, \1, \[sp\], #?32
+**     ret
+*/
+svbool_t
+n_caller (struct callbacks *c)
+{
+  ns_callee ();
+  s_callee ();
+  sc_callee ();
+
+  c->ns_ptr ();
+  c->s_ptr ();
+  return c->sc_ptr ();
+}
+
+/*
+** s_caller:   { target lp64 }
+**     stp     x30, (x19|x2[0-8]), \[sp, #?-32\]!
+**     cntd    x16
+**     str     x16, \[sp, #?16\]
+**     addvl   sp, sp, #-18
+**     str     p4, \[sp\]
+**     str     p5, \[sp, #1, mul vl\]
+**     str     p6, \[sp, #2, mul vl\]
+**     str     p7, \[sp, #3, mul vl\]
+**     str     p8, \[sp, #4, mul vl\]
+**     str     p9, \[sp, #5, mul vl\]
+**     str     p10, \[sp, #6, mul vl\]
+**     str     p11, \[sp, #7, mul vl\]
+**     str     p12, \[sp, #8, mul vl\]
+**     str     p13, \[sp, #9, mul vl\]
+**     str     p14, \[sp, #10, mul vl\]
+**     str     p15, \[sp, #11, mul vl\]
+**     str     z8, \[sp, #2, mul vl\]
+**     str     z9, \[sp, #3, mul vl\]
+**     str     z10, \[sp, #4, mul vl\]
+**     str     z11, \[sp, #5, mul vl\]
+**     str     z12, \[sp, #6, mul vl\]
+**     str     z13, \[sp, #7, mul vl\]
+**     str     z14, \[sp, #8, mul vl\]
+**     str     z15, \[sp, #9, mul vl\]
+**     str     z16, \[sp, #10, mul vl\]
+**     str     z17, \[sp, #11, mul vl\]
+**     str     z18, \[sp, #12, mul vl\]
+**     str     z19, \[sp, #13, mul vl\]
+**     str     z20, \[sp, #14, mul vl\]
+**     str     z21, \[sp, #15, mul vl\]
+**     str     z22, \[sp, #16, mul vl\]
+**     str     z23, \[sp, #17, mul vl\]
+**     mov     \1, x0
+**     smstop  sm
+**     bl      ns_callee
+**     addvl   sp, sp, #-1
+**     str     p0, \[sp\]
+**     smstart sm
+**     ldr     p0, \[sp\]
+**     addvl   sp, sp, #1
+**     bl      s_callee
+**     bl      sc_callee
+**     ldr     (x[0-9]+), \[\1\]
+**     smstop  sm
+**     blr     \2
+**     addvl   sp, sp, #-1
+**     str     p0, \[sp\]
+**     smstart sm
+**     ldr     p0, \[sp\]
+**     addvl   sp, sp, #1
+**     ldr     (x[0-9]+), \[\1, #?8\]
+**     blr     \3
+**     ldr     (x[0-9]+), \[\1, #?16\]
+**     blr     \4
+**     ldr     z8, \[sp, #2, mul vl\]
+**     ldr     z9, \[sp, #3, mul vl\]
+**     ldr     z10, \[sp, #4, mul vl\]
+**     ldr     z11, \[sp, #5, mul vl\]
+**     ldr     z12, \[sp, #6, mul vl\]
+**     ldr     z13, \[sp, #7, mul vl\]
+**     ldr     z14, \[sp, #8, mul vl\]
+**     ldr     z15, \[sp, #9, mul vl\]
+**     ldr     z16, \[sp, #10, mul vl\]
+**     ldr     z17, \[sp, #11, mul vl\]
+**     ldr     z18, \[sp, #12, mul vl\]
+**     ldr     z19, \[sp, #13, mul vl\]
+**     ldr     z20, \[sp, #14, mul vl\]
+**     ldr     z21, \[sp, #15, mul vl\]
+**     ldr     z22, \[sp, #16, mul vl\]
+**     ldr     z23, \[sp, #17, mul vl\]
+**     ldr     p4, \[sp\]
+**     ldr     p5, \[sp, #1, mul vl\]
+**     ldr     p6, \[sp, #2, mul vl\]
+**     ldr     p7, \[sp, #3, mul vl\]
+**     ldr     p8, \[sp, #4, mul vl\]
+**     ldr     p9, \[sp, #5, mul vl\]
+**     ldr     p10, \[sp, #6, mul vl\]
+**     ldr     p11, \[sp, #7, mul vl\]
+**     ldr     p12, \[sp, #8, mul vl\]
+**     ldr     p13, \[sp, #9, mul vl\]
+**     ldr     p14, \[sp, #10, mul vl\]
+**     ldr     p15, \[sp, #11, mul vl\]
+**     addvl   sp, sp, #18
+**     ldp     x30, \1, \[sp\], #?32
+**     ret
+*/
+svbool_t
+s_caller (struct callbacks *c) [[arm::streaming]]
+{
+  ns_callee ();
+  s_callee ();
+  sc_callee ();
+
+  c->ns_ptr ();
+  c->s_ptr ();
+  return c->sc_ptr ();
+}
+
+/*
+** sc_caller:
+**     stp     x29, x30, \[sp, #?-32\]!
+**     mov     x29, sp
+**     cntd    x16
+**     str     x16, \[sp, #?24\]
+**     addvl   sp, sp, #-18
+**     str     p4, \[sp\]
+**     str     p5, \[sp, #1, mul vl\]
+**     str     p6, \[sp, #2, mul vl\]
+**     str     p7, \[sp, #3, mul vl\]
+**     str     p8, \[sp, #4, mul vl\]
+**     str     p9, \[sp, #5, mul vl\]
+**     str     p10, \[sp, #6, mul vl\]
+**     str     p11, \[sp, #7, mul vl\]
+**     str     p12, \[sp, #8, mul vl\]
+**     str     p13, \[sp, #9, mul vl\]
+**     str     p14, \[sp, #10, mul vl\]
+**     str     p15, \[sp, #11, mul vl\]
+**     str     z8, \[sp, #2, mul vl\]
+**     str     z9, \[sp, #3, mul vl\]
+**     str     z10, \[sp, #4, mul vl\]
+**     str     z11, \[sp, #5, mul vl\]
+**     str     z12, \[sp, #6, mul vl\]
+**     str     z13, \[sp, #7, mul vl\]
+**     str     z14, \[sp, #8, mul vl\]
+**     str     z15, \[sp, #9, mul vl\]
+**     str     z16, \[sp, #10, mul vl\]
+**     str     z17, \[sp, #11, mul vl\]
+**     str     z18, \[sp, #12, mul vl\]
+**     str     z19, \[sp, #13, mul vl\]
+**     str     z20, \[sp, #14, mul vl\]
+**     str     z21, \[sp, #15, mul vl\]
+**     str     z22, \[sp, #16, mul vl\]
+**     str     z23, \[sp, #17, mul vl\]
+**     mrs     x16, svcr
+**     str     x16, \[x29, #?16\]
+**     ldr     x16, \[x29, #?16\]
+**     tbz     x16, 0, .*
+**     smstop  sm
+**     bl      ns_callee
+**     ldr     x16, \[x29, #?16\]
+**     tbz     x16, 0, .*
+**     addvl   sp, sp, #-1
+**     str     p0, \[sp\]
+**     smstart sm
+**     ldr     p0, \[sp\]
+**     addvl   sp, sp, #1
+**     ldr     x16, \[x29, #?16\]
+**     tbnz    x16, 0, .*
+**     smstart sm
+**     bl      s_callee
+**     ldr     x16, \[x29, #?16\]
+**     tbnz    x16, 0, .*
+**     addvl   sp, sp, #-1
+**     str     p0, \[sp\]
+**     smstop  sm
+**     ldr     p0, \[sp\]
+**     addvl   sp, sp, #1
+**     bl      sc_callee
+**     ldr     z8, \[sp, #2, mul vl\]
+**     ldr     z9, \[sp, #3, mul vl\]
+**     ldr     z10, \[sp, #4, mul vl\]
+**     ldr     z11, \[sp, #5, mul vl\]
+**     ldr     z12, \[sp, #6, mul vl\]
+**     ldr     z13, \[sp, #7, mul vl\]
+**     ldr     z14, \[sp, #8, mul vl\]
+**     ldr     z15, \[sp, #9, mul vl\]
+**     ldr     z16, \[sp, #10, mul vl\]
+**     ldr     z17, \[sp, #11, mul vl\]
+**     ldr     z18, \[sp, #12, mul vl\]
+**     ldr     z19, \[sp, #13, mul vl\]
+**     ldr     z20, \[sp, #14, mul vl\]
+**     ldr     z21, \[sp, #15, mul vl\]
+**     ldr     z22, \[sp, #16, mul vl\]
+**     ldr     z23, \[sp, #17, mul vl\]
+**     ldr     p4, \[sp\]
+**     ldr     p5, \[sp, #1, mul vl\]
+**     ldr     p6, \[sp, #2, mul vl\]
+**     ldr     p7, \[sp, #3, mul vl\]
+**     ldr     p8, \[sp, #4, mul vl\]
+**     ldr     p9, \[sp, #5, mul vl\]
+**     ldr     p10, \[sp, #6, mul vl\]
+**     ldr     p11, \[sp, #7, mul vl\]
+**     ldr     p12, \[sp, #8, mul vl\]
+**     ldr     p13, \[sp, #9, mul vl\]
+**     ldr     p14, \[sp, #10, mul vl\]
+**     ldr     p15, \[sp, #11, mul vl\]
+**     addvl   sp, sp, #18
+**     ldp     x29, x30, \[sp\], #?32
+**     ret
+*/
+svbool_t
+sc_caller () [[arm::streaming_compatible]]
+{
+  ns_callee ();
+  s_callee ();
+  return sc_callee ();
+}
+
+/* { dg-final { scan-assembler {n_caller:(?:(?!ret).)*\.cfi_offset 46, -16\n} } } */
+/* { dg-final { scan-assembler {s_caller:(?:(?!ret).)*\.cfi_offset 46, -16\n} } } */
+/* { dg-final { scan-assembler {sc_caller:(?:(?!ret).)*\.cfi_offset 46, -8\n} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_6.c b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_6.c
new file mode 100644
index 00000000000..0f6bc4f6c9a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_6.c
@@ -0,0 +1,45 @@
+// { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls" }
+
+#include <arm_sve.h>
+
+svbool_t ns_callee ();
+svbool_t s_callee () [[arm::streaming]];
+svbool_t sc_callee () [[arm::streaming_compatible]];
+
+struct callbacks {
+  svbool_t (*ns_ptr) ();
+  svbool_t (*s_ptr) () [[arm::streaming]];
+  svbool_t (*sc_ptr) () [[arm::streaming_compatible]];
+};
+
+svbool_t
+n_caller (struct callbacks *c)
+{
+  ns_callee ();
+  sc_callee ();
+
+  c->ns_ptr ();
+  return c->sc_ptr ();
+}
+
+svbool_t
+s_caller (struct callbacks *c) [[arm::streaming]]
+{
+  s_callee ();
+  sc_callee ();
+
+  c->s_ptr ();
+  return c->sc_ptr ();
+}
+
+svbool_t
+sc_caller (struct callbacks *c) [[arm::streaming_compatible]]
+{
+  sc_callee ();
+
+  return c->sc_ptr ();
+}
+
+// { dg-final { scan-assembler-not {[dpqz][0-9]+,} } }
+// { dg-final { scan-assembler-not {smstart\tsm} } }
+// { dg-final { scan-assembler-not {smstop\tsm} } }
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_7.c b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_7.c
new file mode 100644
index 00000000000..6482a489fc5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_7.c
@@ -0,0 +1,516 @@
+// { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls" }
+// { dg-final { check-function-bodies "**" "" } }
+
+#include <arm_neon.h>
+#include <arm_sve.h>
+
+double produce_d0 ();
+void consume_d0 (double);
+
+/*
+** test_d0:
+**     ...
+**     smstop  sm
+**     bl      produce_d0
+**     fmov    x10, d0
+**     smstart sm
+**     fmov    d0, x10
+**     fmov    x10, d0
+**     smstop  sm
+**     fmov    d0, x10
+**     bl      consume_d0
+**     ...
+*/
+void
+test_d0 () [[arm::streaming]]
+{
+  double res = produce_d0 ();
+  asm volatile ("");
+  consume_d0 (res);
+}
+
+int8x8_t produce_d0_vec ();
+void consume_d0_vec (int8x8_t);
+
+/*
+** test_d0_vec:
+**     ...
+**     smstop  sm
+**     bl      produce_d0_vec
+** (
+**     fmov    x10, d0
+** |
+**     umov    x10, v0.d\[0\]
+** )
+**     smstart sm
+**     fmov    d0, x10
+** (
+**     fmov    x10, d0
+** |
+**     umov    x10, v0.d\[0\]
+** )
+**     smstop  sm
+**     fmov    d0, x10
+**     bl      consume_d0_vec
+**     ...
+*/
+void
+test_d0_vec () [[arm::streaming]]
+{
+  int8x8_t res = produce_d0_vec ();
+  asm volatile ("");
+  consume_d0_vec (res);
+}
+
+int8x16_t produce_q0 ();
+void consume_q0 (int8x16_t);
+
+/*
+** test_q0:
+**     ...
+**     smstop  sm
+**     bl      produce_q0
+**     str     q0, \[sp, #?-16\]!
+**     smstart sm
+**     ldr     q0, \[sp\], #?16
+**     str     q0, \[sp, #?-16\]!
+**     smstop  sm
+**     ldr     q0, \[sp\], #?16
+**     bl      consume_q0
+**     ...
+*/
+void
+test_q0 () [[arm::streaming]]
+{
+  int8x16_t res = produce_q0 ();
+  asm volatile ("");
+  consume_q0 (res);
+}
+
+int8x16x2_t produce_q1 ();
+void consume_q1 (int8x16x2_t);
+
+/*
+** test_q1:
+**     ...
+**     smstop  sm
+**     bl      produce_q1
+**     stp     q0, q1, \[sp, #?-32\]!
+**     smstart sm
+**     ldp     q0, q1, \[sp\], #?32
+**     stp     q0, q1, \[sp, #?-32\]!
+**     smstop  sm
+**     ldp     q0, q1, \[sp\], #?32
+**     bl      consume_q1
+**     ...
+*/
+void
+test_q1 () [[arm::streaming]]
+{
+  int8x16x2_t res = produce_q1 ();
+  asm volatile ("");
+  consume_q1 (res);
+}
+
+int8x16x3_t produce_q2 ();
+void consume_q2 (int8x16x3_t);
+
+/*
+** test_q2:
+**     ...
+**     smstop  sm
+**     bl      produce_q2
+**     stp     q0, q1, \[sp, #?-48\]!
+**     str     q2, \[sp, #?32\]
+**     smstart sm
+**     ldr     q2, \[sp, #?32\]
+**     ldp     q0, q1, \[sp\], #?48
+**     stp     q0, q1, \[sp, #?-48\]!
+**     str     q2, \[sp, #?32\]
+**     smstop  sm
+**     ldr     q2, \[sp, #?32\]
+**     ldp     q0, q1, \[sp\], #?48
+**     bl      consume_q2
+**     ...
+*/
+void
+test_q2 () [[arm::streaming]]
+{
+  int8x16x3_t res = produce_q2 ();
+  asm volatile ("");
+  consume_q2 (res);
+}
+
+int8x16x4_t produce_q3 ();
+void consume_q3 (int8x16x4_t);
+
+/*
+** test_q3:
+**     ...
+**     smstop  sm
+**     bl      produce_q3
+**     stp     q0, q1, \[sp, #?-64\]!
+**     stp     q2, q3, \[sp, #?32\]
+**     smstart sm
+**     ldp     q2, q3, \[sp, #?32\]
+**     ldp     q0, q1, \[sp\], #?64
+**     stp     q0, q1, \[sp, #?-64\]!
+**     stp     q2, q3, \[sp, #?32\]
+**     smstop  sm
+**     ldp     q2, q3, \[sp, #?32\]
+**     ldp     q0, q1, \[sp\], #?64
+**     bl      consume_q3
+**     ...
+*/
+void
+test_q3 () [[arm::streaming]]
+{
+  int8x16x4_t res = produce_q3 ();
+  asm volatile ("");
+  consume_q3 (res);
+}
+
+svint8_t produce_z0 ();
+void consume_z0 (svint8_t);
+
+/*
+** test_z0:
+**     ...
+**     smstop  sm
+**     bl      produce_z0
+**     addvl   sp, sp, #-1
+**     str     z0, \[sp\]
+**     smstart sm
+**     ldr     z0, \[sp\]
+**     addvl   sp, sp, #1
+**     addvl   sp, sp, #-1
+**     str     z0, \[sp\]
+**     smstop  sm
+**     ldr     z0, \[sp\]
+**     addvl   sp, sp, #1
+**     bl      consume_z0
+**     ...
+*/
+void
+test_z0 () [[arm::streaming]]
+{
+  svint8_t res = produce_z0 ();
+  asm volatile ("");
+  consume_z0 (res);
+}
+
+svint8x4_t produce_z3 ();
+void consume_z3 (svint8x4_t);
+
+/*
+** test_z3:
+**     ...
+**     smstop  sm
+**     bl      produce_z3
+**     addvl   sp, sp, #-4
+**     str     z0, \[sp\]
+**     str     z1, \[sp, #1, mul vl\]
+**     str     z2, \[sp, #2, mul vl\]
+**     str     z3, \[sp, #3, mul vl\]
+**     smstart sm
+**     ldr     z0, \[sp\]
+**     ldr     z1, \[sp, #1, mul vl\]
+**     ldr     z2, \[sp, #2, mul vl\]
+**     ldr     z3, \[sp, #3, mul vl\]
+**     addvl   sp, sp, #4
+**     addvl   sp, sp, #-4
+**     str     z0, \[sp\]
+**     str     z1, \[sp, #1, mul vl\]
+**     str     z2, \[sp, #2, mul vl\]
+**     str     z3, \[sp, #3, mul vl\]
+**     smstop  sm
+**     ldr     z0, \[sp\]
+**     ldr     z1, \[sp, #1, mul vl\]
+**     ldr     z2, \[sp, #2, mul vl\]
+**     ldr     z3, \[sp, #3, mul vl\]
+**     addvl   sp, sp, #4
+**     bl      consume_z3
+**     ...
+*/
+void
+test_z3 () [[arm::streaming]]
+{
+  svint8x4_t res = produce_z3 ();
+  asm volatile ("");
+  consume_z3 (res);
+}
+
+svbool_t produce_p0 ();
+void consume_p0 (svbool_t);
+
+/*
+** test_p0:
+**     ...
+**     smstop  sm
+**     bl      produce_p0
+**     addvl   sp, sp, #-1
+**     str     p0, \[sp\]
+**     smstart sm
+**     ldr     p0, \[sp\]
+**     addvl   sp, sp, #1
+**     addvl   sp, sp, #-1
+**     str     p0, \[sp\]
+**     smstop  sm
+**     ldr     p0, \[sp\]
+**     addvl   sp, sp, #1
+**     bl      consume_p0
+**     ...
+*/
+void
+test_p0 () [[arm::streaming]]
+{
+  svbool_t res = produce_p0 ();
+  asm volatile ("");
+  consume_p0 (res);
+}
+
+void consume_d7 (double, double, double, double, double, double, double,
+                double);
+
+/*
+** test_d7:
+**     ...
+**     fmov    x10, d0
+**     fmov    x11, d1
+**     fmov    x12, d2
+**     fmov    x13, d3
+**     fmov    x14, d4
+**     fmov    x15, d5
+**     fmov    x16, d6
+**     fmov    x17, d7
+**     smstop  sm
+**     fmov    d0, x10
+**     fmov    d1, x11
+**     fmov    d2, x12
+**     fmov    d3, x13
+**     fmov    d4, x14
+**     fmov    d5, x15
+**     fmov    d6, x16
+**     fmov    d7, x17
+**     bl      consume_d7
+**     ...
+*/
+void
+test_d7 () [[arm::streaming]]
+{
+  consume_d7 (1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0);
+}
+
+void consume_d7_vec (int8x8_t, int8x8_t, int8x8_t, int8x8_t, int8x8_t,
+                    int8x8_t, int8x8_t, int8x8_t);
+
+/*
+** test_d7_vec:
+**     ...
+** (
+**     fmov    x10, d0
+**     fmov    x11, d1
+**     fmov    x12, d2
+**     fmov    x13, d3
+**     fmov    x14, d4
+**     fmov    x15, d5
+**     fmov    x16, d6
+**     fmov    x17, d7
+** |
+**     umov    x10, v0.d\[0\]
+**     umov    x11, v1.d\[0\]
+**     umov    x12, v2.d\[0\]
+**     umov    x13, v3.d\[0\]
+**     umov    x14, v4.d\[0\]
+**     umov    x15, v5.d\[0\]
+**     umov    x16, v6.d\[0\]
+**     umov    x17, v7.d\[0\]
+** )
+**     smstop  sm
+**     fmov    d0, x10
+**     fmov    d1, x11
+**     fmov    d2, x12
+**     fmov    d3, x13
+**     fmov    d4, x14
+**     fmov    d5, x15
+**     fmov    d6, x16
+**     fmov    d7, x17
+**     bl      consume_d7_vec
+**     ...
+*/
+void
+test_d7_vec (int8x8_t *ptr) [[arm::streaming]]
+{
+  consume_d7_vec (*ptr, *ptr, *ptr, *ptr, *ptr, *ptr, *ptr, *ptr);
+}
+
+void consume_q7 (int8x16_t, int8x16_t, int8x16_t, int8x16_t, int8x16_t,
+                int8x16_t, int8x16_t, int8x16_t);
+
+/*
+** test_q7:
+**     ...
+**     stp     q0, q1, \[sp, #?-128\]!
+**     stp     q2, q3, \[sp, #?32\]
+**     stp     q4, q5, \[sp, #?64\]
+**     stp     q6, q7, \[sp, #?96\]
+**     smstop  sm
+**     ldp     q2, q3, \[sp, #?32\]
+**     ldp     q4, q5, \[sp, #?64\]
+**     ldp     q6, q7, \[sp, #?96\]
+**     ldp     q0, q1, \[sp\], #?128
+**     bl      consume_q7
+**     ...
+*/
+void
+test_q7 (int8x16_t *ptr) [[arm::streaming]]
+{
+  consume_q7 (*ptr, *ptr, *ptr, *ptr, *ptr, *ptr, *ptr, *ptr);
+}
+
+void consume_z7 (svint8_t, svint8_t, svint8_t, svint8_t, svint8_t,
+                svint8_t, svint8_t, svint8_t);
+
+/*
+** test_z7:
+**     ...
+**     addvl   sp, sp, #-8
+**     str     z0, \[sp\]
+**     str     z1, \[sp, #1, mul vl\]
+**     str     z2, \[sp, #2, mul vl\]
+**     str     z3, \[sp, #3, mul vl\]
+**     str     z4, \[sp, #4, mul vl\]
+**     str     z5, \[sp, #5, mul vl\]
+**     str     z6, \[sp, #6, mul vl\]
+**     str     z7, \[sp, #7, mul vl\]
+**     smstop  sm
+**     ldr     z0, \[sp\]
+**     ldr     z1, \[sp, #1, mul vl\]
+**     ldr     z2, \[sp, #2, mul vl\]
+**     ldr     z3, \[sp, #3, mul vl\]
+**     ldr     z4, \[sp, #4, mul vl\]
+**     ldr     z5, \[sp, #5, mul vl\]
+**     ldr     z6, \[sp, #6, mul vl\]
+**     ldr     z7, \[sp, #7, mul vl\]
+**     addvl   sp, sp, #8
+**     bl      consume_z7
+**     ...
+*/
+void
+test_z7 (svint8_t *ptr) [[arm::streaming]]
+{
+  consume_z7 (*ptr, *ptr, *ptr, *ptr, *ptr, *ptr, *ptr, *ptr);
+}
+
+void consume_p3 (svbool_t, svbool_t, svbool_t, svbool_t);
+
+/*
+** test_p3:
+**     ...
+**     addvl   sp, sp, #-1
+**     str     p0, \[sp\]
+**     str     p1, \[sp, #1, mul vl\]
+**     str     p2, \[sp, #2, mul vl\]
+**     str     p3, \[sp, #3, mul vl\]
+**     smstop  sm
+**     ldr     p0, \[sp\]
+**     ldr     p1, \[sp, #1, mul vl\]
+**     ldr     p2, \[sp, #2, mul vl\]
+**     ldr     p3, \[sp, #3, mul vl\]
+**     addvl   sp, sp, #1
+**     bl      consume_p3
+**     ...
+*/
+void
+test_p3 (svbool_t *ptr) [[arm::streaming]]
+{
+  consume_p3 (*ptr, *ptr, *ptr, *ptr);
+}
+
+void consume_mixed (float, double, float32x4_t, svfloat32_t,
+                   float, double, float64x2_t, svfloat64_t,
+                   svbool_t, svbool_t, svbool_t, svbool_t);
+
+/*
+** test_mixed:
+**     ...
+**     addvl   sp, sp, #-3
+**     str     p0, \[sp\]
+**     str     p1, \[sp, #1, mul vl\]
+**     str     p2, \[sp, #2, mul vl\]
+**     str     p3, \[sp, #3, mul vl\]
+**     str     z3, \[sp, #1, mul vl\]
+**     str     z7, \[sp, #2, mul vl\]
+**     stp     q2, q6, \[sp, #?-32\]!
+**     fmov    w10, s0
+**     fmov    x11, d1
+**     fmov    w12, s4
+**     fmov    x13, d5
+**     smstop  sm
+**     fmov    s0, w10
+**     fmov    d1, x11
+**     fmov    s4, w12
+**     fmov    d5, x13
+**     ldp     q2, q6, \[sp\], #?32
+**     ldr     p0, \[sp\]
+**     ldr     p1, \[sp, #1, mul vl\]
+**     ldr     p2, \[sp, #2, mul vl\]
+**     ldr     p3, \[sp, #3, mul vl\]
+**     ldr     z3, \[sp, #1, mul vl\]
+**     ldr     z7, \[sp, #2, mul vl\]
+**     addvl   sp, sp, #3
+**     bl      consume_mixed
+**     ...
+*/
+void
+test_mixed (float32x4_t *float32x4_ptr,
+           svfloat32_t *svfloat32_ptr,
+           float64x2_t *float64x2_ptr,
+           svfloat64_t *svfloat64_ptr,
+           svbool_t *svbool_ptr) [[arm::streaming]]
+{
+  consume_mixed (1.0f, 2.0, *float32x4_ptr, *svfloat32_ptr,
+                3.0f, 4.0, *float64x2_ptr, *svfloat64_ptr,
+                *svbool_ptr, *svbool_ptr, *svbool_ptr, *svbool_ptr);
+}
+
+void consume_varargs (float, ...);
+
+/*
+** test_varargs:
+**     ...
+**     stp     q3, q7, \[sp, #?-32\]!
+**     fmov    w10, s0
+**     fmov    x11, d1
+** (
+**     fmov    x12, d2
+** |
+**     umov    x12, v2.d\[0\]
+** )
+**     fmov    x13, d4
+**     fmov    x14, d5
+** (
+**     fmov    x15, d6
+** |
+**     umov    x15, v6.d\[0\]
+** )
+**     smstop  sm
+**     fmov    s0, w10
+**     fmov    d1, x11
+**     fmov    d2, x12
+**     fmov    d4, x13
+**     fmov    d5, x14
+**     fmov    d6, x15
+**     ldp     q3, q7, \[sp\], #?32
+**     bl      consume_varargs
+**     ...
+*/
+void
+test_varargs (float32x2_t *float32x2_ptr,
+             float32x4_t *float32x4_ptr,
+             float64x1_t *float64x1_ptr,
+             float64x2_t *float64x2_ptr) [[arm::streaming]]
+{
+  consume_varargs (1.0f, 2.0, *float32x2_ptr, *float32x4_ptr,
+                  3.0f, 4.0, *float64x1_ptr, *float64x2_ptr);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_8.c b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_8.c
new file mode 100644
index 00000000000..f44724df32f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_8.c
@@ -0,0 +1,87 @@
+// { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls -msve-vector-bits=128" }
+// { dg-final { check-function-bodies "**" "" } }
+
+#include <arm_sve.h>
+
+svint8_t produce_z0 ();
+void consume_z0 (svint8_t);
+
+/*
+** test_z0:
+**     ...
+**     smstop  sm
+**     bl      produce_z0
+**     str     q0, \[sp, #?-16\]!
+**     smstart sm
+**     ldr     q0, \[sp\], #?16
+**     str     q0, \[sp, #?-16\]!
+**     smstop  sm
+**     ldr     q0, \[sp\], #?16
+**     bl      consume_z0
+**     ...
+*/
+void
+test_z0 () [[arm::streaming]]
+{
+  svint8_t res = produce_z0 ();
+  asm volatile ("");
+  consume_z0 (res);
+}
+
+svint8x4_t produce_z3 ();
+void consume_z3 (svint8x4_t);
+
+/*
+** test_z3:
+**     ...
+**     smstop  sm
+**     bl      produce_z3
+**     stp     q0, q1, \[sp, #?-64\]!
+**     stp     q2, q3, \[sp, #?32\]
+**     smstart sm
+**     ldp     q2, q3, \[sp, #?32\]
+**     ldp     q0, q1, \[sp\], #?64
+**     stp     q0, q1, \[sp, #?-64\]!
+**     stp     q2, q3, \[sp, #?32\]
+**     smstop  sm
+**     ldp     q2, q3, \[sp, #?32\]
+**     ldp     q0, q1, \[sp\], #?64
+**     bl      consume_z3
+**     ...
+*/
+void
+test_z3 () [[arm::streaming]]
+{
+  svint8x4_t res = produce_z3 ();
+  asm volatile ("");
+  consume_z3 (res);
+}
+
+svbool_t produce_p0 ();
+void consume_p0 (svbool_t);
+
+/*
+** test_p0:
+**     ...
+**     smstop  sm
+**     bl      produce_p0
+**     sub     sp, sp, #?16
+**     str     p0, \[sp\]
+**     smstart sm
+**     ldr     p0, \[sp\]
+**     add     sp, sp, #?16
+**     sub     sp, sp, #?16
+**     str     p0, \[sp\]
+**     smstop  sm
+**     ldr     p0, \[sp\]
+**     add     sp, sp, #?16
+**     bl      consume_p0
+**     ...
+*/
+void
+test_p0 () [[arm::streaming]]
+{
+  svbool_t res = produce_p0 ();
+  asm volatile ("");
+  consume_p0 (res);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_9.c b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_9.c
new file mode 100644
index 00000000000..83b4073eef3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_9.c
@@ -0,0 +1,103 @@
+// { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls -msve-vector-bits=256" }
+// { dg-final { check-function-bodies "**" "" } }
+
+#include <arm_sve.h>
+
+svint8_t produce_z0 ();
+void consume_z0 (svint8_t);
+
+/*
+** test_z0:
+**     ...
+**     smstop  sm
+**     bl      produce_z0
+**     sub     sp, sp, #?32
+**     str     z0, \[sp\]
+**     smstart sm
+**     ldr     z0, \[sp\]
+**     add     sp, sp, #?32
+**     sub     sp, sp, #?32
+**     str     z0, \[sp\]
+**     smstop  sm
+**     ldr     z0, \[sp\]
+**     add     sp, sp, #?32
+**     bl      consume_z0
+**     ...
+*/
+void
+test_z0 () [[arm::streaming]]
+{
+  svint8_t res = produce_z0 ();
+  asm volatile ("");
+  consume_z0 (res);
+}
+
+svint8x4_t produce_z3 ();
+void consume_z3 (svint8x4_t);
+
+/*
+** test_z3:
+**     ...
+**     smstop  sm
+**     bl      produce_z3
+**     sub     sp, sp, #?128
+**     str     z0, \[sp\]
+**     str     z1, \[sp, #1, mul vl\]
+**     str     z2, \[sp, #2, mul vl\]
+**     str     z3, \[sp, #3, mul vl\]
+**     smstart sm
+**     ldr     z0, \[sp\]
+**     ldr     z1, \[sp, #1, mul vl\]
+**     ldr     z2, \[sp, #2, mul vl\]
+**     ldr     z3, \[sp, #3, mul vl\]
+**     add     sp, sp, #?128
+**     sub     sp, sp, #?128
+**     str     z0, \[sp\]
+**     str     z1, \[sp, #1, mul vl\]
+**     str     z2, \[sp, #2, mul vl\]
+**     str     z3, \[sp, #3, mul vl\]
+**     smstop  sm
+**     ldr     z0, \[sp\]
+**     ldr     z1, \[sp, #1, mul vl\]
+**     ldr     z2, \[sp, #2, mul vl\]
+**     ldr     z3, \[sp, #3, mul vl\]
+**     add     sp, sp, #?128
+**     bl      consume_z3
+**     ...
+*/
+void
+test_z3 () [[arm::streaming]]
+{
+  svint8x4_t res = produce_z3 ();
+  asm volatile ("");
+  consume_z3 (res);
+}
+
+svbool_t produce_p0 ();
+void consume_p0 (svbool_t);
+
+/*
+** test_p0:
+**     ...
+**     smstop  sm
+**     bl      produce_p0
+**     sub     sp, sp, #?32
+**     str     p0, \[sp\]
+**     smstart sm
+**     ldr     p0, \[sp\]
+**     add     sp, sp, #?32
+**     sub     sp, sp, #?32
+**     str     p0, \[sp\]
+**     smstop  sm
+**     ldr     p0, \[sp\]
+**     add     sp, sp, #?32
+**     bl      consume_p0
+**     ...
+*/
+void
+test_p0 () [[arm::streaming]]
+{
+  svbool_t res = produce_p0 ();
+  asm volatile ("");
+  consume_p0 (res);
+}
-- 
2.25.1