Ping (with Szabolcs' remarks addressed).

On 07/02/2018 16:07, Adhemerval Zanella wrote:
> Changes from previous version:
> 
>   - Changed the way __morestack is called to use a branch with link
>     instead of a simple branch.  This allows the use of a call instruction
>     and avoids possible issues with later optimization passes, which might
>     see a branch outside the instruction block (as noticed in previous
>     iterations while building a more complex workload such as SPECcpu2006).
> 
>   - Changed the return address to use the branch with link value and
>     set x12 to save x30.  This simplifies the instructions required
>     to set up/save the return address.
> 
> --
> 
> This patch adds split-stack support on aarch64 (PR #67877).  As for
> other ports, this patch should be used along with glibc and gold support.
> 
> The support is done similarly to other architectures: a split-stack field
> is allocated before the TCB by glibc, a target-specific __morestack
> implementation and helper functions are added in libgcc, and compiler
> support is adjusted (split-stack prologue, va_start for argument handling).
> I also plan to send the gold support to adjust stack allocation across
> split-stack and default code calls.
> 
> The current approach is to set the final stack adjustment using at most
> 2 instructions (mov/movk), which limits stack allocation to an upper limit
> of 4GB.  The morestack call is non-standard, with x10 holding the requested
> stack pointer, x11 the argument pointer (if required), and x12 the caller's
> link register (x30) on function entry.  Unwinding is handled by a
> personality routine that knows how to find stack segments.
> 
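
To illustrate the two-instruction materialization described above, here is
a minimal sketch in C of how a frame size could be split into the two 16-bit
immediates of the mov/movk pair.  The names split_alloc and struct alloc_imm
are hypothetical and not part of the patch, and the bound check assumes the
allocation must fit in 32 bits.

```c
#include <stdint.h>

/* Hypothetical illustration, not code from the patch.  */
struct alloc_imm
{
  uint16_t lo;  /* immediate for: mov  x10, #lo           */
  uint16_t hi;  /* immediate for: movk x10, #hi, lsl #16  */
};

/* Split ALLOCATE into two 16-bit halves.  Return 0 on success, or -1 if
   the value does not fit in 32 bits and so cannot be encoded by the
   two-instruction sequence.  */
static int
split_alloc (uint64_t allocate, struct alloc_imm *out)
{
  if (allocate > UINT32_MAX)
    return -1;
  out->lo = (uint16_t) (allocate & 0xffff);
  out->hi = (uint16_t) ((allocate >> 16) & 0xffff);
  return 0;
}
```

For example, an allocation of 0x12345 bytes would give lo = 0x2345 and
hi = 0x1 under these assumptions.
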
> Split-stack prologue on function entry is as follows (this goes before the
> usual function prologue):
> 
> function:
>       mrs    x9, tpidr_el0
>       ldur   x9, [x9, -8]
>       mov    x10, <required stack allocation>
>       movk   x10, #0x0, lsl #16
>       sub    x10, sp, x10
>       mov    x11, sp          # if function has stacked arguments
>       mov    x12, x30
>       cmp    x9, x10
>       bcc    .LX
> main_fn_entry:
>       [function prologue]
> .LX:
>       bl     __morestack
>       b      main_fn_entry
> 
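
As a rough model of what the prologue above decides, the following C sketch
follows the comparison emitted by aarch64_expand_split_stack_prologue in the
patch, where execution continues into the function when the requested stack
pointer is at or above the guard.  The function name needs_morestack and its
parameters are hypothetical, chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the split-stack prologue check.
   GUARD    is the value loaded from the TCB field at [tp - 8],
   SP       is the stack pointer on function entry,
   ALLOCATE is the stack space the function requires.
   Return true when __morestack would have to be called.  */
static bool
needs_morestack (uintptr_t guard, uintptr_t sp, uintptr_t allocate)
{
  uintptr_t requested = sp - allocate;  /* sub x10, sp, x10 */

  /* Unsigned comparison: the current segment is sufficient when the
     requested stack pointer does not fall below the guard.  */
  return requested < guard;
}
```
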
> Notes:
> 
> 1. Even if a function does not allocate a stack frame, a split-stack prologue
>    is created.  This avoids issues with tail calls to external symbols,
>    which might require linker adjustment (libgo/runtime/go-varargs.c).
> 
> 2. Basic-block reordering (enabled with -O2) will move split-stack TCB ldur
>    to after the required stack calculation.
> 
> 3. Similar to powerpc, when the linker detects a call from split-stack to
>    non-split-stack code, it adds 16k (or more) to the value found in the
>    "allocate" instructions (so non-split-stack code gets a larger stack).
>    The amount is tunable by a linker option.  This feature is only
>    implemented in the GNU gold linker.
> 
> 4. AArch64 does not handle >4G stacks initially.  Although it is possible
>    to implement this, limiting to 4G allows the allocation to be
>    materialized with only 2 instructions (mov + movk), which simplifies the
>    linker adjustments required.  Supporting multiple threads each requiring
>    more than 4G of stack is probably not that important, and likely to OOM
>    at run time.
> 
> 5. The TCB support on GLIBC is meant to be included in version 2.28.
> 
> 6. Besides the regression tests, I also checked a SPECcpu2006 run with the
>    additional -fsplit-stack option.  I saw no regressions besides 416.gamess,
>    which fails on trunk as well (possibly due to a misconfiguration in my
>    environment).
> 
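
The linker adjustment from note 3 can be pictured with the sketch below: the
two 16-bit immediates are recombined, the extra amount is added, and the
result is written back.  This is only an illustration under the assumption
that the value stays within 32 bits; adjust_alloc is a hypothetical name and
gold's actual implementation may differ.  It also shows why both instructions
need to be present even for small frames, since the addition can carry into
the upper half.

```c
#include <stdint.h>

/* Hypothetical illustration of the linker-side adjustment.  LO and HI
   are the immediates of the mov/movk pair; EXTRA is the additional
   space requested for non-split-stack callees.  Return 0 on success,
   or -1 if the adjusted value no longer fits in 32 bits.  */
static int
adjust_alloc (uint16_t *lo, uint16_t *hi, uint32_t extra)
{
  uint64_t value = ((uint64_t) *hi << 16) | *lo;

  value += extra;
  if (value > UINT32_MAX)
    return -1;

  *lo = (uint16_t) (value & 0xffff);
  *hi = (uint16_t) (value >> 16);
  return 0;
}
```
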
> libgcc/ChangeLog:
> 
>       * libgcc/config.host: Use t-stack and t-stack-aarch64 for
>       aarch64*-*-linux.
>       * libgcc/config/aarch64/morestack-c.c: New file.
>       * libgcc/config/aarch64/morestack.S: Likewise.
>       * libgcc/config/aarch64/t-stack-aarch64: Likewise.
>       * libgcc/generic-morestack.c (__splitstack_find): Add aarch64-specific
>       code.
> 
> gcc/ChangeLog:
> 
>       * common/config/aarch64/aarch64-common.c
>       (aarch64_supports_split_stack): New function.
>       (TARGET_SUPPORTS_SPLIT_STACK): New macro.
>       * gcc/config/aarch64/aarch64-linux.h (TARGET_ASM_FILE_END): Remove
>       macro.
>       * gcc/config/aarch64/aarch64-protos.h: Add
>       aarch64_expand_split_stack_prologue and
>       aarch64_split_stack_space_check.
>       * gcc/config/aarch64/aarch64.c (aarch64_expand_builtin_va_start): Use
>       internal argument pointer instead of virtual_incoming_args_rtx.
>       (morestack_ref): New symbol.
>       (aarch64_load_split_stack_value): New function.
>       (aarch64_expand_split_stack_prologue): Likewise.
>       (aarch64_internal_arg_pointer): Likewise.
>       (aarch64_file_end): Emit the split-stack note sections.
>       (aarch64_split_stack_space_check): Likewise.
>       (TARGET_ASM_FILE_END): New macro.
>       (TARGET_INTERNAL_ARG_POINTER): Likewise.
>       * gcc/config/aarch64/aarch64.h (aarch64_frame): Add
>       split_stack_arg_pointer to setup the argument pointer when using
>       split-stack.
>       * gcc/config/aarch64/aarch64.md
>       (UNSPEC_STACK_CHECK): New define.
>       (split_stack_prologue): New expand.
>       (split_stack_space_check): Likewise.
> ---
>  gcc/common/config/aarch64/aarch64-common.c |  28 +++-
>  gcc/config/aarch64/aarch64-linux.h         |   2 -
>  gcc/config/aarch64/aarch64-protos.h        |   2 +
>  gcc/config/aarch64/aarch64.c               | 182 ++++++++++++++++++++-
>  gcc/config/aarch64/aarch64.h               |   3 +
>  gcc/config/aarch64/aarch64.md              |  29 ++++
>  libgcc/config.host                         |   1 +
>  libgcc/config/aarch64/morestack-c.c        |  87 ++++++++++
>  libgcc/config/aarch64/morestack.S          | 254 +++++++++++++++++++++++++++++
>  libgcc/config/aarch64/t-stack-aarch64      |   3 +
>  libgcc/generic-morestack.c                 |   1 +
>  11 files changed, 588 insertions(+), 4 deletions(-)
>  create mode 100644 libgcc/config/aarch64/morestack-c.c
>  create mode 100644 libgcc/config/aarch64/morestack.S
>  create mode 100644 libgcc/config/aarch64/t-stack-aarch64
> 
> diff --git a/gcc/common/config/aarch64/aarch64-common.c b/gcc/common/config/aarch64/aarch64-common.c
> index 71d3953..cf17e2f 100644
> --- a/gcc/common/config/aarch64/aarch64-common.c
> +++ b/gcc/common/config/aarch64/aarch64-common.c
> @@ -107,6 +107,33 @@ aarch64_handle_option (struct gcc_options *opts,
>      }
>  }
>  
> +/* -fsplit-stack uses a TCB field available on glibc-2.27.  GLIBC also
> +   exports a symbol, __tcb_private_ss, to signal that it has the field
> +   available in the TCB block.  This aims to prevent binaries linked
> +   against a newer GLIBC from running on non-supported ones.  */
> +
> +static bool
> +aarch64_supports_split_stack (bool report ATTRIBUTE_UNUSED,
> +                           struct gcc_options *opts ATTRIBUTE_UNUSED)
> +{
> +#ifndef TARGET_GLIBC_MAJOR
> +#define TARGET_GLIBC_MAJOR 0
> +#endif
> +#ifndef TARGET_GLIBC_MINOR
> +#define TARGET_GLIBC_MINOR 0
> +#endif
> +  /* Note: Can't test DEFAULT_ABI here, it isn't set until later.  */
> +  if (TARGET_GLIBC_MAJOR * 1000 + TARGET_GLIBC_MINOR >= 2026)
> +    return true;
> +
> +  if (report)
> +    error ("%<-fsplit-stack%> currently only supported on AArch64 GNU/Linux with glibc-2.27 or later");
> +  return false;
> +}
> +
> +#undef TARGET_SUPPORTS_SPLIT_STACK
> +#define TARGET_SUPPORTS_SPLIT_STACK aarch64_supports_split_stack
> +
>  struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER;
>  
>  /* An ISA extension in the co-processor and main instruction set space.  */
> @@ -340,4 +367,3 @@ aarch64_rewrite_mcpu (int argc, const char **argv)
>  }
>  
>  #undef AARCH64_CPU_NAME_LENGTH
> -
> diff --git a/gcc/config/aarch64/aarch64-linux.h b/gcc/config/aarch64/aarch64-linux.h
> index bf1327e..1189bfe 100644
> --- a/gcc/config/aarch64/aarch64-linux.h
> +++ b/gcc/config/aarch64/aarch64-linux.h
> @@ -81,8 +81,6 @@
>      }                                                \
>    while (0)
>  
> -#define TARGET_ASM_FILE_END file_end_indicate_exec_stack
> -
>  /* Uninitialized common symbols in non-PIE executables, even with
>     strong definitions in dependent shared libraries, will resolve
>     to COPY relocated symbol in the executable.  See PR65780.  */
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index cda2895..20fe10e 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -450,6 +450,8 @@ void aarch64_expand_sve_mem_move (rtx, rtx, machine_mode);
>  bool aarch64_maybe_expand_sve_subreg_move (rtx, rtx);
>  void aarch64_split_sve_subreg_move (rtx, rtx, rtx);
>  void aarch64_expand_prologue (void);
> +void aarch64_expand_split_stack_prologue (void);
> +void aarch64_split_stack_space_check (rtx, rtx);
>  void aarch64_expand_vector_init (rtx, rtx);
>  void aarch64_init_cumulative_args (CUMULATIVE_ARGS *, const_tree, rtx,
>                                  const_tree, unsigned);
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 7c9c6e5..c653755 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -71,6 +71,7 @@
>  #include "selftest.h"
>  #include "selftest-rtl.h"
>  #include "rtx-vector-builder.h"
> +#include "except.h"
>  
>  /* This file should be included last.  */
>  #include "target-def.h"
> @@ -12073,7 +12074,7 @@ aarch64_expand_builtin_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED)
>    /* Emit code to initialize STACK, which points to the next varargs stack
>       argument.  CUM->AAPCS_STACK_SIZE gives the number of stack words used
>       by named arguments.  STACK is 8-byte aligned.  */
> -  t = make_tree (TREE_TYPE (stack), virtual_incoming_args_rtx);
> +  t = make_tree (TREE_TYPE (stack), crtl->args.internal_arg_pointer);
>    if (cum->aapcs_stack_size > 0)
>      t = fold_build_pointer_plus_hwi (t, cum->aapcs_stack_size * UNITS_PER_WORD);
>    t = build2 (MODIFY_EXPR, TREE_TYPE (stack), stack, t);
> @@ -17351,6 +17352,179 @@ aarch64_select_early_remat_modes (sbitmap modes)
>      }
>  }
>  
> +/* -fsplit-stack support.  */
> +
> +/* A SYMBOL_REF for __morestack.  */
> +static GTY(()) rtx morestack_ref;
> +
> +/* Load split-stack area from thread pointer position.  The split-stack
> +   area is allocated just before the thread pointer.  */
> +
> +static rtx
> +aarch64_load_split_stack_value (bool use_hard_reg)
> +{
> +  /* Offset from thread pointer to split-stack area.  */
> +  const int psso = -8;
> +
> +  rtx ssvalue = use_hard_reg
> +             ? gen_rtx_REG (Pmode, R9_REGNUM) : gen_reg_rtx (Pmode);
> +  ssvalue = aarch64_load_tp (ssvalue);
> +  rtx mem = gen_rtx_MEM (Pmode, plus_constant (Pmode, ssvalue, psso));
> +  emit_move_insn (ssvalue, mem);
> +  return ssvalue;
> +}
> +
> +/* Emit -fsplit-stack prologue, which goes before the regular function
> +   prologue.  */
> +
> +void
> +aarch64_expand_split_stack_prologue (void)
> +{
> +  rtx ssvalue, reg10, reg11, reg12, cc, jump;
> +  HOST_WIDE_INT allocate;
> +  rtx_code_label *ok_label;
> +  rtx_insn *insn;
> +
> +  gcc_assert (flag_split_stack && reload_completed);
> +
> +  /* Limit the total stack allocation to 4G so its value can be
> +     materialized using two instructions at most (mov/movk).  It might be
> +     used by the linker to add some extra space for split calling non split
> +     stack functions.  */
> +  allocate = constant_lower_bound (cfun->machine->frame.frame_size);
> +  if (allocate > ((int64_t)1 << 32))
> +    {
> +      sorry ("Stack frame larger than 4G is not supported for -fsplit-stack");
> +      return;
> +    }
> +
> +  if (morestack_ref == NULL_RTX)
> +    {
> +      morestack_ref = gen_rtx_SYMBOL_REF (Pmode, "__morestack");
> +      SYMBOL_REF_FLAGS (morestack_ref) |= (SYMBOL_FLAG_LOCAL
> +                                        | SYMBOL_FLAG_FUNCTION);
> +    }
> +
> +  ssvalue = aarch64_load_split_stack_value (true);
> +
> +  /* Always emit two insns to calculate the requested stack, so the linker
> +     can edit them when adjusting size for calling non-split-stack code.  */
> +  reg10 = gen_rtx_REG (Pmode, R10_REGNUM);
> +  emit_insn (gen_rtx_SET (reg10, GEN_INT (allocate & 0xffff)));
> +  emit_insn (gen_insv_immdi (reg10, GEN_INT (16),
> +                          GEN_INT ((allocate & 0xffff0000) >> 16)));
> +  emit_insn (gen_sub3_insn (reg10, stack_pointer_rtx, reg10));
> +
> +  ok_label = gen_label_rtx ();
> +
> +  /* If function uses stacked arguments save the old stack value so morestack
> +     can return it.  */
> +  reg11 = gen_rtx_REG (Pmode, R11_REGNUM);
> +  if (maybe_gt(crtl->args.size, 0)
> +      || maybe_gt(cfun->machine->frame.saved_varargs_size, 0))
> +    emit_move_insn (reg11, stack_pointer_rtx);
> +
> +  /* x12 holds the function entry x30 which will be restored by morestack.  */
> +  reg12 = gen_rtx_REG (Pmode, R12_REGNUM);
> +  emit_move_insn (reg12, gen_rtx_REG (Pmode, R30_REGNUM));
> +
> +  ok_label = gen_label_rtx ();
> +  cc = aarch64_gen_compare_reg (GEU, reg10, ssvalue);
> +  jump = gen_rtx_IF_THEN_ELSE (VOIDmode,
> +                            gen_rtx_fmt_ee (GEU, VOIDmode, cc, const0_rtx),
> +                            gen_rtx_LABEL_REF (VOIDmode, ok_label),
> +                            pc_rtx);
> +  insn = emit_jump_insn (gen_rtx_SET (pc_rtx, jump));
> +  JUMP_LABEL (insn) = ok_label;
> +  /* Mark the jump as very likely to be taken.  */
> +  add_reg_br_prob_note (insn, profile_probability::very_likely ());
> +
> +  insn = emit_call_insn (gen_call (gen_rtx_MEM (Pmode, morestack_ref),
> +                                const0_rtx, const0_rtx));
> +
> +  rtx call_fusage = NULL_RTX;
> +  use_reg (&call_fusage, reg10);
> +  use_reg (&call_fusage, reg11);
> +  use_reg (&call_fusage, reg12);
> +  add_function_usage_to (insn, call_fusage);
> +  /* Indicate that this function can't jump to non-local gotos.  */
> +  make_reg_eh_region_note_nothrow_nononlocal (insn);
> +
> +  emit_label (ok_label);
> +  LABEL_NUSES (ok_label)++;
> +}
> +
> +/* Implement TARGET_ASM_FILE_END.  */
> +
> +static void
> +aarch64_file_end (void)
> +{
> +  file_end_indicate_exec_stack ();
> +
> +  if (flag_split_stack)
> +    {
> +      file_end_indicate_split_stack ();
> +
> +      switch_to_section (data_section);
> +      fprintf (asm_out_file, "\t.align 3\n");
> +      fprintf (asm_out_file, "\t.quad __libc_tcb_private_ss\n");
> +    }
> +}
> +
> +/* Return the internal arg pointer used for function incoming arguments.  */
> +
> +static rtx
> +aarch64_internal_arg_pointer (void)
> +{
> +  if (flag_split_stack
> +     && (lookup_attribute ("no_split_stack", DECL_ATTRIBUTES (cfun->decl))
> +         == NULL))
> +    {
> +      if (cfun->machine->frame.split_stack_arg_pointer == NULL_RTX)
> +     {
> +       rtx pat;
> +
> +       cfun->machine->frame.split_stack_arg_pointer = gen_reg_rtx (Pmode);
> +       REG_POINTER (cfun->machine->frame.split_stack_arg_pointer) = 1;
> +
> +       /* Put the pseudo initialization right after the note at the
> +          beginning of the function.  */
> +       pat = gen_rtx_SET (cfun->machine->frame.split_stack_arg_pointer,
> +                          gen_rtx_REG (Pmode, R11_REGNUM));
> +       push_topmost_sequence ();
> +       emit_insn_after (pat, get_insns ());
> +       pop_topmost_sequence ();
> +     }
> +      return plus_constant (Pmode, cfun->machine->frame.split_stack_arg_pointer,
> +                         FIRST_PARM_OFFSET (current_function_decl));
> +    }
> +  return virtual_incoming_args_rtx;
> +}
> +
> +/* Emit -fsplit-stack dynamic stack allocation space check.  */
> +
> +void
> +aarch64_split_stack_space_check (rtx size, rtx label)
> +{
> +  rtx ssvalue, cc, cmp, jump, temp;
> +  rtx requested = gen_reg_rtx (Pmode);
> +
> +  /* Load __private_ss from TCB.  */
> +  ssvalue = aarch64_load_split_stack_value (false);
> +
> +  temp = gen_reg_rtx (Pmode);
> +
> +  /* And compare it with stack pointer minus required stack.  */
> +  size = force_reg (Pmode, size);
> +  emit_move_insn (requested, gen_rtx_MINUS (Pmode, stack_pointer_rtx, size));
> +
> +  /* Jump to LABEL if the current ss guard suffices.  */
> +  cc = aarch64_gen_compare_reg (GE, temp, ssvalue);
> +  cmp = gen_rtx_fmt_ee (GEU, VOIDmode, cc, const0_rtx);
> +  jump = emit_jump_insn (gen_condjump (cmp, cc, label));
> +  JUMP_LABEL (jump) = label;
> +}
> +
>  /* Target-specific selftests.  */
>  
>  #if CHECKING_P
> @@ -17423,6 +17597,9 @@ aarch64_run_selftests (void)
>  #undef TARGET_ASM_FILE_START
>  #define TARGET_ASM_FILE_START aarch64_start_file
>  
> +#undef TARGET_ASM_FILE_END
> +#define TARGET_ASM_FILE_END aarch64_file_end
> +
>  #undef TARGET_ASM_OUTPUT_MI_THUNK
>  #define TARGET_ASM_OUTPUT_MI_THUNK aarch64_output_mi_thunk
>  
> @@ -17513,6 +17690,9 @@ aarch64_run_selftests (void)
>  #undef TARGET_FUNCTION_VALUE_REGNO_P
>  #define TARGET_FUNCTION_VALUE_REGNO_P aarch64_function_value_regno_p
>  
> +#undef TARGET_INTERNAL_ARG_POINTER
> +#define TARGET_INTERNAL_ARG_POINTER aarch64_internal_arg_pointer
> +
>  #undef TARGET_GIMPLE_FOLD_BUILTIN
>  #define TARGET_GIMPLE_FOLD_BUILTIN aarch64_gimple_fold_builtin
>  
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index e3c52f6..20ef441 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -675,6 +675,9 @@ struct GTY (()) aarch64_frame
>    unsigned wb_candidate2;
>  
>    bool laid_out;
> +
> +  /* Alternative internal arg pointer for -fsplit-stack.  */
> +  rtx split_stack_arg_pointer;
>  };
>  
>  typedef struct GTY (()) machine_function
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 5a2a930..3104ed4 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -169,6 +169,7 @@
>      UNSPEC_CLASTB
>      UNSPEC_FADDA
>      UNSPEC_REV_SUBREG
> +    UNSPEC_STACK_CHECK
>  ])
>  
>  (define_c_enum "unspecv" [
> @@ -6010,6 +6011,34 @@
>                  (match_operand 1))
>             (clobber (reg:CC CC_REGNUM))])])
>  
> +;; Handle -fsplit-stack
> +(define_expand "split_stack_prologue"
> +  [(const_int 0)]
> +  ""
> +{
> +  aarch64_expand_split_stack_prologue ();
> +  DONE;
> +})
> +
> +;; If there are operand 0 bytes available on the stack, jump to
> +;; operand 1.
> +(define_expand "split_stack_space_check"
> +  [(set (match_dup 2)
> +        (unspec [(const_int 0)] UNSPEC_STACK_CHECK))
> +   (set (match_dup 3)
> +        (minus (reg SP_REGNUM)
> +               (match_operand 0)))
> +   (set (match_dup 4) (compare:CC (match_dup 3) (match_dup 2)))
> +   (set (pc) (if_then_else
> +              (geu (match_dup 4) (const_int 0))
> +              (label_ref (match_operand 1))
> +              (pc)))]
> +  ""
> +{
> +  aarch64_split_stack_space_check (operands[0], operands[1]);
> +  DONE;
> +})
> +
>  ;; AdvSIMD Stuff
>  (include "aarch64-simd.md")
>  
> diff --git a/libgcc/config.host b/libgcc/config.host
> index 96d55a4..d6a2d15 100644
> --- a/libgcc/config.host
> +++ b/libgcc/config.host
> @@ -355,6 +355,7 @@ aarch64*-*-linux*)
>       md_unwind_header=aarch64/linux-unwind.h
>       tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
>       tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
> +     tmake_file="${tmake_file} t-stack aarch64/t-stack-aarch64"
>       ;;
>  alpha*-*-linux*)
>       tmake_file="${tmake_file} alpha/t-alpha alpha/t-ieee t-crtfm alpha/t-linux"
> diff --git a/libgcc/config/aarch64/morestack-c.c b/libgcc/config/aarch64/morestack-c.c
> new file mode 100644
> index 0000000..8de531f
> --- /dev/null
> +++ b/libgcc/config/aarch64/morestack-c.c
> @@ -0,0 +1,87 @@
> +/* AArch64 support for -fsplit-stack.
> + * Copyright (C) 2018 Free Software Foundation, Inc.
> + *
> + * This file is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License as published by the
> + * Free Software Foundation; either version 3, or (at your option) any
> + * later version.
> + *
> + * This file is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + *
> + * Under Section 7 of GPL version 3, you are granted additional
> + * permissions described in the GCC Runtime Library Exception, version
> + * 3.1, as published by the Free Software Foundation.
> + *
> + * You should have received a copy of the GNU General Public License and
> + * a copy of the GCC Runtime Library Exception along with this program;
> + * see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> + * <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef inhibit_libc
> +
> +#include <stdint.h>
> +#include <stdlib.h>
> +#include <stddef.h>
> +#include "generic-morestack.h"
> +
> +#define INITIAL_STACK_SIZE  0x4000
> +#define BACKOFF             0x1000
> +
> +void __generic_morestack_set_initial_sp (void *sp, size_t len);
> +void *__morestack_get_guard (void);
> +void __morestack_set_guard (void *);
> +void *__morestack_make_guard (void *stack, size_t size);
> +void __morestack_load_mmap (void);
> +
> +/* split-stack area position from thread pointer.  */
> +static inline void *
> +ss_pointer (void)
> +{
> +#define SS_OFFSET    (-8)
> +  return (void*) ((uintptr_t) __builtin_thread_pointer() + SS_OFFSET);
> +}
> +
> +/* Initialize the stack guard when the program starts or when a new
> +   thread starts.  This is called from a constructor using ctors section.  */
> +void
> +__stack_split_initialize (void)
> +{
> +  register uintptr_t* sp __asm__ ("sp");
> +  uintptr_t *ss = ss_pointer ();
> +  *ss = (uintptr_t)sp - INITIAL_STACK_SIZE;
> +  __generic_morestack_set_initial_sp (sp, INITIAL_STACK_SIZE);
> +}
> +
> +/* Return current __private_ss.  */
> +void *
> +__morestack_get_guard (void)
> +{
> +  void **ss = ss_pointer ();
> +  return *ss;
> +}
> +
> +/* Set __private_ss to ptr.  */
> +void
> +__morestack_set_guard (void *ptr)
> +{
> +  void **ss = ss_pointer ();
> +  *ss = ptr;
> +}
> +
> +/* Return the stack guard value for given stack.  */
> +void *
> +__morestack_make_guard (void *stack, size_t size)
> +{
> +  return (void*)((uintptr_t) stack - size + BACKOFF);
> +}
> +
> +/* Make __stack_split_initialize a high priority constructor.  */
> +static void (*const ctors [])
> +  __attribute__ ((used, section (".ctors.65535"), aligned (sizeof (void *))))
> +  = { __stack_split_initialize, __morestack_load_mmap };
> +
> +#endif /* !defined (inhibit_libc) */
> diff --git a/libgcc/config/aarch64/morestack.S b/libgcc/config/aarch64/morestack.S
> new file mode 100644
> index 0000000..59a6391
> --- /dev/null
> +++ b/libgcc/config/aarch64/morestack.S
> @@ -0,0 +1,254 @@
> +# AArch64 support for -fsplit-stack.
> +# Copyright (C) 2018 Free Software Foundation, Inc.
> +
> +# This file is part of GCC.
> +
> +# GCC is free software; you can redistribute it and/or modify it under
> +# the terms of the GNU General Public License as published by the Free
> +# Software Foundation; either version 3, or (at your option) any later
> +# version.
> +
> +# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +# WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +# for more details.
> +
> +# Under Section 7 of GPL version 3, you are granted additional
> +# permissions described in the GCC Runtime Library Exception, version
> +# 3.1, as published by the Free Software Foundation.
> +
> +# You should have received a copy of the GNU General Public License and
> +# a copy of the GCC Runtime Library Exception along with this program;
> +# see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +# <http://www.gnu.org/licenses/>.
> +
> +/* Define an entry point visible from C.  */
> +#define ENTRY(name)                                          \
> +  .globl name;                                                       \
> +  .type name,%function;                                              \
> +  .align 4;                                                  \
> +  name##:
> +
> +#define END(name)                                            \
> +  .size name,.-name
> +
> +/* __morestack frame size.  */
> +#define MORESTACK_FRAMESIZE  112
> +/* Offset from __morestack frame where the new stack size is saved and
> +   passed to __generic_morestack.  */
> +#define NEWSTACK_SAVE                96
> +
> +# Excess space needed to call ld.so resolver for lazy plt resolution.
> +# Go uses sigaltstack so this doesn't need to also cover signal frame size.
> +#define BACKOFF                      0x1000
> +# Large excess allocated when calling non-split-stack code.
> +#define NON_SPLIT_STACK              0x100000
> +
> +/* split-stack area position from thread pointer.  */
> +#define SPLITSTACK_PTR_TP    -8
> +
> +     .text
> +ENTRY(__morestack_non_split)
> +     .cfi_startproc
> +# We use a cleanup to restore the TCB split stack field if an exception is
> +# thrown through this code.
> +     sub     x10, x10, NON_SPLIT_STACK
> +     .cfi_endproc
> +END(__morestack_non_split)
> +# Fall through into __morestack
> +
> +# This function is called with non-standard calling convention: on entry
> +# x10 is the requested stack pointer, x11 is the previous stack pointer (if
> +# the function has stacked arguments which need to be restored), and x12 is
> +# the caller link register on function entry (which will be restored by
> +# morestack when returning to caller).  The split-stack prologue is in
> +# the form:
> +#
> +# function:
> +#    mrs    x9, tpidr_el0
> +#    ldur   x9, [x9, #-8]
> +#    mov    x10, <required stack allocation>
> +#    movk   x10, #0x0, lsl #16
> +#    sub    x10, sp, x10
> +#    mov    x11, sp          # if function has stacked arguments
> +#    mov    x12, x30
> +#    cmp    x9, x10
> +#    bcc    .LX
> +# main_fn_entry:
> +#    [function body]
> +# LX:
> +#    bl      __morestack
> +#    b       main_fn_entry
> +#
> +# The N bit is also restored to indicate that the function is called
> +# (so the prologue addition can set up the argument pointer correctly).
> +
> +ENTRY(__morestack)
> +.LFB1:
> +     .cfi_startproc
> +
> +#ifdef __PIC__
> +     .cfi_personality 0x9b,DW.ref.__gcc_personality_v0
> +     .cfi_lsda 0x1b,.LLSDA1
> +#else
> +     .cfi_personality 0x3,__gcc_personality_v0
> +     .cfi_lsda 0x3,.LLSDA1
> +#endif
> +     # Calculate requested stack size.
> +     sub     x10, sp, x10
> +
> +     # Save parameters
> +     stp     x29, x12, [sp, -MORESTACK_FRAMESIZE]!
> +     .cfi_def_cfa_offset MORESTACK_FRAMESIZE
> +     .cfi_offset 29, -MORESTACK_FRAMESIZE
> +     .cfi_offset 30, -MORESTACK_FRAMESIZE+8
> +     add     x29, sp, 0
> +     .cfi_def_cfa_register 29
> +     # Adjust the requested stack size for the frame pointer save.
> +     stp     x0, x1, [x29, 16]
> +     stp     x2, x3, [x29, 32]
> +     add     x10, x10, BACKOFF
> +     stp     x4, x5, [x29, 48]
> +     stp     x6, x7, [x29, 64]
> +     stp     x8, x30, [x29, 80]
> +     str     x10, [x29, 96]
> +
> +     # void __morestack_block_signals (void)
> +     bl      __morestack_block_signals
> +
> +     # void *__generic_morestack (size_t *pframe_size,
> +     #                            void *old_stack,
> +     #                            size_t param_size)
> +     # pframe_size: is the size of the required stack frame (the function
> +     #              amount of space remaining on the allocated stack).
> +     # old_stack: points at the parameters on the old stack
> +     # param_size: size in bytes of parameters to copy to the new stack.
> +     add     x0, x29, NEWSTACK_SAVE
> +     add     x1, x29, MORESTACK_FRAMESIZE
> +     mov     x2, 0
> +     bl      __generic_morestack
> +
> +     # Start using new stack
> +     mov     sp, x0
> +
> +     # Set __private_ss stack guard for the new stack.
> +     ldr     x9, [x29, NEWSTACK_SAVE]
> +     add     x0, x0, BACKOFF
> +     sub     x0, x0, x9
> +.LEHB0:
> +     mrs     x1, tpidr_el0
> +     str     x0, [x1, SPLITSTACK_PTR_TP]
> +
> +     # void __morestack_unblock_signals (void)
> +     bl      __morestack_unblock_signals
> +
> +     # Set up for a call to the target function.
> +     ldp     x0, x1, [x29, 16]
> +     ldp     x2, x3, [x29, 32]
> +     ldp     x4, x5, [x29, 48]
> +     ldp     x6, x7, [x29, 64]
> +     ldp     x8, x12, [x29, 80]
> +     add     x11, x29, MORESTACK_FRAMESIZE
> +     ldr     x30, [x29, 8]
> +     # Indicate __morestack was called.
> +     cmp     x12, 0
> +     blr     x12
> +
> +     stp     x0, x1, [x29, 16]
> +     stp     x2, x3, [x29, 32]
> +     stp     x4, x5, [x29, 48]
> +     stp     x6, x7, [x29, 64]
> +
> +     bl      __morestack_block_signals
> +
> +     # void *__generic_releasestack (size_t *pavailable)
> +     add     x0, x29, NEWSTACK_SAVE
> +     bl      __generic_releasestack
> +
> +     # Reset __private_ss stack guard to value for old stack
> +     ldr     x9, [x29, NEWSTACK_SAVE]
> +     add     x0, x0, BACKOFF
> +     sub     x0, x0, x9
> +
> +     # Update TCB split stack field
> +.LEHE0:
> +     mrs     x1, tpidr_el0
> +     str     x0, [x1, SPLITSTACK_PTR_TP]
> +
> +     bl __morestack_unblock_signals
> +
> +     # Use old stack again.
> +     add     sp, x29, MORESTACK_FRAMESIZE
> +
> +     ldp     x0, x1, [x29, 16]
> +     ldp     x2, x3, [x29, 32]
> +     ldp     x4, x5, [x29, 48]
> +     ldp     x6, x7, [x29, 64]
> +     ldp     x29, x30, [x29]
> +
> +     .cfi_remember_state
> +     .cfi_restore 30
> +     .cfi_restore 29
> +     .cfi_def_cfa 31, 0
> +
> +     ret
> +
> +# This is the cleanup code called by the stack unwinder when
> +# unwinding through code between .LEHB0 and .LEHE0 above.
> +cleanup:
> +     .cfi_restore_state
> +     # Reuse the new stack allocation to save/restore the
> +     # exception header
> +     str     x0, [x29, NEWSTACK_SAVE]
> +     # size_t __generic_findstack (void *stack)
> +     add     x0, x29, MORESTACK_FRAMESIZE
> +     bl      __generic_findstack
> +     sub     x0, x29, x0
> +     add     x0, x0, BACKOFF
> +     # Restore split-stack guard value
> +     mrs     x1, tpidr_el0
> +     str     x0, [x1, SPLITSTACK_PTR_TP]
> +     ldr     x0, [x29, NEWSTACK_SAVE]
> +     b       _Unwind_Resume
> +        .cfi_endproc
> +END(__morestack)
> +
> +     .section .gcc_except_table,"a",@progbits
> +     .align 4
> +.LLSDA1:
> +     # @LPStart format (omit)
> +        .byte   0xff
> +     # @TType format (omit)
> +        .byte   0xff
> +     # Call-site format (uleb128)
> +        .byte   0x1
> +     # Call-site table length
> +        .uleb128 .LLSDACSE1-.LLSDACSB1
> +.LLSDACSB1:
> +     # region 0 start
> +        .uleb128 .LEHB0-.LFB1
> +     # length
> +        .uleb128 .LEHE0-.LEHB0
> +     # landing pad
> +        .uleb128 cleanup-.LFB1
> +     # no action (ie a cleanup)
> +        .uleb128 0
> +.LLSDACSE1:
> +
> +
> +     .global __gcc_personality_v0
> +#ifdef __PIC__
> +     # Build a position independent reference to the personality function.
> +     .hidden DW.ref.__gcc_personality_v0
> +     .weak   DW.ref.__gcc_personality_v0
> +     .section .data.DW.ref.__gcc_personality_v0,"awG",@progbits,DW.ref.__gcc_personality_v0,comdat
> +     .type   DW.ref.__gcc_personality_v0, @object
> +     .align 3
> +DW.ref.__gcc_personality_v0:
> +     .size   DW.ref.__gcc_personality_v0, 8
> +     .quad   __gcc_personality_v0
> +#endif
> +
> +     .section .note.GNU-stack,"",@progbits
> +     .section .note.GNU-split-stack,"",@progbits
> +     .section .note.GNU-no-split-stack,"",@progbits
> diff --git a/libgcc/config/aarch64/t-stack-aarch64 b/libgcc/config/aarch64/t-stack-aarch64
> new file mode 100644
> index 0000000..4babb4e
> --- /dev/null
> +++ b/libgcc/config/aarch64/t-stack-aarch64
> @@ -0,0 +1,3 @@
> +# Makefile fragment to support -fsplit-stack for aarch64.
> +LIB2ADD_ST += $(srcdir)/config/aarch64/morestack.S \
> +           $(srcdir)/config/aarch64/morestack-c.c
> diff --git a/libgcc/generic-morestack.c b/libgcc/generic-morestack.c
> index 80bfd7f..574f58d 100644
> --- a/libgcc/generic-morestack.c
> +++ b/libgcc/generic-morestack.c
> @@ -943,6 +943,7 @@ __splitstack_find (void *segment_arg, void *sp, size_t *len,
>        nsp -= 2 * 160;
>  #elif defined __s390__
>        nsp -= 2 * 96;
> +#elif defined __aarch64__
>  #else
>  #error "unrecognized target"
>  #endif
> 
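
The version test in aarch64_supports_split_stack encodes the glibc version
as major * 1000 + minor, so the threshold of 2026 corresponds to glibc 2.26,
while the comment and the error message mention glibc-2.27.  A minimal
sketch of that encoding, with glibc_version_code as a hypothetical helper
that is not part of the patch:

```c
/* Hypothetical helper mirroring the expression used in the patch:
   TARGET_GLIBC_MAJOR * 1000 + TARGET_GLIBC_MINOR.  */
static int
glibc_version_code (int major, int minor)
{
  return major * 1000 + minor;
}
```

Under this encoding glibc 2.26 maps to 2026 and glibc 2.27 to 2027.
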
