Enable Pointer Authentication Codes (PAC) and Branch Target Identification (BTI) support for arm64 targets.
PAC works by signing the LR with either the A key or the B key and verifying the
return address before returning. There are quite a few instructions capable of
doing this; however, the Linux arm64 ABI convention is to use hint-compatible
instructions that execute as NOPs on older hardware and can be assembled and
linked with older binutils. This limits the instruction set to paciasp,
pacibsp, autiasp and autibsp. Instructions prefixed with pac sign the LR, and
instructions prefixed with aut verify it. The trailing a or b in each mnemonic
indicates which signing key is used. The key is selected with
-mbranch-protection=pac-ret for the A key and -mbranch-protection=pac-ret+b-key
for the B key.

BTI works by marking all call and jump targets with bti c and bti j
instructions. If execution control transfers to an instruction other than a
BTI instruction, the process is killed via SIGILL. Note that, to save one
instruction, the aforementioned pac instructions also serve as a BTI landing
pad for bti c usages.

For BTI to work, all object files linked into a unit of execution, whether an
executable or a library, must have the GNU notes section of the ELF file marked
to indicate BTI support. This is so loaders/linkers can apply the proper
permission bits (PROT_BTI) on the memory region. PAC can also be annotated in
the GNU notes section, but it is not required for enablement: interleaved PAC
and non-PAC code works as expected, since it is the callee that performs all
the signing and checking. The linker follows the same rules as BTI for
discarding the PAC flag from the GNU notes section.

Testing was done with all combinations of the following CFLAGS and CXXFLAGS:

  1. -mbranch-protection=none
  2. -mbranch-protection=standard
  3. -mbranch-protection=pac-ret
  4. -mbranch-protection=pac-ret+b-key
  5. -mbranch-protection=bti

Additional notes:

MPN was handled differently than the standard approach of having every
PROLOGUE emit a SIGN_LR macro. This is because MPN code, in almost all
instances, does not save x30, aka the link register (LR), to the stack.
However, some functions do, and those were handled explicitly. This not only
avoids the cost of the operations to sign and verify the LR, but also handles
instances where branches are taken to labels where indirect branches are used
instead of branch-and-link to optimize the assembly.

Also, configure.ac contains a myriad of options for different architectures,
chipsets, ABIs, etc. To compound that, additional architecture-specific
features can be enabled within CFLAGS, and those need to be respected in order
to get correct output. For instance, on aarch64, the PAC and BTI instructions
need to be output in the generated assembly, and the GNU notes section added
to the ELF output, to obtain those security features. Hacking this into the
configure options seems baroque, especially considering that distro packaging
will often just set a group of CFLAGS to be respected and move on, which is
what most users would expect. Taking this all into consideration, allowing a
per-architecture script to be executed to generate additional m4 lets internal
definitions be exposed, as in the PAC case, or any multitude of options should
other architectures need something similar. This introduces the variable
gen_path_m4, which architectures can set to the script of their choosing to
generate whatever m4 they need; the generated m4 is placed on the m4 command
line after the defines and before the asm source.
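To make the mechanics concrete, here is a sketch (not taken from the patch;
mpn_foo is a hypothetical routine) of how the new macros compose in one of the
rare mpn functions that does spill x30, mirroring the pattern the patch
applies in mpn/arm64/divrem_1.asm:

  PROLOGUE(mpn_foo)
  	BTI_C				C landing pad for indirect calls
  	SIGN_LR				C paciasp/pacibsp under PAC, else empty
  	stp	x29, x30, [sp, #-16]!	C LR reaches the stack, so sign it first
  	...				C function body
  	ldp	x29, x30, [sp], #16
  	VERIFY_LR			C autiasp/autibsp under PAC, else empty
  	ret				C a tampered return address faults
  EPILOGUE()
  ADD_GNU_NOTES_IF_NEEDED

Leaf routines that never spill x30, which is the common case in MPN, only
receive the BTI_C entry marker and skip the sign/verify pair entirely.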
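The generation script keys off the ACLE feature-test macros that the compiler
predefines for a given -mbranch-protection setting. As a quick way to see what
gen-extra-m4.sh will observe (a sketch; cc stands in for the configured
compiler):

  cc -mbranch-protection=pac-ret+b-key -dM -E - < /dev/null | \
  	grep -E '__ARM_FEATURE_(PAC|BTI)_DEFAULT'

With pac-ret, __ARM_FEATURE_PAC_DEFAULT is 1 (A key); with pac-ret+b-key it is
2 (B key); standard and bti set __ARM_FEATURE_BTI_DEFAULT to 1. These are
exactly the values the script re-emits as the ARM64_FEATURE_PAC_DEFAULT and
ARM64_FEATURE_BTI_DEFAULT m4 defines that arm64-defs.m4 dispatches on.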
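Plumbing-wise, the flow through mpn/m4-ccas then looks roughly like this
(file names illustrative):

  gen-extra-m4.sh "$CC ..." > tmp-add_n.m4	# emit ARM64_FEATURE_*/ARM64_ELF defines
  m4 $DEFS tmp-add_n.m4 add_n.asm > tmp-add_n.s	# defines land ahead of the asm source
  $CC ... tmp-add_n.s -o add_n.o		# both temporaries are removed afterwards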
Signed-off-by: Bill Roberts <bill.robe...@arm.com>
---
 configure.ac                   | 12 +++++
 mpn/Makeasm.am                 |  3 +-
 mpn/arm64/aors_n.asm           |  4 ++
 mpn/arm64/aorsmul_1.asm        |  3 ++
 mpn/arm64/aorsorrlsh1_n.asm    |  2 +
 mpn/arm64/aorsorrlsh2_n.asm    |  2 +
 mpn/arm64/aorsorrlshC_n.asm    |  1 +
 mpn/arm64/arm64-defs.m4        | 67 ++++++++++++++++++++++++++++
 mpn/arm64/bdiv_dbm1c.asm       |  2 +
 mpn/arm64/bdiv_q_1.asm         |  3 ++
 mpn/arm64/cnd_aors_n.asm       |  3 ++
 mpn/arm64/com.asm              |  2 +
 mpn/arm64/copyd.asm            |  2 +
 mpn/arm64/copyi.asm            |  2 +
 mpn/arm64/divrem_1.asm         |  9 ++++
 mpn/arm64/gcd_11.asm           |  2 +
 mpn/arm64/gcd_22.asm           |  2 +
 mpn/arm64/gen-extra-m4.sh      | 81 ++++++++++++++++++++++++++++++++++
 mpn/arm64/hamdist.asm          |  7 ++-
 mpn/arm64/invert_limb.asm      |  2 +
 mpn/arm64/logops_n.asm         |  3 ++
 mpn/arm64/lshift.asm           |  2 +
 mpn/arm64/lshiftc.asm          |  2 +
 mpn/arm64/mod_34lsub1.asm      |  2 +
 mpn/arm64/mul_1.asm            |  3 ++
 mpn/arm64/popcount.asm         |  8 +++-
 mpn/arm64/rsh1aors_n.asm       |  3 ++
 mpn/arm64/rshift.asm           |  2 +
 mpn/arm64/sec_tabselect.asm    |  2 +
 mpn/arm64/sqr_diag_addlsh1.asm |  2 +
 mpn/m4-ccas                    | 23 ++++++++--
 31 files changed, 255 insertions(+), 8 deletions(-)
 create mode 100755 mpn/arm64/gen-extra-m4.sh

diff --git a/configure.ac b/configure.ac
index c3a4a9bf8..83a73f3a0 100644
--- a/configure.ac
+++ b/configure.ac
@@ -473,6 +473,11 @@ cc_64_cflags="-O"
 SPEED_CYCLECOUNTER_OBJ=
 cyclecounter_size=2

+# architectures can set this to add defines dynamically to m4 generation.
+# For example, in arm64 it is used to determine if PAC and BTI are enabled
+# and enable generation of those instructions in m4 asm.
+gen_path_m4=
+
 AC_SUBST(HAVE_HOST_CPU_FAMILY_power, 0)
 AC_SUBST(HAVE_HOST_CPU_FAMILY_powerpc,0)
@@ -781,6 +786,7 @@ case $host in
 	gcc_cflags_arch="-march=armv8-a"
 	gcc_cflags_neon="-mfpu=neon"
 	gcc_cflags_tune=""
+	gen_path_m4="arm64/gen-extra-m4.sh"
 	;;
  [applem[1-9]*])
 	abilist="64"
@@ -4051,6 +4057,12 @@ fi
 AC_PROG_YACC
 AM_PROG_LEX

+# This may appear odd, however prefixing with m4 is
+# reserved in m4/autoconf but not in automake and
+# beyond. The prefixed version matches things like
+# gcc_c_flags.
+AC_SUBST([M4_GEN_PATH], [$gen_path_m4])
+
 # Create config.m4.
 GMP_FINISH
diff --git a/mpn/Makeasm.am b/mpn/Makeasm.am
index 5d7306c22..bfdc632fe 100644
--- a/mpn/Makeasm.am
+++ b/mpn/Makeasm.am
@@ -115,4 +115,5 @@ RM_TMP = rm -f
 	$(CCAS) $(COMPILE_FLAGS) tmp-$*.s -o $@
 	$(RM_TMP) tmp-$*.s
 .asm.lo:
-	$(LIBTOOL) --mode=compile --tag=CC $(top_srcdir)/mpn/m4-ccas --m4="$(M4)" $(CCAS) $(COMPILE_FLAGS) `test -f '$<' || echo '$(srcdir)/'`$<
+	$(LIBTOOL) --mode=compile --tag=CC $(top_srcdir)/mpn/m4-ccas --m4-gen-path=$(top_srcdir)/mpn/$(M4_GEN_PATH) --m4="$(M4)" \
+	$(CCAS) $(COMPILE_FLAGS) `test -f '$<' || echo '$(srcdir)/'`$<
diff --git a/mpn/arm64/aors_n.asm b/mpn/arm64/aors_n.asm
index b4a6da6ff..a5b542d4d 100644
--- a/mpn/arm64/aors_n.asm
+++ b/mpn/arm64/aors_n.asm
@@ -60,13 +60,16 @@ ifdef(`OPERATION_sub_n', `
 	define(`func_nc', mpn_sub_nc)')

 MULFUNC_PROLOGUE(mpn_add_n mpn_add_nc mpn_sub_n mpn_sub_nc)
+	BTI_C

 ASM_START()
 PROLOGUE(func_nc)
+	BTI_C
 	SETCY(	x4)
 	b	L(ent)
 EPILOGUE()
 PROLOGUE(func_n)
+	BTI_C
 	CLRCY
 L(ent):	lsr	x17, n, #2
 	tbz	n, #0, L(bx0)
@@ -123,3 +126,4 @@ L(end):	ADDSUBC	x12, x6, x10
 L(ret):	RETVAL
 	ret
 EPILOGUE()
+ADD_GNU_NOTES_IF_NEEDED
diff --git a/mpn/arm64/aorsmul_1.asm b/mpn/arm64/aorsmul_1.asm
index 81ec1dabb..05091330d 100644
--- a/mpn/arm64/aorsmul_1.asm
+++ b/mpn/arm64/aorsmul_1.asm
@@ -68,8 +68,10 @@ ifdef(`OPERATION_submul_1', `
 	define(`func', mpn_submul_1)')

 MULFUNC_PROLOGUE(mpn_addmul_1 mpn_submul_1)
+	BTI_C

 PROLOGUE(func)
+	BTI_C
 	adds	x15, xzr, xzr
 	tbz	n, #0, L(1)
@@ -143,3 +145,4 @@ L(mid):	sub	n, n, #1
 	csinc	x0, x15, x15, COND
 	ret
 EPILOGUE()
+ADD_GNU_NOTES_IF_NEEDED
diff --git a/mpn/arm64/aorsorrlsh1_n.asm b/mpn/arm64/aorsorrlsh1_n.asm
index c617a67a9..2de3ff992 100644
--- a/mpn/arm64/aorsorrlsh1_n.asm
+++ b/mpn/arm64/aorsorrlsh1_n.asm
@@ -39,5 +39,7 @@ ifdef(`OPERATION_sublsh1_n',`define(`DO_sub')')
 ifdef(`OPERATION_rsblsh1_n',`define(`DO_rsb')')

 MULFUNC_PROLOGUE(mpn_addlsh1_n mpn_sublsh1_n mpn_rsblsh1_n)
+	BTI_C

 include_mpn(`arm64/aorsorrlshC_n.asm')
+ADD_GNU_NOTES_IF_NEEDED
diff --git a/mpn/arm64/aorsorrlsh2_n.asm b/mpn/arm64/aorsorrlsh2_n.asm
index 852d11720..2161ae2a9 100644
--- a/mpn/arm64/aorsorrlsh2_n.asm
+++ b/mpn/arm64/aorsorrlsh2_n.asm
@@ -39,5 +39,7 @@ ifdef(`OPERATION_sublsh2_n',`define(`DO_sub')')
 ifdef(`OPERATION_rsblsh2_n',`define(`DO_rsb')')

 MULFUNC_PROLOGUE(mpn_addlsh2_n mpn_sublsh2_n mpn_rsblsh2_n)
+	BTI_C

 include_mpn(`arm64/aorsorrlshC_n.asm')
+ADD_GNU_NOTES_IF_NEEDED
diff --git a/mpn/arm64/aorsorrlshC_n.asm b/mpn/arm64/aorsorrlshC_n.asm
index 1718b7757..97df8c6f0 100644
--- a/mpn/arm64/aorsorrlshC_n.asm
+++ b/mpn/arm64/aorsorrlshC_n.asm
@@ -65,6 +65,7 @@ ifdef(`DO_rsb', `

 ASM_START()
 PROLOGUE(func_n)
+	BTI_C
 	lsr	x6, n, #2
 	tbz	n, #0, L(bx0)
diff --git a/mpn/arm64/arm64-defs.m4 b/mpn/arm64/arm64-defs.m4
index 46149f7bf..d0ad4b63c 100644
--- a/mpn/arm64/arm64-defs.m4
+++ b/mpn/arm64/arm64-defs.m4
@@ -36,6 +36,73 @@ dnl  don't want to disable macro expansions in or after them.

 changecom

+dnl use the hint instructions so they NOP on older machines.
+dnl Add comments so the assembly is notated with the instruction
+
+define(`BTI_C',   `hint	#34 /* bti c */')
+define(`PACIASP', `hint	#25 /* paciasp */')
+define(`AUTIASP', `hint	#29 /* autiasp */')
+define(`PACIBSP', `hint	#27 /* pacibsp */')
+define(`AUTIBSP', `hint	#31 /* autibsp */')
+
+dnl if BTI is enabled we want the SIGN_LR to be a valid
+dnl landing pad, we don't need VERIFY_LR and we need to
+dnl indicate the valid BTI support for gnu notes.
+
+ifelse(ARM64_FEATURE_BTI_DEFAULT, `1',
+	`define(`SIGN_LR', `BTI_C')
+	define(`GNU_PROPERTY_AARCH64_BTI', `1')
+	define(`PAC_OR_BTI')',
+	define(`GNU_PROPERTY_AARCH64_BTI', `0')'
+')
+
+dnl define instructions for PAC, which can use the A
+dnl or the B key. PAC instructions are also valid BTI
+dnl landing pads, so we re-define SIGN_LR if BTI is
+dnl enabled.
+
+ifelse(ARM64_FEATURE_PAC_DEFAULT, `1',
+	`define(`SIGN_LR', `PACIASP')
+	define(`VERIFY_LR', `AUTIASP')
+	define(`GNU_PROPERTY_AARCH64_POINTER_AUTH', `2')
+	define(`PAC_OR_BTI')',
+	ARM64_FEATURE_PAC_DEFAULT, `2',
+	`define(`SIGN_LR', `PACIBSP')
+	define(`VERIFY_LR', `AUTIBSP')
+	define(`GNU_PROPERTY_AARCH64_POINTER_AUTH', `2')
+	define(`PAC_OR_BTI')',
+	`ifdef(`SIGN_LR', , `define(`SIGN_LR', `')')
+	define(`VERIFY_LR', `')
+	define(`GNU_PROPERTY_AARCH64_POINTER_AUTH', `0')'
+')
+
+dnl  ADD_GNU_NOTES_IF_NEEDED
+dnl
+dnl  Conditionally add into ELF assembly files the GNU notes indicating if
+dnl  BTI or PAC is support. BTI is required by the linkers and loaders, however
+dnl  PAC is a nice to have for auditing. Use readelf -n to display.
+
+define(`ADD_GNU_NOTES_IF_NEEDED', `
+	ifdef(`ARM64_ELF', `
+	ifdef(`PAC_OR_BTI', `
+	.pushsection .note.gnu.property, "a";
+	.balign 8;
+	.long 4;
+	.long 0x10;
+	.long 0x5;
+	.asciz "GNU";
+	.long 0xc0000000; /* GNU_PROPERTY_AARCH64_FEATURE_1_AND */
+	.long 4;
+	.long eval(indir(`GNU_PROPERTY_AARCH64_POINTER_AUTH') + indir(`GNU_PROPERTY_AARCH64_BTI'));
+	.long 0;
+	.popsection;
+	')
+	')
+')

 dnl  LEA_HI(reg,gmp_symbol), LEA_LO(reg,gmp_symbol)
 dnl
diff --git a/mpn/arm64/bdiv_dbm1c.asm b/mpn/arm64/bdiv_dbm1c.asm
index 78984b426..9f15f8e59 100644
--- a/mpn/arm64/bdiv_dbm1c.asm
+++ b/mpn/arm64/bdiv_dbm1c.asm
@@ -45,6 +45,7 @@ ASM_START()
 TEXT
 ALIGN(16)
 PROLOGUE(mpn_bdiv_dbm1c)
+	BTI_C
 	ldr	x5, [up], #8
 	ands	x6, n, #3
 	b.eq	L(fi0)
@@ -109,3 +110,4 @@ L(wd1):	subs	x4, x4, x12
 	sbc	x0, x4, x13
 	ret
 EPILOGUE()
+ADD_GNU_NOTES_IF_NEEDED
diff --git a/mpn/arm64/bdiv_q_1.asm b/mpn/arm64/bdiv_q_1.asm
index 7fffc9369..401227a83 100644
--- a/mpn/arm64/bdiv_q_1.asm
+++ b/mpn/arm64/bdiv_q_1.asm
@@ -56,6 +56,7 @@ define(`tnc', `x8')

 ASM_START()
 PROLOGUE(mpn_bdiv_q_1)
+	BTI_C
 	rbit	x6, d
 	clz	cnt, x6
@@ -79,6 +80,7 @@ PROLOGUE(mpn_bdiv_q_1)
 EPILOGUE()

 PROLOGUE(mpn_pi1_bdiv_q_1)
+	BTI_C
 	sub	n, n, #1
 	subs	x6, x6, x6		C clear r6 and C flag
 	ldr	x9, [up],#8
@@ -120,3 +122,4 @@ L(tpn):	ldr	x9, [up],#8
 L(en1):	ret
 EPILOGUE()
+ADD_GNU_NOTES_IF_NEEDED
diff --git a/mpn/arm64/cnd_aors_n.asm b/mpn/arm64/cnd_aors_n.asm
index 397aa5100..916708885 100644
--- a/mpn/arm64/cnd_aors_n.asm
+++ b/mpn/arm64/cnd_aors_n.asm
@@ -57,9 +57,11 @@ ifdef(`OPERATION_cnd_sub_n', `
 	define(`func', mpn_cnd_sub_n)')

 MULFUNC_PROLOGUE(mpn_cnd_add_n mpn_cnd_sub_n)
+	BTI_C

 ASM_START()
 PROLOGUE(func)
+	BTI_C
 	cmp	cnd, #1
 	sbc	cnd, cnd, cnd
@@ -127,3 +129,4 @@ L(end):	bic	x6, x12, cnd
 L(rt):	RETVAL
 	ret
 EPILOGUE()
+ADD_GNU_NOTES_IF_NEEDED
diff --git a/mpn/arm64/com.asm b/mpn/arm64/com.asm
index d59494380..82b6787bf 100644
--- a/mpn/arm64/com.asm
+++ b/mpn/arm64/com.asm
@@ -47,6 +47,7 @@ define(`n', `x2')

 ASM_START()
 PROLOGUE(mpn_com)
+	BTI_C
 	cmp	n, #3
 	b.le	L(bc)
@@ -90,3 +91,4 @@ L(tl1):	tbz	n, #0, L(tl2)
 	str	x4, [rp]
 L(tl2):	ret
 EPILOGUE()
+ADD_GNU_NOTES_IF_NEEDED
diff --git a/mpn/arm64/copyd.asm b/mpn/arm64/copyd.asm
index d542970b7..b221d23a8 100644
--- a/mpn/arm64/copyd.asm
+++ b/mpn/arm64/copyd.asm
@@ -47,6 +47,7 @@ define(`n', `x2')

 ASM_START()
 PROLOGUE(mpn_copyd)
+	BTI_C
 	add	rp, rp, n, lsl #3
 	add	up, up, n, lsl #3
@@ -83,3 +84,4 @@ L(tl1):	tbz	n, #0, L(tl2)
 	str	x4, [rp,#-8]
 L(tl2):	ret
 EPILOGUE()
+ADD_GNU_NOTES_IF_NEEDED
diff --git a/mpn/arm64/copyi.asm b/mpn/arm64/copyi.asm
index 0de40c5d7..360266c67 100644
--- a/mpn/arm64/copyi.asm
+++ b/mpn/arm64/copyi.asm
@@ -47,6 +47,7 @@ define(`n', `x2')

 ASM_START()
 PROLOGUE(mpn_copyi)
+	BTI_C
 	cmp	n, #3
 	b.le	L(bc)
@@ -80,3 +81,4 @@ L(tl1):	tbz	n, #0, L(tl2)
 	str	x4, [rp]
 L(tl2):	ret
 EPILOGUE()
+ADD_GNU_NOTES_IF_NEEDED
diff --git a/mpn/arm64/divrem_1.asm b/mpn/arm64/divrem_1.asm
index 9d5bb5959..2bb8850d9 100644
--- a/mpn/arm64/divrem_1.asm
+++ b/mpn/arm64/divrem_1.asm
@@ -66,6 +66,8 @@ dnl mp_limb_t d_unnorm, mp_limb_t dinv, int cnt)

 ASM_START()
 PROLOGUE(mpn_preinv_divrem_1)
+	BTI_C
+	SIGN_LR
 	cbz	n_arg, L(fz)
 	stp	x29, x30, [sp, #-80]!
 	mov	x29, sp
@@ -86,6 +88,8 @@ PROLOGUE(mpn_preinv_divrem_1)
 EPILOGUE()

 PROLOGUE(mpn_divrem_1)
+	BTI_C
+	SIGN_LR
 	cbz	n_arg, L(fz)
 	stp	x29, x30, [sp, #-80]!
 	mov	x29, sp
@@ -154,6 +158,7 @@ L(uend):add	x2, x11, #1
 	ldp	x21, x22, [sp, #32]
 	ldp	x23, x24, [sp, #48]
 	ldp	x29, x30, [sp], #80
+	VERIFY_LR
 	ret

 L(ufx):	add	x2, x2, #1
@@ -194,6 +199,7 @@ L(nend):cbnz	fn, L(frac)
 	ldp	x21, x22, [sp, #32]
 	ldp	x23, x24, [sp, #48]
 	ldp	x29, x30, [sp], #80
+	VERIFY_LR
 	ret

 L(nfx):	add	x2, x2, #1
@@ -219,6 +225,7 @@ L(ftop):add	x2, x11, #1
 	ldp	x21, x22, [sp, #32]
 	ldp	x23, x24, [sp, #48]
 	ldp	x29, x30, [sp], #80
+	VERIFY_LR
 	ret

 C Block zero. We need this for the degenerated case of n = 0, fn != 0.
@@ -227,5 +234,7 @@ L(ztop):str	xzr, [qp_arg], #8
 	sub	fn_arg, fn_arg, #1
 	cbnz	fn_arg, L(ztop)
 L(zend):mov	x0, #0
+	VERIFY_LR
 	ret
 EPILOGUE()
+ADD_GNU_NOTES_IF_NEEDED
diff --git a/mpn/arm64/gcd_11.asm b/mpn/arm64/gcd_11.asm
index d8cc3e2cf..5e18fa21b 100644
--- a/mpn/arm64/gcd_11.asm
+++ b/mpn/arm64/gcd_11.asm
@@ -54,6 +54,7 @@ ASM_START()
 TEXT
 ALIGN(16)
 PROLOGUE(mpn_gcd_11)
+	BTI_C
 	subs	x3, u0, v0	C 0
 	b.eq	L(end)		C
@@ -68,3 +69,4 @@ L(top):	rbit	x12, x3		C 1,5
 L(end):	ret
 EPILOGUE()
+ADD_GNU_NOTES_IF_NEEDED
diff --git a/mpn/arm64/gcd_22.asm b/mpn/arm64/gcd_22.asm
index 5367fea02..4a0b902b7 100644
--- a/mpn/arm64/gcd_22.asm
+++ b/mpn/arm64/gcd_22.asm
@@ -56,6 +56,7 @@ define(`tnc', `x8')

 ASM_START()
 PROLOGUE(mpn_gcd_22)
+	BTI_C

 	ALIGN(16)
 L(top):	subs	t0, u0, v0	C 0 6
@@ -110,3 +111,4 @@ L(end):	mov	x0, v0
 	mov	x1, v1
 	ret
 EPILOGUE()
+ADD_GNU_NOTES_IF_NEEDED
diff --git a/mpn/arm64/gen-extra-m4.sh b/mpn/arm64/gen-extra-m4.sh
new file mode 100755
index 000000000..09c5b8975
--- /dev/null
+++ b/mpn/arm64/gen-extra-m4.sh
@@ -0,0 +1,81 @@
+#!/bin/sh
+#
+# A script for dynamically generating m4 definitions for aarch64 based on compilation flags.
+#
+# Copyright 2024 ARM Ltd.
+#
+# This file is part of the GNU MP Library.
+#
+# The GNU MP Library is free software; you can redistribute it and/or modify
+# it under the terms of either:
+#
+#   * the GNU Lesser General Public License as published by the Free
+#     Software Foundation; either version 3 of the License, or (at your
+#     option) any later version.
+#
+# or
+#
+#   * the GNU General Public License as published by the Free Software
+#     Foundation; either version 2 of the License, or (at your option) any
+#     later version.
+#
+# or both in parallel, as here.
+#
+# The GNU MP Library is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+# or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+# for more details.
+#
+# You should have received copies of the GNU General Public License and the
+# GNU Lesser General Public License along with the GNU MP Library.  If not,
+# see https://www.gnu.org/licenses/.
+
+# Usage: ./gen-extra-m4.sh "$CC"
+# Returns: valid M4 to stdout.
+
+if test "$#" -ne 1; then
+	echo "Expected 1 argument, the CC. Got: $#"
+	exit 1
+fi
+
+CC=$1
+
+ARM64_FEATURE_BTI_DEFAULT="0"
+ARM64_FEATURE_PAC_DEFAULT="0"
+ARM64_ELF="0"
+
+# strip -o from CC line so -dM works
+_CC=$(echo "$CC" | sed 's/-o [^ ]*//')
+output=$($_CC -dM -E - < /dev/null || exit $?)
+while IFS= read -r line; do
+	# Skip empty lines
+	if test -z "$line"; then
+		continue
+	fi
+	# Match the #define pattern and extract the macro name and value
+	case "$line" in
+	\#define\ *\ *)
+		macro_name=`echo "$line" | awk '{print $2}'`
+		macro_value=`echo "$line" | cut -d ' ' -f 3- | sed 's/^"\(.*\)"$/\1/'`
+		# map's would be nice in POSIX shell, could use eval to simplify, but
+		# I won't do that to others.
+		case "$macro_name" in
+		__ARM_FEATURE_BTI_DEFAULT)
+			ARM64_FEATURE_BTI_DEFAULT="$macro_value"
+			;;
+		__ARM_FEATURE_PAC_DEFAULT)
+			ARM64_FEATURE_PAC_DEFAULT="$macro_value"
+			;;
+		__ELF__)
+			ARM64_ELF="$macro_value"
+			;;
+		esac # end assignments
+		;;
+	esac # end define
+done <<< "$output"
+
+# Output the M4 define statement. To make m4 simpler always output something so we can
+# use an ifelse without needing to nest it within an ifdef.
+echo "define(\`ARM64_FEATURE_BTI_DEFAULT', \`$ARM64_FEATURE_BTI_DEFAULT')"
+echo "define(\`ARM64_FEATURE_PAC_DEFAULT', \`$ARM64_FEATURE_PAC_DEFAULT')"
+echo "define(\`ARM64_ELF', \`$ARM64_ELF')"
diff --git a/mpn/arm64/hamdist.asm b/mpn/arm64/hamdist.asm
index c72ca55b3..418519458 100644
--- a/mpn/arm64/hamdist.asm
+++ b/mpn/arm64/hamdist.asm
@@ -60,12 +60,13 @@ define(`chunksize',0x1ff0)

 ASM_START()
 PROLOGUE(mpn_hamdist)
+	BTI_C
 	mov	x11, #maxsize
 	cmp	n, x11
 	b.hi	L(gt8k)

-L(lt8k):
+L(lt8k): BTI_C
 	movi	v4.16b, #0		C clear summation register
 	movi	v5.16b, #0		C clear summation register
@@ -103,7 +104,8 @@ L(gt4):	ld1	{v2.2d,v3.2d}, [ap], #32	C load 4 limbs
 L(000):	subs	n, n, #8
 	b.lo	L(e0)

-L(chu):	ld1	{v2.2d,v3.2d}, [ap], #32	C load 4 limbs
+L(chu):	BTI_C
+	ld1	{v2.2d,v3.2d}, [ap], #32	C load 4 limbs
 	ld1	{v0.2d,v1.2d}, [ap], #32	C load 4 limbs
 	ld1	{v18.2d,v19.2d}, [bp], #32	C load 4 limbs
 	ld1	{v16.2d,v17.2d}, [bp], #32	C load 4 limbs
@@ -179,3 +181,4 @@ L(gt8k):
 	mov	x30, x8
 	ret
 EPILOGUE()
+ADD_GNU_NOTES_IF_NEEDED
diff --git a/mpn/arm64/invert_limb.asm b/mpn/arm64/invert_limb.asm
index 6a99bf002..a42a3c751 100644
--- a/mpn/arm64/invert_limb.asm
+++ b/mpn/arm64/invert_limb.asm
@@ -40,6 +40,7 @@ C Compiler generated, mildly edited. Could surely be further optimised.

 ASM_START()
 PROLOGUE(mpn_invert_limb)
+	BTI_C
 	lsr	x2, x0, #54
 	LEA_HI(	x1, approx_tab)
 	and	x2, x2, #0x1fe
@@ -81,3 +82,4 @@ approx_tab:
 forloop(i,256,512-1,dnl
 `	.hword	eval(0x7fd00/i)
 ')dnl
+ADD_GNU_NOTES_IF_NEEDED
diff --git a/mpn/arm64/logops_n.asm b/mpn/arm64/logops_n.asm
index e959abc71..c3400c760 100644
--- a/mpn/arm64/logops_n.asm
+++ b/mpn/arm64/logops_n.asm
@@ -75,9 +75,11 @@ ifdef(`OPERATION_xnor_n',`
 	define(`LOGOP', `eon	$1, $2, $3')')

 MULFUNC_PROLOGUE(mpn_and_n mpn_andn_n mpn_nand_n mpn_ior_n mpn_iorn_n mpn_nior_n mpn_xor_n mpn_xnor_n)
+	BTI_C

 ASM_START()
 PROLOGUE(func)
+	BTI_C
 	lsr	x17, n, #2
 	tbz	n, #0, L(bx0)
@@ -137,3 +139,4 @@ L(end):	LOGOP(	x12, x6, x10)
 	stp	x12, x13, [rp]
 L(ret):	ret
 EPILOGUE()
+ADD_GNU_NOTES_IF_NEEDED
diff --git a/mpn/arm64/lshift.asm b/mpn/arm64/lshift.asm
index fe8a1aa18..a0cf9a3db 100644
--- a/mpn/arm64/lshift.asm
+++ b/mpn/arm64/lshift.asm
@@ -58,6 +58,7 @@ define(`NSHIFT', lsr)

 ASM_START()
 PROLOGUE(mpn_lshift)
+	BTI_C
 	add	rp, rp_arg, n, lsl #3
 	add	up, up, n, lsl #3
 	sub	tnc, xzr, cnt
@@ -136,3 +137,4 @@ L(end):	orr	x10, x10, x13
 	str	x2, [rp,#-24]
 	ret
 EPILOGUE()
+ADD_GNU_NOTES_IF_NEEDED
diff --git a/mpn/arm64/lshiftc.asm b/mpn/arm64/lshiftc.asm
index 6bf584400..5880912de 100644
--- a/mpn/arm64/lshiftc.asm
+++ b/mpn/arm64/lshiftc.asm
@@ -58,6 +58,7 @@ define(`NSHIFT', lsr)

 ASM_START()
 PROLOGUE(mpn_lshiftc)
+	BTI_C
 	add	rp, rp_arg, n, lsl #3
 	add	up, up, n, lsl #3
 	sub	tnc, xzr, cnt
@@ -139,3 +140,4 @@ L(end):	eon	x10, x10, x13
 	str	x2, [rp,#-24]
 	ret
 EPILOGUE()
+ADD_GNU_NOTES_IF_NEEDED
diff --git a/mpn/arm64/mod_34lsub1.asm b/mpn/arm64/mod_34lsub1.asm
index 7945fe72c..ac84675b7 100644
--- a/mpn/arm64/mod_34lsub1.asm
+++ b/mpn/arm64/mod_34lsub1.asm
@@ -62,6 +62,7 @@ ASM_START()
 TEXT
 ALIGN(32)
 PROLOGUE(mpn_mod_34lsub1)
+	BTI_C
 	subs	n, n, #3
 	mov	x8, #0
 	b.lt	L(le2)		C n <= 2
@@ -122,3 +123,4 @@ L(1):	ldr	x2, [ap]
 	add	x0, x0, x2, lsr #48
 	ret
 EPILOGUE()
+ADD_GNU_NOTES_IF_NEEDED
diff --git a/mpn/arm64/mul_1.asm b/mpn/arm64/mul_1.asm
index fb965efff..87760191d 100644
--- a/mpn/arm64/mul_1.asm
+++ b/mpn/arm64/mul_1.asm
@@ -51,11 +51,13 @@ define(`v0', `x3')

 PROLOGUE(mpn_mul_1c)
+	BTI_C
 	adds	xzr, xzr, xzr	C clear cy flag
 	b	L(com)
 EPILOGUE()

 PROLOGUE(mpn_mul_1)
+	BTI_C
 	adds	x4, xzr, xzr	C clear register and cy flag
 L(com):	lsr	x17, n, #2
 	tbnz	n, #0, L(bx1)
@@ -126,3 +128,4 @@ L(2e):	adcs	x12, x8, x11
 L(1):	adc	x0, x11, xzr
 	ret
 EPILOGUE()
+ADD_GNU_NOTES_IF_NEEDED
diff --git a/mpn/arm64/popcount.asm b/mpn/arm64/popcount.asm
index 74de3fc01..4ea179faf 100644
--- a/mpn/arm64/popcount.asm
+++ b/mpn/arm64/popcount.asm
@@ -59,12 +59,14 @@ define(`chunksize',0x1ff0)

 ASM_START()
 PROLOGUE(mpn_popcount)
+	BTI_C
 	mov	x11, #maxsize
 	cmp	n, x11
 	b.hi	L(gt8k)

-L(lt8k):
+L(lt8k): BTI_C
+
 	movi	v4.16b, #0		C clear summation register
 	movi	v5.16b, #0		C clear summation register
@@ -94,7 +96,8 @@ L(gt4):	ld1	{v2.2d,v3.2d}, [ap], #32	C load 4 limbs
 L(000):	subs	n, n, #8
 	b.lo	L(e0)

-L(chu):	ld1	{v2.2d,v3.2d}, [ap], #32	C load 4 limbs
+L(chu):	BTI_C
+	ld1	{v2.2d,v3.2d}, [ap], #32	C load 4 limbs
 	ld1	{v0.2d,v1.2d}, [ap], #32	C load 4 limbs
 	cnt	v6.16b, v2.16b
 	cnt	v7.16b, v3.16b
@@ -155,3 +158,4 @@ L(gt8k):
 	mov	x30, x8
 	ret
 EPILOGUE()
+ADD_GNU_NOTES_IF_NEEDED
diff --git a/mpn/arm64/rsh1aors_n.asm b/mpn/arm64/rsh1aors_n.asm
index afd3d5be4..17487d5d7 100644
--- a/mpn/arm64/rsh1aors_n.asm
+++ b/mpn/arm64/rsh1aors_n.asm
@@ -56,9 +56,11 @@ ifdef(`OPERATION_rsh1sub_n', `
 	define(`func_n', mpn_rsh1sub_n)')

 MULFUNC_PROLOGUE(mpn_rsh1add_n mpn_rsh1sub_n)
+	BTI_C

 ASM_START()
 PROLOGUE(func_n)
+	BTI_C
 	lsr	x6, n, #2
 	tbz	n, #0, L(bx0)
@@ -166,3 +168,4 @@ L(2):	cset	x14, COND
 L(ret):	mov	x0, x10
 	ret
 EPILOGUE()
+ADD_GNU_NOTES_IF_NEEDED
diff --git a/mpn/arm64/rshift.asm b/mpn/arm64/rshift.asm
index 90187ad51..d3fc16301 100644
--- a/mpn/arm64/rshift.asm
+++ b/mpn/arm64/rshift.asm
@@ -58,6 +58,7 @@ define(`NSHIFT', lsl)

 ASM_START()
 PROLOGUE(mpn_rshift)
+	BTI_C
 	mov	rp, rp_arg
 	sub	tnc, xzr, cnt
 	lsr	x17, n, #2
@@ -134,3 +135,4 @@ L(end):	orr	x10, x10, x13
 	str	x2, [rp,#32]
 	ret
 EPILOGUE()
+ADD_GNU_NOTES_IF_NEEDED
diff --git a/mpn/arm64/sec_tabselect.asm b/mpn/arm64/sec_tabselect.asm
index 18a268ace..d671b6f74 100644
--- a/mpn/arm64/sec_tabselect.asm
+++ b/mpn/arm64/sec_tabselect.asm
@@ -57,6 +57,7 @@ define(`maskq', `v4')

 ASM_START()
 PROLOGUE(mpn_sec_tabselect)
+	BTI_C
 	dup	v7.2d, x4		C 2 `which' copies

 	mov	x10, #1
@@ -120,3 +121,4 @@ L(tp1):	cmeq	maskq.2d, v5.2d, v7.2d

 L(b00):	ret
 EPILOGUE()
+ADD_GNU_NOTES_IF_NEEDED
diff --git a/mpn/arm64/sqr_diag_addlsh1.asm b/mpn/arm64/sqr_diag_addlsh1.asm
index 39f1cb1bc..599717d3a 100644
--- a/mpn/arm64/sqr_diag_addlsh1.asm
+++ b/mpn/arm64/sqr_diag_addlsh1.asm
@@ -46,6 +46,7 @@ define(`n', `x3')

 ASM_START()
 PROLOGUE(mpn_sqr_diag_addlsh1)
+	BTI_C
 	ldr	x15, [up],#8
 	lsr	x14, n, #1
 	tbz	n, #0, L(bx0)
@@ -100,3 +101,4 @@ L(end):	extr	x9, x6, x5, #63
 	ret
 EPILOGUE()
+ADD_GNU_NOTES_IF_NEEDED
diff --git a/mpn/m4-ccas b/mpn/m4-ccas
index 16d80c6f5..1d68bfe8b 100755
--- a/mpn/m4-ccas
+++ b/mpn/m4-ccas
@@ -49,6 +49,8 @@ CC=
 DEFS=
 ASM=
 SEEN_O=no
+M4_GENPATH=
+M4_GENERATED=

 for i in "$@"; do
   case $i in
@@ -73,6 +75,9 @@ for i in "$@"; do
       SEEN_O=yes
       CC="$CC $i"
       ;;
+    --m4-gen-path=*)
+      M4_GENPATH=`echo "$i" | sed 's/^--m4-gen-path=//'`
+      ;;
     *)
       CC="$CC $i"
       ;;
@@ -97,11 +102,23 @@ if test $SEEN_O = no; then
   CC="$CC -o $BASENAME.o"
 fi

-echo "$M4 $DEFS $ASM >$TMP"
-$M4 $DEFS $ASM >$TMP || exit
+# Does the architecture have any dynamically generated m4?
+# if so execute the generation script
+if test -n "$M4_GENPATH"; then
+  if ! test -f "$M4_GENPATH"; then
+    echo "$M4_GENPATH not found."
+    exit 1
+  fi
+  echo "$M4_GENPATH \"$CC\""
+  M4_GENERATED="${TMP%.*}.m4"
+  "$M4_GENPATH" "$CC" > "$M4_GENERATED" || exit
+fi
+
+echo "$M4 $DEFS $M4_GENERATED $ASM >$TMP"
+$M4 $DEFS "$M4_GENERATED" $ASM >$TMP || exit

 echo "$CC"
 $CC || exit

 # Comment this out to preserve .s intermediates
-rm -f $TMP
+rm -f $TMP "$M4_GENERATED"
-- 
2.45.2