[Ping^2][PATCH, DWARF] Add DW_CFA_AARCH64_negate_ra_state to dwarf2.def/h and dwarfnames.c
2017-08-22 9:18 GMT+01:00 Jiong Wang : > On 10/08/17 17:39, Jiong Wang wrote: >> >> Hi, >> >> A new vendor CFA DW_CFA_AARCH64_negate_ra_state was introduced for >> ARMv8.3-A >> return address signing, it is multiplexing DW_CFA_GNU_window_save in CFA >> vendor >> extension space. >> >> This patch adds necessary code to make it available to external, the GDB >> patch (https://sourceware.org/ml/gdb-patches/2017-08/msg00215.html) is >> intended >> to use it. >> >> A new DW_CFA_DUP for it is added in dwarf2.def. The use of DW_CFA_DUP >> is to >> avoid duplicated case value issue when included in libiberty/dwarfnames. >> >> Native x86 builds OK to make sure no macro expanding errors. >> >> OK for trunk? >> >> 2017-08-10 Jiong Wang >> >> include/ >> * dwarf2.def (DW_CFA_AARCH64_negate_ra_state): New DW_CFA_DUP. >> * dwarf2.h (DW_CFA_DUP): New define. >> >> libiberty/ >> * dwarfnames.c (DW_CFA_DUP): New define. >> > > Ping~ Ping^2
[Ping~][PATCH, DWARF] Add DW_CFA_AARCH64_negate_ra_state to dwarf2.def/h and dwarfnames.c
On 10/08/17 17:39, Jiong Wang wrote: Hi, A new vendor CFA DW_CFA_AARCH64_negate_ra_state was introduced for ARMv8.3-A return address signing, it is multiplexing DW_CFA_GNU_window_save in CFA vendor extension space. This patch adds necessary code to make it available to external, the GDB patch (https://sourceware.org/ml/gdb-patches/2017-08/msg00215.html) is intended to use it. A new DW_CFA_DUP for it is added in dwarf2.def. The use of DW_CFA_DUP is to avoid duplicated case value issue when included in libiberty/dwarfnames. Native x86 builds OK to make sure no macro expanding errors. OK for trunk? 2017-08-10 Jiong Wang include/ * dwarf2.def (DW_CFA_AARCH64_negate_ra_state): New DW_CFA_DUP. * dwarf2.h (DW_CFA_DUP): New define. libiberty/ * dwarfnames.c (DW_CFA_DUP): New define. Ping~
[PATCH, DWARF] Add DW_CFA_AARCH64_negate_ra_state to dwarf2.def/h and dwarfnames.c
Hi, A new vendor CFA DW_CFA_AARCH64_negate_ra_state was introduced for ARMv8.3-A return address signing, it is multiplexing DW_CFA_GNU_window_save in CFA vendor extension space. This patch adds necessary code to make it available to external, the GDB patch (https://sourceware.org/ml/gdb-patches/2017-08/msg00215.html) is intended to use it. A new DW_CFA_DUP for it is added in dwarf2.def. The use of DW_CFA_DUP is to avoid duplicated case value issue when included in libiberty/dwarfnames. Native x86 builds OK to make sure no macro expanding errors. OK for trunk? 2017-08-10 Jiong Wang include/ * dwarf2.def (DW_CFA_AARCH64_negate_ra_state): New DW_CFA_DUP. * dwarf2.h (DW_CFA_DUP): New define. libiberty/ * dwarfnames.c (DW_CFA_DUP): New define. diff --git a/include/dwarf2.def b/include/dwarf2.def index a91e9439cd82f3bb9fdddc14904114e5490c1af6..2a3b23fef873db9bb2498cd28c4fafc72e6a234f 100644 --- a/include/dwarf2.def +++ b/include/dwarf2.def @@ -778,6 +778,7 @@ DW_CFA (DW_CFA_MIPS_advance_loc8, 0x1d) /* GNU extensions. NOTE: DW_CFA_GNU_window_save is multiplexed on Sparc and AArch64. */ DW_CFA (DW_CFA_GNU_window_save, 0x2d) +DW_CFA_DUP (DW_CFA_AARCH64_negate_ra_state, 0x2d) DW_CFA (DW_CFA_GNU_args_size, 0x2e) DW_CFA (DW_CFA_GNU_negative_offset_extended, 0x2f) diff --git a/include/dwarf2.h b/include/dwarf2.h index 14b6f22e39e2f2f8cadb05009bfd10fafa9ea07c..a2e022dbdb35c18bb591e0f00930978846b82c01 100644 --- a/include/dwarf2.h +++ b/include/dwarf2.h @@ -52,6 +52,7 @@ #define DW_ATE(name, value) , name = value #define DW_ATE_DUP(name, value) , name = value #define DW_CFA(name, value) , name = value +#define DW_CFA_DUP(name, value) , name = value #define DW_IDX(name, value) , name = value #define DW_IDX_DUP(name, value) , name = value @@ -104,6 +105,7 @@ #undef DW_ATE #undef DW_ATE_DUP #undef DW_CFA +#undef DW_CFA_DUP #undef DW_IDX #undef DW_IDX_DUP diff --git a/libiberty/dwarfnames.c b/libiberty/dwarfnames.c index e58d03c3a3d814f3a271edb4689c6306a2f958f0..dacd78dbaa9b33d6e9fdf35330cdc446dcf4f76c 100644 --- a/libiberty/dwarfnames.c +++ b/libiberty/dwarfnames.c @@ -75,6 +75,7 @@ Boston, MA 02110-1301, USA. */ #define DW_ATE(name, value) case name: return # name ; #define DW_ATE_DUP(name, value) #define DW_CFA(name, value) case name: return # name ; +#define DW_CFA_DUP(name, value) #define DW_IDX(name, value) case name: return # name ; #define DW_IDX_DUP(name, value) @@ -105,5 +106,6 @@ Boston, MA 02110-1301, USA. */ #undef DW_ATE #undef DW_ATE_DUP #undef DW_CFA +#undef DW_CFA_DUP #undef DW_IDX #undef DW_IDX_DUP
Re: [RFC][PATCH][AArch64] Cleanup frame pointer usage
On 15/06/17 15:12, Wilco Dijkstra wrote: This results in smaller code and unwind info. I have done a quick test on your updated patch through building latest linux kernel. Dwarf frame size improved (~ 5% smaller) as using sp to address locals doesn't need to update CFA register etc. Though the impact on the codegen by using sp to address locals may be diversified, for the case of linux kernel, I saw text size increase slightly (~ 0.05% bigger), the reason looks like is GCC hardware copy propagation doesn't support stack pointer case, see regcprop.c, so if you have the following sequences, the fp case will be optimized into "add x0, x29, 36" while sp case left with two instructions. A simple testcase listed below. sp === mov x0, sp add x0, x0, 36 fp === mov x0, x29 add x0, x0, 36 test.c === struct K { int a; int b; int c; int d; char e; short f; long g; float h; double i; }; void foo (int, struct K *); void test (int i) { struct K k = { .a = 5, .b = 0, .c = i, }; foo (5, &k);
Re: [PATCH 1/5] testsuite: attr-alloc_size-11.c (PR79356)
On 15/03/17 15:34, Rainer Orth wrote: Hi Jiong, Subject: [PATCH] testsuite, 79356 As stated in the PR (and elsewhere), this test now passes on aarch64, ia64, mips, powerpc, sparc, and s390x. This patch disables the xfails for those targets. gcc/testsuite/ PR testsuite/79356 * gcc.dg/attr-alloc_size-11.c: Don't xfail on aarch64, ia64, mips, powerpc, sparc, or s390x. It's passing on ARM as well. I will commit the following patch which add arm*-*-* to the "Don't xfail". gcc/testsuite/ PR testsuite/79356 * gcc.dg/attr-alloc_size-11.c: Don't xfail on arm. please keep the lists sorted alphabetically. Thanks, noticed that just during committing, the committed one has been corrected. https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/testsuite/gcc.dg/attr-alloc_size-11.c?r1=246167&r2=246166&pathrev=246167
Re: [PATCH 1/5] testsuite: attr-alloc_size-11.c (PR79356)
On 10/03/17 15:26, Segher Boessenkool wrote: On Fri, Mar 10, 2017 at 01:57:31PM +0100, Rainer Orth wrote: I just noticed that nothing has happened at all in a month, so anything is better than the tests XPASSing on a number of targets. So the patch is ok for mainline with sparc*-*-* added to the target lists and a reference to PR testsuite/79356 in the comment. I'd still be very grateful if Martin could have a look what's really going on here, though. Same here. Committed as: Subject: [PATCH] testsuite, 79356 As stated in the PR (and elsewhere), this test now passes on aarch64, ia64, mips, powerpc, sparc, and s390x. This patch disables the xfails for those targets. gcc/testsuite/ PR testsuite/79356 * gcc.dg/attr-alloc_size-11.c: Don't xfail on aarch64, ia64, mips, powerpc, sparc, or s390x. It's passing on ARM as well. I will commit the following patch which add arm*-*-* to the "Don't xfail". gcc/testsuite/ PR testsuite/79356 * gcc.dg/attr-alloc_size-11.c: Don't xfail on arm. diff --git a/gcc/testsuite/gcc.dg/attr-alloc_size-11.c b/gcc/testsuite/gcc.dg/attr-alloc_size-11.c index ccf2c2196c065b3387a91cc764dad3fcc1b4e3ee..3c1867bfb4e1cb762308dc6ac03afc7dc01cc075 100644 --- a/gcc/testsuite/gcc.dg/attr-alloc_size-11.c +++ b/gcc/testsuite/gcc.dg/attr-alloc_size-11.c @@ -47,8 +47,8 @@ typedef __SIZE_TYPE__size_t; /* The following tests fail because of missing range information. The xfail exclusions are PR79356. */ -TEST (signed char, SCHAR_MIN + 2, ALLOC_MAX); /* { dg-warning "argument 1 range \\\[13, \[0-9\]+\\\] exceeds maximum object size 12" "missing range info for signed char" { xfail { ! { aarch64*-*-* ia64-*-* mips*-*-* powerpc*-*-* sparc*-*-* s390x-*-* } } } } */ -TEST (short, SHRT_MIN + 2, ALLOC_MAX); /* { dg-warning "argument 1 range \\\[13, \[0-9\]+\\\] exceeds maximum object size 12" "missing range info for short" { xfail { ! { aarch64*-*-* ia64-*-* mips*-*-* powerpc*-*-* sparc*-*-* s390x-*-* } } } } */ +TEST (signed char, SCHAR_MIN + 2, ALLOC_MAX); /* { dg-warning "argument 1 range \\\[13, \[0-9\]+\\\] exceeds maximum object size 12" "missing range info for signed char" { xfail { ! { arm*-*-* aarch64*-*-* ia64-*-* mips*-*-* powerpc*-*-* sparc*-*-* s390x-*-* } } } } */ +TEST (short, SHRT_MIN + 2, ALLOC_MAX); /* { dg-warning "argument 1 range \\\[13, \[0-9\]+\\\] exceeds maximum object size 12" "missing range info for short" { xfail { ! { arm*-*-* aarch64*-*-* ia64-*-* mips*-*-* powerpc*-*-* sparc*-*-* s390x-*-* } } } } */ TEST (int, INT_MIN + 2, ALLOC_MAX);/* { dg-warning "argument 1 range \\\[13, \[0-9\]+\\\] exceeds maximum object size 12" } */ TEST (int, -3, ALLOC_MAX); /* { dg-warning "argument 1 range \\\[13, \[0-9\]+\\\] exceeds maximum object size 12" } */ TEST (int, -2, ALLOC_MAX); /* { dg-warning "argument 1 range \\\[13, \[0-9\]+\\\] exceeds maximum object size 12" } */
Re: [PING 6, PATCH] Remove xfail from thread_local-order2.C.
On 07/02/17 16:01, Mike Stump wrote: On Feb 7, 2017, at 2:20 AM, Rainer Orth wrote: No. In fact, I'd go for something like this: 2017-02-07 Dominik Vogt Rainer Orth * g++.dg/tls/thread_local-order2.C: Only xfail execution on *-*-solaris*. # HG changeset patch # Parent 031bb7a327cc984d387a8ae64e7c65d4b8793731 Only xfail g++.dg/tls/thread_local-order2.C on Solaris diff --git a/gcc/testsuite/g++.dg/tls/thread_local-order2.C b/gcc/testsuite/g++.dg/tls/thread_local-order2.C --- a/gcc/testsuite/g++.dg/tls/thread_local-order2.C +++ b/gcc/testsuite/g++.dg/tls/thread_local-order2.C @@ -2,10 +2,11 @@ // that isn't reverse order of construction. We need to move // __cxa_thread_atexit into glibc to get this right. -// { dg-do run { xfail *-*-* } } +// { dg-do run } // { dg-require-effective-target c++11 } // { dg-add-options tls } // { dg-require-effective-target tls_runtime } +// { dg-xfail-run-if "" { *-*-solaris* } } extern "C" void abort(); extern "C" int printf (const char *, ...); This way one can easily add per-target PR references or explanations, e.g. for darwin10 or others should they come up. Tested on i386-pc-solaris2.12 and x86_64-pc-linux-gnu. Ok for mainline? Ok. I think that addresses most all known issues. I'll pre-appove any additional targets people find as trivial. For example, if darwin10 doesn't pass, then *-*-darwin10* would be fine to add if that fixes the issue. I don't happen to have one that old to just test on. I am seeing this failure on arm and aarch64 bare-metal environment where newlib are used. This patch also XFAIL this testcase on newlib. OK for trunk? Regards, Jiong gcc/testsuite/ 2017-03-10 Jiong Wang * g++.dg/tls/thread_local-order2.C: XFAIL on newlib. diff --git a/gcc/testsuite/g++.dg/tls/thread_local-order2.C b/gcc/testsuite/g++.dg/tls/thread_local-order2.C index 3cbd257b5fab05d9af7aeceb4f97e9a79d2a283e..d274e8c606542893f8a792469e075056793335ea 100644 --- a/gcc/testsuite/g++.dg/tls/thread_local-order2.C +++ b/gcc/testsuite/g++.dg/tls/thread_local-order2.C @@ -6,7 +6,7 @@ // { dg-require-effective-target c++11 } // { dg-add-options tls } // { dg-require-effective-target tls_runtime } -// { dg-xfail-run-if "" { hppa*-*-hpux* *-*-solaris* } } +// { dg-xfail-run-if "" { { hppa*-*-hpux* *-*-solaris* } || { newlib } } } extern "C" void abort(); extern "C" int printf (const char *, ...);
Re: [AArch64] Accelerate -fstack-protector through pointer authentication extension
On 15/02/17 15:45, Richard Earnshaw (lists) wrote: On 18/01/17 17:10, Jiong Wang wrote: NOTE, this approach however requires DWARF change as the original LR is signed, the binary needs new libgcc to make sure c++ eh works correctly. Given this acceleration already needs the user specify -mstack-protector-dialect=pauth which means the target platform largely should have install new libgcc, otherwise you can't utilize new pointer authentication features. gcc/ 2016-11-11 Jiong Wang * config/aarch64/aarch64-opts.h (aarch64_stack_protector_type): New enum. (aarch64_layout_frame): Swap callees and locals when -mstack-protector-dialect=pauth specified. (aarch64_expand_prologue): Use AARCH64_PAUTH_SSP_OR_RA_SIGN instead of AARCH64_ENABLE_RETURN_ADDRESS_SIGN. (aarch64_expand_epilogue): Likewise. * config/aarch64/aarch64.md (*do_return): Likewise. (aarch64_override_options): Sanity check for ILP32 and TARGET_PAUTH. * config/aarch64/aarch64.h (AARCH64_PAUTH_SSP_OPTION, AARCH64_PAUTH_SSP, AARCH64_PAUTH_SSP_OR_RA_SIGN, LINK_SSP_SPEC): New defines. * config/aarch64/aarch64.opt (-mstack-protector-dialect=): New option. * doc/invoke.texi (AArch64 Options): Documents -mstack-protector-dialect=. Patch updated to migrate to TARGET_STACK_PROTECT_RUNTIME_ENABLED_P. aarch64 cross check OK with the following options enabled on all testcases. -fstack-protector-all -mstack-protector-pauth OK for trunk? gcc/ 2017-01-18 Jiong Wang * config/aarch64/aarch64-protos.h (aarch64_pauth_stack_protector_enabled): New declaration. * config/aarch64/aarch64.c (aarch64_layout_frame): Swap callee-save area and locals area when aarch64_pauth_stack_protector_enabled returns true. (aarch64_stack_protect_runtime_enabled): New function. (aarch64_pauth_stack_protector_enabled): New function. (aarch64_return_address_signing_enabled): Enabled by aarch64_pauth_stack_protector_enabled. (aarch64_override_options): Sanity check for -mstack-protector-pauth. (TARGET_STACK_PROTECT_RUNTIME_ENABLED_P): Define. * config/aarch64/aarch64.h (LINK_SSP_SPEC): Likewise. * config/aarch64/aarch64.opt (-mstack-protector-pauth): New option. * doc/invoke.texi (AArch64 Options): Documents -mstack-protector-pauth. gcc/testsuite/ * gcc.target/aarch64/stack_protector_1.c: New test. 
1.patch diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 632dd4768d82c340ae4e9b4a93206743756c06e7..a3ad623eef498d00b52d24bf02a5748fad576c3d 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -383,6 +383,7 @@ void aarch64_init_cumulative_args (CUMULATIVE_ARGS *, const_tree, rtx, void aarch64_init_expanders (void); void aarch64_init_simd_builtins (void); void aarch64_emit_call_insn (rtx); +bool aarch64_pauth_stack_protector_enabled (void); void aarch64_register_pragmas (void); void aarch64_relayout_simd_types (void); void aarch64_reset_previous_fndecl (void); diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 3718ad1b3bf27c6bdb9e74831fd660e617cccbde..dd742d37ab6fc6fb5085e1c6b5d86d5ce1ce5f8a 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -958,4 +958,11 @@ extern const char *host_detect_local_cpu (int argc, const char **argv); extern tree aarch64_fp16_type_node; extern tree aarch64_fp16_ptr_type_node; +#ifndef TARGET_LIBC_PROVIDES_SSP +#define LINK_SSP_SPEC "%{!mstack-protector-pauth:\ +%{fstack-protector|fstack-protector-all\ + |fstack-protector-strong|fstack-protector-explicit:\ + -lssp_nonshared -lssp}}" +#endif + I don't think we want to suppress this. PAUTH pased stack protections isn't an all-or-nothing solution. What if some object files are built with traditional -fstack-protector code? I had done a decription on this in the ping email (changed summary may caused trouble to email client) -- Code compiled with "-mstack-protector-pauth" can co-work with code compiled without "-mstack-protector-pauth". The only problem is when "-mstack-protector-pauth" is specified, "-lssp/-lssp_nonshared" won't be implied as the software runtime supports are not required any more. So if the user has some object files compiled using default stack protector and wants them to be linked with object files compiled using "-mstack-protector-pauth", if "-mstack-protector-pauth" appear in the final command line and "gcc" is used as linker driver, then "-lssp/-lss
Ping [AArch64] Accelerate -fstack-protector
On 18/01/17 17:10, Jiong Wang wrote: aarch64 cross check OK with the following options enabled on all testcases. -fstack-protector-all -mstack-protector-pauth OK for trunk? gcc/ 2017-01-18 Jiong Wang * config/aarch64/aarch64-protos.h (aarch64_pauth_stack_protector_enabled): New declaration. * config/aarch64/aarch64.c (aarch64_layout_frame): Swap callee-save area and locals area when aarch64_pauth_stack_protector_enabled returns true. (aarch64_stack_protect_runtime_enabled): New function. (aarch64_pauth_stack_protector_enabled): New function. (aarch64_return_address_signing_enabled): Enabled by aarch64_pauth_stack_protector_enabled. (aarch64_override_options): Sanity check for -mstack-protector-pauth. (TARGET_STACK_PROTECT_RUNTIME_ENABLED_P): Define. * config/aarch64/aarch64.h (LINK_SSP_SPEC): Likewise. * config/aarch64/aarch64.opt (-mstack-protector-pauth): New option. * doc/invoke.texi (AArch64 Options): Documents -mstack-protector-pauth. gcc/testsuite/ * gcc.target/aarch64/stack_protector_1.c: New test. I'd like to ping this patch which acclerates GCC -fstack-protector using ARMv8.3-A Pointer Authentication Extension. The whole acceleration will only be enabled through the new option "-mstack-protector-pauth" which is disabled at default. This patch does not touch any generic code and does not change GCC codegen on AArch64 at default, it should be with very low risk. So is it OK to commit to GCC trunk? Code compiled with "-mstack-protector-pauth" can co-work with code compiled without "-mstack-protector-pauth". The only problem is when "-mstack-protector-pauth" is specified, "-lssp/-lssp_nonshared" won't be implied as the software runtime supports are not required any more. So if the user has some object files compiled using default stack protector and wants them to be linked with object files compiled using "-mstack-protector-pauth", if "-mstack-protector-pauth" appear in the final command line and "gcc" is used as linker driver, then "-lssp/-lssp_nonshared" needs to be specified explicitly.
Re: Fix profile updating in ifcombine
On 06/02/17 15:26, Jan Hubicka wrote: I think it is not a regression, just the testcase if fragile and depends on outcome of ifcombine. It seems it was updated several time in the past. I am not quite sure what the test is testing. They are tring to make sure optimal stack adjustment decisions are made. Fix the testcases by disabling relevant transformation passes looks one way to me. The other way, might be more reliable, is we dump the decisions made during aarch64 frame layout if dump_file be true, and prefix the dump entry by function name to make it easier caught by dejagnu. We then scan rtl dump instead of instructions.
Re: [PATCH][wwwdocs] Mention -march=armv8.3-a -msign-return-address= for GCC 7
On 02/02/17 13:31, Gerald Pfeifer wrote: On Thu, 2 Feb 2017, Jiong Wang wrote: This patch adds a short entry for the -march=armv8.3-a and -msign-return-address= options in GCC 7 to the "AArch64" section. Thanks, Jiong. Index: gcc-7/changes.html === + The ARMv8.3-A architecture is now supported. It can be used by + specifying the -march=armv8.3-a option. + + The option -msign-return-address= is supported to enable + return address protection using ARMv8.3-A Pointer Authentication + Extensions. Please refer to the documentation for more information on + the arguments accepted by this option. + Would it make sense to make this two different items? The way it is currently marked up, the blank line will be "gone" once rendered. OK, seperated them into two different items. Where you "refer to the documentation", what kind of documentation is that? ARM reference manuals, GCC's documentation,...? Being a bit more explicit here and/or using a link would be good. It's GCC user manual, have added the link in the updated patch. Please review, thanks. Index: htdocs/gcc-7/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v retrieving revision 1.54 diff -u -r1.54 changes.html --- htdocs/gcc-7/changes.html 1 Feb 2017 19:23:00 - 1.54 +++ htdocs/gcc-7/changes.html 2 Feb 2017 14:34:49 - @@ -711,6 +711,18 @@ AArch64 + The ARMv8.3-A architecture is now supported. It can be used by + specifying the -march=armv8.3-a option. + + + The option -msign-return-address= is supported to enable + return address protection using ARMv8.3-A Pointer Authentication + Extensions. For more information on the arguments accepted by this + option, please refer to + https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html#AArch64-Options";> + AArch64-Options. + + The ARMv8.2-A architecture and the ARMv8.2-A 16-bit Floating-Point Extensions are now supported. They can be used by specifying the -march=armv8.2-a or -march=armv8.2-a+fp16
[PATCH][wwwdocs] Mention -march=armv8.3-a -msign-return-address= for GCC 7
Hi all, This patch adds a short entry for the -march=armv8.3-a and -msign-return-address= options in GCC 7 to the "AArch64" section. Eyeballed the result in Firefox. Ok to commit? Thanks, Jiong Index: gcc-7/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v retrieving revision 1.39 diff -u -r1.39 changes.html --- gcc-7/changes.html 17 Jan 2017 21:26:31 - 1.39 +++ gcc-7/changes.html 20 Jan 2017 14:31:21 - @@ -384,6 +384,15 @@ AArch64 + The ARMv8.3-A architecture is now supported. It can be used by + specifying the -march=armv8.3-a option. + + The option -msign-return-address= is supported to enable + return address protection using ARMv8.3-A Pointer Authentication + Extensions. Please refer to the documentation for more information on + the arguments accepted by this option. + + The ARMv8.2-A architecture and the ARMv8.2-A 16-bit Floating-Point Extensions are now supported. They can be used by specifying the -march=armv8.2-a or -march=armv8.2-a+fp16
Re: [PATCH v2] aarch64: Add split-stack initial support
On 24/01/17 18:05, Adhemerval Zanella wrote: On 03/01/2017 13:13, Wilco Dijkstra wrote: + /* If function uses stacked arguments save the old stack value so morestack + can return it. */ + reg11 = gen_rtx_REG (Pmode, R11_REGNUM); + if (cfun->machine->frame.saved_regs_size + || cfun->machine->frame.saved_varargs_size) +emit_move_insn (reg11, stack_pointer_rtx); This doesn't look right - we could have many arguments even without varargs or saved regs. This would need to check varargs as well as ctrl->args.size (I believe that is the size of the arguments on the stack). It's fine to omit this optimization in the first version - we already emit 2-3 extra instructions for the check anyway. I will check for a better solution. Hi Adhemerval My only concern on this this patch is the initialization of R11 (internal arg pointer). The current implementation looks to me is generating wrong code for a testcase simply return the sum of ten int param, I see the function body is using R11 while there is no initialization of it in split prologue, so if the execution flow is *not* through __morestack, then R11 is not initialized. As Wilco suggested, I feel using crtl->args.size instead of cfun->machine->frame.saved_regs_size might be the correct approach after checking assign_parms in function.c.
Re: [1/5][AArch64] Return address protection on AArch64
On 20/01/17 18:23, Jiong Wang wrote: OK, the attached patch disable the building of pointer signing code in libgcc on ILP32 mode, except the macro bit RA_A_SIGNED_BIT is still defined as I want to book this bit for ILP32 as LP64 in case we later enable ILP32 support. All pauth builtins are not registered as well for ILP32 mode as these builtins are supposed to be used by libgcc unwinder code only. I also gated the three new testcases for return address signing using the following directive and verified it works under my dejagnu environment. { dg-require-effective-target lp64 } multilib cross build finished (lp64, ilp32), OK for trunk? BTW, the mode fix patch doesn't have conflict with this patch, we may still need it if we want to enable ILP32 support later. Thanks. gcc/ 2017-01-20 Jiong Wang * config/aarch64/aarch64-builtins.c (aarch64_init_builtins): Don't register pauth builtins for ILP32. libgcc/ * config/aarch64/aarch64-unwind.h: Restrict this file on LP64 only. * unwind-dw2.c (execute_cfa_program): Only multiplexing DW_CFA_GNU_window_save for AArch64 LP64. Missing testcase change in Changelog, added: gcc/ 2017-01-20 Jiong Wang * config/aarch64/aarch64-builtins.c (aarch64_init_builtins): Register register pauth builtins for LP64 only. * testsuite/gcc.target/aarch64/return_address_sign_1.c: Enable on LP64 only. * testsuite/gcc.target/aarch64/return_address_sign_2.c: Likewise. * testsuite/gcc.target/aarch64/return_address_sign_3.c: Likewise. libgcc/ * config/aarch64/aarch64-unwind.h: Empty this file on ILP32. * unwind-dw2.c (execute_cfa_program): Only multiplexing DW_CFA_GNU_window_save for AArch64 and LP64.
Re: [1/5][AArch64] Return address protection on AArch64
Here is the patch. For XPACLRI builtin which drops the signature in a pointer, it's prototype is "void *foo (void *)" FOR PAC/AUT builtin which sign or authenticate a pointer, it's prototype is "void *foo (void *, uint64)". This patch adjusted those modes to make sure they strictly follow the C prototype. I also borrow the type define in ARM backend typedef unsigned _uw64 __attribute__((mode(__DI__))); And this is need to type cast the salt value which is always DImode. It passed my local ILP32 cross build. OK for trunk? gcc/ 2017-01-20 Jiong Wang * config/aarch64/aarch64-builtins.c (aarch64_expand_builtin): Fix modes for AARCH64_PAUTH_BUILTIN_XPACLRI, AARCH64_PAUTH_BUILTIN_PACIA1716 and AARCH64_PAUTH_BUILTIN_AUTIA1716. libgcc/ * config/aarch64/aarch64-unwind.h (_uw64): New typedef. (aarch64_post_extract_frame_addr): Cast salt to _uw64. (aarch64_post_frob_eh_handler_addr): Likewise. Hmm, we currently don't support ILP32 at all for pointer signing (sorry ("Return address signing is only supported for -mabi=lp64");), so I wonder if it would be better for now to simply suppress all the new hooks in aarch64-unwind.h ifdef __ILP32__. R. OK, the attached patch disable the building of pointer signing code in libgcc on ILP32 mode, except the macro bit RA_A_SIGNED_BIT is still defined as I want to book this bit for ILP32 as LP64 in case we later enable ILP32 support. All pauth builtins are not registered as well for ILP32 mode as these builtins are supposed to be used by libgcc unwinder code only. I also gated the three new testcases for return address signing using the following directive and verified it works under my dejagnu environment. { dg-require-effective-target lp64 } multilib cross build finished (lp64, ilp32), OK for trunk? BTW, the mode fix patch doesn't have conflict with this patch, we may still need it if we want to enable ILP32 support later. Thanks. gcc/ 2017-01-20 Jiong Wang * config/aarch64/aarch64-builtins.c (aarch64_init_builtins): Don't register pauth builtins for ILP32. libgcc/ * config/aarch64/aarch64-unwind.h: Restrict this file on LP64 only. * unwind-dw2.c (execute_cfa_program): Only multiplexing DW_CFA_GNU_window_save for AArch64 LP64. diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index 7ef351eb53b21c94a07dbd7c49813276dfcebdb2..66bcb9ad5872d1f6cac4ce9613806eb390be33af 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -983,9 +983,14 @@ aarch64_init_builtins (void) aarch64_init_crc32_builtins (); aarch64_init_builtin_rsqrt (); -/* Initialize pointer authentication builtins which are backed by instructions - in NOP encoding space. */ - aarch64_init_pauth_hint_builtins (); + /* Initialize pointer authentication builtins which are backed by instructions + in NOP encoding space. + + NOTE: these builtins are supposed to be used by libgcc unwinder only, as + there is no support on return address signing under ILP32, we don't + register them. */ + if (!TARGET_ILP32) +aarch64_init_pauth_hint_builtins (); } tree diff --git a/gcc/testsuite/gcc.target/aarch64/return_address_sign_1.c b/gcc/testsuite/gcc.target/aarch64/return_address_sign_1.c index fda72a414f1df7e81785864e994681e3695852f1..f87c3d28d1edff473a787a39a436e57076f97508 100644 --- a/gcc/testsuite/gcc.target/aarch64/return_address_sign_1.c +++ b/gcc/testsuite/gcc.target/aarch64/return_address_sign_1.c @@ -1,6 +1,7 @@ /* Testing return address signing where no combined instructions used. 
*/ /* { dg-do compile } */ /* { dg-options "-O2 -msign-return-address=all" } */ +/* { dg-require-effective-target lp64 } */ int foo (int); diff --git a/gcc/testsuite/gcc.target/aarch64/return_address_sign_2.c b/gcc/testsuite/gcc.target/aarch64/return_address_sign_2.c index 54fe47a69723d182c65941ddb73e2f1a5aa0af84..c5c1439b92e6637f85c47c6161cd797c0d68df25 100644 --- a/gcc/testsuite/gcc.target/aarch64/return_address_sign_2.c +++ b/gcc/testsuite/gcc.target/aarch64/return_address_sign_2.c @@ -1,6 +1,7 @@ /* Testing return address signing where combined instructions used. */ /* { dg-do compile } */ /* { dg-options "-O2 -msign-return-address=all" } */ +/* { dg-require-effective-target lp64 } */ int foo (int); int bar (int, int); diff --git a/gcc/testsuite/gcc.target/aarch64/return_address_sign_3.c b/gcc/testsuite/gcc.target/aarch64/return_address_sign_3.c index adc5effdded8900b2dfb68459883d399ebd91ac8..7d9ec6eebd1ce452013d2895a551671c59e98f0c 100644 --- a/gcc/testsuite/gcc.target/aarch64/return_address_sign_3.c +++ b/gcc/testsuite/gcc.target/aarch64/return_address_sign_3.c @@ -1,6 +1,7 @@ /* Testing the disable of return address signing. */ /* { dg-do compile } */ /* { dg-options "-O2 -msign-retu
Re: [1/5][AArch64] Return address protection on AArch64
On 20/01/17 11:15, Jiong Wang wrote: On 20/01/17 03:39, Andrew Pinski wrote: On Fri, Jan 6, 2017 at 3:47 AM, Jiong Wang wrote: On 11/11/16 18:22, Jiong Wang wrote: As described in the cover letter, this patch implements return address signing for AArch64, it's controlled by the new option: -msign-return-address=[none | non-leaf | all] "none" means don't do return address signing at all on any function. "non-leaf" means only sign non-leaf function. "all" means sign all functions. Return address signing is currently disabled on ILP32. I haven't tested it. The instructions added in the architecture are of 2 kinds. * In the NOP instruction space, which allows binaries to run without any traps on older versions of the architecture. This doesn't give any additional protection on older hardware but allows for the same binary to be used on earlier versions of the architecture and newer versions of the architecture. * New instructions that are only valid for v8.3 and will trap if used on earlier versions of the architecture. At default, once return address signing is enabled, it will only generates NOP instruction. While if -march=armv8.3-a specified, GCC will try to use the most efficient pointer authentication instruction as it can. The architecture has 2 user invisible system keys for signing and creating signed addresses as part of these instructions. For some use case, the user might want to use difference key for different functions. The new option "-msign-return-address-key=key_name" let GCC select the key used for return address signing. Permissible values are "a_key" for A key and "b_key" for B key, and this option are supported by function target attribute and LTO will hopefully just work. gcc/ 2016-11-09 Jiong Wang * config/aarch64/aarch64-opts.h (aarch64_pauth_key_index): New enum. (aarch64_function_type): New enum. * config/aarch64/aarch64-protos.h (aarch64_output_sign_auth_reg): New declaration. * config/aarch64/aarch64.c (aarch64_expand_prologue): Sign return address before it's pushed onto stack. (aarch64_expand_epilogue): Authenticate return address fetched from stack. (aarch64_output_sign_auth_reg): New function. (aarch64_override_options): Sanity check for ILP32 and ISA level. (aarch64_attributes): New function attributes for "sign-return-address", "pauth-key". * config/aarch64/aarch64.md (UNSPEC_AUTH_REG, UNSPEC_AUTH_REG1716, UNSPEC_SIGN_REG, UNSPEC_SIGN_REG1716, UNSPEC_STRIP_REG_SIGN, UNSPEC_STRIP_X30_SIGN): New unspecs. ("*do_return"): Generate combined instructions according to key index. ("sign_reg", "sign_reg1716", "auth_reg", "auth_reg1716", "strip_reg_sign", "strip_lr_sign"): New. * config/aarch64/aarch64.opt (msign-return-address, mpauth-key): New. * config/aarch64/predicates.md (aarch64_const0_const1): New predicate. * doc/extend.texi (AArch64 Function Attributes): Documents "sign-return-address=", "pauth-key". * doc/invoke.texi (AArch64 Options): Documents "-msign-return-address=", "-pauth-key". gcc/testsuite/ 2016-11-09 Jiong Wang * gcc.target/aarch64/return_address_sign_1.c: New testcase. * gcc.target/aarch64/return_address_sign_scope_1.c: New testcase. Update the patchset according to new DWARF proposal described at https://gcc.gnu.org/ml/gcc-patches/2016-11/msg03010.html One of these patches of this patch set break ILP32 building for aarch64-elf and most likely also aarch64-linux-gnu. 
/home/jenkins/workspace/BuildToolchainAARCH64_thunder_elf_upstream/toolchain/scripts/../src/libgcc/unwind-dw2.c: In function ‘uw_init_context_1’: /home/jenkins/workspace/BuildToolchainAARCH64_thunder_elf_upstream/toolchain/scripts/../src/libgcc/unwind-dw2.c:1567:6: internal compiler error: in emit_move_insn, at expr.c:3698 ra = MD_POST_EXTRACT_ROOT_ADDR (ra); 0x8270cf emit_move_insn(rtx_def*, rtx_def*) /home/jenkins/workspace/BuildToolchainAARCH64_thunder_elf_upstream/toolchain/scripts/../src/gcc/expr.c:3697 0x80867b force_reg(machine_mode, rtx_def*) Must be the Pmode issue under ILP32, I am testing a fix (I don't have full ILP32 environment, so can only test simply by force libgcc build with -mabi=ilp32) Here is the patch. For XPACLRI builtin which drops the signature in a pointer, it's prototype is "void *foo (void *)" FOR PAC/AUT builtin which sign or authenticate a pointer, it's prototype is "void *foo (void *, uint64)". This patch adjusted those modes to make sure they strictly follow the C prototype.
Re: [Ping~]Re: [5/5][libgcc] Runtime support for AArch64 return address signing (needs new target macros)
On 20/01/17 10:30, Christophe Lyon wrote: error: 'DWARF_REGNUM_AARCH64_RA_STATE' undeclared (first use in this function) fs->regs.reg[DWARF_REGNUM_AARCH64_RA_STATE].loc.offset ^= 1; ^ Hi Christophe, could you please confirm you svn revision please? I do have done bootstrap and regression on both x86 and aarch64 before commit this patch. I had forgotten to "svn add" one header file, but add it later. The failures started with r244673, and are still present with r244687. When did you add the missing file? It was r244674, https://gcc.gnu.org/ml/gcc-cvs/2017-01/msg00689.html, so should have been included in your code. The faliure looks strange to me then, I will svn up and re-start a fresh bootstrap on AArch64. The file is present in my git clone. I'm not bootstrapping on AArch64, I'm building a cross-compiler on x86_64, but it shouldn't matter. Hi Christophe, Thanks, I reproduced this on cross linux environment, the reason is the header file is not included because of the inhabit_libc guard, while the unwinder header file should always be included. I will committed the attached patch as obvious, once I finished a fresh bootstrap, cross elf, cross linux. Thanks. libgcc/ 2017-01-20 Jiong Wang * config/aarch64/linux-unwind.h: Always include aarch64-unwind.h. diff --git a/libgcc/config/aarch64/linux-unwind.h b/libgcc/config/aarch64/linux-unwind.h index a8fa1d5..70e5a8a 100644 --- a/libgcc/config/aarch64/linux-unwind.h +++ b/libgcc/config/aarch64/linux-unwind.h @@ -20,11 +20,13 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see <http://www.gnu.org/licenses/>. */ +/* Always include AArch64 unwinder header file. */ +#include "config/aarch64/aarch64-unwind.h" + #ifndef inhibit_libc #include #include -#include "config/aarch64/aarch64-unwind.h" /* Since insns are always stored LE, on a BE system the opcodes will
Re: [1/5][AArch64] Return address protection on AArch64
On 20/01/17 03:39, Andrew Pinski wrote: On Fri, Jan 6, 2017 at 3:47 AM, Jiong Wang wrote: On 11/11/16 18:22, Jiong Wang wrote: As described in the cover letter, this patch implements return address signing for AArch64, it's controlled by the new option: -msign-return-address=[none | non-leaf | all] "none" means don't do return address signing at all on any function. "non-leaf" means only sign non-leaf function. "all" means sign all functions. Return address signing is currently disabled on ILP32. I haven't tested it. The instructions added in the architecture are of 2 kinds. * In the NOP instruction space, which allows binaries to run without any traps on older versions of the architecture. This doesn't give any additional protection on older hardware but allows for the same binary to be used on earlier versions of the architecture and newer versions of the architecture. * New instructions that are only valid for v8.3 and will trap if used on earlier versions of the architecture. At default, once return address signing is enabled, it will only generates NOP instruction. While if -march=armv8.3-a specified, GCC will try to use the most efficient pointer authentication instruction as it can. The architecture has 2 user invisible system keys for signing and creating signed addresses as part of these instructions. For some use case, the user might want to use difference key for different functions. The new option "-msign-return-address-key=key_name" let GCC select the key used for return address signing. Permissible values are "a_key" for A key and "b_key" for B key, and this option are supported by function target attribute and LTO will hopefully just work. gcc/ 2016-11-09 Jiong Wang * config/aarch64/aarch64-opts.h (aarch64_pauth_key_index): New enum. (aarch64_function_type): New enum. * config/aarch64/aarch64-protos.h (aarch64_output_sign_auth_reg): New declaration. * config/aarch64/aarch64.c (aarch64_expand_prologue): Sign return address before it's pushed onto stack. (aarch64_expand_epilogue): Authenticate return address fetched from stack. (aarch64_output_sign_auth_reg): New function. (aarch64_override_options): Sanity check for ILP32 and ISA level. (aarch64_attributes): New function attributes for "sign-return-address", "pauth-key". * config/aarch64/aarch64.md (UNSPEC_AUTH_REG, UNSPEC_AUTH_REG1716, UNSPEC_SIGN_REG, UNSPEC_SIGN_REG1716, UNSPEC_STRIP_REG_SIGN, UNSPEC_STRIP_X30_SIGN): New unspecs. ("*do_return"): Generate combined instructions according to key index. ("sign_reg", "sign_reg1716", "auth_reg", "auth_reg1716", "strip_reg_sign", "strip_lr_sign"): New. * config/aarch64/aarch64.opt (msign-return-address, mpauth-key): New. * config/aarch64/predicates.md (aarch64_const0_const1): New predicate. * doc/extend.texi (AArch64 Function Attributes): Documents "sign-return-address=", "pauth-key". * doc/invoke.texi (AArch64 Options): Documents "-msign-return-address=", "-pauth-key". gcc/testsuite/ 2016-11-09 Jiong Wang * gcc.target/aarch64/return_address_sign_1.c: New testcase. * gcc.target/aarch64/return_address_sign_scope_1.c: New testcase. Update the patchset according to new DWARF proposal described at https://gcc.gnu.org/ml/gcc-patches/2016-11/msg03010.html One of these patches of this patch set break ILP32 building for aarch64-elf and most likely also aarch64-linux-gnu. 
/home/jenkins/workspace/BuildToolchainAARCH64_thunder_elf_upstream/toolchain/scripts/../src/libgcc/unwind-dw2.c: In function ‘uw_init_context_1’: /home/jenkins/workspace/BuildToolchainAARCH64_thunder_elf_upstream/toolchain/scripts/../src/libgcc/unwind-dw2.c:1567:6: internal compiler error: in emit_move_insn, at expr.c:3698 ra = MD_POST_EXTRACT_ROOT_ADDR (ra); 0x8270cf emit_move_insn(rtx_def*, rtx_def*) /home/jenkins/workspace/BuildToolchainAARCH64_thunder_elf_upstream/toolchain/scripts/../src/gcc/expr.c:3697 0x80867b force_reg(machine_mode, rtx_def*) Must be the Pmode issue under ILP32, I am testing a fix (I don't have full ILP32 environment, so can only test simply by force libgcc build with -mabi=ilp32) Thanks, Andrew While A key support for return address signing using DW_CFA_GNU_window_save only needs simple modifications on code and associated DWARF generation, B key support is complexer, it needs multiple CIE support in GCC and Binutils, so currently we fall back to DWARF value expression which fully works although requires longer encodings. Value expression also requires a few changes on AArch64 prologu
Re: [Ping~]Re: [5/5][libgcc] Runtime support for AArch64 return address signing (needs new target macros)
On 20/01/17 10:11, Christophe Lyon wrote: /tmp/8132498_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-dw2.c: In function 'execute_cfa_program': /tmp/8132498_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-dw2.c:1193:17: error: 'DWARF_REGNUM_AARCH64_RA_STATE' undeclared (first use in this function) fs->regs.reg[DWARF_REGNUM_AARCH64_RA_STATE].loc.offset ^= 1; ^ Hi Christophe, could you please confirm you svn revision please? I do have done bootstrap and regression on both x86 and aarch64 before commit this patch. I had forgotten to "svn add" one header file, but add it later. The failures started with r244673, and are still present with r244687. When did you add the missing file? It was r244674, https://gcc.gnu.org/ml/gcc-cvs/2017-01/msg00689.html, so should have been included in your code. The faliure looks strange to me then, I will svn up and re-start a fresh bootstrap on AArch64. Thanks. Christophe
Re: [Ping~]Re: [5/5][libgcc] Runtime support for AArch64 return address signing (needs new target macros)
On 20/01/17 08:41, Christophe Lyon wrote: Hi Jiong, On 19 January 2017 at 15:46, Jiong Wang wrote: Thanks for the review. On 19/01/17 14:18, Richard Earnshaw (lists) wrote: diff --git a/libgcc/unwind-dw2.c b/libgcc/unwind-dw2.c index 8085a42ace15d53f4cb0c6681717012d906a6d47..cf640135275deb76b820f8209fa51eacfd64c4a2 100644 --- a/libgcc/unwind-dw2.c +++ b/libgcc/unwind-dw2.c @@ -136,6 +136,8 @@ struct _Unwind_Context #define SIGNAL_FRAME_BIT ((~(_Unwind_Word) 0 >> 1) + 1) /* Context which has version/args_size/by_value fields. */ #define EXTENDED_CONTEXT_BIT ((~(_Unwind_Word) 0 >> 2) + 1) + /* Bit reserved on AArch64, return address has been signed with A key. */ +#define RA_A_SIGNED_BIT ((~(_Unwind_Word) 0 >> 3) + 1) Why is this here? It appears to only be used within the AArch64-specific header file. I was putting it here so that when we allocate the next general purpose bit, we know clearly that bit 3 is allocated to AArch64 already, and the new general bit needs to go to the next one. This can avoid bit collision. ... +/* Frob exception handler's address kept in TARGET before installing into + CURRENT context. */ + +static void * +uw_frob_return_addr (struct _Unwind_Context *current, + struct _Unwind_Context *target) +{ + void *ret_addr = __builtin_frob_return_addr (target->ra); +#ifdef MD_POST_FROB_EH_HANDLER_ADDR + ret_addr = MD_POST_FROB_EH_HANDLER_ADDR (current, target, ret_addr); +#endif + return ret_addr; +} + I think this function should be marked inline. The optimizers would probably inline it anyway, but it seems wrong for us to rely on that. Thanks, fixed. Does the updated patch looks OK to you know? libgcc/ 2017-01-19 Jiong Wang * config/aarch64/aarch64-unwind.h: New file. (DWARF_REGNUM_AARCH64_RA_STATE): Define. (MD_POST_EXTRACT_ROOT_ADDR): Define. (MD_POST_EXTRACT_FRAME_ADDR): Define. (MD_POST_FROB_EH_HANDLER_ADDR): Define. (MD_FROB_UPDATE_CONTEXT): Define. (aarch64_post_extract_frame_addr): New function. (aarch64_post_frob_eh_handler_addr): New function. (aarch64_frob_update_context): New function. * config/aarch64/linux-unwind.h: Include aarch64-unwind.h * config.host (aarch64*-*-elf, aarch64*-*-rtems*, aarch64*-*-freebsd*): Initialize md_unwind_header to include aarch64-unwind.h. * unwind-dw2.c (struct _Unwind_Context): Define "RA_A_SIGNED_BIT". (execute_cfa_program): Multiplex DW_CFA_GNU_window_save for __aarch64__. (uw_update_context): Honor MD_POST_EXTRACT_FRAME_ADDR. (uw_init_context_1): Honor MD_POST_EXTRACT_ROOT_ADDR. (uw_frob_return_addr): New function. (_Unwind_DebugHook): Use uw_frob_return_addr. Since you committed this (r244673), GCC fails to build for AArch64: /tmp/8132498_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-dw2.c: In function 'execute_cfa_program': /tmp/8132498_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-dw2.c:1193:17: error: 'DWARF_REGNUM_AARCH64_RA_STATE' undeclared (first use in this function) fs->regs.reg[DWARF_REGNUM_AARCH64_RA_STATE].loc.offset ^= 1; ^ Hi Christophe, could you please confirm you svn revision please? I do have done bootstrap and regression on both x86 and aarch64 before commit this patch. I had forgotten to "svn add" one header file, but add it later. Thanks. Christophe
Re: [Ping~]Re: [5/5][libgcc] Runtime support for AArch64 return address signing (needs new target macros)
Thanks for the review. On 19/01/17 14:18, Richard Earnshaw (lists) wrote: diff --git a/libgcc/unwind-dw2.c b/libgcc/unwind-dw2.c index 8085a42ace15d53f4cb0c6681717012d906a6d47..cf640135275deb76b820f8209fa51eacfd64c4a2 100644 --- a/libgcc/unwind-dw2.c +++ b/libgcc/unwind-dw2.c @@ -136,6 +136,8 @@ struct _Unwind_Context #define SIGNAL_FRAME_BIT ((~(_Unwind_Word) 0 >> 1) + 1) /* Context which has version/args_size/by_value fields. */ #define EXTENDED_CONTEXT_BIT ((~(_Unwind_Word) 0 >> 2) + 1) + /* Bit reserved on AArch64, return address has been signed with A key. */ +#define RA_A_SIGNED_BIT ((~(_Unwind_Word) 0 >> 3) + 1) Why is this here? It appears to only be used within the AArch64-specific header file. I was putting it here so that when we allocate the next general purpose bit, we know clearly that bit 3 is allocated to AArch64 already, and the new general bit needs to go to the next one. This can avoid bit collision. ... +/* Frob exception handler's address kept in TARGET before installing into + CURRENT context. */ + +static void * +uw_frob_return_addr (struct _Unwind_Context *current, + struct _Unwind_Context *target) +{ + void *ret_addr = __builtin_frob_return_addr (target->ra); +#ifdef MD_POST_FROB_EH_HANDLER_ADDR + ret_addr = MD_POST_FROB_EH_HANDLER_ADDR (current, target, ret_addr); +#endif + return ret_addr; +} + I think this function should be marked inline. The optimizers would probably inline it anyway, but it seems wrong for us to rely on that. Thanks, fixed. Does the updated patch looks OK to you know? libgcc/ 2017-01-19 Jiong Wang * config/aarch64/aarch64-unwind.h: New file. (DWARF_REGNUM_AARCH64_RA_STATE): Define. (MD_POST_EXTRACT_ROOT_ADDR): Define. (MD_POST_EXTRACT_FRAME_ADDR): Define. (MD_POST_FROB_EH_HANDLER_ADDR): Define. (MD_FROB_UPDATE_CONTEXT): Define. (aarch64_post_extract_frame_addr): New function. (aarch64_post_frob_eh_handler_addr): New function. (aarch64_frob_update_context): New function. * config/aarch64/linux-unwind.h: Include aarch64-unwind.h * config.host (aarch64*-*-elf, aarch64*-*-rtems*, aarch64*-*-freebsd*): Initialize md_unwind_header to include aarch64-unwind.h. * unwind-dw2.c (struct _Unwind_Context): Define "RA_A_SIGNED_BIT". (execute_cfa_program): Multiplex DW_CFA_GNU_window_save for __aarch64__. (uw_update_context): Honor MD_POST_EXTRACT_FRAME_ADDR. (uw_init_context_1): Honor MD_POST_EXTRACT_ROOT_ADDR. (uw_frob_return_addr): New function. (_Unwind_DebugHook): Use uw_frob_return_addr. diff --git a/libgcc/config.host b/libgcc/config.host index 6f2e458e74e776a6b7a310919558bcca76389232..540bfa9635802adabb36a2d1b7cf3416462c59f3 100644 --- a/libgcc/config.host +++ b/libgcc/config.host @@ -331,11 +331,13 @@ aarch64*-*-elf | aarch64*-*-rtems*) extra_parts="$extra_parts crtfastmath.o" tmake_file="${tmake_file} ${cpu_type}/t-aarch64" tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm" + md_unwind_header=aarch64/aarch64-unwind.h ;; aarch64*-*-freebsd*) extra_parts="$extra_parts crtfastmath.o" tmake_file="${tmake_file} ${cpu_type}/t-aarch64" tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm" + md_unwind_header=aarch64/aarch64-unwind.h ;; aarch64*-*-linux*) extra_parts="$extra_parts crtfastmath.o" diff --git a/libgcc/config/aarch64/aarch64-unwind.h b/libgcc/config/aarch64/aarch64-unwind.h new file mode 100644 index ..a43d965b358f3e830b85fc42c7bceacf7d41a671 --- /dev/null +++ b/libgcc/config/aarch64/aarch64-unwind.h @@ -0,0 +1,87 @@ +/* Copyright (C) 2017 Free Software Foundation, Inc. + Contributed by ARM Ltd. 
+ +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +<http://www.gnu.org/licenses/>. */ + +#ifndef AARCH64_UNWIND_H +#define AARCH64_UNWIND_H + +#define DWARF_REGNUM_AARCH64_RA_STATE 34 + +#define MD_POST_EXTRACT_ROOT_ADDR(addr) __builtin_aarch64_xpaclri
[AArch64] Accelerate -fstack-protector through pointer authentication extension
NOTE, this approach however requires DWARF change as the original LR is signed, the binary needs new libgcc to make sure c++ eh works correctly. Given this acceleration already needs the user specify -mstack-protector-dialect=pauth which means the target platform largely should have install new libgcc, otherwise you can't utilize new pointer authentication features. gcc/ 2016-11-11 Jiong Wang * config/aarch64/aarch64-opts.h (aarch64_stack_protector_type): New enum. (aarch64_layout_frame): Swap callees and locals when -mstack-protector-dialect=pauth specified. (aarch64_expand_prologue): Use AARCH64_PAUTH_SSP_OR_RA_SIGN instead of AARCH64_ENABLE_RETURN_ADDRESS_SIGN. (aarch64_expand_epilogue): Likewise. * config/aarch64/aarch64.md (*do_return): Likewise. (aarch64_override_options): Sanity check for ILP32 and TARGET_PAUTH. * config/aarch64/aarch64.h (AARCH64_PAUTH_SSP_OPTION, AARCH64_PAUTH_SSP, AARCH64_PAUTH_SSP_OR_RA_SIGN, LINK_SSP_SPEC): New defines. * config/aarch64/aarch64.opt (-mstack-protector-dialect=): New option. * doc/invoke.texi (AArch64 Options): Documents -mstack-protector-dialect=. Patch updated to migrate to TARGET_STACK_PROTECT_RUNTIME_ENABLED_P. aarch64 cross check OK with the following options enabled on all testcases. -fstack-protector-all -mstack-protector-pauth OK for trunk? gcc/ 2017-01-18 Jiong Wang * config/aarch64/aarch64-protos.h (aarch64_pauth_stack_protector_enabled): New declaration. * config/aarch64/aarch64.c (aarch64_layout_frame): Swap callee-save area and locals area when aarch64_pauth_stack_protector_enabled returns true. (aarch64_stack_protect_runtime_enabled): New function. (aarch64_pauth_stack_protector_enabled): New function. (aarch64_return_address_signing_enabled): Enabled by aarch64_pauth_stack_protector_enabled. (aarch64_override_options): Sanity check for -mstack-protector-pauth. (TARGET_STACK_PROTECT_RUNTIME_ENABLED_P): Define. * config/aarch64/aarch64.h (LINK_SSP_SPEC): Likewise. * config/aarch64/aarch64.opt (-mstack-protector-pauth): New option. * doc/invoke.texi (AArch64 Options): Documents -mstack-protector-pauth. gcc/testsuite/ * gcc.target/aarch64/stack_protector_1.c: New test. 
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 632dd4768d82c340ae4e9b4a93206743756c06e7..a3ad623eef498d00b52d24bf02a5748fad576c3d 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -383,6 +383,7 @@ void aarch64_init_cumulative_args (CUMULATIVE_ARGS *, const_tree, rtx, void aarch64_init_expanders (void); void aarch64_init_simd_builtins (void); void aarch64_emit_call_insn (rtx); +bool aarch64_pauth_stack_protector_enabled (void); void aarch64_register_pragmas (void); void aarch64_relayout_simd_types (void); void aarch64_reset_previous_fndecl (void); diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 3718ad1b3bf27c6bdb9e74831fd660e617cccbde..dd742d37ab6fc6fb5085e1c6b5d86d5ce1ce5f8a 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -958,4 +958,11 @@ extern const char *host_detect_local_cpu (int argc, const char **argv); extern tree aarch64_fp16_type_node; extern tree aarch64_fp16_ptr_type_node; +#ifndef TARGET_LIBC_PROVIDES_SSP +#define LINK_SSP_SPEC "%{!mstack-protector-pauth:\ + %{fstack-protector|fstack-protector-all\ + |fstack-protector-strong|fstack-protector-explicit:\ + -lssp_nonshared -lssp}}" +#endif + #endif /* GCC_AARCH64_H */ diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 6451b08191cf1a44aed502930da8603111f6e8ca..461f7b59584af9315accaecc0256abc9a2df4350 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -2884,8 +2884,28 @@ aarch64_layout_frame (void) else if (cfun->machine->frame.wb_candidate1 != INVALID_REGNUM) max_push_offset = 256; - if (cfun->machine->frame.frame_size < max_push_offset - && crtl->outgoing_args_size == 0) + /* Swap callee-save and local variables area to make callee-save which + includes return address register X30/LR position above local variables + that any local buffer overflow will override return address. */ + if (aarch64_pauth_stack_protector_enabled ()) +{ + if (varargs_and_saved_regs_size < max_push_offset) + /* stp reg1, reg2, [sp, -varargs_and_saved_regs_size]!. */ + cfun->machine->frame.callee_adjust = varargs_and_saved_regs_size; + else + /* sub sp, sp, varargs_and_saved_regs_size. */ + cfun->machine->frame.initial_adjust = varargs_and_saved_regs_si
[Ping~]Re: [5/5][libgcc] Runtime support for AArch64 return address signing (needs new target macros)
On 12/01/17 18:10, Jiong Wang wrote: On 06/01/17 11:47, Jiong Wang wrote: This is the update on libgcc unwinder support according to new DWARF proposal. As Joseph commented, duplication of unwind-dw2.c is not encouraged in libgcc, But from this patch, you can see there are a few places we need to modify for AArch64 in unwind-aarch64.c, so the file duplication approach is acceptable? libgcc/ 2017-01-06 Jiong Wang * config/aarch64/unwind-aarch64.c (DWARF_REGNUM_AARCH64_RA_STATE, RA_A_SIGNED_BIT): New macros. (execute_cfa_program): Multiplex DW_CFA_GNU_window_save on AArch64. (uw_frame_state_for): Clear bit[0] of DWARF_REGNUM_AARCH64_RA_STATE. (uw_update_context): Authenticate return address according to DWARF_REGNUM_AARCH64_RA_STATE. (uw_init_context_1): Strip signature of seed address. (uw_install_context): Re-authenticate EH handler's address. Ping~ For comparision, I have also attached the patch using the target macros. Four new target macros are introduced: MD_POST_EXTRACT_ROOT_ADDR MD_POST_EXTRACT_FRAME_ADDR MD_POST_FROB_EH_HANDLER_ADDR MD_POST_INIT_CONTEXT MD_POST_EXTRACT_ROOT_ADDR is to do target private post processing on the address inside _Unwind* functions, they are serving as root address to start the unwinding. MD_POST_EXTRACT_FRAME_ADDR is to do target private post processing on th address inside the real user program which throws the exceptions. MD_POST_FROB_EH_HANDLER_ADDR is to do target private frob on the EH handler's address before we install it into current context. MD_POST_INIT_CONTEXT it to do target private initialization on the context structure after common initialization. One "__aarch64__" macro check is needed to multiplex DW_CFA_window_save. Ping ~ Could global reviewers or libgcc maintainers please give a review on the generic part change? One small change is I removed MD_POST_INIT_CONTEXT as I found there is MD_FROB_UPDATE_CONTEXT which serve the same purpose. I still need to define MD_POST_EXTRACT_ROOT_ADDR MD_POST_EXTRACT_FRAME_ADDR MD_POST_FROB_EH_HANDLER_ADDR And do one __aarch64__ check to multiplexing DW_CFA_GNU_window_save. Thanks. libgcc/ChangeLog: 2017-01-18 Jiong Wang * config/aarch64/aarch64-unwind.h: New file. (DWARF_REGNUM_AARCH64_RA_STATE): Define. (MD_POST_EXTRACT_ROOT_ADDR): Define. (MD_POST_EXTRACT_FRAME_ADDR): Define. (MD_POST_FROB_EH_HANDLER_ADDR): Define. (MD_FROB_UPDATE_CONTEXT): Define. (aarch64_post_extract_frame_addr): New function. (aarch64_post_frob_eh_handler_addr): New function. (aarch64_frob_update_context): New function. * config/aarch64/linux-unwind.h: Include aarch64-unwind.h * config.host (aarch64*-*-elf, aarch64*-*-rtems*, aarch64*-*-freebsd*): Initialize md_unwind_header to include aarch64-unwind.h. * unwind-dw2.c (struct _Unwind_Context): Define "RA_A_SIGNED_BIT". (execute_cfa_program): Multiplex DW_CFA_GNU_window_save for __aarch64__. (uw_update_context): Honor MD_POST_EXTRACT_FRAME_ADDR. (uw_init_context_1): Honor MD_POST_EXTRACT_ROOT_ADDR. (uw_frob_return_addr): New function. (_Unwind_DebugHook): Use uw_frob_return_addr. diff --git a/libgcc/unwind-dw2.c b/libgcc/unwind-dw2.c index 8085a42ace15d53f4cb0c6681717012d906a6d47..cf640135275deb76b820f8209fa51eacfd64c4a2 100644 --- a/libgcc/unwind-dw2.c +++ b/libgcc/unwind-dw2.c @@ -136,6 +136,8 @@ struct _Unwind_Context #define SIGNAL_FRAME_BIT ((~(_Unwind_Word) 0 >> 1) + 1) /* Context which has version/args_size/by_value fields. */ #define EXTENDED_CONTEXT_BIT ((~(_Unwind_Word) 0 >> 2) + 1) + /* Bit reserved on AArch64, return address has been signed with A key. 
*/ +#define RA_A_SIGNED_BIT ((~(_Unwind_Word) 0 >> 3) + 1) _Unwind_Word flags; /* 0 for now, can be increased when further fields are added to struct _Unwind_Context. */ @@ -1185,6 +1187,11 @@ execute_cfa_program (const unsigned char *insn_ptr, break; case DW_CFA_GNU_window_save: +#ifdef __aarch64__ + /* This CFA is multiplexed with Sparc. On AArch64 it's used to toggle + return address signing status. */ + fs->regs.reg[DWARF_REGNUM_AARCH64_RA_STATE].loc.offset ^= 1; +#else /* ??? Hardcoded for SPARC register window configuration. */ if (__LIBGCC_DWARF_FRAME_REGISTERS__ >= 32) for (reg = 16; reg < 32; ++reg) @@ -1192,6 +1199,7 @@ execute_cfa_program (const unsigned char *insn_ptr, fs->regs.reg[reg].how = REG_SAVED_OFFSET; fs->regs.reg[reg].loc.offset = (reg - 16) * sizeof (void *); } +#endif break; case DW_CFA_GNU_args_size: @@ -1513,10 +1521,15 @@ uw_update_context (struct _Unwind_Context *context, _Unwind_FrameState *fs) stack frame. */ context->ra = 0; else -/* Compute the return address now, since the return
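The uw_update_context hunk above is cut short. As a minimal sketch of the authentication step it describes, assuming the helper name from the ChangeLog (aarch64_post_extract_frame_addr) and the A-key builtin from patch 3/5 -- this is an illustration, not the committed hunk:

static inline void *
aarch64_post_extract_frame_addr (struct _Unwind_Context *context,
				 _Unwind_FrameState *fs, void *addr)
{
  /* Bit 0 of the RA-state column is toggled by each
     DW_CFA_GNU_window_save, so an odd offset means the return address
     is still signed at the current unwind point.  */
  if (fs->regs.reg[DWARF_REGNUM_AARCH64_RA_STATE].loc.offset & 0x1)
    /* A-key authentication, using the frame's CFA as the salt.  */
    return __builtin_aarch64_autia1716 (addr, (_Unwind_Word) context->cfa);
  return addr;
}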
Re: [2/5][DWARF] Generate dwarf information for -msign-return-address by introducing new DWARF mapping hook
On 17/01/17 13:57, Richard Earnshaw (lists) wrote: On 16/01/17 14:29, Jiong Wang wrote: I can see the reason for doing this is if you want to seperate the interpretion of GCC CFA reg-note and the final DWARF CFA operation. My understanding is all reg notes defined in gcc/reg-note.def should have general meaning, even the CFA_WINDOW_SAVE. For those which are architecture specific we might need a mechanism to define them in backend only. For general reg-notes in gcc/reg-note.def, they are not always have the corresponding standard DWARF CFA operation, for example CFA_WINDOW_SAVE, therefore if we want to achieve what you described, I think we also need to define a new target hook which maps a GCC CFA reg-note into architecture DWARF CFA operation. Regards, Jiong Here is the patch. Hmm, I really wasn't expecting any more than something like the following in dwarf2cfi.c: @@ -2098,7 +2098,9 @@ dwarf2out_frame_debug (rtx_insn *insn) handled_one = true; break; + case REG_CFA_TOGGLE_RA_MANGLE: case REG_CFA_WINDOW_SAVE: + /* We overload both of these operations onto the same DWARF opcode. */ dwarf2out_frame_debug_cfa_window_save (); handled_one = true; break; This keeps the two reg notes separate within the compiler, but emits the same dwarf operation during final output. This avoids the need for new hooks or anything more complicated. This was my initial thoughts and the patch would be very small as you've demonstrated. I later moved to this complexer patch as I am thinking it's better to completely treat notes in reg-notes.def as having generic meaning and maps them to standard DWARF CFA if there is, otherwise maps them to target private DWARF CFA through this new hook. This give other targets a chance to map, for example REG_CFA_TOGGLE_RA_MANGLE, to their architecture DWARF number. The introduction of new hook looks be very low risk in this stage, the only painful thing is the header file needs to be reorganized as we need to use some DWARF type and reg-note type in targhooks.c. Anyway, if the new hook patch is too heavy, I have attached the the simplified version which simply defines the new REG_CFA_TOGGLE_RA_MANGLE and maps to same code of REG_CFA_WINDOW_SAVE. gcc/ 2017-01-17 Jiong Wang * reg-notes.def (CFA_TOGGLE_RA_MANGLE): New reg-note. * combine-stack-adj.c (no_unhandled_cfa): Handle REG_CFA_TOGGLE_RA_MANGLE. * dwarf2cfi.c (dwarf2out_frame_debug): Handle REG_CFA_TOGGLE_RA_MANGLE. * config/aarch64/aarch64.c (aarch64_expand_prologue): Generates DWARF info for return address signing. (aarch64_expand_epilogue): Likewise. diff --git a/gcc/combine-stack-adj.c b/gcc/combine-stack-adj.c index 20cd59ad08329e9f4f834bfc01d6f9ccc4485283..9ec14a3e44363f35f6419c38233ce5eebddd3458 100644 --- a/gcc/combine-stack-adj.c +++ b/gcc/combine-stack-adj.c @@ -208,6 +208,7 @@ no_unhandled_cfa (rtx_insn *insn) case REG_CFA_SET_VDRAP: case REG_CFA_WINDOW_SAVE: case REG_CFA_FLUSH_QUEUE: + case REG_CFA_TOGGLE_RA_MANGLE: return false; } diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 3bcad76b68b6ea7c9d75d150d79c45fb74d6bf0d..6451b08191cf1a44aed502930da8603111f6e8ca 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -3553,7 +3553,11 @@ aarch64_expand_prologue (void) /* Sign return address for functions. 
*/ if (aarch64_return_address_signing_enabled ()) -emit_insn (gen_pacisp ()); +{ + insn = emit_insn (gen_pacisp ()); + add_reg_note (insn, REG_CFA_TOGGLE_RA_MANGLE, const0_rtx); + RTX_FRAME_RELATED_P (insn) = 1; +} if (flag_stack_usage_info) current_function_static_stack_size = frame_size; @@ -3707,7 +3711,11 @@ aarch64_expand_epilogue (bool for_sibcall) */ if (aarch64_return_address_signing_enabled () && (for_sibcall || !TARGET_ARMV8_3 || crtl->calls_eh_return)) -emit_insn (gen_autisp ()); +{ + insn = emit_insn (gen_autisp ()); + add_reg_note (insn, REG_CFA_TOGGLE_RA_MANGLE, const0_rtx); + RTX_FRAME_RELATED_P (insn) = 1; +} /* Stack adjustment for exception handler. */ if (crtl->calls_eh_return) diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c index 2748e2fa48e4794181496b26df9b51b7e51e7b84..2a527c9fecab091dccb417492e5dbb2ade244be2 100644 --- a/gcc/dwarf2cfi.c +++ b/gcc/dwarf2cfi.c @@ -2098,7 +2098,9 @@ dwarf2out_frame_debug (rtx_insn *insn) handled_one = true; break; + case REG_CFA_TOGGLE_RA_MANGLE: case REG_CFA_WINDOW_SAVE: + /* We overload both of these operations onto the same DWARF opcode. */ dwarf2out_frame_debug_cfa_window_save (); handled_one = true; break; diff --git a/gcc/reg-notes.def b/gcc/reg-notes.def index ead4a9f58e8621288ee765e029c673640fdf38f4..175da119b6a534b04bd154f2c69dd087afd474ea 100644 --- a/gcc/reg-note
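The reg-notes.def hunk is truncated above. Following that file's conventions for CFA notes, the new entry presumably reduces to a sketch like this (the comment wording is an assumption):

/* Attached to insns which toggle the return address mangling (signing)
   state, e.g. AArch64 PACISP/AUTISP.  At final DWARF output it is
   emitted as the multiplexed DW_CFA_GNU_window_save opcode, as the
   dwarf2cfi.c hunk above shows.  */
REG_CFA_NOTE (CFA_TOGGLE_RA_MANGLE)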
Re: [2/5][DWARF] Generate dwarf information for -msign-return-address by introducing new DWARF mapping hook
On 13/01/17 18:02, Jiong Wang wrote: On 13/01/17 16:09, Richard Earnshaw (lists) wrote: On 06/01/17 11:47, Jiong Wang wrote: This patch is an update on DWARF generation for return address signing. According to new proposal, we simply needs to generate REG_CFA_WINDOW_SAVE annotation. gcc/ 2017-01-06 Jiong Wang * config/aarch64/aarch64.c (aarch64_expand_prologue): Generate dwarf annotation (REG_CFA_WINDOW_SAVE) for return address signing. (aarch64_expand_epilogue): Likewise. I don't think we should be overloading REG_CFA_WINDOW_SAVE internally in the compiler -- it's one thing to do it in the dwarf output tables, but quite another to be doing it elsewhere in the compiler. Instead we should create a new reg note kind and use that, but in the final dwarf output then emit the overloaded opcode. I can see the reason for doing this is if you want to seperate the interpretion of GCC CFA reg-note and the final DWARF CFA operation. My understanding is all reg notes defined in gcc/reg-note.def should have general meaning, even the CFA_WINDOW_SAVE. For those which are architecture specific we might need a mechanism to define them in backend only. For general reg-notes in gcc/reg-note.def, they are not always have the corresponding standard DWARF CFA operation, for example CFA_WINDOW_SAVE, therefore if we want to achieve what you described, I think we also need to define a new target hook which maps a GCC CFA reg-note into architecture DWARF CFA operation. Regards, Jiong Here is the patch. Introduced one new target hook TARGET_DWARF_MAP_REGNOTE_TO_CFA. The purpose is to allow GCC to map DWARF CFA reg notes in reg-note.def, which looks to me have generic meaning, into target private DWARF CFI if there is no standard DWARF CFI mapping. One new GCC reg-note REG_TOGGLE_RA_MANGLE introduced as well, currently, it's only used by AArch64 to implement return address signing and is mapped to AArch64's target private DWARF CFI. Does this approach and implementation looks OK? I can come up with seperate patches to define this hook on Sparc for CFA_WINDOW_SAVE, and to remove redundant including of dwarf2.h although there is "ifdef" protector in header file. The default hook implementation "default_dwarf_map_regnote_to_cfa" in targhooks.c used the types "enum reg_note" and "enum dwarf_call_frame_info" which is not included in coretypes.h thus this patch has several change in header files. I have done X86 bootstrap to make sure no build breakage. I'd appreciate there is better ideas to handle these type define. Thanks. gcc/ChangeLog: 2017-01-16 Jiong Wang * target.def (dwarf_map_regnote_to_cfa): New hook. * targhooks.c (default_dwarf_map_regnote_to_cfa): Default implementation for TARGET_DWARF_MAP_REGNOTE_TO_CFA. * targhooks.h (default_dwarf_map_regnote_to_cfa): New declaration. * rtl.h (enum reg_note): Move enum reg_note to... * coretypes.h: ... here. (dwarf2.h): New include file. * reg-notes.def (CFA_TOGGLE_RA_MANGLE): New reg-note. * combine-stack-adj.c (no_unhandled_cfa): Handle REG_CFA_TOGGLE_RA_MANGLE. * dwarf2cfi.c (dwarf2out_frame_debug_cfa_toggle_ra_mangle): New function. (dwarf2out_frame_debug): Handle REG_CFA_TOGGLE_RA_MANGLE. * doc/tm.texi: Regenerate. * doc/tm.texi.in: Documents TARGET_DWARF_MAP_REGNOTE_TO_CFA. * config/aarch64/aarch64.c (aarch64_map_regnote_to_cfa): Implements TARGET_DWARF_MAP_REGNOTE_TO_CFA. (aarch64_expand_prologue): Generate DWARF info for return address signing. (aarch64_expand_epilogue): Likewise. (TARGET_DWARF_MAP_REGNOTE_TO_CFA): Define. 
diff --git a/gcc/target.def b/gcc/target.def index 0443390..6aaa9e6 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -3995,6 +3995,14 @@ the CFI label attached to the insn, @var{pattern} is the pattern of\n\ the insn and @var{index} is @code{UNSPEC_INDEX} or @code{UNSPECV_INDEX}.", void, (const char *label, rtx pattern, int index), NULL) +/* This target hook allows the backend to map GCC DWARF CFA reg-note to + architecture specific DWARF call frame instruction. */ +DEFHOOK +(dwarf_map_regnote_to_cfa, + "Maps the incoming GCC DWARF CFA reg-note to architecture specific DWARF call\ + frame instruction.", + enum dwarf_call_frame_info, (enum reg_note), default_dwarf_map_regnote_to_cfa) + /* ??? Documenting this hook requires a GFDL license grant. */ DEFHOOK_UNDOC (stdarg_optimize_hook, diff --git a/gcc/targhooks.c b/gcc/targhooks.c index 2f2abd3..df07911 100644 --- a/gcc/targhooks.c +++ b/gcc/targhooks.c @@ -1711,6 +1711,17 @@ default_dwarf_frame_reg_mode (int regno) return save_mode; } +/* Determine the correct mode for a Dwarf frame register that represents + register REGNO. */ + +enum dwarf_call_frame_in
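The quoted diff stops before the AArch64 implementation of the new hook. Based on the hook signature above and the ChangeLog entry naming aarch64_map_regnote_to_cfa, a sketch of what such an implementation looks like (the body is illustrative, under the assumption that the default mapping covers every other note):

static enum dwarf_call_frame_info
aarch64_map_regnote_to_cfa (enum reg_note kind)
{
  /* REG_CFA_TOGGLE_RA_MANGLE has no standard CFA opcode; map it onto
     the vendor opcode multiplexed with Sparc's window save.  */
  if (kind == REG_CFA_TOGGLE_RA_MANGLE)
    return DW_CFA_GNU_window_save;

  /* Every other reg-note keeps its default mapping.  */
  return default_dwarf_map_regnote_to_cfa (kind);
}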
Re: [2/5][AArch64] Generate dwarf information for -msign-return-address
On 13/01/17 16:09, Richard Earnshaw (lists) wrote: On 06/01/17 11:47, Jiong Wang wrote: This patch is an update on DWARF generation for return address signing. According to the new proposal, we simply need to generate a REG_CFA_WINDOW_SAVE annotation. gcc/ 2017-01-06 Jiong Wang * config/aarch64/aarch64.c (aarch64_expand_prologue): Generate dwarf annotation (REG_CFA_WINDOW_SAVE) for return address signing. (aarch64_expand_epilogue): Likewise. I don't think we should be overloading REG_CFA_WINDOW_SAVE internally in the compiler -- it's one thing to do it in the dwarf output tables, but quite another to be doing it elsewhere in the compiler. Instead we should create a new reg note kind and use that, but in the final dwarf output then emit the overloaded opcode. I can see the reason for doing this if you want to separate the interpretation of the GCC CFA reg-note from the final DWARF CFA operation. My understanding is that all reg notes defined in gcc/reg-note.def should have a general meaning, even CFA_WINDOW_SAVE. For those which are architecture specific we might need a mechanism to define them in the backend only. General reg-notes in gcc/reg-note.def do not always have a corresponding standard DWARF CFA operation, for example CFA_WINDOW_SAVE; therefore, if we want to achieve what you described, I think we also need to define a new target hook which maps a GCC CFA reg-note onto an architecture DWARF CFA operation. Regards, Jiong
Re: [1/5][AArch64] Return address protection on AArch64
On 13/01/17 16:04, James Greenhalgh wrote: On Fri, Jan 06, 2017 at 11:47:07AM +, Jiong Wang wrote: On 11/11/16 18:22, Jiong Wang wrote: gcc/ 2017-01-06 Jiong Wang * config/aarch64/aarch64-opts.h (aarch64_function_type): New enum. * config/aarch64/aarch64-protos.h (aarch64_return_address_signing_enabled): New declaration. * config/aarch64/aarch64.c (aarch64_return_address_signing_enabled): New function. (aarch64_expand_prologue): Sign return address before it's pushed onto stack. (aarch64_expand_epilogue): Authenticate return address fetched from stack. (aarch64_override_options): Sanity check for ILP32 and ISA level. (aarch64_attributes): New function attributes for "sign-return-address". * config/aarch64/aarch64.md (UNSPEC_AUTI1716, UNSPEC_AUTISP, UNSPEC_PACI1716, UNSPEC_PACISP, UNSPEC_XPACLRI): New unspecs. ("*do_return"): Generate combined instructions according to key index. ("sp", " I have a few comments on this patch All fixed. New patch attached. gcc/ 2017-01-13 Jiong Wang * config/aarch64/aarch64-opts.h (aarch64_function_type): New enum. * config/aarch64/aarch64-protos.h (aarch64_return_address_signing_enabled): New declaration. * config/aarch64/aarch64.c (aarch64_return_address_signing_enabled): New function. (aarch64_expand_prologue): Sign return address before it's pushed onto stack. (aarch64_expand_epilogue): Authenticate return address fetched from stack. (aarch64_override_options): Sanity check for ILP32 and ISA level. (aarch64_attributes): New function attributes for "sign-return-address". * config/aarch64/aarch64.md (UNSPEC_AUTI1716, UNSPEC_AUTISP, UNSPEC_PACI1716, UNSPEC_PACISP, UNSPEC_XPACLRI): New unspecs. ("*do_return"): Generate combined instructions according to key index. ("sp", "calls_eh_return) + return "retaa"; + +return "ret"; + } [(set_attr "type" "branch")] ) @@ -5341,6 +5353,36 @@ [(set_attr "length" "0")] ) +;; Pointer authentication patterns are always provided. In architecture +;; revisions prior to ARMv8.3-A these HINT instructions operate as NOPs. +;; This lets the user write portable software which authenticates pointers +;; when run on something which implements ARMv8.3-A, and which runs +;; correctly, but does not authenticate pointers, where ARMv8.3-A is not +;; implemented. + +;; Signing/Authenticating R30 using SP as the salt. +(define_insn "sp" + [(set (reg:DI R30_REGNUM) + (unspec:DI [(reg:DI R30_REGNUM) (reg:DI SP_REGNUM)] PAUTH_LR_SP))] + "" + "hint\t // asp"; +) + +;; Signing/Authenticating X17 using X16 as the salt. +(define_insn "1716" + [(set (reg:DI R17_REGNUM) + (unspec:DI [(reg:DI R17_REGNUM) (reg:DI R16_REGNUM)] PAUTH_17_16))] + "" + "hint\t // a1716"; +) + +;; Stripping the signature in R30. +(define_insn "xpaclri" + [(set (reg:DI R30_REGNUM) (unspec:DI [(reg:DI R30_REGNUM)] UNSPEC_XPACLRI))] + "" + "hint\t7 // xpaclri" +) + ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and ;; all of memory. This blocks insns from being moved across this point. diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt index 56b920d..5436884 100644 --- a/gcc/config/aarch64/aarch64.opt +++ b/gcc/config/aarch64/aarch64.opt @@ -149,6 +149,23 @@ mpc-relative-literal-loads Target Report Save Var(pcrelative_literal_loads) Init(2) Save PC relative literal loads. +msign-return-address= +Target RejectNegative Report Joined Enum(aarch64_ra_sign_scope_t) Var(aarch64_ra_sign_scope) Init(AARCH64_FUNCTION_NONE) Save +Select return address signing scope. 
+ +Enum +Name(aarch64_ra_sign_scope_t) Type(enum aarch64_function_type) +Supported AArch64 return address signing scope (for use with -msign-return-address= option): + +EnumValue +Enum(aarch64_ra_sign_scope_t) String(none) Value(AARCH64_FUNCTION_NONE) + +EnumValue +Enum(aarch64_ra_sign_scope_t) String(non-leaf) Value(AARCH64_FUNCTION_NON_LEAF) + +EnumValue +Enum(aarch64_ra_sign_scope_t) String(all) Value(AARCH64_FUNCTION_ALL) + mlow-precision-recip-sqrt Common Var(flag_mrecip_low_precision_sqrt) Optimization Enable the reciprocal square root approximation. Enabling this reduces diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index e2377c1..c59d31e 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -1032,6 +1032,10 @@ (define_int_iterator FMAXMIN_UNS [UNSPEC_FMAX UNSPEC_FMIN UNSPEC_FMAXNM UNSPEC_FMINNM]) +(define_int_iterator PAUTH_LR_SP [UNSPEC_PACISP UNSPEC
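As a usage illustration (an addition for readers, not part of the patch): the new option and the matching "sign-return-address" function attribute let the signing scope be chosen per file or per function. The function below is hypothetical:

/* Force return address signing for this function, whatever
   -msign-return-address= was given on the command line.  */
__attribute__ ((target ("sign-return-address=all")))
int
call_callback (int (*cb) (int), int x)
{
  /* LR is signed in the prologue and authenticated in the epilogue;
     with -march=armv8.3-a the epilogue can use the combined RETAA.  */
  return cb (x);
}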
[Ping~]Re: [5/5][AArch64, libgcc] Runtime support for AArch64 return address signing (also attached target macros version)
On 06/01/17 11:47, Jiong Wang wrote: This is the update on libgcc unwinder support according to new DWARF proposal. As Joseph commented, duplication of unwind-dw2.c is not encouraged in libgcc, But from this patch, you can see there are a few places we need to modify for AArch64 in unwind-aarch64.c, so the file duplication approach is acceptable? libgcc/ 2017-01-06 Jiong Wang * config/aarch64/unwind-aarch64.c (DWARF_REGNUM_AARCH64_RA_STATE, RA_A_SIGNED_BIT): New macros. (execute_cfa_program): Multiplex DW_CFA_GNU_window_save on AArch64. (uw_frame_state_for): Clear bit[0] of DWARF_REGNUM_AARCH64_RA_STATE. (uw_update_context): Authenticate return address according to DWARF_REGNUM_AARCH64_RA_STATE. (uw_init_context_1): Strip signature of seed address. (uw_install_context): Re-authenticate EH handler's address. Ping~ For comparision, I have also attached the patch using the target macros. Four new target macros are introduced: MD_POST_EXTRACT_ROOT_ADDR MD_POST_EXTRACT_FRAME_ADDR MD_POST_FROB_EH_HANDLER_ADDR MD_POST_INIT_CONTEXT MD_POST_EXTRACT_ROOT_ADDR is to do target private post processing on the address inside _Unwind* functions, they are serving as root address to start the unwinding. MD_POST_EXTRACT_FRAME_ADDR is to do target private post processing on th address inside the real user program which throws the exceptions. MD_POST_FROB_EH_HANDLER_ADDR is to do target private frob on the EH handler's address before we install it into current context. MD_POST_INIT_CONTEXT it to do target private initialization on the context structure after common initialization. One "__aarch64__" macro check is needed to multiplex DW_CFA_window_save. libgcc/ChangeLog: 2017-01-11 Jiong Wang * config/aarch64/aarch64-unwind.h: New file. (DWARF_REGNUM_AARCH64_RA_STATE): Define. (MD_POST_EXTRACT_ROOT_ADDR): Define. (MD_POST_EXTRACT_FRAME_ADDR): Define. (MD_POST_FROB_EH_HANDLER_ADDR): Define. (MD_POST_INIT_CONTEXT): Define. * config/aarch64/linux-unwind.h: Include aarch64-unwind.h * config.host (aarch64*-*-elf, aarch64*-*-rtems*, aarch64*-*-freebsd*): Initialize md_unwind_header to include aarch64-unwind.h. * unwind-dw2.c (struct _Unwind_Context): Define "RA_A_SIGNED_BIT". (execute_cfa_program): Multiplex DW_CFA_GNU_window_save for __aarch64__. (uw_update_context): Honor MD_POST_EXTRACT_FRAME_ADDR. (uw_init_context_1): Honor MD_POST_EXTRACT_ROOT_ADDR and MD_POST_INIT_CONTEXT. (uw_frob_return_addr): New function. (_Unwind_DebugHook): Use uw_frob_return_addr. diff --git a/libgcc/unwind-dw2.c b/libgcc/unwind-dw2.c index 8085a42ace15d53f4cb0c6681717012d906a6d47..35010a4065bb83f706701cb05392193f0ffa1f11 100644 --- a/libgcc/unwind-dw2.c +++ b/libgcc/unwind-dw2.c @@ -136,6 +136,8 @@ struct _Unwind_Context #define SIGNAL_FRAME_BIT ((~(_Unwind_Word) 0 >> 1) + 1) /* Context which has version/args_size/by_value fields. */ #define EXTENDED_CONTEXT_BIT ((~(_Unwind_Word) 0 >> 2) + 1) + /* Bit reserved on AArch64, return address has been signed with A key. */ +#define RA_A_SIGNED_BIT ((~(_Unwind_Word) 0 >> 3) + 1) _Unwind_Word flags; /* 0 for now, can be increased when further fields are added to struct _Unwind_Context. */ @@ -1185,6 +1187,11 @@ execute_cfa_program (const unsigned char *insn_ptr, break; case DW_CFA_GNU_window_save: +#ifdef __aarch64__ + /* This CFA is multiplexed with Sparc. On AArch64 it's used to toggle + return address signing status. */ + fs->regs.reg[DWARF_REGNUM_AARCH64_RA_STATE].loc.offset ^= 1; +#else /* ??? Hardcoded for SPARC register window configuration. 
*/ if (__LIBGCC_DWARF_FRAME_REGISTERS__ >= 32) for (reg = 16; reg < 32; ++reg) @@ -1192,6 +1199,7 @@ execute_cfa_program (const unsigned char *insn_ptr, fs->regs.reg[reg].how = REG_SAVED_OFFSET; fs->regs.reg[reg].loc.offset = (reg - 16) * sizeof (void *); } +#endif break; case DW_CFA_GNU_args_size: @@ -1513,10 +1521,15 @@ uw_update_context (struct _Unwind_Context *context, _Unwind_FrameState *fs) stack frame. */ context->ra = 0; else -/* Compute the return address now, since the return address column - can change from frame to frame. */ -context->ra = __builtin_extract_return_addr - (_Unwind_GetPtr (context, fs->retaddr_column)); +{ + /* Compute the return address now, since the return address column + can change from frame to frame. */ + context->ra = __builtin_extract_return_addr + (_Unwind_GetPtr (context, fs->retaddr_column)); +#ifdef MD_POST_EXTRACT_FRAME_ADDR + context->ra = MD_POST_EXTRACT_FRAME_ADDR (context, fs, context->ra); +#endif +} } static void @@ -1550,6 +1563,9 @@ uw_init_context_
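The aarch64-unwind.h body is not quoted in this message. A hedged sketch of the MD_POST_FROB_EH_HANDLER_ADDR piece described above (names from the ChangeLog; the body is an assumption): the EH handler's address was authenticated during unwinding, so it is re-signed before installation so that the signed epilogue it will eventually return through still authenticates correctly.

#define MD_POST_FROB_EH_HANDLER_ADDR(CURRENT, TARGET, ADDR) \
  aarch64_post_frob_eh_handler_addr (CURRENT, TARGET, ADDR)

static inline void *
aarch64_post_frob_eh_handler_addr (struct _Unwind_Context *current,
				   struct _Unwind_Context *target
				   ATTRIBUTE_UNUSED,
				   void *addr)
{
  /* Only frob when the current frame's return address was signed.  */
  if (current->flags & RA_A_SIGNED_BIT)
    return __builtin_aarch64_pacia1716 (addr,
					(_Unwind_Word) current->cfa);
  return addr;
}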
[4/5][AArch64, libgcc] Let AArch64 use customized unwinder file
On 11/11/16 18:22, Jiong Wang wrote: We need customized EH unwinder support for the AArch64 DWARF operations introduced earlier in this patch set; these changes mostly need to be made in the generic file unwind-dw2.c. There are two ways of introducing this AArch64 support: * Introduce a few target macros so we can customize functions like uw_init_context, uw_install_context etc. * Use a target-private unwind-dw2 implementation, i.e. duplicate the generic unwind-dw2.c into the target config directory and use it instead of the generic one. This is currently what IA64 and CR16 do. I am not sure which approach is the convention in libgcc, Ian, any comments on this? Thanks. This patch is the start of approach 2: it includes the necessary Makefile support and a copy of the original unwind-dw2.c. A follow-up patch will implement the AArch64-specific parts, so the change will be very clear. OK for trunk? libgcc/ 2016-11-08 Jiong Wang * config.host (aarch64*-*-elf, aarch64*-*-rtems*, aarch64*-*-linux*): Include new AArch64 EH makefile. * config/aarch64/t-eh-aarch64: New EH makefile. * config/aarch64/unwind-aarch64.c: New EH unwinder implementation, copied from unwind-dw2.c. Ping ~ No change on this patch for the new DWARF proposal.
[3/5][AArch64] New builtins required by libgcc unwinder
On 11/11/16 18:22, Jiong Wang wrote: This patch implements a few ARMv8.3-A new builtins for pointer sign and authentication instructions. Currently, these builtins are supposed to be used by libgcc EH unwinder only. They are not public interface to external user. OK to install? gcc/ 2016-11-11 Jiong Wang * config/aarch64/aarch64-builtins.c (enum aarch64_builtins): New entries for AARCH64_PAUTH_BUILTIN_PACI1716, AARCH64_PAUTH_BUILTIN_AUTIA1716, AARCH64_PAUTH_BUILTIN_AUTIB1716, AARCH64_PAUTH_BUILTIN_XPACLRI. (aarch64_init_v8_3_builtins): New. (aarch64_init_builtins): Call aarch64_init_builtins. (arch64_expand_builtin): Expand new builtins. This patch is an update on builtins support. All these builtins are to be internally used by libgcc only, so the updates only keeps those used. OK for trunk? gcc/ 2017-01-06 Jiong Wang * config/aarch64/aarch64-builtins.c (enum aarch64_builtins): New entries for AARCH64_PAUTH_BUILTIN_XPACLRI, AARCH64_PAUTH_BUILTIN_PACIA1716, AARCH64_PAUTH_BUILTIN_AUTIA1716); (aarch64_init_pauth_hint_builtins): New. (aarch64_init_builtins): Call aarch64_init_pauth_hint_builtins. (aarch64_expand_builtin): Expand new builtins. diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index 69fb756f0fbdc016f35ce1d08f2aaf092a034704..9ae9d9afc9c141235d7eee037d5571b9f35edc31 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -376,6 +376,10 @@ enum aarch64_builtins AARCH64_CRC32_BUILTIN_BASE, AARCH64_CRC32_BUILTINS AARCH64_CRC32_BUILTIN_MAX, + /* ARMv8.3-A Pointer Authentication Builtins. */ + AARCH64_PAUTH_BUILTIN_AUTIA1716, + AARCH64_PAUTH_BUILTIN_PACIA1716, + AARCH64_PAUTH_BUILTIN_XPACLRI, AARCH64_BUILTIN_MAX }; @@ -923,6 +927,33 @@ aarch64_init_fp16_types (void) aarch64_fp16_ptr_type_node = build_pointer_type (aarch64_fp16_type_node); } +/* Pointer authentication builtins that will become NOP on legacy platform. + Currently, these builtins are for internal use only (libgcc EH unwinder). */ + +void +aarch64_init_pauth_hint_builtins (void) +{ + /* Pointer Authentication builtins. */ + tree ftype_pointer_auth += build_function_type_list (ptr_type_node, ptr_type_node, +unsigned_intDI_type_node, NULL_TREE); + tree ftype_pointer_strip += build_function_type_list (ptr_type_node, ptr_type_node, NULL_TREE); + + aarch64_builtin_decls[AARCH64_PAUTH_BUILTIN_AUTIA1716] += add_builtin_function ("__builtin_aarch64_autia1716", ftype_pointer_auth, + AARCH64_PAUTH_BUILTIN_AUTIA1716, BUILT_IN_MD, NULL, + NULL_TREE); + aarch64_builtin_decls[AARCH64_PAUTH_BUILTIN_PACIA1716] += add_builtin_function ("__builtin_aarch64_pacia1716", ftype_pointer_auth, + AARCH64_PAUTH_BUILTIN_PACIA1716, BUILT_IN_MD, NULL, + NULL_TREE); + aarch64_builtin_decls[AARCH64_PAUTH_BUILTIN_XPACLRI] += add_builtin_function ("__builtin_aarch64_xpaclri", ftype_pointer_strip, + AARCH64_PAUTH_BUILTIN_XPACLRI, BUILT_IN_MD, NULL, + NULL_TREE); +} + void aarch64_init_builtins (void) { @@ -951,6 +982,10 @@ aarch64_init_builtins (void) aarch64_init_crc32_builtins (); aarch64_init_builtin_rsqrt (); + +/* Initialize pointer authentication builtins which are backed by instructions + in NOP encoding space. 
*/ + aarch64_init_pauth_hint_builtins (); } tree @@ -1293,6 +1328,43 @@ aarch64_expand_builtin (tree exp, } emit_insn (pat); return target; +case AARCH64_PAUTH_BUILTIN_AUTIA1716: +case AARCH64_PAUTH_BUILTIN_PACIA1716: +case AARCH64_PAUTH_BUILTIN_XPACLRI: + arg0 = CALL_EXPR_ARG (exp, 0); + op0 = force_reg (Pmode, expand_normal (arg0)); + + if (!target) + target = gen_reg_rtx (Pmode); + else + target = force_reg (Pmode, target); + + emit_move_insn (target, op0); + + if (fcode == AARCH64_PAUTH_BUILTIN_XPACLRI) + { + rtx lr = gen_rtx_REG (Pmode, R30_REGNUM); + icode = CODE_FOR_xpaclri; + emit_move_insn (lr, op0); + emit_insn (GEN_FCN (icode) ()); + emit_move_insn (target, lr); + } + else + { + tree arg1 = CALL_EXPR_ARG (exp, 1); + rtx op1 = force_reg (Pmode, expand_normal (arg1)); + icode = (fcode == AARCH64_PAUTH_BUILTIN_PACIA1716 + ? CODE_FOR_paci1716 : CODE_FOR_auti1716); + + rtx x16_reg = gen_rtx_REG (Pmode, R16_REGNUM); + rtx x17_reg = gen_rtx_REG (Pmode, R17_REGNUM); + emit_move_insn (x17_reg, op0); + emit_move_insn (x16_reg, op1); + emit_insn (GEN_FCN (icode) ()); + emit_move_insn (target, x17_reg); + } + + return target; } if (fcode >= AARCH64_SIMD_BUILTIN_BASE && fcode <= AARCH64_SIMD_BUILTIN_MAX)
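For illustration only (the message stresses these builtins are internal to the libgcc unwinder), a sketch of how the three builtins compose; the function name and salt value are made up:

void *
pauth_demo (void *signed_ra, unsigned long long salt)
{
  /* Strip the PAC bits without authenticating.  */
  void *stripped = __builtin_aarch64_xpaclri (signed_ra);
  /* Authenticate with the A key; the unwinder passes the CFA as salt.  */
  void *authed = __builtin_aarch64_autia1716 (signed_ra, salt);
  /* Sign again with the A key and the same salt.  */
  void *resigned = __builtin_aarch64_pacia1716 (authed, salt);
  (void) stripped;
  return resigned;
}

On hardware before ARMv8.3-A all three expand to HINT-space instructions that behave as NOPs, so the function simply returns its input unchanged.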
[5/5][AArch64, libgcc] Runtime support for AArch64 DWARF operations
On 11/11/16 18:22, Jiong Wang wrote: This patch add AArch64 specific runtime EH unwinding support for DW_OP_AARCH64_pauth, DW_OP_AARCH64_paciasp and DW_OP_AARCH64_paciasp_deref. The semantics of them are described at the specification in patch [1/9]. The support includes: * Parsing these DWARF operations. Perform unwinding actions according to their semantics. * Handling eh_return multi return paths. Function calling __builtin_eh_return (_Unwind_RaiseException*) will have multiple return paths. One is for normal exit, the other is for install EH handler. If the _Unwind_RaiseException itself is return address signed, then there will always be return address authentication before return, however, if the return path in _Unwind_RaiseException if from installing EH handler the address of which has already been authenticated during unwinding, then we need to re-sign that address, so when the execution flow continues at _Unwind_RaiseException's epilogue, the authentication still works correctly. OK for trunk? libgcc/ 2016-11-11 Jiong Wang * config/aarch64/unwind-aarch64.c (RA_SIGN_BIT): New flag to indicate one frame is return address signed. (execute_stack_op): Handle DW_OP_AARCH64_pauth, DW_OP_AARCH64_paciasp, DW_OP_AARCH64_paciasp_deref. (uw_init_context): Call aarch64_uw_init_context_1. (uw_init_context_1): Rename to aarch64_uw_init_context_1. Strip signature for seed address. (uw_install_context): Re-sign handler's address so it works correctly with caller's context. (uw_install_context_1): by_value[LR] can be true, after return address signing LR will come from DWARF value expression rule which is a by_value true rule. This is the update on libgcc unwinder support according to new DWARF proposal. As Joseph commented, duplication of unwind-dw2.c is not encouraged in libgcc, But from this patch, you can see there are a few places we need to modify for AArch64 in unwind-aarch64.c, so the file duplication approach is acceptable? libgcc/ 2017-01-06 Jiong Wang * config/aarch64/unwind-aarch64.c (DWARF_REGNUM_AARCH64_RA_STATE, RA_A_SIGNED_BIT): New macros. (execute_cfa_program): Multiplex DW_CFA_GNU_window_save on AArch64. (uw_frame_state_for): Clear bit[0] of DWARF_REGNUM_AARCH64_RA_STATE. (uw_update_context): Authenticate return address according to DWARF_REGNUM_AARCH64_RA_STATE. (uw_init_context_1): Strip signature of seed address. (uw_install_context): Re-authenticate EH handler's address. diff --git a/libgcc/config/aarch64/unwind-aarch64.c b/libgcc/config/aarch64/unwind-aarch64.c index 1fb6026d123f8e7fc676f5e95e8e66caccf3d6ff..11e3c9f724c9bc5796103a0d973bfe769d23b6e7 100644 --- a/libgcc/config/aarch64/unwind-aarch64.c +++ b/libgcc/config/aarch64/unwind-aarch64.c @@ -37,6 +37,9 @@ #include "gthr.h" #include "unwind-dw2.h" +/* This is a copy of libgcc/unwind-dw2.c with AArch64 return address signing + support. */ + #ifdef HAVE_SYS_SDT_H #include #endif @@ -55,6 +58,8 @@ #define PRE_GCC3_DWARF_FRAME_REGISTERS __LIBGCC_DWARF_FRAME_REGISTERS__ #endif +#define DWARF_REGNUM_AARCH64_RA_STATE 32 + /* ??? For the public function interfaces, we tend to gcc_assert that the column numbers are in range. For the dwarf2 unwind info this does happen, although so far in a case that doesn't actually matter. @@ -136,6 +141,8 @@ struct _Unwind_Context #define SIGNAL_FRAME_BIT ((~(_Unwind_Word) 0 >> 1) + 1) /* Context which has version/args_size/by_value fields. */ #define EXTENDED_CONTEXT_BIT ((~(_Unwind_Word) 0 >> 2) + 1) + /* Return address has been signed with A key. 
*/ +#define RA_A_SIGNED_BIT ((~(_Unwind_Word) 0 >> 3) + 1) _Unwind_Word flags; /* 0 for now, can be increased when further fields are added to struct _Unwind_Context. */ @@ -1185,13 +1192,9 @@ execute_cfa_program (const unsigned char *insn_ptr, break; case DW_CFA_GNU_window_save: - /* ??? Hardcoded for SPARC register window configuration. */ - if (__LIBGCC_DWARF_FRAME_REGISTERS__ >= 32) - for (reg = 16; reg < 32; ++reg) - { - fs->regs.reg[reg].how = REG_SAVED_OFFSET; - fs->regs.reg[reg].loc.offset = (reg - 16) * sizeof (void *); - } + /* This CFA is multiplexed with Sparc. On AArch64 it's used to toggle + return address signing status. */ + fs->regs.reg[DWARF_REGNUM_AARCH64_RA_STATE].loc.offset ^= 1; break; case DW_CFA_GNU_args_size: @@ -1263,6 +1266,8 @@ uw_frame_state_for (struct _Unwind_Context *context, _Unwind_FrameState *fs) /* First decode all the insns in the CIE. */ end = (const unsigned char *) next_fde ((const struct dwarf_fde *) cie); execute_cfa_program (insn, end, context, fs); + /* Clear bit 0 of RA_STATE pse
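The uw_frame_state_for hunk is cut off just above. Per its ChangeLog entry, the change reduces to clearing bit 0 of the RA-state pseudo register once the CIE program has run, roughly (sketch):

  /* Each FDE starts with an unsigned return address; only explicit
     DW_CFA_GNU_window_save toggles flip bit 0 afterwards.  */
  fs->regs.reg[DWARF_REGNUM_AARCH64_RA_STATE].loc.offset &= ~0x1;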
[2/5][AArch64] Generate dwarf information for -msign-return-address
On 11/11/16 18:22, Jiong Wang wrote: This patch generate DWARF description for pointer authentication. DWARF value expression is used to describe the authentication action. Please see the cover letter and AArch64 DWARF specification for the semantics of AArch64 DWARF operations. When authentication key index is A key, we use compact DWARF description which can largely save DWARF frame size, otherwise we fallback to general operator. Example === int cal (int a, int b, int c) { return a + dec (b) + c; } Compact DWARF description (-march=armv8.3-a -msign-return-address) === DW_CFA_advance_loc: 4 to 0004 DW_CFA_val_expression: r30 (x30) (DW_OP_AARCH64_paciasp) DW_CFA_advance_loc: 4 to 0008 DW_CFA_val_expression: r30 (x30) (DW_OP_AARCH64_paciasp_deref: -24) General DWARF description === (-march=armv8.3-a -msign-return-address -mpauth-key=b_key) DW_CFA_advance_loc: 4 to 0004 DW_CFA_val_expression: r30 (x30) (DW_OP_breg30 (x30): 0; DW_OP_AARCH64_pauth: 18) DW_CFA_advance_loc: 4 to 0008 DW_CFA_val_expression: r30 (x30) (DW_OP_dup; DW_OP_const1s: -24; DW_OP_plus; DW_OP_deref; DW_OP_AARCH64_pauth: 18) From Linux kernel testing, -msign-return-address will introduce +24% .debug_frame size increase when signing all functions and using compact description, and about +45% .debug_frame size increase if using general description. gcc/ 2016-11-11 Jiong Wang * config/aarch64/aarch64.h (aarch64_pauth_action_type): New enum. * config/aarch64/aarch64.c (aarch64_attach_ra_auth_dwarf_note): New function. (aarch64_attach_ra_auth_dwarf_general): New function. (aarch64_attach_ra_auth_dwarf_shortcut): New function. (aarch64_save_callee_saves): Generate dwarf information if LR is signed. (aarch64_expand_prologue): Likewise. (aarch64_expand_epilogue): Likewise. This patch is an update on DWARF generation for return address signing. According to new proposal, we simply needs to generate REG_CFA_WINDOW_SAVE annotation. gcc/ 2017-01-06 Jiong Wang * config/aarch64/aarch64.c (aarch64_expand_prologue): Generate dwarf annotation (REG_CFA_WINDOW_SAVE) for return address signing. (aarch64_expand_epilogue): Likewise. diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 002895a167ce0deb45a5c1726527651af18bb4df..20ed79e5690f45ec121ef516245c686cc0cc82b5 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -3553,7 +3553,11 @@ aarch64_expand_prologue (void) /* Sign return address for functions. */ if (aarch64_return_address_signing_enabled ()) -emit_insn (gen_pacisp ()); +{ + insn = emit_insn (gen_pacisp ()); + add_reg_note (insn, REG_CFA_WINDOW_SAVE, const0_rtx); + RTX_FRAME_RELATED_P (insn) = 1; +} if (flag_stack_usage_info) current_function_static_stack_size = frame_size; @@ -3698,7 +3702,11 @@ aarch64_expand_epilogue (bool for_sibcall) want to use the CFA of the function which calls eh_return. */ if (aarch64_return_address_signing_enabled () && (for_sibcall || !TARGET_ARMV8_3 || crtl->calls_eh_return)) -emit_insn (gen_autisp ()); +{ + insn = emit_insn (gen_autisp ()); + add_reg_note (insn, REG_CFA_WINDOW_SAVE, const0_rtx); + RTX_FRAME_RELATED_P (insn) = 1; +} /* Stack adjustment for exception handler. */ if (crtl->calls_eh_return)
[1/5][AArch64] Return address protection on AArch64
On 11/11/16 18:22, Jiong Wang wrote: As described in the cover letter, this patch implements return address signing for AArch64, it's controlled by the new option: -msign-return-address=[none | non-leaf | all] "none" means don't do return address signing at all on any function. "non-leaf" means only sign non-leaf function. "all" means sign all functions. Return address signing is currently disabled on ILP32. I haven't tested it. The instructions added in the architecture are of 2 kinds. * In the NOP instruction space, which allows binaries to run without any traps on older versions of the architecture. This doesn't give any additional protection on older hardware but allows for the same binary to be used on earlier versions of the architecture and newer versions of the architecture. * New instructions that are only valid for v8.3 and will trap if used on earlier versions of the architecture. At default, once return address signing is enabled, it will only generates NOP instruction. While if -march=armv8.3-a specified, GCC will try to use the most efficient pointer authentication instruction as it can. The architecture has 2 user invisible system keys for signing and creating signed addresses as part of these instructions. For some use case, the user might want to use difference key for different functions. The new option "-msign-return-address-key=key_name" let GCC select the key used for return address signing. Permissible values are "a_key" for A key and "b_key" for B key, and this option are supported by function target attribute and LTO will hopefully just work. gcc/ 2016-11-09 Jiong Wang * config/aarch64/aarch64-opts.h (aarch64_pauth_key_index): New enum. (aarch64_function_type): New enum. * config/aarch64/aarch64-protos.h (aarch64_output_sign_auth_reg): New declaration. * config/aarch64/aarch64.c (aarch64_expand_prologue): Sign return address before it's pushed onto stack. (aarch64_expand_epilogue): Authenticate return address fetched from stack. (aarch64_output_sign_auth_reg): New function. (aarch64_override_options): Sanity check for ILP32 and ISA level. (aarch64_attributes): New function attributes for "sign-return-address", "pauth-key". * config/aarch64/aarch64.md (UNSPEC_AUTH_REG, UNSPEC_AUTH_REG1716, UNSPEC_SIGN_REG, UNSPEC_SIGN_REG1716, UNSPEC_STRIP_REG_SIGN, UNSPEC_STRIP_X30_SIGN): New unspecs. ("*do_return"): Generate combined instructions according to key index. ("sign_reg", "sign_reg1716", "auth_reg", "auth_reg1716", "strip_reg_sign", "strip_lr_sign"): New. * config/aarch64/aarch64.opt (msign-return-address, mpauth-key): New. * config/aarch64/predicates.md (aarch64_const0_const1): New predicate. * doc/extend.texi (AArch64 Function Attributes): Documents "sign-return-address=", "pauth-key". * doc/invoke.texi (AArch64 Options): Documents "-msign-return-address=", "-pauth-key". gcc/testsuite/ 2016-11-09 Jiong Wang * gcc.target/aarch64/return_address_sign_1.c: New testcase. * gcc.target/aarch64/return_address_sign_scope_1.c: New testcase. Update the patchset according to new DWARF proposal described at https://gcc.gnu.org/ml/gcc-patches/2016-11/msg03010.html While A key support for return address signing using DW_CFA_GNU_window_save only needs simple modifications on code and associated DWARF generation, B key support is complexer, it needs multiple CIE support in GCC and Binutils, so currently we fall back to DWARF value expression which fully works although requires longer encodings. 
Value expressions also require a few changes to the AArch64 prologue and epilogue hooks, so code review of them will not be easy. Therefore I have removed all B key support code from the initial support patch set, and will organize it into a separate follow-up patch set so that we can do the A key code review first. This patch is an update on the return address signing code generation. gcc/ 2017-01-06 Jiong Wang * config/aarch64/aarch64-opts.h (aarch64_function_type): New enum. * config/aarch64/aarch64-protos.h (aarch64_return_address_signing_enabled): New declaration. * config/aarch64/aarch64.c (aarch64_return_address_signing_enabled): New function. (aarch64_expand_prologue): Sign return address before it's pushed onto stack. (aarch64_expand_epilogue): Authenticate return address fetched from stack. (aarch64_override_options): Sanity check for ILP32 and ISA level. (aarch64_attributes): New function attributes for "sign-return-address".
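To make the three scope values concrete, a small hypothetical example (the usual motivation being that a leaf function's return address never reaches the stack):

/* foo is a leaf: signed only under -msign-return-address=all.
   bar is non-leaf: signed under both "non-leaf" and "all".
   Under "none" neither function signs its return address.  */
static int foo (int x) { return x + 1; }

int bar (int x) { return foo (x) + 2; }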
[Ping~][AArch64] Add commandline support for -march=armv8.3-a
On 11/11/16 18:22, Jiong Wang wrote: This patch add command line support for ARMv8.3-A through new architecture: -march=armv8.3-a ARMv8.3-A implies all default features of ARMv8.2-A and meanwhile it includes the new pointer authentication extension. gcc/ 2016-11-08 Jiong Wang * config/aarch64/aarch64-arches.def: New entry for "armv8.3-a". * config/aarch64/aarch64.h (AARCH64_FL_PAUTH, AARCH64_FL_V8_3, AARCH64_FL_FOR_ARCH8_3, AARCH64_ISA_PAUTH, AARCH64_ISA_V8_3, TARGET_PAUTH, TARGET_ARMV8_3): New. * doc/invoke.texi (AArch64 Options): Document "armv8.3-a". Ping ~ As pointer authentication extension is defined to be mandatory extension on ARMv8.3-A and is not optional, I adjusted the patch slightly. This also let GCC treating pointer authentication extension in consistent way with Binutils. OK for trunk? gcc/ 2017-01-06 Jiong Wang * config/aarch64/aarch64-arches.def: New entry for "armv8.3-a". * config/aarch64/aarch64.h (AARCH64_FL_V8_3, AARCH64_FL_FOR_ARCH8_3, AARCH64_ISA_V8_3, TARGET_ARMV8_3): New. * doc/invoke.texi (AArch64 Options): Document "armv8.3-a". diff --git a/gcc/config/aarch64/aarch64-arches.def b/gcc/config/aarch64/aarch64-arches.def index 830a7cf545532c050847a8c915d21bef12152388..ce6f73b3e5853b3d40e07545b9581298c768edca 100644 --- a/gcc/config/aarch64/aarch64-arches.def +++ b/gcc/config/aarch64/aarch64-arches.def @@ -33,5 +33,6 @@ AARCH64_ARCH("armv8-a", generic, 8A, 8, AARCH64_FL_FOR_ARCH8) AARCH64_ARCH("armv8.1-a", generic, 8_1A, 8, AARCH64_FL_FOR_ARCH8_1) AARCH64_ARCH("armv8.2-a", generic, 8_2A, 8, AARCH64_FL_FOR_ARCH8_2) +AARCH64_ARCH("armv8.3-a", generic, 8_3A, 8, AARCH64_FL_FOR_ARCH8_3) #undef AARCH64_ARCH diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 584ff5c43afcd1a7918019b09165371bb88bfda1..51916c95a736ade697a823f15d483336651ac99a 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -138,6 +138,8 @@ extern unsigned aarch64_architecture_version; /* ARMv8.2-A architecture extensions. */ #define AARCH64_FL_V8_2 (1 << 8) /* Has ARMv8.2-A features. */ #define AARCH64_FL_F16 (1 << 9) /* Has ARMv8.2-A FP16 extensions. */ +/* ARMv8.3-A architecture extensions. */ +#define AARCH64_FL_V8_3 (1 << 10) /* Has ARMv8.3-A features. */ /* Has FP and SIMD. */ #define AARCH64_FL_FPSIMD (AARCH64_FL_FP | AARCH64_FL_SIMD) @@ -151,6 +153,8 @@ extern unsigned aarch64_architecture_version; (AARCH64_FL_FOR_ARCH8 | AARCH64_FL_LSE | AARCH64_FL_CRC | AARCH64_FL_V8_1) #define AARCH64_FL_FOR_ARCH8_2 \ (AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_V8_2) +#define AARCH64_FL_FOR_ARCH8_3 \ + (AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_V8_3) /* Macros to test ISA flags. */ @@ -162,6 +166,7 @@ extern unsigned aarch64_architecture_version; #define AARCH64_ISA_RDMA (aarch64_isa_flags & AARCH64_FL_V8_1) #define AARCH64_ISA_V8_2 (aarch64_isa_flags & AARCH64_FL_V8_2) #define AARCH64_ISA_F16 (aarch64_isa_flags & AARCH64_FL_F16) +#define AARCH64_ISA_V8_3 (aarch64_isa_flags & AARCH64_FL_V8_3) /* Crypto is an optional extension to AdvSIMD. */ #define TARGET_CRYPTO (TARGET_SIMD && AARCH64_ISA_CRYPTO) @@ -176,6 +181,9 @@ extern unsigned aarch64_architecture_version; #define TARGET_FP_F16INST (TARGET_FLOAT && AARCH64_ISA_F16) #define TARGET_SIMD_F16INST (TARGET_SIMD && AARCH64_ISA_F16) +/* ARMv8.3-A features. */ +#define TARGET_ARMV8_3 (AARCH64_ISA_V8_3) + /* Make sure this is always defined so we don't have to check for ifdefs but rather use normal ifs. 
*/ #ifndef TARGET_FIX_ERR_A53_835769_DEFAULT diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 8e2f46617b8e44ccf16941c31029ab5625322867..791718831d7089c44dfadb137f5e93caa9cd05f0 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -13874,7 +13874,10 @@ more feature modifiers. This option has the form @option{-march=@var{arch}@r{@{}+@r{[}no@r{]}@var{feature}@r{@}*}}. The permissible values for @var{arch} are @samp{armv8-a}, -@samp{armv8.1-a}, @samp{armv8.2-a} or @var{native}. +@samp{armv8.1-a}, @samp{armv8.2-a}, @samp{armv8.3-a} or @var{native}. + +The value @samp{armv8.3-a} implies @samp{armv8.2-a} and enables compiler +support for the ARMv8.3-A architecture extensions. The value @samp{armv8.2-a} implies @samp{armv8.1-a} and enables compiler support for the ARMv8.2-A architecture extensions.
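As an illustration of how the new TARGET_ARMV8_3 macro gets used, a sketch modeled on the "*do_return" pattern from patch 1/5 (the function name is made up; this is not a hunk from this patch):

/* Pick the most efficient return when the architecture allows it.  */
const char *
aarch64_output_return (void)
{
  if (TARGET_ARMV8_3)
    return "retaa";          /* Authenticate LR and return in one insn.  */
  return "hint\t29\n\tret";  /* AUTIASP in HINT space, then plain RET.   */
}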
Re: [1/9][RFC][DWARF] Reserve three DW_OP numbers in vendor extension space
On 28/12/16 19:54, Cary Coutant wrote: OK on this proposal and to install this patch to gcc trunk? Hi GDB, Binutils maintainer: OK on this proposal and install this patch to binutils-gdb master? include/ 2016-11-29 Richard Earnshaw Jiong Wang * dwarf2.def (DW_OP_AARCH64_operation): Reserve the number 0xea. This is OK, but: +/* AARCH64 extensions. + DW_OP_AARCH64_operation takes one mandatory unsigned LEB128 operand. + Bits[6:0] of this operand are the action code; all other bits are initialized + to 0 unless explicitly documented for an action. Please refer to the AArch64 + DWARF ABI documentation for details. */ Is it possible to include a stable URL that points to the ABI document? Hi Cary, Thanks for the review. Currently there is no stable URL for these AArch64 DWARF ABI updates. I will update the comments as soon as a stable URL is available. Regards, Jiong
[Ping^3][1/9][RFC][DWARF] Reserve three DW_OP numbers in vendor extension space
Jiong Wang writes: > Jiong Wang writes: > >> Jiong Wang writes: >> >>> On 16/11/16 14:02, Jakub Jelinek wrote: >>>> On Wed, Nov 16, 2016 at 02:54:56PM +0100, Mark Wielaard wrote: >>>>> On Wed, 2016-11-16 at 10:00 +, Jiong Wang wrote: >>>>>> The two operations DW_OP_AARCH64_paciasp and >>>>>> DW_OP_AARCH64_paciasp_deref were >>>>>> designed as shortcut operations when LR is signed with A key and using >>>>>> function's CFA as salt. This is the default behaviour of return address >>>>>> signing so is expected to be used for most of the time. >>>>>> DW_OP_AARCH64_pauth >>>>>> is designed as a generic operation that allow describing pointer signing >>>>>> on >>>>>> any value using any salt and key in case we can't use the shortcut >>>>>> operations >>>>>> we can use this. >>>>> >>>>> I admit to not fully understand the salting/keying involved. But given >>>>> that the DW_OP space is really tiny, so we would like to not eat up too >>>>> many of them for new opcodes. And given that introducing any new DW_OPs >>>>> using for CFI unwinding will break any unwinder anyway causing us to >>>>> update them all for this new feature. Have you thought about using a new >>>>> CIE augmentation string character for describing that the return >>>>> address/link register used by a function/frame is salted/keyed? >>>>> >>>>> This seems a good description of CIE records and augmentation >>>>> characters:http://www.airs.com/blog/archives/460 >>>>> >>>>> It obviously also involves updating all unwinders to understand the new >>>>> augmentation character (and possible arguments). But it might be more >>>>> generic and saves us from using up too many DW_OPs. >>>> >>>> From what I understood, the return address is not always scrambled, so >>>> it doesn't apply to the whole function, just to most of it (except for >>>> an insn in the prologue and some in the epilogue). So I think one op is >>>> needed. But can't it be just a toggable flag whether the return address >>>> is scrambled + some arguments to it? >>>> Thus DW_OP_AARCH64_scramble .uleb128 0 would mean that the default >>>> way of scrambling starts here (if not already active) or any kind of >>>> scrambling ends here (if already active), and >>>> DW_OP_AARCH64_scramble .uleb128 non-zero would be whatever encoding you >>>> need >>>> to represent details of the less common variants with details what to do. >>>> Then you'd just hook through some MD_* macro in the unwinder the >>>> descrambling operation if the scrambling is active at the insns you unwind >>>> on. >>>> >>>> Jakub >>> >>> Hi Mark, Jakub: >>> >>>Thanks very much for the suggestions. >>> >>>I have done some experiments on your ideas and am thinking it's good to >>>combine them together. The use of DW_CFA instead of DW_OP can avoid >>> building >>>all information from scratch at each unwind location, while we can >>> indicate >>>the signing key index through new AArch64 CIE augmentation 'B'. This new >>>approach reduce the unwind table size overhead from ~25% to ~5% when >>> return >>>address signing enabled, it also largely simplified dwarf generation >>> code for >>>return address signing. >>> >>>As one new DWARF call frame instruction is needed for AArch64, I want to >>> reuse >>>DW_CFA_GNU_window_save to save the space. It is in vendor extension >>> space and >>>used for Sparc only, I think it make sense to reuse it for AArch64. 
On >>>AArch64, DW_CFA_GNU_window_save toggle return address sign status which >>> kept >>>in a new boolean type column in DWARF table, so DW_CFA_GNU_window_save >>> takes >>>no argument on AArch64, the same as on Sparc, this makes no difference >>> to those >>>existed encoding, length calculation code. >>> >>>Meanwhile one new DWARF expression operation number is still needed for >>>AArch64, it's useful for describing those complex pointer signing >>> scenarios >>>and it will be used to multiplex some further extensions on AArch64. >>> >>>OK on this proposal and to install this patch to gcc trunk? >>> >>> Hi GDB, Binutils maintainer: >>> >>>OK on this proposal and install this patch to binutils-gdb master? >>> >>> include/ >>> 2016-11-29 Richard Earnshaw >>> Jiong Wang >>> >>> * dwarf2.def (DW_OP_AARCH64_operation): Reserve the number 0xea. >> >> Ping~ > Ping^2 Ping^3 Can DWARF maintainers or global reviewers have a look at this? Thanks very much. -- Regards, Jiong
[Ping^2][1/9][RFC][DWARF] Reserve three DW_OP numbers in vendor extension space
Jiong Wang writes: > Jiong Wang writes: > >> On 16/11/16 14:02, Jakub Jelinek wrote: >>> On Wed, Nov 16, 2016 at 02:54:56PM +0100, Mark Wielaard wrote: >>>> On Wed, 2016-11-16 at 10:00 +, Jiong Wang wrote: >>>>> The two operations DW_OP_AARCH64_paciasp and >>>>> DW_OP_AARCH64_paciasp_deref were >>>>> designed as shortcut operations when LR is signed with A key and using >>>>> function's CFA as salt. This is the default behaviour of return address >>>>> signing so is expected to be used for most of the time. >>>>> DW_OP_AARCH64_pauth >>>>> is designed as a generic operation that allow describing pointer signing >>>>> on >>>>> any value using any salt and key in case we can't use the shortcut >>>>> operations >>>>> we can use this. >>>> >>>> I admit to not fully understand the salting/keying involved. But given >>>> that the DW_OP space is really tiny, so we would like to not eat up too >>>> many of them for new opcodes. And given that introducing any new DW_OPs >>>> using for CFI unwinding will break any unwinder anyway causing us to >>>> update them all for this new feature. Have you thought about using a new >>>> CIE augmentation string character for describing that the return >>>> address/link register used by a function/frame is salted/keyed? >>>> >>>> This seems a good description of CIE records and augmentation >>>> characters:http://www.airs.com/blog/archives/460 >>>> >>>> It obviously also involves updating all unwinders to understand the new >>>> augmentation character (and possible arguments). But it might be more >>>> generic and saves us from using up too many DW_OPs. >>> >>> From what I understood, the return address is not always scrambled, so >>> it doesn't apply to the whole function, just to most of it (except for >>> an insn in the prologue and some in the epilogue). So I think one op is >>> needed. But can't it be just a toggable flag whether the return address >>> is scrambled + some arguments to it? >>> Thus DW_OP_AARCH64_scramble .uleb128 0 would mean that the default >>> way of scrambling starts here (if not already active) or any kind of >>> scrambling ends here (if already active), and >>> DW_OP_AARCH64_scramble .uleb128 non-zero would be whatever encoding you need >>> to represent details of the less common variants with details what to do. >>> Then you'd just hook through some MD_* macro in the unwinder the >>> descrambling operation if the scrambling is active at the insns you unwind >>> on. >>> >>> Jakub >> >> Hi Mark, Jakub: >> >>Thanks very much for the suggestions. >> >>I have done some experiments on your ideas and am thinking it's good to >>combine them together. The use of DW_CFA instead of DW_OP can avoid >> building >>all information from scratch at each unwind location, while we can >> indicate >>the signing key index through new AArch64 CIE augmentation 'B'. This new >>approach reduce the unwind table size overhead from ~25% to ~5% when >> return >>address signing enabled, it also largely simplified dwarf generation code >> for >>return address signing. >> >>As one new DWARF call frame instruction is needed for AArch64, I want to >> reuse >>DW_CFA_GNU_window_save to save the space. It is in vendor extension >> space and >>used for Sparc only, I think it make sense to reuse it for AArch64. 
On >>AArch64, DW_CFA_GNU_window_save toggle return address sign status which >> kept >> in a new boolean type column in DWARF table, so DW_CFA_GNU_window_save >> takes >>no argument on AArch64, the same as on Sparc, this makes no difference to >> those >>existed encoding, length calculation code. >> >>Meanwhile one new DWARF expression operation number is still needed for >>AArch64, it's useful for describing those complex pointer signing >> scenarios >>and it will be used to multiplex some further extensions on AArch64. >> >>OK on this proposal and to install this patch to gcc trunk? >> >> Hi GDB, Binutils maintainer: >> >>OK on this proposal and install this patch to binutils-gdb master? >> >> include/ >> 2016-11-29 Richard Earnshaw >> Jiong Wang >> >> * dwarf2.def (DW_OP_AARCH64_operation): Reserve the number 0xea. > > Ping~ Ping^2 -- Regards, Jiong
Re: [Ping~][1/9][RFC][DWARF] Reserve three DW_OP numbers in vendor extension space
Jiong Wang writes: > On 16/11/16 14:02, Jakub Jelinek wrote: >> On Wed, Nov 16, 2016 at 02:54:56PM +0100, Mark Wielaard wrote: >>> On Wed, 2016-11-16 at 10:00 +, Jiong Wang wrote: >>>> The two operations DW_OP_AARCH64_paciasp and DW_OP_AARCH64_paciasp_deref >>>> were >>>> designed as shortcut operations when LR is signed with A key and using >>>> function's CFA as salt. This is the default behaviour of return address >>>> signing so is expected to be used for most of the time. >>>> DW_OP_AARCH64_pauth >>>> is designed as a generic operation that allow describing pointer signing on >>>> any value using any salt and key in case we can't use the shortcut >>>> operations >>>> we can use this. >>> >>> I admit to not fully understand the salting/keying involved. But given >>> that the DW_OP space is really tiny, so we would like to not eat up too >>> many of them for new opcodes. And given that introducing any new DW_OPs >>> using for CFI unwinding will break any unwinder anyway causing us to >>> update them all for this new feature. Have you thought about using a new >>> CIE augmentation string character for describing that the return >>> address/link register used by a function/frame is salted/keyed? >>> >>> This seems a good description of CIE records and augmentation >>> characters:http://www.airs.com/blog/archives/460 >>> >>> It obviously also involves updating all unwinders to understand the new >>> augmentation character (and possible arguments). But it might be more >>> generic and saves us from using up too many DW_OPs. >> >> From what I understood, the return address is not always scrambled, so >> it doesn't apply to the whole function, just to most of it (except for >> an insn in the prologue and some in the epilogue). So I think one op is >> needed. But can't it be just a toggable flag whether the return address >> is scrambled + some arguments to it? >> Thus DW_OP_AARCH64_scramble .uleb128 0 would mean that the default >> way of scrambling starts here (if not already active) or any kind of >> scrambling ends here (if already active), and >> DW_OP_AARCH64_scramble .uleb128 non-zero would be whatever encoding you need >> to represent details of the less common variants with details what to do. >> Then you'd just hook through some MD_* macro in the unwinder the >> descrambling operation if the scrambling is active at the insns you unwind >> on. >> >> Jakub > > Hi Mark, Jakub: > >Thanks very much for the suggestions. > >I have done some experiments on your ideas and am thinking it's good to >combine them together. The use of DW_CFA instead of DW_OP can avoid > building >all information from scratch at each unwind location, while we can indicate >the signing key index through new AArch64 CIE augmentation 'B'. This new >approach reduce the unwind table size overhead from ~25% to ~5% when return >address signing enabled, it also largely simplified dwarf generation code > for >return address signing. > >As one new DWARF call frame instruction is needed for AArch64, I want to > reuse >DW_CFA_GNU_window_save to save the space. It is in vendor extension space > and >used for Sparc only, I think it make sense to reuse it for AArch64. On >AArch64, DW_CFA_GNU_window_save toggle return address sign status which > kept >in a new boolean type column in DWARF table, so DW_CFA_GNU_window_save > takes >no argument on AArch64, the same as on Sparc, this makes no difference to > those >existed encoding, length calculation code. 
> >Meanwhile one new DWARF expression operation number is still needed for >AArch64, it's useful for describing those complex pointer signing scenarios >and it will be used to multiplex some further extensions on AArch64. > >OK on this proposal and to install this patch to gcc trunk? > > Hi GDB, Binutils maintainer: > >OK on this proposal and install this patch to binutils-gdb master? > > include/ > 2016-11-29 Richard Earnshaw > Jiong Wang > > * dwarf2.def (DW_OP_AARCH64_operation): Reserve the number 0xea. Ping~ Thanks. -- Regards, Jiong
Re: [1/9][RFC][DWARF] Reserve three DW_OP numbers in vendor extension space
On 01/12/16 10:42, Richard Earnshaw (lists) wrote: On 30/11/16 21:43, Cary Coutant wrote: How about if instead of special DW_OP codes, you instead define a new virtual register that contains the mangled return address? If the rule for that virtual register is anything other than DW_CFA_undefined, you'd expect to find the mangled return address using that rule; otherwise, you would use the rule for LR instead and expect an unmangled return address. The earlier example would become (picking an arbitrary value of 120 for the new virtual register number): .cfi_startproc 0x0 paciasp (this instruction sign return address register LR/X30) .cfi_val 120, DW_OP_reg30 0x4 stp x29, x30, [sp, -32]! .cfi_offset 120, -16 .cfi_offset 29, -32 .cfi_def_cfa_offset 32 0x8 add x29, sp, 0 Just a suggestion... What about signing other registers? And what if the value is then copied to another register? Don't you end up with every possible register (including the FP/SIMD registers) needing a shadow copy? Another issue is compared with the DW_CFA approach, this virtual register approach is less efficient on unwind table size and complexer to implement. .cfi_register takes two ULEB128 register number, it needs 3 bytes rather than DW_CFA's 1 byte. From example .debug_frame section size for linux kernel increment will be ~14% compared with DW_CFA approach's 5%. In the implementation, the prologue then normally will be .cfi_startproc 0x0 paciasp (this instruction sign return address register LR/X30) .cfi_val 120, DW_OP_reg30 <-A 0x4 stp x29, x30, [sp, -32]! .cfi_offset 120, -16 <-B .cfi_offset 29, -32 .cfi_def_cfa_offset 32 The epilogue normally will be ... ldp x29, x30, [sp], 32 .cfi_val 120, DW_OP_reg30 <- C .cfi_restore 29 .cfi_def_cfa 31, 0 autiasp (this instruction unsign LR/X30) .cfi_restore 30 For the virual register approach, GCC needs to track dwarf generation for LR/X30 in every place (A/B/C, maybe some other rare LR copy places), and rewrite LR to new virtual register accordingly. This seems easy, but my practice shows GCC won't do any DWARF auto-deduction if you have one explict DWARF CFI note attached to an insn (handled_one will be true in dwarf2out_frame_debug). So for instruction like stp/ldp, we then need to explicitly generate all three DWARF CFI note manually. While for DW_CFA approach, they will be: .cfi_startproc 0x0 paciasp (this instruction sign return address register LR/X30) .cfi_cfa_window_save 0x4 stp x29, x30, [sp, -32]! \ .cfi_offset 30, -16 | .cfi_offset 29, -32 | .cfi_def_cfa_offset 32 | all dwarf generation between sign and ... | unsign (paciasp/autiasp) is the same ldp x29, x30, [sp], 16| as before .cfi_restore 30 | .cfi_restore 29 | .cfi_def_cfa 31, 0 | / autiasp (this instruction unsign LR/X30) .cfi_cfa_window_save The DWARF generation implementation in backend is very simple, nothing needs to be updated between sign and unsign instruction. For the impact on the unwinder, the virtual register approach needs to change the implementation of "save value" rule which is quite general code. A target hook might need for AArch64 that when the destination register is the special virtual register, it seems a little bit hack to me. -cary On Wed, Nov 16, 2016 at 6:02 AM, Jakub Jelinek wrote: On Wed, Nov 16, 2016 at 02:54:56PM +0100, Mark Wielaard wrote: On Wed, 2016-11-16 at 10:00 +, Jiong Wang wrote: The two operations DW_OP_AARCH64_paciasp and DW_OP_AARCH64_paciasp_deref were designed as shortcut operations when LR is signed with A key and using function's CFA as salt. 
This is the default behaviour of return address signing so is expected to be used for most of the time. DW_OP_AARCH64_pauth is designed as a generic operation that allow describing pointer signing on any value using any salt and key in case we can't use the shortcut operations we can use this. I admit to not fully understand the salting/keying involved. But given that the DW_OP space is really tiny, so we would like to not eat up too many of them for new opcodes. And given that introducing any new DW_OPs using for CFI unwinding will break any unwinder anyway causing us to update them all for this new feature. Have you thought about using a new CIE augmentation string character for describing that the return address/link register used by a function/frame is salted/keyed? This seems a good descri
Re: [1/9][RFC][DWARF] Reserve three DW_OP numbers in vendor extension space
On 16/11/16 14:02, Jakub Jelinek wrote: On Wed, Nov 16, 2016 at 02:54:56PM +0100, Mark Wielaard wrote: On Wed, 2016-11-16 at 10:00 +, Jiong Wang wrote: The two operations DW_OP_AARCH64_paciasp and DW_OP_AARCH64_paciasp_deref were designed as shortcut operations when LR is signed with A key and using function's CFA as salt. This is the default behaviour of return address signing so is expected to be used for most of the time. DW_OP_AARCH64_pauth is designed as a generic operation that allow describing pointer signing on any value using any salt and key in case we can't use the shortcut operations we can use this. I admit to not fully understand the salting/keying involved. But given that the DW_OP space is really tiny, so we would like to not eat up too many of them for new opcodes. And given that introducing any new DW_OPs using for CFI unwinding will break any unwinder anyway causing us to update them all for this new feature. Have you thought about using a new CIE augmentation string character for describing that the return address/link register used by a function/frame is salted/keyed? This seems a good description of CIE records and augmentation characters:http://www.airs.com/blog/archives/460 It obviously also involves updating all unwinders to understand the new augmentation character (and possible arguments). But it might be more generic and saves us from using up too many DW_OPs. From what I understood, the return address is not always scrambled, so it doesn't apply to the whole function, just to most of it (except for an insn in the prologue and some in the epilogue). So I think one op is needed. But can't it be just a toggable flag whether the return address is scrambled + some arguments to it? Thus DW_OP_AARCH64_scramble .uleb128 0 would mean that the default way of scrambling starts here (if not already active) or any kind of scrambling ends here (if already active), and DW_OP_AARCH64_scramble .uleb128 non-zero would be whatever encoding you need to represent details of the less common variants with details what to do. Then you'd just hook through some MD_* macro in the unwinder the descrambling operation if the scrambling is active at the insns you unwind on. Jakub Hi Mark, Jakub: Thanks very much for the suggestions. I have done some experiments on your ideas and am thinking it's good to combine them together. The use of DW_CFA instead of DW_OP can avoid building all information from scratch at each unwind location, while we can indicate the signing key index through new AArch64 CIE augmentation 'B'. This new approach reduce the unwind table size overhead from ~25% to ~5% when return address signing enabled, it also largely simplified dwarf generation code for return address signing. As one new DWARF call frame instruction is needed for AArch64, I want to reuse DW_CFA_GNU_window_save to save the space. It is in vendor extension space and used for Sparc only, I think it make sense to reuse it for AArch64. On AArch64, DW_CFA_GNU_window_save toggle return address sign status which kept in a new boolean type column in DWARF table, so DW_CFA_GNU_window_save takes no argument on AArch64, the same as on Sparc, this makes no difference to those existed encoding, length calculation code. Meanwhile one new DWARF expression operation number is still needed for AArch64, it's useful for describing those complex pointer signing scenarios and it will be used to multiplex some further extensions on AArch64. OK on this proposal and to install this patch to gcc trunk? 
Hi GDB, Binutils maintainer: OK on this proposal and install this patch to binutils-gdb master? include/ 2016-11-29 Richard Earnshaw Jiong Wang * dwarf2.def (DW_OP_AARCH64_operation): Reserve the number 0xea. diff --git a/include/dwarf2.def b/include/dwarf2.def index bb916ca238221151cf49359c25fd92643c7e60af..f3892a20da1fe13ddb419e5d7eda07f2c8d8b0c6 100644 --- a/include/dwarf2.def +++ b/include/dwarf2.def @@ -684,6 +684,12 @@ DW_OP (DW_OP_HP_unmod_range, 0xe5) DW_OP (DW_OP_HP_tls, 0xe6) /* PGI (STMicroelectronics) extensions. */ DW_OP (DW_OP_PGI_omp_thread_num, 0xf8) +/* AARCH64 extensions. + DW_OP_AARCH64_operation takes one mandatory unsigned LEB128 operand. + Bits[6:0] of this operand is the action code, all others bits are initialized + to 0 except explicitly documented for one action. Please refer AArch64 DWARF + ABI documentation for details. */ +DW_OP (DW_OP_AARCH64_operation, 0xea) DW_END_OP DW_FIRST_ATE (DW_ATE_void, 0x0) @@ -765,7 +771,8 @@ DW_CFA (DW_CFA_hi_user, 0x3f) /* SGI/MIPS specific. */ DW_CFA (DW_CFA_MIPS_advance_loc8, 0x1d) -/* GNU extensions. */ +/* GNU extensions. + NOTE: DW_CFA_GNU_window_save is multiplexed on Sparc and AArch64. */ DW_CFA (DW_CFA_GNU_window_save, 0x2d) DW_CFA (DW_CFA_GNU_args_size, 0x2e) DW_CFA (DW_CFA_GNU_negative_offset_extended, 0x2f)
Re: [Patch] Don't expand targetm.stack_protect_fail if it's NULL_TREE
gcc/ 2016-11-11 Jiong Wang * function.c (expand_function_end): Guard stack_protect_epilogue with ENABLE_DEFAULT_SSP_RUNTIME. * cfgexpand.c (pass_expand::execute): Likewise guard for stack_protect_prologue. * defaults.h (ENABLE_DEFAULT_SSP_RUNTIME): New macro. Default set to 1. * doc/tm.texi.in (Misc): Documents ENABLE_DEFAULT_SSP_RUNTIME. * doc/tm.texi: Regenerate. Like Joseph, I think this should be a hook rather than a new target macro. I do think it's closer to the right track though (separation of access to the guard from the rest of the SSP runtime bits). Hi Josephy, Jeff: Thanks for the review. I was planning to update the patch after resolving the pending DWARF issue (https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01156.html), While as this patch itself it quite independent, so OK to commit the attached patch? x86-64 boostrap and regression OK. gcc/ 2016-11-24 Jiong Wang * target.def (stack_protect_runtime_enabled_p): New. * function.c (expand_function_end): Guard stack_protect_epilogue with targetm.stack_protect_runtime_enabled_p. * cfgexpand.c (pass_expand::execute): Likewise. * calls.c (expand_call): Likewise. * doc/tm.texi.in (TARGET_STACK_PROTECT_RUNTIME_ENABLED_P): Add it. * doc/tm.texi: Regenerate. diff --git a/gcc/calls.c b/gcc/calls.c index c916e07..21385ce 100644 --- a/gcc/calls.c +++ b/gcc/calls.c @@ -3083,7 +3083,9 @@ expand_call (tree exp, rtx target, int ignore) if (pass && (flags & ECF_MALLOC)) start_sequence (); - if (pass == 0 && crtl->stack_protect_guard) + if (pass == 0 + && crtl->stack_protect_guard + && targetm.stack_protect_runtime_enabled_p ()) stack_protect_epilogue (); adjusted_args_size = args_size; diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c index 7ffb558..9c5a892 100644 --- a/gcc/cfgexpand.c +++ b/gcc/cfgexpand.c @@ -6334,7 +6334,7 @@ pass_expand::execute (function *fun) /* Initialize the stack_protect_guard field. This must happen after the call to __main (if any) so that the external decl is initialized. */ - if (crtl->stack_protect_guard) + if (crtl->stack_protect_guard && targetm.stack_protect_runtime_enabled_p ()) stack_protect_prologue (); expand_phi_nodes (&SA); diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 84bba07..c4f4ec3 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -4949,6 +4949,10 @@ The default version of this hook invokes a function called normally defined in @file{libgcc2.c}. @end deftypefn +@deftypefn {Target Hook} bool TARGET_STACK_PROTECT_RUNTIME_ENABLED_P (void) +Returns true if the target wants GCC's default stack protect runtime support, otherwise return false. The default implementation always returns true. +@end deftypefn + @deftypefn {Common Target Hook} bool TARGET_SUPPORTS_SPLIT_STACK (bool @var{report}, struct gcc_options *@var{opts}) Whether this target supports splitting the stack when the options described in @var{opts} have been passed. This is called after options have been parsed, so the target may reject splitting the stack in some configurations. The default version of this hook returns false. If @var{report} is true, this function may issue a warning or error; if @var{report} is false, it must simply return a value @end deftypefn diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 9afd5daa..9202bfe6 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -3825,6 +3825,8 @@ generic code. 
@hook TARGET_STACK_PROTECT_FAIL +@hook TARGET_STACK_PROTECT_RUNTIME_ENABLED_P + @hook TARGET_SUPPORTS_SPLIT_STACK @node Miscellaneous Register Hooks diff --git a/gcc/function.c b/gcc/function.c index 0b1d168..871f5a0 100644 --- a/gcc/function.c +++ b/gcc/function.c @@ -5627,7 +5627,7 @@ expand_function_end (void) emit_insn (gen_blockage ()); /* If stack protection is enabled for this function, check the guard. */ - if (crtl->stack_protect_guard) + if (crtl->stack_protect_guard && targetm.stack_protect_runtime_enabled_p ()) stack_protect_epilogue (); /* If we had calls to alloca, and this machine needs diff --git a/gcc/target.def b/gcc/target.def index c24b4cf..a63b850 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -4039,6 +4039,15 @@ normally defined in @file{libgcc2.c}.", tree, (void), default_external_stack_protect_fail) +/* This target hook allows the operating system to disable the default stack + protector runtime support. */ +DEFHOOK +(stack_protect_runtime_enabled_p, + "Returns true if the target wants GCC's default stack protect runtime support,\ + otherwise return false. The default implementation always returns true.", + bool, (void), + hook_bool_void_true) + DEFHOOK (can_use_doloop_p, "Return true if it is possible to use low-overhead loops (@code{doloop_end}\n\
Re: [1/9][RFC][DWARF] Reserve three DW_OP numbers in vendor extension space
On 15/11/16 19:25, Richard Earnshaw (lists) wrote: On 15/11/16 16:48, Jiong Wang wrote: On 15/11/16 16:18, Jakub Jelinek wrote: I know nothing about the aarch64 return address signing, would all 3 or say 2 usually appear together without any separate pc advance, or are they all going to appear frequently and at different pcs? I think it's the latter, the DW_OP_AARCH64_paciasp and DW_OP_AARCH64_paciasp_deref are going to appear frequently and at different pcs. For example, the following function prologue, there are three instructions at 0x0, 0x4, 0x8. After the first instruction at 0x0, LR/X30 will be mangled. The "paciasp" always mangle LR register using SP as salt and write back the value into LR. We then generate DW_OP_AARCH64_paciasp to notify any unwinder that the original LR is mangled in this way so they can unwind the original value properly. After the second instruction at 0x4, The mangled value of LR/X30 will be pushed on to stack, unlike usual .cfi_offset, the unwind rule for LR/X30 becomes: first fetch the mangled value from stack offset -16, then do whatever to restore the original value from the mangled value. This is represented by (DW_OP_AARCH64_paciasp_deref, offset). .cfi_startproc 0x0 paciasp (this instruction sign return address register LR/X30) .cfi_val_expression 30, DW_OP_AARCH64_paciasp 0x4 stp x29, x30, [sp, -32]! .cfi_val_expression 30, DW_OP_AARCH64_paciasp_deref, -16 .cfi_offset 29, -32 .cfi_def_cfa_offset 32 0x8 add x29, sp, 0 Now I'm confused. I was thinking that we needed one opcode for the sign operation in the prologue and one for the unsign/validate operation in the epilogue (to support non-call exceptions. IMO, non-call exceptions is fine, it looks to me doesn't need extra description as for non-call exceptions (exceptions thrown from signal handler) the key point is how to unwind across signal frame. For libgcc EH unwinder, when normal unwinding failed, it will fall back to architecture unwinding hook which restore some information from signal frame which is just on top of the signal handler's frame. I can see AArch64 implementation will setup return address column like the following logic where "sc->pc" is initialized by kernel and it's not signed therefore should sucess on further unwinding. fs->regs.reg[__LIBGCC_DWARF_ALT_FRAME_RETURN_COLUMN__].how = REG_SAVED_VAL_OFFSET; fs->regs.reg[__LIBGCC_DWARF_ALT_FRAME_RETURN_COLUMN__].loc.offset = (_Unwind_Ptr) (sc->pc) - new_cfa; But why do we need a separate code to say that a previously signed value has now been pushed on the stack? Surely that's just a normal store operation that can be tracked through the unwinding state machine. I was thinking the same thing, but found it doesn't work. My understanding of frame unwinding described at DWARF specification is: there are two steps for frame unwinding. The first step is to calculate register restore rules. Unwinder scans register rules from function start to the unwinding PC, one rule will be *overridden* by the next for the same register, there is *no inheritance*. The second step is then to evaluate all the final rules collected at the unwinding PC. According to the rule, either fetch the value from stack or evaluate the value on DWARF expression stack etc. I also had tried to modify ".cfi_val_expression" at offset 0x4 in above example into ".cfi_offset 30, -24", libgcc EH unwinder just doesn't work. I was expecting the third opcode to be needed for the special operations that are not frequently used by the compiler. 
The two operations DW_OP_AARCH64_paciasp and DW_OP_AARCH64_paciasp_deref were designed as shortcut operations when LR is signed with A key and using function's CFA as salt. This is the default behaviour of return address signing so is expected to be used for most of the time. DW_OP_AARCH64_pauth is designed as a generic operation that allow describing pointer signing on any value using any salt and key in case we can't use the shortcut operations we can use this.
Re: [1/9][RFC][DWARF] Reserve three DW_OP numbers in vendor extension space
On 15/11/16 16:18, Jakub Jelinek wrote: On Tue, Nov 15, 2016 at 04:00:40PM +, Jiong Wang wrote: Takes one signed LEB128 offset and retrieves 8-byte contents from the address calculated by CFA plus this offset, the contents then authenticated as per A key for instruction pointer using current CFA as salt. The result is pushed onto the stack. I'd like to point out that especially the vendor range of DW_OP_* is extremely scarce resource, we have only a couple of unused values, so taking 3 out of the remaining unused 12 for a single architecture is IMHO too much. Can't you use just a single opcode and encode which of the 3 operations it is in say the low 2 bits of a LEB 128 operand? We'll likely need to do RSN some multiplexing even for the generic GNU opcodes if we need just a few further ones (say 0xff as an extension, followed by uleb128 containing the opcode - 0xff). In the non-vendor area we still have 54 values left, so there is more space for future expansion. Seperate DWARF operations are introduced instead of combining all of them into one are mostly because these operations are going to be used for most of the functions once return address signing are enabled, and they are used for describing frame unwinding that they will go into unwind table for C++ program or C program compiled with -fexceptions, the impact on unwind table size is significant. So I was trying to lower the unwind table size overhead as much as I can. IMHO, three numbers actually is not that much for one architecture in DWARF operation vendor extension space as vendors can overlap with each other. The only painful thing from my understand is there are platform vendors, for example "GNU" and "LLVM" etc, for which architecture vendor can't overlap with. For DW_OP_*, there aren't two vendor ranges like e.g. in ELF, there is just one range, so ideally the opcodes would be unique everywhere, if not, there is just a single GNU vendor, there is no separate range for Aarch64, that can overlap with range for x86_64, and powerpc, etc. Perhaps we could declare that certain opcode subrange for the GNU vendor is architecture specific and document that the meaning of opcodes in that range and count/encoding of their arguments depends on the architecture, but then we should document how to figure out the architecture too (e.g. for ELF base it on the containing EM_*). All the tools that look at DWARF (readelf, objdump, eu-readelf, libdw, libunwind, gdb, dwz, ...) would need to agree on that though. I know nothing about the aarch64 return address signing, would all 3 or say 2 usually appear together without any separate pc advance, or are they all going to appear frequently and at different pcs? I think it's the latter, the DW_OP_AARCH64_paciasp and DW_OP_AARCH64_paciasp_deref are going to appear frequently and at different pcs. For example, the following function prologue, there are three instructions at 0x0, 0x4, 0x8. After the first instruction at 0x0, LR/X30 will be mangled. The "paciasp" always mangle LR register using SP as salt and write back the value into LR. We then generate DW_OP_AARCH64_paciasp to notify any unwinder that the original LR is mangled in this way so they can unwind the original value properly. After the second instruction at 0x4, The mangled value of LR/X30 will be pushed on to stack, unlike usual .cfi_offset, the unwind rule for LR/X30 becomes: first fetch the mangled value from stack offset -16, then do whatever to restore the original value from the mangled value. 
This is represented by (DW_OP_AARCH64_paciasp_deref, offset). .cfi_startproc 0x0 paciasp (this instruction sign return address register LR/X30) .cfi_val_expression 30, DW_OP_AARCH64_paciasp 0x4 stp x29, x30, [sp, -32]! .cfi_val_expression 30, DW_OP_AARCH64_paciasp_deref, -16 .cfi_offset 29, -32 .cfi_def_cfa_offset 32 0x8 add x29, sp, 0 Perhaps if there is just 1 opcode and has all the info encoded just in one bigger uleb128 or something similar... Jakub
Re: [1/9][RFC][DWARF] Reserve three DW_OP numbers in vendor extension space
On 15/11/16 16:18, Jakub Jelinek wrote: On Tue, Nov 15, 2016 at 04:00:40PM +, Jiong Wang wrote: Takes one signed LEB128 offset and retrieves 8-byte contents from the address calculated by CFA plus this offset, the contents then authenticated as per A key for instruction pointer using current CFA as salt. The result is pushed onto the stack. I'd like to point out that especially the vendor range of DW_OP_* is extremely scarce resource, we have only a couple of unused values, so taking 3 out of the remaining unused 12 for a single architecture is IMHO too much. Can't you use just a single opcode and encode which of the 3 operations it is in say the low 2 bits of a LEB 128 operand? We'll likely need to do RSN some multiplexing even for the generic GNU opcodes if we need just a few further ones (say 0xff as an extension, followed by uleb128 containing the opcode - 0xff). In the non-vendor area we still have 54 values left, so there is more space for future expansion. Seperate DWARF operations are introduced instead of combining all of them into one are mostly because these operations are going to be used for most of the functions once return address signing are enabled, and they are used for describing frame unwinding that they will go into unwind table for C++ program or C program compiled with -fexceptions, the impact on unwind table size is significant. So I was trying to lower the unwind table size overhead as much as I can. IMHO, three numbers actually is not that much for one architecture in DWARF operation vendor extension space as vendors can overlap with each other. The only painful thing from my understand is there are platform vendors, for example "GNU" and "LLVM" etc, for which architecture vendor can't overlap with. For DW_OP_*, there aren't two vendor ranges like e.g. in ELF, there is just one range, so ideally the opcodes would be unique everywhere, if not, there is just a single GNU vendor, there is no separate range for Aarch64, that can overlap with range for x86_64, and powerpc, etc. Perhaps we could declare that certain opcode subrange for the GNU vendor is architecture specific and document that the meaning of opcodes in that range and count/encoding of their arguments depends on the architecture, but then we should document how to figure out the architecture too (e.g. for ELF base it on the containing EM_*). All the tools that look at DWARF (readelf, objdump, eu-readelf, libdw, libunwind, gdb, dwz, ...) would need to agree on that though. I know nothing about the aarch64 return address signing, would all 3 or say 2 usually appear together without any separate pc advance, or are they all going to appear frequently and at different pcs? I think it's the latter, the DW_OP_AARCH64_paciasp and DW_OP_AARCH64_paciasp_deref are going to appear frequently and at different pcs. For example, the following function prologue, there are three instructions at 0x0, 0x4, 0x8. After the first instruction at 0x0, LR/X30 will be mangled. The "paciasp" always mangle LR register using SP as salt and write back the value into LR. We then generate DW_OP_AARCH64_paciasp to notify any unwinder that the original LR is mangled in this way so they can unwind the original value properly. After the second instruction at 0x4, The mangled value of LR/X30 will be pushed on to stack, unlike usual .cfi_offset, the unwind rule for LR/X30 becomes: first fetch the mangled value from stack offset -16, then do whatever to restore the original value from the mangled value. 
This is represented by (DW_OP_AARCH64_paciasp_deref, offset). .cfi_startproc 0x0 paciasp (this instruction sign return address register LR/X30) .cfi_val_expression 30, DW_OP_AARCH64_paciasp 0x4 stp x29, x30, [sp, -32]! .cfi_val_expression 30, DW_OP_AARCH64_paciasp_deref, -16 .cfi_offset 29, -32 .cfi_def_cfa_offset 32 0x8 add x29, sp, 0 Perhaps if there is just 1 opcode and has all the info encoded just in one bigger uleb128 or something similar...
Re: [1/9][RFC][DWARF] Reserve three DW_OP numbers in vendor extension space
On 11/11/16 19:38, Jakub Jelinek wrote: On Fri, Nov 11, 2016 at 06:21:48PM +, Jiong Wang wrote: This patch introduces three AARCH64 private DWARF operations in vendor extension space. DW_OP_AARCH64_pauth 0xea === Takes one unsigned LEB 128 Pointer Authentication Description. Bits [3:0] of the description contain the Authentication Action Code. All unused bits are initialized to 0. The operation then proceeds according to the value of the action code as described in the Action Code Table. DW_OP_AARCH64_paciasp 0xeb === Authenticates the contents in X30/LR register as per A key for instruction pointer using current CFA as salt. The result is pushed onto the stack. DW_OP_AARCH64_paciasp_deref 0xec === Takes one signed LEB128 offset and retrieves 8-byte contents from the address calculated by CFA plus this offset, the contents then authenticated as per A key for instruction pointer using current CFA as salt. The result is pushed onto the stack. I'd like to point out that especially the vendor range of DW_OP_* is extremely scarce resource, we have only a couple of unused values, so taking 3 out of the remaining unused 12 for a single architecture is IMHO too much. Can't you use just a single opcode and encode which of the 3 operations it is in say the low 2 bits of a LEB 128 operand? We'll likely need to do RSN some multiplexing even for the generic GNU opcodes if we need just a few further ones (say 0xff as an extension, followed by uleb128 containing the opcode - 0xff). In the non-vendor area we still have 54 values left, so there is more space for future expansion. Jakub Seperate DWARF operations are introduced instead of combining all of them into one are mostly because these operations are going to be used for most of the functions once return address signing are enabled, and they are used for describing frame unwinding that they will go into unwind table for C++ program or C program compiled with -fexceptions, the impact on unwind table size is significant. So I was trying to lower the unwind table size overhead as much as I can. IMHO, three numbers actually is not that much for one architecture in DWARF operation vendor extension space as vendors can overlap with each other. The only painful thing from my understand is there are platform vendors, for example "GNU" and "LLVM" etc, for which architecture vendor can't overlap with. In include/dwarf2.def, I saw DW_OP_GNU* has reserved 13, DW_OP_HP* has reserved 7 and DW_OP_PGI has reserved 1. So for an alternative approach, can these AArch64 extensions overlap and reuse those numbers reserved for DW_OP_HP* ? for example 0xe4, 0xe5, 0xe6. I am even thinking GNU toolchain makes the 8 numbers reserved by existed DW_OP_HP* and DW_OP_SGI* as architecture vendor area and allow multiplexing on them for different architectures. This may offer more flexibilities for architecture vendors. Under current code base, my search shows the overlap should be safe inside GCC/GDB and we only needs minor disassemble tweak in Binutils. Thanks. Regards, Jiong
Re: [7/9][AArch64, libgcc] Let AArch64 use customized unwinder file
On 11/11/16 22:12, Joseph Myers wrote: On Fri, 11 Nov 2016, Jiong Wang wrote: There are two ways of introducing these AArch64 support: * Introducing a few target macros so we can customize functions like uw_init_context, uw_install_context etc. * Use target private unwind-dw2 implementation, i.e duplicate the generic unwind-dw2.c into target config directory and use it instead of generic one. This is current used by IA64 and CR16 is using. I am not sure which approach is the convention in libgcc, Ian, any comments on this? Although as you note duplication has been used before, I think it should be strongly discouraged; duplicated files are unlikely to be kept up to date with relevant changes to the main file. Hi Joseph, The changes AArch64 needs to do on top of the generic unwind-dw2.c is at: https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01167.html If I don't duplicate unwind-dw2.c, then I need to guard those changes with something like __aarch64__ or introduce several target macros. It looks to me only the hunk that supports AArch64 DWARF operations worth a target macro, something like MD_DW_OP_HANDLER, for the other changes they are quite scattered, for example the field extension on "struct _Unwind_Context" and the relax of assertion on uw_install_context_1. Any comments on this? Thanks. Regards, Jiong
Re: [Patch] Don't expand targetm.stack_protect_fail if it's NULL_TREE
On 24/10/16 16:22, Jeff Law wrote: Asserting couldn't hurt. I'd much rather have the compiler issue an error, ICE or somesuch than silently not generate a call to the stack protector fail routine. Hi Jeff, I have just send out the other patch which accelerates -fstack-protector on AArch64, more background information there at: https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01168.html Previous, I was emptying three target insns/hooks, and was relying GCC to optimize all remaining SSP runtime stuff out. I am thinking it's better and safer that GCC allow one backend to disable the default SSP runtime cleanly, so the backend does't need to rely on optimization level, and libssp is not needed at all optimization level. In this new patch, I introduced a new target macro for SSP to allow one backend GCC's default SSP runtime generation be disabled. How does this looks to you? Thanks. gcc/ 2016-11-11 Jiong Wang * function.c (expand_function_end): Guard stack_protect_epilogue with ENABLE_DEFAULT_SSP_RUNTIME. * cfgexpand.c (pass_expand::execute): Likewise guard for stack_protect_prologue. * defaults.h (ENABLE_DEFAULT_SSP_RUNTIME): New macro. Default set to 1. * doc/tm.texi.in (Misc): Documents ENABLE_DEFAULT_SSP_RUNTIME. * doc/tm.texi: Regenerate. diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c index 130a16b1d7d06c4ec9e31439037ffcbcbd0e085f..99f055d2db622f7acd393a223b3968be12b6235f 100644 --- a/gcc/cfgexpand.c +++ b/gcc/cfgexpand.c @@ -6343,7 +6343,7 @@ pass_expand::execute (function *fun) /* Initialize the stack_protect_guard field. This must happen after the call to __main (if any) so that the external decl is initialized. */ - if (crtl->stack_protect_guard) + if (crtl->stack_protect_guard && ENABLE_DEFAULT_SSP_RUNTIME) stack_protect_prologue (); expand_phi_nodes (&SA); diff --git a/gcc/defaults.h b/gcc/defaults.h index af8fe916be49e745c842d992a5af372c46ec2fe3..ec5e52c9761e3e5aee5274c54628157d0bde1808 100644 --- a/gcc/defaults.h +++ b/gcc/defaults.h @@ -1404,6 +1404,14 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see # define DEFAULT_FLAG_SSP 0 #endif +/* Supply a default definition of ENABLE_DEFAULT_SSP_RUNTIME. GCC use this to + decide whether stack_protect_prologue and stack_protect_epilogue based + runtime support should be generated. */ + +#ifndef ENABLE_DEFAULT_SSP_RUNTIME +#define ENABLE_DEFAULT_SSP_RUNTIME 1 +#endif + /* Provide default values for the macros controlling stack checking. */ /* The default is neither full builtin stack checking... */ diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 586626062435f3718cfae84c6aab3024d08d79d7..64d20bc493470221286b6248354f0d6122405cb6 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -10487,6 +10487,14 @@ The default implementation does nothing. @c prevent bad page break with this line Here are several miscellaneous parameters. +@defmac ENABLE_DEFAULT_SSP_RUNTIME +Define this boolean macro to indicate whether or not your architecture +use GCC default stack protector runtime. If this macro is set to false, +stack_protect_prologue and stack_protect_epilogue based runtime support will be +generated, otherwise GCC assumes your architecture generates private runtime +support. This macro is default set to true. +@end defmac + @defmac HAS_LONG_COND_BRANCH Define this boolean macro to indicate whether or not your architecture has conditional branches that can span all of memory. 
It is used in diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index da133a4b7010533d85d5bb9a850b91e8a80ce1ca..729c76fa182076828a5819ab391b4f61fb80a771 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -7499,6 +7499,14 @@ c_register_addr_space ("__ea", ADDR_SPACE_EA); @c prevent bad page break with this line Here are several miscellaneous parameters. +@defmac ENABLE_DEFAULT_SSP_RUNTIME +Define this boolean macro to indicate whether or not your architecture +use GCC default stack protector runtime. If this macro is set to false, +stack_protect_prologue and stack_protect_epilogue based runtime support will be +generated, otherwise GCC assumes your architecture generates private runtime +support. This macro is default set to true. +@end defmac + @defmac HAS_LONG_COND_BRANCH Define this boolean macro to indicate whether or not your architecture has conditional branches that can span all of memory. It is used in diff --git a/gcc/function.c b/gcc/function.c index 53bad8736e9ef251347d23d40bc0ab767a979bc7..9dce8929590f6cb06155a540e33960c2cf0e3b16 100644 --- a/gcc/function.c +++ b/gcc/function.c @@ -5624,7 +5624,7 @@ expand_function_end (void) emit_insn (gen_blockage ()); /* If stack protection is enabled for this function, check the guard. */ - if (crtl->stack_protect_guard) + if (crtl->stack_protect_guard && ENABLE_DEF
[9/9][RFC][AArch64] Accelerate -fstack-protector through pointer authentication extension
This patch accelerates GCC's existed -fstack-protector using ARMv8.3-A pointer authentication instructions. Given AArch64 currently has the following stack layout: | caller's LR | | | canary<- sentinel for -fstack-protector | locals (buffer located here) | | | other callees | | callee's LR <- sentinel for -msign-return-address | | we can switch locals and callees, | ... | vararg | | other callee | | LR | | locals (buffer located here) We then sign LR and make it serve as canary value. There are several benefits of this approach: * It's evetually -msign-return-address + swap locals and callees areas. * Require nearly no modifications on prologue and epilogue, avoid making them complexer. * No need of any other runtime support, libssp is not required. The runtime overhead before and after this patch will be: o canary insert GCC default SSP runtime was loading from global variable "__stack_chk_guard" initilized in libssp: adrpx19, _GLOBAL_OFFSET_TABLE_ ldr x19, [x19, #:gotpage_lo15:__stack_chk_guard] ldr x2, [x19] str x2, [x29, 56] this patch accelerats into: sign lr o canary check GCC default SSP runtime was reloading from stack, then comparing with original value and branch to abort function: ldr x2, [x29, 56] ldr x1, [x19] eor x1, x2, x1 cbnzx1, .L5 ... ret .L5: bl __stack_chk_fail acclerated into: aut lr + ret or retaa the the canary value (signed LR) fails authentication, the return to invalid address will cause exception. NOTE, this approach however requires DWARF change as the original LR is signed, the binary needs new libgcc to make sure c++ eh works correctly. Given this acceleration already needs the user specify -mstack-protector-dialect=pauth which means the target platform largely should have install new libgcc, otherwise you can't utilize new pointer authentication features. gcc/ 2016-11-11 Jiong Wang * config/aarch64/aarch64-opts.h (aarch64_stack_protector_type): New enum. (aarch64_layout_frame): Swap callees and locals when -mstack-protector-dialect=pauth specified. (aarch64_expand_prologue): Use AARCH64_PAUTH_SSP_OR_RA_SIGN instead of AARCH64_ENABLE_RETURN_ADDRESS_SIGN. (aarch64_expand_epilogue): Likewise. * config/aarch64/aarch64.md (*do_return): Likewise. (aarch64_override_options): Sanity check for ILP32 and TARGET_PAUTH. * config/aarch64/aarch64.h (AARCH64_PAUTH_SSP_OPTION, AARCH64_PAUTH_SSP, AARCH64_PAUTH_SSP_OR_RA_SIGN, LINK_SSP_SPEC): New defines. * config/aarch64/aarch64.opt (-mstack-protector-dialect=): New option. * doc/invoke.texi (AArch64 Options): Documents -mstack-protector-dialect=. diff --git a/gcc/config/aarch64/aarch64-opts.h b/gcc/config/aarch64/aarch64-opts.h index 41c14b38a6188d399eb04baca2896e033c03ff1b..ff464ea5675146d62f0b676fe776f882fc1b8d80 100644 --- a/gcc/config/aarch64/aarch64-opts.h +++ b/gcc/config/aarch64/aarch64-opts.h @@ -99,4 +99,10 @@ enum aarch64_function_type { AARCH64_FUNCTION_ALL }; +/* GCC standard stack protector (Canary insertion based) types for AArch64. */ +enum aarch64_stack_protector_type { + STACK_PROTECTOR_TRAD, + STACK_PROTECTOR_PAUTH +}; + #endif diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 907e8bdf5b4961b3107dcd5a481de28335e4be89..73ef2677a11450fe21f765011317bd3367ef0d94 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -982,4 +982,25 @@ enum aarch64_pauth_action_type AARCH64_PAUTH_AUTH }; +/* Pointer authentication accelerated -fstack-protector. 
*/ +#define AARCH64_PAUTH_SSP_OPTION \ + (TARGET_PAUTH && aarch64_stack_protector_dialect == STACK_PROTECTOR_PAUTH) + +#define AARCH64_PAUTH_SSP \ + (crtl->stack_protect_guard && AARCH64_PAUTH_SSP_OPTION) + +#define AARCH64_PAUTH_SSP_OR_RA_SIGN \ + (AARCH64_PAUTH_SSP || AARCH64_ENABLE_RETURN_ADDRESS_SIGN) + +#ifndef TARGET_LIBC_PROVIDES_SSP +#define LINK_SSP_SPEC "%{!mstack-protector-dialect=pauth:\ + %{fstack-protector|fstack-protector-all\ + |fstack-protector-strong|fstack-protector-explicit:\ + -lssp_nonshared -lssp}}" +#endif + +/* Don't use GCC default SSP runtime if pointer authentication acceleration + enabled. */ +#define ENABLE_DEFAULT_SSP_RUNTIME !(AARCH64_PAUTH_SSP_OPTION) + #endif /* GCC_AARCH64_H */ diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index cae177dca511fdb909ef82c972d3bbdebab215e2..c469baf92268ff894f5cf0ea9f5dbd4180714b98 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -2993,6 +2993,15 @@ aarc
[8/9][AArch64, libgcc] Runtime support for AArch64 DWARF operations
This patch add AArch64 specific runtime EH unwinding support for DW_OP_AARCH64_pauth, DW_OP_AARCH64_paciasp and DW_OP_AARCH64_paciasp_deref. The semantics of them are described at the specification in patch [1/9]. The support includes: * Parsing these DWARF operations. Perform unwinding actions according to their semantics. * Handling eh_return multi return paths. Function calling __builtin_eh_return (_Unwind_RaiseException*) will have multiple return paths. One is for normal exit, the other is for install EH handler. If the _Unwind_RaiseException itself is return address signed, then there will always be return address authentication before return, however, if the return path in _Unwind_RaiseException if from installing EH handler the address of which has already been authenticated during unwinding, then we need to re-sign that address, so when the execution flow continues at _Unwind_RaiseException's epilogue, the authentication still works correctly. OK for trunk? libgcc/ 2016-11-11 Jiong Wang * config/aarch64/unwind-aarch64.c (RA_SIGN_BIT): New flag to indicate one frame is return address signed. (execute_stack_op): Handle DW_OP_AARCH64_pauth, DW_OP_AARCH64_paciasp, DW_OP_AARCH64_paciasp_deref. (uw_init_context): Call aarch64_uw_init_context_1. (uw_init_context_1): Rename to aarch64_uw_init_context_1. Strip signature for seed address. (uw_install_context): Re-sign handler's address so it works correctly with caller's context. (uw_install_context_1): by_value[LR] can be true, after return address signing LR will come from DWARF value expression rule which is a by_value true rule. diff --git a/libgcc/config/aarch64/unwind-aarch64.c b/libgcc/config/aarch64/unwind-aarch64.c index 1fb6026d123f8e7fc676f5e95e8e66caccf3d6ff..f6441a56960dbd4b754f8fc17d581402389a4812 100644 --- a/libgcc/config/aarch64/unwind-aarch64.c +++ b/libgcc/config/aarch64/unwind-aarch64.c @@ -37,6 +37,10 @@ #include "gthr.h" #include "unwind-dw2.h" +/* This AArch64 implementation is exactly the same as libgcc/unwind-dw2.c, + except we have a customized uw_init_context_1 to handle pointer + authentication. */ + #ifdef HAVE_SYS_SDT_H #include #endif @@ -67,7 +71,7 @@ waste. However, some runtime libraries supplied with ICC do contain such an unorthodox transition, as well as the unwind info to match. This loss of register restoration doesn't matter in practice, because the exception - is caught in the native unix abi, where all of the xmm registers are + is caught in the native unix abi, where all of the xmm registers are call clobbered. Ideally, we'd record some bit to notice when we're failing to restore some @@ -136,6 +140,8 @@ struct _Unwind_Context #define SIGNAL_FRAME_BIT ((~(_Unwind_Word) 0 >> 1) + 1) /* Context which has version/args_size/by_value fields. */ #define EXTENDED_CONTEXT_BIT ((~(_Unwind_Word) 0 >> 2) + 1) + /* Return address has been signed. */ +#define RA_SIGNED_BIT ((~(_Unwind_Word) 0 >> 3) + 1) _Unwind_Word flags; /* 0 for now, can be increased when further fields are added to struct _Unwind_Context. */ @@ -908,6 +914,89 @@ execute_stack_op (const unsigned char *op_ptr, const unsigned char *op_end, case DW_OP_nop: goto no_push; + case DW_OP_AARCH64_paciasp: + { + _Unwind_Word lr_value = _Unwind_GetGR (context, LR_REGNUM); + /* Note: initial is guaranteed to be CFA by DWARF specification. 
*/ + result + = (_Unwind_Word) __builtin_aarch64_autia1716 ((void *) lr_value, + initial); + context->flags |= RA_SIGNED_BIT; + break; + } + + case DW_OP_AARCH64_paciasp_deref: + { + _sleb128_t offset; + op_ptr = read_sleb128 (op_ptr, &offset); + result = (_Unwind_Word) read_pointer ((void *) initial + offset); + result + = (_Unwind_Word) __builtin_aarch64_autia1716 ((void *) result, + initial); + context->flags |= RA_SIGNED_BIT; + break; + } + + case DW_OP_AARCH64_pauth: + { + _uleb128_t auth_descriptor; + op_ptr = read_uleb128 (op_ptr, &auth_descriptor); + enum aarch64_pauth_action_type action_code = + (enum aarch64_pauth_action_type) (auth_descriptor & 0xf); + context->flags |= RA_SIGNED_BIT; + + /* Different action may take different number of operands. + AARCH64_PAUTH_DROP* takes one operand while AARCH64_PAUTH_AUTH + takes two and both of them produce one result. */ + switch (action_code) + { + case AARCH64_PAUTH_DROP_I: + { + /* Fetch the value to drop signature. */ + stack_elt -= 1; + result = stack[stack_elt]; + result + = (_Unwind_Word) + __builtin_aarch64_xpaclri ((void *) result); + break; + } + case AARCH64_PAUTH_AUTH
[7/9][AArch64, libgcc] Let AArch64 use customized unwinder file
We need customized EH unwinder support for AArch64 DWARF operations introduced earlier in this patchset, these changes mostly need to be done in the generic file unwind-dw2.c. There are two ways of introducing these AArch64 support: * Introducing a few target macros so we can customize functions like uw_init_context, uw_install_context etc. * Use target private unwind-dw2 implementation, i.e duplicate the generic unwind-dw2.c into target config directory and use it instead of generic one. This is current used by IA64 and CR16 is using. I am not sure which approach is the convention in libgcc, Ian, any comments on this? Thanks. This patch is the start of using approach 2 includes necessary Makefile support and copying of original unwind-dw2.c. A follow up patch will implement those AArch64 specific stuff so the change will be very clear. OK for trunk? libgcc/ 2016-11-11 Jiong Wang * config.host (aarch64*-*-elf, aarch64*-*-rtems*, aarch64*-*-linux*): Include new AArch64 EH makefile. * config/aarch64/t-eh-aarch64: New EH makefile. * config/aarch64/unwind-aarch64.c: New EH unwinder implementation, copied from unwind-dw2.c. diff --git a/libgcc/config.host b/libgcc/config.host index 002f650be9a7cd6f69ce3d51639a735ca7eba564..2bf90818c03e71bd3a601b607b98ac6b78fe763a 100644 --- a/libgcc/config.host +++ b/libgcc/config.host @@ -330,12 +330,14 @@ aarch64*-*-elf | aarch64*-*-rtems*) extra_parts="$extra_parts crtfastmath.o" tmake_file="${tmake_file} ${cpu_type}/t-aarch64" tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm" + tmake_file="${tmake_file} ${cpu_type}/t-eh-aarch64" ;; aarch64*-*-linux*) extra_parts="$extra_parts crtfastmath.o" md_unwind_header=aarch64/linux-unwind.h tmake_file="${tmake_file} ${cpu_type}/t-aarch64" tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm" + tmake_file="${tmake_file} ${cpu_type}/t-eh-aarch64" ;; alpha*-*-linux*) tmake_file="${tmake_file} alpha/t-alpha alpha/t-ieee t-crtfm alpha/t-linux" diff --git a/libgcc/config/aarch64/t-eh-aarch64 b/libgcc/config/aarch64/t-eh-aarch64 new file mode 100644 index ..2ccc02d409ff850ec9db355a4d06efd125b4f68d --- /dev/null +++ b/libgcc/config/aarch64/t-eh-aarch64 @@ -0,0 +1,3 @@ +# Use customized EH unwinder implementation. +LIB2ADDEH = $(srcdir)/config/aarch64/unwind-aarch64.c $(srcdir)/unwind-dw2-fde-dip.c \ + $(srcdir)/unwind-sjlj.c $(srcdir)/unwind-c.c diff --git a/libgcc/config/aarch64/unwind-aarch64.c b/libgcc/config/aarch64/unwind-aarch64.c new file mode 100644 index ..1fb6026d123f8e7fc676f5e95e8e66caccf3d6ff --- /dev/null +++ b/libgcc/config/aarch64/unwind-aarch64.c @@ -0,0 +1,1715 @@ +/* DWARF2 exception handling and frame unwind runtime interface routines. + Copyright (C) 1997-2016 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but WITHOUT + ANY WARRANTY; without even the implied warranty of MERCHANTABILITY + or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public + License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. 
+ + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + +#include "tconfig.h" +#include "tsystem.h" +#include "coretypes.h" +#include "tm.h" +#include "libgcc_tm.h" +#include "dwarf2.h" +#include "unwind.h" +#ifdef __USING_SJLJ_EXCEPTIONS__ +# define NO_SIZE_OF_ENCODED_VALUE +#endif +#include "unwind-pe.h" +#include "unwind-dw2-fde.h" +#include "gthr.h" +#include "unwind-dw2.h" + +#ifdef HAVE_SYS_SDT_H +#include +#endif + +#ifndef __USING_SJLJ_EXCEPTIONS__ + +#ifndef __LIBGCC_STACK_GROWS_DOWNWARD__ +#define __LIBGCC_STACK_GROWS_DOWNWARD__ 0 +#else +#undef __LIBGCC_STACK_GROWS_DOWNWARD__ +#define __LIBGCC_STACK_GROWS_DOWNWARD__ 1 +#endif + +/* Dwarf frame registers used for pre gcc 3.0 compiled glibc. */ +#ifndef PRE_GCC3_DWARF_FRAME_REGISTERS +#define PRE_GCC3_DWARF_FRAME_REGISTERS __LIBGCC_DWARF_FRAME_REGISTERS__ +#endif + +/* ??? For the public function interfaces, we tend to gcc_assert that the + column numbers are in range. For the dwa
[6/9][AArch64] Add builtins support for pac/aut/xpac
This patch implements a few ARMv8.3-A new builtins for pointer sign and authentication instructions. Currently, these builtins are supposed to be used by libgcc EH unwinder only. They are not public interface to external user. OK to install? gcc/ 2016-11-11 Jiong Wang * config/aarch64/aarch64-builtins.c (enum aarch64_builtins): New entries for AARCH64_PAUTH_BUILTIN_PACI1716, AARCH64_PAUTH_BUILTIN_AUTIA1716, AARCH64_PAUTH_BUILTIN_AUTIB1716, AARCH64_PAUTH_BUILTIN_XPACLRI. (aarch64_init_v8_3_builtins): New. (aarch64_init_builtins): Call aarch64_init_builtins. (arch64_expand_builtin): Expand new builtins. diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index 9136910cd324a391de929ea9d1a13419dbcfb8bc..20679a5d3f6138f4c55b84f3aff5dfd0341e6787 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -353,6 +353,11 @@ enum aarch64_builtins AARCH64_CRC32_BUILTIN_BASE, AARCH64_CRC32_BUILTINS AARCH64_CRC32_BUILTIN_MAX, + /* ARMv8.3-A Pointer Authentication Builtins. */ + AARCH64_PAUTH_BUILTIN_AUTIA1716, + AARCH64_PAUTH_BUILTIN_AUTIB1716, + AARCH64_PAUTH_BUILTIN_XPACLRI, + AARCH64_PAUTH_BUILTIN_PACI1716, AARCH64_BUILTIN_MAX }; @@ -900,6 +905,37 @@ aarch64_init_fp16_types (void) aarch64_fp16_ptr_type_node = build_pointer_type (aarch64_fp16_type_node); } +/* Pointer authentication builtins that will become NOP on legacy platform. + Currently, these builtins are for internal use only (libgcc EH unwinder). */ + +void +aarch64_init_pauth_hint_builtins (void) +{ + /* Pointer Authentication builtins. */ + tree ftype_pointer_auth += build_function_type_list (ptr_type_node, ptr_type_node, +unsigned_intDI_type_node, NULL_TREE); + tree ftype_pointer_strip += build_function_type_list (ptr_type_node, ptr_type_node, NULL_TREE); + + aarch64_builtin_decls[AARCH64_PAUTH_BUILTIN_AUTIA1716] += add_builtin_function ("__builtin_aarch64_autia1716", ftype_pointer_auth, + AARCH64_PAUTH_BUILTIN_AUTIA1716, BUILT_IN_MD, NULL, + NULL_TREE); + aarch64_builtin_decls[AARCH64_PAUTH_BUILTIN_AUTIB1716] += add_builtin_function ("__builtin_aarch64_autib1716", ftype_pointer_auth, + AARCH64_PAUTH_BUILTIN_AUTIB1716, BUILT_IN_MD, NULL, + NULL_TREE); + aarch64_builtin_decls[AARCH64_PAUTH_BUILTIN_XPACLRI] += add_builtin_function ("__builtin_aarch64_xpaclri", ftype_pointer_strip, + AARCH64_PAUTH_BUILTIN_XPACLRI, BUILT_IN_MD, NULL, + NULL_TREE); + aarch64_builtin_decls[AARCH64_PAUTH_BUILTIN_PACI1716] += add_builtin_function ("__builtin_aarch64_paci1716", ftype_pointer_auth, + AARCH64_PAUTH_BUILTIN_PACI1716, BUILT_IN_MD, NULL, + NULL_TREE); +} + void aarch64_init_builtins (void) { @@ -928,6 +964,10 @@ aarch64_init_builtins (void) aarch64_init_crc32_builtins (); aarch64_init_builtin_rsqrt (); + +/* Initialize pointer authentication builtins which are backed by instructions + in NOP encoding space. 
*/ + aarch64_init_pauth_hint_builtins (); } tree @@ -1270,6 +1310,76 @@ aarch64_expand_builtin (tree exp, } emit_insn (pat); return target; +case AARCH64_PAUTH_BUILTIN_AUTIA1716: +case AARCH64_PAUTH_BUILTIN_AUTIB1716: +case AARCH64_PAUTH_BUILTIN_PACI1716: +case AARCH64_PAUTH_BUILTIN_XPACLRI: + arg0 = CALL_EXPR_ARG (exp, 0); + op0 = force_reg (Pmode, expand_normal (arg0)); + + if (!target) + target = gen_reg_rtx (Pmode); + else + target = force_reg (Pmode, target); + + emit_move_insn (target, op0); + + if (fcode == AARCH64_PAUTH_BUILTIN_XPACLRI) + { + rtx lr_reg = gen_rtx_REG (Pmode, R30_REGNUM); + icode = CODE_FOR_strip_lr_sign; + emit_move_insn (lr_reg, op0); + emit_insn (GEN_FCN (icode) (const0_rtx)); + emit_move_insn (target, lr_reg); + } + else + { + tree arg1 = CALL_EXPR_ARG (exp, 1); + rtx op1 = expand_normal (arg1); + bool sign_op_p = (fcode == AARCH64_PAUTH_BUILTIN_PACI1716); + + bool x1716_op_p = (fcode == AARCH64_PAUTH_BUILTIN_AUTIA1716 + || fcode == AARCH64_PAUTH_BUILTIN_AUTIB1716 + || fcode == AARCH64_PAUTH_BUILTIN_PACI1716); + + bool a_key_p = (fcode == AARCH64_PAUTH_BUILTIN_AUTIA1716 + || (aarch64_pauth_key == AARCH64_PAUTH_IKEY_A + && fcode == AARCH64_PAUTH_BUILTIN_PACI1716)); + HOST_WIDE_INT key_index = + a_key_p ? AARCH64_PAUTH_IKEY_A : AARCH64_PAUTH_IKEY_B; + + if (sign_op_p) + { + if (x1716_op_p) + icode = CODE_FOR_sign_reg1716; + else + icode = CODE_FOR_sign_reg; + } + else + { + if (x1716_op_p) + icode = CODE_FOR_auth_reg1716; + else + icode = CODE_FOR_auth_reg;; + } + + op1 = force_reg (Pmode, op1); + + if (x1716_op_p) + { + rtx x16_reg = gen_rtx_REG (Pmode, R16_REGNUM); + rtx x17_reg = gen_rtx_REG (Pmode, R17_REGNUM); + emit_move_insn (x17
[5/9][AArch64] Generate dwarf information for -msign-return-address
This patch generate DWARF description for pointer authentication. DWARF value expression is used to describe the authentication action. Please see the cover letter and AArch64 DWARF specification for the semantics of AArch64 DWARF operations. When authentication key index is A key, we use compact DWARF description which can largely save DWARF frame size, otherwise we fallback to general operator. Example === int cal (int a, int b, int c) { return a + dec (b) + c; } Compact DWARF description (-march=armv8.3-a -msign-return-address) === DW_CFA_advance_loc: 4 to 0004 DW_CFA_val_expression: r30 (x30) (DW_OP_AARCH64_paciasp) DW_CFA_advance_loc: 4 to 0008 DW_CFA_val_expression: r30 (x30) (DW_OP_AARCH64_paciasp_deref: -24) General DWARF description === (-march=armv8.3-a -msign-return-address -mpauth-key=b_key) DW_CFA_advance_loc: 4 to 0004 DW_CFA_val_expression: r30 (x30) (DW_OP_breg30 (x30): 0; DW_OP_AARCH64_pauth: 18) DW_CFA_advance_loc: 4 to 0008 DW_CFA_val_expression: r30 (x30) (DW_OP_dup; DW_OP_const1s: -24; DW_OP_plus; DW_OP_deref; DW_OP_AARCH64_pauth: 18) From Linux kernel testing, -msign-return-address will introduce +24% .debug_frame size increase when signing all functions and using compact description, and about +45% .debug_frame size increase if using general description. gcc/ 2016-11-11 Jiong Wang * config/aarch64/aarch64.h (aarch64_pauth_action_type): New enum. * config/aarch64/aarch64.c (aarch64_attach_ra_auth_dwarf_note): New function. (aarch64_attach_ra_auth_dwarf_general): New function. (aarch64_attach_ra_auth_dwarf_shortcut): New function. (aarch64_save_callee_saves): Generate dwarf information if LR is signed. (aarch64_expand_prologue): Likewise. (aarch64_expand_epilogue): Likewise. diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 4bfadb512915d5dc606f7fc06f027868d6be7613..907e8bdf5b4961b3107dcd5a481de28335e4be89 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -970,4 +970,16 @@ extern tree aarch64_fp16_ptr_type_node; || (aarch64_ra_sign_scope == AARCH64_FUNCTION_NON_LEAF \ && cfun->machine->frame.reg_offset[LR_REGNUM] >= 0)) +/* AArch64 pointer authentication action types. See AArch64 DWARF ABI for + details. */ +enum aarch64_pauth_action_type +{ + /* Drop the authentication signature for instruction pointer. */ + AARCH64_PAUTH_DROP_I, + /* Likewise for data pointer. */ + AARCH64_PAUTH_DROP_D, + /* Do authentication. */ + AARCH64_PAUTH_AUTH +}; + #endif /* GCC_AARCH64_H */ diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index b3d9a2a3f51ee240d00beb4cc65f99b089a3215e..cae177dca511fdb909ef82c972d3bbdebab215e2 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -2717,6 +2717,104 @@ aarch64_output_probe_stack_range (rtx reg1, rtx reg2) return ""; } +/* Generate return address signing DWARF annotation using general DWARF + operator. DWARF frame size will be bigger than using shortcut DWARF + operator. See aarch64_attach_ra_auth_dwarf for parameter meanings. */ + +static rtx +aarch64_attach_ra_auth_dwarf_general (rtx notes, HOST_WIDE_INT offset) +{ + /* The authentication descriptor. */ + HOST_WIDE_INT desc_const = (AARCH64_PAUTH_AUTH | (aarch64_pauth_key << 4)); + + /* DW_OP_AARCH64_pauth takes one uleb128 operand which is the authentication + descriptor. 
The low 4 bits of the descriptor is the authentication action + code, all other bits are reserved and initialized into zero except when the + action code is AARCH64_PAUTH_AUTH then bits[7:4] is the authentication key + index. */ + rtx auth_op += gen_rtx_UNSPEC (Pmode, gen_rtvec (2, GEN_INT (desc_const), const0_rtx), + DW_OP_AARCH64_pauth); + + rtx par; + if (offset == 0) +{ + /* Step 1: Push LR onto stack. + NOTE: the bottom of DWARF expression stack is always CFA. + Step 2: Issue AArch64 authentication operation. */ +par = gen_rtx_PARALLEL (DImode, + gen_rtvec (2, gen_rtx_REG (Pmode, LR_REGNUM), + auth_op)); +} + else +{ + rtx dup_cfa + = gen_rtx_UNSPEC (Pmode, gen_rtvec (2, const0_rtx, const0_rtx), + DW_OP_dup); + + rtx deref_op + = gen_rtx_UNSPEC (Pmode, gen_rtvec (2, const0_rtx, const0_rtx), + DW_OP_deref); + + rtx raw_plus + = gen_rtx_UNSPEC (Pmode, gen_rtvec (2, const0_rtx, const0_rtx), + DW_OP_plus); + /* Step 1: Push the authentication key on to dwarf expression stack. + Step 2: Push the stack address of where return address saved, followed + by a memory de-reference operation. + Step 3: Push the authentication descriptor. + Step 4: Issue AArch64 authentication operation. */ + par = gen_rtx_PARALLEL (DImode, + gen_rtvec (5, dup_cfa, GEN_INT (offset), r
[4/9][AArch64] Return address protection on AArch64
As described in the cover letter, this patch implements return address signing for AArch64, it's controlled by the new option: -msign-return-address=[none | non-leaf | all] "none" means don't do return address signing at all on any function. "non-leaf" means only sign non-leaf function. "all" means sign all functions. Return address signing is currently disabled on ILP32. I haven't tested it. The instructions added in the architecture are of 2 kinds. * In the NOP instruction space, which allows binaries to run without any traps on older versions of the architecture. This doesn't give any additional protection on older hardware but allows for the same binary to be used on earlier versions of the architecture and newer versions of the architecture. * New instructions that are only valid for v8.3 and will trap if used on earlier versions of the architecture. At default, once return address signing is enabled, it will only generates NOP instruction. While if -march=armv8.3-a specified, GCC will try to use the most efficient pointer authentication instruction as it can. The architecture has 2 user invisible system keys for signing and creating signed addresses as part of these instructions. For some use case, the user might want to use difference key for different functions. The new option "-msign-return-address-key=key_name" let GCC select the key used for return address signing. Permissible values are "a_key" for A key and "b_key" for B key, and this option are supported by function target attribute and LTO will hopefully just work. gcc/ 2016-11-09 Jiong Wang * config/aarch64/aarch64-opts.h (aarch64_pauth_key_index): New enum. (aarch64_function_type): New enum. * config/aarch64/aarch64-protos.h (aarch64_output_sign_auth_reg): New declaration. * config/aarch64/aarch64.c (aarch64_expand_prologue): Sign return address before it's pushed onto stack. (aarch64_expand_epilogue): Authenticate return address fetched from stack. (aarch64_output_sign_auth_reg): New function. (aarch64_override_options): Sanity check for ILP32 and ISA level. (aarch64_attributes): New function attributes for "sign-return-address", "pauth-key". * config/aarch64/aarch64.md (UNSPEC_AUTH_REG, UNSPEC_AUTH_REG1716, UNSPEC_SIGN_REG, UNSPEC_SIGN_REG1716, UNSPEC_STRIP_REG_SIGN, UNSPEC_STRIP_X30_SIGN): New unspecs. ("*do_return"): Generate combined instructions according to key index. ("sign_reg", "sign_reg1716", "auth_reg", "auth_reg1716", "strip_reg_sign", "strip_lr_sign"): New. * config/aarch64/aarch64.opt (msign-return-address, mpauth-key): New. * config/aarch64/predicates.md (aarch64_const0_const1): New predicate. * doc/extend.texi (AArch64 Function Attributes): Documents "sign-return-address=", "pauth-key". * doc/invoke.texi (AArch64 Options): Documents "-msign-return-address=", "-pauth-key". gcc/testsuite/ 2016-11-09 Jiong Wang * gcc.target/aarch64/return_address_sign_1.c: New testcase. * gcc.target/aarch64/return_address_sign_scope_1.c: New testcase. diff --git a/gcc/config/aarch64/aarch64-opts.h b/gcc/config/aarch64/aarch64-opts.h index c550a74..41c14b3 100644 --- a/gcc/config/aarch64/aarch64-opts.h +++ b/gcc/config/aarch64/aarch64-opts.h @@ -73,4 +73,30 @@ enum aarch64_code_model { AARCH64_CMODEL_LARGE }; +/* AArch64 pointer authentication key indexes. "key_array" in + aarch64_output_sign_auth_reg depends on the order of this enum. */ +enum aarch64_pauth_key_index +{ + /* A key for instruction pointer. */ + AARCH64_PAUTH_IKEY_A = 0, + /* B key for instruction pointer. */ + AARCH64_PAUTH_IKEY_B, + /* A key for data pointer. 
*/ + AARCH64_PAUTH_DKEY_A, + /* B key for data pointer. */ + AARCH64_PAUTH_DKEY_B, + /* A key for general pointer. */ + AARCH64_PAUTH_GKEY_A +}; + +/* Function types -msign-return-address should sign. */ +enum aarch64_function_type { + /* Don't sign any function. */ + AARCH64_FUNCTION_NONE, + /* Non-leaf functions. */ + AARCH64_FUNCTION_NON_LEAF, + /* All functions. */ + AARCH64_FUNCTION_ALL +}; + #endif diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 3cdd69b..fa6d16b 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -329,6 +329,7 @@ rtx aarch64_reverse_mask (enum machine_mode); bool aarch64_offset_7bit_signed_scaled_p (machine_mode, HOST_WIDE_INT); char *aarch64_output_scalar_simd_mov_immediate (rtx, machine_mode); char *aarch64_output_simd_mov_immediate (rtx, machine_mode, unsigned); +const char *aarch64_output_sign_auth_reg (rtx
[3/9][AArch64] Add commandline support for -march=armv8.3-a
This patch add command line support for ARMv8.3-A through new architecture: -march=armv8.3-a ARMv8.3-A implies all default features of ARMv8.2-A and meanwhile it includes the new pointer authentication extension. gcc/ 2016-11-08 Jiong Wang * config/aarch64/aarch64-arches.def: New entry for "armv8.3-a". * config/aarch64/aarch64.h (AARCH64_FL_PAUTH, AARCH64_FL_V8_3, AARCH64_FL_FOR_ARCH8_3, AARCH64_ISA_PAUTH, AARCH64_ISA_V8_3, TARGET_PAUTH, TARGET_ARMV8_3): New. * doc/invoke.texi (AArch64 Options): Document "armv8.3-a". diff --git a/gcc/config/aarch64/aarch64-arches.def b/gcc/config/aarch64/aarch64-arches.def index 7dcf140411f6eb95504d9b92df9dadce50529a28..0a33f799e66b4ec6e016845eb333f24aaf63383e 100644 --- a/gcc/config/aarch64/aarch64-arches.def +++ b/gcc/config/aarch64/aarch64-arches.def @@ -33,4 +33,5 @@ AARCH64_ARCH("armv8-a", generic, 8A, 8, AARCH64_FL_FOR_ARCH8) AARCH64_ARCH("armv8.1-a", generic, 8_1A, 8, AARCH64_FL_FOR_ARCH8_1) AARCH64_ARCH("armv8.2-a", generic, 8_2A, 8, AARCH64_FL_FOR_ARCH8_2) +AARCH64_ARCH("armv8.3-a", generic, 8_3A, 8, AARCH64_FL_FOR_ARCH8_3) diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 19caf9f2979e30671720823829464300b5349273..70efbe9b5f97bd38d61ad66e38608f7ac5bdfb38 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -138,6 +138,10 @@ extern unsigned aarch64_architecture_version; /* ARMv8.2-A architecture extensions. */ #define AARCH64_FL_V8_2 (1 << 8) /* Has ARMv8.2-A features. */ #define AARCH64_FL_F16 (1 << 9) /* Has ARMv8.2-A FP16 extensions. */ +/* ARMv8.3-A architecture extensions. */ +#define AARCH64_FL_PAUTH (1 << 10) /* Has Pointer Authentication + Extensions. */ +#define AARCH64_FL_V8_3 (1 << 11) /* Has ARMv8.3-A features. */ /* Has FP and SIMD. */ #define AARCH64_FL_FPSIMD (AARCH64_FL_FP | AARCH64_FL_SIMD) @@ -151,6 +155,8 @@ extern unsigned aarch64_architecture_version; (AARCH64_FL_FOR_ARCH8 | AARCH64_FL_LSE | AARCH64_FL_CRC | AARCH64_FL_V8_1) #define AARCH64_FL_FOR_ARCH8_2 \ (AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_V8_2) +#define AARCH64_FL_FOR_ARCH8_3 \ + (AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_V8_3 | AARCH64_FL_PAUTH) /* Macros to test ISA flags. */ @@ -162,6 +168,8 @@ extern unsigned aarch64_architecture_version; #define AARCH64_ISA_RDMA (aarch64_isa_flags & AARCH64_FL_V8_1) #define AARCH64_ISA_V8_2 (aarch64_isa_flags & AARCH64_FL_V8_2) #define AARCH64_ISA_F16 (aarch64_isa_flags & AARCH64_FL_F16) +#define AARCH64_ISA_PAUTH (aarch64_isa_flags & AARCH64_FL_PAUTH) +#define AARCH64_ISA_V8_3 (aarch64_isa_flags & AARCH64_FL_V8_3) /* Crypto is an optional extension to AdvSIMD. */ #define TARGET_CRYPTO (TARGET_SIMD && AARCH64_ISA_CRYPTO) @@ -176,6 +184,12 @@ extern unsigned aarch64_architecture_version; #define TARGET_FP_F16INST (TARGET_FLOAT && AARCH64_ISA_F16) #define TARGET_SIMD_F16INST (TARGET_SIMD && AARCH64_ISA_F16) +/* Pointer Authentication extension. */ +#define TARGET_PAUTH (AARCH64_ISA_PAUTH) + +/* ARMv8.3-A extension. */ +#define TARGET_ARMV8_3 (AARCH64_ISA_V8_3) + /* Make sure this is always defined so we don't have to check for ifdefs but rather use normal ifs. */ #ifndef TARGET_FIX_ERR_A53_835769_DEFAULT diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 87da1f1c12b718fa63c9b89fdd8f85fbc6b54cb0..18ab6d9f20eca7fa29317e10678f1e46f64039bd 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -13257,7 +13257,10 @@ more feature modifiers. This option has the form @option{-march=@var{arch}@r{@{}+@r{[}no@r{]}@var{feature}@r{@}*}}. 
The permissible values for @var{arch} are @samp{armv8-a}, -@samp{armv8.1-a}, @samp{armv8.2-a} or @var{native}. +@samp{armv8.1-a}, @samp{armv8.2-a}, @samp{armv8.3-a} or @var{native}. + +The value @samp{armv8.3-a} implies @samp{armv8.2-a} and enables compiler +support for the ARMv8.3-A architecture extensions. The value @samp{armv8.2-a} implies @samp{armv8.1-a} and enables compiler support for the ARMv8.2-A architecture extensions.
[2/9] Encoding support for AArch64 DWARF operations
The encoding for new added AARCH64 DWARF operations. I am thinking DWARF specification actually allows vendor private operations overlap with each other as one can't co-exist with the other. So in theory we should introduce target hook to handle target private operations. But in GCC/binutils/LLVM scope, I only see one overlap between DW_OP_GNU_push_tls_address and and DW_OP_HP_unknown, and DW_OP_HP_unknown seems not used. So I added the support in GCC generic code directly instead of introducing target hook. Is this OK to install? gcc/ 2016-11-11 Jiong Wang * dwarf2out.c (size_of_loc_descr): Increase set for DW_OP_AARCH64_pauth and DW_OP_AARCH64_paciasp_deref. (output_loc_operands): Generate encoding for DW_OP_AARCH64_pauth and DW_OP_AARCH64_paciasp_deref. (output_loc_operands_raw): Likewise. diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index 4a5c602f535fa49a45ae96f356f63c955dc527c6..fd159abe3c402cc8dedb0422e7b2680aabd28f93 100644 --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -1698,6 +1698,12 @@ size_of_loc_descr (dw_loc_descr_ref loc) case DW_OP_GNU_parameter_ref: size += 4; break; +case DW_OP_AARCH64_pauth: + size += size_of_uleb128 (loc->dw_loc_oprnd1.v.val_unsigned); + break; +case DW_OP_AARCH64_paciasp_deref: + size += size_of_sleb128 (loc->dw_loc_oprnd1.v.val_int); + break; default: break; } @@ -2177,6 +2183,13 @@ output_loc_operands (dw_loc_descr_ref loc, int for_eh_or_skip) } break; +case DW_OP_AARCH64_pauth: + dw2_asm_output_data_uleb128 (val1->v.val_unsigned, NULL); + break; +case DW_OP_AARCH64_paciasp_deref: + dw2_asm_output_data_sleb128 (val1->v.val_int, NULL); + break; + default: /* Other codes have no operands. */ break; @@ -2365,6 +2378,15 @@ output_loc_operands_raw (dw_loc_descr_ref loc) gcc_unreachable (); break; +case DW_OP_AARCH64_pauth: + fputc (',', asm_out_file); + dw2_asm_output_data_uleb128_raw (val1->v.val_unsigned); + break; +case DW_OP_AARCH64_paciasp_deref: + fputc (',', asm_out_file); + dw2_asm_output_data_sleb128_raw (val1->v.val_int); + break; + default: /* Other codes have no operands. */ break;
[1/9][RFC][DWARF] Reserve three DW_OP numbers in vendor extension space
This patch introduces three AARCH64 private DWARF operations in vendor extension space. DW_OP_AARCH64_pauth 0xea === Takes one unsigned LEB 128 Pointer Authentication Description. Bits [3:0] of the description contain the Authentication Action Code. All unused bits are initialized to 0. The operation then proceeds according to the value of the action code as described in the Action Code Table. DW_OP_AARCH64_paciasp 0xeb === Authenticates the contents in X30/LR register as per A key for instruction pointer using current CFA as salt. The result is pushed onto the stack. DW_OP_AARCH64_paciasp_deref 0xec === Takes one signed LEB128 offset and retrieves 8-byte contents from the address calculated by CFA plus this offset, the contents then authenticated as per A key for instruction pointer using current CFA as salt. The result is pushed onto the stack. Action Code Table == Action Code | Note -- 0| Pops a single 8-byte operand from the stack representing a | signed instruction pointer, "drops" the authentication | signature and pushes the value onto stack. -- 1| Pops a single 8-byte operand from the stack representing a | signed data pointer, "drops" the authentication signature | and pushes the value on to stack. -- 2| Bits [7:4] of the Pointer Authentication Description contain | an Authentication Key Index. The operation then pops the top | two stack entries. The first is an 8-byte value to be | authenticated. The second is an 8-byte salt. The first value | is then authenticated as per the Authentication Key Index | using the salt. The result is pushed onto stack. Authentication Key Index = 0| A key for instruction pointer. - 1| B key for instruction pointer. - 2| A key for data pointer. - 3| B key for data pointer. - 4| A key for general pointer. DW_OP_AARCH64_pauth is designed to offer general description for all scenarios. DW_OP_AARCH64_paciasp and DW_OP_AARCH64_paciasp_deref are two shortcut operations for return address signing. They offer more compact debug frame encoding. For DWARF operation vendor extension space between DW_OP_lo_user and DW_OP_hi_user, I think vendor is free to reserve any number and numbers for one vender can overlap with the other, as operations for different vendors are not supposed to co-exist. One exception is that GNU toolchain have reserved some numbers inside this space (DW_OP_GNU*), so vendor's numbers need to avoid overlapping with them. For these three numbers, they are not used in LLVM's implementation. NOTE: the assigned values are provisional, we may need to change them if they are found to be in conflict with on other toolchains. Please review, Thanks. include/ 2016-11-09 Richard Earnshaw Jiong Wang * dwarf2.def (DW_OP_AARCH64_pauth): Reserve the number 0xea. (DW_OP_AARCH64_paciasp): Reserve the number 0xeb. (Dw_OP_AARCH64_paciasp_deref): Reserve the number 0xec. diff --git a/include/dwarf2.def b/include/dwarf2.def index 5241fe8615e0e3b288fee80c08a67723686ef411..8eaa90c3b4748ecfc025a6c2dd6afcd5fd80be28 100644 --- a/include/dwarf2.def +++ b/include/dwarf2.def @@ -631,6 +631,16 @@ DW_OP (DW_OP_HP_unmod_range, 0xe5) DW_OP (DW_OP_HP_tls, 0xe6) /* PGI (STMicroelectronics) extensions. */ DW_OP (DW_OP_PGI_omp_thread_num, 0xf8) +/* ARM extension for pointer authentication + DW_OP_AARCH64_pauth: takes one uleb128 operand which is authentication + descriptor. Perform actions indicated by the descriptor. + DW_OP_AARCH64_paciasp: no operand. Authenticate value in X30/LR using A key + and CFA as salt. 
+ DW_OP_AARCH64_paciasp_deref: takes one sleb128 operand as offset. + Authenticate value in [CFA + offset] using A key and salt is CFA. */ +DW_OP (DW_OP_AARCH64_pauth, 0xea) +DW_OP (DW_OP_AARCH64_paciasp, 0xeb) +DW_OP (DW_OP_AARCH64_paciasp_deref, 0xec) DW_END_OP DW_FIRST_ATE (DW_ATE_void, 0x0)
[0/9] Support ARMv8.3-A Pointer Authentication Extension
As the introduction at https://community.arm.com/groups/processors/blog/2016/10/27/armv8-a-architecture-2016-additions ARMv8.3-A includes a new hardware feature called "Pointer Authentication". This new extension support some new instructions which can sign and authenticate pointer value. One utilization of this feature is we can implement Return-Oriented-Programming protections. For example, we can sign the return address register at function start then authenticate it before return. If the content is modified unexpectedly, then exception will happen thus prevent redirecting program's execution flow. This type of prevention however requires the original content of return address be signed that unwinders (C++ EH unwinder, GDB unwinder etc...) can no longer do backtrace correctly without understanding how to restore the original value of return address. Therefore we need to describe any return address or frame related register mangling through DWARF information. This patchset includes implementation of such return address siging protection and AArch64 DWARF operation extension. Below are comparision of codesize overhead between standard gcc -fstack-protector-strong and -msign-return-address on AArch64. linux kernel opensslProtection Scope (libcrypto + libssl) --- ssp-strong (gcc) + 2.93% + 2.98% Overflow protect on risky functions --- sign LR+ 1.82% + 2.18% LR protect on All Please review this patchset. Thanks. Jiong Wang (9): [RFC] Reserve three DW_OP number in vendor extension space Encoding support for AArch64 DWARF operations Add commandline support for -march=armv8.3-a Return address protection support on AArch64 Generate dwarf information for -msign-return-address Add builtins support for pac/aut/xpac libgcc, let AArch64 use customized unwinder file libgcc, runtime support for AArch64 DWARF operations [RFC] Accelerate -fstack-protector through pointer authentication gcc/config/aarch64/aarch64-arches.def |1 + gcc/config/aarch64/aarch64-builtins.c | 110 ++ gcc/config/aarch64/aarch64-opts.h | 32 + gcc/config/aarch64/aarch64-protos.h|1 + gcc/config/aarch64/aarch64.c | 296 +++- gcc/config/aarch64/aarch64.h | 54 + gcc/config/aarch64/aarch64.md | 128 +- gcc/config/aarch64/aarch64.opt | 45 + gcc/config/aarch64/predicates.md |4 + gcc/defaults.h |8 + gcc/doc/extend.texi| 12 + gcc/doc/invoke.texi| 24 +- gcc/doc/tm.texi|8 + gcc/doc/tm.texi.in |8 + gcc.target/aarch64/return_address_sign_1.c | 57 + gcc.target/aarch64/return_address_sign_scope_1.c | 57 + include/dwarf2.def | 10 + libgcc/config.host |2 + libgcc/config/aarch64/t-eh-aarch64 |3 + libgcc/config/aarch64/unwind-aarch64.c | 1820
Re: [Patch, rtl] PR middle-end/78016, keep REG_NOTE order during insn copy
On 07/11/16 17:04, Bernd Schmidt wrote: On 11/03/2016 03:00 PM, Eric Botcazou wrote: FWIW here's a more complete version of my patch which I'm currently testing. Let me know if you think it's at least a good enough intermediate step to be installed. It is, thanks. Testing showed the same issue as Jiong found, so I've committed it with that extra tweak. Thanks very much! I have closed PR middle-end/78016 Regards, Jiong
Re: [Patch, rtl] PR middle-end/78016, keep REG_NOTE order during insn copy
On 03/11/16 13:01, Bernd Schmidt wrote: Index: gcc/emit-rtl.c === --- gcc/emit-rtl.c (revision 241233) +++ gcc/emit-rtl.c (working copy) @@ -6169,17 +6169,18 @@ emit_copy_of_insn_after (rtx_insn *insn, which may be duplicated by the basic block reordering code. */ RTX_FRAME_RELATED_P (new_rtx) = RTX_FRAME_RELATED_P (insn); + /* Locate the end of existing REG_NOTES in NEW_RTX. */ + rtx *ptail = ®_NOTES (new_rtx); + gcc_assert (*ptail == NULL_RTX); + Looks like new_rtx may contain it's own REG_NOTES when reached here thus triggered ICE, I guess mark_jump_label may generate REG_LABEL_OPERAND as the comments says. After replace the gcc_assert with the following loop, this patch passed bootstrap on both AArch64 and X86-64, and regression OK on gcc and g++. + while (*ptail != NULL_RTX) +ptail = &XEXP (*ptail, 1); Regards, Jiong
Re: [Patch, rtl] PR middle-end/78016, keep REG_NOTE order during insn copy
On 03/11/16 12:06, Eric Botcazou wrote: What's your decision on this? I think that we ought to standardize on a single order for note copying in the RTL middle-end and the best way to enforce it is to have a single primitive in rtlanal.c, with an optional filtering. Bernd's patch is a step in the right direction, but doesn't enforce the single order. Maybe something based on a macro calling duplicate_reg_note, but not clear whether it's really better. Thanks for the feedback, I'll try to work through this. Regards, Jiong
Re: [gcc] Enable DW_OP_VAL_EXPRESSION support in dwarf module
On 02/11/16 13:42, Jakub Jelinek wrote: On Wed, Nov 02, 2016 at 01:26:48PM +, Jiong Wang wrote: -/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_EXPRESSION note. */ +/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_EXPRESSION note. */ Too long line. Hmm, it shows 80 columns under my editor. I guess '+' is counted in? +/* RTL sequences inside PARALLEL are raw expression representation. + + mem_loc_descriptor can be used to build generic DWARF expressions for + DW_CFA_expression and DW_CFA_val_expression where the expression may can + not be represented using normal RTL sequences. In this case, group all + expression operations (DW_OP_*) inside a PARALLEL. For those DW_OP which + doesn't have RTL mapping, wrap it using UNSPEC. The logic for parsing + PARALLEL sequences is: + + foreach elem inside PARALLEL + if (elem is UNSPEC) + dw_op = XINT (elem, 1) (DWARF operation is kept as UNSPEC number) + oprnd1 = XVECEXP (elem, 0, 0) + oprnd2 = XVECEXP (elem, 0, 1) + else + call mem_loc_descriptor */ Not sure if it is a good idea to document in weirdly formatted pseudo-language what the code actually does a few lines below. IMHO either express it in words, or don't express it at all. OK, fixed. I replaced these comments as some brief words. + exp_result = + new_loc_descr ((enum dwarf_location_atom) dw_op, oprnd1, +oprnd2); Wrong formatting, = should be on the next line. + } + else + exp_result = + mem_loc_descriptor (elem, mode, mem_mode, + VAR_INIT_STATUS_INITIALIZED); Likewise. Both fixed. Patch updated, please review. Thanks. gcc/ 2016-11-02 Jiong Wang * reg-notes.def (CFA_VAL_EXPRESSION): New entry. * dwarf2cfi.c (dwarf2out_frame_debug_cfa_val_expression): New function. (dwarf2out_frame_debug): Support REG_CFA_VAL_EXPRESSION. (output_cfa_loc): Support DW_CFA_val_expression. (output_cfa_loc_raw): Likewise. (output_cfi): Likewise. (output_cfi_directive): Likewise. * dwarf2out.c (dw_cfi_oprnd1_desc): Support DW_CFA_val_expression. (dw_cfi_oprnd2_desc): Likewise. (mem_loc_descriptor): Recognize new pattern generated for value expression. commit 36de0173c17efcc30c56ef10304377e71313e8bc Author: Jiong Wang Date: Wed Oct 19 15:42:04 2016 +0100 dwarf val expression diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c index 6491d5a..b8c88fb 100644 --- a/gcc/dwarf2cfi.c +++ b/gcc/dwarf2cfi.c @@ -1235,7 +1235,7 @@ dwarf2out_frame_debug_cfa_register (rtx set) reg_save (sregno, dregno, 0); } -/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_EXPRESSION note. */ +/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_EXPRESSION note. */ static void dwarf2out_frame_debug_cfa_expression (rtx set) @@ -1267,6 +1267,29 @@ dwarf2out_frame_debug_cfa_expression (rtx set) update_row_reg_save (cur_row, regno, cfi); } +/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_VAL_EXPRESSION + note. */ + +static void +dwarf2out_frame_debug_cfa_val_expression (rtx set) +{ + rtx dest = SET_DEST (set); + gcc_assert (REG_P (dest)); + + rtx span = targetm.dwarf_register_span (dest); + gcc_assert (!span); + + rtx src = SET_SRC (set); + dw_cfi_ref cfi = new_cfi (); + cfi->dw_cfi_opc = DW_CFA_val_expression; + cfi->dw_cfi_oprnd1.dw_cfi_reg_num = dwf_regno (dest); + cfi->dw_cfi_oprnd2.dw_cfi_loc += mem_loc_descriptor (src, GET_MODE (src), + GET_MODE (dest), VAR_INIT_STATUS_INITIALIZED); + add_cfi (cfi); + update_row_reg_save (cur_row, dwf_regno (dest), cfi); +} + /* A subroutine of dwarf2out_frame_debug, process a REG_CFA_RESTORE note. 
*/ static void @@ -2033,10 +2056,16 @@ dwarf2out_frame_debug (rtx_insn *insn) break; case REG_CFA_EXPRESSION: + case REG_CFA_VAL_EXPRESSION: n = XEXP (note, 0); if (n == NULL) n = single_set (insn); - dwarf2out_frame_debug_cfa_expression (n); + + if (REG_NOTE_KIND (note) == REG_CFA_EXPRESSION) + dwarf2out_frame_debug_cfa_expression (n); + else + dwarf2out_frame_debug_cfa_val_expression (n); + handled_one = true; break; @@ -3015,7 +3044,8 @@ output_cfa_loc (dw_cfi_ref cfi, int for_eh) dw_loc_descr_ref loc; unsigned long size; - if (cfi->dw_cfi_opc == DW_CFA_expression) + if (cfi->dw_cfi_opc == DW_CFA_expression + || cfi->dw_cfi_opc == DW_CFA_val_expression) { unsigned r = DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd1.dw_cfi_reg_num, for_eh); @@ -3041,7 +3071,8 @@ output_cfa_loc_raw (dw_cfi_ref cfi) dw_loc_descr_ref loc; unsigned long size; - if (cfi->dw_cfi_opc == DW_CFA_expression) + if (cfi->dw_cfi_opc == DW_CFA_expression +
Re: [gcc] Enable DW_OP_VAL_EXPRESSION support in dwarf module
On 01/11/16 16:48, Jason Merrill wrote: > It seems to me that a CFA_*expression note would never use target > UNSPEC codes, and a DWARF UNSPEC would never appear outside of such a > note, so we don't need to worry about conflicts. Indeed. DWARF UNSPEC is deeper inside DW_CFA_*expression note. My worry of conflict makes no sense. I updated the patch to put DWARF operation in to UNSPEC number field. x86-64 bootstrap OK, no regression on gcc/g++. Please review. Thanks. gcc/ 2016-11-02 Jiong Wang * reg-notes.def (CFA_VAL_EXPRESSION): New entry. * dwarf2cfi.c (dwarf2out_frame_debug_cfa_val_expression): New function. (dwarf2out_frame_debug): Support REG_CFA_VAL_EXPRESSION. (output_cfa_loc): Support DW_CFA_val_expression. (output_cfa_loc_raw): Likewise. (output_cfi): Likewise. (output_cfi_directive): Likewise. * dwarf2out.c (dw_cfi_oprnd1_desc): Support DW_CFA_val_expression. (dw_cfi_oprnd2_desc): Likewise. (mem_loc_descriptor): Recognize new pattern generated for value expression. diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c index 6491d5aaf4c4a21241cc718bfff1016f6d149951..b8c88fbae1df80a2664a414d8ae016a5343bf435 100644 --- a/gcc/dwarf2cfi.c +++ b/gcc/dwarf2cfi.c @@ -1235,7 +1235,7 @@ dwarf2out_frame_debug_cfa_register (rtx set) reg_save (sregno, dregno, 0); } -/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_EXPRESSION note. */ +/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_EXPRESSION note. */ static void dwarf2out_frame_debug_cfa_expression (rtx set) @@ -1267,6 +1267,29 @@ dwarf2out_frame_debug_cfa_expression (rtx set) update_row_reg_save (cur_row, regno, cfi); } +/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_VAL_EXPRESSION + note. */ + +static void +dwarf2out_frame_debug_cfa_val_expression (rtx set) +{ + rtx dest = SET_DEST (set); + gcc_assert (REG_P (dest)); + + rtx span = targetm.dwarf_register_span (dest); + gcc_assert (!span); + + rtx src = SET_SRC (set); + dw_cfi_ref cfi = new_cfi (); + cfi->dw_cfi_opc = DW_CFA_val_expression; + cfi->dw_cfi_oprnd1.dw_cfi_reg_num = dwf_regno (dest); + cfi->dw_cfi_oprnd2.dw_cfi_loc += mem_loc_descriptor (src, GET_MODE (src), + GET_MODE (dest), VAR_INIT_STATUS_INITIALIZED); + add_cfi (cfi); + update_row_reg_save (cur_row, dwf_regno (dest), cfi); +} + /* A subroutine of dwarf2out_frame_debug, process a REG_CFA_RESTORE note. 
*/ static void @@ -2033,10 +2056,16 @@ dwarf2out_frame_debug (rtx_insn *insn) break; case REG_CFA_EXPRESSION: + case REG_CFA_VAL_EXPRESSION: n = XEXP (note, 0); if (n == NULL) n = single_set (insn); - dwarf2out_frame_debug_cfa_expression (n); + + if (REG_NOTE_KIND (note) == REG_CFA_EXPRESSION) + dwarf2out_frame_debug_cfa_expression (n); + else + dwarf2out_frame_debug_cfa_val_expression (n); + handled_one = true; break; @@ -3015,7 +3044,8 @@ output_cfa_loc (dw_cfi_ref cfi, int for_eh) dw_loc_descr_ref loc; unsigned long size; - if (cfi->dw_cfi_opc == DW_CFA_expression) + if (cfi->dw_cfi_opc == DW_CFA_expression + || cfi->dw_cfi_opc == DW_CFA_val_expression) { unsigned r = DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd1.dw_cfi_reg_num, for_eh); @@ -3041,7 +3071,8 @@ output_cfa_loc_raw (dw_cfi_ref cfi) dw_loc_descr_ref loc; unsigned long size; - if (cfi->dw_cfi_opc == DW_CFA_expression) + if (cfi->dw_cfi_opc == DW_CFA_expression + || cfi->dw_cfi_opc == DW_CFA_val_expression) { unsigned r = DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd1.dw_cfi_reg_num, 1); @@ -3188,6 +3219,7 @@ output_cfi (dw_cfi_ref cfi, dw_fde_ref fde, int for_eh) case DW_CFA_def_cfa_expression: case DW_CFA_expression: + case DW_CFA_val_expression: output_cfa_loc (cfi, for_eh); break; @@ -3302,16 +3334,13 @@ output_cfi_directive (FILE *f, dw_cfi_ref cfi) break; case DW_CFA_def_cfa_expression: - if (f != asm_out_file) - { - fprintf (f, "\t.cfi_def_cfa_expression ...\n"); - break; - } - /* FALLTHRU */ case DW_CFA_expression: +case DW_CFA_val_expression: if (f != asm_out_file) { - fprintf (f, "\t.cfi_cfa_expression ...\n"); + fprintf (f, "\t.cfi_%scfa_%sexpression ...\n", + cfi->dw_cfi_opc == DW_CFA_def_cfa_expression ? "def_" : "", + cfi->dw_cfi_opc == DW_CFA_val_expression ? "val_" : ""); break; } fprintf (f, "\t.cfi_escape %#x,", cfi->dw_cfi_opc); diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index 4a3df339df2c6a6816ac8b8dbdb2466a7492c592..7dac70d7392f2c457ffd3f677e07decb1ba488a1 100644 --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -518,6 +518,7 @@ dw_cfi_oprnd1_desc (enum dwarf_call_frame_info cfi) case DW_CFA_def_cfa_register: case DW_CFA_register: case DW_CFA_expression: +case DW
Re: [gcc] Enable DW_OP_VAL_EXPRESSION support in dwarf module
On 01/11/16 16:48, Jason Merrill wrote: On Tue, Nov 1, 2016 at 11:59 AM, Jiong Wang wrote: On 01/11/16 15:24, Jason Merrill wrote: On Tue, Nov 1, 2016 at 11:12 AM, Jiong Wang wrote: On 31/10/16 19:50, Jason Merrill wrote: On 10/21/2016 04:30 AM, Jiong Wang wrote: All DW_OP_* of the expression are grouped together inside the PARALLEL, and those operations which don't have RTL mapping are wrapped by UNSPEC. The parsing algorithm is simply something like: foreach elem inside PARALLEL if (UNSPEC) { dw_op_code = INTVAL (XVECEXP (elem, 0, 0)); oprnd1 = INTVAL (XVECEXP (elem, 0, 1)); oprnd2 = INTVAL (XVECEXP (elem, 0, 2)); } else call standard RTL parser. Any comments on the approach? If you're going to use UNSPEC, why not put the DWARF operator in the second operand? Thanks for the review, but I still don't understand your meaning. Do you mean I should simply put the DWARF operator at XVECEXP (UNSPEC_RTX, 0, 2) instead of at XVECEXP (UNSPEC_RTX, 0, 0) No, at XINT (UNSPEC_RTX, 1). The documentation of UNSPEC says, /* A machine-specific operation. 1st operand is a vector of operands being used by the operation so that any needed reloads can be done. 2nd operand is a unique value saying which of a number of machine-specific operations is to be performed. Aha, understood now, thanks for the clarification. You mean we simply reuse the UNSPEC number field, so the RTX will be (UNSPEC [((reg) (reg)] DW_OP_XXX) Yeah, I do have tried to do that, but later give up, one reason I remember is suppose we want to push two value on the stack, the second value is an address, which we want a follow up DW_OP_deref to operate on that. then the value expression will be (set (reg A) (parallel [(reg A) (UNSPEC [DW_OP_deref, const0_rtx, const0_rtx] UNSPEC_PRIVATE_DW); (UNSPEC [DW_OP_XXX (const0_rtx) (const0_rtx)] UNSPEC_PRIVATE_DW)) And there might be some other expressions we need some complex RAW encoding, Why can't you do this putting the OP in the number field of both UNSPECs? I was demoing the RTX based on my current approach, and simplfy want to say we only need to define one unspec number (UNSPEC_PRIVATE_DW), while if we putting the OP in the number field of both UNSPECs, we need two unspec number, and we might need more for other similar expressions. If we don't need to worry about the conflicts, then your suggestion is definitely better. I will do more tests on this. Besides this issue, do you think the PARALLEL + UNSPEC based approach to represent DWARF RAW expression is acceptable? Thanks. Regards, Jiong so it seems to me if we want to offer user the most general way to do this, then it's better to encode the DWARF operation inside UNSPEC as reuse the UNSPEC number then you need to make sure there is no overlap with other backend UNSPEC enumeration number. It seems to me that a CFA_*expression note would never use target UNSPEC codes, and a DWARF UNSPEC would never appear outside of such a note, so we don't need to worry about conflicts. Jason
Re: [gcc] Enable DW_OP_VAL_EXPRESSION support in dwarf module
On 01/11/16 15:24, Jason Merrill wrote: On Tue, Nov 1, 2016 at 11:12 AM, Jiong Wang wrote: On 31/10/16 19:50, Jason Merrill wrote: On 10/21/2016 04:30 AM, Jiong Wang wrote: All DW_OP_* of the expression are grouped together inside the PARALLEL, and those operations which don't have RTL mapping are wrapped by UNSPEC. The parsing algorithm is simply something like: foreach elem inside PARALLEL if (UNSPEC) { dw_op_code = INTVAL (XVECEXP (elem, 0, 0)); oprnd1 = INTVAL (XVECEXP (elem, 0, 1)); oprnd2 = INTVAL (XVECEXP (elem, 0, 2)); } else call standard RTL parser. Any comments on the approach? If you're going to use UNSPEC, why not put the DWARF operator in the second operand? Thanks for the review, but I still don't understand your meaning. Do you mean I should simply put the DWARF operator at XVECEXP (UNSPEC_RTX, 0, 2) instead of at XVECEXP (UNSPEC_RTX, 0, 0) No, at XINT (UNSPEC_RTX, 1). The documentation of UNSPEC says, /* A machine-specific operation. 1st operand is a vector of operands being used by the operation so that any needed reloads can be done. 2nd operand is a unique value saying which of a number of machine-specific operations is to be performed. Aha, understood now, thanks for the clarification. You mean we simply reuse the UNSPEC number field, so the RTX will be (UNSPEC [((reg) (reg)] DW_OP_XXX) Yeah, I do have tried to do that, but later give up, one reason I remember is suppose we want to push two value on the stack, the second value is an address, which we want a follow up DW_OP_deref to operate on that. then the value expression will be (set (reg A) (parallel [(reg A) (UNSPEC [DW_OP_deref, const0_rtx, const0_rtx] UNSPEC_PRIVATE_DW); (UNSPEC [DW_OP_XXX (const0_rtx) (const0_rtx)] UNSPEC_PRIVATE_DW)) And there might be some other expressions we need some complex RAW encoding, so it seems to me if we want to offer user the most general way to do this, then it's better to encode the DWARF operation inside UNSPEC as reuse the UNSPEC number then you need to make sure there is no overlap with other backend UNSPEC enumeration number. Does this explanation make sense to you? Thanks.
Re: [gcc] Enable DW_OP_VAL_EXPRESSION support in dwarf module
On 31/10/16 19:50, Jason Merrill wrote: On 10/21/2016 04:30 AM, Jiong Wang wrote: All DW_OP_* of the expression are grouped together inside the PARALLEL, and those operations which don't have RTL mapping are wrapped by UNSPEC. The parsing algorithm is simply something like: foreach elem inside PARALLEL if (UNSPEC) { dw_op_code = INTVAL (XVECEXP (elem, 0, 0)); oprnd1 = INTVAL (XVECEXP (elem, 0, 1)); oprnd2 = INTVAL (XVECEXP (elem, 0, 2)); } else call standard RTL parser. Any comments on the approach? If you're going to use UNSPEC, why not put the DWARF operator in the second operand? Hi Jason, Thanks for the review, but I still don't understand your meaning. Do you mean I should simply put the DWARF operator at XVECEXP (UNSPEC_RTX, 0, 2) instead of at XVECEXP (UNSPEC_RTX, 0, 0), and the new parsing algorithm will be the following ? foreach elem inside PARALLEL if (UNSPEC) { oprnd1 = INTVAL (XVECEXP (elem, 0, 0)); oprnd2 = INTVAL (XVECEXP (elem, 0, 1)); dw_op_code = INTVAL (XVECEXP (elem, 0, 2)); } else call standard RTL parser. I actually don't see the benefit of this change, could you please give more comments on this? For this patch, suppose the unwinding rule for register A is poping two values from dwarf evalutaion stack, do some complex processing based on the two values, then push back the result on to stack. We can generate the dwarf value expression description like: (set (reg A) (parallel [(reg A) (reg B) (UNSPEC [(const_int DW_OP_XXX) (const0_rtx) (const0_rtx)] UNSPEC_NUM) then readelf dump will be something like === DW_CFA_val_expression: A (DW_OP_bregB: 0; DW_OP_bregC: 0; DW_OP_XXX) We can't do such description based on current GCC dwarf code, right?
Re: [PATCH][AArch64] Add function comments to some prologue/epilogue helpers
On 31/10/16 12:10, Kyrill Tkachov wrote: Ping. Thanks, Kyrill On 24/10/16 12:30, Kyrill Tkachov wrote: Ping. https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00839.html Thanks, Kyrill On 12/10/16 11:23, Kyrill Tkachov wrote: Hi all, I'm looking at the aarch64 prologue and epilogue generation code and I noticed many of the helper functions don't have function comments so it makes it harder than it has to to understand what's going on. This patch adds function comments to some of them. I hope I understood the functions correctly. Is this ok for trunk? Thanks, Kyrill 2016-10-12 Kyrylo Tkachov * config/aarch64/aarch64.c (aarch64_register_saved_on_entry): Add function comment. (aarch64_next_callee_save): Likewise. (aarch64_pushwb_single_reg): Likewise. (aarch64_gen_storewb_pair): Likewise. (aarch64_push_regs): Likewise. (aarch64_gen_loadwb_pair): Likewise. (aarch64_pop_regs): Likewise. (aarch64_gen_store_pair): Likewise. (aarch64_gen_load_pair): Likewise. (aarch64_save_callee_saves): Likewise. (aarch64_restore_callee_saves): Likewise. I "contributed" some of these functions without comments... The new added comments looks good to me though I can't approve. Thanks for fixing these. Regards, Jiong
Re: [Patch, rtl] PR middle-end/78016, keep REG_NOTE order during insn copy
On 21/10/16 13:30, Bernd Schmidt wrote: On 10/21/2016 02:04 PM, Jiong Wang wrote: + /* Locate the end of existing REG_NOTES in NEW_RTX. */ + rtx *ptail = ®_NOTES (new_rtx); + while (*ptail != NULL_RTX) +ptail = &XEXP (*ptail, 1); I was thinking along the lines of something like this (untested, emit-rtl.c part omitted). Eric can choose whether he likes either of these or wants something else. Hi Eric, What's your decision on this? Thanks. Regards, Jiong Bernd Index: gcc/rtl.h === --- gcc/rtl.h(revision 241233) +++ gcc/rtl.h(working copy) @@ -3008,6 +3008,7 @@ extern rtx alloc_reg_note (enum reg_note extern void add_reg_note (rtx, enum reg_note, rtx); extern void add_int_reg_note (rtx, enum reg_note, int); extern void add_shallow_copy_of_reg_note (rtx_insn *, rtx); +extern rtx duplicate_reg_note (rtx_insn *, rtx); extern void remove_note (rtx, const_rtx); extern void remove_reg_equal_equiv_notes (rtx_insn *); extern void remove_reg_equal_equiv_notes_for_regno (unsigned int); Index: gcc/rtlanal.c === --- gcc/rtlanal.c(revision 241233) +++ gcc/rtlanal.c(working copy) @@ -2304,6 +2304,21 @@ add_shallow_copy_of_reg_note (rtx_insn * add_reg_note (insn, REG_NOTE_KIND (note), XEXP (note, 0)); } +/* Duplicate NOTE and return the copy. */ +rtx +duplicate_reg_note (rtx note) +{ + rtx n; + reg_note_kind kind = REG_NOTE_KIND (note); + + if (GET_CODE (note) == INT_LIST) +return gen_rtx_INT_LIST ((machine_mode) kind, XINT (note, 0), NULL_RTX); + else if (GET_CODE (note) == EXPR_LIST) +return alloc_reg_note (kind, copy_insn_1 (XEXP (note, 0)), NULL_RTX); + else +return alloc_reg_note (kind, XEXP (note, 0), NULL_RTX); +} + /* Remove register note NOTE from the REG_NOTES of INSN. */ void Index: gcc/sel-sched-ir.c === --- gcc/sel-sched-ir.c(revision 241233) +++ gcc/sel-sched-ir.c(working copy) @@ -5762,6 +5762,11 @@ create_copy_of_insn_rtx (rtx insn_rtx) res = create_insn_rtx_from_pattern (copy_rtx (PATTERN (insn_rtx)), NULL_RTX); + /* Locate the end of existing REG_NOTES in NEW_RTX. */ + rtx *ptail = ®_NOTES (new_rtx); + while (*ptail != NULL_RTX) +ptail = &XEXP (*ptail, 1); + /* Copy all REG_NOTES except REG_EQUAL/REG_EQUIV and REG_LABEL_OPERAND since mark_jump_label will make them. REG_LABEL_TARGETs are created there too, but are supposed to be sticky, so we copy them. */ @@ -5770,11 +5775,8 @@ create_copy_of_insn_rtx (rtx insn_rtx) && REG_NOTE_KIND (link) != REG_EQUAL && REG_NOTE_KIND (link) != REG_EQUIV) { -if (GET_CODE (link) == EXPR_LIST) - add_reg_note (res, REG_NOTE_KIND (link), -copy_insn_1 (XEXP (link, 0))); -else - add_reg_note (res, REG_NOTE_KIND (link), XEXP (link, 0)); +*ptail = duplicate_reg_note (link); +ptail = &XEXP (*ptail, 1); } return res;
[Ping][gcc] Enable DW_CFA_val_expression support in dwarf module
On 21/10/16 09:30, Jiong Wang wrote: Currently, GCC only support DW_CFA_expression in dwarf module, this patch extend the support to DW_CFA_val_expression which share the same code mostly the same code with DW_CFA_expression. Meanwhile the existed dwarf expression parser only allows expressions which can be represented using GCC RTL. If one operation doesn't have a correspondent GCC RTL operator, then there is no way to attach that information in reg-note. This patch extends the current dwarf expression support to unlimited number of operations by using PARALLEL, and unlimited type of operations by using UNSPEC. All DW_OP_* of the expression are grouped together inside the PARALLEL, and those operations which don't have RTL mapping are wrapped by UNSPEC. The parsing algorithm is simply something like: foreach elem inside PARALLEL if (UNSPEC) { dw_op_code = INTVAL (XVECEXP (elem, 0, 0)); oprnd1 = INTVAL (XVECEXP (elem, 0, 1)); oprnd2 = INTVAL (XVECEXP (elem, 0, 2)); } else call standard RTL parser. Any comments on the approach? Ping ~ Thanks. gcc/ 2016-10-20 Jiong Wang * reg-notes.def (CFA_VAL_EXPRESSION): New entry. * dwarf2cfi.c (dwarf2out_frame_debug_cfa_val_expression): New function. (dwarf2out_frame_debug): Support REG_CFA_VAL_EXPRESSION. (output_cfa_loc): Support DW_CFA_val_expression. (output_cfa_loc_raw): Likewise. (output_cfi): Likewise. (output_cfi_directive): Likewise. * dwarf2out.c (dw_cfi_oprnd1_desc): Support DW_CFA_val_expression. (dw_cfi_oprnd2_desc): Likewise. (mem_loc_descriptor): Recognize new pattern generated for value expression.
Re: [Patch] Don't expand targetm.stack_protect_fail if it's NULL_TREE
On 24/10/16 16:22, Jeff Law wrote: On 10/20/2016 01:46 PM, Jiong Wang wrote: 2016-10-20 19:50 GMT+01:00 Jeff Law : On 10/20/2016 09:28 AM, Jiong Wang wrote: The current code suppose targetm.stack_protect_fail always generate something. But in case one target start to generate NULL_TREE, there will be ICE. This patch adds a simple sanity check to only call expand if it's not NULL_TREE. OK for trunk? gcc/ 2016-10-20 Jiong Wang * function.c (stack_protect_epilogue): Only expands targetm.stack_protect_fail if it's not NULL_TREE. Is there some reason we don't want to issue an error here and stop compilation? I'm not at all comfortable silently ignoring failure to generate stack protector code. jeff Hi Jeff, That's because I am doing some work where I will borrow stack-protector's analysis infrastructure but for those stack-protector standard rtl insn, they just need to be expanded into empty, for example stack_protect_set/test just need to be expanded into NOTE_INSN_DELETED. The same for targetm.stack_protect_fail () which I want to simply return NULL_TREE. but it's not an error. Right. But your change could mask backend problems. Specifically if their expander for stack_protect_fail did fail and returned NULL_TREE. That would cause it to silently ignore stack protector failures, which seems inadvisable. Is there another way you can re-use the analysis code without resorting to something like this? In my case, I only want the canary variable which is "crtl->stack_protect_guard", then I don't want the current runtime support which GCC will always generate once crl->stack_protect_guard is initialized. I was thinking to let stack_protect_fail to generate a tree that expand_call will expand into NULL_RTX unconditionally under any optimization level, but it seems impossible. Really appreicate for any idea on this. This do seems affect other targets (x86, rs6000) if NULL_TREE should never be returned for them. Currently I can see all of them use the either default_external_stack_protect_fail or default_hidden_stack_protect_fail, both of which are "return build_call_expr (..", so I should also assert the the return value of build_call_expr? Asserting couldn't hurt. I'd much rather have the compiler issue an error, ICE or somesuch than silently not generate a call to the stack protector fail routine.
Re: [Patch, rtl] PR middle-end/78016, keep REG_NOTE order during insn copy
On 21/10/16 11:13, Bernd Schmidt wrote: On 10/21/2016 09:43 AM, Eric Botcazou wrote: I disagree: there are currently n ways of copying NOTEs in the RTL middle-end, with different properties each time. We need only one primitive in rtlanal.c. I feel the fact that they have different properties means we shouldn't try to unify them: we'll just end up with a long list of boolean parameters, with no way of quickly telling what a given function call is doing. A copy loop is short enough that it can be implemented in-place and people can quickly tell what is going on by looking at it. Maybe the inner if statement could be a small helper function (append_copy_of_reg_note). Bernd Hi Bernd, Eric, How does the attached patch looks to you? x86_64 bootstrap & regression OK. I borrowed Bernd' code to write the tail pointer directly. 2016-10-21 Bernd Schmidt Jiong Wang gcc/ PR middle-end/78016 * emit-rtl.c (emit_copy_of_insn_after): Copy REG_NOTES in order instead of in reverse order. * sel-sched-ir.c (create_copy_of_insn_rtx): Likewise. diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c index 2d6d1eb6c1311871f15dbed13d7c084ed3845a86..4d849ca6e64273bedc5bf8b9a62a5cc5d4606129 100644 --- a/gcc/emit-rtl.c +++ b/gcc/emit-rtl.c @@ -6168,17 +6168,31 @@ emit_copy_of_insn_after (rtx_insn *insn, rtx_insn *after) which may be duplicated by the basic block reordering code. */ RTX_FRAME_RELATED_P (new_rtx) = RTX_FRAME_RELATED_P (insn); + /* Locate the end of existing REG_NOTES in NEW_RTX. */ + rtx *ptail = ®_NOTES (new_rtx); + while (*ptail != NULL_RTX) +ptail = &XEXP (*ptail, 1); + /* Copy all REG_NOTES except REG_LABEL_OPERAND since mark_jump_label will make them. REG_LABEL_TARGETs are created there too, but are supposed to be sticky, so we copy them. */ for (link = REG_NOTES (insn); link; link = XEXP (link, 1)) if (REG_NOTE_KIND (link) != REG_LABEL_OPERAND) { - if (GET_CODE (link) == EXPR_LIST) - add_reg_note (new_rtx, REG_NOTE_KIND (link), - copy_insn_1 (XEXP (link, 0))); + rtx new_node; + + if (GET_CODE (link) == INT_LIST) + new_node = gen_rtx_INT_LIST ((machine_mode) REG_NOTE_KIND (link), + XINT (link, 0), NULL_RTX); else - add_shallow_copy_of_reg_note (new_rtx, link); + new_node = alloc_reg_note (REG_NOTE_KIND (link), + (GET_CODE (link) == EXPR_LIST + ? copy_insn_1 (XEXP (link, 0)) + : XEXP (link ,0)), + NULL_RTX); + + *ptail = new_node; + ptail = &XEXP (new_node, 1); } INSN_CODE (new_rtx) = INSN_CODE (insn); diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c index 210b1e4edfb359a161cda4826704005ae9ab5a24..324ae8cf05209757a3a3f3dee97c9274876c7ed7 100644 --- a/gcc/sel-sched-ir.c +++ b/gcc/sel-sched-ir.c @@ -5761,6 +5761,11 @@ create_copy_of_insn_rtx (rtx insn_rtx) res = create_insn_rtx_from_pattern (copy_rtx (PATTERN (insn_rtx)), NULL_RTX); + /* Locate the end of existing REG_NOTES in RES. */ + rtx *ptail = ®_NOTES (res); + while (*ptail != NULL_RTX) +ptail = &XEXP (*ptail, 1); + /* Copy all REG_NOTES except REG_EQUAL/REG_EQUIV and REG_LABEL_OPERAND since mark_jump_label will make them. REG_LABEL_TARGETs are created there too, but are supposed to be sticky, so we copy them. */ @@ -5769,11 +5774,12 @@ create_copy_of_insn_rtx (rtx insn_rtx) && REG_NOTE_KIND (link) != REG_EQUAL && REG_NOTE_KIND (link) != REG_EQUIV) { - if (GET_CODE (link) == EXPR_LIST) - add_reg_note (res, REG_NOTE_KIND (link), - copy_insn_1 (XEXP (link, 0))); - else - add_reg_note (res, REG_NOTE_KIND (link), XEXP (link, 0)); + rtx new_node = alloc_reg_note (REG_NOTE_KIND (link), + (GET_CODE (link) == EXPR_LIST + ? 
copy_insn_1 (XEXP (link, 0)) + : XEXP (link ,0)), NULL_RTX); + *ptail = new_node; + ptail = &XEXP (new_node, 1); } return res;
Re: [Patch, rtl] PR middle-end/78016, keep REG_NOTE order during insn copy
On 21/10/16 08:43, Eric Botcazou wrote: That's also overcomplicated. Yes, I agree that's too heavy. rtx *ptail = ®_NOTES (to_insn); while (*ptail != NULL_RTX) ptail = &XEXP (*ptail, 1); Thanks very much Bernd, yes, this is better. And through manipulating pointer directly, those bidirectional new functions are unnecessary. gives you a pointer to the end which you can then use to append, unconditionally. As mentioned above, I think it would be simpler to keep this logic in the caller functions and avoid introducing append_insn_reg_notes. I disagree: there are currently n ways of copying NOTEs in the RTL middle-end, with different properties each time. We need only one primitive in rtlanal.c. That's my view, those duplicated code in emit-rtl.c and sel-sched-ir.c really can be shared and append all REG_NOTES from one insn to another seems qualify one primitive in rtlanal.c I will come up with a patch much lighter. Thanks.
[gcc] Enable DW_OP_VAL_EXPRESSION support in dwarf module
Currently, GCC only support DW_OP_EXPRESSION in dwarf module, this patch extend the support to DW_OP_VAL_EXPRESSION which share the same code mostly the same code with DW_OP_EXPRESSION. Meanwhile the existed dwarf expression parser only allows expressions which can be represented using GCC RTL. If one operation doesn't have a correspondent GCC RTL operator, then there is no way to attach that information in reg-note. This patch extends the current dwarf expression support to unlimited number of operations by using PARALLEL, and unlimited type of operations by using UNSPEC. All DW_OP_* of the expression are grouped together inside the PARALLEL, and those operations which don't have RTL mapping are wrapped by UNSPEC. The parsing algorithm is simply something like: foreach elem inside PARALLEL if (UNSPEC) { dw_op_code = INTVAL (XVECEXP (elem, 0, 0)); oprnd1 = INTVAL (XVECEXP (elem, 0, 1)); oprnd2 = INTVAL (XVECEXP (elem, 0, 2)); } else call standard RTL parser. Any comments on the approach? Thanks. gcc/ 2016-10-20 Jiong Wang * reg-notes.def (CFA_VAL_EXPRESSION): New entry. * dwarf2cfi.c (dwarf2out_frame_debug_cfa_val_expression): New function. (dwarf2out_frame_debug): Support REG_CFA_VAL_EXPRESSION. (output_cfa_loc): Support DW_CFA_val_expression. (output_cfa_loc_raw): Likewise. (output_cfi): Likewise. (output_cfi_directive): Likewise. * dwarf2out.c (dw_cfi_oprnd1_desc): Support DW_CFA_val_expression. (dw_cfi_oprnd2_desc): Likewise. (mem_loc_descriptor): Recognize new pattern generated for value expression. diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c index 6491d5aaf4c4a21241cc718bfff1016f6d149951..b8c88fbae1df80a2664a414d8ae016a5343bf435 100644 --- a/gcc/dwarf2cfi.c +++ b/gcc/dwarf2cfi.c @@ -1235,7 +1235,7 @@ dwarf2out_frame_debug_cfa_register (rtx set) reg_save (sregno, dregno, 0); } -/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_EXPRESSION note. */ +/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_EXPRESSION note. */ static void dwarf2out_frame_debug_cfa_expression (rtx set) @@ -1267,6 +1267,29 @@ dwarf2out_frame_debug_cfa_expression (rtx set) update_row_reg_save (cur_row, regno, cfi); } +/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_VAL_EXPRESSION + note. */ + +static void +dwarf2out_frame_debug_cfa_val_expression (rtx set) +{ + rtx dest = SET_DEST (set); + gcc_assert (REG_P (dest)); + + rtx span = targetm.dwarf_register_span (dest); + gcc_assert (!span); + + rtx src = SET_SRC (set); + dw_cfi_ref cfi = new_cfi (); + cfi->dw_cfi_opc = DW_CFA_val_expression; + cfi->dw_cfi_oprnd1.dw_cfi_reg_num = dwf_regno (dest); + cfi->dw_cfi_oprnd2.dw_cfi_loc += mem_loc_descriptor (src, GET_MODE (src), + GET_MODE (dest), VAR_INIT_STATUS_INITIALIZED); + add_cfi (cfi); + update_row_reg_save (cur_row, dwf_regno (dest), cfi); +} + /* A subroutine of dwarf2out_frame_debug, process a REG_CFA_RESTORE note. 
*/ static void @@ -2033,10 +2056,16 @@ dwarf2out_frame_debug (rtx_insn *insn) break; case REG_CFA_EXPRESSION: + case REG_CFA_VAL_EXPRESSION: n = XEXP (note, 0); if (n == NULL) n = single_set (insn); - dwarf2out_frame_debug_cfa_expression (n); + + if (REG_NOTE_KIND (note) == REG_CFA_EXPRESSION) + dwarf2out_frame_debug_cfa_expression (n); + else + dwarf2out_frame_debug_cfa_val_expression (n); + handled_one = true; break; @@ -3015,7 +3044,8 @@ output_cfa_loc (dw_cfi_ref cfi, int for_eh) dw_loc_descr_ref loc; unsigned long size; - if (cfi->dw_cfi_opc == DW_CFA_expression) + if (cfi->dw_cfi_opc == DW_CFA_expression + || cfi->dw_cfi_opc == DW_CFA_val_expression) { unsigned r = DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd1.dw_cfi_reg_num, for_eh); @@ -3041,7 +3071,8 @@ output_cfa_loc_raw (dw_cfi_ref cfi) dw_loc_descr_ref loc; unsigned long size; - if (cfi->dw_cfi_opc == DW_CFA_expression) + if (cfi->dw_cfi_opc == DW_CFA_expression + || cfi->dw_cfi_opc == DW_CFA_val_expression) { unsigned r = DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd1.dw_cfi_reg_num, 1); @@ -3188,6 +3219,7 @@ output_cfi (dw_cfi_ref cfi, dw_fde_ref fde, int for_eh) case DW_CFA_def_cfa_expression: case DW_CFA_expression: + case DW_CFA_val_expression: output_cfa_loc (cfi, for_eh); break; @@ -3302,16 +3334,13 @@ output_cfi_directive (FILE *f, dw_cfi_ref cfi) break; case DW_CFA_def_cfa_expression: - if (f != asm_out_file) - { - fprintf (f, "\t.cfi_def_cfa_expression ...\n"); - break; - } - /* FALLTHRU */ case DW_CFA_expression: +case DW_CFA_val_expression: if (f != asm_out_file) { - fprintf (f, "\t.cfi_cfa_expression ...\n"); + fprintf (f, "\t.cfi_%scfa_%sexpression ...\n", + cfi->dw_cfi_opc == DW_CFA_def_
Re: [Patch] Don't expand targetm.stack_protect_fail if it's NULL_TREE
2016-10-20 19:50 GMT+01:00 Jeff Law : > > On 10/20/2016 09:28 AM, Jiong Wang wrote: >> >> The current code suppose targetm.stack_protect_fail always generate >> something. >> But in case one target start to generate NULL_TREE, there will be ICE. >> This >> patch adds a simple sanity check to only call expand if it's not NULL_TREE. >> >> OK for trunk? >> >> gcc/ >> 2016-10-20 Jiong Wang >> >> * function.c (stack_protect_epilogue): Only expands >> targetm.stack_protect_fail if it's not NULL_TREE. > > Is there some reason we don't want to issue an error here and stop > compilation? I'm not at all comfortable silently ignoring failure to > generate stack protector code. > > jeff Hi Jeff, That's because I am doing some work where I will borrow stack-protector's analysis infrastructure but for those stack-protector standard rtl insn, they just need to be expanded into empty, for example stack_protect_set/test just need to be expanded into NOTE_INSN_DELETED. The same for targetm.stack_protect_fail () which I want to simply return NULL_TREE. but it's not an error. This do seems affect other targets (x86, rs6000) if NULL_TREE should never be returned for them. Currently I can see all of them use the either default_external_stack_protect_fail or default_hidden_stack_protect_fail, both of which are "return build_call_expr (..", so I should also assert the the return value of build_call_expr? Thanks.
[Patch] Don't expand targetm.stack_protect_fail if it's NULL_TREE
The current code suppose targetm.stack_protect_fail always generate something. But in case one target start to generate NULL_TREE, there will be ICE. This patch adds a simple sanity check to only call expand if it's not NULL_TREE. OK for trunk? gcc/ 2016-10-20 Jiong Wang * function.c (stack_protect_epilogue): Only expands targetm.stack_protect_fail if it's not NULL_TREE. diff --git a/gcc/function.c b/gcc/function.c index cdd2721cdf904be6457d090fe20345d3dee0b4dd..304c32ed2b1ace06139786680f30502d8483a8ed 100644 --- a/gcc/function.c +++ b/gcc/function.c @@ -5077,7 +5077,9 @@ stack_protect_epilogue (void) if (JUMP_P (tmp)) predict_insn_def (tmp, PRED_NORETURN, TAKEN); - expand_call (targetm.stack_protect_fail (), NULL_RTX, /*ignore=*/true); + tree fail_check = targetm.stack_protect_fail (); + if (fail_check != NULL_TREE) +expand_call (fail_check, NULL_RTX, /*ignore=*/true); free_temp_slots (); emit_label (label); }
[Patch, rtl] PR middle-end/78016, keep REG_NOTE order during insn copy
As discussed on PR middle-end/78016, here is the patch. This patch makes EXPR_LIST/INSN_LIST/INT_LIST insertion bi-directional: the new node can be inserted at either the start or the end of the given list. The existing alloc_EXPR_LIST and alloc_INSN_LIST become wrappers of the new bi-directional functions; there is no functional change for them, and their callers are *not affected*.

This patch then factors out the REG_NOTES copy code in emit-rtl.c and sel-sched-ir.c into a new function, append_insn_reg_notes, in rtlanal.c, which uses the new bi-directional interfaces to make sure the order of REG_NOTES is not changed during insn copy. Redundant code in emit-rtl.c and sel-sched-ir.c is also deleted.

x86_64/aarch64 bootstrap OK. c/c++ regression OK. OK for trunk?

gcc/
2016-10-20  Jiong Wang

        PR middle-end/78016
        * lists.c (alloc_INSN_LIST_bidirection): New function.  The function
        body is cloned from alloc_INSN_LIST with minor changes to make it
        support bi-directional insertion.
        (alloc_EXPR_LIST_bidirection): Likewise.
        (alloc_INT_LIST_bidirection): New function.  Alloc INT_LIST node, and
        support bi-directional insertion into given list.
        (alloc_INSN_LIST): Call alloc_INSN_LIST_bidirection.
        (alloc_EXPR_LIST): Call alloc_EXPR_LIST_bidirection.
        * rtl.h (append_insn_reg_notes): New declaration.
        (alloc_INSN_LIST_bidirection): New declaration.
        (alloc_EXPR_LIST_bidirection): New declaration.
        (alloc_INT_LIST_bidirection): New declaration.
        * rtlanal.c (alloc_reg_note_bidirection): New static function.
        Function body is cloned from alloc_reg_note with minor changes to
        make it support bi-directional insertion.
        (alloc_reg_note): Call alloc_reg_note_bidirection.
        (append_insn_reg_notes): New function.
        * emit-rtl.c (emit_copy_of_insn_after): Use append_insn_reg_notes.
        * sel-sched-ir.c (create_copy_of_insn_rtx): Likewise.

diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index 2d6d1eb..87eb1e3 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -6125,7 +6125,6 @@
 rtx_insn *
 emit_copy_of_insn_after (rtx_insn *insn, rtx_insn *after)
 {
   rtx_insn *new_rtx;
-  rtx link;

   switch (GET_CODE (insn))
     {
@@ -6171,15 +6170,7 @@ emit_copy_of_insn_after (rtx_insn *insn, rtx_insn *after)
   /* Copy all REG_NOTES except REG_LABEL_OPERAND since mark_jump_label
      will make them.  REG_LABEL_TARGETs are created there too, but are
      supposed to be sticky, so we copy them.  */
-  for (link = REG_NOTES (insn); link; link = XEXP (link, 1))
-    if (REG_NOTE_KIND (link) != REG_LABEL_OPERAND)
-      {
-        if (GET_CODE (link) == EXPR_LIST)
-          add_reg_note (new_rtx, REG_NOTE_KIND (link),
-                        copy_insn_1 (XEXP (link, 0)));
-        else
-          add_shallow_copy_of_reg_note (new_rtx, link);
-      }
+  append_insn_reg_notes (new_rtx, insn, true, false);

   INSN_CODE (new_rtx) = INSN_CODE (insn);
   return new_rtx;

diff --git a/gcc/lists.c b/gcc/lists.c
index 96b4bc7..cd30b7c 100644
--- a/gcc/lists.c
+++ b/gcc/lists.c
@@ -98,11 +98,14 @@ remove_list_elem (rtx elem, rtx *listp)

 /* This call is used in place of a gen_rtx_INSN_LIST.  If there is a cached
    node available, we'll use it, otherwise a call to gen_rtx_INSN_LIST
-   is made.  */
+   is made.  The new node will be appended at the end of LIST if APPEND_P is
+   TRUE, otherwise list is appended to the new node.  */
+
 rtx_insn_list *
-alloc_INSN_LIST (rtx val, rtx next)
+alloc_INSN_LIST_bidirection (rtx val, rtx list, bool append_p)
 {
   rtx_insn_list *r;
+  rtx next = append_p ? NULL_RTX : list;

   if (unused_insn_list)
     {
@@ -117,16 +120,33 @@ alloc_INSN_LIST (rtx val, rtx next)
   else
     r = gen_rtx_INSN_LIST (VOIDmode, val, next);

+  if (append_p)
+    {
+      gcc_assert (list != NULL_RTX);
+      XEXP (list, 1) = r;
+    }
+
   return r;
 }

+/* Allocate new INSN_LIST node for VAL, append NEXT to it.  */
+
+rtx_insn_list *
+alloc_INSN_LIST (rtx val, rtx next)
+{
+  return alloc_INSN_LIST_bidirection (val, next, false);
+}
+
 /* This call is used in place of a gen_rtx_EXPR_LIST.  If there is a cached
    node available, we'll use it, otherwise a call to gen_rtx_EXPR_LIST
-   is made.  */
+   is made.  The new node will be appended at the end of LIST if APPEND_P is
+   TRUE, otherwise list is appended to the new node.  */
+
 rtx_expr_list *
-alloc_EXPR_LIST (int kind, rtx val, rtx next)
+alloc_EXPR_LIST_bidirection (int kind, rtx val, rtx list, bool append_p)
 {
   rtx_expr_list *r;
+  rtx next = append_p ? NULL_RTX : list;

   if (unused_expr_list)
     {
@@ -139,9 +159,23 @@ alloc_EXPR_LIST (int kind, rtx val, rtx next)
   else
     r = gen_rtx_EXPR_LIST ((machine_mode) kind, val, next);

+  if (append_p)
+    {
+      gcc_assert (list != NULL_RTX);
+      XEXP (list, 1) = r;
+    }
+
+  return r;
+}
+
+/* Allocate new EXPR_LIST node for KIND and VAL, append NEXT to it.  */
+
+rtx_expr_list *
+alloc_EXPR_LIST
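As an aside, the ordering problem this fixes is easy to see with a small standalone C sketch (plain C, not GCC's RTL types; all names here are invented for illustration): copying a singly linked note list with head insertion reverses it, while tail insertion preserves the source order, which is exactly what PR middle-end/78016 is about.

#include <stdio.h>
#include <stdlib.h>

struct note { int kind; struct note *next; };

/* Copy LIST by inserting each new node at the head, as the old
   one-directional allocators effectively did.  */
static struct note *
copy_prepend (const struct note *list)
{
  struct note *copy = NULL;
  for (; list; list = list->next)
    {
      struct note *n = malloc (sizeof *n);
      n->kind = list->kind;
      n->next = copy;      /* head insertion: order ends up reversed */
      copy = n;
    }
  return copy;
}

/* Copy LIST by appending at the tail, as the bi-directional
   interfaces allow: the original order is preserved.  */
static struct note *
copy_append (const struct note *list)
{
  struct note *copy = NULL, **tail = &copy;
  for (; list; list = list->next)
    {
      struct note *n = malloc (sizeof *n);
      n->kind = list->kind;
      n->next = NULL;
      *tail = n;           /* tail insertion: order is preserved */
      tail = &n->next;
    }
  return copy;
}

int
main (void)
{
  struct note c = { 3, NULL }, b = { 2, &c }, a = { 1, &b };
  for (struct note *p = copy_prepend (&a); p; p = p->next)
    printf ("%d ", p->kind);            /* prints: 3 2 1 */
  printf ("\n");
  for (struct note *p = copy_append (&a); p; p = p->next)
    printf ("%d ", p->kind);            /* prints: 1 2 3 */
  printf ("\n");
  return 0;
}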
Re: [PATCH v2] aarch64: Add split-stack initial support
Hi Adhemerval,

On 06/10/16 22:54, Adhemerval Zanella wrote:

+  bool split_stack_arg_pointer_used = split_stack_arg_pointer_used_p ();

   if (flag_stack_usage_info)
     current_function_static_stack_size = frame_size;

@@ -3220,6 +3264,10 @@ aarch64_expand_prologue (void)
       aarch64_emit_probe_stack_range (STACK_CHECK_PROTECT, frame_size);
     }

+  /* Save split-stack argument pointer before stack adjustment.  */
+  if (split_stack_arg_pointer_used)
+    emit_move_insn (gen_rtx_REG (Pmode, R10_REGNUM), stack_pointer_rtx);
+
   aarch64_add_constant (Pmode, SP_REGNUM, IP0_REGNUM, -initial_adjust, true);

   if (callee_adjust != 0)

@@ -3243,6 +3291,30 @@
                          callee_adjust != 0 || frame_pointer_needed);
   aarch64_add_constant (Pmode, SP_REGNUM, IP1_REGNUM, -final_adjust,
                         !frame_pointer_needed);
+
+  if (split_stack_arg_pointer_used_p ())

Redundant call?  You can use "split_stack_arg_pointer_used" instead.

+
+  /* Always emit two insns to calculate the requested stack, so the linker
+     can edit them when adjusting size for calling non-split-stack code.  */
+  ninsn = aarch64_internal_mov_immediate (temp, GEN_INT (-frame_size), true,
+                                          Pmode);
+  gcc_assert (ninsn == 1 || ninsn == 2);
+  if (ninsn == 1)
+    emit_insn (gen_nop ());

If you expect the nop to be kept together with the other instructions: I am still seeing the nop scheduled away from the addition.

        mov     x10, -4144
        add     x10, sp, x10
        nop

+
+#define BACKOFF 0x2000

The BACKOFF value is 0x2000 here while in morestack-c.c it is 0x1000, is this deliberate?

+
+       # Calculate requested stack size.
+       sub     x12, sp, x10
+       # Save parameters
+       stp     x29, x30, [sp, -MORESTACK_FRAMESIZE]!
+       .cfi_def_cfa_offset MORESTACK_FRAMESIZE
+       .cfi_offset 29, -MORESTACK_FRAMESIZE
+       .cfi_offset 30, -MORESTACK_FRAMESIZE+8
+       add     x29, sp, 0
+       .cfi_def_cfa_register 29
+       # Adjust the requested stack size for the frame pointer save.
+       add     x12, x12, 16
+       stp     x0, x1, [sp, 16]
+       stp     x2, x3, [sp, 32]
+       add     x12, x12, BACKOFF
+       stp     x4, x5, [sp, 48]
+       stp     x6, x7, [sp, 64]
+       stp     x28, x12, [sp, 80]
+
+       # Setup on x28 the function initial frame pointer.  Its value will
+       # copied to function argument pointer.
+       add     x28, sp, MORESTACK_FRAMESIZE + 16
+
+       # void __morestack_block_signals (void)
+       bl      __morestack_block_signals
+
+       # void *__generic_morestack (size_t *pframe_size,
+       #                            void *old_stack,
+       #                            size_t param_size)
+       # pframe_size: is the size of the required stack frame (the function
+       # amount of space remaining on the allocated stack).

s/pframe_size: is the size/pframe_size: points at the size/

+
+       # Set up for a call to the target function.
+       ldr     x30, [x28, STACKFRAME_BASE + 8]
+       ldp     x0, x1, [x28, STACKFRAME_BASE + 16]
+       ldp     x2, x3, [x28, STACKFRAME_BASE + 32]
+       ldp     x4, x5, [x28, STACKFRAME_BASE + 48]
+       ldp     x6, x7, [x28, STACKFRAME_BASE + 64]
+       add     x9, x30, 8
+       cmp     x30, x9

We can remove this "cmp" by using "adds x9, x30, 8"?  I am thinking "adds" will set the "c" bit in the condition flags to zero, so the bcs check in the function prologue will fail, thus the argument pointer initialization will always be executed if the execution flow is from __morestack:

        bcs     .L8
        mov     x10, x28

+       blr     x9
+
+       stp     x0, x1, [x28, STACKFRAME_BASE + 16]
+       stp     x2, x3, [x28, STACKFRAME_BASE + 32]
+       stp     x4, x5, [x28, STACKFRAME_BASE + 48]
+       stp     x6, x7, [x28, STACKFRAME_BASE + 64]
+
Re: [AArch64][0/14] ARMv8.2-A FP16 extension support
On 27/09/16 17:03, Jiong Wang wrote:
>
> Now as the ARM patches have gone in around r240427, I have done a quick
> confirmation on the status of these four pending testsuite patches:
>
> https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00337.html
> https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00338.html
> https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00339.html
> https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00340.html
>
> The result is that they apply cleanly on gcc trunk, and there is no
> regression in the AArch64 native regression test.  Testcases enabled
> without the FP16 requirement all passed.
>
> I will give a final run on an ARM native board and an AArch64 emulation
> environment with ARMv8.2-A FP16 enabled.  (I have done this before, just
> in case something has changed during these days.)
>
> OK for trunk if there is no regression?
>
> Thanks

Finished the final tests on the emulator with FP16 enabled:

* No regression on AArch64; all new testcases passed.
* No regression on AArch32; some of these new testcases were UNRESOLVED because they should be skipped on AArch32.  This is fixed by the attached trivial patch, which I will merge into the 4th patch (no effect on the changelog).

OK to commit these patches?

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcageh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcageh_f16_1.c
index f8c8c79..0bebec7 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcageh_f16_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcageh_f16_1.c
@@ -1,6 +1,7 @@
 /* { dg-do run } */
 /* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
 /* { dg-add-options arm_v8_2a_fp16_scalar } */
+/* { dg-skip-if "" { arm*-*-* } } */

 #include

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcagth_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcagth_f16_1.c
index 23c11a4..68ce599 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcagth_f16_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcagth_f16_1.c
@@ -1,6 +1,7 @@
 /* { dg-do run } */
 /* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
 /* { dg-add-options arm_v8_2a_fp16_scalar } */
+/* { dg-skip-if "" { arm*-*-* } } */

 #include

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcaleh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcaleh_f16_1.c
index ae4c8b5..1b5a09b 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcaleh_f16_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcaleh_f16_1.c
@@ -1,6 +1,7 @@
 /* { dg-do run } */
 /* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
 /* { dg-add-options arm_v8_2a_fp16_scalar } */
+/* { dg-skip-if "" { arm*-*-* } } */

 #include

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcalth_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcalth_f16_1.c
index 56a6533..766c783 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcalth_f16_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcalth_f16_1.c
@@ -1,6 +1,7 @@
 /* { dg-do run } */
 /* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
 /* { dg-add-options arm_v8_2a_fp16_scalar } */
+/* { dg-skip-if "" { arm*-*-* } } */

 #include

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceqh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceqh_f16_1.c
index fb54e96..8f5c14b 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceqh_f16_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceqh_f16_1.c
@@ -1,6 +1,7 @@
 /* { dg-do run } */
 /* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
 /* { dg-add-options arm_v8_2a_fp16_scalar } */
+/* { dg-skip-if "" { arm*-*-* } } */

 #include

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceqzh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceqzh_f16_1.c
index 57c765c..ccfecf4 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceqzh_f16_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceqzh_f16_1.c
@@ -1,6 +1,7 @@
 /* { dg-do run } */
 /* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
 /* { dg-add-options arm_v8_2a_fp16_scalar } */
+/* { dg-skip-if "" { arm*-*-* } } */

 #include

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgeh_f16_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgeh_f16_1.c
index f9a5bbe..161c7a0 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgeh_f16_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgeh_f16_1.c
@@ -1,6 +1,7 @@
 /* { dg-do run } */
 /* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
 /* { dg-add-options arm_v8_2a_fp16_scalar } */
+/* { dg-skip-if "
Re: [AArch64][0/14] ARMv8.2-A FP16 extension support
On 25/07/16 12:26, James Greenhalgh wrote:

On Thu, Jul 07, 2016 at 05:12:48PM +0100, Jiong Wang wrote:

Hello,

As a follow up of https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01240.html, this patch set adds ARMv8.2-A FP16 scalar and vector intrinsics support; the gcc middle-end is also made aware of some standard operations so that some instructions can be auto-generated.

According to ACLE, the ARMv8.2-A FP16 intrinsics for AArch64 are a superset of the intrinsics for AArch32, so all those intrinsic related testcases, particularly those under the directory advsimd-intrinsics, are also applicable to AArch64.  This patch set has only included those testcases that are exclusive to AArch64.

Jiong Wang (14)
  ARMv8.2-A FP16 data processing intrinsics
  ARMv8.2-A FP16 one operand vector intrinsics
  ARMv8.2-A FP16 two operands vector intrinsics
  ARMv8.2-A FP16 three operands vector intrinsics
  ARMv8.2-A FP16 lane vector intrinsics
  ARMv8.2-A FP16 reduction vector intrinsics
  ARMv8.2-A FP16 one operand scalar intrinsics
  ARMv8.2-A FP16 two operands scalar intrinsics
  ARMv8.2-A FP16 three operands scalar intrinsics
  ARMv8.2-A FP16 lane scalar intrinsics

At this point, I've OKed the first 10 patches in the series; these represent the functional changes to the compiler.  I'm leaving the testsuite patches for now, as they depend on testsuite changes that have yet to be approved for the ARM port.  To save you from having to continue to rebase the functional parts of this patch while you wait for review of the ARM changes, I would be OK with you committing them now, on the understanding that you'll continue to check the testsuite in the time between now and when the testsuite changes are approved, and that you'll fix any issues that you find.

  ARMv8.2-A FP16 testsuite selector
  ARMv8.2-A testsuite for new data movement intrinsics
  ARMv8.2-A testsuite for new vector intrinsics
  ARMv8.2-A testsuite for new scalar intrinsics

I've taken a brief look through these testsuite changes and they look OK to me.  I'll revisit them properly once I've seen the ARM patches go in.

Now as the ARM patches have gone in around r240427, I have done a quick confirmation on the status of these four pending testsuite patches:

https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00337.html
https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00338.html
https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00339.html
https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00340.html

The result is that they apply cleanly on gcc trunk, and there is no regression in the AArch64 native regression test.  Testcases enabled without the FP16 requirement all passed.

I will give a final run on an ARM native board and an AArch64 emulation environment with ARMv8.2-A FP16 enabled.  (I have done this before, just in case something has changed during these days.)

OK for trunk if there is no regression?

Thanks
[COMMITTED, aarch64] Delete one redundant word in target-supports.exp comment
This patch deletes one redundant word in the target-supports.exp function comment for "check_effective_target_arm_v8_2a_fp16_scalar_hw".

s/instructions floating point instructions/floating point instructions/

The comment is re-indented.  No other changes.  Committed as obvious as r240551.

gcc/testsuite/
2016-09-27  Jiong Wang

        * lib/target-supports.exp
        (check_effective_target_arm_v8_2a_fp16_scalar_hw): Delete redundant
        word in function comment.

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 3d11e28..50723de 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4015,9 +4015,8 @@ proc check_effective_target_arm_v8_1a_neon_hw { } {
     } [add_options_for_arm_v8_1a_neon ""]]
 }

-# Return 1 if the target supports executing instructions floating point
-# instructions from ARMv8.2 with the FP16 extension, 0 otherwise.  The
-# test is valid for ARM.
+# Return 1 if the target supports executing floating point instructions from
+# ARMv8.2 with the FP16 extension, 0 otherwise.  The test is valid for ARM.

 proc check_effective_target_arm_v8_2a_fp16_scalar_hw { } {
     if { ![check_effective_target_arm_v8_2a_fp16_scalar_ok] } {
Re: [PATCH][AArch64 - v2] Simplify eh_return implementation
Wilco Dijkstra writes:
> Ping
>
> I noticed it would still be a good idea to add an extra barrier in the epilogue as the
> scheduler doesn't appear to handle aliases of frame accesses properly.
>
> This patch simplifies the handling of the EH return value.  We force the use of the
> frame pointer so the return location is always at FP + 8.  This means we can emit
> a simple volatile access in EH_RETURN_HANDLER_RTX without needing md
> patterns, splitters and frame offset calculations.  The new implementation also
> fixes various bugs in aarch64_final_eh_return_addr, which does not work with
> -fomit-frame-pointer, alloca or outgoing arguments.

The -fomit-frame-pointer case is really broken in aarch64_final_eh_return_addr:

-  return gen_frame_mem (DImode,
-                        plus_constant (Pmode,
-                                       stack_pointer_rtx,
-                                       fp_offset
-                                       + cfun->machine->frame.saved_regs_size
-                                       - 2 * UNITS_PER_WORD));

saved_regs_size includes both the general and the vector register saving areas, while LR should be saved on top of the general register area.  Meanwhile, saved_regs_size contains the alignment amount.  Given that the EH unwind code will invoke __builtin_unwind_init, which pushes all callee-saved registers, both general and vector, the current function will always compute a wrong offset.

I think the correct offset for -fomit-frame-pointer should be: "cfun->machine->frame.reg_offset[LR_REGNUM]".

I have done a quick check on _Unwind_RaiseException, which is the only code affected by this change.  Without a frame pointer, the exception handler's address is installed in a different, thus wrong, stack slot:

  ...
  str x30, [sp, 112]
  ...
  str x19, [sp, 176]

The approach used in this patch looks good to me.

> 2016-08-10  Wilco Dijkstra
> gcc/
>       * config/aarch64/aarch64.md (eh_return): Remove pattern and splitter.
>       * config/aarch64/aarch64.h (AARCH64_EH_STACKADJ_REGNUM): Remove.
>       (EH_RETURN_HANDLER_RTX): New define.
>       * config/aarch64/aarch64.c (aarch64_frame_pointer_required):
>       Force frame pointer in EH return functions.
>       (aarch64_expand_epilogue): Add barrier for eh_return.
>       (aarch64_final_eh_return_addr): Remove.
>       (aarch64_eh_return_handler_rtx): New function.
>       * config/aarch64/aarch64-protos.h (aarch64_final_eh_return_addr):
>       Remove.
>       (aarch64_eh_return_handler_rtx): New prototype.

--
Regards,
Jiong
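For reference, a sketch (untested; only using the frame layout field named in the mail above) of what the corrected -fomit-frame-pointer computation could look like:

/* Sketch only: take LR's recorded save slot directly instead of
   recomputing it from saved_regs_size, which also counts the vector
   save area and any alignment padding.  */
return gen_frame_mem (DImode,
                      plus_constant (Pmode, stack_pointer_rtx,
                                     cfun->machine->frame.reg_offset[LR_REGNUM]));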
Re: [PATCH] aarch64: Add split-stack initial support
Adhemerval Zanella writes:
> On 08/08/2016 07:58, Jiong Wang wrote:
>>
>> Adhemerval Zanella writes:
>>
>
> Below is the last iteration patch, however I am now seeing some similar issue
> s390 hit when building libgo:
>
> ../../../gcc-git/libgo/go/syscall/socket_linux.go:90:1: error: flow control insn inside a basic block
> (jump_insn 90 89 91 14 (set (pc)
>         (if_then_else (geu (reg:CC 66 cc)
>                 (const_int 0 [0]))
>             (label_ref 92)
>             (pc))) ../../../gcc-git/libgo/go/syscall/socket_linux.go:90 -1
>      (nil)
>  -> 92)
> ../../../gcc-git/libgo/go/syscall/socket_linux.go:90:1: internal compiler error: in rtl_verify_bb_insns, at cfgrtl.c:2658
> 0xac35af _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
>
> It shows only with -O2, which I think is due to how the block is reorganized
> internally and regarding the pseudo-return instruction inserted by
> split-stack.
> I am still debugging the issue and how to properly fix it, so if you have any
> advice I am open to suggestions.
> ...
> ...
> +void
> +aarch64_split_stack_space_check (rtx size, rtx label)
> +{
> +  rtx mem, ssvalue, compare, jump, temp;
> +  rtx requested = gen_reg_rtx (Pmode);
> +  /* Offset from thread pointer to __private_ss.  */
> +  int psso = 0x10;
> +
> +  /* Load __private_ss from TCB.  */
> +  ssvalue = gen_rtx_REG (Pmode, R9_REGNUM);
> +  emit_insn (gen_aarch64_load_tp_hard (ssvalue));
> +  mem = gen_rtx_MEM (Pmode, plus_constant (Pmode, ssvalue, psso));
> +  emit_move_insn (ssvalue, mem);
> +
> +  /* And compare it with frame pointer plus required stack.  */
> +  if (CONST_INT_P (size))
> +    emit_insn (gen_add3_insn (requested, stack_pointer_rtx,
> +                              GEN_INT (-INTVAL (size))));

If the constant size doesn't fit into an add instruction, gen_add3_insn will generate NULL here, so the following comparison is wrong, I guess.  I am not sure if this is the reason for the ICE you mentioned above.

Meanwhile, for the nop scheduling issue, I do see the following instruction sequence generated, where the "add" is scheduled before the nop:

        mov     x10, -4160
        add     x10, sp, x10
        nop

I currently don't have a good idea on tying the "nop" with the "mov".  For TLS relaxation, which requires a similar instruction tie, we simply use a single RTL pattern to output multiple instructions at the final assembly output stage.

> +  else
> +    {
> +      size = force_reg (Pmode, size);
> +      emit_move_insn (requested, gen_rtx_MINUS (Pmode, stack_pointer_rtx,
> +                                                size));
> +    }
> +
> +  /* Jump to __morestack call if current __private_ss does not suffice.  */
> +  compare = aarch64_gen_compare_reg (LT, requested, ssvalue);
> +  temp = gen_rtx_IF_THEN_ELSE (VOIDmode,
> +                               gen_rtx_GEU (VOIDmode, compare, const0_rtx),
> +                               gen_rtx_LABEL_REF (VOIDmode, label),
> +                               pc_rtx);

--
Regards,
Jiong
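For clarity, a minimal sketch of the guard I have in mind -- hypothetical code, based only on the observation above that gen_add3_insn can return NULL for an out-of-range immediate:

  rtx add = NULL_RTX;
  if (CONST_INT_P (size))
    /* gen_add3_insn may fail if the immediate is out of range.  */
    add = gen_add3_insn (requested, stack_pointer_rtx,
                         GEN_INT (-INTVAL (size)));
  if (add != NULL_RTX)
    emit_insn (add);
  else
    {
      /* Fall back to materializing the size in a register first.  */
      rtx nsize = force_reg (Pmode, size);
      emit_move_insn (requested, gen_rtx_MINUS (Pmode, stack_pointer_rtx,
                                                nsize));
    }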
Re: [Revert][AArch64] PR 63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER
Jiong Wang writes: > Andrew Pinski writes: > >> On Mon, Jul 27, 2015 at 3:36 AM, James Greenhalgh >> wrote: >>> On Mon, Jul 27, 2015 at 10:52:58AM +0100, pins...@gmail.com wrote: >>>> > On Jul 27, 2015, at 2:26 AM, Jiong Wang wrote: >>>> > >>>> > Andrew Pinski writes: >>>> > >>>> >>> On Fri, Jul 24, 2015 at 2:07 AM, Jiong Wang wrote: >>>> >>> >>>> >>> James Greenhalgh writes: >>>> >>> >>>> >>>>> On Wed, May 20, 2015 at 01:35:41PM +0100, Jiong Wang wrote: >>>> >>>>> Current IRA still use both target macros in a few places. >>>> >>>>> >>>> >>>>> Tell IRA to use the order we defined rather than with it's own cost >>>> >>>>> calculation. Allocate caller saved first, then callee saved. >>>> >>>>> >>>> >>>>> This is especially useful for LR/x30, as it's free to allocate and is >>>> >>>>> pure caller saved when used in leaf function. >>>> >>>>> >>>> >>>>> Haven't noticed significant impact on benchmarks, but by grepping >>>> >>>>> some >>>> >>>>> keywords like "Spilling", "Push.*spill" etc in ira rtl dump, the >>>> >>>>> number >>>> >>>>> is smaller. >>>> >>>>> >>>> >>>>> OK for trunk? >>>> >>>> >>>> >>>> OK, sorry for the delay. >>>> >>>> >>>> >>>> It might be mail client mangling, but please check that the trailing >>>> >>>> slashes >>>> >>>> line up in the version that gets committed. >>>> >>>> >>>> >>>> Thanks, >>>> >>>> James >>>> >>>> >>>> >>>>> 2015-05-19 Jiong. Wang >>>> >>>>> >>>> >>>>> gcc/ >>>> >>>>> PR 63521 >>>> >>>>> * config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define. >>>> >>>>> (HONOR_REG_ALLOC_ORDER): Define. >>>> >>> >>>> >>> Patch reverted. >>>> >> >>>> >> I did not see a reason why this patch was reverted. Maybe I am >>>> >> missing an email or something. >>>> > >>>> > There are several execution regressions under gcc testsuite, although as >>>> > far as I can see it's this patch exposed hidding bugs in those >>>> > testcases, but there might be one other issue, so to be conservative, I >>>> > temporarily reverted this patch. >>>> >>>> If you are talking about: >>>> gcc.target/aarch64/aapcs64/func-ret-2.c execution >>>> Etc. >>>> >>>> These test cases are too dependent on the original register allocation >>>> order >>>> and really can be safely ignored. Really these three tests should be moved >>>> or >>>> written in a more sane way. >>> >>> Yup, completely agreed - but the testcases do throw up something >>> interesting. If we are allocating registers to hold 128-bit values, and >>> we pick x7 as highest preference, we implicitly allocate x8 along with it. >>> I think we probably see the same thing if the first thing we do in a >>> function is a structure copy through a back-end expanded movmem, which >>> will likely begin with a 128-bit LDP using x7, x8. >>> >>> If the argument for this patch is that we prefer to allocate x7-x0 first, >>> followed by x8, then we've potentially made a sub-optimal decision, our >>> allocation order for 128-bit values is x7,x8,x5,x6 etc. >>> >>> My hunch is that we *might* get better code generation in this corner case >>> out of some permutation of the allocation order for argument >>> registers. I'm thinking something along the lines of >>> >>> {x6, x5, x4, x7, x3, x2, x1, x0, x8, ... } >>> >>> I asked Jiong to take a look at that, and I agree with his decision to >>> reduce the churn on trunk and just revert the patch until we've come to >>> a conclusion based on some evidence - rather than just my hunch! I agree >>&
Re: [PATCH] aarch64: Add split-stack initial support
Adhemerval Zanella writes: >> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c >> index e56398a..2cf239f 100644 >> --- a/gcc/config/aarch64/aarch64.c >> +++ b/gcc/config/aarch64/aarch64.c >> @@ -3227,6 +3227,34 @@ aarch64_expand_prologue (void) >>RTX_FRAME_RELATED_P (insn) = 1; >> } >> } >> + >> + if (flag_split_stack && offset) >> +{ >> + /* Setup the argument pointer (x10) for -fsplit-stack code. If >> + __morestack was called, it will left the arg pointer to the >> + old stack in x28. Otherwise, the argument pointer is the top >> + of current frame. */ >> + rtx x10 = gen_rtx_REG (Pmode, R10_REGNUM); >> + rtx x11 = gen_rtx_REG (Pmode, R11_REGNUM); >> + rtx x28 = gen_rtx_REG (Pmode, R28_REGNUM); >> + rtx x29 = gen_rtx_REG (Pmode, R29_REGNUM); >> + rtx not_more = gen_label_rtx (); >> + rtx cc_reg = gen_rtx_REG (CCmode, CC_REGNUM); >> + rtx jump; >> + >> + emit_move_insn (x11, GEN_INT (hard_fp_offset)); >> + emit_insn (gen_add3_insn (x10, x29, x11)); >> + jump = gen_rtx_IF_THEN_ELSE (VOIDmode, >> + gen_rtx_GEU (VOIDmode, cc_reg, >> +const0_rtx), >> + gen_rtx_LABEL_REF (VOIDmode, not_more), >> + pc_rtx); >> + jump = emit_jump_insn (gen_rtx_SET (pc_rtx, jump)); >> + JUMP_LABEL (jump) = not_more; >> + LABEL_NUSES (not_more) += 1; >> + emit_move_insn (x10, x28); >> + emit_label (not_more); >> +} >> } This part needs rebase, there are major changes in AArch64 prologue code recently. >> >> /* Return TRUE if we can use a simple_return insn. >> @@ -3303,6 +3331,7 @@ aarch64_expand_epilogue (bool for_sibcall) >>offset = offset - fp_offset; >> } >> >> + Unncessary new line. >>if (offset > 0) >> { >>unsigned reg1 = cfun->machine->frame.wb_candidate1; >> @@ -9648,7 +9677,7 @@ aarch64_expand_builtin_va_start (tree valist, rtx >> nextarg ATTRIBUTE_UNUSED) >>/* Emit code to initialize STACK, which points to the next varargs stack >> argument. CUM->AAPCS_STACK_SIZE gives the number of stack words used >> by named arguments. STACK is 8-byte aligned. */ >> - t = make_tree (TREE_TYPE (stack), virtual_incoming_args_rtx); >> + t = make_tree (TREE_TYPE (stack), crtl->args.internal_arg_pointer); >>if (cum->aapcs_stack_size > 0) >> t = fold_build_pointer_plus_hwi (t, cum->aapcs_stack_size * >> UNITS_PER_WORD); >>t = build2 (MODIFY_EXPR, TREE_TYPE (stack), stack, t); >> @@ -14010,6 +14039,196 @@ aarch64_optab_supported_p (int op, machine_mode >> mode1, machine_mode, >> } >> } >> >> +/* -fsplit-stack support. */ >> + >> +/* A SYMBOL_REF for __morestack. */ >> +static GTY(()) rtx morestack_ref; >> + >> +/* Emit -fsplit-stack prologue, which goes before the regular function >> + prologue. */ >> +void >> +aarch64_expand_split_stack_prologue (void) >> +{ >> + HOST_WIDE_INT frame_size, args_size; >> + rtx_code_label *ok_label = NULL; >> + rtx mem, ssvalue, compare, jump, insn, call_fusage; >> + rtx reg11, reg30, temp; >> + rtx new_cfa, cfi_ops = NULL; >> + /* Offset from thread pointer to __private_ss. */ >> + int psso = 0x10; >> + int ninsn; >> + >> + gcc_assert (flag_split_stack && reload_completed); >> + >> + /* It limits total maximum stack allocation on 2G so its value can be >> + materialized with two instruction at most (movn/movk). It might be >> + used by the linker to add some extra space for split calling non split >> + stack functions. 
*/ >> + frame_size = cfun->machine->frame.frame_size; >> + if (frame_size > ((HOST_WIDE_INT) 1 << 31)) >> +{ >> + sorry ("Stack frame larger than 2G is not supported for >> -fsplit-stack"); >> + return; >> +} >> + >> + if (morestack_ref == NULL_RTX) >> +{ >> + morestack_ref = gen_rtx_SYMBOL_REF (Pmode, "__morestack"); >> + SYMBOL_REF_FLAGS (morestack_ref) |= (SYMBOL_FLAG_LOCAL >> + | SYMBOL_FLAG_FUNCTION); >> +} >> + >> + /* Load __private_ss from TCB. */ >> + ssvalue = gen_rtx_REG (Pmode, R9_REGNUM); >> + emit_insn (gen_aarch64_load_tp_hard (ssvalue)); >> + mem = gen_rtx_MEM (Pmode, plus_constant (Pmode, ssvalue, psso)); >> + emit_move_insn (ssvalue, mem); >> + >> + temp = gen_rtx_REG (Pmode, R10_REGNUM); >> + >> + /* Always emit two insns to calculate the requested stack, so the linker >> + can edit them when adjusting size for calling non-split-stack code. */ >> + ninsn = aarch64_internal_mov_immediate (temp, GEN_INT (-frame_size), true, >> + Pmode); >> + gcc_assert (ninsn == 1 || ninsn == 2); >> + if (ninsn == 1) >> +emit_insn (gen_nop ()); there will be trouble to linker if the following add is scheduled before the nop? >> diff --git a/libgcc/config/aarch64/morestac
Re: [Revert][AArch64] PR 63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER
Andrew Pinski writes: > On Mon, Jul 27, 2015 at 3:36 AM, James Greenhalgh > wrote: >> On Mon, Jul 27, 2015 at 10:52:58AM +0100, pins...@gmail.com wrote: >>> > On Jul 27, 2015, at 2:26 AM, Jiong Wang wrote: >>> > >>> > Andrew Pinski writes: >>> > >>> >>> On Fri, Jul 24, 2015 at 2:07 AM, Jiong Wang wrote: >>> >>> >>> >>> James Greenhalgh writes: >>> >>> >>> >>>>> On Wed, May 20, 2015 at 01:35:41PM +0100, Jiong Wang wrote: >>> >>>>> Current IRA still use both target macros in a few places. >>> >>>>> >>> >>>>> Tell IRA to use the order we defined rather than with it's own cost >>> >>>>> calculation. Allocate caller saved first, then callee saved. >>> >>>>> >>> >>>>> This is especially useful for LR/x30, as it's free to allocate and is >>> >>>>> pure caller saved when used in leaf function. >>> >>>>> >>> >>>>> Haven't noticed significant impact on benchmarks, but by grepping some >>> >>>>> keywords like "Spilling", "Push.*spill" etc in ira rtl dump, the >>> >>>>> number >>> >>>>> is smaller. >>> >>>>> >>> >>>>> OK for trunk? >>> >>>> >>> >>>> OK, sorry for the delay. >>> >>>> >>> >>>> It might be mail client mangling, but please check that the trailing >>> >>>> slashes >>> >>>> line up in the version that gets committed. >>> >>>> >>> >>>> Thanks, >>> >>>> James >>> >>>> >>> >>>>> 2015-05-19 Jiong. Wang >>> >>>>> >>> >>>>> gcc/ >>> >>>>> PR 63521 >>> >>>>> * config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define. >>> >>>>> (HONOR_REG_ALLOC_ORDER): Define. >>> >>> >>> >>> Patch reverted. >>> >> >>> >> I did not see a reason why this patch was reverted. Maybe I am >>> >> missing an email or something. >>> > >>> > There are several execution regressions under gcc testsuite, although as >>> > far as I can see it's this patch exposed hidding bugs in those >>> > testcases, but there might be one other issue, so to be conservative, I >>> > temporarily reverted this patch. >>> >>> If you are talking about: >>> gcc.target/aarch64/aapcs64/func-ret-2.c execution >>> Etc. >>> >>> These test cases are too dependent on the original register allocation order >>> and really can be safely ignored. Really these three tests should be moved >>> or >>> written in a more sane way. >> >> Yup, completely agreed - but the testcases do throw up something >> interesting. If we are allocating registers to hold 128-bit values, and >> we pick x7 as highest preference, we implicitly allocate x8 along with it. >> I think we probably see the same thing if the first thing we do in a >> function is a structure copy through a back-end expanded movmem, which >> will likely begin with a 128-bit LDP using x7, x8. >> >> If the argument for this patch is that we prefer to allocate x7-x0 first, >> followed by x8, then we've potentially made a sub-optimal decision, our >> allocation order for 128-bit values is x7,x8,x5,x6 etc. >> >> My hunch is that we *might* get better code generation in this corner case >> out of some permutation of the allocation order for argument >> registers. I'm thinking something along the lines of >> >> {x6, x5, x4, x7, x3, x2, x1, x0, x8, ... } >> >> I asked Jiong to take a look at that, and I agree with his decision to >> reduce the churn on trunk and just revert the patch until we've come to >> a conclusion based on some evidence - rather than just my hunch! I agree >> that it would be harmless on trunk from a testing point of view, but I >> think Jiong is right to revert the patch until we better understand the >> code-generation implications. >> >> Of course, it might be that I am completely wrong! 
If you've already taken
>> a look at using a register allocation order like the example I gave and
>> have something to share, I'd be happy to read your advice!
>
> Any news on this patch?  It has been a year since it was reverted for
> a bad test that was failing.

Hi Andrew,

Yeah, those tests are actually expected to fail once the register allocation order changes; it's clearly documented in the comments of gcc.target/aarch64/aapcs64/abitest-2.h:

  /* ...  Note that for value that is returned in the caller-allocated
     memory block, we get the address from the saved x8 register.  x8 is
     saved just after the callee is returned; we assume that x8 has not
     been clobbered at then, although there is no requirement for the
     callee preserve the value stored in x8.  Luckily, all test cases
     here are simple enough that x8 doesn't normally get clobbered
     (although not guaranteed).  */

I had a local fix which used the redundant value returned in x0 to repair the clobbered value in x8, as they will be identical for a structure type return.  However, that trick doesn't work anymore, as we recently defined TARGET_OMIT_STRUCT_RETURN_REG to true, which removes that redundant x8-to-x0 copy.

Anyway, I will come back with some benchmark results of this patch on top of latest trunk after the weekend run, and also with answers to James's concerns.

--
Regards,
Jiong
Re: [5.0 Backport][AArch64] Fix simd intrinsics bug on float vminnm/vmaxnm
Jiong Wang writes: > On 07/07/16 10:34, James Greenhalgh wrote: >> >> To make backporting easier, could you please write a very simple >> standalone test that exposes this bug, and submit this patch with just >> that simple test? I've already OKed the functional part of this patch, and >> I'm happy to pre-approve a simple testcase. >> >> With that committed to trunk, this needs to go to all active release >> branches please. > > Committed attached patch to trunk as r238166, fmax/fmin pattern were > introduced by [1] which is available since gcc 6, so backported to > gcc 6 branch as r238167. Here is the gcc 5 backport patch, it's slightly different from gcc 6 backport patch as fmin/fmax are not introduced yet. OK to backport? gcc/ 2016-07-29 Jiong Wang * config/aarch64/aarch64-simd-builtins.def (smax, smin): Don't register float variants. (fmax, fmin): New builtins for VDQF modes. * config/aarch64/arm_neon.h (vmaxnm_f32): Use __builtin_aarch64_fmaxv2sf. (vmaxnmq_f32): Likewise. (vmaxnmq_f64): Likewise. (vminnm_f32): Likewise. (vminnmq_f32): Likewise. (vminnmq_f64): Likewise. * config/aarch64/iterators.md (UNSPEC_FMAXNM, UNSPEC_FMINNM): New. (FMAXMIN_UNS): Support UNSPEC_FMAXNM and UNSPEC_FMINNM. (maxmin_uns, maxmin_uns_op): Likewise. gcc/testsuite/ 2016-07-29 Jiong Wang * gcc.target/aarch64/simd/vminmaxnm_1.c: New. -- Regards, Jiong diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index dd2bc47..446d826 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -240,15 +240,16 @@ BUILTIN_VDQF (UNOP, reduc_smax_nan_scal_, 10) BUILTIN_VDQF (UNOP, reduc_smin_nan_scal_, 10) - /* Implemented by 3. - smax variants map to fmaxnm, - smax_nan variants map to fmax. */ - BUILTIN_VDQIF (BINOP, smax, 3) - BUILTIN_VDQIF (BINOP, smin, 3) + /* Implemented by 3. */ + BUILTIN_VDQ_BHSI (BINOP, smax, 3) + BUILTIN_VDQ_BHSI (BINOP, smin, 3) BUILTIN_VDQ_BHSI (BINOP, umax, 3) BUILTIN_VDQ_BHSI (BINOP, umin, 3) + /* Implemented by 3. */ BUILTIN_VDQF (BINOP, smax_nan, 3) BUILTIN_VDQF (BINOP, smin_nan, 3) + BUILTIN_VDQF (BINOP, fmax, 3) + BUILTIN_VDQF (BINOP, fmin, 3) /* Implemented by aarch64_p. 
*/ BUILTIN_VDQ_BHSI (BINOP, smaxp, 0) diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 4c15312..283000e 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -17733,19 +17733,19 @@ vpminnms_f32 (float32x2_t a) __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vmaxnm_f32 (float32x2_t __a, float32x2_t __b) { - return __builtin_aarch64_smaxv2sf (__a, __b); + return __builtin_aarch64_fmaxv2sf (__a, __b); } __extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) vmaxnmq_f32 (float32x4_t __a, float32x4_t __b) { - return __builtin_aarch64_smaxv4sf (__a, __b); + return __builtin_aarch64_fmaxv4sf (__a, __b); } __extension__ static __inline float64x2_t __attribute__ ((__always_inline__)) vmaxnmq_f64 (float64x2_t __a, float64x2_t __b) { - return __builtin_aarch64_smaxv2df (__a, __b); + return __builtin_aarch64_fmaxv2df (__a, __b); } /* vmaxv */ @@ -17963,19 +17963,19 @@ vminq_u32 (uint32x4_t __a, uint32x4_t __b) __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vminnm_f32 (float32x2_t __a, float32x2_t __b) { - return __builtin_aarch64_sminv2sf (__a, __b); + return __builtin_aarch64_fminv2sf (__a, __b); } __extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) vminnmq_f32 (float32x4_t __a, float32x4_t __b) { - return __builtin_aarch64_sminv4sf (__a, __b); + return __builtin_aarch64_fminv4sf (__a, __b); } __extension__ static __inline float64x2_t __attribute__ ((__always_inline__)) vminnmq_f64 (float64x2_t __a, float64x2_t __b) { - return __builtin_aarch64_sminv2df (__a, __b); + return __builtin_aarch64_fminv2df (__a, __b); } /* vminv */ diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index 2efbfab..c7e1d0c 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -186,9 +186,11 @@ UNSPEC_ASHIFT_UNSIGNED ; Used in aarch64-simd.md. UNSPEC_ABS ; Used in aarch64-simd.md. UNSPEC_FMAX ; Used in aarch64-simd.md. +UNSPEC_FMAXNM ; Used in aarch64-simd.md. UNSPEC_FMAXNMV ; Used in aarch64-simd.md. UNSPEC_FMAXV ; Used in aarch64-simd.md. UNSPEC_FMIN ; Used in aarch64-simd.md. +UNSPEC_FMINNM ; Used in aarch64-simd.md. UNSPEC_FMINNMV ; Used in aarch64-simd.md. UNSPEC_FMINV ; Used in aarch64-simd.md. UNSPEC_FADDV ; Used in aarch64-simd.md. @@ -876,7 +878,8 @@ (de
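As an aside, a small self-contained example of the behaviour this series fixes -- a sketch assuming an AArch64 target; vmaxnm maps to FMAXNM, which is expected to treat a quiet NaN operand as missing data:

#include <arm_neon.h>
#include <math.h>
#include <stdio.h>

int
main (void)
{
  float32x2_t a = vdup_n_f32 (NAN);
  float32x2_t b = vdup_n_f32 (1.0f);
  /* vmaxnm_f32 must map to FMAXNM, so with one quiet-NaN operand
     lane 0 should come out as 1.0, not NaN.  */
  float32x2_t r = vmaxnm_f32 (a, b);
  printf ("%f\n", (double) vget_lane_f32 (r, 0));
  return 0;
}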
Re: [AArch64][3/3] Migrate aarch64_expand_prologue/epilogue to aarch64_add_constant
On 21/07/16 11:08, Richard Earnshaw (lists) wrote: On 20/07/16 16:02, Jiong Wang wrote: Richard, Thanks for the review, yes, I believe using aarch64_add_constant is unconditionally safe here. Because we have generated a stack tie to clobber the whole memory thus prevent any instruction which access stack be scheduled after that. The access to deallocated stack issue was there and fixed by https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02292.html. aarch64_add_constant itself is generating the same instruction sequences as the original code, except for a few cases, it will prefer move scratch_reg, #imm add sp, sp, scratch_reg than: add sp, sp, #imm_part1 add sp, sp, #imm_part2 OK, I've had another look at this and I'm happy that we don't (currently) run into the problem I'm concerned about. However, this new usage does impose a constraint on aarch64_add_constant that will need to be respected in future, so please can you add the following to the comment that precedes that function: /* ... This function is sometimes used to adjust the stack pointer, so we must ensure that it can never cause transient stack deallocation by writing an invalid value into REGNUM. */ + bool frame_related_p = (regnum == SP_REGNUM); I think it would be better to make the frame-related decision be an explicit parameter passed to the routine (don't forget SP is not always the frame pointer). Then the new uses would pass 'true' and the existing uses 'false'. R. Thanks, attachment is the updated patch which: * Added above new comments for aarch64_add_constant. * One new parameter "frame_related_p" for aarch64_add_constant. I thought adding new gcc assertion for sanity check of frame_related_p and REGNUM, haven't done that as I found dwarf2cfi.c is doing that. OK for trunk? gcc/ 2016-07-25 Jiong Wang * config/aarch64/aarch64.c (aarch64_add_constant): New parameter "frame_related_p". Generate CFA annotation when it's necessary. (aarch64_expand_prologue): Use aarch64_add_constant. (aarch64_expand_epilogue): Likewise. (aarch64_output_mi_thunk): Pass "false" when calling aarch64_add_constant. diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 41844a1..ca93f6e 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -1866,14 +1866,19 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm) } /* Add DELTA onto REGNUM in MODE, using SCRATCHREG to held intermediate value if - it is necessary. */ + it is necessary. + + This function is sometimes used to adjust the stack pointer, so we must + ensure that it can never cause transient stack deallocation by writing an + invalid value into REGNUM. */ static void aarch64_add_constant (machine_mode mode, int regnum, int scratchreg, - HOST_WIDE_INT delta) + HOST_WIDE_INT delta, bool frame_related_p) { HOST_WIDE_INT mdelta = abs_hwi (delta); rtx this_rtx = gen_rtx_REG (mode, regnum); + rtx_insn *insn; /* Do nothing if mdelta is zero. */ if (!mdelta) @@ -1882,7 +1887,8 @@ aarch64_add_constant (machine_mode mode, int regnum, int scratchreg, /* We only need single instruction if the offset fit into add/sub. */ if (aarch64_uimm12_shift (mdelta)) { - emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta))); + insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta))); + RTX_FRAME_RELATED_P (insn) = frame_related_p; return; } @@ -1895,15 +1901,23 @@ aarch64_add_constant (machine_mode mode, int regnum, int scratchreg, HOST_WIDE_INT low_off = mdelta & 0xfff; low_off = delta < 0 ? 
-low_off : low_off; - emit_insn (gen_add2_insn (this_rtx, GEN_INT (low_off))); - emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta - low_off))); + insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (low_off))); + RTX_FRAME_RELATED_P (insn) = frame_related_p; + insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta - low_off))); + RTX_FRAME_RELATED_P (insn) = frame_related_p; return; } /* Otherwise use generic function to handle all other situations. */ rtx scratch_rtx = gen_rtx_REG (mode, scratchreg); aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (delta), true, mode); - emit_insn (gen_add2_insn (this_rtx, scratch_rtx)); + insn = emit_insn (gen_add2_insn (this_rtx, scratch_rtx)); + if (frame_related_p) +{ + RTX_FRAME_RELATED_P (insn) = frame_related_p; + rtx adj = plus_constant (mode, this_rtx, delta); + add_reg_note (insn , REG_CFA_ADJUST_CFA, gen_rtx_SET (this_rtx, adj)); +} } static bool @@ -3038,36 +3052,7 @@ aarch64_expand_prologue (void) frame_size -= (offset + crtl->outgoing_args_size); fp_off
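A note on the REG_CFA_ADJUST_CFA annotation in the hunk above -- the following is only an illustrative sketch with invented immediates, not output from this patch.  When the adjustment goes through the scratch register, no single insn exposes the net SP change, so dwarf2cfi cannot infer the new CFA on its own:

/* Illustration only (immediates invented):

     mov  x16, #-80000     ; scratch = delta; looks like a plain move
     add  sp, sp, x16      ; SP changes by a register amount

   Neither insn alone tells dwarf2cfi the net effect on the CFA, so the
   patch attaches a REG_CFA_ADJUST_CFA note recording sp = sp + delta
   explicitly on the final add.  */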
Re: [AArch64][8/14] ARMv8.2-A FP16 two operands scalar intrinsics
On 07/07/16 17:17, Jiong Wang wrote: This patch add ARMv8.2-A FP16 two operands scalar intrinsics. The updated patch resolve the conflict with https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00309.html The change is to let aarch64_emit_approx_div return false for HFmode. gcc/ 2016-07-20 Jiong Wang * config/aarch64/aarch64-simd-builtins.def: Register new builtins. * config/aarch64/aarch64.md (hf3): New. (hf3): Likewise. (add3): Likewise. (sub3): Likewise. (mul3): Likewise. (div3): Likewise. (*div3): Likewise. (3): Extend to HF. * config/aarch64/aarch64.c (aarch64_emit_approx_div): Return false for HFmode. * config/aarch64/aarch64-simd.md (aarch64_rsqrts): Likewise. (fabd3): Likewise. (3): Likewise. (3): Likewise. (aarch64_fmulx): Likewise. (aarch64_fac): Likewise. (aarch64_frecps): Likewise. (hfhi3): New. (hihf3): Likewise. * config/aarch64/iterators.md (VHSDF_SDF): Delete. (VSDQ_HSDI): Support HI. (fcvt_target, FCVT_TARGET): Likewise. * config/aarch64/arm_fp16.h: (vaddh_f16): New. (vsubh_f16): Likewise. (vabdh_f16): Likewise. (vcageh_f16): Likewise. (vcagth_f16): Likewise. (vcaleh_f16): Likewise. (vcalth_f16): Likewise.(vcleh_f16): Likewise. (vclth_f16): Likewise. (vcvth_n_f16_s16): Likewise. (vcvth_n_f16_s32): Likewise. (vcvth_n_f16_s64): Likewise. (vcvth_n_f16_u16): Likewise. (vcvth_n_f16_u32): Likewise. (vcvth_n_f16_u64): Likewise. (vcvth_n_s16_f16): Likewise. (vcvth_n_s32_f16): Likewise. (vcvth_n_s64_f16): Likewise. (vcvth_n_u16_f16): Likewise. (vcvth_n_u32_f16): Likewise. (vcvth_n_u64_f16): Likewise. (vdivh_f16): Likewise. (vmaxh_f16): Likewise. (vmaxnmh_f16): Likewise. (vminh_f16): Likewise. (vminnmh_f16): Likewise. (vmulh_f16): Likewise. (vmulxh_f16): Likewise. (vrecpsh_f16): Likewise. (vrsqrtsh_f16): Likewise. diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index 6f50d8405d3ee8c4823037bb2022a4f2f08b72fe..31abc077859254e3696adacb3f8f2b9b2da0647f 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -41,7 +41,7 @@ BUILTIN_VDC (COMBINE, combine, 0) BUILTIN_VB (BINOP, pmul, 0) - BUILTIN_VHSDF_SDF (BINOP, fmulx, 0) + BUILTIN_VHSDF_HSDF (BINOP, fmulx, 0) BUILTIN_VHSDF_DF (UNOP, sqrt, 2) BUILTIN_VD_BHSI (BINOP, addp, 0) VAR1 (UNOP, addp, 0, di) @@ -393,13 +393,12 @@ /* Implemented by aarch64_frecp. */ BUILTIN_GPF_F16 (UNOP, frecpe, 0) - BUILTIN_GPF (BINOP, frecps, 0) BUILTIN_GPF_F16 (UNOP, frecpx, 0) BUILTIN_VDQ_SI (UNOP, urecpe, 0) BUILTIN_VHSDF (UNOP, frecpe, 0) - BUILTIN_VHSDF (BINOP, frecps, 0) + BUILTIN_VHSDF_HSDF (BINOP, frecps, 0) /* Implemented by a mixture of abs2 patterns. Note the DImode builtin is only ever used for the int64x1_t intrinsic, there is no scalar version. */ @@ -496,17 +495,23 @@ /* Implemented by <*><*>3. */ BUILTIN_VSDQ_HSDI (SHIFTIMM, scvtf, 3) BUILTIN_VSDQ_HSDI (FCVTIMM_SUS, ucvtf, 3) - BUILTIN_VHSDF_SDF (SHIFTIMM, fcvtzs, 3) - BUILTIN_VHSDF_SDF (SHIFTIMM_USS, fcvtzu, 3) + BUILTIN_VHSDF_HSDF (SHIFTIMM, fcvtzs, 3) + BUILTIN_VHSDF_HSDF (SHIFTIMM_USS, fcvtzu, 3) + VAR1 (SHIFTIMM, scvtfsi, 3, hf) + VAR1 (SHIFTIMM, scvtfdi, 3, hf) + VAR1 (FCVTIMM_SUS, ucvtfsi, 3, hf) + VAR1 (FCVTIMM_SUS, ucvtfdi, 3, hf) + BUILTIN_GPI (SHIFTIMM, fcvtzshf, 3) + BUILTIN_GPI (SHIFTIMM_USS, fcvtzuhf, 3) /* Implemented by aarch64_rsqrte. */ BUILTIN_VHSDF_HSDF (UNOP, rsqrte, 0) /* Implemented by aarch64_rsqrts. */ - BUILTIN_VHSDF_SDF (BINOP, rsqrts, 0) + BUILTIN_VHSDF_HSDF (BINOP, rsqrts, 0) /* Implemented by fabd3. 
*/ - BUILTIN_VHSDF_SDF (BINOP, fabd, 3) + BUILTIN_VHSDF_HSDF (BINOP, fabd, 3) /* Implemented by aarch64_faddp. */ BUILTIN_VHSDF (BINOP, faddp, 0) @@ -522,10 +527,10 @@ BUILTIN_VHSDF_HSDF (UNOP, neg, 2) /* Implemented by aarch64_fac. */ - BUILTIN_VHSDF_SDF (BINOP_USS, faclt, 0) - BUILTIN_VHSDF_SDF (BINOP_USS, facle, 0) - BUILTIN_VHSDF_SDF (BINOP_USS, facgt, 0) - BUILTIN_VHSDF_SDF (BINOP_USS, facge, 0) + BUILTIN_VHSDF_HSDF (BINOP_USS, faclt, 0) + BUILTIN_VHSDF_HSDF (BINOP_USS, facle, 0) + BUILTIN_VHSDF_HSDF (BINOP_USS, facgt, 0) + BUILTIN_VHSDF_HSDF (BINOP_USS, facge, 0) /* Implemented by sqrt2. */ VAR1 (UNOP, sqrt, 2, hf) @@ -543,3 +548,7 @@ BUILTIN_GPI_I16 (UNOPUS, fixuns_trunchf, 2) BUILTIN_GPI (UNOPUS, fixuns_truncsf, 2) BUILTIN_GPI (UNOPUS, fixuns_truncdf, 2) + + /* Implemented by 3. */ + VAR1 (BINOP, fmax, 3, hf) + VAR1 (BINOP, fmin, 3, hf) diff --git a/gcc/config/aarch64/aarch64-s
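As an aside, a usage sketch for the new two-operand scalar intrinsics -- hypothetical, assuming a compiler with this series applied and something like -march=armv8.2-a+fp16:

#include <arm_fp16.h>
#include <stdio.h>

int
main (void)
{
  float16_t a = 2.0, b = 0.5;
  /* vaddh_f16/vmulh_f16 are among the new scalar intrinsics; each
     should map to a single FP16 instruction.  */
  float16_t s = vaddh_f16 (a, b);
  float16_t p = vmulh_f16 (a, b);
  printf ("%f %f\n", (double) s, (double) p);
  return 0;
}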
Re: [AArch64][7/14] ARMv8.2-A FP16 one operand scalar intrinsics
On 07/07/16 17:17, Jiong Wang wrote: This patch add ARMv8.2-A FP16 one operand scalar intrinsics Scalar intrinsics are kept in arm_fp16.h instead of arm_neon.h. The updated patch resolve the conflict with https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00308.html The change is to let aarch64_emit_approx_sqrt return false for HFmode. gcc/ 2016-07-20 Jiong Wang * config.gcc (aarch64*-*-*): Install arm_fp16.h. * config/aarch64/aarch64-builtins.c (hi_UP): New. * config/aarch64/aarch64-simd-builtins.def: Register new builtins. * config/aarch64/aarch64-simd.md (aarch64_frsqrte): Extend to HF mode. (aarch64_frecp): Likewise. (aarch64_cm): Likewise. * config/aarch64/aarch64.md (2): Likewise. (l2): Likewise. (fix_trunc2): Likewise. (sqrt2): Likewise. (*sqrt2): Likewise. (abs2): Likewise. (hf2): New pattern for HF mode. (hihf2): Likewise. * config/aarch64/aarch64.c (aarch64_emit_approx_sqrt): Return for HF mode. * config/aarch64/arm_neon.h: Include arm_fp16.h. * config/aarch64/iterators.md (GPF_F16): New. (GPI_F16): Likewise. (VHSDF_HSDF): Likewise. (w1): Support HF mode. (w2): Likewise. (v): Likewise. (s): Likewise. (q): Likewise. (Vmtype): Likewise. (V_cmp_result): Likewise. (fcvt_iesize): Likewise. (FCVT_IESIZE): Likewise. * config/aarch64/arm_fp16.h: New file. (vabsh_f16): New. (vceqzh_f16): Likewise. (vcgezh_f16): Likewise. (vcgtzh_f16): Likewise. (vclezh_f16): Likewise. (vcltzh_f16): Likewise. (vcvth_f16_s16): Likewise. (vcvth_f16_s32): Likewise. (vcvth_f16_s64): Likewise. (vcvth_f16_u16): Likewise. (vcvth_f16_u32): Likewise. (vcvth_f16_u64): Likewise. (vcvth_s16_f16): Likewise. (vcvth_s32_f16): Likewise. (vcvth_s64_f16): Likewise. (vcvth_u16_f16): Likewise. (vcvth_u32_f16): Likewise. (vcvth_u64_f16): Likewise. (vcvtah_s16_f16): Likewise. (vcvtah_s32_f16): Likewise. (vcvtah_s64_f16): Likewise. (vcvtah_u16_f16): Likewise. (vcvtah_u32_f16): Likewise. (vcvtah_u64_f16): Likewise. (vcvtmh_s16_f16): Likewise. (vcvtmh_s32_f16): Likewise. (vcvtmh_s64_f16): Likewise. (vcvtmh_u16_f16): Likewise. (vcvtmh_u32_f16): Likewise. (vcvtmh_u64_f16): Likewise. (vcvtnh_s16_f16): Likewise. (vcvtnh_s32_f16): Likewise. (vcvtnh_s64_f16): Likewise. (vcvtnh_u16_f16): Likewise. (vcvtnh_u32_f16): Likewise. (vcvtnh_u64_f16): Likewise. (vcvtph_s16_f16): Likewise. (vcvtph_s32_f16): Likewise. (vcvtph_s64_f16): Likewise. (vcvtph_u16_f16): Likewise. (vcvtph_u32_f16): Likewise. (vcvtph_u64_f16): Likewise. (vnegh_f16): Likewise. (vrecpeh_f16): Likewise. (vrecpxh_f16): Likewise. (vrndh_f16): Likewise. (vrndah_f16): Likewise. (vrndih_f16): Likewise. (vrndmh_f16): Likewise. (vrndnh_f16): Likewise. (vrndph_f16): Likewise. (vrndxh_f16): Likewise. (vrsqrteh_f16): Likewise. (vsqrth_f16): Likewise. 
diff --git a/gcc/config.gcc b/gcc/config.gcc index 1f75f17877334c2bb61cd16b69539ec7514db8ae..8827dc830d374c2512be5713d6dd143913f53c7d 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -300,7 +300,7 @@ m32c*-*-*) ;; aarch64*-*-*) cpu_type=aarch64 - extra_headers="arm_neon.h arm_acle.h" + extra_headers="arm_fp16.h arm_neon.h arm_acle.h" c_target_objs="aarch64-c.o" cxx_target_objs="aarch64-c.o" extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o" diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index af5fac5b29cf5373561d9bf9a69c401d2bec5cec..ca91d9108ead3eb83c21ee86d9e6ed44c8f4ad2d 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -62,6 +62,7 @@ #define si_UPSImode #define sf_UPSFmode #define hi_UPHImode +#define hf_UPHFmode #define qi_UPQImode #define UP(X) X##_UP diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index 363e131327d6be04dd94e664ef839e46f26940e4..6f50d8405d3ee8c4823037bb2022a4f2f08b72fe 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -274,6 +274,14 @@ BUILTIN_VHSDF (UNOP, round, 2) BUILTIN_VHSDF_DF (UNOP, frintn, 2) + VAR1 (UNOP, btrunc, 2, hf) + VAR1 (UNOP, ceil, 2, hf) + VAR1 (UNOP, floor, 2, hf) + VAR1 (UNOP, frintn, 2, hf) + VAR1 (UNOP, nearbyint, 2, hf) + VAR1 (UNOP, rint, 2, hf) + VAR1 (UNOP, round, 2, hf) + /* Implemented by l2
Re: [AArch64][3/14] ARMv8.2-A FP16 two operands vector intrinsics
On 07/07/16 17:15, Jiong Wang wrote: This patch add ARMv8.2-A FP16 two operands vector intrinsics. The updated patch resolve the conflict with https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00309.html The change is to let aarch64_emit_approx_div return false for V4HFmode and V8HFmode. gcc/ 2016-07-20 Jiong Wang * config/aarch64/aarch64-simd-builtins.def: Register new builtins. * config/aarch64/aarch64-simd.md (aarch64_rsqrts): Extend to HF modes. (fabd3): Likewise. (3): Likewise. (3): Likewise. (aarch64_p): Likewise. (3): Likewise. (3): Likewise. (3): Likewise. (aarch64_faddp): Likewise. (aarch64_fmulx): Likewise. (aarch64_frecps): Likewise. (*aarch64_fac): Rename to aarch64_fac. (add3): Extend to HF modes. (sub3): Likewise. (mul3): Likewise. (div3): Likewise. (*div3): Likewise. * config/aarch64/aarch64.c (aarch64_emit_approx_div): Return false for V4HF and V8HF. * config/aarch64/iterators.md (VDQ_HSDI, VSDQ_HSDI): New mode iterator. * config/aarch64/arm_neon.h (vadd_f16): Likewise. (vaddq_f16): Likewise. (vabd_f16): Likewise. (vabdq_f16): Likewise. (vcage_f16): Likewise. (vcageq_f16): Likewise. (vcagt_f16): Likewise. (vcagtq_f16): Likewise. (vcale_f16): Likewise. (vcaleq_f16): Likewise. (vcalt_f16): Likewise. (vcaltq_f16): Likewise. (vceq_f16): Likewise. (vceqq_f16): Likewise. (vcge_f16): Likewise. (vcgeq_f16): Likewise. (vcgt_f16): Likewise. (vcgtq_f16): Likewise. (vcle_f16): Likewise. (vcleq_f16): Likewise. (vclt_f16): Likewise. (vcltq_f16): Likewise. (vcvt_n_f16_s16): Likewise. (vcvtq_n_f16_s16): Likewise. (vcvt_n_f16_u16): Likewise. (vcvtq_n_f16_u16): Likewise. (vcvt_n_s16_f16): Likewise. (vcvtq_n_s16_f16): Likewise. (vcvt_n_u16_f16): Likewise. (vcvtq_n_u16_f16): Likewise. (vdiv_f16): Likewise. (vdivq_f16): Likewise. (vdup_lane_f16): Likewise. (vdup_laneq_f16): Likewise. (vdupq_lane_f16): Likewise. (vdupq_laneq_f16): Likewise. (vdups_lane_f16): Likewise. (vdups_laneq_f16): Likewise. (vmax_f16): Likewise. (vmaxq_f16): Likewise. (vmaxnm_f16): Likewise. (vmaxnmq_f16): Likewise. (vmin_f16): Likewise. (vminq_f16): Likewise. (vminnm_f16): Likewise. (vminnmq_f16): Likewise. (vmul_f16): Likewise. (vmulq_f16): Likewise. (vmulx_f16): Likewise. (vmulxq_f16): Likewise. (vpadd_f16): Likewise. (vpaddq_f16): Likewise. (vpmax_f16): Likewise. (vpmaxq_f16): Likewise. (vpmaxnm_f16): Likewise. (vpmaxnmq_f16): Likewise. (vpmin_f16): Likewise. (vpminq_f16): Likewise. (vpminnm_f16): Likewise. (vpminnmq_f16): Likewise. (vrecps_f16): Likewise. (vrecpsq_f16): Likewise. (vrsqrts_f16): Likewise. (vrsqrtsq_f16): Likewise. (vsub_f16): Likewise. (vsubq_f16): Likewise. diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index 22c87be429ba1aac2bbe77f1119d16b6b8bd6e80..007dad60b6999158a1c9c1cf2a501a9f0712af54 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -41,7 +41,7 @@ BUILTIN_VDC (COMBINE, combine, 0) BUILTIN_VB (BINOP, pmul, 0) - BUILTIN_VALLF (BINOP, fmulx, 0) + BUILTIN_VHSDF_SDF (BINOP, fmulx, 0) BUILTIN_VHSDF_DF (UNOP, sqrt, 2) BUILTIN_VD_BHSI (BINOP, addp, 0) VAR1 (UNOP, addp, 0, di) @@ -248,22 +248,22 @@ BUILTIN_VDQ_BHSI (BINOP, smin, 3) BUILTIN_VDQ_BHSI (BINOP, umax, 3) BUILTIN_VDQ_BHSI (BINOP, umin, 3) - BUILTIN_VDQF (BINOP, smax_nan, 3) - BUILTIN_VDQF (BINOP, smin_nan, 3) + BUILTIN_VHSDF (BINOP, smax_nan, 3) + BUILTIN_VHSDF (BINOP, smin_nan, 3) /* Implemented by 3. 
*/ - BUILTIN_VDQF (BINOP, fmax, 3) - BUILTIN_VDQF (BINOP, fmin, 3) + BUILTIN_VHSDF (BINOP, fmax, 3) + BUILTIN_VHSDF (BINOP, fmin, 3) /* Implemented by aarch64_p. */ BUILTIN_VDQ_BHSI (BINOP, smaxp, 0) BUILTIN_VDQ_BHSI (BINOP, sminp, 0) BUILTIN_VDQ_BHSI (BINOP, umaxp, 0) BUILTIN_VDQ_BHSI (BINOP, uminp, 0) - BUILTIN_VDQF (BINOP, smaxp, 0) - BUILTIN_VDQF (BINOP, sminp, 0) - BUILTIN_VDQF (BINOP, smax_nanp, 0) - BUILTIN_VDQF (BINOP, smin_nanp, 0) + BUILTIN_VHSDF (BINOP, smaxp, 0) + BUILTIN_VHSDF (BINOP, sminp, 0) + BUILTIN_VHSDF (BINOP, smax_nanp, 0) + BUILTIN_VHSDF (BINOP, smin_nanp, 0) /* Implemented by 2. */ BUILTIN_VHSDF (UNOP, btrunc, 2) @@ -383,7 +383,7 @@ BUILTIN_VDQ_SI (UNOP, urecpe, 0) BUILTIN_VHSDF (UNOP, frecpe, 0) - BUILTIN_VDQF (BINOP, frecps, 0
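And a matching sketch for the vector variants, under the same assumptions as the scalar example above (vdupq_n_f16 and vgetq_lane_f16 assumed available from the data movement intrinsics):

#include <arm_neon.h>
#include <stdio.h>

int
main (void)
{
  float16x8_t a = vdupq_n_f16 (1.5);
  float16x8_t b = vdupq_n_f16 (0.5);
  float16x8_t s = vaddq_f16 (a, b);   /* one FADD over all 8 lanes */
  printf ("%f\n", (double) vgetq_lane_f16 (s, 0));
  return 0;
}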
Re: [AArch64][2/14] ARMv8.2-A FP16 one operand vector intrinsics
On 07/07/16 17:14, Jiong Wang wrote: This patch add ARMv8.2-A FP16 one operand vector intrinsics. We introduced new mode iterators to cover HF modes, qualified patterns which was using old mode iterators are switched to new ones. We can't simply extend old iterator like VDQF to conver HF modes, because not all patterns using VDQF are with new FP16 support, thus we introduced new, temperary iterators, and only apply new iterators on those patterns which do have FP16 supports. I noticed the patchset at https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00308.html has some modifications on the standard name "div" and "sqrt", thus there are minor conflicts as this patch touch "sqrt" as well. This patch resolve the conflict and the change is to let aarch64_emit_approx_sqrt simply return false for V4HFmode and V8HFmode. gcc/ 2016-07-20 Jiong Wang * config/aarch64/aarch64-builtins.c (TYPES_BINOP_USS): New. * config/aarch64/aarch64-simd-builtins.def: Register new builtins. * config/aarch64/aarch64-simd.md (aarch64_rsqrte): Extend to HF modes. (neg2): Likewise. (abs2): Likewise. (2): Likewise. (l2): Likewise. (2): Likewise. (2): Likewise. (ftrunc2): Likewise. (2): Likewise. (sqrt2): Likewise. (*sqrt2): Likewise. (aarch64_frecpe): Likewise. (aarch64_cm): Likewise. * config/aarch64/aarch64.c (aarch64_emit_approx_sqrt): Return false for V4HF and V8HF. * config/aarch64/iterators.md (VHSDF, VHSDF_DF, VHSDF_SDF): New. (VDQF_COND, fcvt_target, FCVT_TARGET, hcon): Extend mode attribute to HF modes. (stype): New. * config/aarch64/arm_neon.h (vdup_n_f16): New. (vdupq_n_f16): Likewise. (vld1_dup_f16): Use vdup_n_f16. (vld1q_dup_f16): Use vdupq_n_f16. (vabs_f16): New. (vabsq_f16): Likewise. (vceqz_f16): Likewise. (vceqzq_f16): Likewise. (vcgez_f16): Likewise. (vcgezq_f16): Likewise. (vcgtz_f16): Likewise. (vcgtzq_f16): Likewise. (vclez_f16): Likewise. (vclezq_f16): Likewise. (vcltz_f16): Likewise. (vcltzq_f16): Likewise. (vcvt_f16_s16): Likewise. (vcvtq_f16_s16): Likewise. (vcvt_f16_u16): Likewise. (vcvtq_f16_u16): Likewise. (vcvt_s16_f16): Likewise. (vcvtq_s16_f16): Likewise. (vcvt_u16_f16): Likewise. (vcvtq_u16_f16): Likewise. (vcvta_s16_f16): Likewise. (vcvtaq_s16_f16): Likewise. (vcvta_u16_f16): Likewise. (vcvtaq_u16_f16): Likewise. (vcvtm_s16_f16): Likewise. (vcvtmq_s16_f16): Likewise. (vcvtm_u16_f16): Likewise. (vcvtmq_u16_f16): Likewise. (vcvtn_s16_f16): Likewise. (vcvtnq_s16_f16): Likewise. (vcvtn_u16_f16): Likewise. (vcvtnq_u16_f16): Likewise. (vcvtp_s16_f16): Likewise. (vcvtpq_s16_f16): Likewise. (vcvtp_u16_f16): Likewise. (vcvtpq_u16_f16): Likewise. (vneg_f16): Likewise. (vnegq_f16): Likewise. (vrecpe_f16): Likewise. (vrecpeq_f16): Likewise. (vrnd_f16): Likewise. (vrndq_f16): Likewise. (vrnda_f16): Likewise. (vrndaq_f16): Likewise. (vrndi_f16): Likewise. (vrndiq_f16): Likewise. (vrndm_f16): Likewise. (vrndmq_f16): Likewise. (vrndn_f16): Likewise. (vrndnq_f16): Likewise. (vrndp_f16): Likewise. (vrndpq_f16): Likewise. (vrndx_f16): Likewise. (vrndxq_f16): Likewise. (vrsqrte_f16): Likewise. (vrsqrteq_f16): Likewise. (vsqrt_f16): Likewise. (vsqrtq_f16): Likewise. 
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index 6b90b2af5e9d2b5e7f48569ec1ebcb0ef16314ee..af5fac5b29cf5373561d9bf9a69c401d2bec5cec 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -139,6 +139,10 @@ aarch64_types_binop_ssu_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_none, qualifier_none, qualifier_unsigned }; #define TYPES_BINOP_SSU (aarch64_types_binop_ssu_qualifiers) static enum aarch64_type_qualifiers +aarch64_types_binop_uss_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_unsigned, qualifier_none, qualifier_none }; +#define TYPES_BINOP_USS (aarch64_types_binop_uss_qualifiers) +static enum aarch64_type_qualifiers aarch64_types_binopp_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_poly, qualifier_poly, qualifier_poly }; #define TYPES_BINOPP (aarch64_types_binopp_qualifiers) diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index f1ad325f464f89c981cbdee8a8f6afafa938639a..22c87be429ba1aac2bbe77f1119d16b6b8bd6e80 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch
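As referenced above, a minimal sketch of the conflict resolution: the approximate-sqrt expander simply declines the new FP16 vector modes so that a plain FSQRT is emitted for them. The function name and signature here are illustrative only, not the actual aarch64_emit_approx_sqrt interface:

  /* Sketch only: no rsqrte-based approximation path for HF vectors.  */
  static bool
  aarch64_emit_approx_sqrt_sketch (machine_mode mode)
  {
    if (mode == V4HFmode || mode == V8HFmode)
      return false;   /* fall back to the real FSQRT instruction */
    /* ... existing SF/DF approximation sequence ... */
    return true;
  }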
Re: [AArch64][3/3] Migrate aarch64_expand_prologue/epilogue to aarch64_add_constant
On 20/07/16 15:18, Richard Earnshaw (lists) wrote: On 20/07/16 14:03, Jiong Wang wrote: Those stack adjustment sequences inside aarch64_expand_prologue/epilogue are doing exactly what aarch64_add_constant offers, except they also need to be aware of dwarf generation. This patch teaches the existing aarch64_add_constant about dwarf generation; currently the SP register is supported. Whenever SP is updated there should be a CFA update; we then mark these instructions as frame related, and if the update is too complex for GCC to guess the adjustment, we attach an explicit annotation. Both the dwarf frame info size and pro/epilogue scheduling are improved after this patch, as aarch64_add_constant makes better use of the scratch register. OK for trunk? gcc/ 2016-07-20 Jiong Wang * config/aarch64/aarch64.c (aarch64_add_constant): Mark instruction as frame related when it is. Generate CFA annotation when it's necessary. (aarch64_expand_prologue): Use aarch64_add_constant. (aarch64_expand_epilogue): Likewise.

Are you sure using aarch64_add_constant is unconditionally safe? Stack adjustments need to be done very carefully to ensure that we never transiently deallocate part of the stack.

Richard, thanks for the review. Yes, I believe using aarch64_add_constant is unconditionally safe here, because we have generated a stack tie to clobber the whole of memory, which prevents any instruction that accesses the stack from being scheduled after it. The access-to-deallocated-stack issue existed before and was fixed by https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02292.html. aarch64_add_constant itself generates the same instruction sequences as the original code, except that in a few cases it will prefer

  mov  scratch_reg, #imm
  add  sp, sp, scratch_reg

over:

  add  sp, sp, #imm_part1
  add  sp, sp, #imm_part2
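To make the scheduling point concrete, here is an illustrative comparison; the delta value 0xffff and the use of x16 as the scratch register are assumptions for the example, not output from the patched compiler:

  /* Adjusting SP by 0xffff:

     preferred mov + add form:        old two-add form:
       mov  x16, #0xffff                add  sp, sp, #0xfff
       add  sp, sp, x16                 add  sp, sp, #0xf000

     The mov neither reads nor writes SP, so the scheduler can place it
     freely; the SP-writing add is marked frame related and carries the
     explicit REG_CFA_ADJUST_CFA note, since the full CFA adjustment
     cannot be guessed from that single add.  */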
[AArch64][3/3] Migrate aarch64_expand_prologue/epilogue to aarch64_add_constant
Those stack adjustment sequences inside aarch64_expand_prologue/epilogue are doing exactly what aarch64_add_constant offers, except they also need to be aware of dwarf generation. This patch teaches the existing aarch64_add_constant about dwarf generation; currently the SP register is supported. Whenever SP is updated there should be a CFA update; we then mark these instructions as frame related, and if the update is too complex for GCC to guess the adjustment, we attach an explicit annotation. Both the dwarf frame info size and pro/epilogue scheduling are improved after this patch, as aarch64_add_constant makes better use of the scratch register. OK for trunk? gcc/ 2016-07-20 Jiong Wang * config/aarch64/aarch64.c (aarch64_add_constant): Mark instruction as frame related when it is. Generate CFA annotation when it's necessary. (aarch64_expand_prologue): Use aarch64_add_constant. (aarch64_expand_epilogue): Likewise. diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 41844a101247c939ecb31f8a8c17cf79759255aa..b38f3f1e8f85a5f3191d0c96080327dac7b2eaed 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -1874,6 +1874,8 @@ aarch64_add_constant (machine_mode mode, int regnum, int scratchreg, { HOST_WIDE_INT mdelta = abs_hwi (delta); rtx this_rtx = gen_rtx_REG (mode, regnum); + bool frame_related_p = (regnum == SP_REGNUM); + rtx_insn *insn; /* Do nothing if mdelta is zero. */ if (!mdelta) @@ -1882,7 +1884,8 @@ aarch64_add_constant (machine_mode mode, int regnum, int scratchreg, /* We only need a single instruction if the offset fits into add/sub. */ if (aarch64_uimm12_shift (mdelta)) { - emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta))); + insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta))); + RTX_FRAME_RELATED_P (insn) = frame_related_p; return; } @@ -1895,15 +1898,23 @@ aarch64_add_constant (machine_mode mode, int regnum, int scratchreg, HOST_WIDE_INT low_off = mdelta & 0xfff; low_off = delta < 0 ? -low_off : low_off; - emit_insn (gen_add2_insn (this_rtx, GEN_INT (low_off))); - emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta - low_off))); + insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (low_off))); + RTX_FRAME_RELATED_P (insn) = frame_related_p; + insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta - low_off))); + RTX_FRAME_RELATED_P (insn) = frame_related_p; return; } /* Otherwise use generic function to handle all other situations.
*/ rtx scratch_rtx = gen_rtx_REG (mode, scratchreg); aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (delta), true, mode); - emit_insn (gen_add2_insn (this_rtx, scratch_rtx)); + insn = emit_insn (gen_add2_insn (this_rtx, scratch_rtx)); + if (frame_related_p) +{ + RTX_FRAME_RELATED_P (insn) = frame_related_p; + rtx adj = plus_constant (mode, this_rtx, delta); + add_reg_note (insn, REG_CFA_ADJUST_CFA, gen_rtx_SET (this_rtx, adj)); +} } static bool @@ -3038,36 +3049,7 @@ aarch64_expand_prologue (void) frame_size -= (offset + crtl->outgoing_args_size); fp_offset = 0; - if (frame_size >= 0x1000000) - { - rtx op0 = gen_rtx_REG (Pmode, IP0_REGNUM); - emit_move_insn (op0, GEN_INT (-frame_size)); - insn = emit_insn (gen_add2_insn (stack_pointer_rtx, op0)); - - add_reg_note (insn, REG_CFA_ADJUST_CFA, - gen_rtx_SET (stack_pointer_rtx, - plus_constant (Pmode, stack_pointer_rtx, - -frame_size))); - RTX_FRAME_RELATED_P (insn) = 1; - } - else if (frame_size > 0) - { - int hi_ofs = frame_size & 0xfff000; - int lo_ofs = frame_size & 0x000fff; - - if (hi_ofs) - { - insn = emit_insn (gen_add2_insn -(stack_pointer_rtx, GEN_INT (-hi_ofs))); - RTX_FRAME_RELATED_P (insn) = 1; - } - if (lo_ofs) - { - insn = emit_insn (gen_add2_insn -(stack_pointer_rtx, GEN_INT (-lo_ofs))); - RTX_FRAME_RELATED_P (insn) = 1; - } - } + aarch64_add_constant (Pmode, SP_REGNUM, IP0_REGNUM, -frame_size); } else frame_size = -1; @@ -3287,31 +3269,7 @@ aarch64_expand_epilogue (bool for_sibcall) if (need_barrier_p) emit_insn (gen_stack_tie (stack_pointer_rtx, stack_pointer_rtx)); - if (frame_size >= 0x1000000) - { - rtx op0 = gen_rtx_REG (Pmode, IP0_REGNUM); - emit_move_insn (op0, GEN_INT (frame_size)); - insn = emit_insn (gen_add2_insn (stack_pointer_rtx, op0)); - } - else - { - int hi_ofs = frame_size & 0xfff000; - int lo_ofs = frame_size & 0x000fff; - - if (hi_ofs && lo_ofs) - { - insn = emit_insn (gen_add2_insn -(stack_pointer_rtx, GEN_INT (hi_ofs))); - RTX_FRAME_RELATED_P (insn) = 1; - frame_size = lo_ofs; - } - insn = emit_insn (gen_add2_insn - (stack_pointer_rtx,
[AArch64][2/3] Optimize aarch64_add_constant to generate better addition sequences
This patch optimizes the immediate addition sequences generated by aarch64_add_constant. The addition sequences currently generated are:

* If the immediate fits into the unsigned 12-bit range, generate a single add/sub.
* Otherwise, if it fits into the unsigned 24-bit range, generate two add/sub instructions.
* Otherwise, invoke the general constant build function.

This misses the situation where the immediate can't fit into the unsigned 12-bit range but can fit into a single mov instruction, in which case we can generate one move and one addition. The move won't touch the destination register, so that sequence is better than two additions, which both touch the destination register. This patch thus optimizes the addition sequences into:

* If the immediate fits into the unsigned 12-bit range, generate a single add/sub.
* Otherwise, if it fits into the unsigned 24-bit range, generate two add/sub instructions, unless it also fits into a single move instruction; in that case move the immediate into the scratch register first, then generate one addition to add the scratch register to the destination register.
* Otherwise, invoke the general constant build function.

(A worked example of the resulting sequences follows the patch below.) OK for trunk? gcc/ 2016-07-20 Jiong Wang * config/aarch64/aarch64.c (aarch64_add_constant): Optimize instruction sequences. diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index aeea3b3ebc514663043ac8d7cd13361f06f78502..41844a101247c939ecb31f8a8c17cf79759255aa 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -1865,6 +1865,47 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm) aarch64_internal_mov_immediate (dest, imm, true, GET_MODE (dest)); } +/* Add DELTA onto REGNUM in MODE, using SCRATCHREG to hold the intermediate + value if necessary. */ + +static void +aarch64_add_constant (machine_mode mode, int regnum, int scratchreg, + HOST_WIDE_INT delta) +{ + HOST_WIDE_INT mdelta = abs_hwi (delta); + rtx this_rtx = gen_rtx_REG (mode, regnum); + + /* Do nothing if mdelta is zero. */ + if (!mdelta) +return; + + /* We only need a single instruction if the offset fits into add/sub. */ + if (aarch64_uimm12_shift (mdelta)) +{ + emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta))); + return; +} + + /* We need two add/sub instructions, each performing part of the + addition/subtraction, but don't do this if the addend can be loaded into + a register by a single instruction; in that case we prefer a move to the + scratch register followed by an addition. */ + if (mdelta < 0x1000000 && !aarch64_move_imm (delta, mode)) +{ + HOST_WIDE_INT low_off = mdelta & 0xfff; + + low_off = delta < 0 ? -low_off : low_off; + emit_insn (gen_add2_insn (this_rtx, GEN_INT (low_off))); + emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta - low_off))); + return; +} + + /* Otherwise use generic function to handle all other situations.
*/ + rtx scratch_rtx = gen_rtx_REG (mode, scratchreg); + aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (delta), true, mode); + emit_insn (gen_add2_insn (this_rtx, scratch_rtx)); +} + static bool aarch64_function_ok_for_sibcall (tree decl ATTRIBUTE_UNUSED, tree exp ATTRIBUTE_UNUSED) @@ -3337,44 +3378,6 @@ aarch64_final_eh_return_addr (void) - 2 * UNITS_PER_WORD)); } -static void -aarch64_add_constant (machine_mode mode, int regnum, int scratchreg, - HOST_WIDE_INT delta) -{ - HOST_WIDE_INT mdelta = delta; - rtx this_rtx = gen_rtx_REG (mode, regnum); - rtx scratch_rtx = gen_rtx_REG (mode, scratchreg); - - if (mdelta < 0) -mdelta = -mdelta; - - if (mdelta >= 4096 * 4096) -{ - aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (delta), true, mode); - emit_insn (gen_add3_insn (this_rtx, this_rtx, scratch_rtx)); -} - else if (mdelta > 0) -{ - if (mdelta >= 4096) - { - emit_insn (gen_rtx_SET (scratch_rtx, GEN_INT (mdelta / 4096))); - rtx shift = gen_rtx_ASHIFT (mode, scratch_rtx, GEN_INT (12)); - if (delta < 0) - emit_insn (gen_rtx_SET (this_rtx, -gen_rtx_MINUS (mode, this_rtx, shift))); - else - emit_insn (gen_rtx_SET (this_rtx, -gen_rtx_PLUS (mode, this_rtx, shift))); - } - if (mdelta % 4096 != 0) - { - scratch_rtx = GEN_INT ((delta < 0 ? -1 : 1) * (mdelta % 4096)); - emit_insn (gen_rtx_SET (this_rtx, - gen_rtx_PLUS (mode, this_rtx, scratch_rtx))); - } -} -} - /* Output code to add DELTA to the first argument, and then jump to FUNCTION. Used for C++ multiple inheritance. */ static void
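As referenced above, a worked example of the sequences this logic selects. The delta values and the choice of x0/x16 are illustrative assumptions, not compiler output:

  /* delta = 0x10        fits uimm12               ->  add  x0, x0, #0x10

     delta = 0x101008    fits 24 bits, not a       ->  add  x0, x0, #0x8
                         single-mov immediate          add  x0, x0, #0x101000

     delta = 0xffff      fits 24 bits but is a     ->  mov  x16, #0xffff
                         single movz, so prefer        add  x0, x0, x16
                         the mov + add form

     delta = 0x12345678  too large, so use the     ->  mov  x16, #0x5678
                         generic constant builder      movk x16, #0x1234, lsl 16
                                                       add  x0, x0, x16  */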
[AArch64][1/3] Migrate aarch64_add_constant to new interface & kill aarch64_build_constant
Currently aarch64_add_constant is using aarch64_build_constant to move an immediate into the scratch register. It has considered the following situations:

* the immediate can fit into a bitmask pattern that needs only a single instruction.
* the immediate can fit into a single movz/movn.
* the immediate needs a single movz/movn plus multiple movk instructions.

Actually we have another constant-building helper function, "aarch64_internal_mov_immediate", which covers all these situations and more. This patch thus migrates aarch64_add_constant to aarch64_internal_mov_immediate so that we can kill the old aarch64_build_constant. (A few illustrative immediate categories are sketched after the patch below.) OK for trunk? gcc/ 2016-07-20 Jiong Wang * config/aarch64/aarch64.c (aarch64_add_constant): New parameter "mode". Use aarch64_internal_mov_immediate instead of aarch64_build_constant. (aarch64_build_constant): Delete. diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 512ef10d158d2eaa1384d28c43b9a8f90387099d..aeea3b3ebc514663043ac8d7cd13361f06f78502 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -3337,98 +3337,20 @@ aarch64_final_eh_return_addr (void) - 2 * UNITS_PER_WORD)); } -/* Possibly output code to build up a constant in a register. For - the benefit of the costs infrastructure, returns the number of - instructions which would be emitted. GENERATE inhibits or - enables code generation. */ - -static int -aarch64_build_constant (int regnum, HOST_WIDE_INT val, bool generate) -{ - int insns = 0; - - if (aarch64_bitmask_imm (val, DImode)) -{ - if (generate) - emit_move_insn (gen_rtx_REG (Pmode, regnum), GEN_INT (val)); - insns = 1; -} - else -{ - int i; - int ncount = 0; - int zcount = 0; - HOST_WIDE_INT valp = val >> 16; - HOST_WIDE_INT valm; - HOST_WIDE_INT tval; - - for (i = 16; i < 64; i += 16) - { - valm = (valp & 0xffff); - - if (valm != 0) - ++ zcount; - - if (valm != 0xffff) - ++ ncount; - - valp >>= 16; - } - - /* zcount contains the number of additional MOVK instructions - required if the constant is built up with an initial MOVZ instruction, - while ncount is the number of MOVK instructions required if starting - with a MOVN instruction. Choose the sequence that yields the fewest - number of instructions, preferring MOVZ instructions when they are both - the same.
*/ - if (ncount < zcount) - { - if (generate) - emit_move_insn (gen_rtx_REG (Pmode, regnum), - GEN_INT (val | ~(HOST_WIDE_INT) 0xffff)); - tval = 0xffff; - insns++; - } - else - { - if (generate) - emit_move_insn (gen_rtx_REG (Pmode, regnum), - GEN_INT (val & 0xffff)); - tval = 0; - insns++; - } - - val >>= 16; - - for (i = 16; i < 64; i += 16) - { - if ((val & 0xffff) != tval) - { - if (generate) - emit_insn (gen_insv_immdi (gen_rtx_REG (Pmode, regnum), - GEN_INT (i), - GEN_INT (val & 0xffff))); - insns++; - } - val >>= 16; - } -} - return insns; -} - static void -aarch64_add_constant (int regnum, int scratchreg, HOST_WIDE_INT delta) +aarch64_add_constant (machine_mode mode, int regnum, int scratchreg, + HOST_WIDE_INT delta) { HOST_WIDE_INT mdelta = delta; - rtx this_rtx = gen_rtx_REG (Pmode, regnum); - rtx scratch_rtx = gen_rtx_REG (Pmode, scratchreg); + rtx this_rtx = gen_rtx_REG (mode, regnum); + rtx scratch_rtx = gen_rtx_REG (mode, scratchreg); if (mdelta < 0) mdelta = -mdelta; if (mdelta >= 4096 * 4096) { - (void) aarch64_build_constant (scratchreg, delta, true); + aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (delta), true, mode); emit_insn (gen_add3_insn (this_rtx, this_rtx, scratch_rtx)); } else if (mdelta > 0) @@ -3436,19 +3358,19 @@ aarch64_add_constant (int regnum, int scratchreg, HOST_WIDE_INT delta) if (mdelta >= 4096) { emit_insn (gen_rtx_SET (scratch_rtx, GEN_INT (mdelta / 4096))); - rtx shift = gen_rtx_ASHIFT (Pmode, scratch_rtx, GEN_INT (12)); + rtx shift = gen_rtx_ASHIFT (mode, scratch_rtx, GEN_INT (12)); if (delta < 0) emit_insn (gen_rtx_SET (this_rtx, -gen_rtx_MINUS (Pmode, this_rtx, shift))); +gen_rtx_MINUS (mode, this_rtx, shift))); else emit_insn (gen_rtx_SET (this_rtx, -gen_rtx_PLUS (Pmode, this_rtx, shift))); +gen_rtx_PLUS (mode, this_rtx, shift))); } if (mdelta % 4096 != 0) { scratch_rtx = GEN_INT ((delta < 0 ? -1 : 1) * (mdelta % 4096)); emit_insn (gen_rtx_SET (this_rtx, - gen_rtx_PLUS (Pmode, this_rtx, scratch_rtx))); + gen_rtx_PLUS (mode, this_rtx, scratch_rtx))); } } } @@ -3473,7 +3395,7 @@ aarch64_output_mi_thunk (FILE *file, tree thunk ATTRIBUTE_UNUSED, emit_note (NOTE_INSN_PROLOGUE_END); if (vcall_offset ==
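As referenced above, a few illustrative immediates for the categories the helper must cover. This is a sketch only; the exact sequences chosen by aarch64_internal_mov_immediate may differ, and x16 is just an example register:

  /* 0x5555555555555555  bitmask immediate  ->  mov  x16, #0x5555555555555555
                                                (a single ORR-encoded move)
     0xfedc0000          single movz        ->  movz x16, #0xfedc, lsl 16
     0xffffffffffff0123  single movn        ->  movn x16, #0xfedc
     0x123456789abc      movz + movk chain  ->  movz x16, #0x9abc
                                                movk x16, #0x5678, lsl 16
                                                movk x16, #0x1234, lsl 32  */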
[COMMITTED][AArch64] Fix simd intrinsics bug on float vminnm/vmaxnm
On 07/07/16 10:34, James Greenhalgh wrote: To make backporting easier, could you please write a very simple standalone test that exposes this bug, and submit this patch with just that simple test? I've already OKed the functional part of this patch, and I'm happy to pre-approve a simple testcase. With that committed to trunk, this needs to go to all active release branches please. Committed the attached patch to trunk as r238166. The fmax/fmin patterns were introduced by [1], which is available since GCC 6, so it was also backported to the GCC 6 branch as r238167. -- [1] https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02654.html diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index 3e4740c..f1ad325 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -244,13 +244,17 @@ /* Implemented by 3. smax variants map to fmaxnm, smax_nan variants map to fmax. */ - BUILTIN_VDQIF (BINOP, smax, 3) - BUILTIN_VDQIF (BINOP, smin, 3) + BUILTIN_VDQ_BHSI (BINOP, smax, 3) + BUILTIN_VDQ_BHSI (BINOP, smin, 3) BUILTIN_VDQ_BHSI (BINOP, umax, 3) BUILTIN_VDQ_BHSI (BINOP, umin, 3) BUILTIN_VDQF (BINOP, smax_nan, 3) BUILTIN_VDQF (BINOP, smin_nan, 3) + /* Implemented by 3. */ + BUILTIN_VDQF (BINOP, fmax, 3) + BUILTIN_VDQF (BINOP, fmin, 3) + /* Implemented by aarch64_p. */ BUILTIN_VDQ_BHSI (BINOP, smaxp, 0) BUILTIN_VDQ_BHSI (BINOP, sminp, 0) diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index ed24b59..b0ab1d3 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -17588,19 +17588,19 @@ vpminnms_f32 (float32x2_t a) __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vmaxnm_f32 (float32x2_t __a, float32x2_t __b) { - return __builtin_aarch64_smaxv2sf (__a, __b); + return __builtin_aarch64_fmaxv2sf (__a, __b); } __extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) vmaxnmq_f32 (float32x4_t __a, float32x4_t __b) { - return __builtin_aarch64_smaxv4sf (__a, __b); + return __builtin_aarch64_fmaxv4sf (__a, __b); } __extension__ static __inline float64x2_t __attribute__ ((__always_inline__)) vmaxnmq_f64 (float64x2_t __a, float64x2_t __b) { - return __builtin_aarch64_smaxv2df (__a, __b); + return __builtin_aarch64_fmaxv2df (__a, __b); } /* vmaxv */ @@ -17818,19 +17818,19 @@ vminq_u32 (uint32x4_t __a, uint32x4_t __b) __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vminnm_f32 (float32x2_t __a, float32x2_t __b) { - return __builtin_aarch64_sminv2sf (__a, __b); + return __builtin_aarch64_fminv2sf (__a, __b); } __extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) vminnmq_f32 (float32x4_t __a, float32x4_t __b) { - return __builtin_aarch64_sminv4sf (__a, __b); + return __builtin_aarch64_fminv4sf (__a, __b); } __extension__ static __inline float64x2_t __attribute__ ((__always_inline__)) vminnmq_f64 (float64x2_t __a, float64x2_t __b) { - return __builtin_aarch64_sminv2df (__a, __b); + return __builtin_aarch64_fminv2df (__a, __b); } /* vminv */ diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vminmaxnm_1.c b/gcc/testsuite/gcc.target/aarch64/simd/vminmaxnm_1.c new file mode 100644 index 0000000..8333f03 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/simd/vminmaxnm_1.c @@ -0,0 +1,82 @@ +/* Test the `v[min|max]nm{q}_f*' AArch64 SIMD intrinsic.
*/ + +/* { dg-do run } */ +/* { dg-options "-O2" } */ + +#include "arm_neon.h" + +extern void abort (); + +#define CHECK(T, N, R, E) \ + {\ +int i = 0;\ +for (; i < N; i++)\ + if (* (T *) &R[i] != * (T *) &E[i])\ + abort ();\ + } + +int +main (int argc, char **argv) +{ + float32x2_t f32x2_input1 = vdup_n_f32 (-1.0); + float32x2_t f32x2_input2 = vdup_n_f32 (0.0); + float32x2_t f32x2_exp_minnm = vdup_n_f32 (-1.0); + float32x2_t f32x2_exp_maxnm = vdup_n_f32 (0.0); + float32x2_t f32x2_ret_minnm = vminnm_f32 (f32x2_input1, f32x2_input2); + float32x2_t f32x2_ret_maxnm = vmaxnm_f32 (f32x2_input1, f32x2_input2); + + CHECK (uint32_t, 2, f32x2_ret_minnm, f32x2_exp_minnm); + CHECK (uint32_t, 2, f32x2_ret_maxnm, f32x2_exp_maxnm); + + f32x2_input1 = vdup_n_f32 (__builtin_nanf ("")); + f32x2_input2 = vdup_n_f32 (1.0); + f32x2_exp_minnm = vdup_n_f32 (1.0); + f32x2_exp_maxnm = vdup_n_f32 (1.0); + f32x2_ret_minnm = vminnm_f32 (f32x2_input1, f32x2_input2); + f32x2_ret_maxnm = vmaxnm_f32 (f32x2_input1, f32x2_input2); + + CHECK (uint32_t, 2, f32x2_ret_minnm, f32x2_exp_minnm); + CHECK (uint32_t, 2, f32x2_ret_maxnm, f32x2_exp_maxnm); + + float32x4_t f32x4_input1 = vdupq_n_f32 (-1024.0); + float32x4_t f32x4_input2 = vdupq_n_f32 (77.0); + float32x4_t f32x4_exp_minnm = vdupq_n_f32 (-1024.0); + float32x4_t f32x4_exp_maxnm = vdupq_n_f32 (77.0); + float32x4_t f32x4_ret_minnm = vminnmq_f32 (f32x4_input1, f32x4_input2); + float32x4_t f32x4_ret_ma
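As a compact illustration of the semantics the new testcase checks, here is a hedged usage example (assuming an AArch64 target; the function name is just for illustration):

  #include <arm_neon.h>

  float32x2_t
  maxnm_ignores_nan (void)
  {
    float32x2_t a = vdup_n_f32 (__builtin_nanf (""));
    float32x2_t b = vdup_n_f32 (1.0f);
    /* With the fixed mapping this lowers to FMAXNM (IEEE 754 maxNum):
       a quiet NaN operand yields the numeric operand, so every lane of
       the result is 1.0f, matching what vminmaxnm_1.c expects.  */
    return vmaxnm_f32 (a, b);
  }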
[AArch64][4/14] ARMv8.2-A FP16 three operands vector intrinsics
This patch adds the ARMv8.2-A FP16 three-operand vector intrinsics. The three-operand intrinsics comprise only fma and fms (a short usage sketch follows the patch below). 2016-07-07 Jiong Wang gcc/ * config/aarch64/aarch64-simd-builtins.def: Register new builtins. * config/aarch64/aarch64-simd.md (fma4): Extend to HF modes. (fnma4): Likewise. * config/aarch64/arm_neon.h (vfma_f16): New. (vfmaq_f16): Likewise. (vfms_f16): Likewise. (vfmsq_f16): Likewise. >From dc2121d586b759b864d9653e188a14d1f7296f25 Mon Sep 17 00:00:00 2001 From: Jiong Wang Date: Wed, 8 Jun 2016 10:21:25 +0100 Subject: [PATCH 04/14] [4/14] ARMv8.2 FP16 three operands vector intrinsics --- gcc/config/aarch64/aarch64-simd-builtins.def | 4 +++- gcc/config/aarch64/aarch64-simd.md | 28 ++-- gcc/config/aarch64/arm_neon.h | 26 ++ 3 files changed, 43 insertions(+), 15 deletions(-) diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index fe17298..6ff5063 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -405,7 +405,9 @@ BUILTIN_VALL_F16 (STORE1, st1, 0) /* Implemented by fma4. */ - BUILTIN_VDQF (TERNOP, fma, 4) + BUILTIN_VHSDF (TERNOP, fma, 4) + /* Implemented by fnma4. */ + BUILTIN_VHSDF (TERNOP, fnma, 4) /* Implemented by aarch64_simd_bsl. */ BUILTIN_VDQQH (BSL_P, simd_bsl, 0) diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 0a80adb..576ad3c 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -1526,13 +1526,13 @@ ) (define_insn "fma4" - [(set (match_operand:VDQF 0 "register_operand" "=w") - (fma:VDQF (match_operand:VDQF 1 "register_operand" "w") -(match_operand:VDQF 2 "register_operand" "w") -(match_operand:VDQF 3 "register_operand" "0")))] + [(set (match_operand:VHSDF 0 "register_operand" "=w") + (fma:VHSDF (match_operand:VHSDF 1 "register_operand" "w") + (match_operand:VHSDF 2 "register_operand" "w") + (match_operand:VHSDF 3 "register_operand" "0")))] "TARGET_SIMD" "fmla\\t%0., %1., %2." - [(set_attr "type" "neon_fp_mla_")] + [(set_attr "type" "neon_fp_mla_")] ) (define_insn "*aarch64_fma4_elt" @@ -1599,15 +1599,15 @@ ) (define_insn "fnma4" - [(set (match_operand:VDQF 0 "register_operand" "=w") - (fma:VDQF - (match_operand:VDQF 1 "register_operand" "w") - (neg:VDQF - (match_operand:VDQF 2 "register_operand" "w")) - (match_operand:VDQF 3 "register_operand" "0")))] - "TARGET_SIMD" - "fmls\\t%0., %1., %2." - [(set_attr "type" "neon_fp_mla_")] + [(set (match_operand:VHSDF 0 "register_operand" "=w") + (fma:VHSDF + (match_operand:VHSDF 1 "register_operand" "w") + (neg:VHSDF + (match_operand:VHSDF 2 "register_operand" "w")) + (match_operand:VHSDF 3 "register_operand" "0")))] + "TARGET_SIMD" + "fmls\\t%0., %1., %2." + [(set_attr "type" "neon_fp_mla_")] ) (define_insn "*aarch64_fnma4_elt" diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index e78ff43..ad5b6fa 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -26458,6 +26458,32 @@ vsubq_f16 (float16x8_t __a, float16x8_t __b) return __a - __b; } +/* ARMv8.2-A FP16 three operands vector intrinsics.
*/ + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vfma_f16 (float16x4_t __a, float16x4_t __b, float16x4_t __c) +{ + return __builtin_aarch64_fmav4hf (__b, __c, __a); +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vfmaq_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c) +{ + return __builtin_aarch64_fmav8hf (__b, __c, __a); +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vfms_f16 (float16x4_t __a, float16x4_t __b, float16x4_t __c) +{ + return __builtin_aarch64_fnmav4hf (__b, __c, __a); +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vfmsq_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c) +{ + return __builtin_aarch64_fnmav8hf (__b, __c, __a); +} + #pragma GCC pop_options #undef __aarch64_vget_lane_any -- 2.5.0
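As referenced above, a hedged usage sketch. It assumes a compiler and target with ARMv8.2-A FP16 support (e.g. -march=armv8.2-a+fp16); the function names are illustrative. Note the accumulator-first argument order of the intrinsics, matching the __builtin_aarch64_fmav4hf (__b, __c, __a) call in the patch:

  #include <arm_neon.h>

  float16x4_t
  fma_usage (float16x4_t acc, float16x4_t x, float16x4_t y)
  {
    return vfma_f16 (acc, x, y);   /* acc + x * y, one fused FMLA */
  }

  float16x4_t
  fms_usage (float16x4_t acc, float16x4_t x, float16x4_t y)
  {
    return vfms_f16 (acc, x, y);   /* acc - x * y, one fused FMLS */
  }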
[AArch64][7/14] ARMv8.2-A FP16 one operand scalar intrinsics
This patch adds the ARMv8.2-A FP16 one-operand scalar intrinsics. Scalar intrinsics are kept in arm_fp16.h instead of arm_neon.h. (A usage sketch follows the patch below.) gcc/ 2016-07-07 Jiong Wang * config.gcc (aarch64*-*-*): Install arm_fp16.h. * config/aarch64/aarch64-builtins.c (hf_UP): New. * config/aarch64/aarch64-simd-builtins.def: Register new builtins. * config/aarch64/aarch64-simd.md (aarch64_frsqrte): Extend to HF mode. (aarch64_frecp): Likewise. (aarch64_cm): Likewise. * config/aarch64/aarch64.md (2): Likewise. (l2): Likewise. (fix_trunc2): Likewise. (sqrt2): Likewise. (abs2): Likewise. (hf2): New pattern for HF mode. (hihf2): Likewise. * config/aarch64/arm_neon.h: Include arm_fp16.h. * config/aarch64/iterators.md (GPF_F16): New. (GPI_F16): Likewise. (VHSDF_HSDF): Likewise. (w1): Support HF mode. (w2): Likewise. (v): Likewise. (s): Likewise. (q): Likewise. (Vmtype): Likewise. (V_cmp_result): Likewise. (fcvt_iesize): Likewise. (FCVT_IESIZE): Likewise. * config/aarch64/arm_fp16.h: New file. (vabsh_f16): New. (vceqzh_f16): Likewise. (vcgezh_f16): Likewise. (vcgtzh_f16): Likewise. (vclezh_f16): Likewise. (vcltzh_f16): Likewise. (vcvth_f16_s16): Likewise. (vcvth_f16_s32): Likewise. (vcvth_f16_s64): Likewise. (vcvth_f16_u16): Likewise. (vcvth_f16_u32): Likewise. (vcvth_f16_u64): Likewise. (vcvth_s16_f16): Likewise. (vcvth_s32_f16): Likewise. (vcvth_s64_f16): Likewise. (vcvth_u16_f16): Likewise. (vcvth_u32_f16): Likewise. (vcvth_u64_f16): Likewise. (vcvtah_s16_f16): Likewise. (vcvtah_s32_f16): Likewise. (vcvtah_s64_f16): Likewise. (vcvtah_u16_f16): Likewise. (vcvtah_u32_f16): Likewise. (vcvtah_u64_f16): Likewise. (vcvtmh_s16_f16): Likewise. (vcvtmh_s32_f16): Likewise. (vcvtmh_s64_f16): Likewise. (vcvtmh_u16_f16): Likewise. (vcvtmh_u32_f16): Likewise. (vcvtmh_u64_f16): Likewise. (vcvtnh_s16_f16): Likewise. (vcvtnh_s32_f16): Likewise. (vcvtnh_s64_f16): Likewise. (vcvtnh_u16_f16): Likewise. (vcvtnh_u32_f16): Likewise. (vcvtnh_u64_f16): Likewise. (vcvtph_s16_f16): Likewise. (vcvtph_s32_f16): Likewise. (vcvtph_s64_f16): Likewise. (vcvtph_u16_f16): Likewise. (vcvtph_u32_f16): Likewise. (vcvtph_u64_f16): Likewise. (vnegh_f16): Likewise. (vrecpeh_f16): Likewise. (vrecpxh_f16): Likewise. (vrndh_f16): Likewise. (vrndah_f16): Likewise. (vrndih_f16): Likewise. (vrndmh_f16): Likewise. (vrndnh_f16): Likewise. (vrndph_f16): Likewise. (vrndxh_f16): Likewise. (vrsqrteh_f16): Likewise. (vsqrth_f16): Likewise.
>From f5f32c0867397594ae4e914acc69bc30d9b15ce9 Mon Sep 17 00:00:00 2001 From: Jiong Wang Date: Wed, 8 Jun 2016 10:31:40 +0100 Subject: [PATCH 07/14] [7/14] ARMv8.2 FP16 one operand scalar intrinsics --- gcc/config.gcc | 2 +- gcc/config/aarch64/aarch64-builtins.c| 1 + gcc/config/aarch64/aarch64-simd-builtins.def | 54 +++- gcc/config/aarch64/aarch64-simd.md | 42 ++- gcc/config/aarch64/aarch64.md| 52 ++-- gcc/config/aarch64/arm_fp16.h| 365 +++ gcc/config/aarch64/arm_neon.h| 2 + gcc/config/aarch64/iterators.md | 32 ++- 8 files changed, 495 insertions(+), 55 deletions(-) create mode 100644 gcc/config/aarch64/arm_fp16.h diff --git a/gcc/config.gcc b/gcc/config.gcc index e47535b..13fefee 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -307,7 +307,7 @@ m32c*-*-*) ;; aarch64*-*-*) cpu_type=aarch64 - extra_headers="arm_neon.h arm_acle.h" + extra_headers="arm_fp16.h arm_neon.h arm_acle.h" c_target_objs="aarch64-c.o" cxx_target_objs="aarch64-c.o" extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o" diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index af5fac5..ca91d91 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -62,6 +62,7 @@ #define si_UPSImode #define sf_UPSFmode #define hi_UPHImode +#define hf_UPHFmode #define qi_UPQImode #define UP(X) X##_UP diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index 64c5f86..6a74daa 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -274,6 +274,14 @@ BUILTIN_VHSDF (UNOP, round, 2) BUILTIN_VHSDF_DF (UNOP, frintn, 2
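As referenced above, a hedged sketch of the new arm_fp16.h scalar intrinsics in use (assumes an ARMv8.2-A FP16 target; the particular combination of functions is illustrative):

  #include <arm_fp16.h>

  int32_t
  scalar_usage (float16_t x)
  {
    float16_t a = vabsh_f16 (x);    /* FABS on an H register           */
    float16_t s = vsqrth_f16 (a);   /* FSQRT, half precision           */
    return vcvtah_s32_f16 (s);      /* FCVTAS: to int32, round to
                                       nearest with ties away          */
  }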
[AArch64][5/14] ARMv8.2-A FP16 lane vector intrinsics
This patch adds the ARMv8.2-A FP16 lane vector intrinsics. Lane intrinsics are generally derivatives of the multiply intrinsics, including multiply-accumulate. All the necessary backend support for them is already there except for fmulx; the implementations are largely a combination of existing multiply intrinsics with vdup intrinsics. (A usage sketch follows the patch below.) 2016-07-07 Jiong Wang gcc/ * config/aarch64/aarch64-simd.md (*aarch64_mulx_elt_to_64v2df): Rename to "*aarch64_mulx_elt_from_dup". (*aarch64_mul3_elt): Update schedule type. (*aarch64_mul3_elt_from_dup): Likewise. (*aarch64_fma4_elt_from_dup): Likewise. (*aarch64_fnma4_elt_from_dup): Likewise. * config/aarch64/iterators.md (VMUL): Support half-precision float modes. (f, fp): Support HF modes. * config/aarch64/arm_neon.h (vfma_lane_f16): New. (vfmaq_lane_f16): Likewise. (vfma_laneq_f16): Likewise. (vfmaq_laneq_f16): Likewise. (vfma_n_f16): Likewise. (vfmaq_n_f16): Likewise. (vfms_lane_f16): Likewise. (vfmsq_lane_f16): Likewise. (vfms_laneq_f16): Likewise. (vfmsq_laneq_f16): Likewise. (vfms_n_f16): Likewise. (vfmsq_n_f16): Likewise. (vmul_lane_f16): Likewise. (vmulq_lane_f16): Likewise. (vmul_laneq_f16): Likewise. (vmulq_laneq_f16): Likewise. (vmul_n_f16): Likewise. (vmulq_n_f16): Likewise. (vmulx_lane_f16): Likewise. (vmulxq_lane_f16): Likewise. (vmulx_laneq_f16): Likewise. (vmulxq_laneq_f16): Likewise. >From 25ed161255c4f0155f3c69c1ee4ec0e071ed115c Mon Sep 17 00:00:00 2001 From: Jiong Wang Date: Wed, 8 Jun 2016 10:22:38 +0100 Subject: [PATCH 05/14] [5/14] ARMv8.2 FP16 lane vector intrinsics --- gcc/config/aarch64/aarch64-simd.md | 28 --- gcc/config/aarch64/arm_neon.h | 154 + gcc/config/aarch64/iterators.md | 7 +- 3 files changed, 173 insertions(+), 16 deletions(-) diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 576ad3c..c0600df 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -351,7 +351,7 @@ operands[2] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[2]))); return "mul\\t%0., %3., %1.[%2]"; } - [(set_attr "type" "neon_mul__scalar")] + [(set_attr "type" "neon_mul__scalar")] ) (define_insn "*aarch64_mul3_elt_" @@ -379,7 +379,7 @@ (match_operand:VMUL 2 "register_operand" "w")))] "TARGET_SIMD" "mul\t%0., %2., %1.[0]"; - [(set_attr "type" "neon_mul__scalar")] + [(set_attr "type" "neon_mul__scalar")] ) (define_insn "aarch64_rsqrte" @@ -1579,7 +1579,7 @@ (match_operand:VMUL 3 "register_operand" "0")))] "TARGET_SIMD" "fmla\t%0., %2., %1.[0]" - [(set_attr "type" "neon_mla__scalar")] + [(set_attr "type" "neon_mla__scalar")] ) (define_insn "*aarch64_fma4_elt_to_64v2df" @@ -1657,7 +1657,7 @@ (match_operand:VMUL 3 "register_operand" "0")))] "TARGET_SIMD" "fmls\t%0., %2., %1.[0]" - [(set_attr "type" "neon_mla__scalar")] + [(set_attr "type" "neon_mla__scalar")] ) (define_insn "*aarch64_fnma4_elt_to_64v2df" @@ -3044,20 +3044,18 @@ [(set_attr "type" "neon_fp_mul_")] ) -;; vmulxq_lane_f64 +;; vmulxq_lane -(define_insn "*aarch64_mulx_elt_to_64v2df" - [(set (match_operand:V2DF 0 "register_operand" "=w") - (unspec:V2DF - [(match_operand:V2DF 1 "register_operand" "w") - (vec_duplicate:V2DF - (match_operand:DF 2 "register_operand" "w"))] +(define_insn "*aarch64_mulx_elt_from_dup" + [(set (match_operand:VHSDF 0 "register_operand" "=w") + (unspec:VHSDF + [(match_operand:VHSDF 1 "register_operand" "w") + (vec_duplicate:VHSDF + (match_operand: 2 "register_operand" "w"))] UNSPEC_FMULX))] "TARGET_SIMD" - { -return "fmulx\t%0.2d, %1.2d, %2.d[0]"; - } - [(set_attr "type" "neon_fp_mul_d_scalar_q")] + "fmulx\t%0., %1.,
%2.[0]"; + [(set_attr "type" "neon_mul__scalar")] ) ;; vmulxs_lane_f32, vmulxs_laneq_f32 diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index ad5b6fa..b09a3a7 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -26484,6 +26484,160 @@ vfmsq_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c) return __builtin_aarch64_fnmav8hf (__b, __c, __a); } +/* ARMv8.2-A FP16 lane vector
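As referenced above, a hedged usage sketch of the lane intrinsics (assumes an ARMv8.2-A FP16 target; the lane indices are examples and must be compile-time constants):

  #include <arm_neon.h>

  float16x4_t
  lane_usage (float16x4_t acc, float16x4_t x, float16x4_t v)
  {
    float16x4_t m = vmul_lane_f16 (x, v, 3);   /* x[i] * v[3] per lane  */
    return vfma_lane_f16 (acc, m, v, 0);       /* acc[i] + m[i] * v[0]  */
  }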
[AArch64][2/14] ARMv8.2-A FP16 one operand vector intrinsics
This patch adds the ARMv8.2-A FP16 one-operand vector intrinsics. We introduced new mode iterators to cover the HF modes; qualified patterns that were using the old mode iterators are switched to the new ones. We can't simply extend an old iterator like VDQF to cover the HF modes, because not all patterns using VDQF gain FP16 support, so we introduced new, temporary iterators and apply them only to those patterns which do have FP16 support. (A usage sketch follows the patch below.) gcc/ 2016-07-07 Jiong Wang * config/aarch64/aarch64-builtins.c (TYPES_BINOP_USS): New. * config/aarch64/aarch64-simd-builtins.def: Register new builtins. * config/aarch64/aarch64-simd.md (aarch64_rsqrte): Extend to HF modes. (neg2): Likewise. (abs2): Likewise. (2): Likewise. (l2): Likewise. (2): Likewise. (2): Likewise. (ftrunc2): Likewise. (2): Likewise. (sqrt2): Likewise. (aarch64_frecpe): Likewise. (aarch64_cm): Likewise. * config/aarch64/iterators.md (VHSDF, VHSDF_DF, VHSDF_SDF): New. (VDQF_COND, fcvt_target, FCVT_TARGET, hcon): Extend mode attribute to HF modes. (stype): New. * config/aarch64/arm_neon.h (vdup_n_f16): New. (vdupq_n_f16): Likewise. (vld1_dup_f16): Use vdup_n_f16. (vld1q_dup_f16): Use vdupq_n_f16. (vabs_f16): New. (vabsq_f16): Likewise. (vceqz_f16): Likewise. (vceqzq_f16): Likewise. (vcgez_f16): Likewise. (vcgezq_f16): Likewise. (vcgtz_f16): Likewise. (vcgtzq_f16): Likewise. (vclez_f16): Likewise. (vclezq_f16): Likewise. (vcltz_f16): Likewise. (vcltzq_f16): Likewise. (vcvt_f16_s16): Likewise. (vcvtq_f16_s16): Likewise. (vcvt_f16_u16): Likewise. (vcvtq_f16_u16): Likewise. (vcvt_s16_f16): Likewise. (vcvtq_s16_f16): Likewise. (vcvt_u16_f16): Likewise. (vcvtq_u16_f16): Likewise. (vcvta_s16_f16): Likewise. (vcvtaq_s16_f16): Likewise. (vcvta_u16_f16): Likewise. (vcvtaq_u16_f16): Likewise. (vcvtm_s16_f16): Likewise. (vcvtmq_s16_f16): Likewise. (vcvtm_u16_f16): Likewise. (vcvtmq_u16_f16): Likewise. (vcvtn_s16_f16): Likewise. (vcvtnq_s16_f16): Likewise. (vcvtn_u16_f16): Likewise. (vcvtnq_u16_f16): Likewise. (vcvtp_s16_f16): Likewise. (vcvtpq_s16_f16): Likewise. (vcvtp_u16_f16): Likewise. (vcvtpq_u16_f16): Likewise. (vneg_f16): Likewise. (vnegq_f16): Likewise. (vrecpe_f16): Likewise. (vrecpeq_f16): Likewise. (vrnd_f16): Likewise. (vrndq_f16): Likewise. (vrnda_f16): Likewise. (vrndaq_f16): Likewise. (vrndi_f16): Likewise. (vrndiq_f16): Likewise. (vrndm_f16): Likewise. (vrndmq_f16): Likewise. (vrndn_f16): Likewise. (vrndnq_f16): Likewise. (vrndp_f16): Likewise. (vrndpq_f16): Likewise. (vrndx_f16): Likewise. (vrndxq_f16): Likewise. (vrsqrte_f16): Likewise. (vrsqrteq_f16): Likewise. (vsqrt_f16): Likewise. (vsqrtq_f16): Likewise.
>From 3ab3e91e81aa1aa01894a07083e226779145ec88 Mon Sep 17 00:00:00 2001 From: Jiong Wang Date: Wed, 8 Jun 2016 09:30:16 +0100 Subject: [PATCH 02/14] [2/14] ARMv8.2 FP16 one operand vector intrinsics --- gcc/config/aarch64/aarch64-builtins.c| 4 + gcc/config/aarch64/aarch64-simd-builtins.def | 56 - gcc/config/aarch64/aarch64-simd.md | 78 +++--- gcc/config/aarch64/arm_neon.h| 361 ++- gcc/config/aarch64/iterators.md | 37 ++- 5 files changed, 478 insertions(+), 58 deletions(-) diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index 6b90b2a..af5fac5 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -139,6 +139,10 @@ aarch64_types_binop_ssu_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_none, qualifier_none, qualifier_unsigned }; #define TYPES_BINOP_SSU (aarch64_types_binop_ssu_qualifiers) static enum aarch64_type_qualifiers +aarch64_types_binop_uss_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_unsigned, qualifier_none, qualifier_none }; +#define TYPES_BINOP_USS (aarch64_types_binop_uss_qualifiers) +static enum aarch64_type_qualifiers aarch64_types_binopp_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_poly, qualifier_poly, qualifier_poly }; #define TYPES_BINOPP (aarch64_types_binopp_qualifiers) diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index df0a7d8..3e48046 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -42,7 +42,7 @@ BUILTIN_VDC (COMBINE, combine, 0) BUILTIN_VB (BINOP, pmul, 0) BUILTIN_VALLF (BINO
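As referenced above, a hedged usage sketch of the one-operand vector intrinsics (assumes an ARMv8.2-A FP16 target; the function name is illustrative):

  #include <arm_neon.h>

  uint16x4_t
  unop_usage (float16x4_t x)
  {
    float16x4_t a = vabs_f16 (x);    /* FABS, four half-precision lanes  */
    float16x4_t r = vrndn_f16 (a);   /* FRINTN: nearest, ties to even    */
    return vcgtz_f16 (r);            /* FCMGT #0.0: all-ones lane mask
                                        where r > 0, hence the unsigned
                                        result type (TYPES_BINOP_USS)    */
  }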
[AArch64][3/14] ARMv8.2-A FP16 two operands vector intrinsics
This patch adds the ARMv8.2-A FP16 two-operand vector intrinsics. (A usage sketch follows the patch below.) gcc/ 2016-07-07 Jiong Wang * config/aarch64/aarch64-simd-builtins.def: Register new builtins. * config/aarch64/aarch64-simd.md (aarch64_rsqrts): Extend to HF modes. (fabd3): Likewise. (3): Likewise. (3): Likewise. (aarch64_p): Likewise. (3): Likewise. (3): Likewise. (3): Likewise. (aarch64_faddp): Likewise. (aarch64_fmulx): Likewise. (aarch64_frecps): Likewise. (*aarch64_fac): Rename to aarch64_fac. (add3): Extend to HF modes. (sub3): Likewise. (mul3): Likewise. (div3): Likewise. * config/aarch64/iterators.md (VDQ_HSDI, VSDQ_HSDI): New mode iterator. * config/aarch64/arm_neon.h (vadd_f16): Likewise. (vaddq_f16): Likewise. (vabd_f16): Likewise. (vabdq_f16): Likewise. (vcage_f16): Likewise. (vcageq_f16): Likewise. (vcagt_f16): Likewise. (vcagtq_f16): Likewise. (vcale_f16): Likewise. (vcaleq_f16): Likewise. (vcalt_f16): Likewise. (vcaltq_f16): Likewise. (vceq_f16): Likewise. (vceqq_f16): Likewise. (vcge_f16): Likewise. (vcgeq_f16): Likewise. (vcgt_f16): Likewise. (vcgtq_f16): Likewise. (vcle_f16): Likewise. (vcleq_f16): Likewise. (vclt_f16): Likewise. (vcltq_f16): Likewise. (vcvt_n_f16_s16): Likewise. (vcvtq_n_f16_s16): Likewise. (vcvt_n_f16_u16): Likewise. (vcvtq_n_f16_u16): Likewise. (vcvt_n_s16_f16): Likewise. (vcvtq_n_s16_f16): Likewise. (vcvt_n_u16_f16): Likewise. (vcvtq_n_u16_f16): Likewise. (vdiv_f16): Likewise. (vdivq_f16): Likewise. (vdup_lane_f16): Likewise. (vdup_laneq_f16): Likewise. (vdupq_lane_f16): Likewise. (vdupq_laneq_f16): Likewise. (vdups_lane_f16): Likewise. (vdups_laneq_f16): Likewise. (vmax_f16): Likewise. (vmaxq_f16): Likewise. (vmaxnm_f16): Likewise. (vmaxnmq_f16): Likewise. (vmin_f16): Likewise. (vminq_f16): Likewise. (vminnm_f16): Likewise. (vminnmq_f16): Likewise. (vmul_f16): Likewise. (vmulq_f16): Likewise. (vmulx_f16): Likewise. (vmulxq_f16): Likewise. (vpadd_f16): Likewise. (vpaddq_f16): Likewise. (vpmax_f16): Likewise. (vpmaxq_f16): Likewise. (vpmaxnm_f16): Likewise. (vpmaxnmq_f16): Likewise. (vpmin_f16): Likewise. (vpminq_f16): Likewise. (vpminnm_f16): Likewise. (vpminnmq_f16): Likewise. (vrecps_f16): Likewise. (vrecpsq_f16): Likewise. (vrsqrts_f16): Likewise. (vrsqrtsq_f16): Likewise. (vsub_f16): Likewise. (vsubq_f16): Likewise.
commit 5ed72d355491365b3af5883cdc5a4fdaf5cb545b Author: Jiong Wang Date: Wed Jun 8 10:10:28 2016 +0100 [3/14] ARMv8.2 FP16 two operands vector intrinsics gcc/config/aarch64/aarch64-simd-builtins.def | 40 +-- gcc/config/aarch64/aarch64-simd.md | 152 +-- gcc/config/aarch64/arm_neon.h| 362 +++ gcc/config/aarch64/iterators.md | 10 + 4 files changed, 473 insertions(+), 91 deletions(-) commit 5ed72d355491365b3af5883cdc5a4fdaf5cb545b Author: Jiong Wang Date: Wed Jun 8 10:10:28 2016 +0100 [3/14] ARMv8.2 FP16 two operands vector intrinsics diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index 3e48046..fe17298 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -41,7 +41,7 @@ BUILTIN_VDC (COMBINE, combine, 0) BUILTIN_VB (BINOP, pmul, 0) - BUILTIN_VALLF (BINOP, fmulx, 0) + BUILTIN_VHSDF_SDF (BINOP, fmulx, 0) BUILTIN_VHSDF_DF (UNOP, sqrt, 2) BUILTIN_VD_BHSI (BINOP, addp, 0) VAR1 (UNOP, addp, 0, di) @@ -248,22 +248,22 @@ BUILTIN_VDQ_BHSI (BINOP, smin, 3) BUILTIN_VDQ_BHSI (BINOP, umax, 3) BUILTIN_VDQ_BHSI (BINOP, umin, 3) - BUILTIN_VDQF (BINOP, smax_nan, 3) - BUILTIN_VDQF (BINOP, smin_nan, 3) + BUILTIN_VHSDF (BINOP, smax_nan, 3) + BUILTIN_VHSDF (BINOP, smin_nan, 3) /* Implemented by 3. */ - BUILTIN_VDQF (BINOP, fmax, 3) - BUILTIN_VDQF (BINOP, fmin, 3) + BUILTIN_VHSDF (BINOP, fmax, 3) + BUILTIN_VHSDF (BINOP, fmin, 3) /* Implemented by aarch64_p. */ BUILTIN_VDQ_BHSI (BINOP, smaxp, 0) BUILTIN_VDQ_BHSI (BINOP, sminp, 0) BUILTIN_VDQ_BHSI (BINOP, umaxp, 0) BUILTIN_VDQ_BHSI (BINOP, uminp, 0) - BUILTIN_VDQF (BINOP, smaxp, 0) - BUILTIN_VDQF (BINOP, sminp, 0) - BUILTIN_VDQF (BINOP, smax_nanp, 0) - BUILTIN_VDQF (BINOP, smin_nanp, 0) + BUILTIN_VHSDF (BINOP, smaxp, 0) + BUILTIN_VHSDF (BINOP, sminp, 0) + BUILTIN_VHSDF (BINOP, smax_nanp, 0) + BUILTIN_VHSDF (BINOP, smin_nanp, 0
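As referenced above, a hedged usage sketch combining two of the two-operand intrinsics from this patch (assumes an ARMv8.2-A FP16 target; the function name is illustrative):

  #include <arm_neon.h>

  float16x8_t
  binop_usage (float16x8_t a, float16x8_t b)
  {
    float16x8_t m = vmaxnmq_f16 (a, b);   /* FMAXNM: IEEE maxNum, a quiet
                                             NaN lane yields the numeric
                                             operand                     */
    return vmulxq_f16 (m, b);             /* FMULX multiply-extended     */
  }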