RE: [PATCH] aarch64: Add backend support for expanding __builtin_memset
Hi Richard

> -----Original Message-----
> From: Richard Sandiford
> Sent: 11 November 2020 17:52
> To: Sudakshina Das
> Cc: Wilco Dijkstra; gcc-patches@gcc.gnu.org; Kyrylo Tkachov; Richard Earnshaw
> Subject: Re: [PATCH] aarch64: Add backend support for expanding
> __builtin_memset
>
> Sudakshina Das writes:
> > Apologies for the delay. I have attached another version of the patch.
> > I have disabled the test cases for ILP32. This is only because the
> > function body check fails: there is an additional unsigned extension
> > instruction for the src pointer in every test (uxtw x0, w0). The actual
> > inlining is not different.
>
> Yeah, agree that's the best way of handling the ILP32 difference.
>
> > […]
> > +/* SET_RATIO is similar to CLEAR_RATIO, but for a non-zero constant.  Without
> > +   -mstrict-align, make decisions in "setmem".  Otherwise follow a sensible
> > +   default: when optimizing for size adjust the ratio to account for the
>
> nit: should just be one space after “:”
>
> > […]
> > @@ -21289,6 +21292,134 @@ aarch64_expand_cpymem (rtx *operands)
> >    return true;
> >  }
> >
> > +/* Like aarch64_copy_one_block_and_progress_pointers, except for memset where
> > +   *src is a register we have created with the duplicated value to be set.  */
>
> “*src” -> SRC
> since there's no dereference now
>
> > […]
> > +  /* In case we are optimizing for size or if the core does not
> > +     want to use STP Q regs, lower the max_set_size.  */
> > +  max_set_size = (!speed_p
> > +		  || (aarch64_tune_params.extra_tuning_flags
> > +		      & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS))
> > +		  ? max_set_size/2 : max_set_size;
>
> Formatting nit: should be a space either side of “/”.
>
> > +  while (n > 0)
> > +    {
> > +      /* Find the largest mode in which to do the copy in without
> > +	 over writing.  */
>
> s/in without/without/
>
> > +      opt_scalar_int_mode mode_iter;
> > +      FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_INT)
> > +	if (GET_MODE_BITSIZE (mode_iter.require ()) <= MIN (n, copy_limit))
> > +	  cur_mode = mode_iter.require ();
> > +
> > +      gcc_assert (cur_mode != BLKmode);
> > +
> > +      mode_bits = GET_MODE_BITSIZE (cur_mode).to_constant ();
> > +      aarch64_set_one_block_and_progress_pointer (src, &dst, cur_mode);
> > +
> > +      n -= mode_bits;
> > +
> > +      /* Do certain trailing copies as overlapping if it's going to be
> > +	 cheaper.  i.e. less instructions to do so.  For instance doing a 15
> > +	 byte copy it's more efficient to do two overlapping 8 byte copies than
> > +	 8 + 4 + 2 + 1.  */
> > +      if (n > 0 && n < copy_limit / 2)
> > +	{
> > +	  next_mode = smallest_mode_for_size (n, MODE_INT);
> > +	  int n_bits = GET_MODE_BITSIZE (next_mode).to_constant ();
>
> Sorry for the runaround, but looking at this again, I'm a bit worried that
> we only indirectly test that n_bits is within the length of the original
> set. I guess it is because if n < copy_limit / 2 then n < mode_bits, and so
> n_bits will never exceed mode_bits. I think it might be worth adding an
> assert to make that "clearer" (maybe only to me, probably obvious to
> everyone else):
>
>   gcc_assert (n_bits <= mode_bits);
>
> OK with those changes, thanks.

Thank you! Committed as 54bbde5 with those changes.

Sudi

> Richard
>
> > +	  dst = aarch64_move_pointer (dst, (n - n_bits) / BITS_PER_UNIT);
> > +	  n = n_bits;
> > +	}
> > +    }
> > +
> > +  return true;
> > +}
> > +
> > +
> >  /* Split a DImode store of a CONST_INT SRC to MEM DST as two
> >     SImode stores.  Handle the case when the constant has identical
> >     bottom and top halves.  This is beneficial when the two stores can be
RE: [PATCH] aarch64: Add backend support for expanding __builtin_memset
Hi Richard > -Original Message- > From: Richard Sandiford > Sent: 03 November 2020 11:34 > To: Sudakshina Das > Cc: Wilco Dijkstra ; gcc-patches@gcc.gnu.org; > Kyrylo Tkachov ; Richard Earnshaw > > Subject: Re: [PATCH] aarch64: Add backend support for expanding > __builtin_memset > > Sudakshina Das writes: > >> -Original Message- > >> From: Richard Sandiford > >> Sent: 30 October 2020 19:56 > >> To: Sudakshina Das > >> Cc: Wilco Dijkstra ; gcc-patches@gcc.gnu.org; > >> Kyrylo Tkachov ; Richard Earnshaw > >> > >> Subject: Re: [PATCH] aarch64: Add backend support for expanding > >> __builtin_memset > >> > >> > + base = copy_to_mode_reg (Pmode, XEXP (dst, 0)); dst = > >> > + adjust_automodify_address (dst, VOIDmode, base, 0); > >> > + > >> > + /* Prepare the val using a DUP v0.16B, val. */ if (CONST_INT_P > >> > + (val)) > >> > +{ > >> > + val = force_reg (QImode, val); > >> > +} > >> > + src = gen_reg_rtx (V16QImode); > >> > + emit_insn (gen_aarch64_simd_dupv16qi(src, val)); > >> > >> I think we should use: > >> > >> src = expand_vector_broadcast (V16QImode, val); > >> > >> here (without the CONST_INT_P check), so that for constants we just > >> move a constant directly into a register. > >> > > > > Sorry to bring this up again. 
When I tried expand_vector_broadcast, I > > see the following behaviour: > > for __builtin_memset(p, 1, 24) where the duplicated constant fits > > moviv0.16b, 0x1 > > mov x1, 72340172838076673 > > str x1, [x0, 16] > > str q0, [x0] > > and an ICE for __builtin_memset(p, 1, 32) where I am guessing the > > duplicated constant does not fit > > x.c:7:30: error: unrecognizable insn: > > 7 | { __builtin_memset(p, 1, 32);} > > | ^ > > (insn 8 7 0 2 (parallel [ > > (set (mem:V16QI (reg:DI 94) [0 MEM [(void > > *)p_2(D)]+0 > S16 A8]) > > (const_vector:V16QI [ > > (const_int 1 [0x1]) repeated x16 > > ])) > > (set (mem:V16QI (plus:DI (reg:DI 94) > > (const_int 16 [0x10])) [0 MEM [(void > > *)p_2(D)]+16 > S16 A8]) > > (const_vector:V16QI [ > > (const_int 1 [0x1]) repeated x16 > > ])) > > ]) "x.c":7:3 -1 > > (nil)) > > during RTL pass: vregs > > Ah, yeah, I guess we need to call force_reg on the result. > > >> So yeah, I'm certainly not questioning the speed_p value of 256. > >> I'm sure you and Wilco have picked the best value for that. But -Os > >> stuff can usually be justified on first principles and I wasn't sure > >> where the value of 128 came from. > >> > > > > I had another chat with Wilco about the 128byte value for !speed_p. We > > estimate the average number of instructions upto 128byte would be ~3 > > which is similar to do a memset call. But I did go back and think > > about the tuning argument of > AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS a > > bit more because you are right that based on that the average instructions > can become double. > > I would propose using 256/128 based on speed_p but halving the value > > based on the tune parameter. Obviously the assumption here is that we > > are respecting the core's choice of avoiding stp of q registers (given > > that I do not see other uses of > AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS being changed by -Os). > > Yeah, but I think the lack of an -Os check in the existing code might be a > mistake. 
The point is that STP Q is smaller than two separate STR Qs, so > using > it is a size optimisation even if it's not a speed optimisation. > And like I say, -Os isn't supposed to be striking a balance between size and > speed: it's supposed to be going for size quite aggressively. > > So TBH I have slight preference for keeping the current value and only > checking the tuning flag for speed_p. But I agree that halving the value > would be self-consistent, so if you or Wilco believe strongly that halving is > better, that'd be OK with me too. > > > There might be a debate on how useful > > AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS > > is in the context of memset/memcpy but that needs more analysis and I > > would say should be a separate patch. >
RE: [PATCH] aarch64: Add backend support for expanding __builtin_memset
Hi Richard > -Original Message- > From: Richard Sandiford > Sent: 30 October 2020 19:56 > To: Sudakshina Das > Cc: Wilco Dijkstra ; gcc-patches@gcc.gnu.org; > Kyrylo Tkachov ; Richard Earnshaw > > Subject: Re: [PATCH] aarch64: Add backend support for expanding > __builtin_memset > > > + base = copy_to_mode_reg (Pmode, XEXP (dst, 0)); dst = > > + adjust_automodify_address (dst, VOIDmode, base, 0); > > + > > + /* Prepare the val using a DUP v0.16B, val. */ if (CONST_INT_P > > + (val)) > > +{ > > + val = force_reg (QImode, val); > > +} > > + src = gen_reg_rtx (V16QImode); > > + emit_insn (gen_aarch64_simd_dupv16qi(src, val)); > > I think we should use: > > src = expand_vector_broadcast (V16QImode, val); > > here (without the CONST_INT_P check), so that for constants we just move a > constant directly into a register. > Sorry to bring this up again. When I tried expand_vector_broadcast, I see the following behaviour: for __builtin_memset(p, 1, 24) where the duplicated constant fits moviv0.16b, 0x1 mov x1, 72340172838076673 str x1, [x0, 16] str q0, [x0] and an ICE for __builtin_memset(p, 1, 32) where I am guessing the duplicated constant does not fit x.c:7:30: error: unrecognizable insn: 7 | { __builtin_memset(p, 1, 32);} | ^ (insn 8 7 0 2 (parallel [ (set (mem:V16QI (reg:DI 94) [0 MEM [(void *)p_2(D)]+0 S16 A8]) (const_vector:V16QI [ (const_int 1 [0x1]) repeated x16 ])) (set (mem:V16QI (plus:DI (reg:DI 94) (const_int 16 [0x10])) [0 MEM [(void *)p_2(D)]+16 S16 A8]) (const_vector:V16QI [ (const_int 1 [0x1]) repeated x16 ])) ]) "x.c":7:3 -1 (nil)) during RTL pass: vregs > Sudakshina Das writes: > >> > + > >> > + /* "Cast" the *dst to the correct mode. */ *dst = > >> > + adjust_address (*dst, mode, 0); > >> > + /* Emit the memset. */ > >> > + emit_move_insn (*dst, reg); > >> > + /* Move the pointer forward. */ *dst = > >> > + aarch64_progress_pointer (*dst); } > >> > + > >> > +/* Expand setmem, as if from a __builtin_memset. 
Return true if > >> > + we succeed, otherwise return false. */ > >> > + > >> > +bool > >> > +aarch64_expand_setmem (rtx *operands) { > >> > + int n, mode_bits; > >> > + unsigned HOST_WIDE_INT len; > >> > + rtx dst = operands[0]; > >> > + rtx val = operands[2], src; > >> > + rtx base; > >> > + machine_mode cur_mode = BLKmode, next_mode; > >> > + bool speed_p = !optimize_function_for_size_p (cfun); > >> > + unsigned max_set_size = speed_p ? 256 : 128; > >> > >> What's the basis for the size value? AIUI (and I've probably got > >> this wrong), that effectively means a worst case of 3+2 stores > >> (3 STP Qs and 2 mop-up stores). Then we need one instruction to set > >> up the constant. So if that's right, it looks like the worst-case size is > >> 6 > instructions. > >> > >> AARCH64_CALL_RATIO has a value of 8, but I'm not sure how that > >> relates to the number of instructions in a call. I guess the best > >> case is 4 (3 instructions for the parameters and one for the call itself). > >> > > > > This one I will ask Wilco to chime in. We discussed offline what would > > be the largest case that this builtin should allow and he suggested > > 256-bytes. It would actually generate 9 instructions (its in the memset- > corner-case.c). > > Personally I am not sure what the best decisions are in this case so I > > will rely on Wilco's suggestions. > > Ah, sorry, by “the size value”, I meant the !speed_p value of 128. > I now realise that that was far from clear given that the variable is called > max_set_size :-) > > So yeah, I'm certainly not questioning the speed_p value of 256. > I'm sure you and Wilco have picked the best value for that. But -Os stuff can > usually be justified on first principles and I wasn't sure where the value of > 128 > came from. > I had another chat with Wilco about the 128byte value for !speed_p. We estimate the average number of instructions upto 128byte would be ~3 which is similar to do a memset call. 
But I did go back and think about the tuning argument of AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS a bit more because you are right that based on that the average instructions ca
RE: [PATCH] aarch64: Fix PR97638
Hi Richard

> -----Original Message-----
> From: Richard Sandiford
> Sent: 02 November 2020 10:31
> To: Sudakshina Das
> Cc: gcc-patches@gcc.gnu.org; Kyrylo Tkachov; Richard Earnshaw
> Subject: Re: [PATCH] aarch64: Fix PR97638
>
> Sudakshina Das writes:
> > Hi
> >
> > Currently the testcase in the patch was failing to produce a 'bti c'
> > at the beginning of the function. This was because in
> > aarch64_pac_insn_p, we were wrongly returning at the first check. This
> > patch fixes the return value.
> >
> > Bootstrap and regression tested on aarch64-none-linux-gnu.
> > Is this ok for trunk and gcc 10 backport?
>
> OK for both, thanks.

Thank you! Pushed to trunk. Will wait for a couple of days before the backport.

Thanks
Sudi

> Richard
[PATCH] aarch64: Fix PR97638
Hi

Currently the testcase in the patch was failing to produce a 'bti c' at the
beginning of the function. This was because in aarch64_pac_insn_p, we were
wrongly returning at the first check. This patch fixes the return value.

Bootstrap and regression tested on aarch64-none-linux-gnu.
Is this ok for trunk and gcc 10 backport?

Thanks
Sudi

gcc/ChangeLog:

2020-10-30  Sudakshina Das

	PR target/97638
	* config/aarch64/aarch64-bti-insert.c (aarch64_pac_insn_p): Update
	return value on INSN_P check.

gcc/testsuite/ChangeLog:

2020-10-30  Sudakshina Das

	PR target/97638
	* gcc.target/aarch64/pr97638.c: New test.

### Attachment also inlined for ease of reply ###

diff --git a/gcc/config/aarch64/aarch64-bti-insert.c b/gcc/config/aarch64/aarch64-bti-insert.c
index 57663ee23b490162dbe7ffe2f618066e71cea455..98026695fdbbe2eda84e0befad94b5fe4ce22754 100644
--- a/gcc/config/aarch64/aarch64-bti-insert.c
+++ b/gcc/config/aarch64/aarch64-bti-insert.c
@@ -95,7 +95,7 @@ static bool
 aarch64_pac_insn_p (rtx x)
 {
   if (!INSN_P (x))
-    return x;
+    return false;
 
   subrtx_var_iterator::array_type array;
   FOR_EACH_SUBRTX_VAR (iter, array, PATTERN (x), ALL)
diff --git a/gcc/testsuite/gcc.target/aarch64/pr97638.c b/gcc/testsuite/gcc.target/aarch64/pr97638.c
new file mode 100644
index ..e5869e86c449aef5606541c4c7a51069a1426793
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr97638.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mbranch-protection=bti" } */
+
+char *foo (const char *s, const int c)
+{
+  const char *p = 0;
+  for (;;)
+    {
+      if (*s == c)
+	p = s;
+      if (p != 0 || *s++ == 0)
+	break;
+    }
+  return (char *)p;
+}
+
+/* { dg-final { scan-assembler "hint\t34" } } */
RE: [PATCH] aarch64: Add backend support for expanding __builtin_memset
Hi Richard Thank you for the review. Please find my comments inlined. > -Original Message- > From: Richard Sandiford > Sent: 30 October 2020 15:03 > To: Sudakshina Das > Cc: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ; > Richard Earnshaw > Subject: Re: [PATCH] aarch64: Add backend support for expanding > __builtin_memset > > Sudakshina Das writes: > > diff --git a/gcc/config/aarch64/aarch64.h > > b/gcc/config/aarch64/aarch64.h index > > > 00b5f8438863bb52c348cfafd5d4db478fe248a7..bcb654809c9662db0f51fc1368 > e3 > > 7e42969efd29 100644 > > --- a/gcc/config/aarch64/aarch64.h > > +++ b/gcc/config/aarch64/aarch64.h > > @@ -1024,16 +1024,18 @@ typedef struct #define MOVE_RATIO(speed) \ > >(!STRICT_ALIGNMENT ? 2 : (((speed) ? 15 : AARCH64_CALL_RATIO) / 2)) > > > > -/* For CLEAR_RATIO, when optimizing for size, give a better estimate > > - of the length of a memset call, but use the default otherwise. */ > > +/* Like MOVE_RATIO, without -mstrict-align, make decisions in "setmem" > when > > + we would use more than 3 scalar instructions. > > + Otherwise follow a sensible default: when optimizing for size, give a > better > > + estimate of the length of a memset call, but use the default > > +otherwise. */ > > #define CLEAR_RATIO(speed) \ > > - ((speed) ? 15 : AARCH64_CALL_RATIO) > > + (!STRICT_ALIGNMENT ? 4 : (speed) ? 15 : AARCH64_CALL_RATIO) > > > > /* SET_RATIO is similar to CLEAR_RATIO, but for a non-zero constant, so > when > > optimizing for size adjust the ratio to account for the overhead of > > loading > > the constant. */ > > #define SET_RATIO(speed) \ > > - ((speed) ? 15 : AARCH64_CALL_RATIO - 2) > > + (!STRICT_ALIGNMENT ? 0 : (speed) ? 15 : AARCH64_CALL_RATIO - 2) > > Think it would help to adjust the SET_RATIO comment too, otherwise it's not > obvious why its !STRICT_ALIGNMNENT value is 0. > Will do. > > > > /* Disable auto-increment in move_by_pieces et al. 
Use of auto- > increment is > > rarely a good idea in straight-line code since it adds an extra > > address diff --git a/gcc/config/aarch64/aarch64.c > > b/gcc/config/aarch64/aarch64.c index > > > a8cc545c37044345c3f1d3bf09151c8a9578a032..16ac0c076adcc82627af43473a9 > 3 > > 8e78d3a7ecdc 100644 > > --- a/gcc/config/aarch64/aarch64.c > > +++ b/gcc/config/aarch64/aarch64.c > > @@ -7058,6 +7058,9 @@ aarch64_gen_store_pair (machine_mode mode, > rtx mem1, rtx reg1, rtx mem2, > > case E_V4SImode: > >return gen_vec_store_pairv4siv4si (mem1, reg1, mem2, reg2); > > > > +case E_V16QImode: > > + return gen_vec_store_pairv16qiv16qi (mem1, reg1, mem2, reg2); > > + > > default: > >gcc_unreachable (); > > } > > @@ -21373,6 +21376,134 @@ aarch64_expand_cpymem (rtx *operands) > >return true; > > } > > > > +/* Like aarch64_copy_one_block_and_progress_pointers, except for > memset where > > + *src is a register we have created with the duplicated value to be > > +set. */ > > AIUI, *SRC doesn't accumulate across calls in the way that it does for > aarch64_copy_one_block_and_progress_pointers, so it might be better to > pass an rtx rather than an “rtx *”. > Will do. > > +static void > > +aarch64_set_one_block_and_progress_pointer (rtx *src, rtx *dst, > > + machine_mode mode) > > +{ > > + /* If we are copying 128bits or 256bits, we can do that straight from > > + the SIMD register we prepared. */ > > Nit: excess space before “the”. > Will do. > > + if (known_eq (GET_MODE_BITSIZE (mode), 256)) > > +{ > > + mode = GET_MODE (*src); > > Excess space before “GET_MODE”. > Will do. > > + /* "Cast" the *dst to the correct mode. */ > > + *dst = adjust_address (*dst, mode, 0); > > + /* Emit the memset. */ > > + emit_insn (aarch64_gen_store_pair (mode, *dst, *src, > > +aarch64_progress_pointer (*dst), > *src)); > > + > > + /* Move the pointers forward. 
*/ > > + *dst = aarch64_move_pointer (*dst, 32); > > + return; > > +} > > + else if (known_eq (GET_MODE_BITSIZE (mode), 128)) > > Nit: more usual in GCC not to have an “else” after an early return. > Will do. > > +{ > > + /* "Cast" the *dst to the correct mode. */ > > + *dst = adjust_address (*dst, GET_MODE (*src), 0); > > + /* Emit the memset. */ > > +
[PATCH] aarch64: Add backend support for expanding __builtin_memset
Hi

This patch implements aarch64 backend expansion for __builtin_memset. Most
of the implementation is based on the expansion of __builtin_memcpy. We
change the values of SET_RATIO and MOVE_RATIO for cases where we do not
have to strictly align and where we can benefit from NEON instructions in
the backend.

So for a test case like:

void foo (void* p)
{
  __builtin_memset (p, 1, 7);
}

instead of generating:

	mov	w3, 16843009
	mov	w2, 257
	mov	w1, 1
	str	w3, [x0]
	strh	w2, [x0, 4]
	strb	w1, [x0, 6]
	ret

we now generate:

	movi	v0.16b, 0x1
	str	s0, [x0]
	str	s0, [x0, 3]
	ret

Bootstrapped and regression tested on aarch64-none-linux-gnu. With this
patch I have seen an overall improvement of 0.27% in Spec2017 Int and 0.19%
in Spec2017 FP benchmarks on Neoverse N1.

Is this ok for trunk?

gcc/ChangeLog:

2020-xx-xx  Sudakshina Das

	* config/aarch64/aarch64-protos.h (aarch64_expand_setmem): New
	declaration.
	* config/aarch64/aarch64.c (aarch64_gen_store_pair): Add case for
	E_V16QImode.
	(aarch64_set_one_block_and_progress_pointer): New helper for
	aarch64_expand_setmem.
	(aarch64_expand_setmem): Define the expansion for memset.
	* config/aarch64/aarch64.h (CLEAR_RATIO): Tweak to favor
	aarch64_expand_setmem when allowed and profitable.
	(SET_RATIO): Likewise.
	* config/aarch64/aarch64.md: Define pattern for setmemdi.

gcc/testsuite/ChangeLog:

2020-xx-xx  Sudakshina Das

	* g++.dg/tree-ssa/pr90883.C: Remove xfail for aarch64.
	* gcc.dg/tree-prof/stringop-2.c: Add xfail for aarch64.
	* gcc.target/aarch64/memset-corner-cases.c: New test.
	* gcc.target/aarch64/memset-q-reg.c: New test.
Thanks Sudi ### Attachment also inlined for ease of reply### diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 7a34c841355bad88365381912b163c61c5a35811..2aa3f1fddaafae58f0bfb26e5b33fe6a94e85e06 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -510,6 +510,7 @@ bool aarch64_emit_approx_div (rtx, rtx, rtx); bool aarch64_emit_approx_sqrt (rtx, rtx, bool); void aarch64_expand_call (rtx, rtx, rtx, bool); bool aarch64_expand_cpymem (rtx *); +bool aarch64_expand_setmem (rtx *); bool aarch64_float_const_zero_rtx_p (rtx); bool aarch64_float_const_rtx_p (rtx); bool aarch64_function_arg_regno_p (unsigned); diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 00b5f8438863bb52c348cfafd5d4db478fe248a7..bcb654809c9662db0f51fc1368e37e42969efd29 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -1024,16 +1024,18 @@ typedef struct #define MOVE_RATIO(speed) \ (!STRICT_ALIGNMENT ? 2 : (((speed) ? 15 : AARCH64_CALL_RATIO) / 2)) -/* For CLEAR_RATIO, when optimizing for size, give a better estimate - of the length of a memset call, but use the default otherwise. */ +/* Like MOVE_RATIO, without -mstrict-align, make decisions in "setmem" when + we would use more than 3 scalar instructions. + Otherwise follow a sensible default: when optimizing for size, give a better + estimate of the length of a memset call, but use the default otherwise. */ #define CLEAR_RATIO(speed) \ - ((speed) ? 15 : AARCH64_CALL_RATIO) + (!STRICT_ALIGNMENT ? 4 : (speed) ? 15 : AARCH64_CALL_RATIO) /* SET_RATIO is similar to CLEAR_RATIO, but for a non-zero constant, so when optimizing for size adjust the ratio to account for the overhead of loading the constant. */ #define SET_RATIO(speed) \ - ((speed) ? 15 : AARCH64_CALL_RATIO - 2) + (!STRICT_ALIGNMENT ? 0 : (speed) ? 15 : AARCH64_CALL_RATIO - 2) /* Disable auto-increment in move_by_pieces et al. 
Use of auto-increment is rarely a good idea in straight-line code since it adds an extra address diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index a8cc545c37044345c3f1d3bf09151c8a9578a032..16ac0c076adcc82627af43473a938e78d3a7ecdc 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -7058,6 +7058,9 @@ aarch64_gen_store_pair (machine_mode mode, rtx mem1, rtx reg1, rtx mem2, case E_V4SImode: return gen_vec_store_pairv4siv4si (mem1, reg1, mem2, reg2); +case E_V16QImode: + return gen_vec_store_pairv16qiv16qi (mem1, reg1, mem2, reg2); + default: gcc_unreachable (); } @@ -21373,6 +21376,134 @@ aarch64_expand_cpymem (rtx *operands) return true; } +/* Like aarch64_copy_one_block_and_progress_pointers, except for memset where + *src is a register we have created with the duplicated value to be set. */ +static void +aarch64_set_one_block_and_progress_pointer (rtx *src, rtx *dst, + machine_mode mode) +{ + /* If we are copying 128bits or 256bits,
RE: [PATCH V2] aarch64: Use Q-reg loads/stores in movmem expansion
Hi Richard

Thank you for fixing this. I apologise for the trouble. I ran bootstrap only
on an earlier version of the patch when I should have run it again on the
final one! ☹ I will be more careful in the future.

Thanks
Sudi

> -----Original Message-----
> From: Richard Sandiford
> Sent: 05 August 2020 14:52
> To: Andreas Schwab
> Cc: Sudakshina Das; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH V2] aarch64: Use Q-reg loads/stores in movmem
> expansion
>
> Andreas Schwab writes:
> > This breaks bootstrap.
>
> I've pushed the below to fix this after bootstrapping & regression testing
> on aarch64-linux-gnu.
>
> Richard
RE: [PATCH V2] aarch64: Use Q-reg loads/stores in movmem expansion
Hi Richard > -Original Message- > From: Richard Sandiford > Sent: 31 July 2020 16:14 > To: Sudakshina Das > Cc: gcc-patches@gcc.gnu.org; Kyrylo Tkachov > Subject: Re: [PATCH V2] aarch64: Use Q-reg loads/stores in movmem > expansion > > Sudakshina Das writes: > > Hi > > > > This is my attempt at reviving the old patch > > https://gcc.gnu.org/pipermail/gcc-patches/2019-January/514632.html > > > > I have followed on Kyrill's comment upstream on the link above and I am > using the recommended option iii that he mentioned. > > "1) Adjust the copy_limit to 256 bits after checking > AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS in the tuning. > > 2) Adjust aarch64_copy_one_block_and_progress_pointers to handle 256- > bit moves. by iii: > >iii) Emit explicit V4SI (or any other 128-bit vector mode) pairs > > ldp/stps. This > wouldn't need any adjustments to > > MD patterns, but would make > aarch64_copy_one_block_and_progress_pointers more complex as it would > now have > > two paths, where one handles two adjacent memory addresses in one > calls." > > > > With this patch the following test > > > > #define N 8 > > extern int src[N], dst[N]; > > > > void > > foo (void) > > { > > __builtin_memcpy (dst, src, N * sizeof (int)); } > > > > which was originally giving > > foo: > > adrpx1, src > > add x1, x1, :lo12:src > > ldp x4, x5, [x1] > > adrpx0, dst > > add x0, x0, :lo12:dst > > ldp x2, x3, [x1, 16] > > stp x4, x5, [x0] > > stp x2, x3, [x0, 16] > > ret > > > > > > changes to the following > > foo: > > adrpx1, src > > add x1, x1, :lo12:src > > adrpx0, dst > > add x0, x0, :lo12:dst > > ldp q1, q0, [x1] > > stp q1, q0, [x0] > > ret > > > > This gives about 1.3% improvement on 523.xalancbmk_r in SPEC2017 and > > an overall code size reduction on most > > SPEC2017 Int benchmarks on Neoverse N1 due to more LDP/STP Q pair > registers. > > Sorry for the slow review. LGTM with a very minor nit (sorry)… Thanks. Committed with the change. 
> > > @@ -21150,9 +21177,12 @@ aarch64_expand_cpymem (rtx *operands) > >/* Convert n to bits to make the rest of the code simpler. */ > >n = n * BITS_PER_UNIT; > > > > - /* Maximum amount to copy in one go. The AArch64 back-end has > integer modes > > - larger than TImode, but we should not use them for loads/stores here. > */ > > - const int copy_limit = GET_MODE_BITSIZE (TImode); > > + /* Maximum amount to copy in one go. We allow 256-bit chunks based > on the > > + AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS tuning parameter and > > +TARGET_SIMD. */ > > + const int copy_limit = ((aarch64_tune_params.extra_tuning_flags > > + & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS) > > + || !TARGET_SIMD) > > +? GET_MODE_BITSIZE (TImode) : 256; > > Should only be one space before “256”. > > I guess at some point we should consider handling fixed-length SVE too, but > that's only worth it for -msve-vector-bits=512 and higher. Yes sure I will add this for future backlog. > > Thanks, > Richard
[PATCH V2] aarch64: Use Q-reg loads/stores in movmem expansion
Hi

This is my attempt at reviving the old patch
https://gcc.gnu.org/pipermail/gcc-patches/2019-January/514632.html

I have followed on Kyrill's comment upstream on the link above and I am
using the recommended option iii that he mentioned.

"1) Adjust the copy_limit to 256 bits after checking
AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS in the tuning.
2) Adjust aarch64_copy_one_block_and_progress_pointers to handle 256-bit
moves. by iii:
   iii) Emit explicit V4SI (or any other 128-bit vector mode) pairs
ldp/stps. This wouldn't need any adjustments to MD patterns, but would make
aarch64_copy_one_block_and_progress_pointers more complex as it would now
have two paths, where one handles two adjacent memory addresses in one
calls."

With this patch the following test

#define N 8
extern int src[N], dst[N];

void
foo (void)
{
  __builtin_memcpy (dst, src, N * sizeof (int));
}

which was originally giving

foo:
	adrp	x1, src
	add	x1, x1, :lo12:src
	ldp	x4, x5, [x1]
	adrp	x0, dst
	add	x0, x0, :lo12:dst
	ldp	x2, x3, [x1, 16]
	stp	x4, x5, [x0]
	stp	x2, x3, [x0, 16]
	ret

changes to the following

foo:
	adrp	x1, src
	add	x1, x1, :lo12:src
	adrp	x0, dst
	add	x0, x0, :lo12:dst
	ldp	q1, q0, [x1]
	stp	q1, q0, [x0]
	ret

This gives about 1.3% improvement on 523.xalancbmk_r in SPEC2017 and an
overall code size reduction on most SPEC2017 Int benchmarks on Neoverse N1
due to more LDP/STP Q pair registers.

Bootstrapped and regression tested on aarch64-none-linux-gnu.
Is this ok for trunk?

Thanks
Sudi

gcc/ChangeLog:

2020-07-23  Sudakshina Das
	    Kyrylo Tkachov

	* config/aarch64/aarch64.c (aarch64_gen_store_pair): Add case
	for E_V4SImode.
	(aarch64_gen_load_pair): Likewise.
	(aarch64_copy_one_block_and_progress_pointers): Handle 256 bit copy.
	(aarch64_expand_cpymem): Expand copy_limit to 256bits where
	appropriate.

gcc/testsuite/ChangeLog:

2020-07-23  Sudakshina Das
	    Kyrylo Tkachov

	* gcc.target/aarch64/cpymem-q-reg_1.c: New test.
	* gcc.target/aarch64/large_struct_copy_2.c: Update for ldp q regs.
** Attachment inlined ** diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 3fe1feaa80ccb0a287ee1c7ea1056e8f0a830532..a38ff39c4d5d53f056bbba3114ebaf8f0414c037 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -6920,6 +6920,9 @@ aarch64_gen_store_pair (machine_mode mode, rtx mem1, rtx reg1, rtx mem2, case E_TFmode: return gen_store_pair_dw_tftf (mem1, reg1, mem2, reg2); +case E_V4SImode: + return gen_vec_store_pairv4siv4si (mem1, reg1, mem2, reg2); + default: gcc_unreachable (); } @@ -6943,6 +6946,9 @@ aarch64_gen_load_pair (machine_mode mode, rtx reg1, rtx mem1, rtx reg2, case E_TFmode: return gen_load_pair_dw_tftf (reg1, mem1, reg2, mem2); +case E_V4SImode: + return gen_load_pairv4siv4si (reg1, mem1, reg2, mem2); + default: gcc_unreachable (); } @@ -21097,6 +21103,27 @@ static void aarch64_copy_one_block_and_progress_pointers (rtx *src, rtx *dst, machine_mode mode) { + /* Handle 256-bit memcpy separately. We do this by making 2 adjacent memory + address copies using V4SImode so that we can use Q registers. */ + if (known_eq (GET_MODE_BITSIZE (mode), 256)) +{ + mode = V4SImode; + rtx reg1 = gen_reg_rtx (mode); + rtx reg2 = gen_reg_rtx (mode); + /* "Cast" the pointers to the correct mode. */ + *src = adjust_address (*src, mode, 0); + *dst = adjust_address (*dst, mode, 0); + /* Emit the memcpy. */ + emit_insn (aarch64_gen_load_pair (mode, reg1, *src, reg2, + aarch64_progress_pointer (*src))); + emit_insn (aarch64_gen_store_pair (mode, *dst, reg1, +aarch64_progress_pointer (*dst), reg2)); + /* Move the pointers forward. */ + *src = aarch64_move_pointer (*src, 32); + *dst = aarch64_move_pointer (*dst, 32); + return; +} + rtx reg = gen_reg_rtx (mode); /* "Cast" the pointers to the correct mode. */ @@ -21150,9 +21177,12 @@ aarch64_expand_cpymem (rtx *operands) /* Convert n to bits to make the rest of the code simpler. */ n = n * BITS_PER_UNIT; - /* Maximum amount to copy in one go. 
The AArch64 back-end has integer modes - larger than TImode, but we should not use them for loads/stores here. */ - const int copy_limit = GET_MODE_BITSIZE (TImode); + /* Maximum amount to copy in one go. We allow 256-bit chunks based on the + AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS tuning parameter and TARGET_SIM
RE: [PATCH] Fix handling of OPT_mgeneral_regs_only in attribute.
Hi Martin

> -----Original Message-----
> From: Martin Liška
> Sent: 21 May 2020 16:01
> To: gcc-patches@gcc.gnu.org
> Cc: Sudakshina Das
> Subject: [PATCH] Fix handling of OPT_mgeneral_regs_only in attribute.
>
> Hi.
>
> Similarly to:
>
>     case OPT_mstrict_align:
>       if (val)
>         opts->x_target_flags |= MASK_STRICT_ALIGN;
>       else
>         opts->x_target_flags &= ~MASK_STRICT_ALIGN;
>       return true;
>
> the MASK_GENERAL_REGS_ONLY mask should be handled the same way.

My old patch added the -mno-* version of the option and hence needed the
change. Without a -mno- version of -mgeneral-regs-only, I would imagine
"val" to only ever have 1 as a value. Am I missing something here?

Sudi

> @Sudakshina: The 'opts->x_target_flags |= MASK_STRICT_ALIGN' change is
> not backported to all active branches. Can you please do it?
>
> Ready to be installed?
>
> gcc/ChangeLog:
>
> 2020-05-21  Martin Liska
>
> 	* common/config/aarch64/aarch64-common.c (aarch64_handle_option):
> 	Properly mask MASK_GENERAL_REGS_ONLY based on val.
> ---
>  gcc/common/config/aarch64/aarch64-common.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
[Committed, testsuite] Fix PR92870
Hi With my recent commit, I added a test that is not passing on all targets. My change was valid for targets that have a vector/scalar shift/rotate optab (an optab that supports a vector shifted by a scalar). Since it does not seem to be easy to find out which targets would support it, I am limiting the test to the targets that I know pass. Committed as obvious r279310. gcc/testsuite/ChangeLog 2019-12-12 Sudakshina Das PR testsuite/92870 * gcc.dg/vect/vect-shift-5.c: Add target to scan-tree-dump. diff --git a/gcc/testsuite/gcc.dg/vect/vect-shift-5.c b/gcc/testsuite/gcc.dg/vect/vect-shift-5.c index c1fd4f2..68e517e 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-shift-5.c +++ b/gcc/testsuite/gcc.dg/vect/vect-shift-5.c @@ -16,4 +16,7 @@ int foo (uint32_t arr[4][4]) return (((uint16_t)sum) + ((uint32_t)sum >> 16)) >> 1; } -/* { dg-final { scan-tree-dump {vectorizable_shift ===[\n\r][^\n]*prologue_cost = 0} "vect" } } */ +/* For a target that has a vector/scalar shift/rotate optab, check + that we are not adding the cost of creating a vector from the scalar + in the prologue. */ +/* { dg-final { scan-tree-dump {vectorizable_shift ===[\n\r][^\n]*prologue_cost = 0} "vect" { target { aarch64*-*-* x86_64-*-* } } } } */
Re: Fwd: [PATCH, GCC, Vect] Fix costing for vector shifts
Hi Christophe On 10/12/2019 09:01, Christophe Lyon wrote: > Hi, > > On Mon, 9 Dec 2019 at 11:23, Sudakshina Das wrote: >> >> Hi Jeff >> >> On 07/12/2019 17:44, Jeff Law wrote: >>> On Fri, 2019-12-06 at 14:05 +, Sudakshina Das wrote: >>>> Hi >>>> >>>> While looking at the vectorization for following example, we >>>> realized >>>> that even though vectorizable_shift function was distinguishing >>>> vector >>>> shifted by vector from vector shifted by scalar, while modeling the >>>> cost >>>> it would always add the cost of building a vector constant despite >>>> not >>>> needing it for vector shifted by scalar. >>>> >>>> This patch fixes this by using scalar_shift_arg to determine whether >>>> we >>>> need to build a vector for the second operand or not. This reduces >>>> prologue cost as shown in the test. >>>> >>>> Build and regression tests pass on aarch64-none-elf and >>>> x86_64-pc-linux-gnu-gcc. This gives a 3.42% boost to 525.x264_r in >>>> Spec2017 for AArch64. >>>> > > Looks like you didn't check on arm, where I can see that the new testcase > fails: > FAIL: gcc.dg/vect/vect-shift-5.c -flto -ffat-lto-objects > scan-tree-dump vect "vectorizable_shift > ===[\\n\\r][^\\n]*prologue_cost = 0" > FAIL: gcc.dg/vect/vect-shift-5.c scan-tree-dump vect > "vectorizable_shift ===[\\n\\r][^\\n]*prologue_cost = 0" > > Seen on arm-none-linux-gnueabihf > --with-mode arm > --with-cpu cortex-a9 > --with-fpu neon-fp16 > > Christophe Thanks for reporting this. There is already a bugzilla report PR92870 for powerpc that I am looking at. Apologies I couldn't find your email address there to add you to the cc list. Thanks Sudi > >>>> gcc/ChangeLog: >>>> >>>> 2019-xx-xx Sudakshina Das >>>> Richard Sandiford >>>> >>>> * tree-vect-stmt.c (vectorizable_shift): Condition ndts for >>>> vect_model_simple_cost call on scalar_shift_arg. >>>> >>>> gcc/testsuite/ChangeLog: >>>> >>>> 2019-xx-xx Sudakshina Das >>>> >>>> * gcc.dg/vect/vect-shift-5.c: New test. 
>>> It's a bit borderline, but it's really just twiddling a cost, so OK. >> >> Thanks :) Committed as r279114. >> >> Sudi >> >>> >>> jeff >>> >>
Re: Fwd: [PATCH, GCC, Vect] Fix costing for vector shifts
Hi Jeff On 07/12/2019 17:44, Jeff Law wrote: > On Fri, 2019-12-06 at 14:05 +0000, Sudakshina Das wrote: >> Hi >> >> While looking at the vectorization for following example, we >> realized >> that even though vectorizable_shift function was distinguishing >> vector >> shifted by vector from vector shifted by scalar, while modeling the >> cost >> it would always add the cost of building a vector constant despite >> not >> needing it for vector shifted by scalar. >> >> This patch fixes this by using scalar_shift_arg to determine whether >> we >> need to build a vector for the second operand or not. This reduces >> prologue cost as shown in the test. >> >> Build and regression tests pass on aarch64-none-elf and >> x86_64-pc-linux-gnu-gcc. This gives a 3.42% boost to 525.x264_r in >> Spec2017 for AArch64. >> >> gcc/ChangeLog: >> >> 2019-xx-xx Sudakshina Das >> Richard Sandiford >> >> * tree-vect-stmt.c (vectorizable_shift): Condition ndts for >> vect_model_simple_cost call on scalar_shift_arg. >> >> gcc/testsuite/ChangeLog: >> >> 2019-xx-xx Sudakshina Das >> >> * gcc.dg/vect/vect-shift-5.c: New test. > It's a bit borderline, but it's really just twiddling a cost, so OK. Thanks :) Committed as r279114. Sudi > > jeff >
Fwd: [PATCH, GCC, Vect] Fix costing for vector shifts
Hi While looking at the vectorization for the following example, we realized that even though the vectorizable_shift function distinguishes a vector shifted by a vector from a vector shifted by a scalar, when modeling the cost it would always add the cost of building a vector constant, despite not needing it for the vector-shifted-by-scalar case. This patch fixes this by using scalar_shift_arg to determine whether we need to build a vector for the second operand or not. This reduces the prologue cost, as shown in the test. Build and regression tests pass on aarch64-none-elf and x86_64-pc-linux-gnu-gcc. This gives a 3.42% boost to 525.x264_r in Spec2017 for AArch64. gcc/ChangeLog: 2019-xx-xx Sudakshina Das Richard Sandiford * tree-vect-stmt.c (vectorizable_shift): Condition ndts for vect_model_simple_cost call on scalar_shift_arg. gcc/testsuite/ChangeLog: 2019-xx-xx Sudakshina Das * gcc.dg/vect/vect-shift-5.c: New test. Is this ok for trunk? Thanks Sudi diff --git a/gcc/testsuite/gcc.dg/vect/vect-shift-5.c b/gcc/testsuite/gcc.dg/vect/vect-shift-5.c new file mode 100644 index 000..c1fd4f2 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-shift-5.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_shift } */ +/* { dg-require-effective-target vect_int } */ + +typedef unsigned int uint32_t; +typedef short unsigned int uint16_t; + +int foo (uint32_t arr[4][4]) +{ + int sum = 0; + for(int i = 0; i < 4; i++) +{ + sum += ((arr[0][i] >> 10) * 20) + ((arr[1][i] >> 11) & 53) + + ((arr[2][i] >> 12) * 7) + ((arr[3][i] >> 13) ^ 43); +} +return (((uint16_t)sum) + ((uint32_t)sum >> 16)) >> 1; +} + +/* { dg-final { scan-tree-dump {vectorizable_shift ===[\n\r][^\n]*prologue_cost = 0} "vect" } } */ diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c index 2cb6b15..396ff15 100644 --- a/gcc/tree-vect-stmts.c +++ b/gcc/tree-vect-stmts.c @@ -5764,7 +5764,8 @@ vectorizable_shift (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi, { STMT_VINFO_TYPE (stmt_info) = 
shift_vec_info_type; DUMP_VECT_SCOPE ("vectorizable_shift"); - vect_model_simple_cost (stmt_info, ncopies, dt, ndts, slp_node, cost_vec); + vect_model_simple_cost (stmt_info, ncopies, dt, + scalar_shift_arg ? 1 : ndts, slp_node, cost_vec); return true; }
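To make the distinction the patch relies on concrete, here are the two loop shapes (illustrative examples, not from the patch): only the second needs a vector built for the shift amounts, so only it should pay that prologue cost.

```c
/* Shift every element by the same scalar amount: the vectorizer can
   use the vector-by-scalar shift optab and needs no vector operand
   built in the prologue.  */
void
shift_by_scalar (unsigned int *a, int n, int s)
{
  for (int i = 0; i < n; i++)
    a[i] >>= s;
}

/* Shift each element by a per-element amount: the second operand must
   itself be a vector, so building it is a real prologue cost.  */
void
shift_by_vector (unsigned int *a, const int *s, int n)
{
  for (int i = 0; i < n; i++)
    a[i] >>= s[i];
}
```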
Re: [Patch, GCC] Fix a condition post r278611
Hi Richard On 05/12/2019 17:04, Richard Sandiford wrote: > Sudakshina Das writes: >> Hi >> >> While looking at vect_model_reduction_cost function, it seems Richard's >> change in a recent commit r278611 missed an update to the following if >> condition. Since the check for EXTRACT_LAST_REDUCTION is now split >> above, the same check in the if condition will never be true. >> >> gcc/ChangeLog >> >> 2019-xx-xx Sudakshina Das >> >> * tree-vect-loop.c (vect_model_reduction_cost): Remove >> reduction_type check from if condition. >> >> Is this ok for trunk? > > OK, thanks. Thanks. Committed as r279012. Sudi > > Richard > >> >> Thanks >> Sudi >> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c >> index ca8c818..7469204 100644 >> --- a/gcc/tree-vect-loop.c >> +++ b/gcc/tree-vect-loop.c >> @@ -3933,7 +3933,7 @@ vect_model_reduction_cost (stmt_vec_info stmt_info, >> internal_fn reduc_fn, >> /* No extra instructions needed in the prologue. */ >> prologue_cost = 0; >> >> - if (reduction_type == EXTRACT_LAST_REDUCTION || reduc_fn != IFN_LAST) >> + if (reduc_fn != IFN_LAST) >> /* Count one reduction-like operation per vector. */ >> inside_cost = record_stmt_cost (cost_vec, ncopies, vec_to_scalar, >> stmt_info, 0, vect_body);
[Patch, GCC] Fix a condition post r278611
Hi While looking at vect_model_reduction_cost function, it seems Richard's change in a recent commit r278611 missed an update to the following if condition. Since the check for EXTRACT_LAST_REDUCTION is now split above, the same check in the if condition will never be true. gcc/ChangeLog 2019-xx-xx Sudakshina Das * tree-vect-loop.c (vect_model_reduction_cost): Remove reduction_type check from if condition. Is this ok for trunk? Thanks Sudi diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index ca8c818..7469204 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -3933,7 +3933,7 @@ vect_model_reduction_cost (stmt_vec_info stmt_info, internal_fn reduc_fn, /* No extra instructions needed in the prologue. */ prologue_cost = 0; - if (reduction_type == EXTRACT_LAST_REDUCTION || reduc_fn != IFN_LAST) + if (reduc_fn != IFN_LAST) /* Count one reduction-like operation per vector. */ inside_cost = record_stmt_cost (cost_vec, ncopies, vec_to_scalar, stmt_info, 0, vect_body);
[Committed][Arm][testsuite] Fix failure for arm-fp16-ops-*.C
Hi Since r275022, which deprecates some uses of volatile, we have seen the following failures on arm-none-eabi and arm-none-linux-gnueabihf FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-1.C -std=gnu++2a (test for excess errors) FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-2.C -std=gnu++2a (test for excess errors) FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-3.C -std=gnu++2a (test for excess errors) FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-4.C -std=gnu++2a (test for excess errors) FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-5.C -std=gnu++2a (test for excess errors) FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-6.C -std=gnu++2a (test for excess errors) FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-7.C -std=gnu++2a (test for excess errors) FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-8.C -std=gnu++2a (test for excess errors) These catch the deprecated uses of volatile variables declared in arm-fp16-ops.h. This patch removes the volatile declarations from the header. Since none of the tests are run with high optimization levels, this change should not prevent the real function of the tests. Tests with RUNTESTFLAGS="dg.exp=arm-fp16-ops-*.C" now pass with the patch on arm-none-eabi. Committed as obvious r278905 gcc/testsuite/ChangeLog: 2019-xx-xx Sudakshina Das * g++.dg/ext/arm-fp16/arm-fp16-ops.h: Remove volatile keyword. 
Thanks Sudi diff --git a/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops.h b/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops.h index 320494e..a92e081 100644 --- a/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops.h +++ b/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops.h @@ -7,16 +7,16 @@ #define TEST(e) assert (e) #define TESTNOT(e) assert (!(e)) -volatile __fp16 h0 = 0.0; -volatile __fp16 h1 = 1.0; -volatile __fp16 h42 = 42.0; -volatile __fp16 hm2 = -2.0; -volatile __fp16 temp; - -volatile float f0 = 0.0; -volatile float f1 = 1.0; -volatile float f42 = 42.0; -volatile float fm2 = -2.0; +__fp16 h0 = 0.0; +__fp16 h1 = 1.0; +__fp16 h42 = 42.0; +__fp16 hm2 = -2.0; +__fp16 temp; + +float f0 = 0.0; +float f1 = 1.0; +float f42 = 42.0; +float fm2 = -2.0; int main (void) {
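As an illustration of the class of code that r275022 started warning about (this example is mine, not from the testsuite header): read-modify-write operations on a volatile-qualified lvalue are deprecated in C++20, and under -std=gnu++2a the resulting warning counts as an excess error.

```cpp
#include <cassert>

// Illustrative of what r275022 deprecates: compound assignment on a
// volatile-qualified operand draws a deprecation warning under
// -std=gnu++2a.  Dropping the volatile qualifier, as this patch does
// for arm-fp16-ops.h, avoids the warning without changing what the
// tests compute at low optimization levels.
volatile int h = 1;

int
bump ()
{
  h += 2;   // deprecated in C++20 while h is volatile
  return h;
}
```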
Re: [PATCH, GCC, AArch64] Fix PR88398 for AArch64
Hi Richard I apologise, I should have given more explanation in my cover letter. Although the bug was filed for vectorization, the conversation on it talked about loops with two exits not being supported in the vectorizer, not being possible without LTO, and peeling causing more harm than benefit. There was also no clear consensus in the discussion about the best way to do unrolling. So I looked at Wilco's suggestion of unrolling here https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398#c8 Although unroll_stupid does not unroll it exactly as he shows, it gets closer than unroll_runtime_iterations. So I ran an experiment to see if unrolling the loop with unroll_stupid gets any benefit. The code size benefit was easy to see with the small example, but it also gave a performance benefit on Spec2017. The benefit comes because unroll_runtime_iterations adds a switch case at the beginning for the iteration check. This is less efficient because it creates too many branches close together, especially for a loop which has more than one exit. beq .L70 cmp x12, 1 beq .L55 cmp x12, 2 beq .L57 cmp x12, 3 beq .L59 cmp x12, 4 beq .L61 cmp x12, 5 beq .L63 cmp x12, 6 bne .L72 Finally, I agree that unroll_stupid by default did not touch loops with multiple exits, but that was marked as a "TODO" to change later, so I assumed that check was not a hard requirement for the unrolling algorithm. /* Do not unroll loops with branches inside -- it increases number of mispredicts. TODO: this heuristic needs tunning; call inside the loop body is also relatively good reason to not unroll. */ unroll_stupid is also not used unless there is -funroll-all-loops or a loop pragma indicating that maybe this could be potentially harmful on certain targets. Since my experiments on AArch64 showed otherwise, I thought the easiest starting point would be to do this in a target hook and only for a specific case (multiple exits). 
Thanks Sudi From: Richard Biener Sent: Friday, November 15, 2019 9:32 AM To: Sudakshina Das Cc: gcc-patches@gcc.gnu.org ; Kyrill Tkachov ; James Greenhalgh ; Richard Earnshaw ; bin.ch...@linux.alibaba.com ; o...@ucw.cz Subject: Re: [PATCH, GCC, AArch64] Fix PR88398 for AArch64 On Thu, Nov 14, 2019 at 4:41 PM Sudakshina Das wrote: > > Hi > > This patch is trying to fix PR88398 for AArch64. As discussed in the PR, > loop unrolling is probably what we can do here. As an easy fix, the > existing unroll_stupid is unrolling the given example better than the > unroll_runtime_iterations since the the loop contains a break inside it. Hm, the bug reference doesn't help me at all in reviewing this - the bug is about vectorization. So why is unroll_stupid better than unroll_runtime_iterations for a loop with a break (or as your implementation, with multiple exists)? I don't like this target hook, it seems like general heuristics can be improved here, but it seems unroll-stupid doesn't consider loops with multiple exits at all? Richard. > So all I have done here is: > 1) Add a target hook so that this is AArch64 specific. > 2) We are not unrolling the loops that decide_unroll_runtime_iterations > would reject. > 3) Out of the ones that decide_unroll_runtime_iterations would accept, > check if the loop has more than 1 exit (this is done in the new target > hook) and if it does, try to unroll using unroll_stupid. > > Regression tested on AArch64 and added the test from the PR. This gives > an overall code size reduction of 2.35% and performance gain of 0.498% > on Spec2017 Intrate. > > Is this ok for trunk? > > Thanks > Sudi > > gcc/ChangeLog: > > 2019-xx-xx Sudakshina Das > > PR88398 > * cfgloop.h: Include target.h. > (lpt_dec): Move to... > * target.h (lpt_dec): ... Here. > * target.def: Define TARGET_LOOP_DECISION_ADJUST. > * loop-unroll.c (decide_unroll_runtime_iterations): Use new target > hook. > (decide_unroll_stupid): Likewise. 
> * config/aarch64/aarch64.c (aarch64_loop_decision_adjust): New > function. > (TARGET_LOOP_DECISION_ADJUST): Define for AArch64. > * doc/tm.texi: Regenerated. > * doc/tm.texi.in: Document TARGET_LOOP_DECISION_ADJUST. > > gcc/testsuite/ChangeLog: > > 2019-xx-xx Sudakshina Das > > PR88398 > * gcc.target/aarch64/pr88398.c: New test.
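For readers without the PR open, a loop of the shape being discussed looks like this (an illustrative example, not the actual PR88398 testcase). It has two exits: the counted exit and the early return. unroll_runtime_iterations fronts such a loop with the compare-and-branch ladder quoted in the reply above, while unroll_stupid simply duplicates the body and keeps each exit test in place.

```c
/* Illustrative only: a search loop with two exits -- the counted
   exit (i == len) and the early exit taken when the key is found.  */
int
find_first (const unsigned char *buf, int len, unsigned char key)
{
  for (int i = 0; i < len; i++)
    if (buf[i] == key)   /* second exit out of the loop */
      return i;
  return -1;
}
```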
[PATCH, GCC, AArch64] Fix PR88398 for AArch64
Hi This patch is trying to fix PR88398 for AArch64. As discussed in the PR, loop unrolling is probably what we can do here. As an easy fix, the existing unroll_stupid unrolls the given example better than unroll_runtime_iterations, since the loop contains a break inside it. So all I have done here is: 1) Add a target hook so that this is AArch64 specific. 2) We are not unrolling the loops that decide_unroll_runtime_iterations would reject. 3) Out of the ones that decide_unroll_runtime_iterations would accept, check if the loop has more than 1 exit (this is done in the new target hook) and if it does, try to unroll using unroll_stupid. Regression tested on AArch64 and added the test from the PR. This gives an overall code size reduction of 2.35% and performance gain of 0.498% on Spec2017 Intrate. Is this ok for trunk? Thanks Sudi gcc/ChangeLog: 2019-xx-xx Sudakshina Das PR88398 * cfgloop.h: Include target.h. (lpt_dec): Move to... * target.h (lpt_dec): ... Here. * target.def: Define TARGET_LOOP_DECISION_ADJUST. * loop-unroll.c (decide_unroll_runtime_iterations): Use new target hook. (decide_unroll_stupid): Likewise. * config/aarch64/aarch64.c (aarch64_loop_decision_adjust): New function. (TARGET_LOOP_DECISION_ADJUST): Define for AArch64. * doc/tm.texi: Regenerated. * doc/tm.texi.in: Document TARGET_LOOP_DECISION_ADJUST. gcc/testsuite/ChangeLog: 2019-xx-xx Sudakshina Das PR88398 * gcc.target/aarch64/pr88398.c: New test. diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h index 0b0154ffd7bf031a005de993b101d9db6dd98c43..985c74e3b60728fc8c9d34b69634488cae3451cb 100644 --- a/gcc/cfgloop.h +++ b/gcc/cfgloop.h @@ -21,15 +21,7 @@ along with GCC; see the file COPYING3. If not see #define GCC_CFGLOOP_H #include "cfgloopmanip.h" - -/* Structure to hold decision about unrolling/peeling. 
*/ -enum lpt_dec -{ - LPT_NONE, - LPT_UNROLL_CONSTANT, - LPT_UNROLL_RUNTIME, - LPT_UNROLL_STUPID -}; +#include "target.h" struct GTY (()) lpt_decision { enum lpt_dec decision; diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 599d07a729e7438080f8b5240ee95037a49fb983..f31ac41d66257c01ead8d5f5b9b22379ecb5d276 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -21093,6 +21093,39 @@ aarch64_sched_can_speculate_insn (rtx_insn *insn) } } +/* Implement TARGET_LOOP_DECISION_ADJUST. CONSIDER is the loop decision + currently being checked for loop LOOP. This returns a decision which could + either be LPT_UNROLL_STUPID or the current value in LOOP. */ +static enum lpt_dec +aarch64_loop_decision_adjust (enum lpt_dec consider, class loop *loop) +{ + switch (consider) +{ +case LPT_UNROLL_CONSTANT: + return loop->lpt_decision.decision; + +case LPT_UNROLL_RUNTIME: +/* Fall through. */ +case LPT_UNROLL_STUPID: + { + vec edges = get_loop_exit_edges (loop); + if (edges.length () > 1) + { + if (dump_file) + fprintf (dump_file, ";; Need change in loop decision\n"); + consider = LPT_UNROLL_STUPID; + return consider; + } + return loop->lpt_decision.decision; + } + +case LPT_NONE: +/* Fall through. */ +default: + gcc_unreachable (); +} +} + /* Implement TARGET_COMPUTE_PRESSURE_CLASSES. 
*/ static int @@ -21839,6 +21872,9 @@ aarch64_libgcc_floating_mode_supported_p #undef TARGET_CAN_USE_DOLOOP_P #define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost +#undef TARGET_LOOP_DECISION_ADJUST +#define TARGET_LOOP_DECISION_ADJUST aarch64_loop_decision_adjust + #undef TARGET_SCHED_ADJUST_PRIORITY #define TARGET_SCHED_ADJUST_PRIORITY aarch64_sched_adjust_priority diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index cd9aed9874f4e6b2b0e2f8956ed6155975e643a8..61bd00e84c8a2a8865e95ba579c3b94790ab1331 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -11857,6 +11857,15 @@ is required only when the target has special constraints like maximum number of memory accesses. @end deftypefn +@deftypefn {Target Hook} {enum lpt_dec} TARGET_LOOP_DECISION_ADJUST (enum lpt_dec @var{consider}, class loop *@var{loop}) +This target hook returns either a new value for the loop unrolling +decision or the existing value in @var{loop}. The parameter @var{consider} +is the loop decision currently being tested. The parameter @var{loop} is a +pointer to the loop, which is going to be checked for unrolling. This target +hook is required only when the target wants to override the unrolling +decisions. +@end deftypefn + @defmac POWI_MAX_MULTS If defined, this macro is interpreted as a signed integer C expression that specifies the maximum number of floating point multiplications diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 2739e9ceec5ad7253ff9135da8dbe3bf6010e8d7..7a7f917fb45a6cc22f373ff16f8b78aa3e35f210 100644 --- a/gcc/
Re: [PATCH, GCC] Fix unrolling check.
On 11/11/2019 14:50, Eric Botcazou wrote: >> Thanks for the explanation. However, I do not understand why are we >> returning with the default value. > > The regression you reported should be clear enough though: if we don't do > that, we will unroll in cases where we would not have before. Try with a > compiler that predates the pragma and compare, there should be no changes. > >> What "do we always do"? > > What we do in the absence of specific unrolling directives for the loop. Yeah fair enough! Sorry for the trouble. Sudi >
Re: [PATCH, GCC] Fix unrolling check.
Hi Eric On 08/11/2019 19:16, Eric Botcazou wrote: >> I was fiddling around with the loop unrolling pass and noticed a check >> in decide_unroll_* functions (in the patch). The comment on top of this >> check says >> "/* If we were not asked to unroll this loop, just return back silently. >>*/" >> However the check returns when loop->unroll == 0 rather than 1. >> >> The check was added in r255106 where the ChangeLog suggests that the >> actual intention was probably to check the value 1 and not 0. > > No, this is intended, 0 is the default value of the field, not 1. And note > that decide_unroll_constant_iterations, decide_unroll_runtime_iterations and > decide_unroll_stupid *cannot* be called with loop->unroll == 1 because of this > check in decide_unrolling: Thanks for the explanation. However, I do not understand why we are returning with the default value. The comment for "unroll" is a bit ambiguous for value 0. /* The number of times to unroll the loop. 0 means no information given, just do what we always do. A value of 1 means do not unroll the loop. A value of USHRT_MAX means unroll with no specific unrolling factor. Other values means unroll with the given unrolling factor. */ unsigned short unroll; What "do we always do"? Thanks Sudi > >if (loop->unroll == 1) > { > if (dump_file) > fprintf (dump_file, >";; Not unrolling loop, user didn't want it unrolled\n"); > continue; > } > >> Tested on aarch64-none-elf with one new regression: >> FAIL: gcc.dg/pr40209.c (test for excess errors) >> This fails because the changes cause the loop to unroll 3 times using >> unroll_stupid and that shows up as excess error due -fopt-info. This >> option was added in r202077 but I am not sure why this particular test >> was chosen for it. > > That's a regression, there should be no unrolling. >
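To make the field's cases concrete, here is how its values can arise from user code, as I read the comment quoted above (the pragma-to-value mapping is my inference, not something stated in the thread):

```c
/* My reading of how loop->unroll gets its value (illustrative):
     no directive          -> 0  (no information given; default heuristics)
     #pragma GCC unroll 1  -> 1  (user asked for no unrolling; this is
                                  why decide_unrolling skips such loops
                                  before the decide_unroll_* functions run)
     #pragma GCC unroll 4  -> 4  (unroll with the given factor)  */
int
sum_no_unroll (const int *a, int n)
{
  int s = 0;
#pragma GCC unroll 1
  for (int i = 0; i < n; i++)
    s += a[i];
  return s;
}
```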
[PATCH, GCC] Fix unrolling check.
Hi I was fiddling around with the loop unrolling pass and noticed a check in decide_unroll_* functions (in the patch). The comment on top of this check says "/* If we were not asked to unroll this loop, just return back silently. */" However the check returns when loop->unroll == 0 rather than 1. The check was added in r255106 where the ChangeLog suggests that the actual intention was probably to check the value 1 and not 0. Tested on aarch64-none-elf with one new regression: FAIL: gcc.dg/pr40209.c (test for excess errors) This fails because the changes cause the loop to unroll 3 times using unroll_stupid and that shows up as excess error due -fopt-info. This option was added in r202077 but I am not sure why this particular test was chosen for it. Does this change look ok? Can I just remove the -fopt-info from the test or unrolling the loop in the test is not desirable? Thanks Sudi gcc/ChangeLog: 2019-11-07 Sudakshina Das * loop-unroll.c (decide_unroll_constant_iterations): Update condition to check loop->unroll. (decide_unroll_runtime_iterations): Likewise. (decide_unroll_stupid): Likewise. diff --git a/gcc/loop-unroll.c b/gcc/loop-unroll.c index 63fccd23fae38f8918a7d94411aaa43c72830dd3..9f7ab4b5c1c9b2333148e452b84afbf040707456 100644 --- a/gcc/loop-unroll.c +++ b/gcc/loop-unroll.c @@ -354,7 +354,7 @@ decide_unroll_constant_iterations (class loop *loop, int flags) widest_int iterations; /* If we were not asked to unroll this loop, just return back silently. */ - if (!(flags & UAP_UNROLL) && !loop->unroll) + if (!(flags & UAP_UNROLL) && loop->unroll == 1) return; if (dump_enabled_p ()) @@ -674,7 +674,7 @@ decide_unroll_runtime_iterations (class loop *loop, int flags) widest_int iterations; /* If we were not asked to unroll this loop, just return back silently. 
*/ - if (!(flags & UAP_UNROLL) && !loop->unroll) + if (!(flags & UAP_UNROLL) && loop->unroll == 1) return; if (dump_enabled_p ()) @@ -1159,7 +1159,7 @@ decide_unroll_stupid (class loop *loop, int flags) widest_int iterations; /* If we were not asked to unroll this loop, just return back silently. */ - if (!(flags & UAP_UNROLL_ALL) && !loop->unroll) + if (!(flags & UAP_UNROLL_ALL) && loop->unroll == 1) return; if (dump_enabled_p ())
[PATCH, GCC, AArch64] Enable Transactional Memory Extension
Hi This patch enables the new Transactional Memory Extension announced recently as part of Arm's new architecture technologies. We introduce a new optional extension "tme" to enable this. The following instructions are part of the extension: * tstart * ttest * tcommit * tcancel The documentation for the above can be found here: https://developer.arm.com/docs/ddi0602/latest/base-instructions-alphabetic-order We have also added ACLE intrinsics for the instructions above according to: https://developer.arm.com/docs/101028/latest/transactional-memory-extension-tme-intrinsics Builds and regression tested on aarch64-none-linux-gnu and added new tests for the new instructions. Is this okay for trunk? Thanks Sudi *** gcc/ChangeLog *** 2019-xx-xx Sudakshina Das * config/aarch64/aarch64-builtins.c (enum aarch64_builtins): Add AARCH64_TME_BUILTIN_TSTART, AARCH64_TME_BUILTIN_TCOMMIT, AARCH64_TME_BUILTIN_TTEST and AARCH64_TME_BUILTIN_TCANCEL. (aarch64_init_tme_builtins): New. (aarch64_init_builtins): Call aarch64_init_tme_builtins. (aarch64_expand_builtin_tme): New. (aarch64_expand_builtin): Handle TME builtins. * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define __ARM_FEATURE_TME when enabled. * config/aarch64/aarch64-option-extensions.def: Add "tme". * config/aarch64/aarch64.h (AARCH64_FL_TME, AARCH64_ISA_TME): New. (TARGET_TME): New. * config/aarch64/aarch64.md (define_c_enum "unspec"): Add UNSPEC_TTEST. (define_c_enum "unspecv"): Add UNSPECV_TSTART, UNSPECV_TCOMMIT and UNSPECV_TCANCEL. (tstart, ttest, tcommit, tcancel): New instructions. * config/aarch64/arm_acle.h (__tstart, __tcommit): New. (__tcancel, __ttest): New. (_TMFAILURE_REASON, _TMFAILURE_RTRY, _TMFAILURE_CNCL): New macro. (_TMFAILURE_MEM, _TMFAILURE_IMP, _TMFAILURE_ERR): Likewise. (_TMFAILURE_SIZE, _TMFAILURE_NEST, _TMFAILURE_DBG): Likewise. (_TMFAILURE_INT, _TMFAILURE_TRIVIAL): Likewise. * config/arm/types.md: Add new tme type attr. * doc/invoke.texi: Document "tme". 
*** gcc/testsuite/ChangeLog *** 2019-xx-xx Sudakshina Das * gcc.target/aarch64/acle/tme.c: New test. * gcc.target/aarch64/pragma_cpp_predefs_2.c: New test. diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index 549a6c249243372eacb5d29923b5d1abce4ac79a..16c1d42ea2be0f477692be592e30ba8ce27f05a7 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -438,6 +438,11 @@ enum aarch64_builtins /* Special cased Armv8.3-A Complex FMA by Lane quad Builtins. */ AARCH64_SIMD_FCMLA_LANEQ_BUILTIN_BASE, AARCH64_SIMD_FCMLA_LANEQ_BUILTINS + /* TME builtins. */ + AARCH64_TME_BUILTIN_TSTART, + AARCH64_TME_BUILTIN_TCOMMIT, + AARCH64_TME_BUILTIN_TTEST, + AARCH64_TME_BUILTIN_TCANCEL, AARCH64_BUILTIN_MAX }; @@ -1067,6 +1072,35 @@ aarch64_init_pauth_hint_builtins (void) NULL_TREE); } +/* Initialize the transactional memory extension (TME) builtins. */ +static void +aarch64_init_tme_builtins (void) +{ + tree ftype_uint64_void += build_function_type_list (uint64_type_node, NULL); + tree ftype_void_void += build_function_type_list (void_type_node, NULL); + tree ftype_void_uint64 += build_function_type_list (void_type_node, uint64_type_node, NULL); + + aarch64_builtin_decls[AARCH64_TME_BUILTIN_TSTART] += add_builtin_function ("__builtin_aarch64_tstart", ftype_uint64_void, + AARCH64_TME_BUILTIN_TSTART, BUILT_IN_MD, + NULL, NULL_TREE); + aarch64_builtin_decls[AARCH64_TME_BUILTIN_TTEST] += add_builtin_function ("__builtin_aarch64_ttest", ftype_uint64_void, + AARCH64_TME_BUILTIN_TTEST, BUILT_IN_MD, + NULL, NULL_TREE); + aarch64_builtin_decls[AARCH64_TME_BUILTIN_TCOMMIT] += add_builtin_function ("__builtin_aarch64_tcommit", ftype_void_void, + AARCH64_TME_BUILTIN_TCOMMIT, BUILT_IN_MD, + NULL, NULL_TREE); + aarch64_builtin_decls[AARCH64_TME_BUILTIN_TCANCEL] += add_builtin_function ("__builtin_aarch64_tcancel", ftype_void_uint64, + AARCH64_TME_BUILTIN_TCANCEL, BUILT_IN_MD, + NULL, NULL_TREE); +} + void 
aarch64_init_builtins (void) { @@ -1104,6 +1138,9 @@ aarch64_init_builtins (void) register them. */ if (!TARGET_ILP32) aarch64_init_pauth_hint_builtins (); + + if (TARGET_TME) +aarch64_init_tme_builtins (); } tree @@ -1507,6 +1544,47 @@ aarch64_expand_fcmla_builtin (tree exp, rtx target, int fcode) return target; } +/* Function to expand an expression EXP which calls one of the Transactional + Memory Extension (TME) builtins FCODE with the result going to TARGET. */ +static rtx +aarch64_expand_builtin_tme (int fcode, tree exp, rtx target) +{
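As a usage sketch of the new ACLE intrinsics (my illustration, following the ACLE document linked above, not taken from the patch; it needs an aarch64 target built with +tme and TME hardware, so it is not runnable elsewhere and the retry policy is illustrative):

```c
#include <stdint.h>
#include <arm_acle.h>

/* Hedged sketch of the TME usage model: start a transaction, do the
   speculative work, and commit; on failure, consult the status bits
   returned by __tstart.  */
int
tme_increment (long *counter)
{
  uint64_t status = __tstart ();
  if (status == 0)
    {
      /* Transaction is live; this store is speculative until commit.  */
      ++*counter;
      __tcommit ();
      return 1;                 /* committed */
    }
  if (status & _TMFAILURE_RTRY)
    return 0;                   /* transient failure: caller may retry */
  return -1;                    /* permanent failure: take a fallback path */
}
```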
Re: [PATCH][AArch64] Make use of FADDP in simple reductions
Hi Elen Thank you for doing this. You will need a maintainer's approval but I would like to add a couple of comments. Please find them inline. On 08/05/2019 14:36, Elen Kalda wrote: > Hi, > > This patch adds a pattern to support the FADDP (scalar) instruction. > > Before the patch, the C code > > typedef double v2df __attribute__((vector_size (16))); > > double > foo (v2df x) > { >return x[1] + x[0]; > } > > generated: > foo: > dup d1, v0.d[0] > dup d0, v0.d[1] > fadd d0, d1, d0 > ret > > After patch: > foo: > faddp d0, v0.2d > ret > > > Bootstrapped and done regression tests on aarch64-none-linux-gnu - > no issues found. > > Best wishes, > Elen > > > gcc/ChangeLog: > > 2019-04-24 Elen Kalda > > * config/aarch64/aarch64-simd.md (*aarch64_faddp): New. > > gcc/testsuite/ChangeLog: > > 2019-04-24 Elen Kalda > > * gcc.target/aarch64/simd/scalar_faddp.c: New test. > > diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md > index e3852c5d182b70978d7603225fce55c0b8ee2894..89fedc6cb3f0c6eb74c6f8d0b21cedb5ae20a095 100644 > --- a/gcc/config/aarch64/aarch64-simd.md > +++ b/gcc/config/aarch64/aarch64-simd.md > @@ -2372,6 +2372,21 @@ >[(set_attr "type" "neon_fp_reduc_add_")] > ) > > +(define_insn "*aarch64_faddp" > + [(set (match_operand: 0 "register_operand" "=w") > +(plus: > + (vec_select: (match_operand:VHSDF 1 "register_operand" "w") I do not think the VHSDF mode should be used here. I believe you may have taken this from the vector form of this instruction but that seems to be different from the scalar one. Someone with more floating point instruction experience can chime in here. > +(parallel[(match_operand 2 "const_int_operand" "n")])) > + (vec_select: (match_dup:VHSDF 1) > +(parallel[(match_operand 3 "const_int_operand" "n")]] > + "TARGET_SIMD > + && ((INTVAL (operands[2]) == 0 && INTVAL (operands[3]) == 1) Just some minor indentation issue. 
The && should be below T > +|| (INTVAL (operands[2]) == 1 && INTVAL (operands[3]) == 0))" Likewise this should be below the second opening brace '(' ... > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/simd/scalar_faddp.c > @@ -0,0 +1,31 @@ > +/* { dg-do assemble } */ This can be dg-do compile since you only want an assembly file > +/* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok } */ > +/* { dg-add-options arm_v8_2a_fp16_scalar } */ > +/* { dg-additional-options "-save-temps -O1" } */ The --save-temps can then be removed as the dg-do compile will produce the .s file for you > +/* { dg-final { scan-assembler-not "dup" } } */ ... Thanks Sudi
RE: [PATCH, GCC, AARCH64] Add GNU note section with BTI and PAC.
Hi James -Original Message- From: James Greenhalgh Sent: 18 April 2019 09:56 To: Sudakshina Das Cc: Richard Henderson ; H.J. Lu ; Richard Henderson ; gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw ; Marcus Shawcroft ; ni...@redhat.com Subject: Re: [PATCH, GCC, AARCH64] Add GNU note section with BTI and PAC. On Thu, Apr 04, 2019 at 05:01:06PM +0100, Sudakshina Das wrote: > Hi Richard > > On 03/04/2019 11:28, Richard Henderson wrote: > > On 4/3/19 5:19 PM, Sudakshina Das wrote: > >> + /* PT_NOTE header: namesz, descsz, type. > >> + namesz = 4 ("GNU\0") > >> + descsz = 16 (Size of the program property array) > >> + type = 5 (NT_GNU_PROPERTY_TYPE_0). */ > >> + assemble_align (POINTER_SIZE); > >> + assemble_integer (GEN_INT (4), 4, 32, 1); > >> + assemble_integer (GEN_INT (16), 4, 32, 1); > > > > So, it's 16 only if POINTER_SIZE == 64. > > > > I think ROUND_UP (12, POINTER_BYTES) is what you want here. > > > > > Ah yes. I have made that change now. This is OK, but instead of: > diff --git a/gcc/testsuite/gcc.target/aarch64/va_arg_1.c > b/gcc/testsuite/gcc.target/aarch64/va_arg_1.c > index > e8e3cdac51350b545e5c2a644a3e1f4d1c37f88d..1fe92ff08935d4c6f08affcbd77e > a91537030640 100644 > --- a/gcc/testsuite/gcc.target/aarch64/va_arg_1.c > +++ b/gcc/testsuite/gcc.target/aarch64/va_arg_1.c > @@ -4,7 +4,9 @@ > int > f (int a, ...) > { > - /* { dg-final { scan-assembler-not "str" } } */ > + /* Fails on aarch64*-*-linux* if configured with > +--enable-standard-branch-protection because of the GNU NOTE > + section. */ > + /* { dg-final { scan-assembler-not "str" { target { ! > + aarch64*-*-linux* } || { ! default_branch_protection } } } } */ >return a; > } > Can you just change the regex to check for str followed by a tab, or > something that looks else which looks like the instruction and doesn't match > against 'string'. >Thanks, >James Ah yes, I have reduced the diff in this test to only update the scan directive to look for 'str\t' instead. Committed as r270515. 
Thanks Sudi > > Thanks > Sudi > > > > > r~ > > >
Re: [PATCH, GCC, AARCH64] Add GNU note section with BTI and PAC.
Ping. On 04/04/2019 17:01, Sudakshina Das wrote: > Hi Richard > > On 03/04/2019 11:28, Richard Henderson wrote: >> On 4/3/19 5:19 PM, Sudakshina Das wrote: >>> + /* PT_NOTE header: namesz, descsz, type. >>> + namesz = 4 ("GNU\0") >>> + descsz = 16 (Size of the program property array) >>> + type = 5 (NT_GNU_PROPERTY_TYPE_0). */ >>> + assemble_align (POINTER_SIZE); >>> + assemble_integer (GEN_INT (4), 4, 32, 1); >>> + assemble_integer (GEN_INT (16), 4, 32, 1); >> >> So, it's 16 only if POINTER_SIZE == 64. >> >> I think ROUND_UP (12, POINTER_BYTES) is what you want here. >> > > > Ah yes. I have made that change now. > > Thanks > Sudi > >> >> r~ >> >
Re: [PATCH, GCC, AARCH64] Add GNU note section with BTI and PAC.
Hi Richard On 03/04/2019 11:28, Richard Henderson wrote: > On 4/3/19 5:19 PM, Sudakshina Das wrote: >> + /* PT_NOTE header: namesz, descsz, type. >> + namesz = 4 ("GNU\0") >> + descsz = 16 (Size of the program property array) >> + type = 5 (NT_GNU_PROPERTY_TYPE_0). */ >> + assemble_align (POINTER_SIZE); >> + assemble_integer (GEN_INT (4), 4, 32, 1); >> + assemble_integer (GEN_INT (16), 4, 32, 1); > > So, it's 16 only if POINTER_SIZE == 64. > > I think ROUND_UP (12, POINTER_BYTES) is what you want here. > Ah yes. I have made that change now. Thanks Sudi > > r~ > diff --git a/gcc/config/aarch64/aarch64-linux.h b/gcc/config/aarch64/aarch64-linux.h index 9d0292d64f20939ccedd7ab56027aa1282826b23..5e8b34ded03c78493f868e38647bf57c2da5187c 100644 --- a/gcc/config/aarch64/aarch64-linux.h +++ b/gcc/config/aarch64/aarch64-linux.h @@ -83,7 +83,7 @@ #define GNU_USER_TARGET_D_CRITSEC_SIZE 48 -#define TARGET_ASM_FILE_END file_end_indicate_exec_stack +#define TARGET_ASM_FILE_END aarch64_file_end_indicate_exec_stack /* Uninitialized common symbols in non-PIE executables, even with strong definitions in dependent shared libraries, will resolve diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index b38505b0872688634b2d3f625ab8d313e89cfca0..83b8ef84808c19fa1214fa06c32957936f5eb520 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -18744,6 +18744,57 @@ aarch64_stack_protect_guard (void) return NULL_TREE; } +/* Implement TARGET_ASM_FILE_END for AArch64. This adds the AArch64 GNU NOTE + section at the end if needed. 
*/ +#define GNU_PROPERTY_AARCH64_FEATURE_1_AND 0xc0000000 +#define GNU_PROPERTY_AARCH64_FEATURE_1_BTI (1U << 0) +#define GNU_PROPERTY_AARCH64_FEATURE_1_PAC (1U << 1) +void +aarch64_file_end_indicate_exec_stack () +{ + file_end_indicate_exec_stack (); + + unsigned feature_1_and = 0; + if (aarch64_bti_enabled ()) +feature_1_and |= GNU_PROPERTY_AARCH64_FEATURE_1_BTI; + + if (aarch64_ra_sign_scope != AARCH64_FUNCTION_NONE) +feature_1_and |= GNU_PROPERTY_AARCH64_FEATURE_1_PAC; + + if (feature_1_and) +{ + /* Generate .note.gnu.property section. */ + switch_to_section (get_section (".note.gnu.property", + SECTION_NOTYPE, NULL)); + + /* PT_NOTE header: namesz, descsz, type. + namesz = 4 ("GNU\0") + descsz = 16 (Size of the program property array) + [(12 + padding) * Number of array elements] + type = 5 (NT_GNU_PROPERTY_TYPE_0). */ + assemble_align (POINTER_SIZE); + assemble_integer (GEN_INT (4), 4, 32, 1); + assemble_integer (GEN_INT (ROUND_UP (12, POINTER_BYTES)), 4, 32, 1); + assemble_integer (GEN_INT (5), 4, 32, 1); + + /* PT_NOTE name. */ + assemble_string ("GNU", 4); + + /* PT_NOTE contents for NT_GNU_PROPERTY_TYPE_0: + type = GNU_PROPERTY_AARCH64_FEATURE_1_AND + datasz = 4 + data = feature_1_and. */ + assemble_integer (GEN_INT (GNU_PROPERTY_AARCH64_FEATURE_1_AND), 4, 32, 1); + assemble_integer (GEN_INT (4), 4, 32, 1); + assemble_integer (GEN_INT (feature_1_and), 4, 32, 1); + + /* Pad the size of the note to the required alignment. */ + assemble_align (POINTER_SIZE); +} +} +#undef GNU_PROPERTY_AARCH64_FEATURE_1_PAC +#undef GNU_PROPERTY_AARCH64_FEATURE_1_BTI +#undef GNU_PROPERTY_AARCH64_FEATURE_1_AND /* Target-specific selftests. 
*/ diff --git a/gcc/testsuite/gcc.target/aarch64/bti-1.c b/gcc/testsuite/gcc.target/aarch64/bti-1.c index a8c60412e310a4f322372f334ae5314f426d310e..5a556b08ed15679b25676a11fe9c7a64641ee671 100644 --- a/gcc/testsuite/gcc.target/aarch64/bti-1.c +++ b/gcc/testsuite/gcc.target/aarch64/bti-1.c @@ -61,3 +61,4 @@ lab2: } /* { dg-final { scan-assembler-times "hint\t34" 1 } } */ /* { dg-final { scan-assembler-times "hint\t36" 12 } } */ +/* { dg-final { scan-assembler ".note.gnu.property" { target *-*-linux* } } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/va_arg_1.c b/gcc/testsuite/gcc.target/aarch64/va_arg_1.c index e8e3cdac51350b545e5c2a644a3e1f4d1c37f88d..1fe92ff08935d4c6f08affcbd77ea91537030640 100644 --- a/gcc/testsuite/gcc.target/aarch64/va_arg_1.c +++ b/gcc/testsuite/gcc.target/aarch64/va_arg_1.c @@ -4,7 +4,9 @@ int f (int a, ...) { - /* { dg-final { scan-assembler-not "str" } } */ + /* Fails on aarch64*-*-linux* if configured with +--enable-standard-branch-protection because of the GNU NOTE section. */ + /* { dg-final { scan-assembler-not "str" { target { ! aarch64*-*-linux* } || { ! default_branch_protection } } } } */ return a; }
Re: [PATCH, GCC, AARCH64] Add GNU note section with BTI and PAC.
Hi Richard On 02/04/2019 10:25, Sudakshina Das wrote: > Hi > > On 02/04/2019 03:27, H.J. Lu wrote: >> On Tue, Apr 2, 2019 at 10:05 AM Richard Henderson >> wrote: >>> >>> On 4/1/19 8:53 PM, Sudakshina Das wrote: >>>>> This could stand to use a comment, a moment's thinking about the >>>>> sizes, and to >>>>> use the existing asm output functions. >>>>> >>>>> /* PT_NOTE header: namesz, descsz, type. >>>>> namesz = 4 ("GNU\0") >>>>> descsz = 12 (see below) >>>> I was trying out these changes but the descsz of 12 gets rejected by >>>> readelf. It hits the following >>>> >>>> unsigned int size = is_32bit_elf ? 4 : 8; >>>> >>>> printf (_(" Properties: ")); >>>> >>>> if (pnote->descsz < 8 || (pnote->descsz % size) != 0) >>>> { >>>> printf (_("\n"), >>>> pnote->descsz); >>>> return; >>>> } >>> >>> Hmm, interesting. The docs say that padding is not to be included in >>> descsz >>> (gabi4.1, page 82). To my eye this is a bug in binutils, but perhaps >>> we will >>> have to live with it. >>> >>> Nick, thoughts? >> >> descsz is wrong. From: >> >> https://github.com/hjl-tools/linux-abi/wiki/Linux-Extensions-to-gABI >> >> n_desc The note descriptor. The first n_descsz bytes in n_desc is the >> pro- >> gram property array. >> >> The program property array >> Each array element represents one program property with type, data >> size and data. >> In 64-bit objects, each element is an array of 8-byte integers in the >> format of the >> target processor. In 32-bit objects, each element is an array of >> 4-byte integers in >> the format of the target processor. > > Thanks @HJ for clarifying that. I should have been more careful in > spotting the difference. > > @Richard I will update my patch according to your suggestions but > keeping in mind decssz should be the size of the entire program property > array so 16 in this case. > I have updated the patch as per your suggestions. The Changelog is still valid from my original patch. 
Thanks Sudi > Thanks > Sudi >> >> > diff --git a/gcc/config/aarch64/aarch64-linux.h b/gcc/config/aarch64/aarch64-linux.h index 9d0292d64f20939ccedd7ab56027aa1282826b23..5e8b34ded03c78493f868e38647bf57c2da5187c 100644 --- a/gcc/config/aarch64/aarch64-linux.h +++ b/gcc/config/aarch64/aarch64-linux.h @@ -83,7 +83,7 @@ #define GNU_USER_TARGET_D_CRITSEC_SIZE 48 -#define TARGET_ASM_FILE_END file_end_indicate_exec_stack +#define TARGET_ASM_FILE_END aarch64_file_end_indicate_exec_stack /* Uninitialized common symbols in non-PIE executables, even with strong definitions in dependent shared libraries, will resolve diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index b38505b0872688634b2d3f625ab8d313e89cfca0..f25f7da8f0224167db68e61a2ba88f0943316360 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -18744,6 +18744,56 @@ aarch64_stack_protect_guard (void) return NULL_TREE; } +/* Implement TARGET_ASM_FILE_END for AArch64. This adds the AArch64 GNU NOTE + section at the end if needed. */ +#define GNU_PROPERTY_AARCH64_FEATURE_1_AND 0xc0000000 +#define GNU_PROPERTY_AARCH64_FEATURE_1_BTI (1U << 0) +#define GNU_PROPERTY_AARCH64_FEATURE_1_PAC (1U << 1) +void +aarch64_file_end_indicate_exec_stack () +{ + file_end_indicate_exec_stack (); + + unsigned feature_1_and = 0; + if (aarch64_bti_enabled ()) +feature_1_and |= GNU_PROPERTY_AARCH64_FEATURE_1_BTI; + + if (aarch64_ra_sign_scope != AARCH64_FUNCTION_NONE) +feature_1_and |= GNU_PROPERTY_AARCH64_FEATURE_1_PAC; + + if (feature_1_and) +{ + /* Generate .note.gnu.property section. */ + switch_to_section (get_section (".note.gnu.property", + SECTION_NOTYPE, NULL)); + + /* PT_NOTE header: namesz, descsz, type. + namesz = 4 ("GNU\0") + descsz = 16 (Size of the program property array) + type = 5 (NT_GNU_PROPERTY_TYPE_0). 
*/ + assemble_align (POINTER_SIZE); + assemble_integer (GEN_INT (4), 4, 32, 1); + assemble_integer (GEN_INT (16), 4, 32, 1); + assemble_integer (GEN_INT (5), 4, 32, 1); + + /* PT_NOTE name. */ + assemble_string ("GNU", 4); + + /* PT_NOTE contents for NT_GNU_PROPERTY_TYPE_0: + type = GNU_PROPERTY_AARCH64_FEATURE_1_AND + datasz = 4 + data = feature_1_and. */ + assemble_
Re: [PATCH, GCC, DOCS, AArch64] Add missing documenation for mbranch-protection
Hi Sandra On 02/04/2019 16:32, Sandra Loosemore wrote: > On 4/2/19 6:45 AM, Sudakshina Das wrote: >> Hi >> >> This patch add the missing documentation bits for -mbranch-protection in >> both extend.texi and invoke.texi. >> >> Is this ok for trunk? >> >> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi >> index >> ef7adb6a9c0fe1abd769e237fd8d0ce4c614aef8..7e1c28182138aeba163e50f5b7ed60812c1dfe27 >> >> 100644 >> --- a/gcc/doc/extend.texi >> +++ b/gcc/doc/extend.texi >> @@ -3925,7 +3925,15 @@ same as for the @option{-mcpu=} command-line >> option. >> @cindex @code{sign-return-address} function attribute, AArch64 >> Select the function scope on which return address signing will be >> applied. The >> behavior and permissible arguments are the same as for the >> command-line option >> -@option{-msign-return-address=}. The default value is @code{none}. >> +@option{-msign-return-address=}. The default value is @code{none}. >> This >> +attribute is @code{deprecated}. The @code{branch-protection} >> attribute should >> +be used instead. > > s/@code{deprecated}/deprecated/ > > The patch is OK with that tweak. Thanks. I have made the change and committed as r270119. Sudi > > -Sandra
[PATCH, GCC, DOCS, AArch64] Add missing documenation for mbranch-protection
Hi This patch adds the missing documentation bits for -mbranch-protection in both extend.texi and invoke.texi. Is this ok for trunk? Sudi *** gcc/ChangeLog *** 2019-xx-xx Sudakshina Das * doc/extend.texi: Add deprecated comment on sign-return-address function attribute and add mbranch-protection. * doc/invoke.texi: Add bti to the options for mbranch-protection. diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index ef7adb6a9c0fe1abd769e237fd8d0ce4c614aef8..7e1c28182138aeba163e50f5b7ed60812c1dfe27 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -3925,7 +3925,15 @@ same as for the @option{-mcpu=} command-line option. @cindex @code{sign-return-address} function attribute, AArch64 Select the function scope on which return address signing will be applied. The behavior and permissible arguments are the same as for the command-line option -@option{-msign-return-address=}. The default value is @code{none}. +@option{-msign-return-address=}. The default value is @code{none}. This +attribute is @code{deprecated}. The @code{branch-protection} attribute should +be used instead. + +@item branch-protection +@cindex @code{branch-protection} function attribute, AArch64 +Select the function scope on which branch protection will be applied. The +behavior and permissible arguments are the same as for the command-line option +@option{-mbranch-protection=}. The default value is @code{none}. @end table diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 029b33a688060a558bb7b78312f090c64e6d0a4a..27b51aaab99680180f46383e5a4b22f7f3ceea91 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -632,7 +632,7 @@ Objective-C and Objective-C++ Dialects}. 
-mlow-precision-recip-sqrt -mlow-precision-sqrt -mlow-precision-div @gol -mpc-relative-literal-loads @gol -msign-return-address=@var{scope} @gol --mbranch-protection=@var{none}|@var{standard}|@var{pac-ret}[+@var{leaf}] @gol +-mbranch-protection=@var{none}|@var{standard}|@var{pac-ret}[+@var{leaf}]|@var{bti} @gol -march=@var{name} -mcpu=@var{name} -mtune=@var{name} @gol -moverride=@var{string} -mverbose-cost-dump @gol -mstack-protector-guard=@var{guard} -mstack-protector-guard-reg=@var{sysreg} @gol @@ -15884,7 +15884,7 @@ functions, and @samp{all}, which enables pointer signing for all functions. The default value is @samp{none}. This option has been deprecated by -mbranch-protection. -@item -mbranch-protection=@var{none}|@var{standard}|@var{pac-ret}[+@var{leaf}] +@item -mbranch-protection=@var{none}|@var{standard}|@var{pac-ret}[+@var{leaf}]|@var{bti} @opindex mbranch-protection Select the branch protection features to use. @samp{none} is the default and turns off all types of branch protection.
Re: [PATCH, GCC, AARCH64] Add GNU note section with BTI and PAC.
Hi On 02/04/2019 03:27, H.J. Lu wrote: > On Tue, Apr 2, 2019 at 10:05 AM Richard Henderson wrote: >> >> On 4/1/19 8:53 PM, Sudakshina Das wrote: >>>> This could stand to use a comment, a moment's thinking about the sizes, >>>> and to >>>> use the existing asm output functions. >>>> >>>> /* PT_NOTE header: namesz, descsz, type. >>>> namesz = 4 ("GNU\0") >>>> descsz = 12 (see below) >>> I was trying out these changes but the descsz of 12 gets rejected by >>> readelf. It hits the following >>> >>> unsigned int size = is_32bit_elf ? 4 : 8; >>> >>> printf (_(" Properties: ")); >>> >>> if (pnote->descsz < 8 || (pnote->descsz % size) != 0) >>> { >>> printf (_("\n"), >>> pnote->descsz); >>> return; >>> } >> >> Hmm, interesting. The docs say that padding is not to be included in descsz >> (gabi4.1, page 82). To my eye this is a bug in binutils, but perhaps we will >> have to live with it. >> >> Nick, thoughts? > > descsz is wrong. From: > > https://github.com/hjl-tools/linux-abi/wiki/Linux-Extensions-to-gABI > > n_desc The note descriptor. The first n_descsz bytes in n_desc is the pro- > gram property array. > > The program property array > Each array element represents one program property with type, data > size and data. > In 64-bit objects, each element is an array of 8-byte integers in the > format of the > target processor. In 32-bit objects, each element is an array of > 4-byte integers in > the format of the target processor. Thanks @HJ for clarifying that. I should have been more careful in spotting the difference. @Richard I will update my patch according to your suggestions but keeping in mind descsz should be the size of the entire program property array so 16 in this case. Thanks Sudi > >
Re: [PATCH, wwwdocs] Mention -march=armv8.5-a and other new command line options for AArch64 and Arm for GCC 9
Hi James On 29/03/2019 13:41, Sudakshina Das wrote: > Hi James > > On 22/03/2019 16:25, James Greenhalgh wrote: >> On Wed, Mar 20, 2019 at 10:17:41AM +0000, Sudakshina Das wrote: >>> Hi Kyrill >>> >>> On 12/03/2019 12:03, Kyrill Tkachov wrote: >>>> Hi Sudi, >>>> >>>> On 2/22/19 10:45 AM, Sudakshina Das wrote: >>>>> Hi >>>>> >>>>> This patch documents the addition of the new Armv8.5-A and >>>>> corresponding >>>>> extensions in the gcc-9/changes.html. >>>>> As per https://gcc.gnu.org/about.html, I have used W3 Validator. >>>>> Is this ok for cvs? >>>>> >>>>> Thanks >>>>> Sudi >>>> >>>> >>>> Index: htdocs/gcc-9/changes.html >>>> === >>>> RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-9/changes.html,v >>>> retrieving revision 1.43 >>>> diff -u -r1.43 changes.html >>>> --- htdocs/gcc-9/changes.html 21 Feb 2019 10:32:55 - 1.43 >>>> +++ htdocs/gcc-9/changes.html 21 Feb 2019 18:25:09 - >>>> @@ -283,6 +283,19 @@ >>>> >>>> The intrinsics are defined by the ACLE specification. >>>> >>>> + >>>> + The Armv8.5-A architecture is now supported. This can be used by >>>> specifying the >>>> + -march=armv8.5-a option. >>>> >>>> >>>> I tend to prefer the wording "... is now supported through the >>>> -march=armv8.5-a option". >>>> Otherwise it reads as the compiler "using" the architecture, whereas we >>>> usually talk about "targeting" an architecture. >>>> >>>> + >>>> + The Armv8.5-A architecture also adds some security features >>>> that >>>> are optional to all older >>>> + architecture versions. These are also supported now and only >>>> effect >>>> the assembler. >>>> + >>>> + Speculation Barrier instruction using >>>> -march=armv8-a+sb. >>>> + Execution and Data Prediction Restriction instructions using >>>> -march=armv8-a+predres. >>>> + Speculative Store Bypass Safe instruction using >>>> -march=armv8-a+ssbs. This does not >>>> + require a compiler option for Arm and thus >>>> -march=armv8-a+ssbs is a AArch64 specific option. 
>>>> >>>> "AArch64-specific" >>>> >>>> >>>> LGTM otherwise. >>>> Thanks, >>>> Kyrill >>> >>> Thanks for the review and sorry for the delay in response. I had edited >>> the language for adding new options in a few other places as well. >>> >>> + The Armv8.5-A architecture also adds some security features >>> that are >>> + optional to all older architecture versions. These are also >>> supported now >> >> s/also supported now/now supported/ >> >>> + and only effect the assembler. >> >> s/effect/affect/ >> >>> + >>> + Speculation Barrier instruction through the >>> + -march=armv8-a+sb option. >>> + Execution and Data Prediction Restriction instructions through >>> + the -march=armv8-a+predres option. >>> + Speculative Store Bypass Safe instruction through the >>> + -march=armv8-a+ssbs option. This does not >>> require a >>> + compiler option for Arm and thus >>> -march=armv8-a+ssbs >>> + is an AArch64-specific option. >>> + >>> + >>> >>> AArch64 specific >>> @@ -362,6 +380,23 @@ >>> The default value is 16 (64Kb) and can be changed at configure >>> time using the flag >>> --with-stack-clash-protection-guard-size=12|16. >>> >>> + >>> + The option -msign-return-address= has been >>> deprecated. This >>> + has been replaced by the new -mbranch-protection= >>> option. This >>> + new option can now be used to enable the return address signing >>> as well as >>> + the new Branch Target Identification feature of Armv8.5-A >>> architecture. For >>> + more information on the arguments accepted by this option, >>> please refer to >>> + >> href="https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html#AArch64-Options;>AArch64-Options. >>> >>> >>> + >>> + The following optional extensions to Armv8.5-A architecture >>> are also >>> + supported now and only effect the assembler. >> >> s/effect/affect/ >> >>> + >>> + Random Number Generation instructions through the >>> + -march=armv8.5-a+rng option. >>> + Memory Tagging Extension through the >>> + -march=armv8.5-a+memtag option. 
>>> + >>> + >>> >>> Arm specific >> >> Otherwise, OK by me but feel free to wait for people with gooder >> grammar than me to have their say. >> > > Thanks for spotting those. So far no one else with gooder grammar has > pointed out anything else. I will commit the patch with the changes you > suggested on Monday if no one else has any other objections. > Committed as 1.56 Thanks Sudi > Thanks > Sudi > >> Thanks, >> James >> >
Re: [PATCH, GCC, AARCH64] Add GNU note section with BTI and PAC.
Hi Richard Thanks for the comments and pointing out the much cleaner existing asm output functions! On 29/03/2019 17:51, Richard Henderson wrote: >> +#define ASM_LONG "\t.long\t" > > Do not replicate targetm.asm_out.aligned_op.si, or integer_asm_op, really. > >> +aarch64_file_end_indicate_exec_stack () >> +{ >> + file_end_indicate_exec_stack (); >> + >> + if (!aarch64_bti_enabled () >> + && aarch64_ra_sign_scope == AARCH64_FUNCTION_NONE) >> +{ >> + return; >> +} > > This is redundant with... > >> + >> + unsigned feature_1_and = 0; >> + if (aarch64_bti_enabled ()) >> +feature_1_and |= GNU_PROPERTY_AARCH64_FEATURE_1_BTI; >> + >> + if (aarch64_ra_sign_scope != AARCH64_FUNCTION_NONE) >> +feature_1_and |= GNU_PROPERTY_AARCH64_FEATURE_1_PAC; >> + >> + if (feature_1_and) > > ... this. I prefer the second, as it's obvious. > >> + ASM_OUTPUT_ALIGN (asm_out_file, p2align); >> + /* name length. */ >> + fprintf (asm_out_file, ASM_LONG " 1f - 0f\n"); >> + /* data length. */ >> + fprintf (asm_out_file, ASM_LONG " 4f - 1f\n"); >> + /* note type: NT_GNU_PROPERTY_TYPE_0. */ >> + fprintf (asm_out_file, ASM_LONG " 5\n"); >> + fprintf (asm_out_file, "0:\n"); >> + /* vendor name: "GNU". */ >> + fprintf (asm_out_file, STRING_ASM_OP " \"GNU\"\n"); >> + fprintf (asm_out_file, "1:\n"); >> + ASM_OUTPUT_ALIGN (asm_out_file, p2align); >> + /* pr_type: GNU_PROPERTY_AARCH64_FEATURE_1_AND. */ >> + fprintf (asm_out_file, ASM_LONG " 0x%x\n", >> + GNU_PROPERTY_AARCH64_FEATURE_1_AND); >> + /* pr_datasz. */\ >> + fprintf (asm_out_file, ASM_LONG " 3f - 2f\n"); >> + fprintf (asm_out_file, "2:\n"); >> + /* GNU_PROPERTY_AARCH64_FEATURE_1_XXX. */ >> + fprintf (asm_out_file, ASM_LONG " 0x%x\n", feature_1_and); >> + fprintf (asm_out_file, "3:\n"); >> + ASM_OUTPUT_ALIGN (asm_out_file, p2align); >> + fprintf (asm_out_file, "4:\n"); > > This could stand to use a comment, a moment's thinking about the sizes, and to > use the existing asm output functions. > > /* PT_NOTE header: namesz, descsz, type. 
> namesz = 4 ("GNU\0") > descsz = 12 (see below) I was trying out these changes but the descsz of 12 gets rejected by readelf. It hits the following unsigned int size = is_32bit_elf ? 4 : 8; printf (_(" Properties: ")); if (pnote->descsz < 8 || (pnote->descsz % size) != 0) { printf (_("\n"), pnote->descsz); return; } Thanks Sudi > type = 5 (NT_GNU_PROPERTY_TYPE_0). */ > assemble_align (POINTER_SIZE); > assemble_integer (GEN_INT (4), 4, 32, 1); > assemble_integer (GEN_INT (12), 4, 32, 1); > assemble_integer (GEN_INT (5), 4, 32, 1); > > /* PT_NOTE name */ > assemble_string ("GNU", 4); > > /* PT_NOTE contents for NT_GNU_PROPERTY_TYPE_0: > type = 0xc0000000 (GNU_PROPERTY_AARCH64_FEATURE_1_AND), > datasz = 4 > data = feature_1_and > Note that the current section offset is 16, > and there has been no padding so far. */ > assemble_integer (GEN_INT (0xc0000000), 4, 32, 1); > assemble_integer (GEN_INT (4), 4, 32, 1); > assemble_integer (GEN_INT (feature_1_and), 4, 32, 1); > > /* Pad the size of the note to the required alignment. */ > assemble_align (POINTER_SIZE); > > > r~ >
[PATCH, GCC, AARCH64] Add GNU note section with BTI and PAC.
Hi This patch adds the GNU NOTE section to the BTI and/or PAC enabled objects for linux targets. The ABI document that we published mentioning GNU NOTE section is below: https://developer.arm.com/docs/ihi0056/latest/elf-for-the-arm-64-bit-architecture-aarch64-abi-2018q4 The patches needed for these in binutils are already approved and committed. https://sourceware.org/ml/binutils/2019-03/msg00072.html Bootstrapped and regression tested with aarch64-none-linux-gnu. Is this ok for trunk? Thanks Sudi *** gcc/ChangeLog *** 2018-xx-xx Sudakshina Das * config/aarch64/aarch64-linux.h (TARGET_ASM_FILE_END): Define for AArch64. (aarch64_file_end_indicate_exec_stack): Add gnu note section. gcc/testsuite/ChangeLog: 2018-xx-xx Sudakshina Das * gcc.target/aarch64/bti-1.c: Add scan directive for gnu note section for linux targets. * gcc.target/aarch64/va_arg_1.c: Don't run for aarch64 linux targets with --enable-standard-branch-protection. diff --git a/gcc/config/aarch64/aarch64-linux.h b/gcc/config/aarch64/aarch64-linux.h index 9d0292d64f20939ccedd7ab56027aa1282826b23..5e8b34ded03c78493f868e38647bf57c2da5187c 100644 --- a/gcc/config/aarch64/aarch64-linux.h +++ b/gcc/config/aarch64/aarch64-linux.h @@ -83,7 +83,7 @@ #define GNU_USER_TARGET_D_CRITSEC_SIZE 48 -#define TARGET_ASM_FILE_END file_end_indicate_exec_stack +#define TARGET_ASM_FILE_END aarch64_file_end_indicate_exec_stack /* Uninitialized common symbols in non-PIE executables, even with strong definitions in dependent shared libraries, will resolve diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index b38505b0872688634b2d3f625ab8d313e89cfca0..d616c8360b396ebe3ab2ac0fb799b30830df2b3e 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -18744,6 +18744,67 @@ aarch64_stack_protect_guard (void) return NULL_TREE; } +/* Implement TARGET_ASM_FILE_END for AArch64. This adds the AArch64 GNU NOTE + section at the end if needed. 
*/ +#define ASM_LONG "\t.long\t" +#define GNU_PROPERTY_AARCH64_FEATURE_1_AND 0xc0000000 +#define GNU_PROPERTY_AARCH64_FEATURE_1_BTI (1U << 0) +#define GNU_PROPERTY_AARCH64_FEATURE_1_PAC (1U << 1) +void +aarch64_file_end_indicate_exec_stack () +{ + file_end_indicate_exec_stack (); + + if (!aarch64_bti_enabled () + && aarch64_ra_sign_scope == AARCH64_FUNCTION_NONE) +{ + return; +} + + unsigned feature_1_and = 0; + if (aarch64_bti_enabled ()) +feature_1_and |= GNU_PROPERTY_AARCH64_FEATURE_1_BTI; + + if (aarch64_ra_sign_scope != AARCH64_FUNCTION_NONE) +feature_1_and |= GNU_PROPERTY_AARCH64_FEATURE_1_PAC; + + if (feature_1_and) +{ + int p2align = ptr_mode == SImode ? 2 : 3; + + /* Generate GNU_PROPERTY_AARCH64_FEATURE_1_XXX. */ + switch_to_section (get_section (".note.gnu.property", + SECTION_NOTYPE, NULL)); + + ASM_OUTPUT_ALIGN (asm_out_file, p2align); + /* name length. */ + fprintf (asm_out_file, ASM_LONG " 1f - 0f\n"); + /* data length. */ + fprintf (asm_out_file, ASM_LONG " 4f - 1f\n"); + /* note type: NT_GNU_PROPERTY_TYPE_0. */ + fprintf (asm_out_file, ASM_LONG " 5\n"); + fprintf (asm_out_file, "0:\n"); + /* vendor name: "GNU". */ + fprintf (asm_out_file, STRING_ASM_OP " \"GNU\"\n"); + fprintf (asm_out_file, "1:\n"); + ASM_OUTPUT_ALIGN (asm_out_file, p2align); + /* pr_type: GNU_PROPERTY_AARCH64_FEATURE_1_AND. */ + fprintf (asm_out_file, ASM_LONG " 0x%x\n", + GNU_PROPERTY_AARCH64_FEATURE_1_AND); + /* pr_datasz. */\ + fprintf (asm_out_file, ASM_LONG " 3f - 2f\n"); + fprintf (asm_out_file, "2:\n"); + /* GNU_PROPERTY_AARCH64_FEATURE_1_XXX. */ + fprintf (asm_out_file, ASM_LONG " 0x%x\n", feature_1_and); + fprintf (asm_out_file, "3:\n"); + ASM_OUTPUT_ALIGN (asm_out_file, p2align); + fprintf (asm_out_file, "4:\n"); +} +} +#undef GNU_PROPERTY_AARCH64_FEATURE_1_PAC +#undef GNU_PROPERTY_AARCH64_FEATURE_1_BTI +#undef GNU_PROPERTY_AARCH64_FEATURE_1_AND +#undef ASM_LONG /* Target-specific selftests. 
*/ diff --git a/gcc/testsuite/gcc.target/aarch64/bti-1.c b/gcc/testsuite/gcc.target/aarch64/bti-1.c index a8c60412e310a4f322372f334ae5314f426d310e..5a556b08ed15679b25676a11fe9c7a64641ee671 100644 --- a/gcc/testsuite/gcc.target/aarch64/bti-1.c +++ b/gcc/testsuite/gcc.target/aarch64/bti-1.c @@ -61,3 +61,4 @@ lab2: } /* { dg-final { scan-assembler-times "hint\t34" 1 } } */ /* { dg-final { scan-assembler-times "hint\t36" 12 } } */ +/* { dg-final { scan-assembler ".note.gnu.property" { target *-*-linux* } } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/va_arg_1.c b/gcc/testsuite/gcc.target/aarch64/va_arg_1.c index e8e
Re: [PATCH, wwwdocs] Mention -march=armv8.5-a and other new command line options for AArch64 and Arm for GCC 9
Hi James On 22/03/2019 16:25, James Greenhalgh wrote: > On Wed, Mar 20, 2019 at 10:17:41AM +0000, Sudakshina Das wrote: >> Hi Kyrill >> >> On 12/03/2019 12:03, Kyrill Tkachov wrote: >>> Hi Sudi, >>> >>> On 2/22/19 10:45 AM, Sudakshina Das wrote: >>>> Hi >>>> >>>> This patch documents the addition of the new Armv8.5-A and corresponding >>>> extensions in the gcc-9/changes.html. >>>> As per https://gcc.gnu.org/about.html, I have used W3 Validator. >>>> Is this ok for cvs? >>>> >>>> Thanks >>>> Sudi >>> >>> >>> Index: htdocs/gcc-9/changes.html >>> === >>> RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-9/changes.html,v >>> retrieving revision 1.43 >>> diff -u -r1.43 changes.html >>> --- htdocs/gcc-9/changes.html 21 Feb 2019 10:32:55 - 1.43 >>> +++ htdocs/gcc-9/changes.html 21 Feb 2019 18:25:09 - >>> @@ -283,6 +283,19 @@ >>> >>> The intrinsics are defined by the ACLE specification. >>> >>> + >>> + The Armv8.5-A architecture is now supported. This can be used by >>> specifying the >>> + -march=armv8.5-a option. >>> >>> >>> I tend to prefer the wording "... is now supported through the >>> -march=armv8.5-a option". >>> Otherwise it reads as the compiler "using" the architecture, whereas we >>> usually talk about "targeting" an architecture. >>> >>> + >>> + The Armv8.5-A architecture also adds some security features that >>> are optional to all older >>> + architecture versions. These are also supported now and only effect >>> the assembler. >>> + >>> + Speculation Barrier instruction using >>> -march=armv8-a+sb. >>> + Execution and Data Prediction Restriction instructions using >>> -march=armv8-a+predres. >>> + Speculative Store Bypass Safe instruction using >>> -march=armv8-a+ssbs. This does not >>> + require a compiler option for Arm and thus >>> -march=armv8-a+ssbs is a AArch64 specific option. >>> >>> "AArch64-specific" >>> >>> >>> LGTM otherwise. >>> Thanks, >>> Kyrill >> >> Thanks for the review and sorry for the delay in response. 
I had edited >> the language for adding new options in a few other places as well. >> >> + The Armv8.5-A architecture also adds some security features that are >> +optional to all older architecture versions. These are also supported >> now > > s/also supported now/now supported/ > >> +and only effect the assembler. > > s/effect/affect/ > >> + >> + Speculation Barrier instruction through the >> + -march=armv8-a+sb option. >> + Execution and Data Prediction Restriction instructions through >> + the -march=armv8-a+predres option. >> + Speculative Store Bypass Safe instruction through the >> + -march=armv8-a+ssbs option. This does not require a >> + compiler option for Arm and thus -march=armv8-a+ssbs >> + is an AArch64-specific option. >> + >> + >> >> >> AArch64 specific >> @@ -362,6 +380,23 @@ >> The default value is 16 (64Kb) and can be changed at configure >> time using the flag >> --with-stack-clash-protection-guard-size=12|16. >> >> + >> +The option -msign-return-address= has been deprecated. This >> +has been replaced by the new -mbranch-protection= option. >> This >> +new option can now be used to enable the return address signing as well >> as >> +the new Branch Target Identification feature of Armv8.5-A architecture. >> For >> +more information on the arguments accepted by this option, please refer >> to >> + > href="https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html#AArch64-Options;>AArch64-Options. >> + >> + The following optional extensions to Armv8.5-A architecture are also >> + supported now and only effect the assembler. > > s/effect/affect/ > >> + >> + Random Number Generation instructions through the >> + -march=armv8.5-a+rng option. >> + Memory Tagging Extension through the >> + -march=armv8.5-a+memtag option. >> + >> + >> >> >> Arm specific > > Otherwise, OK by me but feel free to wait for people with gooder > grammar than me to have their say. > Thanks for spotting those. So far no one else with gooder grammar has pointed out anything else. 
I will commit the patch with the changes you suggested on Monday if no one else has any other objections. Thanks Sudi > Thanks, > James >
Re: [PATCH, wwwdocs] Mention -march=armv8.5-a and other new command line options for AArch64 and Arm for GCC 9
Hi Kyrill On 12/03/2019 12:03, Kyrill Tkachov wrote: > Hi Sudi, > > On 2/22/19 10:45 AM, Sudakshina Das wrote: >> Hi >> >> This patch documents the addition of the new Armv8.5-A and corresponding >> extensions in the gcc-9/changes.html. >> As per https://gcc.gnu.org/about.html, I have used W3 Validator. >> Is this ok for cvs? >> >> Thanks >> Sudi > > > Index: htdocs/gcc-9/changes.html > === > RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-9/changes.html,v > retrieving revision 1.43 > diff -u -r1.43 changes.html > --- htdocs/gcc-9/changes.html 21 Feb 2019 10:32:55 - 1.43 > +++ htdocs/gcc-9/changes.html 21 Feb 2019 18:25:09 - > @@ -283,6 +283,19 @@ > > The intrinsics are defined by the ACLE specification. > > + > + The Armv8.5-A architecture is now supported. This can be used by > specifying the > + -march=armv8.5-a option. > > > I tend to prefer the wording "... is now supported through the > -march=armv8.5-a option". > Otherwise it reads as the compiler "using" the architecture, whereas we > usually talk about "targeting" an architecture. > > + > + The Armv8.5-A architecture also adds some security features that > are optional to all older > + architecture versions. These are also supported now and only effect > the assembler. > + > + Speculation Barrier instruction using > -march=armv8-a+sb. > + Execution and Data Prediction Restriction instructions using > -march=armv8-a+predres. > + Speculative Store Bypass Safe instruction using > -march=armv8-a+ssbs. This does not > + require a compiler option for Arm and thus > -march=armv8-a+ssbs is a AArch64 specific option. > > "AArch64-specific" > > > LGTM otherwise. > Thanks, > Kyrill Thanks for the review and sorry for the delay in response. I had edited the language for adding new options in a few other places as well. Thanks Sudi > > + > + > > > AArch64 specific > @@ -298,6 +311,22 @@ > The default value is 16 (64Kb) and can be changed at configure > time using the flag > --with-stack-clash-protection-guard-size=12|16. 
> > + > + The option -msign-return-address= has been deprecated. > This has been replaced > + by the new -mbranch-protection= option. This new > option can now be used to > + enable the return address signing as well as the new Branch Target > Identification > + feature of Armv8.5-A architecture. For more information on the > arguments accepted by > + this option, please refer to > + href="https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html#AArch64-Options;> > > > + AArch64-Options. > + > + The following optional extensions to Armv8.5-A architecture are > also supported now and > + only effect the assembler. > + > + Random Number Generation instructions using > -march=armv8.5-a+rng. > + Memory Tagging Extension using > -march=armv8.5-a+memtag. > + > + > > > Arm specific > Index: htdocs/gcc-9/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-9/changes.html,v retrieving revision 1.52 diff -u -r1.52 changes.html --- htdocs/gcc-9/changes.html 7 Mar 2019 14:40:06 - 1.52 +++ htdocs/gcc-9/changes.html 18 Mar 2019 18:55:24 - @@ -342,6 +342,24 @@ The intrinsics are defined by the ACLE specification. + +The Armv8.5-A architecture is now supported through the +-march=armv8.5-a option. + + The Armv8.5-A architecture also adds some security features that are +optional to all older architecture versions. These are also supported now +and only effect the assembler. + + Speculation Barrier instruction through the + -march=armv8-a+sb option. + Execution and Data Prediction Restriction instructions through + the -march=armv8-a+predres option. + Speculative Store Bypass Safe instruction through the + -march=armv8-a+ssbs option. This does not require a + compiler option for Arm and thus -march=armv8-a+ssbs + is an AArch64-specific option. + + AArch64 specific @@ -362,6 +380,23 @@ The default value is 16 (64Kb) and can be changed at configure time using the flag --with-stack-clash-protection-guard-size=12|16. + +The option -msign-return-address= has been deprecated. 
This +has been replaced by the new -mbranch-protection= option. This +new option can now be used to enable the return address signing as well as +the new Branch Target Identification feature of Armv8.5-A architecture.
Re: [PATCH, wwwdocs] Mention -march=armv8.5-a and other new command line options for AArch64 and Arm for GCC 9
Pinging and adding Gerald to the CC list. On 22/02/2019 10:45, Sudakshina Das wrote: > Hi > > This patch documents the addition of the new Armv8.5-A and corresponding > extensions in the gcc-9/changes.html. > As per https://gcc.gnu.org/about.html, I have used W3 Validator. > Is this ok for cvs? > > Thanks > Sudi
Re: [PATCH, GCC, AArch64] Fix a couple of bugs in BTI
On 21/02/2019 22:52, James Greenhalgh wrote: > On Thu, Feb 21, 2019 at 06:19:10AM -0600, Sudakshina Das wrote: >> Hi >> >> While doing more testing I found a couple of issues with my BTI patches. >> This patch fixes them: >> 1) Remove a reference to return address key. The original patch was >> written based on a different not yet committed patch ([PATCH >> 3/3][GCC][AARCH64] Add support for pointer authentication B key) and I >> missed out on cleaning this up. This is hidden behind the configuration >> option and thus went unnoticed. >> 2) Add a missed case for adding the BTI instruction in thunk functions. >> >> Bootstrapped on aarch64-none-linux-gnu and regression tested on >> aarch64-none-elf with configuration turned on. > > OK. > Thanks committed as r269112. Sudi > Thanks, > James > >> >> gcc/ChangeLog: >> >> 2019-xx-xx Sudakshina Das >> >> * config/aarch64/aarch64.c (aarch64_output_mi_thunk): Add bti >> instruction if enabled. >> (aarch64_override_options): Remove reference to return address >> key. >> >> >> Is this ok for trunk? >> Sudi >
[PATCH, wwwdocs] Mention -march=armv8.5-a and other new command line options for AArch64 and Arm for GCC 9
Hi This patch documents the addition of the new Armv8.5-A and corresponding extensions in the gcc-9/changes.html. As per https://gcc.gnu.org/about.html, I have used W3 Validator. Is this ok for cvs? Thanks Sudi Index: htdocs/gcc-9/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-9/changes.html,v retrieving revision 1.43 diff -u -r1.43 changes.html --- htdocs/gcc-9/changes.html 21 Feb 2019 10:32:55 - 1.43 +++ htdocs/gcc-9/changes.html 21 Feb 2019 18:25:09 - @@ -283,6 +283,19 @@ The intrinsics are defined by the ACLE specification. + +The Armv8.5-A architecture is now supported. This can be used by specifying the + -march=armv8.5-a option. + + The Armv8.5-A architecture also adds some security features that are optional to all older +architecture versions. These are also supported now and only effect the assembler. + + Speculation Barrier instruction using -march=armv8-a+sb. + Execution and Data Prediction Restriction instructions using -march=armv8-a+predres. + Speculative Store Bypass Safe instruction using -march=armv8-a+ssbs. This does not + require a compiler option for Arm and thus -march=armv8-a+ssbs is a AArch64 specific option. + + AArch64 specific @@ -298,6 +311,22 @@ The default value is 16 (64Kb) and can be changed at configure time using the flag --with-stack-clash-protection-guard-size=12|16. + +The option -msign-return-address= has been deprecated. This has been replaced +by the new -mbranch-protection= option. This new option can now be used to +enable the return address signing as well as the new Branch Target Identification +feature of Armv8.5-A architecture. For more information on the arguments accepted by +this option, please refer to + https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html#AArch64-Options;> + AArch64-Options. + + The following optional extensions to Armv8.5-A architecture are also supported now and + only effect the assembler. + + Random Number Generation instructions using -march=armv8.5-a+rng. 
+ Memory Tagging Extension using -march=armv8.5-a+memtag. + + Arm specific
[PATCH, GCC, AArch64] Fix a couple of bugs in BTI
Hi While doing more testing I found a couple of issues with my BTI patches. This patch fixes them: 1) Remove a reference to return address key. The original patch was written based on a different not yet committed patch ([PATCH 3/3][GCC][AARCH64] Add support for pointer authentication B key) and I missed out on cleaning this up. This is hidden behind the configuration option and thus went unnoticed. 2) Add a missed case for adding the BTI instruction in thunk functions. Bootstrapped on aarch64-none-linux-gnu and regression tested on aarch64-none-elf with configuration turned on. gcc/ChangeLog: 2019-xx-xx Sudakshina Das * config/aarch64/aarch64.c (aarch64_output_mi_thunk): Add bti instruction if enabled. (aarch64_override_options): Remove reference to return address key. Is this ok for trunk? Sudi diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 9f52cc9..7d9824a 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -5980,6 +5980,9 @@ aarch64_output_mi_thunk (FILE *file, tree thunk ATTRIBUTE_UNUSED, rtx this_rtx, temp0, temp1, addr, funexp; rtx_insn *insn; + if (aarch64_bti_enabled ()) +emit_insn (gen_bti_c()); + reload_completed = 1; emit_note (NOTE_INSN_PROLOGUE_END); @@ -12032,7 +12035,6 @@ aarch64_override_options (void) { #ifdef TARGET_ENABLE_PAC_RET aarch64_ra_sign_scope = AARCH64_FUNCTION_NON_LEAF; - aarch64_ra_sign_key = AARCH64_KEY_A; #else aarch64_ra_sign_scope = AARCH64_FUNCTION_NONE; #endif
Re: [PATCH 2/2][GCC][ARM] Implement hint intrinsics for ARM
Hi Srinath On 10/01/19 19:20, Srinath Parvathaneni wrote: > Hi All, > > This patch implements the ACLE hint intrinsics (nop,yield,wfe,wfi,sev > and sevl), for all ARM targets. > > The intrinsics specification will be published on the Arm website [1]. > > [1] > http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053c/IHI0053C_acle_2_0.pdf > > Bootstrapped on arm-none-linux-gnueabihf, regression tested on > arm-none-eabi with no regressions and > ran the added tests for arm, thumb-1 and thumb-2 modes. > > Ok for trunk? If ok, could someone commit the patch on my behalf, I > don't have commit rights. > > Thanks, > Srinath > > gcc/ChangeLog: > > 2019-01-10 Srinath Parvathaneni > > * config/arm/arm-builtins.c (NOP_QUALIFIERS): New qualifier. > (arm_expand_builtin_args): New case. > * config/arm/arm.md (yield): New pattern name. > (wfe): Likewise. > (wfi): Likewise. > (sev): Likewise. > (sevl): Likewise. > * config/arm/arm_acle.h (__nop ): New inline function. > (__yield): Likewise. > (__sev): Likewise. > (__sevl): Likewise. > (__wfi): Likewise. > (__wfe): Likewise. > * config/arm/arm_acle_builtins.def (VAR1): > (nop): New builtin definitions. > (yield): Likewise. > (sev): Likewise. > (sevl): Likewise. > (wfi): Likewise. > (wfe): Likewise. > * config/arm/unspecs.md (unspecv): > (VUNSPEC_YIELD): New volatile unspec. > (VUNSPEC_SEV): Likewise. > (VUNSPEC_SEVL): Likewise. > (VUNSPEC_WFI): Likewise. > > gcc/testsuite/ChangeLog: > > 2019-01-10 Srinath Parvathaneni > > * gcc.target/arm/acle/nop.c: New test. > * gcc.target/arm/acle/sev-1.c: Likewise. > * gcc.target/arm/acle/sev-2.c: Likewise. > * gcc.target/arm/acle/sev-3.c: Likewise. > * gcc.target/arm/acle/sevl-1.c: Likewise. > * gcc.target/arm/acle/sevl-2.c: Likewise. > * gcc.target/arm/acle/sevl-3.c: Likewise. > * gcc.target/arm/acle/wfe-1.c: Likewise. > * gcc.target/arm/acle/wfe-2.c: Likewise. > * gcc.target/arm/acle/wfe-3.c: Likewise. > * gcc.target/arm/acle/wfi-1.c: Likewise. > * gcc.target/arm/acle/wfi-2.c: Likewise. 
> * gcc.target/arm/acle/wfi-3.c: Likewise. > * gcc.target/arm/acle/yield-1.c: Likewise. > * gcc.target/arm/acle/yield-2.c: Likewise. > * gcc.target/arm/acle/yield-3.c: Likewise. > Thanks for doing this and I am not a maintainer. I do have a few questions: ... diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index f6196e9316898e3258e08d8f2ece8fe9640676ca..36b24cfdfa6c61d952a5c704f54d37f2b0fdd34e 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -8906,6 +8906,76 @@ (set_attr "type" "mov_reg")] ) +(define_insn "yield" + [(unspec_volatile [(const_int 0)] VUNSPEC_YIELD)] + "" +{ + if (TARGET_ARM) +return ".inst\t0xe320f001\t//yield"; + else if(TARGET_THUMB2) There should be a space after the if. Likewise for all the other instructions. +return ".inst\t0xf3af8001\t//yield"; + else /* TARGET_THUMB1 */ +return ".inst\t0xbf10\t//yield"; +} + [(set_attr "type" "coproc")] Can you please explain the coproc attribute. Also I think maybe you can use the "length" attribute here. Likewise for all the other instructions. Finally, for the tests why not combine the tests like the AArch64 patch where all the intrinsics were tested in the same file with common testing options? You could have only three new files for all the testing? Thanks Sudi +) + > > >
Re: [PATCH 1/2][GCC][AArch64] Implement hint intrinsics for AArch64
Hi Srinath On 10/01/19 19:20, Srinath Parvathaneni wrote: > Hi All, > > This patch implements the ACLE hint intrinsics (nop, yield, wfe, wfi, > sev and sevl), for AArch64. > > The instructions are documented in the ArmARM[1] and the intrinsics > specification will be > published on the Arm website [2]. > > [1] > https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile > [2] > http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053c/IHI0053C_acle_2_0.pdf > > Bootstrapped on aarch64-none-linux-gnu and regression tested on > aarch64-none-elf with no regressions. > > Ok for trunk? If ok, could someone commit the patch on my behalf, I > don't have commit rights. > > Thanks, > Srinath > > gcc/ChangeLog: > > 2019-01-10 Srinath Parvathaneni > > * config/aarch64/aarch64.md (yield): New pattern name. > (wfe): Likewise. > (wfi): Likewise. > (sev): Likewise. > (sevl): Likewise. > (UNSPECV_YIELD): New volatile unspec. > (UNSPECV_WFE): Likewise. > (UNSPECV_WFI): Likewise. > (UNSPECV_SEV): Likewise. > (UNSPECV_SEVL): Likewise. > * config/aarch64/aarch64-builtins.c (aarch64_builtins): > AARCH64_SYSHINTOP_BUILTIN_NOP: New builtin. > AARCH64_SYSHINTOP_BUILTIN_YIELD: Likewise. > AARCH64_SYSHINTOP_BUILTIN_WFE: Likewise. > AARCH64_SYSHINTOP_BUILTIN_WFI: Likewise. > AARCH64_SYSHINTOP_BUILTIN_SEV: Likewise. > AARCH64_SYSHINTOP_BUILTIN_SEVL: Likewise. > (aarch64_init_syshintop_builtins): New function. > (aarch64_init_builtins): New call statement. > (aarch64_expand_builtin): New case. > * config/aarch64/arm_acle.h (__nop ): New inline function. > (__yield): Likewise. > (__sev): Likewise. > (__sevl): Likewise. > (__wfi): Likewise. > (__wfe): Likewise. > > gcc/testsuite/ChangeLog: > > 2019-01-10 Srinath Parvathaneni > > * gcc.target/aarch64/acle/hint-1.c: New test. > * gcc.target/aarch64/acle/hint-2.c: Likewise. > > Thank you for doing this and I am not a maintainer. 
I have some comments bellow: diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index 8cced94567008e28b1761ec8771589a3925f2904..d5424f98df1f5c8f206cbded097bdd2dfcd1ca8e 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -399,6 +399,13 @@ enum aarch64_builtins AARCH64_PAUTH_BUILTIN_AUTIA1716, AARCH64_PAUTH_BUILTIN_PACIA1716, AARCH64_PAUTH_BUILTIN_XPACLRI, + /* System Hint Operation Builtins for AArch64. */ + AARCH64_SYSHINTOP_BUILTIN_NOP, + AARCH64_SYSHINTOP_BUILTIN_YIELD, + AARCH64_SYSHINTOP_BUILTIN_WFE, + AARCH64_SYSHINTOP_BUILTIN_WFI, + AARCH64_SYSHINTOP_BUILTIN_SEV, + AARCH64_SYSHINTOP_BUILTIN_SEVL, AARCH64_BUILTIN_MAX }; Is there any reason for the naming? They don't seem to be part of any extensions? IMHO AARCH64_BUILTIN_NOP, etc looks cleaner and follows other builtins which are not part of any extensions. ... @@ -1395,6 +1436,29 @@ aarch64_expand_builtin (tree exp, } return target; +case AARCH64_SYSHINTOP_BUILTIN_NOP: + emit_insn (GEN_FCN (CODE_FOR_nop) ()); + return gen_reg_rtx (VOIDmode); + Needs a newline before the new case. ... +(define_insn "yield" + [(unspec_volatile [(const_int 0)] UNSPECV_YIELD)] + "" + "yield" + [(set_attr "type" "coproc")] +) I don't believe setting the type to coproc in AArch64 is correct. Likewise for the other instructions. ... +/* Test the nop ACLE hint intrinsic */ +/* { dg-do compile } */ +/* { dg-additional-options "-O0" } */ +/* { dg-options "-march=armv8-a" } */ + +#include "arm_acle.h" + +void +test_hint (void) +{ + __nop (); +} + +/* { dg-final { scan-assembler-times "\tnop" 3 } } */ Just curious, why are there 3 nops here? Thanks Sudi > > > >
[Committed, GCC, AArch64] Disable tests for ilp32.
Hi Currently Return Address Signing is only supported in lp64. Thus the tests that I added recently (that enables return address signing by the mbranch-protection=standard option), should also be exempted from testing in ilp32. This patch adds the needed dg-require-effective-target directive in the tests. *** gcc/testsuite/ChangeLog *** 2019-01-10 Sudakshina Das * gcc.target/aarch64/bti-1.c: Exempt for ilp32. * gcc.target/aarch64/bti-2.c: Likewise. * gcc.target/aarch64/bti-3.c: Likewise. Only test directive change, hence only tested the above tests with: RUNTESTFLAGS="--target_board \"unix{-mabi=ilp32}\" aarch64.exp=" Committed as obvious as r267818 Thanks Sudi diff --git a/gcc/testsuite/gcc.target/aarch64/bti-1.c b/gcc/testsuite/gcc.target/aarch64/bti-1.c index 975528cbf290af421f20d8c7edaef22a6bd6..5a556b08ed15679b25676a11fe9c7a64641ee671 100644 --- a/gcc/testsuite/gcc.target/aarch64/bti-1.c +++ b/gcc/testsuite/gcc.target/aarch64/bti-1.c @@ -1,6 +1,7 @@ /* { dg-do compile } */ /* -Os to create jump table. */ /* { dg-options "-Os" } */ +/* { dg-require-effective-target lp64 } */ /* If configured with --enable-standard-branch-protection, don't use command line option. */ /* { dg-additional-options "-mbranch-protection=standard" { target { ! default_branch_protection } } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/bti-2.c b/gcc/testsuite/gcc.target/aarch64/bti-2.c index 85943c3d6415b010c858cb948221e33b0d30a310..6ad89284e1b74ec92ff4661e6a71c92230450d58 100644 --- a/gcc/testsuite/gcc.target/aarch64/bti-2.c +++ b/gcc/testsuite/gcc.target/aarch64/bti-2.c @@ -1,4 +1,5 @@ /* { dg-do run } */ +/* { dg-require-effective-target lp64 } */ /* { dg-require-effective-target aarch64_bti_hw } */ /* If configured with --enable-standard-branch-protection, don't use command line option. 
*/ diff --git a/gcc/testsuite/gcc.target/aarch64/bti-3.c b/gcc/testsuite/gcc.target/aarch64/bti-3.c index 97cf5d37f42b9313da75481c2ceac884735ac995..9ff9f9d6be1d8708f34f50dc7303a1783c18f204 100644 --- a/gcc/testsuite/gcc.target/aarch64/bti-3.c +++ b/gcc/testsuite/gcc.target/aarch64/bti-3.c @@ -1,6 +1,7 @@ /* This is a copy of gcc/testsuite/gcc.c-torture/execute/pr56982.c to test the setjmp case of the bti pass. */ /* { dg-do run } */ +/* { dg-require-effective-target lp64 } */ /* { dg-require-effective-target aarch64_bti_hw } */ /* { dg-options "--save-temps -mbranch-protection=standard" } */
Re: [PATCH, GCC, AARCH64, 5/6] Enable BTI : Add new pass for BTI.
Hi On 20/12/18 16:40, Sudakshina Das wrote: > Hi James > > On 19/12/18 3:40 PM, James Greenhalgh wrote: >> On Fri, Dec 14, 2018 at 10:09:03AM -0600, Sudakshina Das wrote: >> >> >> >>> I have updated the patch according to our discussions offline. >>> The md pattern is now split into 4 patterns and i have added a new >>> test for the setjmp case along with some comments where missing. >> >> This is OK for trunk. >> > > Thanks for the approvals. With this my series is ready to go in trunk. I > will wait for Sam's options patch to go in trunk before I commit mine. > Series is committed with a rebase without Sam Tebbs's 3rd patch for B-Key addition as r267765 to r267770. Thanks Sudi > Thanks > Sudi > >> Thanks, >> James >> >>> *** gcc/ChangeLog *** >>> >>> 2018-xx-xx Sudakshina Das >>> Ramana Radhakrishnan >>> >>> * config.gcc (aarch64*-*-*): Add aarch64-bti-insert.o. >>> * gcc/config/aarch64/aarch64.h: Update comment for >>> TRAMPOLINE_SIZE. >>> * config/aarch64/aarch64.c (aarch64_asm_trampoline_template): >>> Update if bti is enabled. >>> * config/aarch64/aarch64-bti-insert.c: New file. >>> * config/aarch64/aarch64-passes.def (INSERT_PASS_BEFORE): Insert >>> bti pass. >>> * config/aarch64/aarch64-protos.h (make_pass_insert_bti): >>> Declare the new bti pass. >>> * config/aarch64/aarch64.md (unspecv): Add UNSPECV_BTI_NOARG, >>> UNSPECV_BTI_C, UNSPECV_BTI_J and UNSPECV_BTI_JC. >>> (bti_noarg, bti_j, bti_c, bti_jc): New define_insns. >>> * config/aarch64/t-aarch64: Add rule for aarch64-bti-insert.o. >>> >>> *** gcc/testsuite/ChangeLog *** >>> >>> 2018-xx-xx Sudakshina Das >>> >>> * gcc.target/aarch64/bti-1.c: New test. >>> * gcc.target/aarch64/bti-2.c: New test. >>> * gcc.target/aarch64/bti-3.c: New test. >>> * lib/target-supports.exp >>> (check_effective_target_aarch64_bti_hw): Add new check for >>> BTI hw. >>> >>> Thanks >>> Sudi
Re: [PATCH][GCC][Aarch64] Change expected bfxil count in gcc.target/aarch64/combine_bfxil.c to 18 (PR/87763)
Hi Sam On 04/01/19 10:26, Sam Tebbs wrote: > > On 12/19/18 4:47 PM, Sam Tebbs wrote: > >> Hi all, >> >> Since r265398 (combine: Do not combine moves from hard registers), the bfxil >> scan in gcc.target/aarch64/combine_bfxil.c has been failing. >> >> FAIL: gcc.target/aarch64/combine_bfxil.c scan-assembler-times bfxil\\t 13 >> >> This is because bfi was generated for the combine_* functions in the >> above test, >> but as of r265398, bfxil is preferred over bfi and so the bfxil count has >> increased. This patch increases the scan count to 18 to account for this so >> that the test passes. >> >> Before r265398 >> >> combine_zero_extended_int: >> bfxil x0, x1, 0, 16 >> ret >> >> combine_balanced: >> bfi x0, x1, 0, 32 >> ret >> >> combine_minimal: >> bfi x0, x1, 0, 1 >> ret >> >> combine_unbalanced: >> bfi x0, x1, 0, 24 >> ret >> >> combine_balanced_int: >> bfi w0, w1, 0, 16 >> ret >> >> combine_unbalanced_int: >> bfi w0, w1, 0, 8 >> ret >> >> With r265398 >> >> combine_zero_extended_int: >> bfxil x0, x1, 0, 16 >> ret >> >> combine_balanced: >> bfxil x0, x1, 0, 32 >> ret >> >> combine_minimal: >> bfxil x0, x1, 0, 1 >> ret >> >> combine_unbalanced: >> bfxil x0, x1, 0, 24 >> ret >> >> combine_balanced_int: >> bfxil w0, w1, 0, 16 >> ret >> >> combine_unbalanced_int: >> bfxil w0, w1, 0, 8 >> ret >> >> These bfxil and bfi invocations are equivalent, so this patch won't hide any >> incorrect code-gen. >> >> Bootstrapped on aarch64-none-linux-gnu and regression tested on >> aarch64-none-elf with no regressions. >> >> OK for trunk? >> I am not a maintainer but this looks ok to me on its own. However I see that you commented about this patch on PR87763. Can you please add the PR tag in your changelog entry. Also since I did not see anyone else comment on the PR after your comment, I am adding some of the people from the PR to the cc list. 
Thanks Sudi >> gcc/testsuite/Changelog: >> >> 2018-12-19 Sam Tebbs >> >> * gcc.target/aarch64/combine_bfxil.c: Change >> scan-assembler-times bfxil count to 18. > ping >
Re: Fix devirtualiation in expanded thunks
Hi Jan On 21/12/18 7:20 PM, Jan Hubicka wrote: > Hi, > this patch fixes polymorphic call analysis in thunks. Unlike normal > methods, thunks take THIS pointer offsetted by a known constant. This > needs t be compensated for when calculating address of outer type. > > Bootstrapped/regtested x86_64-linux, also tested with Firefox where this > bug trigger misoptimization in spellchecker. I plan to backport it to > release branches soon. > > Honza > > PR ipa/88561 > * ipa-polymorphic-call.c > (ipa_polymorphic_call_context::ipa_polymorphic_call_context): Handle > arguments of thunks correctly. > (ipa_polymorphic_call_context::get_dynamic_context): Be ready for > NULL instance pinter. > * lto-cgraph.c (lto_output_node): Always stream thunk info. > * g++.dg/tree-prof/devirt.C: New testcase. > Index: ipa-polymorphic-call.c > === > --- ipa-polymorphic-call.c(revision 267325) > +++ ipa-polymorphic-call.c(working copy) > @@ -995,9 +995,22 @@ ipa_polymorphic_call_context::ipa_polymo > { > outer_type >= TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (base_pointer))); > + cgraph_node *node = cgraph_node::get (current_function_decl); > gcc_assert (TREE_CODE (outer_type) == RECORD_TYPE > || TREE_CODE (outer_type) == UNION_TYPE); > > + /* Handle the case we inlined into a thunk. In this case > + thunk has THIS pointer of type bar, but it really receives > + address to its base type foo which sits in bar at > + 0-thunk.fixed_offset. It starts with code that adds > + think.fixed_offset to the pointer to compensate for this. > + > + Because we walked all the way to the begining of thunk, we now > + see pointer _offset and need to compensate > + for it. */ > + if (node->thunk.fixed_offset) > + offset -= node->thunk.fixed_offset * BITS_PER_UNIT; > + > /* Dynamic casting has possibly upcasted the type >in the hiearchy. 
In this case outer type is less >informative than inner type and we should forget > @@ -1005,7 +1018,11 @@ ipa_polymorphic_call_context::ipa_polymo > if ((otr_type > && !contains_type_p (outer_type, offset, > otr_type)) > - || !contains_polymorphic_type_p (outer_type)) > + || !contains_polymorphic_type_p (outer_type) > + /* If we compile thunk with virtual offset, the THIS pointer > + is adjusted by unknown value. We can't thus use outer info > + at all. */ > + || node->thunk.virtual_offset_p) > { > outer_type = NULL; > if (instance) > @@ -1030,7 +1047,15 @@ ipa_polymorphic_call_context::ipa_polymo > maybe_in_construction = false; > } > if (instance) > - *instance = base_pointer; > + { > + /* If method is expanded thunk, we need to apply thunk offset > + to instance pointer. */ > + if (node->thunk.virtual_offset_p > + || node->thunk.fixed_offset) > + *instance = NULL; > + else > + *instance = base_pointer; > + } > return; > } > /* Non-PODs passed by value are really passed by invisible > @@ -1547,6 +1572,9 @@ ipa_polymorphic_call_context::get_dynami > HOST_WIDE_INT instance_offset = offset; > tree instance_outer_type = outer_type; > > + if (!instance) > +return false; > + > if (otr_type) > otr_type = TYPE_MAIN_VARIANT (otr_type); > > Index: lto-cgraph.c > === > --- lto-cgraph.c (revision 267325) > +++ lto-cgraph.c (working copy) > @@ -547,7 +547,11 @@ lto_output_node (struct lto_simple_outpu > streamer_write_bitpack (); > streamer_write_data_stream (ob->main_stream, section, strlen (section) + > 1); > > - if (node->thunk.thunk_p) > + /* Stream thunk info always because we use it in > + ipa_polymorphic_call_context::ipa_polymorphic_call_context > + to properly interpret THIS pointers for thunks that has been converted > + to Gimple. 
*/ > + if (node->definition) > { > streamer_write_uhwi_stream >(ob->main_stream, > @@ -1295,7 +1299,7 @@ input_node (struct lto_file_decl_data *f > if (section) > node->set_section_for_node (section); > > - if (node->thunk.thunk_p) > + if (node->definition) > { > int type = streamer_read_uhwi (ib); > HOST_WIDE_INT fixed_offset = streamer_read_uhwi (ib); > Index: testsuite/g++.dg/tree-prof/devirt.C > === > --- testsuite/g++.dg/tree-prof/devirt.C (nonexistent) > +++ testsuite/g++.dg/tree-prof/devirt.C (working copy) > @@ -0,0
Re: GCC 8 backports
Hi Martin On 27/12/18 12:32 PM, Martin Liška wrote: > On 11/20/18 11:58 AM, Martin Liška wrote: >> On 10/3/18 11:23 AM, Martin Liška wrote: >>> On 9/25/18 8:48 AM, Martin Liška wrote: Hi. One more tested patch. Martin >>> One more tested patch. >>> >>> Martin >>> >> Hi. >> >> One another tested patch that I'm going to install. >> >> Martin >> > Hi. > > One another tested patch that I'm going to install. > > Thanks, > Martin The last backport of r267338 causes the following failures on arm-none-linux-gnueabihf and aarch64-none-linux-gnu UNRESOLVED: g++.dg/tree-prof/devirt.C scan-ipa-dump-times dom3 "3" folding virtual function call to virtual unsigned int mozPersonalDictionary::AddRef UNRESOLVED: g++.dg/tree-prof/devirt.C scan-ipa-dump-times dom3 "3" folding virtual function call to virtual unsigned int mozPersonalDictionary::_ZThn16 with g++.dg/tree-prof/devirt.C: dump file does not exist Thanks Sudi
Re: [PATCH] PR fortran/81509 and fortran/45513
Hi Steve On 27/12/18 8:58 PM, Steve Kargl wrote: > On Thu, Dec 27, 2018 at 11:24:07AM +0000, Sudakshina Das wrote: >> With the failure as: >> >> Excess errors: >> /build/src/gcc/libgomp/testsuite/libgomp.fortran/aligned1.f03:55:14: >> Error: Arguments of 'iand' have different kind type parameters at (1) >> /build/src/gcc/libgomp/testsuite/libgomp.fortran/aligned1.f03:59:14: >> Error: Arguments of 'iand' have different kind type parameters at (1) >> > This should be fixed, now. Sorry about the breakage. Thanks for the quick fix! Sudi
Re: [PATCH] PR fortran/81509 and fortran/45513
Hi Steve On 23/12/18 6:49 PM, Steve Kargl wrote: > This is a re-submission of a patch I submitted 15 months ago. > See https://gcc.gnu.org/ml/fortran/2017-09/msg00124.html > > At that time one reviewer OK'd the patch for committing, > and one reviewer raised objections to the patch as I > chose to remove dubious extensions to the Fortran standard. > I withdrew that patch with the expection that Someone > would fix the bug. Well, Someone has not materialized. > > The patch has been retested on i586-*-freebsd and x86_64-*-freebsd. > > OK to commit as-is? > > Here's the text from the above URL. > > In short, F2008 now allows boz-literal-constants in IAND, IOR, IEOR, > DSHIFTL, DSHIFTR, and MERGE_BITS. gfortran currently allows a BOZ > argument, but she was not enforcing restrictions in F2008. The > attach patch causes gfortran to conform to F2008. > > As a side effect, the patch removes a questionable GNU Fortran > extension that allowed arguments to IAND, IOR, and IEOR to have > different kind type parameters. The behavior of this extension > was not documented. > > 2017-09-27 Steven G. Kargl > > PR fortran/45513 > PR fortran/81509 > * check.c: Rename function gfc_check_iand to gfc_check_iand_ieor_ior. > * check.c (boz_args_check): New function. Check I and J not both BOZ. > (gfc_check_dshift,gfc_check_iand_ieor_ior, gfc_check_ishft, >gfc_check_and, gfc_check_merge_bits): Use it. > * check.c (gfc_check_iand_ieor_ior): Force conversion of BOZ to kind > type of other agrument. Remove silly GNU extension. > (gfc_check_ieor, gfc_check_ior): Delete now unused functions. > * intrinsic.c (add_functions): Use gfc_check_iand_ieor_ior. Wrap long > line. > * intrinsic.h: Rename gfc_check_iand to gfc_check_iand_ieor_ior. > Delete prototype for bool gfc_check_ieor and gfc_check_ior > * intrinsic.texi: Update documentation for boz-literal-constant. > > 2017-09-27 Steven G. 
Kargl > > PR fortran/45513 > PR fortran/81509 > * gfortran.dg/graphite/id-26.f03: Fix non-conforming use of IAND. > * gfortran.dg/pr81509_1.f90: New test. > * gfortran.dg/pr81509_2.f90: New test. > This patch has caused the following failures on aarch64-none-linux-gnu: FAIL: libgomp.fortran/aligned1.f03 -O0 (test for excess errors) FAIL: libgomp.fortran/aligned1.f03 -O1 (test for excess errors) FAIL: libgomp.fortran/aligned1.f03 -O2 (test for excess errors) FAIL: libgomp.fortran/aligned1.f03 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors) FAIL: libgomp.fortran/aligned1.f03 -O3 -g (test for excess errors) FAIL: libgomp.fortran/aligned1.f03 -Os (test for excess errors) With the failure as: Excess errors: /build/src/gcc/libgomp/testsuite/libgomp.fortran/aligned1.f03:55:14: Error: Arguments of 'iand' have different kind type parameters at (1) /build/src/gcc/libgomp/testsuite/libgomp.fortran/aligned1.f03:59:14: Error: Arguments of 'iand' have different kind type parameters at (1) Thanks Sudi
Re: [Committed] XFAIL gfortran.dg/ieee/ieee_9.f90
Hi On 25/12/18 5:13 PM, Steve Kargl wrote: > On Tue, Dec 25, 2018 at 09:51:03AM +0200, Janne Blomqvist wrote: >> On Mon, Dec 24, 2018 at 9:42 PM Steve Kargl < >> s...@troutmask.apl.washington.edu> wrote: >> >>> On Mon, Dec 24, 2018 at 09:29:50PM +0200, Janne Blomqvist wrote: On Mon, Dec 24, 2018 at 8:05 PM Steve Kargl < s...@troutmask.apl.washington.edu> wrote: > I've added the following patch to a recently committed testcase. > > Index: gcc/testsuite/gfortran.dg/ieee/ieee_9.f90 > === > --- gcc/testsuite/gfortran.dg/ieee/ieee_9.f90 (revision 267413) > +++ gcc/testsuite/gfortran.dg/ieee/ieee_9.f90 (working copy) > @@ -1,4 +1,4 @@ > -! { dg-do run } > +! { dg-do run { xfail arm*-*-gnueabi arm*-*-gnueabihf } } > program foo > use ieee_arithmetic > use iso_fortran_env > The problem seems to be that GFortran says the real128 kind value is > 0 (i.e. that the target supports quad precision floating point (with >>> software emulation, presumably)), but then trying to use it fails. Would be nice if somebody who cares about arm-none-linux-gnueabihf could help figure out the proper resolution instead of papering over it with XFAIL. But I guess XFAIL is good enough until said somebody turns up. >>> Thanks for chasing down the details. I have no access to arm*-*-*. >>> >>> It's a shame the real128 is defined, and arm*-*-* doesn't >>> actually use it. I certainly have no time or interest in >>> fix this. >>> >> I think there are arm systems on the compile farm, but I haven't actually >> checked myself, just going by the error messages Sudi Das reported. >> >> That being said, having slept over it, I actually think there is a problem >> with the testcase, and not with arm*. 
So the errors in the testcase occurs >> in code like >> >> if (real128 > 0) then >> p = int(ieee_scalb(real(x, real128), int(i, int8))) >> if (p /= 64) stop 3 >> end if >> >> So if real128 is negative, as it should be if the target doesn't support >> quad precision float, the branch will never be taken, but the frontend will >> still generate code for it (though it will later be optimized away as >> unreachable), and that's where the error occurs. So the testcase would need >> something like >> >> integer, parameter :: large_real = max (real64, real128) >> ! ... >> if (real128 > 0) then >> p = int(ieee_scalb(real(x, large_real), int(i, int8))) >> if (p /= 64) stop 3 >> end if >> >> If you concur, please consider a patch fixing the testcase and removing the >> xfail pre-approved. >> > Indeed, you are probably correct that gfortran will generate > intermediate code and then garbage collect it. This then will > give an error for real(..., real128) in the statement for p. > If real128 /= 4, 8, 10, or 16. I'll fix the testcase. > > Do you know if we can get gfortran to pre-define macros for cpp? > That is, it would be nice if gfortran would recognize, say, > HAVE_GFC_REAL_10 and HAVE_GFC_REAL_16 if the target supports those > types. Then the testcase could be copied to ieee_9.F90, and > modified to > > #ifdef HAVE_REAL_16 > p = int(ieee_scalb(real(x, 16), int(i, int8))) > if (p /= 64) stop 3 > #endif > Thanks for looking into this. Sorry I was on holiday for Christmas. CC'ing Arm maintainers in case they have something to add. Thanks Sudi
Re: Fix devirtualization with LTO
Hi Jan On 22/12/18 8:08 PM, Jan Hubicka wrote: > Hi, > while fixing Firefox issues I also noticed that type simplification > completely disabled type based devirtualization on LTO path. Problem > is that method pointers now point to simplified type and > obj_type_ref_class is not ready for that. > > I also moved testcases where it makes sense to lto so this does not > happen again. This is not trivial task since one needs to work out why > testcases behaves differently when they do, so I will follow up on this > and convert more. > > Bootstrapped/regtested x86_64-linux, comitted. > > Honza > > * tree.c: (obj_type_ref_class): Move to... > * ipa-devirt.c (obj_type_ref_class): Move to here; lookup main > odr type. > (get_odr_type): Compensate for type simplification. > > * g++.dg/ipa/devirt-30.C: Add dg-do. > * g++.dg/lto/devirt-1_0.C: New testcase. > * g++.dg/lto/devirt-2_0.C: New testcase. > * g++.dg/lto/devirt-3_0.C: New testcase. > * g++.dg/lto/devirt-4_0.C: New testcase. > * g++.dg/lto/devirt-5_0.C: New testcase. > * g++.dg/lto/devirt-6_0.C: New testcase. > * g++.dg/lto/devirt-13_0.C: New testcase. > * g++.dg/lto/devirt-14_0.C: New testcase. > * g++.dg/lto/devirt-19_0.C: New testcase. > * g++.dg/lto/devirt-22_0.C: New testcase. > * g++.dg/lto/devirt-23_0.C: New testcase. > * g++.dg/lto/devirt-30_0.C: New testcase. > * g++.dg/lto/devirt-34_0.C: New testcase. 
> I am seeing the following failures on aarch64-none-elf, aarch64-none-linux-gnu, aarch64_be-none-elf, arm-none-eabi, arm-none-linux-gnueabihf: UNRESOLVED: g++-dg-lto-devirt-13-01.exe scan-tree-dump-times ssa "OBJ_TYPE_REF" 0 UNRESOLVED: g++-dg-lto-devirt-13-11.exe scan-tree-dump-times ssa "OBJ_TYPE_REF" 0 UNRESOLVED: g++-dg-lto-devirt-13-21.exe scan-tree-dump-times ssa "OBJ_TYPE_REF" 0 UNRESOLVED: g++-dg-lto-devirt-14-01.exe scan-tree-dump-not ssa "A.*foo" UNRESOLVED: g++-dg-lto-devirt-14-11.exe scan-tree-dump-not ssa "A.*foo" UNRESOLVED: g++-dg-lto-devirt-23-01.exe scan-wpa-ipa-dump cp "Discovered a virtual call to" With an error like: g++-dg-lto-devirt-14-11.exe: dump file does not exist In my brief attempt, I can see that the scan-dump* routines are computing the wrong base name. I get the following if I edit diff --git a/gcc/testsuite/lib/scandump.exp b/gcc/testsuite/lib/scandump.exp index 3d42692..5961623 100644 --- a/gcc/testsuite/lib/scandump.exp +++ b/gcc/testsuite/lib/scandump.exp @@ -160,7 +160,7 @@ proc scan-dump-not { args } { set dumpbase [dump-base $src [lindex $args 3]] set output_file "[glob -nocomplain $dumpbase.[lindex $args 2]]" if { $output_file == "" } { - verbose -log "$testcase: dump file does not exist" + verbose -log "$testcase: dump file does not exist $dumpbase" unresolved "$testname" return } g++-dg-lto-devirt-14-11.exe: dump file does not exist g++-dg-lto-devirt-14-11.exe UNRESOLVED: g++-dg-lto-devirt-14-11.exe scan-tree-dump-not ssa "A.*foo Thanks Sudi > Index: ipa-devirt.c > === > --- ipa-devirt.c (revision 267337) > +++ ipa-devirt.c (working copy) > @@ -1985,6 +1985,30 @@ add_type_duplicate (odr_type val, tree t > return build_bases; > } > > +/* REF is OBJ_TYPE_REF, return the class the ref corresponds to. 
*/ > + > +tree > +obj_type_ref_class (const_tree ref) > +{ > + gcc_checking_assert (TREE_CODE (ref) == OBJ_TYPE_REF); > + ref = TREE_TYPE (ref); > + gcc_checking_assert (TREE_CODE (ref) == POINTER_TYPE); > + ref = TREE_TYPE (ref); > + /* We look for type THIS points to. ObjC also builds > + OBJ_TYPE_REF with non-method calls, Their first parameter > + ID however also corresponds to class type. */ > + gcc_checking_assert (TREE_CODE (ref) == METHOD_TYPE > +|| TREE_CODE (ref) == FUNCTION_TYPE); > + ref = TREE_VALUE (TYPE_ARG_TYPES (ref)); > + gcc_checking_assert (TREE_CODE (ref) == POINTER_TYPE); > + tree ret = TREE_TYPE (ref); > + if (!in_lto_p) > +ret = TYPE_CANONICAL (ret); > + else > +ret = get_odr_type (ret)->type; > + return ret; > +} > + > /* Get ODR type hash entry for TYPE. If INSERT is true, create > possibly new entry. */ > > @@ -2000,6 +2024,8 @@ get_odr_type (tree type, bool insert) > int base_id = -1; > > type = TYPE_MAIN_VARIANT (type); > + if (!in_lto_p) > +type = TYPE_CANONICAL (type); > > gcc_checking_assert (can_be_name_hashed_p (type) > || can_be_vtable_hashed_p (type)); > Index: testsuite/g++.dg/ipa/devirt-30.C > === > --- testsuite/g++.dg/ipa/devirt-30.C (revision 267337) > +++ testsuite/g++.dg/ipa/devirt-30.C (working copy) > @@ -1,4 +1,5 @@ > // PR c++/58678 > +// { dg-do compile } > // { dg-options "-O3 -fdump-ipa-devirt" } > > // We shouldn't
Re: [PATCH] fortran/69121 -- Make IEEE_SCALB generic
Hi Steve On 21/12/18 8:01 PM, Steve Kargl wrote: > On Fri, Dec 21, 2018 at 07:39:45PM +0000, Joseph Myers wrote: >> On Fri, 21 Dec 2018, Steve Kargl wrote: >> >>> scalbln(double x, long n) >>> { >>> >>> return (scalbn(x, (n > NMAX) ? NMAX : (n < NMIN) ? NMIN : (int)n)); >>> } >>> >>> A search for glibc's libm locates https://tinyurl.com/ybcy8w4t >>> which is a bit-twiddling routine. Not sure it's worth the >>> effort. Joseph Myers might have an opinion. >> Such comparisons are needed in the scalbn / scalbln implementations anyway >> to deal with large exponents. I suppose where there's a suitable scalbln >> implementation, and you don't know if the arguments are within the range >> of int, calling scalbln at least saves code size in the caller and avoids >> duplicating those range checks. >> > I was thinking along the lines of -ffast-math and whether > __builtin_scalbn and __builtin_scalbln are then inlined. > The comparisons may inhibit inlining __builtin_scalbn; > while, if gfortran used __builtin_scalbln, inlining would > occur.
> > As it is, for > > function foo(x,i) > use ieee_arithmetic > real(8) foo, c > integer(8) i > foo = ieee_scalb(c, i) > end function foo > > the options -ffast-math -O3 -fdump-tree-optimized give > > [local count: 1073741824]: >_gfortran_ieee_procedure_entry (); >_8 = *i_7(D); >_1 = MIN_EXPR <_8, 2147483647>; >_2 = MAX_EXPR <_1, -2147483647>; >_3 = (integer(kind=4)) _2; >_4 = __builtin_scalbn (c_9(D), _3); >_gfortran_ieee_procedure_exit (); >fpstate.0 ={v} {CLOBBER}; >return _4; > > It seems this could be > > [local count: 1073741824]: >_gfortran_ieee_procedure_entry (); >_3 = (integer(kind=4)) *i_7(D); >_4 = __builtin_scalbn (c_9(D), _3 >_gfortran_ieee_procedure_exit (); >fpstate.0 ={v} {CLOBBER}; > I am observing your new test pr88328.f90 failing on arm-none-linux-gnueabihf: Excess errors: /build/src/gcc/gcc/testsuite/gfortran.dg/ieee/ieee_9.f90:20:36: Error: Invalid kind for REAL at (1) /build/src/gcc/gcc/testsuite/gfortran.dg/ieee/ieee_9.f90:35:36: Error: Invalid kind for REAL at (1) /build/src/gcc/gcc/testsuite/gfortran.dg/ieee/ieee_9.f90:50:36: Error: Invalid kind for REAL at (1) /build/src/gcc/gcc/testsuite/gfortran.dg/ieee/ieee_9.f90:65:36: Error: Invalid kind for REAL at (1)
Re: [PATCH, GCC, AARCH64, 5/6] Enable BTI : Add new pass for BTI.
Hi James On 19/12/18 3:40 PM, James Greenhalgh wrote: > On Fri, Dec 14, 2018 at 10:09:03AM -0600, Sudakshina Das wrote: > > > >> I have updated the patch according to our discussions offline. >> The md pattern is now split into 4 patterns and i have added a new >> test for the setjmp case along with some comments where missing. > > This is OK for trunk. > Thanks for the approvals. With this my series is ready to go in trunk. I will wait for Sam's options patch to go in trunk before I commit mine. Thanks Sudi > Thanks, > James > >> *** gcc/ChangeLog *** >> >> 2018-xx-xx Sudakshina Das >> Ramana Radhakrishnan >> >> * config.gcc (aarch64*-*-*): Add aarch64-bti-insert.o. >> * gcc/config/aarch64/aarch64.h: Update comment for >> TRAMPOLINE_SIZE. >> * config/aarch64/aarch64.c (aarch64_asm_trampoline_template): >> Update if bti is enabled. >> * config/aarch64/aarch64-bti-insert.c: New file. >> * config/aarch64/aarch64-passes.def (INSERT_PASS_BEFORE): Insert >> bti pass. >> * config/aarch64/aarch64-protos.h (make_pass_insert_bti): >> Declare the new bti pass. >> * config/aarch64/aarch64.md (unspecv): Add UNSPECV_BTI_NOARG, >> UNSPECV_BTI_C, UNSPECV_BTI_J and UNSPECV_BTI_JC. >> (bti_noarg, bti_j, bti_c, bti_jc): New define_insns. >> * config/aarch64/t-aarch64: Add rule for aarch64-bti-insert.o. >> >> *** gcc/testsuite/ChangeLog *** >> >> 2018-xx-xx Sudakshina Das >> >> * gcc.target/aarch64/bti-1.c: New test. >> * gcc.target/aarch64/bti-2.c: New test. >> * gcc.target/aarch64/bti-3.c: New test. >> * lib/target-supports.exp >> (check_effective_target_aarch64_bti_hw): Add new check for >> BTI hw. >> >> Thanks >> Sudi
Re: [PATCH, GCC, AARCH64, 5/6] Enable BTI : Add new pass for BTI.
Hi James On 29/11/18 16:47, Sudakshina Das wrote: > Hi > > On 13/11/18 14:47, Sudakshina Das wrote: >> Hi >> >> On 02/11/18 18:38, Sudakshina Das wrote: >>> Hi >>> >>> This patch is part of a series that enables ARMv8.5-A in GCC and >>> adds Branch Target Identification Mechanism. >>> (https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools) >>> >>> This patch adds a new pass called "bti" which is triggered by the >>> command line argument -mbranch-protection whenever "bti" is turned on. >>> >>> The pass iterates through the instructions and adds appropriated BTI >>> instructions based on the following: >>>* Add a new "BTI C" at the beginning of a function, unless its >>> already >>> protected by a "PACIASP/PACIBSP". We exempt the functions that are >>> only called directly. >>>* Add a new "BTI J" for every target of an indirect jump, jump table >>> targets, non-local goto targets or labels that might be referenced >>> by variables, constant pools, etc (NOTE_INSN_DELETED_LABEL) >>> >>> Since we have already changed the use of indirect tail calls to only x16 >>> and x17, we do not have to use "BTI JC". >>> (check patch 3/6). >>> >> >> I missed out on the explanation for the changes to the trampoline code. >> The patch also updates the trampoline code in case BTI is enabled. Since >> the trampoline code is a target of an indirect branch, we need to add an >> appropriate BTI instruction at the beginning of it to avoid a branch >> target exception. >> >>> Bootstrapped and regression tested with aarch64-none-linux-gnu. Added >>> new tests. >>> Is this ok for trunk? >>> >>> Thanks >>> Sudi >>> >>> *** gcc/ChangeLog *** >>> >>> 2018-xx-xx Sudakshina Das >>> Ramana Radhakrishnan >>> >>> * config.gcc (aarch64*-*-*): Add aarch64-bti-insert.o. >>> * gcc/config/aarch64/aarch64.h: Update comment for >>> TRAMPOLINE_SIZE. >>> * config/aarch64/aarch64.c (aarch64_asm_trampoline_template): >>> Update if bti is enabled. 
>>> * config/aarch64/aarch64-bti-insert.c: New file. >>> * config/aarch64/aarch64-passes.def (INSERT_PASS_BEFORE): Insert >>> bti pass. >>> * config/aarch64/aarch64-protos.h (make_pass_insert_bti): >>> Declare the new bti pass. >>> * config/aarch64/aarch64.md (bti_nop): Define. >>> * config/aarch64/t-aarch64: Add rule for aarch64-bti-insert.o. >>> >>> *** gcc/testsuite/ChangeLog *** >>> >>> 2018-xx-xx Sudakshina Das >>> >>> * gcc.target/aarch64/bti-1.c: New test. >>> * gcc.target/aarch64/bti-2.c: New test. >>> * lib/target-supports.exp >>> (check_effective_target_aarch64_bti_hw): Add new check for >>> BTI hw. >>> >> >> Updated patch attached with more comments and a bit of simplification >> in aarch64-bti-insert.c. ChangeLog still applies. >> >> Thanks >> Sudi >> > > I found a missed case in the bti pass and edited the patch to include > it. This made me realize that the only 2 regressions I saw with the > BTI enabled model can now be avoided. (as quoted below from my 6/6 > patch) > "Bootstrapped and regression tested with aarch64-none-linux-gnu with > and without the configure option turned on. > Also tested on aarch64-none-elf with and without configure option with a > BTI enabled aem. Only 2 regressions and these were because newlib > requires patches to protect hand coded libraries with BTI." > > The ChangeLog still applies. > > Sudi > I have updated the patch according to our discussions offline. The md pattern is now split into 4 patterns and i have added a new test for the setjmp case along with some comments where missing. *** gcc/ChangeLog *** 2018-xx-xx Sudakshina Das Ramana Radhakrishnan * config.gcc (aarch64*-*-*): Add aarch64-bti-insert.o. * gcc/config/aarch64/aarch64.h: Update comment for TRAMPOLINE_SIZE. * config/aarch64/aarch64.c (aarch64_asm_trampoline_template): Update if bti is enabled. * config/aarch64/aarch64-bti-insert.c: New file. * config/aarch64/aarch64-passes.def (INSERT_PASS_BEFORE): Insert bti pass. 
* config/aarch64/aarch64-protos.h (make_pass_insert_bti): Declare the new bti pass.
Re: [PATCH, GCC, AARCH64, 3/6] Restrict indirect tail calls to x16 and x17
Hi On 02/11/18 18:37, Sudakshina Das wrote: > Hi > > This patch is part of a series that enables ARMv8.5-A in GCC and > adds Branch Target Identification Mechanism. > (https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools) > > This patch changes the registers that are allowed for indirect tail > calls. We are choosing to restrict these to only x16 or x17. > > Indirect tail calls are special in a way that they convert a call > statement (BLR instruction) to a jump statement (BR instruction). For > the best possible use of Branch Target Identification Mechanism, we > would like to place a "BTI C" (call) at the beginning of the function > which is only compatible with BLRs and BR X16/X17. In order to make > indirect tail calls compatible with this scenario, we are restricting > the TAILCALL_ADDR_REGS. > > In order to use x16/x17 for this purpose, we also had to change the use > of these registers in the epilogue/prologue handling. For this purpose > we are now using x12 and x13 named as EP0_REGNUM and EP1_REGNUM as > scratch registers for epilogue and prologue. > > Bootstrapped and regression tested with aarch64-none-linux-gnu. Updated > test. Ran Spec2017 and no performance hit. > > Is this ok for trunk? > > Thanks > Sudi > > > *** gcc/ChangeLog*** > > 2018-xx-xx Sudakshina Das > >* config/aarch64/aarch64.c (aarch64_expand_prologue): Use new >epilogue/prologue scratch registers EP0_REGNUM and EP1_REGNUM. >(aarch64_expand_epilogue): Likewise. >(aarch64_output_mi_thunk): Likewise >* config/aarch64/aarch64.h (REG_CLASS_CONTENTS): Change > TAILCALL_ADDR_REGS >to x16 and x17. > * config/aarch64/aarch64.md: Define EP0_REGNUM and EP1_REGNUM. > > *** gcc/testsuite/ChangeLog *** > > 2018-xx-xx Sudakshina Das > >* gcc.target/aarch64/test_frame_17.c: Update to check for > EP0_REGNUM instead of IP0_REGNUM and add test case. > I have edited the patch to take out a change that was not needed as part of this patch in aarch64_expand_epilogue. 
The only change now happening there is as mentioned in the ChangeLog to replace the uses of IP0/IP1. ChangeLog still applies. Thanks Sudi diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 4bec6bd963d91c475a4e18f883955093e9268cfd..cc95be32d40268d3647c8280188f17ff8212a156 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -586,7 +586,7 @@ enum reg_class #define REG_CLASS_CONTENTS \ { \ { 0x00000000, 0x00000000, 0x00000000 }, /* NO_REGS */ \ - { 0x0004ffff, 0x00000000, 0x00000000 }, /* TAILCALL_ADDR_REGS */\ + { 0x00030000, 0x00000000, 0x00000000 }, /* TAILCALL_ADDR_REGS */\ { 0x7fffffff, 0x00000000, 0x00000003 }, /* GENERAL_REGS */ \ { 0x80000000, 0x00000000, 0x00000000 }, /* STACK_REG */ \ { 0xffffffff, 0x00000000, 0x00000003 }, /* POINTER_REGS */ \ diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index da7430f1fd88566c4f017a1b491f8de7dce724e8..f4ff300b883ce832335a4915b22bcbfefe64d9ae 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -5357,8 +5357,8 @@ aarch64_expand_prologue (void) aarch64_emit_probe_stack_range (get_stack_check_protect (), frame_size); } - rtx ip0_rtx = gen_rtx_REG (Pmode, IP0_REGNUM); - rtx ip1_rtx = gen_rtx_REG (Pmode, IP1_REGNUM); + rtx tmp0_rtx = gen_rtx_REG (Pmode, EP0_REGNUM); + rtx tmp1_rtx = gen_rtx_REG (Pmode, EP1_REGNUM); /* In theory we should never have both an initial adjustment and a callee save adjustment. Verify that is the case since the @@ -5368,7 +5368,7 @@ aarch64_expand_prologue (void) /* Will only probe if the initial adjustment is larger than the guard less the amount of the guard reserved for use by the caller's outgoing args.
*/ - aarch64_allocate_and_probe_stack_space (ip0_rtx, ip1_rtx, initial_adjust, + aarch64_allocate_and_probe_stack_space (tmp0_rtx, tmp1_rtx, initial_adjust, true, false); if (callee_adjust != 0) @@ -5386,7 +5386,7 @@ aarch64_expand_prologue (void) } aarch64_add_offset (Pmode, hard_frame_pointer_rtx, stack_pointer_rtx, callee_offset, - ip1_rtx, ip0_rtx, frame_pointer_needed); + tmp1_rtx, tmp0_rtx, frame_pointer_needed); if (frame_pointer_needed && !frame_size.is_constant ()) { /* Variable-sized frames need to describe the save slot @@ -5428,7 +5428,7 @@ aarch64_expand_prologue (void) /* We may need to probe the final adjustment if it is larger than the guard that is assumed by the called. */ - aarch64_allocate_and_probe_stack_space (ip1_rtx, ip0_rtx, final_adjust, + aarch64_allocate_and_probe_stack_space (tmp1_rtx, tmp0_rtx, final_adjust
Re: [PATCH, GCC, AARCH64, 5/6] Enable BTI : Add new pass for BTI.
Hi On 13/11/18 14:47, Sudakshina Das wrote: > Hi > > On 02/11/18 18:38, Sudakshina Das wrote: >> Hi >> >> This patch is part of a series that enables ARMv8.5-A in GCC and >> adds Branch Target Identification Mechanism. >> (https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools) >> >> This patch adds a new pass called "bti" which is triggered by the >> command line argument -mbranch-protection whenever "bti" is turned on. >> >> The pass iterates through the instructions and adds appropriated BTI >> instructions based on the following: >> * Add a new "BTI C" at the beginning of a function, unless its already >> protected by a "PACIASP/PACIBSP". We exempt the functions that are >> only called directly. >> * Add a new "BTI J" for every target of an indirect jump, jump table >> targets, non-local goto targets or labels that might be referenced >> by variables, constant pools, etc (NOTE_INSN_DELETED_LABEL) >> >> Since we have already changed the use of indirect tail calls to only x16 >> and x17, we do not have to use "BTI JC". >> (check patch 3/6). >> > > I missed out on the explanation for the changes to the trampoline code. > The patch also updates the trampoline code in case BTI is enabled. Since > the trampoline code is a target of an indirect branch, we need to add an > appropriate BTI instruction at the beginning of it to avoid a branch > target exception. > >> Bootstrapped and regression tested with aarch64-none-linux-gnu. Added >> new tests. >> Is this ok for trunk? >> >> Thanks >> Sudi >> >> *** gcc/ChangeLog *** >> >> 2018-xx-xx Sudakshina Das >> Ramana Radhakrishnan >> >> * config.gcc (aarch64*-*-*): Add aarch64-bti-insert.o. >> * gcc/config/aarch64/aarch64.h: Update comment for >> TRAMPOLINE_SIZE. >> * config/aarch64/aarch64.c (aarch64_asm_trampoline_template): >> Update if bti is enabled. >> * config/aarch64/aarch64-bti-insert.c: New file. >> * config/aarch64/aarch64-passes.def (INSERT_PASS_BEFORE): Insert >> bti pass. 
>> * config/aarch64/aarch64-protos.h (make_pass_insert_bti): >> Declare the new bti pass. >> * config/aarch64/aarch64.md (bti_nop): Define. >> * config/aarch64/t-aarch64: Add rule for aarch64-bti-insert.o. >> >> *** gcc/testsuite/ChangeLog *** >> >> 2018-xx-xx Sudakshina Das >> >> * gcc.target/aarch64/bti-1.c: New test. >> * gcc.target/aarch64/bti-2.c: New test. >> * lib/target-supports.exp >> (check_effective_target_aarch64_bti_hw): Add new check for >> BTI hw. >> > > Updated patch attached with more comments and a bit of simplification > in aarch64-bti-insert.c. ChangeLog still applies. > > Thanks > Sudi > I found a missed case in the bti pass and edited the patch to include it. This made me realize that the only 2 regressions I saw with the BTI enabled model can now be avoided. (as quoted below from my 6/6 patch) "Bootstrapped and regression tested with aarch64-none-linux-gnu with and without the configure option turned on. Also tested on aarch64-none-elf with and without configure option with a BTI enabled aem. Only 2 regressions and these were because newlib requires patches to protect hand coded libraries with BTI." The ChangeLog still applies. 
Sudi diff --git a/gcc/config.gcc b/gcc/config.gcc index b108697cfc7b1c9c6dc1f30cca6fd1158182c29e..3e77f9df6ad6ca55fccca50387eab4b2501af647 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -317,7 +317,7 @@ aarch64*-*-*) c_target_objs="aarch64-c.o" cxx_target_objs="aarch64-c.o" d_target_objs="aarch64-d.o" - extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o aarch64-speculation.o falkor-tag-collision-avoidance.o" + extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o aarch64-speculation.o falkor-tag-collision-avoidance.o aarch64-bti-insert.o" target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.c" target_has_targetm_common=yes ;; diff --git a/gcc/config/aarch64/aarch64-bti-insert.c b/gcc/config/aarch64/aarch64-bti-insert.c new file mode 100644 index ..be604fb2fd5df052971cc81b7e6d7760880a6b79 --- /dev/null +++ b/gcc/config/aarch64/aarch64-bti-insert.c @@ -0,0 +1,236 @@ +/* Branch Target Identification for AArch64 architecture. + Copyright (C) 2018 Free Software Foundation, Inc. + Contributed by Arm Ltd. + + This file is part
Re: [PATCH, GCC, AARCH64, 6/6] Enable BTI: Add configure option for BTI and PAC-RET
Hi James On 07/11/18 15:36, James Greenhalgh wrote: > On Fri, Nov 02, 2018 at 01:38:46PM -0500, Sudakshina Das wrote: >> Hi >> >> This patch is part of a series that enables ARMv8.5-A in GCC and >> adds Branch Target Identification Mechanism. >> (https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools) >> >> This patch is adding a new configure option for enabling and return >> address signing by default with --enable-standard-branch-protection. >> This is equivalent to -mbranch-protection=standard which would >> imply -mbranch-protection=pac-ret+bti. >> >> Bootstrapped and regression tested with aarch64-none-linux-gnu with >> and without the configure option turned on. >> Also tested on aarch64-none-elf with and without configure option with a >> BTI enabled aem. Only 2 regressions and these were because newlib >> requires patches to protect hand coded libraries with BTI. >> >> Is this ok for trunk? > > With a tweak to the comment above your changes in aarch64.c, yes this is OK. > >> *** gcc/ChangeLog *** >> >> 2018-xx-xx Sudakshina Das >> >> * config/aarch64/aarch64.c (aarch64_override_options): Add case to check >> configure option to set BTI and Return Address Signing. >> * configure.ac: Add --enable-standard-branch-protection and >> --disable-standard-branch-protection. >> * configure: Regenerated. >> * doc/install.texi: Document the same. >> >> *** gcc/testsuite/ChangeLog *** >> >> 2018-xx-xx Sudakshina Das >> >> * gcc.target/aarch64/bti-1.c: Update test to not add command >> line option when configure with bti. >> * gcc.target/aarch64/bti-2.c: Likewise. >> * lib/target-supports.exp >> (check_effective_target_default_branch_protection): >> Add configure check for --enable-standard-branch-protection. 
>> > >> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c >> index >> 12a55a640de4fdc5df21d313c7ea6841f1daf3f2..a1a5b7b464eaa2ce67ac66d9aea837159590aa07 >> 100644 >> --- a/gcc/config/aarch64/aarch64.c >> +++ b/gcc/config/aarch64/aarch64.c >> @@ -11558,6 +11558,26 @@ aarch64_override_options (void) >> if (!selected_tune) >> selected_tune = selected_cpu; >> >> + if (aarch64_enable_bti == 2) >> +{ >> +#ifdef TARGET_ENABLE_BTI >> + aarch64_enable_bti = 1; >> +#else >> + aarch64_enable_bti = 0; >> +#endif >> +} >> + >> + /* No command-line option yet. */ > > This is too broad. Can you narrow this down to which command line option this > relates to, and what the expected default behaviours are (for both LP64 and > ILP32). > Updated patch attached. Return address signing is not supported for ILP32 currently. This patch just follows that and hence the extra ILP32 check is added. Thanks Sudi > Thanks, > James > >> + if (accepted_branch_protection_string == NULL && !TARGET_ILP32) >> +{ >> +#ifdef TARGET_ENABLE_PAC_RET >> + aarch64_ra_sign_scope = AARCH64_FUNCTION_NON_LEAF; >> + aarch64_ra_sign_key = AARCH64_KEY_A; >> +#else >> + aarch64_ra_sign_scope = AARCH64_FUNCTION_NONE; >> +#endif >> +} >> + >> #ifndef HAVE_AS_MABI_OPTION >> /* The compiler may have been configured with 2.23.* binutils, which does >>not have support for ILP32. */ > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index b97d9e4deecf5ca33761dfd1008c39bb4b849881..e267d3441fd7f21105bfba339b69f2ecdb7595ae 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -11579,6 +11579,28 @@ aarch64_override_options (void) if (!selected_tune) selected_tune = selected_cpu; + if (aarch64_enable_bti == 2) +{ +#ifdef TARGET_ENABLE_BTI + aarch64_enable_bti = 1; +#else + aarch64_enable_bti = 0; +#endif +} + + /* Return address signing is currently not supported for ILP32 targets. 
For + LP64 targets use the configured option in the absence of a command-line + option for -mbranch-protection. */ + if (!TARGET_ILP32 && accepted_branch_protection_string == NULL) +{ +#ifdef TARGET_ENABLE_PAC_RET + aarch64_ra_sign_scope = AARCH64_FUNCTION_NON_LEAF; + aarch64_ra_sign_key = AARCH64_KEY_A; +#else + aarch64_ra_sign_scope = AARCH64_FUNCTION_NONE; +#endif +} + #ifndef HAVE_AS_MABI_OPTION /* The compiler may have been configured with 2.23.* binutils, which does not have support for ILP32. */ diff --git a/gcc/configure b/gcc/configure index 03461f1e2753
Re: [PATCH, GCC, AARCH64, 5/6] Enable BTI : Add new pass for BTI.
Hi On 02/11/18 18:38, Sudakshina Das wrote: > Hi > > This patch is part of a series that enables ARMv8.5-A in GCC and > adds Branch Target Identification Mechanism. > (https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools) > > This patch adds a new pass called "bti" which is triggered by the > command line argument -mbranch-protection whenever "bti" is turned on. > > The pass iterates through the instructions and adds appropriated BTI > instructions based on the following: > * Add a new "BTI C" at the beginning of a function, unless its already >protected by a "PACIASP/PACIBSP". We exempt the functions that are >only called directly. > * Add a new "BTI J" for every target of an indirect jump, jump table >targets, non-local goto targets or labels that might be referenced >by variables, constant pools, etc (NOTE_INSN_DELETED_LABEL) > > Since we have already changed the use of indirect tail calls to only x16 > and x17, we do not have to use "BTI JC". > (check patch 3/6). > I missed out on the explanation for the changes to the trampoline code. The patch also updates the trampoline code in case BTI is enabled. Since the trampoline code is a target of an indirect branch, we need to add an appropriate BTI instruction at the beginning of it to avoid a branch target exception. > Bootstrapped and regression tested with aarch64-none-linux-gnu. Added > new tests. > Is this ok for trunk? > > Thanks > Sudi > > *** gcc/ChangeLog *** > > 2018-xx-xx Sudakshina Das > Ramana Radhakrishnan > > * config.gcc (aarch64*-*-*): Add aarch64-bti-insert.o. > * gcc/config/aarch64/aarch64.h: Update comment for > TRAMPOLINE_SIZE. > * config/aarch64/aarch64.c (aarch64_asm_trampoline_template): > Update if bti is enabled. > * config/aarch64/aarch64-bti-insert.c: New file. > * config/aarch64/aarch64-passes.def (INSERT_PASS_BEFORE): Insert > bti pass. > * config/aarch64/aarch64-protos.h (make_pass_insert_bti): > Declare the new bti pass. 
> * config/aarch64/aarch64.md (bti_nop): Define. > * config/aarch64/t-aarch64: Add rule for aarch64-bti-insert.o. > > *** gcc/testsuite/ChangeLog *** > > 2018-xx-xx Sudakshina Das > > * gcc.target/aarch64/bti-1.c: New test. > * gcc.target/aarch64/bti-2.c: New test. > * lib/target-supports.exp > (check_effective_target_aarch64_bti_hw): Add new check for > BTI hw. > Updated patch attached with more comments and a bit of simplification in aarch64-bti-insert.c. ChangeLog still applies. Thanks Sudi diff --git a/gcc/config.gcc b/gcc/config.gcc index b108697cfc7b1c9c6dc1f30cca6fd1158182c29e..3e77f9df6ad6ca55fccca50387eab4b2501af647 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -317,7 +317,7 @@ aarch64*-*-*) c_target_objs="aarch64-c.o" cxx_target_objs="aarch64-c.o" d_target_objs="aarch64-d.o" - extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o aarch64-speculation.o falkor-tag-collision-avoidance.o" + extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o aarch64-speculation.o falkor-tag-collision-avoidance.o aarch64-bti-insert.o" target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.c" target_has_targetm_common=yes ;; diff --git a/gcc/config/aarch64/aarch64-bti-insert.c b/gcc/config/aarch64/aarch64-bti-insert.c new file mode 100644 index ..15202e0def3b514bdbd1564b39a121e43e01a67f --- /dev/null +++ b/gcc/config/aarch64/aarch64-bti-insert.c @@ -0,0 +1,226 @@ +/* Branch Target Identification for AArch64 architecture. + Copyright (C) 2018 Free Software Foundation, Inc. + Contributed by Arm Ltd. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + <http://www.gnu.org/licenses/>. */ + +#define IN_TARGET_CODE 1 + +#include "config.h" +#define INCLUDE_STRING +#include "system.h" +#include "coretypes.h" +#include "backend.h" +#include "target.h" +#include "rtl.h" +#include &qu
Re: [PATCH, GCC, AARCH64, 1/6] Enable ARMv8.5-A in gcc
Hi James On 07/11/18 15:16, James Greenhalgh wrote: > On Fri, Nov 02, 2018 at 01:37:33PM -0500, Sudakshina Das wrote: >> Hi >> >> This patch is part of a series that enables ARMv8.5-A in GCC and >> adds Branch Target Identification Mechanism. >> (https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools) >> >> This patch adds the march option for armv8.5-a. >> >> Bootstrapped and regression tested with aarch64-none-linux-gnu. >> Is this ok for trunk? > > One minor tweak, otherwise OK. > >> *** gcc/ChangeLog *** >> >> 2018-xx-xx Sudakshina Das >> >> * config/aarch64/aarch64-arches.def: Define AARCH64_ARCH for >> ARMv8.5-A. >> * gcc/config/aarch64/aarch64.h (AARCH64_FL_V8_5): New. >> (AARCH64_FL_FOR_ARCH8_5, AARCH64_ISA_V8_5): New. >> * gcc/doc/invoke.texi: Document ARMv8.5-A. > >> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h >> index >> fa9af26fd40fd23b1c9cd6da9b6300fd77089103..b324cdd2fede33af13c03362750401f9eb1c9a90 >> 100644 >> --- a/gcc/config/aarch64/aarch64.h >> +++ b/gcc/config/aarch64/aarch64.h >> @@ -170,6 +170,8 @@ extern unsigned aarch64_architecture_version; >> #define AARCH64_FL_SHA3 (1 << 18) /* Has ARMv8.4-a SHA3 and >> SHA512. */ >> #define AARCH64_FL_F16FML (1 << 19) /* Has ARMv8.4-a FP16 extensions. >> */ >> #define AARCH64_FL_RCPC8_4(1 << 20) /* Has ARMv8.4-a RCPC extensions. >> */ >> +/* ARMv8.5-A architecture extensions. */ >> +#define AARCH64_FL_V8_5 (1 << 22) /* Has ARMv8.5-A features. */ >> >> /* Statistical Profiling extensions. */ >> #define AARCH64_FL_PROFILE(1 << 21) > > Let's keep this in order. 20, 21, 22. > I have moved the Armv8.5 stuff below. Patch attached. If this looks ok, I will rebase 2/6 on top. Let me know if you want me to resend the rebased 2/6 too.
Thanks Sudi > Thanks, > James > > diff --git a/gcc/config/aarch64/aarch64-arches.def b/gcc/config/aarch64/aarch64-arches.def index a37a5553894d6ab1d629017ea204478f69d8773d..7d05cd604093d15f27e5b197803a50c45a260e6e 100644 --- a/gcc/config/aarch64/aarch64-arches.def +++ b/gcc/config/aarch64/aarch64-arches.def @@ -35,5 +35,6 @@ AARCH64_ARCH("armv8.1-a", generic, 8_1A, 8, AARCH64_FL_FOR_ARCH8_1) AARCH64_ARCH("armv8.2-a", generic, 8_2A, 8, AARCH64_FL_FOR_ARCH8_2) AARCH64_ARCH("armv8.3-a", generic, 8_3A, 8, AARCH64_FL_FOR_ARCH8_3) AARCH64_ARCH("armv8.4-a", generic, 8_4A, 8, AARCH64_FL_FOR_ARCH8_4) +AARCH64_ARCH("armv8.5-a", generic, 8_5A, 8, AARCH64_FL_FOR_ARCH8_5) #undef AARCH64_ARCH diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 8ab21e7bc37c7d5ffba1a365345f70d9f501b3ac..8ce8445586f29963107848604c5e2bab8e853685 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -177,6 +177,9 @@ extern unsigned aarch64_architecture_version; /* Statistical Profiling extensions. */ #define AARCH64_FL_PROFILE(1 << 21) +/* ARMv8.5-A architecture extensions. */ +#define AARCH64_FL_V8_5 (1 << 22) /* Has ARMv8.5-A features. */ + /* Has FP and SIMD. */ #define AARCH64_FL_FPSIMD (AARCH64_FL_FP | AARCH64_FL_SIMD) @@ -195,6 +198,8 @@ extern unsigned aarch64_architecture_version; #define AARCH64_FL_FOR_ARCH8_4 \ (AARCH64_FL_FOR_ARCH8_3 | AARCH64_FL_V8_4 | AARCH64_FL_F16FML \ | AARCH64_FL_DOTPROD | AARCH64_FL_RCPC8_4) +#define AARCH64_FL_FOR_ARCH8_5 \ + (AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_V8_5) /* Macros to test ISA flags. */ @@ -216,6 +221,7 @@ extern unsigned aarch64_architecture_version; #define AARCH64_ISA_SHA3 (aarch64_isa_flags & AARCH64_FL_SHA3) #define AARCH64_ISA_F16FML (aarch64_isa_flags & AARCH64_FL_F16FML) #define AARCH64_ISA_RCPC8_4 (aarch64_isa_flags & AARCH64_FL_RCPC8_4) +#define AARCH64_ISA_V8_5 (aarch64_isa_flags & AARCH64_FL_V8_5) /* Crypto is an optional extension to AdvSIMD. 
*/ #define TARGET_CRYPTO (TARGET_SIMD && AARCH64_ISA_CRYPTO) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 3e54087ab98049ba932caa34ba2fb135eda48396..26770c5aafda1524d63a89cacf8cc069b7c8b9b6 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -15118,8 +15118,11 @@ more feature modifiers. This option has the form @option{-march=@var{arch}@r{@{}+@r{[}no@r{]}@var{feature}@r{@}*}}. The permissible values for @var{arch} are @samp{armv8-a}, -@samp{armv8.1-a}, @samp{armv8.2-a}, @samp{armv8.3-a} or @samp{armv8.4-a} -or @var{native}. +@samp{armv8.1-a}, @samp{armv8.2-a}, @samp{armv8.3-a}, @samp{armv8.4-a}, +@samp{armv8.5-a} or @var{native}. + +The value @samp{armv8.5-a} implies @samp{armv8.4-a} and enables compiler +support for the ARMv8.5-A architecture extensions. The value @samp{armv8.4-a} implies @samp{armv8.3-a} and enables compiler support for the ARMv8.4-A architecture extensions.
Re: [PATCH 2/3][GCC][AARCH64] Add new -mbranch-protection option to combine pointer signing and BTI
Hi Sam On 02/11/18 17:31, Sam Tebbs wrote: > Hi all, > > The -mbranch-protection option combines the functionality of > -msign-return-address and the BTI features new in Armv8.5 to better reflect > their relationship. This new option therefore supersedes and deprecates the > existing -msign-return-address option. > > -mbranch-protection=[none|standard|] - Turns on different types of > branch > protection available where: > > * "none": Turns off all types of branch protection > * "standard": Turns on all the types of protection to their respective > standard levels. > * can be "+" separated protection types: > > * "bti" : Branch Target Identification Mechanism. > * "pac-ret{+leaf+b-key}": Return Address Signing. The default return > address signing is enabled by signing functions that save the return > address to memory (non-leaf functions will practically always do this) > using the a-key. The optional tuning arguments allow the user to > extend the scope of return address signing to include leaf functions > and to change the key to b-key. The tuning arguments must precede the > protection type "pac-ret". > > Thus -mbranch-protection=standard -> -mbranch-protection=bti+pac-ret. > > Its mapping to -msign-return-address is as follows: > > * -mbranch-protection=none -> -msign-return-address=none > * -mbranch-protection=standard -> -msign-return-address=leaf > * -mbranch-protection=pac-ret -> -msign-return-address=non-leaf > * -mbranch-protection=pac-ret+leaf -> -msign-return-address=all > > This patch implements the option's skeleton and the "none", "standard" and > "pac-ret" types (along with its "leaf" subtype). > > The previous patch in this series is here: > https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00103.html > > Bootstrapped successfully and tested on aarch64-none-elf with no regressions. > > OK for trunk? > Thanks for doing this. I am not a maintainer so you will need a maintainer's approval.
The only nit I would add is that it would be good to have more test coverage, especially for the new parsing functions that have been added and the errors they emit. For example, checking a few valid and invalid combinations of the options like: -mbranch-protection=pac-ret -mbranch-protection=none //disables everything -mbranch-protection=leaf //errors out -mbranch-protection=none+pac-ret //errors out ... etc. Also, instead of removing all the old deprecated options, you can keep one (or a copy of one) to check for the deprecated warning. diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index e290128f535f3e6b515bff5a81fae0aa0d1c8baf..07cfe69dc3dd9161a2dd93089ccf52ef251208d2 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -15221,13 +15222,18 @@ accessed using a single instruction and emitted after each function. This limits the maximum size of functions to 1MB. This is enabled by default for @option{-mcmodel=tiny}. -@item -msign-return-address=@var{scope} -@opindex msign-return-address -Select the function scope on which return address signing will be applied. -Permissible values are @samp{none}, which disables return address signing, -@samp{non-leaf}, which enables pointer signing for functions which are not leaf -functions, and @samp{all}, which enables pointer signing for all functions. The -default value is @samp{none}. +@item -mbranch-protection=@var{none}|@var{standard}|@var{pac-ret}[+@var{leaf}] +@opindex mbranch-protection +Select the branch protection features to use. +@samp{none} is the default and turns off all types of branch protection. +@samp{standard} turns on all types of branch protection features. If a feature +has additional tuning options, then @samp{standard} sets it to its standard +level. +@samp{pac-ret[+@var{leaf}]} turns on return address signing to its standard +level: signing functions that save the return address to memory (non-leaf +functions will practically always do this) using the a-key.
The optional +argument @samp{leaf} can be used to extend the signing to include leaf +functions. I am not sure if deleting the previous documentation of -msign-return-address is the way to go. Maybe add a "this has been deprecated; refer to -mbranch-protection" note to its description. Thanks Sudi > gcc/ChangeLog: > > 2018-11-02 Sam Tebbs > > * config/aarch64/aarch64.c (BRANCH_PROTEC_STR_MAX, > aarch64_parse_branch_protection, > struct aarch64_branch_protec_type, > aarch64_handle_no_branch_protection, > aarch64_handle_standard_branch_protection, > aarch64_validate_mbranch_protection, > aarch64_handle_pac_ret_protection, > aarch64_handle_attr_branch_protection, > accepted_branch_protection_string, > aarch64_pac_ret_subtypes, > aarch64_branch_protec_types, > aarch64_handle_pac_ret_leaf): Define. > (aarch64_override_options_after_change_1): Add
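For readers comparing the two options, the equivalence table quoted earlier in this thread can be sketched as a small lookup. This is a hypothetical helper for illustration only, not code from the patch:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (illustration only): return the deprecated
   -msign-return-address value that the quoted mapping assigns to a
   -mbranch-protection value, or NULL when there is no equivalent.  */
static const char *
msign_return_address_equiv (const char *branch_protection)
{
  if (!strcmp (branch_protection, "none"))
    return "none";
  if (!strcmp (branch_protection, "standard"))
    return "leaf";
  if (!strcmp (branch_protection, "pac-ret"))
    return "non-leaf";
  if (!strcmp (branch_protection, "pac-ret+leaf"))
    return "all";
  return NULL;
}
```

Note that "bti" would return NULL here, since the deprecated option only ever covered return address signing.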
Re: [PATCH, GCC, ARM] Enable armv8.5-a and add +sb and +predres for previous ARMv8-a in ARM
Hi Kyrill On 09/11/18 18:21, Kyrill Tkachov wrote: > Hi Sudi, > > On 09/11/18 15:33, Sudakshina Das wrote: >> Hi >> >> This patch adds -march=armv8.5-a to the Arm backend. >> (https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools) >> >> >> Armv8.5-A also adds two new security features: >> - Speculation Barrier instruction >> - Execution and Data Prediction Restriction Instructions >> These are made optional to all older Armv8-A versions. Thus we are >> adding two new options "+sb" and "+predres" to all older Armv8-A. These >> are passed on to the assembler and have no code generation effects and >> have already gone in the trunk of binutils. >> >> Bootstrapped and regression tested with arm-none-linux-gnueabihf. >> >> Is this ok for trunk? >> Sudi >> >> *** gcc/ChangeLog *** >> >> 2018-xx-xx Sudakshina Das >> >> * config/arm/arm-cpus.in (armv8_5, sb, predres): New features. >> (ARMv8_5a): New fgroup. >> (armv8.5-a): New arch. >> (armv8-a, armv8.1-a, armv8.2-a, armv8.3-a, armv8.4-a): New >> options sb and predres. >> * config/arm/arm-tables.opt: Regenerate. >> * config/arm/t-aprofile: Add matching rules for -march=armv8.5-a >> * config/arm/t-arm-elf (all_v8_archs): Add armv8.5-a. >> * config/arm/t-multilib (v8_5_a_simd_variants): New variable. >> Add matching rules for -march=armv8.5-a and extensions. >> * doc/invoke.texi (ARM options): Document -march=armv8.5-a. >> Add sb and predres to all armv8-a except armv8.5-a. >> >> *** gcc/testsuite/ChangeLog *** >> >> 2018-xx-xx Sudakshina Das >> >> * gcc.target/arm/multilib.exp: Add some -march=armv8.5-a >> combination tests. > > Hi > > This patch adds -march=armv8.5-a to the Arm backend. > (https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools) > > > Armv8.5-A also adds two new security features: > - Speculation Barrier instruction > - Execution and Data Prediction Restriction Instructions > These are made optional to all older Armv8-A versions. 
Thus we are > adding two new options "+sb" and "+predres" to all older Armv8-A. These > are passed on to the assembler and have no code generation effects and > have already gone in the trunk of binutils. > > Bootstrapped and regression tested with arm-none-linux-gnueabihf. > > Is this ok for trunk? > Sudi > > *** gcc/ChangeLog *** > > 2018-xx-xx Sudakshina Das > > * config/arm/arm-cpus.in (armv8_5, sb, predres): New features. > (ARMv8_5a): New fgroup. > (armv8.5-a): New arch. > (armv8-a, armv8.1-a, armv8.2-a, armv8.3-a, armv8.4-a): New > options sb and predres. > * config/arm/arm-tables.opt: Regenerate. > * config/arm/t-aprofile: Add matching rules for -march=armv8.5-a > * config/arm/t-arm-elf (all_v8_archs): Add armv8.5-a. > * config/arm/t-multilib (v8_5_a_simd_variants): New variable. > Add matching rules for -march=armv8.5-a and extensions. > * doc/invoke.texi (ARM options): Document -march=armv8.5-a. > Add sb and predres to all armv8-a except armv8.5-a. > > *** gcc/testsuite/ChangeLog *** > > 2018-xx-xx Sudakshina Das > > * gcc.target/arm/multilib.exp: Add some -march=armv8.5-a > combination tests. > > > > This is ok modulo a typo fix below. > > Thanks, > Kyrill > Thanks. Fixed and committed as r266031. Sudi > > > index > 25788ad09851daf41038b1578307bf23b7f34a94..eba038f9d20bc54bef7bdb7fa1c0e7028d954ed7 > > 100644 > --- a/gcc/config/arm/t-multilib > +++ b/gcc/config/arm/t-multilib > @@ -70,7 +70,8 @@ v8_a_simd_variants := $(call all_feat_combs, simd > crypto) > v8_1_a_simd_variants := $(call all_feat_combs, simd crypto) > v8_2_a_simd_variants := $(call all_feat_combs, simd fp16 fp16fml > crypto dotprod) > v8_4_a_simd_variants := $(call all_feat_combs, simd fp16 crypto) > -v8_r_nosimd_variants := +crc > +v8_5_a_simd_variants := $(call all_feat_combs, simd fp16 crypto) > +v8_r_nosimd_variants := +cr5 > > > Typo, should be +crc > > >
[PATCH, GCC, AArch64] Branch Dilution Pass
Hi I am posting this patch on behalf of Carey (cc'ed). I also have some review comments that I will make as a reply to this later. This implements a new AArch64-specific back-end pass that helps optimize branch-dense code, which can be a bottleneck for performance on some Arm cores. This is achieved by padding out the branch-dense sections of the instruction stream with nops. This has proven to show up to a ~2.61% improvement on the Cortex-A72 (SPEC CPU 2006: sjeng). The implementation includes the addition of a new RTX instruction class FILLER_INSN, which has been whitelisted to allow placement of NOPs outside of a basic block. This is to allow padding after unconditional branches. This is favourable so that any performance gained from diluting branches is not paid straight back via excessive execution of nops. It was deemed that a new RTX class was less invasive than modifying behavior with regard to standard UNSPEC nops. ## Command Line Options Three new target-specific options are provided: - mbranch-dilution - mbranch-dilution-granularity={num} - mbranch-dilution-max-branches={num} A number of cores known to be able to benefit from this pass have been given default tuning values for their granularity and max-branches. Each affected core has a very specific granule size and associated max-branch limit. This is a microarchitecture-specific optimization. Typical usage should be -mbranch-dilution with a specified -mcpu. Cores with a granularity tuned to 0 will be ignored. Options are provided for experimentation. ## Algorithm and Heuristic The pass takes a very simple 'sliding window' approach to the problem. We crawl through each instruction (starting at the first branch) and keep track of the number of branches within the current "granule" (or window). When this exceeds the max-branch value, the pass will dilute the current granule, inserting nops to push out some of the branches.
The heuristic will favour unconditional branches (for performance reasons), or branches that are between two other branches (in order to decrease the likelihood of another dilution call being needed). Each branch type required a different method for nop insertion due to RTL/basic_block restrictions: - Returning calls do not end a basic block so can be handled by emitting a generic nop. - Unconditional branches must be the end of a basic block, and nops cannot be outside of a basic block. Thus the need for FILLER_INSN, which allows placement outside of a basic block - and translates to a nop. - For most conditional branches we've taken a simple approach and only handle the fallthru edge for simplicity, which we do by inserting a "nop block" of nops on the fallthru edge, mapping that back to the original destination block. - asm gotos and pcsets are going to be tricky to analyse from a dilution perspective so are ignored at present. ## Changelog gcc/testsuite/ChangeLog: 2018-11-09 Carey Williams * gcc.target/aarch64/branch-dilution-off.c: New test. * gcc.target/aarch64/branch-dilution-on.c: New test. gcc/ChangeLog: 2018-11-09 Carey Williams * cfgbuild.c (inside_basic_block_p): Add FILLER_INSN case. * cfgrtl.c (rtl_verify_bb_layout): Whitelist FILLER_INSN outside basic blocks. * config.gcc (extra_objs): Add aarch64-branch-dilution.o. * config/aarch64/aarch64-branch-dilution.c: New file. * config/aarch64/aarch64-passes.def (branch-dilution): Register pass. * config/aarch64/aarch64-protos.h (struct tune_params): Declare tuning parameters bdilution_gsize and bdilution_maxb. (make_pass_branch_dilution): New declaration. * config/aarch64/aarch64.c (generic_tunings,cortexa35_tunings, cortexa53_tunings,cortexa57_tunings,cortexa72_tunings, cortexa73_tunings,exynosm1_tunings,thunderxt88_tunings, thunderx_tunings,tsv110_tunings,xgene1_tunings, qdf24xx_tunings,saphira_tunings,thunderx2t99_tunings): Provide default tunings for bdilution_gsize and bdilution_maxb.
* config/aarch64/aarch64.md (filler_insn): Define new insn. * config/aarch64/aarch64.opt (mbranch-dilution, mbranch-dilution-granularity, mbranch-dilution-max-branches): Define new branch dilution options. * config/aarch64/t-aarch64 (aarch64-branch-dilution.c): New rule for aarch64-branch-dilution.c. * coretypes.h (rtx_filler_insn): New rtx class. * doc/invoke.texi (mbranch-dilution, mbranch-dilution-granularity, mbranch-dilution-max-branches): Document branch dilution options. * emit-rtl.c (emit_filler_after): New emit function. * rtl.def (FILLER_INSN): New RTL EXPR of type RTX_INSN. * rtl.h (class GTY): New class for rtx_filler_insn. (is_a_helper ::test): New test helper for rtx_filler_insn. (macro FILLER_INSN_P(X)): New predicate. * target-insns.def
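The sliding-window heuristic described above can be sketched, in a much-simplified form, on a toy instruction stream (1 = branch, 0 = other insn, 2 = inserted nop). The granule size and branch limit below are made-up values; the real pass works on RTL, favours unconditional branches, and takes both numbers from per-core tunings:

```c
#define GRANULE 4	/* hypothetical granule size (insn slots) */
#define MAX_BRANCHES 2	/* hypothetical per-granule branch limit */

/* Dilute the stream IN of N insns (1 = branch, 0 = other) into OUT,
   writing 2 for every inserted nop; return the diluted length.  When
   the current granule already holds MAX_BRANCHES branches, pad with
   nops up to the next granule boundary so the next branch lands in a
   fresh granule.  */
static int
dilute (const int *in, int n, int *out)
{
  int len = 0, branches = 0;
  for (int i = 0; i < n; i++)
    {
      if (len % GRANULE == 0)
	branches = 0;		/* new granule: reset the count */
      if (in[i] == 1 && branches == MAX_BRANCHES)
	{
	  while (len % GRANULE != 0)
	    out[len++] = 2;	/* pad to the granule boundary */
	  branches = 0;
	}
      if (in[i] == 1)
	branches++;
      out[len++] = in[i];
    }
  return len;
}
```

On a stream of five consecutive branches this produces B B N N | B B N N | B, i.e. no granule ever holds more than two branches.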
[PATCH, GCC, ARM] Enable armv8.5-a and add +sb and +predres for previous ARMv8-a in ARM
Hi This patch adds -march=armv8.5-a to the Arm backend. (https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools) Armv8.5-A also adds two new security features: - Speculation Barrier instruction - Execution and Data Prediction Restriction Instructions These are made optional to all older Armv8-A versions. Thus we are adding two new options "+sb" and "+predres" to all older Armv8-A. These are passed on to the assembler and have no code generation effects and have already gone in the trunk of binutils. Bootstrapped and regression tested with arm-none-linux-gnueabihf. Is this ok for trunk? Sudi *** gcc/ChangeLog *** 2018-xx-xx Sudakshina Das * config/arm/arm-cpus.in (armv8_5, sb, predres): New features. (ARMv8_5a): New fgroup. (armv8.5-a): New arch. (armv8-a, armv8.1-a, armv8.2-a, armv8.3-a, armv8.4-a): New options sb and predres. * config/arm/arm-tables.opt: Regenerate. * config/arm/t-aprofile: Add matching rules for -march=armv8.5-a * config/arm/t-arm-elf (all_v8_archs): Add armv8.5-a. * config/arm/t-multilib (v8_5_a_simd_variants): New variable. Add matching rules for -march=armv8.5-a and extensions. * doc/invoke.texi (ARM options): Document -march=armv8.5-a. Add sb and predres to all armv8-a except armv8.5-a. *** gcc/testsuite/ChangeLog *** 2018-xx-xx Sudakshina Das * gcc.target/arm/multilib.exp: Add some -march=armv8.5-a combination tests. diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in index d82e95a226659948e59b317f07e0fd386ed674a2..e6bcc3c720b64f4c80d9bff101e756de82d760e6 100644 --- a/gcc/config/arm/arm-cpus.in +++ b/gcc/config/arm/arm-cpus.in @@ -114,6 +114,9 @@ define feature armv8_3 # Architecture rel 8.4. define feature armv8_4 +# Architecture rel 8.5. +define feature armv8_5 + # M-Profile security extensions. define feature cmse @@ -174,6 +177,14 @@ define feature quirk_cm3_ldrd # (Very) slow multiply operations. Should probably be a tuning bit. 
define feature smallmul +# Speculation Barrier Instruction for v8-A architectures, added by +# default to v8.5-A +define feature sb + +# Execution and Data Prediction Restriction Instruction for +# v8-A architectures, added by default from v8.5-A +define feature predres + # Feature groups. Conventionally all (or mostly) upper case. # ALL_FPU lists all the feature bits associated with the floating-point # unit; these will all be removed if the floating-point unit is disabled @@ -235,6 +246,7 @@ define fgroup ARMv8_1aARMv8a crc32 armv8_1 define fgroup ARMv8_2aARMv8_1a armv8_2 define fgroup ARMv8_3aARMv8_2a armv8_3 define fgroup ARMv8_4aARMv8_3a armv8_4 +define fgroup ARMv8_5aARMv8_4a armv8_5 sb predres define fgroup ARMv8m_base ARMv6m armv8 cmse tdiv define fgroup ARMv8m_main ARMv7m armv8 cmse define fgroup ARMv8r ARMv8a @@ -505,6 +517,8 @@ begin arch armv8-a option crypto add FP_ARMv8 CRYPTO option nocrypto remove ALL_CRYPTO option nofp remove ALL_FP + option sb add sb + option predres add predres end arch armv8-a begin arch armv8.1-a @@ -517,6 +531,8 @@ begin arch armv8.1-a option crypto add FP_ARMv8 CRYPTO option nocrypto remove ALL_CRYPTO option nofp remove ALL_FP + option sb add sb + option predres add predres end arch armv8.1-a begin arch armv8.2-a @@ -532,6 +548,8 @@ begin arch armv8.2-a option nocrypto remove ALL_CRYPTO option nofp remove ALL_FP option dotprod add FP_ARMv8 DOTPROD + option sb add sb + option predres add predres end arch armv8.2-a begin arch armv8.3-a @@ -547,6 +565,8 @@ begin arch armv8.3-a option nocrypto remove ALL_CRYPTO option nofp remove ALL_FP option dotprod add FP_ARMv8 DOTPROD + option sb add sb + option predres add predres end arch armv8.3-a begin arch armv8.4-a @@ -560,8 +580,23 @@ begin arch armv8.4-a option crypto add FP_ARMv8 CRYPTO DOTPROD option nocrypto remove ALL_CRYPTO option nofp remove ALL_FP + option sb add sb + option predres add predres end arch armv8.4-a +begin arch armv8.5-a + tune for cortex-a53 + tune flags CO_PROC 
+ base 8A + profile A + isa ARMv8_5a + option simd add FP_ARMv8 DOTPROD + option fp16 add fp16 fp16fml FP_ARMv8 DOTPROD + option crypto add FP_ARMv8 CRYPTO DOTPROD + option nocrypto remove ALL_CRYPTO + option nofp remove ALL_FP +end arch armv8.5-a + begin arch armv8-m.base tune for cortex-m23 base 8M_BASE diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt index eacee746a39912d04aa03c636f9a95e0e72ce43b..dde6e137db5598d92df6a1e69a63140146bf7372 100644 --- a/gcc/config/arm/arm-tables.opt +++ b/gcc/config/arm/arm-tables.opt @@ -377,19 +377,22 @@ EnumValue Enum(arm_arch) String(armv8.4-a) Value(24) EnumValue -Enum(arm_arch) String(armv8-m.base) Value(25) +Enum(arm_arch) String(armv8.5-a) Value(25) EnumValue -Enum(arm_arch) String(armv8-m.main) Value(26) +Enum(arm_arch)
Re: [PATCH, arm] Backport -- Fix ICE during thunk generation with -mlong-calls
Hi Mihail On 08/11/18 10:02, Ramana Radhakrishnan wrote: > On 07/11/2018 17:49, Mihail Ionescu wrote: >> Hi All, >> >> This is a backport from trunk for GCC 8 and 7. >> >> SVN revision: r264595. >> >> Regression tested on arm-none-eabi. >> >> >> gcc/ChangeLog >> >> 2018-11-02 Mihail Ionescu >> >> Backport from mainline >> 2018-09-26 Eric Botcazou >> >> * config/arm/arm.c (arm_reorg): Skip Thumb reorg pass for thunks. >> (arm32_output_mi_thunk): Deal with long calls. >> >> gcc/testsuite/ChangeLog >> >> 2018-11-02 Mihail Ionescu >> >> Backport from mainline >> 2018-09-17 Eric Botcazou >> >> * g++.dg/other/thunk2a.C: New test. >> * g++.dg/other/thunk2b.C: Likewise. >> >> >> If everything is ok, could someone commit it on my behalf? >> >> Best regards, >> Mihail >> > > It is a regression since my rewrite of this code. > > Ok to backport to the release branches, it's been on trunk for a while > and not shown any issues - please give the release managers a day or so > to object. > > regards > Ramana > Does this fix PR87867 you reported? If yes, then it would be easier to add the PR tag in the ChangeLog so that the ticket gets updated once committed. Thanks Sudi
Re: [PATCH, GCC, AARCH64, 1/6] Enable ARMv8.5-A in gcc
Hi On 02/11/18 18:37, Sudakshina Das wrote: > Hi > > This patch is part of a series that enables ARMv8.5-A in GCC and > adds Branch Target Identification Mechanism. > (https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools) > > > > This patch adds the march option for armv8.5-a. > > Bootstrapped and regression tested with aarch64-none-linux-gnu. > Is this ok for trunk? > > Thanks > Sudi > > > *** gcc/ChangeLog *** > > 2018-xx-xx Sudakshina Das > > * config/aarch64/aarch64-arches.def: Define AARCH64_ARCH for > ARMv8.5-A. > * gcc/config/aarch64/aarch64.h (AARCH64_FL_V8_5): New. > (AARCH64_FL_FOR_ARCH8_5, AARCH64_ISA_V8_5): New. > * gcc/doc/invoke.texi: Document ARMv8.5-A. > As per an offline chat earlier with Richard, I was supposed to send future patch series as a reply on a single thread. Sadly I forgot to do that this time. So I am adding links to the other patches here to make it easy to link the series: [PATCH, GCC, AARCH64, 2/6] Add new arch command line features from ARMv8.5-A : https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00111.html [PATCH, GCC, AARCH64, 3/6] Restrict indirect tail calls to x16 and x17: https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00113.html [PATCH, GCC, AARCH64, 4/6] Enable BTI: Add new to -mbranch-protection: https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00114.html [PATCH, GCC, AARCH64, 5/6] Enable BTI : Add new pass for BTI: https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00115.html [PATCH, GCC, AARCH64, 6/6] Enable BTI: Add configure option for BTI and PAC-RET: https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00116.html Sorry! Sudi
[PATCH, GCC, AARCH64, 6/6] Enable BTI: Add configure option for BTI and PAC-RET
Hi This patch is part of a series that enables ARMv8.5-A in GCC and adds Branch Target Identification Mechanism. (https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools) This patch adds a new configure option for enabling BTI and return address signing by default with --enable-standard-branch-protection. This is equivalent to -mbranch-protection=standard which would imply -mbranch-protection=pac-ret+bti. Bootstrapped and regression tested with aarch64-none-linux-gnu with and without the configure option turned on. Also tested on aarch64-none-elf with and without the configure option, with a BTI-enabled AEM. There were only 2 regressions, and these were because newlib requires patches to protect hand-coded libraries with BTI. Is this ok for trunk? Thanks Sudi *** gcc/ChangeLog *** 2018-xx-xx Sudakshina Das * config/aarch64/aarch64.c (aarch64_override_options): Add case to check configure option to set BTI and Return Address Signing. * configure.ac: Add --enable-standard-branch-protection and --disable-standard-branch-protection. * configure: Regenerated. * doc/install.texi: Document the same. *** gcc/testsuite/ChangeLog *** 2018-xx-xx Sudakshina Das * gcc.target/aarch64/bti-1.c: Update test to not add command line option when configured with bti. * gcc.target/aarch64/bti-2.c: Likewise. * lib/target-supports.exp (check_effective_target_default_branch_protection): Add configure check for --enable-standard-branch-protection. diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 12a55a640de4fdc5df21d313c7ea6841f1daf3f2..a1a5b7b464eaa2ce67ac66d9aea837159590aa07 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -11558,6 +11558,26 @@ aarch64_override_options (void) if (!selected_tune) selected_tune = selected_cpu; + if (aarch64_enable_bti == 2) +{ +#ifdef TARGET_ENABLE_BTI + aarch64_enable_bti = 1; +#else + aarch64_enable_bti = 0; +#endif +} + + /* No command-line option yet.
*/ + if (accepted_branch_protection_string == NULL && !TARGET_ILP32) +{ +#ifdef TARGET_ENABLE_PAC_RET + aarch64_ra_sign_scope = AARCH64_FUNCTION_NON_LEAF; + aarch64_ra_sign_key = AARCH64_KEY_A; +#else + aarch64_ra_sign_scope = AARCH64_FUNCTION_NONE; +#endif +} + #ifndef HAVE_AS_MABI_OPTION /* The compiler may have been configured with 2.23.* binutils, which does not have support for ILP32. */ diff --git a/gcc/configure b/gcc/configure index 03461f1e27538a3a0791c2b61b0e75c3ff1a25be..a0f95106c22ee858bbf4516f14cd9d265dede272 100755 --- a/gcc/configure +++ b/gcc/configure @@ -947,6 +947,7 @@ with_plugin_ld enable_gnu_indirect_function enable_initfini_array enable_comdat +enable_standard_branch_protection enable_fix_cortex_a53_835769 enable_fix_cortex_a53_843419 with_glibc_version @@ -1677,6 +1678,14 @@ Optional Features: --enable-initfini-array use .init_array/.fini_array sections --enable-comdat enable COMDAT group support + --enable-standard-branch-protection + enable Branch Target Identification Mechanism and + Return Address Signing by default for AArch64 + --disable-standard-branch-protection + disable Branch Target Identification Mechanism and + Return Address Signing by default for AArch64 + + --enable-fix-cortex-a53-835769 enable workaround for AArch64 Cortex-A53 erratum 835769 by default @@ -18529,7 +18538,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 18532 "configure" +#line 18541 "configure" #include "confdefs.h" #if HAVE_DLFCN_H @@ -18635,7 +18644,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 18638 "configure" +#line 18647 "configure" #include "confdefs.h" #if HAVE_DLFCN_H @@ -24939,6 +24948,25 @@ $as_echo "#define HAVE_AS_SMALL_PIC_RELOCS 1" >>confdefs.h fi +# Enable Branch Target Identification Mechanism and Return Address +# Signing by default. 
+# Check whether --enable-standard-branch-protection was given. +if test "${enable_standard_branch_protection+set}" = set; then : + enableval=$enable_standard_branch_protection; +case $enableval in + yes) +tm_defines="${tm_defines} TARGET_ENABLE_BTI=1 TARGET_ENABLE_PAC_RET=1" +;; + no) +;; + *) +as_fn_error "'$enableval' is an invalid value for --enable-standard-branch-protection.\ + Valid choices are 'yes' and 'no'." "$LINENO" 5 +
[PATCH, GCC, AARCH64, 4/6] Enable BTI: Add new to -mbranch-protection.
Hi This patch is part of a series that enables ARMv8.5-A in GCC and adds Branch Target Identification Mechanism. (https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools) NOTE: This patch is dependent on Sam Tebbs' patch to deprecate -msign-return-address and add the new -mbranch-protection option https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00104.html This patch updates the CLI of -mbranch-protection to add "bti" as a new type of branch protection and also adds it to the definitions of "none" and "standard". Since the BTI instructions, just like the return address signing instructions, are in the HINT space, this option is not limited to the ARMv8.5-A architecture version. The option does not really do anything functional. The functional changes are in the next patch. I am initializing the target variable aarch64_enable_bti to 2 since I am also adding a configure option in a later patch and a value different from 0 and 1 would help identify if it's already been updated. Bootstrapped and regression tested with aarch64-none-linux-gnu. Is this ok for trunk? Thanks Sudi *** gcc/ChangeLog *** 2018-xx-xx Sudakshina Das * config/aarch64/aarch64-protos.h (aarch64_bti_enabled): Declare. * config/aarch64/aarch64.c (aarch64_handle_no_branch_protection): Disable bti for -mbranch-protection=none. (aarch64_handle_standard_branch_protection): Enable bti for -mbranch-protection=standard. (aarch64_handle_bti_protection): Enable bti for "bti" in the string to -mbranch-protection. (aarch64_bti_enabled): Check if bti is enabled. * config/aarch64/aarch64.opt: Declare target variable. * doc/invoke.texi: Add bti to the -mbranch-protection documentation.
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index bba8204fa53083da49d00a8c2b29e62849bd233c..a5ccfe534b6c59c90bd91215f89c59d67fd88688 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -525,6 +525,7 @@ void aarch64_register_pragmas (void); void aarch64_relayout_simd_types (void); void aarch64_reset_previous_fndecl (void); bool aarch64_return_address_signing_enabled (void); +bool aarch64_bti_enabled (void); void aarch64_save_restore_target_globals (tree); void aarch64_addti_scratch_regs (rtx, rtx, rtx *, rtx *, rtx *, diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 039aec828d7dae60918493abb0d044001ac0b366..836275ab58de894529a72be88ff226da503598dc 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -1140,6 +1140,7 @@ static enum aarch64_parse_opt_result aarch64_handle_no_branch_protection (char* str ATTRIBUTE_UNUSED, char* rest) { aarch64_ra_sign_scope = AARCH64_FUNCTION_NONE; + aarch64_enable_bti = 0; if (rest) { error ("unexpected %<%s%> after %<%s%>", rest, str); @@ -1154,6 +1155,7 @@ aarch64_handle_standard_branch_protection (char* str ATTRIBUTE_UNUSED, { aarch64_ra_sign_scope = AARCH64_FUNCTION_NON_LEAF; aarch64_ra_sign_key = AARCH64_KEY_A; + aarch64_enable_bti = 1; if (rest) { error ("unexpected %<%s%> after %<%s%>", rest, str); @@ -1187,6 +1189,14 @@ aarch64_handle_pac_ret_b_key (char* str ATTRIBUTE_UNUSED, return AARCH64_PARSE_OK; } +static enum aarch64_parse_opt_result +aarch64_handle_bti_protection (char* str ATTRIBUTE_UNUSED, +char* rest ATTRIBUTE_UNUSED) +{ + aarch64_enable_bti = 1; + return AARCH64_PARSE_OK; +} + static const struct aarch64_branch_protec_type aarch64_pac_ret_subtypes[] = { { "leaf", aarch64_handle_pac_ret_leaf, NULL, 0 }, { "b-key", aarch64_handle_pac_ret_b_key, NULL, 0 }, @@ -1198,6 +1208,7 @@ static const struct aarch64_branch_protec_type aarch64_branch_protec_types[] = { { "standard", 
aarch64_handle_standard_branch_protection, NULL, 0 }, { "pac-ret", aarch64_handle_pac_ret_protection, aarch64_pac_ret_subtypes, sizeof (aarch64_pac_ret_subtypes) / sizeof (aarch64_branch_protec_type) }, + { "bti", aarch64_handle_bti_protection, NULL, 0 }, { NULL, NULL, NULL, 0 } }; @@ -4581,6 +4592,13 @@ aarch64_return_address_signing_enabled (void) && cfun->machine->frame.reg_offset[LR_REGNUM] >= 0)); } +/* Return TRUE if Branch Target Identification Mechanism is enabled. */ +bool +aarch64_bti_enabled (void) +{ + return (aarch64_enable_bti == 1); +} + /* Emit code to save the callee-saved registers from register number START to LIMIT to the stack at the location starting at offset START_OFFSET, skipping any write-back candidates if SKIP_WB is true. */ diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt index 9460636d93b67af1525f028176aa78e6fed4e45f..fc2064bd688490765b977eca777245986274d268 100644
[PATCH, GCC, AARCH64, 5/6] Enable BTI : Add new pass for BTI.
Hi

This patch is part of a series that enables ARMv8.5-A in GCC and adds the Branch Target Identification Mechanism.
(https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools)

This patch adds a new pass called "bti" which is triggered by the command line argument -mbranch-protection whenever "bti" is turned on. The pass iterates through the instructions and adds appropriate BTI instructions based on the following:

* Add a new "BTI C" at the beginning of a function, unless it's already protected by a "PACIASP/PACIBSP". We exempt the functions that are only called directly.
* Add a new "BTI J" for every target of an indirect jump: jump table targets, non-local goto targets, or labels that might be referenced by variables, constant pools, etc. (NOTE_INSN_DELETED_LABEL).

Since we have already changed the use of indirect tail calls to only x16 and x17, we do not have to use "BTI JC" (check patch 3/6).

Bootstrapped and regression tested with aarch64-none-linux-gnu. Added new tests. Is this ok for trunk?

Thanks
Sudi

*** gcc/ChangeLog ***

2018-xx-xx  Sudakshina Das
	    Ramana Radhakrishnan

	* config.gcc (aarch64*-*-*): Add aarch64-bti-insert.o.
	* gcc/config/aarch64/aarch64.h: Update comment for TRAMPOLINE_SIZE.
	* config/aarch64/aarch64.c (aarch64_asm_trampoline_template): Update
	if bti is enabled.
	* config/aarch64/aarch64-bti-insert.c: New file.
	* config/aarch64/aarch64-passes.def (INSERT_PASS_BEFORE): Insert bti
	pass.
	* config/aarch64/aarch64-protos.h (make_pass_insert_bti): Declare the
	new bti pass.
	* config/aarch64/aarch64.md (bti_nop): Define.
	* config/aarch64/t-aarch64: Add rule for aarch64-bti-insert.o.

*** gcc/testsuite/ChangeLog ***

2018-xx-xx  Sudakshina Das

	* gcc.target/aarch64/bti-1.c: New test.
	* gcc.target/aarch64/bti-2.c: New test.
	* lib/target-supports.exp (check_effective_target_aarch64_bti_hw):
	Add new check for BTI hw.
diff --git a/gcc/config.gcc b/gcc/config.gcc index b108697cfc7b1c9c6dc1f30cca6fd1158182c29e..3e77f9df6ad6ca55fccca50387eab4b2501af647 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -317,7 +317,7 @@ aarch64*-*-*) c_target_objs="aarch64-c.o" cxx_target_objs="aarch64-c.o" d_target_objs="aarch64-d.o" - extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o aarch64-speculation.o falkor-tag-collision-avoidance.o" + extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o aarch64-speculation.o falkor-tag-collision-avoidance.o aarch64-bti-insert.o" target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.c" target_has_targetm_common=yes ;; diff --git a/gcc/config/aarch64/aarch64-bti-insert.c b/gcc/config/aarch64/aarch64-bti-insert.c new file mode 100644 index ..efd57620d8803302e03ca643b9f2495e188dc19b --- /dev/null +++ b/gcc/config/aarch64/aarch64-bti-insert.c @@ -0,0 +1,195 @@ +/* Branch Target Identification for AArch64 architecture. + Copyright (C) 2018 Free Software Foundation, Inc. + Contributed by Arm Ltd. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + <http://www.gnu.org/licenses/>. 
*/
+
+#define IN_TARGET_CODE 1
+
+#include "config.h"
+#define INCLUDE_STRING
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "target.h"
+#include "rtl.h"
+#include "tree.h"
+#include "memmodel.h"
+#include "gimple.h"
+#include "tm_p.h"
+#include "stringpool.h"
+#include "attribs.h"
+#include "emit-rtl.h"
+#include "gimplify.h"
+#include "gimple-iterator.h"
+#include "dumpfile.h"
+#include "rtl-iter.h"
+#include "cfgrtl.h"
+#include "tree-pass.h"
+#include "cgraph.h"
+
+namespace {
+
+const pass_data pass_data_insert_bti =
+{
+ RTL_PASS, /* type. */
+ "bti", /* name. */
+ OPTGROUP_NONE, /* optinfo_flags. */
+ TV_MACH_DEP, /* tv_id. */
+ 0, /* properties_required. */
+ 0, /* properties_provided. */
+ 0, /* properties_destroyed. */
+
[PATCH, GCC, AARCH64, 3/6] Restrict indirect tail calls to x16 and x17
Hi

This patch is part of a series that enables ARMv8.5-A in GCC and adds the Branch Target Identification Mechanism.
(https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools)

This patch changes the registers that are allowed for indirect tail calls. We are choosing to restrict these to only x16 or x17. Indirect tail calls are special in that they convert a call statement (BLR instruction) to a jump statement (BR instruction). For the best possible use of the Branch Target Identification Mechanism, we would like to place a "BTI C" (call) at the beginning of the function, which is only compatible with BLRs and BR X16/X17. In order to make indirect tail calls compatible with this scenario, we are restricting the TAILCALL_ADDR_REGS.

In order to use x16/x17 for this purpose, we also had to change the use of these registers in the epilogue/prologue handling. For this purpose we are now using x12 and x13, named EP0_REGNUM and EP1_REGNUM, as scratch registers for the epilogue and prologue.

Bootstrapped and regression tested with aarch64-none-linux-gnu. Updated test. Ran Spec2017 with no performance hit. Is this ok for trunk?

Thanks
Sudi

*** gcc/ChangeLog ***

2018-xx-xx  Sudakshina Das

	* config/aarch64/aarch64.c (aarch64_expand_prologue): Use new
	epilogue/prologue scratch registers EP0_REGNUM and EP1_REGNUM.
	(aarch64_expand_epilogue): Likewise.
	(aarch64_output_mi_thunk): Likewise.
	* config/aarch64/aarch64.h (REG_CLASS_CONTENTS): Change
	TAILCALL_ADDR_REGS to x16 and x17.
	* config/aarch64/aarch64.md: Define EP0_REGNUM and EP1_REGNUM.

*** gcc/testsuite/ChangeLog ***

2018-xx-xx  Sudakshina Das

	* gcc.target/aarch64/test_frame_17.c: Update to check for
	EP0_REGNUM instead of IP0_REGNUM and add test case.
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 94184049c9c77d858fd5b3e2a8970a48b70f7529..8e7a8d54351cf7eb1774a474bfbfbebf58070e31 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -579,7 +579,7 @@ enum reg_class #define REG_CLASS_CONTENTS \ { \ { 0x, 0x, 0x }, /* NO_REGS */ \ - { 0x0004, 0x, 0x }, /* TAILCALL_ADDR_REGS */\ + { 0x0003, 0x, 0x }, /* TAILCALL_ADDR_REGS */\ { 0x7fff, 0x, 0x0003 }, /* GENERAL_REGS */ \ { 0x8000, 0x, 0x }, /* STACK_REG */ \ { 0x, 0x, 0x0003 }, /* POINTER_REGS */ \ diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 27f81b654a2bae3ddd87b99e4b7926cc588a95f5..f9a81f1734e6885662f6a9e6c97bdbcdac24211b 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -5317,8 +5317,8 @@ aarch64_expand_prologue (void) aarch64_emit_probe_stack_range (get_stack_check_protect (), frame_size); } - rtx ip0_rtx = gen_rtx_REG (Pmode, IP0_REGNUM); - rtx ip1_rtx = gen_rtx_REG (Pmode, IP1_REGNUM); + rtx tmp0_rtx = gen_rtx_REG (Pmode, EP0_REGNUM); + rtx tmp1_rtx = gen_rtx_REG (Pmode, EP1_REGNUM); /* In theory we should never have both an initial adjustment and a callee save adjustment. Verify that is the case since the @@ -5328,7 +5328,7 @@ aarch64_expand_prologue (void) /* Will only probe if the initial adjustment is larger than the guard less the amount of the guard reserved for use by the caller's outgoing args. 
*/ - aarch64_allocate_and_probe_stack_space (ip0_rtx, ip1_rtx, initial_adjust, + aarch64_allocate_and_probe_stack_space (tmp0_rtx, tmp1_rtx, initial_adjust, true, false); if (callee_adjust != 0) @@ -5346,7 +5346,7 @@ aarch64_expand_prologue (void) } aarch64_add_offset (Pmode, hard_frame_pointer_rtx, stack_pointer_rtx, callee_offset, - ip1_rtx, ip0_rtx, frame_pointer_needed); + tmp1_rtx, tmp0_rtx, frame_pointer_needed); if (frame_pointer_needed && !frame_size.is_constant ()) { /* Variable-sized frames need to describe the save slot @@ -5388,7 +5388,7 @@ aarch64_expand_prologue (void) /* We may need to probe the final adjustment if it is larger than the guard that is assumed by the called. */ - aarch64_allocate_and_probe_stack_space (ip1_rtx, ip0_rtx, final_adjust, + aarch64_allocate_and_probe_stack_space (tmp1_rtx, tmp0_rtx, final_adjust, !frame_pointer_needed, true); } @@ -5426,8 +5426,8 @@ aarch64_expand_epilogue (bool for_sibcall) unsigned reg2 = cfun->machine->frame.wb_candidate2; rtx cfi_ops = NULL; rtx_insn *insn; - /* A stack clash protection prologue may not have left IP0_REGNUM or - IP1_REGNUM in a usable state. The same is true for allocations + /* A stack clash protection prologue may not have left EP0_REGNUM or + EP1_REGNUM in a usable state. The same is true for allocations with an SVE component, since we then need both temporary
[PATCH, GCC, AARCH64, 2/6] Add new arch command line features from ARMv8.5-A
Hi

This patch is part of a series that enables ARMv8.5-A in GCC and adds the Branch Target Identification Mechanism.
(https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools)

This patch adds all the command line features that are added by ARMv8.5.

Optional extensions to armv8.5-a:
+rng : Random Number Generation Instructions.
+memtag : Memory Tagging Extension.

ARMv8.5-A features that are optional to older arch:
+sb : Speculation barrier instruction.
+ssbs : Speculative Store Bypass Safe instruction.
+predres : Execution and Data Prediction Restriction instructions.

All of the above only affect the assembler and have already (or almost, for a couple of cases) gone into the trunk of binutils.

Bootstrapped and regression tested with aarch64-none-linux-gnu. Is this ok for trunk?

Thanks
Sudi

*** gcc/ChangeLog ***

2018-xx-xx  Sudakshina Das

	* config/aarch64/aarch64-option-extensions.def: Define
	AARCH64_OPT_EXTENSION for memtag, rng, sb, ssbs and predres.
	* gcc/config/aarch64/aarch64.h (AARCH64_FL_RNG): New.
	(AARCH64_FL_MEMTAG, AARCH64_FL_SB, AARCH64_FL_SSBS): New.
	(AARCH64_FL_PREDRES): New.
	(AARCH64_FL_FOR_ARCH8_5): Add AARCH64_FL_SB, AARCH64_FL_SSBS and
	AARCH64_FL_PREDRES by default.
	* gcc/doc/invoke.texi: Document rng, memtag, sb, ssbs and predres.

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index 69ab796a4e1a959b89ebb55b599919c442cfb088..ed669a63061ba5e1595840943176077af7e69988 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -108,4 +108,19 @@ AARCH64_OPT_EXTENSION("sve", AARCH64_FL_SVE, AARCH64_FL_FP | AARCH64_FL_SIMD | A /* Enabling/Disabling "profile" does not enable/disable any other feature. */ AARCH64_OPT_EXTENSION("profile", AARCH64_FL_PROFILE, 0, 0, "") +/* Enabling/Disabling "rng" only changes "rng".
*/ +AARCH64_OPT_EXTENSION("rng", AARCH64_FL_RNG, 0, 0, "") + +/* Enabling/Disabling "memtag" only changes "memtag". */ +AARCH64_OPT_EXTENSION("memtag", AARCH64_FL_MEMTAG, 0, 0, "") + +/* Enabling/Disabling "sb" only changes "sb". */ +AARCH64_OPT_EXTENSION("sb", AARCH64_FL_SB, 0, 0, "") + +/* Enabling/Disabling "ssbs" only changes "ssbs". */ +AARCH64_OPT_EXTENSION("ssbs", AARCH64_FL_SSBS, 0, 0, "") + +/* Enabling/Disabling "predres" only changes "predres". */ +AARCH64_OPT_EXTENSION("predres", AARCH64_FL_PREDRES, 0, 0, "") + #undef AARCH64_OPT_EXTENSION diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index b324cdd2fede33af13c03362750401f9eb1c9a90..60325bb1b16c71e951ef18319872e8b0911e8d12 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -172,10 +172,22 @@ extern unsigned aarch64_architecture_version; #define AARCH64_FL_RCPC8_4(1 << 20) /* Has ARMv8.4-a RCPC extensions. */ /* ARMv8.5-A architecture extensions. */ #define AARCH64_FL_V8_5 (1 << 22) /* Has ARMv8.5-A features. */ +#define AARCH64_FL_RNG (1 << 23) /* ARMv8.5-A Random Number Insns. */ +#define AARCH64_FL_MEMTAG (1 << 24) /* ARMv8.5-A Memory Tagging + Extensions. */ /* Statistical Profiling extensions. */ #define AARCH64_FL_PROFILE(1 << 21) +/* Speculation Barrier instruction supported. */ +#define AARCH64_FL_SB (1 << 25) + +/* Speculative Store Bypass Safe instruction supported. */ +#define AARCH64_FL_SSBS (1 << 26) + +/* Execution and Data Prediction Restriction instructions supported. */ +#define AARCH64_FL_PREDRES(1 << 27) + /* Has FP and SIMD. 
*/ #define AARCH64_FL_FPSIMD (AARCH64_FL_FP | AARCH64_FL_SIMD) @@ -195,7 +207,8 @@ extern unsigned aarch64_architecture_version; (AARCH64_FL_FOR_ARCH8_3 | AARCH64_FL_V8_4 | AARCH64_FL_F16FML \ | AARCH64_FL_DOTPROD | AARCH64_FL_RCPC8_4) #define AARCH64_FL_FOR_ARCH8_5 \ - (AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_V8_5) + (AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_V8_5 \ + | AARCH64_FL_SB | AARCH64_FL_SSBS | AARCH64_FL_PREDRES) /* Macros to test ISA flags. */ diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 0cf568b60dfb0fb260ca3708ea2d7e081d20cc8b..cc7420f3a84f9cd527c582114a9a96f406b63699 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -15287,6 +15287,27 @@ Use of this option with architectures prior to Armv8.2-A is not supported. @item profile Enable the Statistical Profiling extension. This option is only to enable the extension at the assembler level and does not affect code generation. +@item rng +Enable the Armv8.5-a Random Number instructions. This option is only to +enable the ex
[PATCH, GCC, AARCH64, 1/6] Enable ARMv8.5-A in gcc
Hi

This patch is part of a series that enables ARMv8.5-A in GCC and adds the Branch Target Identification Mechanism.
(https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools)

This patch adds the -march option for armv8.5-a.

Bootstrapped and regression tested with aarch64-none-linux-gnu. Is this ok for trunk?

Thanks
Sudi

*** gcc/ChangeLog ***

2018-xx-xx  Sudakshina Das

	* config/aarch64/aarch64-arches.def: Define AARCH64_ARCH for
	ARMv8.5-A.
	* gcc/config/aarch64/aarch64.h (AARCH64_FL_V8_5): New.
	(AARCH64_FL_FOR_ARCH8_5, AARCH64_ISA_V8_5): New.
	* gcc/doc/invoke.texi: Document ARMv8.5-A.

diff --git a/gcc/config/aarch64/aarch64-arches.def b/gcc/config/aarch64/aarch64-arches.def
index a37a5553894d6ab1d629017ea204478f69d8773d..7d05cd604093d15f27e5b197803a50c45a260e6e 100644
--- a/gcc/config/aarch64/aarch64-arches.def
+++ b/gcc/config/aarch64/aarch64-arches.def
@@ -35,5 +35,6 @@ AARCH64_ARCH("armv8.1-a", generic, 8_1A, 8, AARCH64_FL_FOR_ARCH8_1) AARCH64_ARCH("armv8.2-a", generic, 8_2A, 8, AARCH64_FL_FOR_ARCH8_2) AARCH64_ARCH("armv8.3-a", generic, 8_3A, 8, AARCH64_FL_FOR_ARCH8_3) AARCH64_ARCH("armv8.4-a", generic, 8_4A, 8, AARCH64_FL_FOR_ARCH8_4) +AARCH64_ARCH("armv8.5-a", generic, 8_5A, 8, AARCH64_FL_FOR_ARCH8_5) #undef AARCH64_ARCH
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index fa9af26fd40fd23b1c9cd6da9b6300fd77089103..b324cdd2fede33af13c03362750401f9eb1c9a90 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -170,6 +170,8 @@ extern unsigned aarch64_architecture_version; #define AARCH64_FL_SHA3 (1 << 18) /* Has ARMv8.4-a SHA3 and SHA512. */ #define AARCH64_FL_F16FML (1 << 19) /* Has ARMv8.4-a FP16 extensions. */ #define AARCH64_FL_RCPC8_4(1 << 20) /* Has ARMv8.4-a RCPC extensions. */ +/* ARMv8.5-A architecture extensions. */ +#define AARCH64_FL_V8_5 (1 << 22) /* Has ARMv8.5-A features. */ /* Statistical Profiling extensions.
*/ #define AARCH64_FL_PROFILE(1 << 21) @@ -192,6 +194,8 @@ extern unsigned aarch64_architecture_version; #define AARCH64_FL_FOR_ARCH8_4 \ (AARCH64_FL_FOR_ARCH8_3 | AARCH64_FL_V8_4 | AARCH64_FL_F16FML \ | AARCH64_FL_DOTPROD | AARCH64_FL_RCPC8_4) +#define AARCH64_FL_FOR_ARCH8_5 \ + (AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_V8_5) /* Macros to test ISA flags. */ @@ -213,6 +217,7 @@ extern unsigned aarch64_architecture_version; #define AARCH64_ISA_SHA3 (aarch64_isa_flags & AARCH64_FL_SHA3) #define AARCH64_ISA_F16FML (aarch64_isa_flags & AARCH64_FL_F16FML) #define AARCH64_ISA_RCPC8_4 (aarch64_isa_flags & AARCH64_FL_RCPC8_4) +#define AARCH64_ISA_V8_5 (aarch64_isa_flags & AARCH64_FL_V8_5) /* Crypto is an optional extension to AdvSIMD. */ #define TARGET_CRYPTO (TARGET_SIMD && AARCH64_ISA_CRYPTO) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 06a00a29de73aa509b6a15ebb34dfc182cf94cd2..c76c4fc223f9c46e517213eb6ad292c70aa1c89f 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -15097,8 +15097,11 @@ more feature modifiers. This option has the form @option{-march=@var{arch}@r{@{}+@r{[}no@r{]}@var{feature}@r{@}*}}. The permissible values for @var{arch} are @samp{armv8-a}, -@samp{armv8.1-a}, @samp{armv8.2-a}, @samp{armv8.3-a} or @samp{armv8.4-a} -or @var{native}. +@samp{armv8.1-a}, @samp{armv8.2-a}, @samp{armv8.3-a}, @samp{armv8.4-a}, +@samp{armv8.5-a} or @var{native}. + +The value @samp{armv8.5-a} implies @samp{armv8.4-a} and enables compiler +support for the ARMv8.5-A architecture extensions. The value @samp{armv8.4-a} implies @samp{armv8.3-a} and enables compiler support for the ARMv8.4-A architecture extensions.
Re: [PATCH][GCC][AArch64] Limit movmem copies to TImode copies.
Hi Tamar

On 13/08/18 17:27, Tamar Christina wrote:

Hi Thomas,

Thanks for the review. I'll correct the typo before committing if I have no other changes required by a maintainer.

Regards,
Tamar.

I am not a maintainer but I would like to point out something in your patch. I think your test case will fail with -mabi=ilp32:

FAIL: gcc.target/aarch64/large_struct_copy_2.c (test for excess errors)
Excess errors: /work/trunk/src/gcc/gcc/testsuite/gcc.target/aarch64/large_struct_copy_2.c:18:27: warning: overflow in conversion from 'long long int' to 'long int' changes value from '4073709551611' to '2080555003' [-Woverflow]

We have had more such recent failures and James gave a very neat way to make sure the mode comes out what you intend it to here:
https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00233.html

I would just ask you to change the data types accordingly and test it with -mabi=ilp32.

Thanks
Sudi

From: Thomas Preudhomme
Sent: Monday, August 13, 2018 14:37
To: Tamar Christina
Cc: gcc-patches@gcc.gnu.org; nd ; James Greenhalgh ; Richard Earnshaw ; Marcus Shawcroft
Subject: Re: [PATCH][GCC][AArch64] Limit movmem copies to TImode copies.

Hi Tamar,

Thanks for your patch. Just one comment about your ChangeLog entry for the testsuite change: shouldn't it mention that it is a new testcase? The patch you attached seems to create the file.

Best regards,
Thomas

On Mon, 13 Aug 2018 at 10:33, Tamar Christina mailto:tamar.christ...@arm.com>> wrote:

Hi All,

On AArch64 we have integer modes larger than TImode, and while we can generate moves for these they're not as efficient. So instead make sure we limit the maximum we can copy to TImode. This means copying a 16 byte struct will issue 1 TImode copy, which will be done using a single STP as we expect, but a CImode-sized copy won't issue CImode operations.

Bootstrapped and regtested on aarch64-none-linux-gnu and no issues. Crosstested aarch64_be-none-elf and no issues. Ok for trunk?
Thanks,
Tamar

gcc/
2018-08-13  Tamar Christina  mailto:tamar.christ...@arm.com>>

	* config/aarch64/aarch64.c (aarch64_expand_movmem): Set TImode max.

gcc/testsuite/
2018-08-13  Tamar Christina  mailto:tamar.christ...@arm.com>>

	* gcc.target/aarch64/large_struct_copy_2.c: Add assembler scan.

--
Re: [PATCH][GCC][AARCH64] Use STLUR for atomic_store
Hi Matthew

On 02/08/18 17:26, matthew.malcom...@arm.com wrote:

Use the STLUR instruction introduced in Armv8.4-a. This instruction has the store-release semantic like STLR but can take a 9-bit unscaled signed immediate offset.

Example test case:
```
void foo ()
{
  int32_t *atomic_vals = calloc (4, sizeof (int32_t));
  atomic_store_explicit (atomic_vals + 1, 2, memory_order_release);
}
```

Before patch generates
```
foo:
	stp x29, x30, [sp, -16]!
	mov x1, 4
	mov x0, x1
	mov x29, sp
	bl calloc
	mov w1, 2
	add x0, x0, 4
	stlr w1, [x0]
	ldp x29, x30, [sp], 16
	ret
```

After patch generates
```
foo:
	stp x29, x30, [sp, -16]!
	mov x1, 4
	mov x0, x1
	mov x29, sp
	bl calloc
	mov w1, 2
	stlur w1, [x0, 4]
	ldp x29, x30, [sp], 16
	ret
```

Full bootstrap and regression test done on aarch64. Ok for trunk?

gcc/
2018-07-26  Matthew Malcomson

	* config/aarch64/aarch64-protos.h
	(aarch64_offset_9bit_signed_unscaled_p): New declaration.
	* config/aarch64/aarch64.c (aarch64_offset_9bit_signed_unscaled_p):
	Rename from offset_9bit_signed_unscaled_p.
	* config/aarch64/aarch64.h (TARGET_ARMV8_4): Add feature macro.
	* config/aarch64/atomics.md (atomic_store): Allow offset
	and use stlur.
	* config/aarch64/constraints.md (Ust): New constraint.
	* config/aarch64/predicates.md
	(aarch64_sync_or_stlur_memory_operand): New predicate.

gcc/testsuite/
2018-07-26  Matthew Malcomson

	* gcc.target/aarch64/atomic-store.c: New.

Thank you for doing this. I am not a maintainer but I have a few nits on this patch:

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index af5db9c595385f7586692258f750b6aceb3ed9c8..630a75bf776fcdc374aa9ffa4bb020fea3719320 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -393,6 +393,7 @@ void aarch64_split_add_offset (scalar_int_mode, rtx, rtx, rtx, rtx, rtx); bool aarch64_mov_operand_p (rtx, machine_mode); ...
-static inline bool
-offset_9bit_signed_unscaled_p (machine_mode mode ATTRIBUTE_UNUSED,
+bool
+aarch64_offset_9bit_signed_unscaled_p (machine_mode mode ATTRIBUTE_UNUSED,
 poly_int64 offset)

This needs to be aligned with the first argument

...

@@ -5837,7 +5837,7 @@ aarch64_classify_address (struct aarch64_address_info *info, ldr/str instructions (only big endian will get here). */ if (mode == CImode) return (aarch64_offset_7bit_signed_scaled_p (TImode, offset) - && (offset_9bit_signed_unscaled_p (V16QImode, offset + 32) + && (aarch64_offset_9bit_signed_unscaled_p (V16QImode, offset + 32)

This is not less than 80 characters

...

+;; STLUR instruction constraint requires Armv8.4
+(define_special_memory_constraint "Ust"
+ "@internal
+ A memory address suitable for use with an stlur instruction."
+ (and (match_operand 0 "aarch64_sync_or_stlur_memory_operand")
+ (match_test "TARGET_ARMV8_4")))
+

You are already checking for TARGET_ARMV8_4 inside aarch64_sync_or_stlur_memory_operand. Also see my comment below for this function.

...

+;; True if the operand is memory reference valid for one of a str or stlur
+;; operation.
+(define_predicate "aarch64_sync_or_stlur_memory_operand"
+ (ior (match_operand 0 "aarch64_sync_memory_operand")
+ (and (match_operand 0 "memory_operand")
+ (match_code "plus" "0")
+ (match_code "reg" "00")
+ (match_code "const_int" "01")))
+{
+ if (aarch64_sync_memory_operand (op, mode))
+   return true;
+
+ if (!TARGET_ARMV8_4)
+   return false;
+
+ rtx mem_op = XEXP (op, 0);
+ rtx plus_op0 = XEXP (mem_op, 0);
+ rtx plus_op1 = XEXP (mem_op, 1);
+
+ if (GET_MODE (plus_op0) != DImode)
+   return false;
+
+ poly_int64 offset;
+ poly_int_rtx_p (plus_op1, &offset);
+ return aarch64_offset_9bit_signed_unscaled_p (mode, offset);
+})
+

This predicate body makes it a bit mixed up with the two types of operands that you want to test, especially looking at it from the constraint check perspective.
I am assuming you would not want to use the non-immediate form of stlur and instead only use it in the form: STLUR , [, #] and use stlr for no immediate alternative. Thus the constraint does not need to check for aarch64_sync_memory_operand. My suggestion would be to make this operand check separate. Something like: +(define_predicate "aarch64_sync_or_stlur_memory_operand" + (ior (match_operand 0 "aarch64_sync_memory_operand") + (match_operand 0 "aarch64_stlur_memory_operand"))) Where you define aarch64_stlur_memory_operand as +bool aarch64_stlur_memory_operand (rtx op) +{ + if (!TARGET_ARMV8_4) +return false; + + rtx mem_op = XEXP
Re: [PATCH][GCC] Correct name of file in ChangeLog
Hi Matthew On 01/08/18 10:25, matthew.malcom...@arm.com wrote: My first patch included an incorrect ChangeLog entry -- the filename was misspelt. This corrects it. I think this counts as an obvious change. I have committed this on your behalf. Thanks Sudi
Re: [PATCH][AARCH64] PR target/84521 Fix frame pointer corruption with -fomit-frame-pointer with __builtin_setjmp
Hi On 31/07/18 22:48, Andrew Pinski wrote: On Tue, Jul 31, 2018 at 2:43 PM James Greenhalgh wrote: On Thu, Jul 12, 2018 at 12:01:09PM -0500, Sudakshina Das wrote: Hi Eric On 27/06/18 12:22, Wilco Dijkstra wrote: Eric Botcazou wrote: This test can easily be changed not to use optimize since it doesn't look like it needs it. We really need to tests these builtins properly, otherwise they will continue to fail on most targets. As far as I can see PR target/84521 has been reported only for Aarch64 so I'd just leave the other targets alone (and avoid propagating FUD if possible). It's quite obvious from PR84521 that this is an issue affecting all targets. Adding better generic tests for __builtin_setjmp can only be a good thing. Wilco This conversation seems to have died down and I would like to start it again. I would agree with Wilco's suggestion about keeping the test in the generic folder. I have removed the optimize attribute and the effect is still the same. It passes on AArch64 with this patch and it currently fails on x86 trunk (gcc version 9.0.0 20180712 (experimental) (GCC)) on -O1 and above. I don't see where the FUD comes in here; either this builtin has a defined semantics across targets and they are adhered to, or the builtin doesn't have well defined semantics, or the targets fail to implement those semantics. The problem comes from the fact the builtins are not documented at all. See PR59039 for the issue on them not being documented. Thanks @James for bringing this up again. I tried to revive the conversation on PR59039 while working on this as well but that conversation mainly focused on documenting if we are allowed to use __builtin_setjmp and __builtin_longjmp on the same function and with the same jmp buffer or not. This patch and this test case however does not involve that issue. There are other holes in the documentation/implementation of these builtins. For now as advised by James, I have posted the test case on the PR. 
I personally don't see why this test case should go on the AArch64 tests when it clearly fails on other targets as well. But if we can not come to an agreement on that, I am willing to move it to AArch64 tests and maybe open a new bug report which is not marked as "target" with the same test case. Thanks Sudi Thanks, Andrew I think this should go in as is. If other targets are unhappy with the failing test they should fix their target or skip the test if it is not appropriate. You may want to CC some of the maintainers of platforms you know to fail as a courtesy on the PR (add your testcase, and add failing targets and their maintainers to that PR) before committing so it doesn't come as a complete surprise. This is OK with some attempt to get target maintainers involved in the conversation before commit. Thanks, James diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index f284e74..9792d28 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -473,7 +473,9 @@ extern unsigned aarch64_architecture_version; #define EH_RETURN_STACKADJ_RTX gen_rtx_REG (Pmode, R4_REGNUM) #define EH_RETURN_HANDLER_RTX aarch64_eh_return_handler_rtx () -/* Don't use __builtin_setjmp until we've defined it. */ +/* Don't use __builtin_setjmp until we've defined it. + CAUTION: This macro is only used during exception unwinding. + Don't fall for its name. */ #undef DONT_USE_BUILTIN_SETJMP #define DONT_USE_BUILTIN_SETJMP 1 diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 01f35f8..4266a3d 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -3998,7 +3998,7 @@ static bool aarch64_needs_frame_chain (void) { /* Force a frame chain for EH returns so the return address is at FP+8. */ - if (frame_pointer_needed || crtl->calls_eh_return) + if (frame_pointer_needed || crtl->calls_eh_return || cfun->has_nonlocal_label) return true; /* A leaf function cannot have calls or write LR. 
*/ @@ -12218,6 +12218,13 @@ aarch64_expand_builtin_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED) expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL); } +/* Implement TARGET_BUILTIN_SETJMP_FRAME_VALUE. */ +static rtx +aarch64_builtin_setjmp_frame_value (void) +{ + return hard_frame_pointer_rtx; +} + /* Implement TARGET_GIMPLIFY_VA_ARG_EXPR. */ static tree @@ -17744,6 +17751,9 @@ aarch64_run_selftests (void) #undef TARGET_FOLD_BUILTIN #define TARGET_FOLD_BUILTIN aarch64_fold_builtin +#undef TARGET_BUILTIN_SETJMP_FRAME_VALUE +#define TARGET_BUILTIN_SETJMP_FRAME_VALUE aarch64_builtin_setjmp_frame_value + #undef TARGET_FUNCTION_ARG #define TARGET_FUNCTION_ARG aarch64_function_arg diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index a014a01..d5f33d8 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -6087,6 +6087,30 @@ DONE; })
Re: [GCC][PATCH][Aarch64] Stop redundant zero-extension after UMOV when in DI mode
Hi Sam On 01/08/18 10:12, Sam Tebbs wrote: On 07/31/2018 11:16 PM, James Greenhalgh wrote: On Thu, Jul 26, 2018 at 11:52:15AM -0500, Sam Tebbs wrote: Thanks for making the changes and adding more test cases. I do however see that you are only covering 2 out of 4 new *aarch64_get_lane_zero_extenddi<> patterns. The *aarch64_get_lane_zero_extendsi<> were already existing. I don't mind those tests. I would just ask you to add the other two new patterns as well. Also since the different versions of the instruction generate same instructions (like foo_16qi and foo_8qi both give out the same instruction), I would suggest using a -fdump-rtl-final (or any relevant rtl dump) with the dg-options and using a scan-rtl-dump to scan the pattern name. Something like: /* { dg-do compile } */ /* { dg-options "-O3 -fdump-rtl-final" } */ ... ... /* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv16qi" "final" } } */ Thanks Sudi Hi Sudi, Thanks again. Here's an update that adds 4 more tests, so all 8 patterns generated are now tested for! This is OK for trunk, thanks for the patch (and thanks Sudi for the review!) Thanks, James Thank you James! I'd appreciate it if someone could commit it as I don't have commit rights yet. I have committed this on your behalf as r263200. Thanks Sudi Sam Below is the updated changelog gcc/ 2018-07-26 Sam Tebbs * config/aarch64/aarch64-simd.md (*aarch64_get_lane_zero_extendsi): Rename to... (*aarch64_get_lane_zero_extend): ... This. Use GPI iterator instead of SI mode. gcc/testsuite 2018-07-26 Sam Tebbs * gcc.target/aarch64/extract_zero_extend.c: New file
Re: [GCC][PATCH][Aarch64] Stop redundant zero-extension after UMOV when in DI mode
Hi Sam On 25/07/18 14:08, Sam Tebbs wrote: On 07/23/2018 05:01 PM, Sudakshina Das wrote: Hi Sam On Monday 23 July 2018 11:39 AM, Sam Tebbs wrote: Hi all, This patch extends the aarch64_get_lane_zero_extendsi instruction definition to also cover DI mode. This prevents a redundant AND instruction from being generated due to the pattern failing to be matched. Example: typedef char v16qi __attribute__ ((vector_size (16))); unsigned long long foo (v16qi a) { return a[0]; } Previously generated: foo: umov w0, v0.b[0] and x0, x0, 255 ret And now generates: foo: umov w0, v0.b[0] ret Bootstrapped on aarch64-none-linux-gnu and tested on aarch64-none-elf with no regressions. gcc/ 2018-07-23 Sam Tebbs * config/aarch64/aarch64-simd.md (*aarch64_get_lane_zero_extendsi): Rename to... (*aarch64_get_lane_zero_extend): ... This. Use GPI iterator instead of SI mode. gcc/testsuite 2018-07-23 Sam Tebbs * gcc.target/aarch64/extract_zero_extend.c: New file You will need an approval from a maintainer, but I would only add one request to this: diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 89e38e6..15fb661 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -3032,15 +3032,16 @@ [(set_attr "type" "neon_to_gp")] ) -(define_insn "*aarch64_get_lane_zero_extendsi" - [(set (match_operand:SI 0 "register_operand" "=r") - (zero_extend:SI +(define_insn "*aarch64_get_lane_zero_extend" + [(set (match_operand:GPI 0 "register_operand" "=r") + (zero_extend:GPI Since you are adding 4 new patterns with this change, could you add more cases in your test as well to make sure you have coverage for each of them. Thanks Sudi Hi Sudi, Thanks for the feedback. Here is an updated patch that adds more testcases to cover the patterns generated by the different mode combinations. The changelog and description from my original email still apply. Thanks it looks good to me! You will still need a maintainer to approve. 
Sudi (vec_select: (match_operand:VDQQH 1 "register_operand" "w") (parallel [(match_operand:SI 2 "immediate_operand" "i")]] "TARGET_SIMD" { - operands[2] = aarch64_endian_lane_rtx (mode, INTVAL (operands[2])); + operands[2] = aarch64_endian_lane_rtx (mode, + INTVAL (operands[2])); return "umov\\t%w0, %1.[%2]"; } [(set_attr "type" "neon_to_gp")]
Re: [GCC][PATCH][Aarch64] Stop redundant zero-extension after UMOV when in DI mode
Hi Sam On 25/07/18 14:08, Sam Tebbs wrote: On 07/23/2018 05:01 PM, Sudakshina Das wrote: Hi Sam On Monday 23 July 2018 11:39 AM, Sam Tebbs wrote: Hi all, This patch extends the aarch64_get_lane_zero_extendsi instruction definition to also cover DI mode. This prevents a redundant AND instruction from being generated due to the pattern failing to be matched. Example: typedef char v16qi __attribute__ ((vector_size (16))); unsigned long long foo (v16qi a) { return a[0]; } Previously generated: foo: umov w0, v0.b[0] and x0, x0, 255 ret And now generates: foo: umov w0, v0.b[0] ret Bootstrapped on aarch64-none-linux-gnu and tested on aarch64-none-elf with no regressions. gcc/ 2018-07-23 Sam Tebbs * config/aarch64/aarch64-simd.md (*aarch64_get_lane_zero_extendsi): Rename to... (*aarch64_get_lane_zero_extend): ... This. Use GPI iterator instead of SI mode. gcc/testsuite 2018-07-23 Sam Tebbs * gcc.target/aarch64/extract_zero_extend.c: New file You will need an approval from a maintainer, but I would only add one request to this: diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 89e38e6..15fb661 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -3032,15 +3032,16 @@ [(set_attr "type" "neon_to_gp")] ) -(define_insn "*aarch64_get_lane_zero_extendsi" - [(set (match_operand:SI 0 "register_operand" "=r") - (zero_extend:SI +(define_insn "*aarch64_get_lane_zero_extend" + [(set (match_operand:GPI 0 "register_operand" "=r") + (zero_extend:GPI Since you are adding 4 new patterns with this change, could you add more cases in your test as well to make sure you have coverage for each of them. Thanks Sudi Hi Sudi, Thanks for the feedback. Here is an updated patch that adds more testcases to cover the patterns generated by the different mode combinations. The changelog and description from my original email still apply. Thanks for making the changes and adding more test cases. 
I do however see that you are only covering 2 out of 4 new *aarch64_get_lane_zero_extenddi<> patterns. The *aarch64_get_lane_zero_extendsi<> were already existing. I don't mind those tests. I would just ask you to add the other two new patterns as well. Also since the different versions of the instruction generate same instructions (like foo_16qi and foo_8qi both give out the same instruction), I would suggest using a -fdump-rtl-final (or any relevant rtl dump) with the dg-options and using a scan-rtl-dump to scan the pattern name. Something like: /* { dg-do compile } */ /* { dg-options "-O3 -fdump-rtl-final" } */ ... ... /* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv16qi" "final" } } */ Thanks Sudi (vec_select: (match_operand:VDQQH 1 "register_operand" "w") (parallel [(match_operand:SI 2 "immediate_operand" "i")]] "TARGET_SIMD" { - operands[2] = aarch64_endian_lane_rtx (mode, INTVAL (operands[2])); + operands[2] = aarch64_endian_lane_rtx (mode, + INTVAL (operands[2])); return "umov\\t%w0, %1.[%2]"; } [(set_attr "type" "neon_to_gp")]
Re: [PATCH][AArch64] Implement new intrinsics vabsd_s64 and vnegd_s64
Hi Vlad On Friday 20 July 2018 10:37 AM, Vlad Lazar wrote: Hi, The patch adds implementations for the NEON intrinsics vabsd_s64 and vnegd_s64. (https://developer.arm.com/products/architecture/cpu-architecture/a-profile/docs/ihi0073/latest/arm-neon-intrinsics-reference-architecture-specification) Bootstrapped and regtested on aarch64-none-linux-gnu and there are no regressions. OK for trunk? Thanks for doing this. This looks good to me but you will need a maintainer's approval. Thanks Sudi Thanks, Vlad gcc/ 2018-07-02 Vlad Lazar * config/aarch64/arm_neon.h (vabsd_s64, vnegd_s64): New. gcc/testsuite/ 2018-07-02 Vlad Lazar * gcc.target/aarch64/scalar_intrinsics.c (test_vabsd_s64, test_vnegd_s64): New. --- diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 2d18400040f031dfcdaf60269ad484647804e1be..19e22431a85bcd09d0ea759b42b0a52420b6c43c 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -11822,6 +11822,13 @@ vabsq_s64 (int64x2_t __a) return __builtin_aarch64_absv2di (__a); } +__extension__ extern __inline int64_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vabsd_s64 (int64_t __a) +{ + return __builtin_aarch64_absdi (__a); +} + /* vadd */ __extension__ extern __inline int64_t @@ -22907,6 +22914,12 @@ vneg_s64 (int64x1_t __a) return -__a; } +__extension__ extern __inline int64_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vnegd_s64 (int64_t __a) +{ + return -__a; +} __extension__ extern __inline float32x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vnegq_f32 (float32x4_t __a) diff --git a/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c b/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c index ea29066e369b967d0781d31c8a5208bda9e4f685..45afeec373971838e0cd107038b4aa51a2d4998f 100644 --- a/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c +++ b/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c @@ -603,6 +603,14 @@
test_vsqaddd_u64 (uint64_t a, int64_t b) return vsqaddd_u64 (a, b); } +/* { dg-final { scan-assembler-times "\\tabs\\td\[0-9\]+" 1 } } */ + +int64_t +test_vabsd_s64 (int64_t a) +{ + return vabsd_s64 (a); +} + /* { dg-final { scan-assembler-times "\\tsqabs\\tb\[0-9\]+" 1 } } */ int8_t @@ -627,6 +635,14 @@ test_vqabss_s32 (int32_t a) return vqabss_s32 (a); } +/* { dg-final { scan-assembler-times "\\tneg\\tx\[0-9\]+" 1 } } */ + +int64_t +test_vnegd_s64 (int64_t a) +{ + return vnegd_s64 (a); +} + /* { dg-final { scan-assembler-times "\\tsqneg\\tb\[0-9\]+" 1 } } */ int8_t
Re: [GCC][PATCH][Aarch64] Stop redundant zero-extension after UMOV when in DI mode
Hi Sam On Monday 23 July 2018 11:39 AM, Sam Tebbs wrote: Hi all, This patch extends the aarch64_get_lane_zero_extendsi instruction definition to also cover DI mode. This prevents a redundant AND instruction from being generated due to the pattern failing to be matched. Example: typedef char v16qi __attribute__ ((vector_size (16))); unsigned long long foo (v16qi a) { return a[0]; } Previously generated: foo: umov w0, v0.b[0] and x0, x0, 255 ret And now generates: foo: umov w0, v0.b[0] ret Bootstrapped on aarch64-none-linux-gnu and tested on aarch64-none-elf with no regressions. gcc/ 2018-07-23 Sam Tebbs * config/aarch64/aarch64-simd.md (*aarch64_get_lane_zero_extendsi): Rename to... (*aarch64_get_lane_zero_extend): ... This. Use GPI iterator instead of SI mode. gcc/testsuite 2018-07-23 Sam Tebbs * gcc.target/aarch64/extract_zero_extend.c: New file You will need an approval from a maintainer, but I would only add one request to this: diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 89e38e6..15fb661 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -3032,15 +3032,16 @@ [(set_attr "type" "neon_to_gp")] ) -(define_insn "*aarch64_get_lane_zero_extendsi" - [(set (match_operand:SI 0 "register_operand" "=r") - (zero_extend:SI +(define_insn "*aarch64_get_lane_zero_extend" + [(set (match_operand:GPI 0 "register_operand" "=r") + (zero_extend:GPI Since you are adding 4 new patterns with this change, could you add more cases in your test as well to make sure you have coverage for each of them. Thanks Sudi (vec_select: (match_operand:VDQQH 1 "register_operand" "w") (parallel [(match_operand:SI 2 "immediate_operand" "i")]] "TARGET_SIMD" { - operands[2] = aarch64_endian_lane_rtx (mode, INTVAL (operands[2])); + operands[2] = aarch64_endian_lane_rtx (mode, + INTVAL (operands[2])); return "umov\\t%w0, %1.[%2]"; } [(set_attr "type" "neon_to_gp")]
Re: [GCC][PATCH][Aarch64] Exploiting BFXIL when OR-ing two AND-operations with appropriate bitmasks
Hi Sam On 13/07/18 17:09, Sam Tebbs wrote: Hi all, This patch adds an optimisation that exploits the AArch64 BFXIL instruction when or-ing the result of two bitwise and operations with non-overlapping bitmasks (e.g. (a & 0x) | (b & 0x)). Example: unsigned long long combine(unsigned long long a, unsigned long long b) { return (a & 0xll) | (b & 0xll); } void read2(unsigned long long a, unsigned long long b, unsigned long long *c, unsigned long long *d) { *c = combine(a, b); *d = combine(b, a); } When compiled with -O2, read2 would result in: read2: and x5, x1, #0x and x4, x0, #0x orr x4, x4, x5 and x1, x1, #0x and x0, x0, #0x str x4, [x2] orr x0, x0, x1 str x0, [x3] ret But with this patch results in: read2: mov x4, x1 bfxil x4, x0, 0, 32 str x4, [x2] bfxil x0, x1, 0, 32 str x0, [x3] ret Bootstrapped and regtested on aarch64-none-linux-gnu and aarch64-none-elf with no regressions. I am not a maintainer but I have a question about this patch. I may be missing something or reading it wrong. So feel free to point it out: +(define_insn "*aarch64_bfxil" + [(set (match_operand:DI 0 "register_operand" "=r") + (ior:DI (and:DI (match_operand:DI 1 "register_operand" "r") + (match_operand 3 "const_int_operand")) + (and:DI (match_operand:DI 2 "register_operand" "0") + (match_operand 4 "const_int_operand"] + "INTVAL (operands[3]) == ~INTVAL (operands[4]) + && aarch64_is_left_consecutive (INTVAL (operands[3]))" + { + HOST_WIDE_INT op4 = INTVAL (operands[4]); + operands[3] = GEN_INT (64 - ceil_log2 (op4)); + output_asm_insn ("bfxil\\t%0, %1, 0, %3", operands); In the BFXIL you are reading %3 LSB bits from operand 1 and putting it in the LSBs of %0. This means that the pattern should be masking the 32-%3 MSB of %0 and %3 LSB of %1. 
So shouldn't operand 4 be LEFT_CONSECUTIVE? Can you please compare a simpler version of the above example you gave to make sure the generated assembly is equivalent before and after the patch: void read2(unsigned long long a, unsigned long long b, unsigned long long *c) { *c = combine(a, b); } From the above text read2: and x5, x1, #0x and x4, x0, #0x orr x4, x4, x5 read2: mov x4, x1 bfxil x4, x0, 0, 32 This does not seem equivalent to me. Thanks Sudi + return ""; + } + [(set_attr "type" "bfx")] +) gcc/ 2018-07-11 Sam Tebbs * config/aarch64/aarch64.md (*aarch64_bfxil, *aarch64_bfxil_alt): Define. * config/aarch64/aarch64-protos.h (aarch64_is_left_consecutive): Define. * config/aarch64/aarch64.c (aarch64_is_left_consecutive): New function. gcc/testsuite 2018-07-11 Sam Tebbs * gcc.target/aarch64/combine_bfxil.c: New file. * gcc.target/aarch64/combine_bfxil_2.c: New file.
Re: [PATCH][AARCH64] PR target/84521 Fix frame pointer corruption with -fomit-frame-pointer with __builtin_setjmp
Hi Eric On 27/06/18 12:22, Wilco Dijkstra wrote: Eric Botcazou wrote: This test can easily be changed not to use optimize since it doesn't look like it needs it. We really need to tests these builtins properly, otherwise they will continue to fail on most targets. As far as I can see PR target/84521 has been reported only for Aarch64 so I'd just leave the other targets alone (and avoid propagating FUD if possible). It's quite obvious from PR84521 that this is an issue affecting all targets. Adding better generic tests for __builtin_setjmp can only be a good thing. Wilco This conversation seems to have died down and I would like to start it again. I would agree with Wilco's suggestion about keeping the test in the generic folder. I have removed the optimize attribute and the effect is still the same. It passes on AArch64 with this patch and it currently fails on x86 trunk (gcc version 9.0.0 20180712 (experimental) (GCC)) on -O1 and above. Thanks Sudi diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index f284e74..9792d28 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -473,7 +473,9 @@ extern unsigned aarch64_architecture_version; #define EH_RETURN_STACKADJ_RTX gen_rtx_REG (Pmode, R4_REGNUM) #define EH_RETURN_HANDLER_RTX aarch64_eh_return_handler_rtx () -/* Don't use __builtin_setjmp until we've defined it. */ +/* Don't use __builtin_setjmp until we've defined it. + CAUTION: This macro is only used during exception unwinding. + Don't fall for its name. */ #undef DONT_USE_BUILTIN_SETJMP #define DONT_USE_BUILTIN_SETJMP 1 diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 01f35f8..4266a3d 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -3998,7 +3998,7 @@ static bool aarch64_needs_frame_chain (void) { /* Force a frame chain for EH returns so the return address is at FP+8. 
*/ - if (frame_pointer_needed || crtl->calls_eh_return) + if (frame_pointer_needed || crtl->calls_eh_return || cfun->has_nonlocal_label) return true; /* A leaf function cannot have calls or write LR. */ @@ -12218,6 +12218,13 @@ aarch64_expand_builtin_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED) expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL); } +/* Implement TARGET_BUILTIN_SETJMP_FRAME_VALUE. */ +static rtx +aarch64_builtin_setjmp_frame_value (void) +{ + return hard_frame_pointer_rtx; +} + /* Implement TARGET_GIMPLIFY_VA_ARG_EXPR. */ static tree @@ -17744,6 +17751,9 @@ aarch64_run_selftests (void) #undef TARGET_FOLD_BUILTIN #define TARGET_FOLD_BUILTIN aarch64_fold_builtin +#undef TARGET_BUILTIN_SETJMP_FRAME_VALUE +#define TARGET_BUILTIN_SETJMP_FRAME_VALUE aarch64_builtin_setjmp_frame_value + #undef TARGET_FUNCTION_ARG #define TARGET_FUNCTION_ARG aarch64_function_arg diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index a014a01..d5f33d8 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -6087,6 +6087,30 @@ DONE; }) +;; This is broadly similar to the builtins.c except that it uses +;; temporaries to load the incoming SP and FP. +(define_expand "nonlocal_goto" + [(use (match_operand 0 "general_operand")) + (use (match_operand 1 "general_operand")) + (use (match_operand 2 "general_operand")) + (use (match_operand 3 "general_operand"))] + "" +{ +rtx label_in = copy_to_reg (operands[1]); +rtx fp_in = copy_to_reg (operands[3]); +rtx sp_in = copy_to_reg (operands[2]); + +emit_move_insn (hard_frame_pointer_rtx, fp_in); +emit_stack_restore (SAVE_NONLOCAL, sp_in); + +emit_use (hard_frame_pointer_rtx); +emit_use (stack_pointer_rtx); + +emit_indirect_jump (label_in); + +DONE; +}) + ;; Helper for aarch64.c code. 
(define_expand "set_clobber_cc" [(parallel [(set (match_operand 0) diff --git a/gcc/testsuite/gcc.c-torture/execute/pr84521.c b/gcc/testsuite/gcc.c-torture/execute/pr84521.c new file mode 100644 index 000..564ef14 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr84521.c @@ -0,0 +1,53 @@ +/* { dg-require-effective-target indirect_jumps } */ + +#include +#include +#include + +jmp_buf buf; + +int uses_longjmp (void) +{ + jmp_buf buf2; + memcpy (buf2, buf, sizeof (buf)); + __builtin_longjmp (buf2, 1); +} + +int gl; +void after_longjmp (void) +{ + gl = 5; +} + +int +test_1 (int n) +{ + volatile int *p = alloca (n); + if (__builtin_setjmp (buf)) +{ + after_longjmp (); +} + else +{ + uses_longjmp (); +} + + return 0; +} + +int +test_2 (int n) +{ + int i; + int *ptr = (int *)__builtin_alloca (sizeof (int) * n); + for (i = 0; i < n; i++) +ptr[i] = i; + test_1 (n); + return 0; +} + +int main (int argc, const char **argv) +{ + __builtin_memset (, 0xaf, sizeof (buf)); + test_2 (100); +}
Re: [PATCH][GCC][AARCH64] Canonicalize aarch64 widening simd plus insns
Hi Matthew On 12/07/18 11:18, Richard Sandiford wrote: Looks good to me FWIW (not a maintainer), just a minor formatting thing: Matthew Malcomson writes: diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index aac5fa146ed8dde4507a0eb4ad6a07ce78d2f0cd..67b29cbe2cad91e031ee23be656ec61a403f2cf9 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -3302,38 +3302,78 @@ DONE; }) -(define_insn "aarch64_w" +(define_insn "aarch64_subw" [(set (match_operand: 0 "register_operand" "=w") -(ADDSUB: (match_operand: 1 "register_operand" "w") - (ANY_EXTEND: - (match_operand:VD_BHSI 2 "register_operand" "w"] + (minus: +(match_operand: 1 "register_operand" "w") +(ANY_EXTEND: + (match_operand:VD_BHSI 2 "register_operand" "w"] The (minus should be under the "(match_operand": (define_insn "aarch64_subw" [(set (match_operand: 0 "register_operand" "=w") (minus: (match_operand: 1 "register_operand" "w") (ANY_EXTEND: (match_operand:VD_BHSI 2 "register_operand" "w"] Same for the other patterns. Thanks, Richard You will need a maintainer's approval but this looks good to me. Thanks for doing this. 
I would only point out one other nit which you can choose to ignore: +/* Ensure + saddw2 and one saddw for the function add() + ssubw2 and one ssubw for the function subtract() + uaddw2 and one uaddw for the function uadd() + usubw2 and one usubw for the function usubtract() */ + +/* { dg-final { scan-assembler-times "\[ \t\]ssubw2\[ \t\]+" 1 } } */ +/* { dg-final { scan-assembler-times "\[ \t\]ssubw\[ \t\]+" 1 } } */ +/* { dg-final { scan-assembler-times "\[ \t\]saddw2\[ \t\]+" 1 } } */ +/* { dg-final { scan-assembler-times "\[ \t\]saddw\[ \t\]+" 1 } } */ +/* { dg-final { scan-assembler-times "\[ \t\]usubw2\[ \t\]+" 1 } } */ +/* { dg-final { scan-assembler-times "\[ \t\]usubw\[ \t\]+" 1 } } */ +/* { dg-final { scan-assembler-times "\[ \t\]uaddw2\[ \t\]+" 1 } } */ +/* { dg-final { scan-assembler-times "\[ \t\]uaddw\[ \t\]+" 1 } } */ The scan-assembly directives for the different functions can be placed right below each of them and that would make it easier to read the expected results in the test and you can get rid of the comments saying the same. Thanks Sudi
Re: [AArch64] Generate load-pairs when the last load clobbers the address register [2/2]
Hi Jackson On 11/07/18 17:48, Jackson Woodruff wrote: Hi Sudi, On 07/10/2018 02:29 PM, Sudakshina Das wrote: Hi Jackson On Tuesday 10 July 2018 09:37 AM, Jackson Woodruff wrote: Hi all, This patch resolves PR86014. It does so by noticing that the last load may clobber the address register without issue (regardless of where it exists in the final ldp/stp sequence). That check has been changed so that the last register may be clobbered and the testcase (gcc.target/aarch64/ldp_stp_10.c) now passes. Bootstrap and regtest OK. OK for trunk? Jackson Changelog: gcc/ 2018-06-25 Jackson Woodruff PR target/86014 * config/aarch64/aarch64.c (aarch64_operands_adjust_ok_for_ldpstp): Remove address clobber check on last register. This looks good to me but you will need a maintainer to approve it. The only thing I would add is that if you could move the comment on top of the for loop to this patch. That is, keep the original /* Check if the addresses are clobbered by load. */ in your [1/2] and make the comment change in [2/2]. Thanks, change made. OK for trunk? Looks good to me but you will need approval from a maintainer to commit it! Thanks Sudi Thanks, Jackson
Re: [AArch64] Use arrays and loops rather than numbered variables in aarch64_operands_adjust_ok_for_ldpstp [1/2]
Hi Jackson On 11/07/18 17:48, Jackson Woodruff wrote: Hi Sudi, Thanks for the review. On 07/10/2018 10:56 AM, Sudakshina wrote: Hi Jackson - if (!MEM_P (mem_1) || aarch64_mem_pair_operand (mem_1, mode)) + if (!MEM_P (mem[1]) || aarch64_mem_pair_operand (mem[1], mode)) mem_1 == mem[1]? Oops, yes... That should be mem[0]. return false; - /* The mems cannot be volatile. */ ... /* If we have SImode and slow unaligned ldp, check the alignment to be at least 8 byte. */ if (mode == SImode && (aarch64_tune_params.extra_tuning_flags - & AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW) + & AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW) && !optimize_size - && MEM_ALIGN (mem_1) < 8 * BITS_PER_UNIT) + && MEM_ALIGN (mem[1]) < 8 * BITS_PER_UNIT) Likewise Done ... /* Check if the registers are of same class. */ - if (rclass_1 != rclass_2 || rclass_2 != rclass_3 || rclass_3 != rclass_4) - return false; + for (int i = 0; i < 3; i++) num_instructions -1 instead of 3 would be more consistent. Done + if (rclass[i] != rclass[i + 1]) + return false; It looks good otherwise. Thanks Sudi Re-regtested and boostrapped. OK for trunk? Looks good to me but you will need approval from a maintainer to commit it! Thanks Sudi Thanks, Jackson
Re: [AArch64] Generate load-pairs when the last load clobbers the address register [2/2]
Hi Jackson On Tuesday 10 July 2018 09:37 AM, Jackson Woodruff wrote: Hi all, This patch resolves PR86014. It does so by noticing that the last load may clobber the address register without issue (regardless of where it exists in the final ldp/stp sequence). That check has been changed so that the last register may be clobbered and the testcase (gcc.target/aarch64/ldp_stp_10.c) now passes. Bootstrap and regtest OK. OK for trunk? Jackson Changelog: gcc/ 2018-06-25 Jackson Woodruff PR target/86014 * config/aarch64/aarch64.c (aarch64_operands_adjust_ok_for_ldpstp): Remove address clobber check on last register. This looks good to me but you will need a maintainer to approve it. The only thing I would add is that if you could move the comment on top of the for loop to this patch. That is, keep the original /* Check if the addresses are clobbered by load. */ in your [1/2] and make the comment change in [2/2]. diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index d0e9b2d464183eecc8cc7639ca3e981d2ff243ba..feffe8ebdbd4efd0ffc09834547767ceec46f4e4 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -17074,7 +17074,7 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx *operands, bool load, /* Only the last register in the order in which they occur may be clobbered by the load. */ if (load) -for (int i = 0; i < num_instructions; i++) +for (int i = 0; i < num_instructions - 1; i++) if (reg_mentioned_p (reg[i], mem[i])) return false; Thanks Sudi
Re: [PATCH][AARCH64] PR target/84521 Fix frame pointer corruption with -fomit-frame-pointer with __builtin_setjmp
PING! On 14/06/18 12:10, Sudakshina Das wrote: Hi Eric On 07/06/18 16:33, Eric Botcazou wrote: Sorry this fell off my radar. I have reg-tested it on x86 and tried it on the sparc machine from the gcc farm but I think I couldn't finish the run and now it's showing to be unreachable. The patch is a no-op for SPARC because it defines the nonlocal_goto pattern. But I would nevertheless strongly suggest _not_ fiddling with the generic code like that and just defining the nonlocal_goto pattern for Aarch64 instead. Thank you for the suggestion, I have edited the patch accordingly and defined the nonlocal_goto pattern for AArch64. This has also helped take care of the issue with __builtin_longjmp that Wilco had mentioned in his comment on the PR (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84521#c19). I have also modified the test case according to Wilco's comment to add an extra jump buffer. This test case passes with AArch64 but fails on x86 trunk as follows (It may fail on other targets as well): FAIL: gcc.c-torture/execute/pr84521.c -O1 execution test FAIL: gcc.c-torture/execute/pr84521.c -O2 execution test FAIL: gcc.c-torture/execute/pr84521.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test FAIL: gcc.c-torture/execute/pr84521.c -O3 -g execution test FAIL: gcc.c-torture/execute/pr84521.c -Os execution test FAIL: gcc.c-torture/execute/pr84521.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test FAIL: gcc.c-torture/execute/pr84521.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test Testing: Bootstrapped and regtested on aarch64-none-linux-gnu. Is this ok for trunk? Sudi *** gcc/ChangeLog *** 2018-06-14 Sudakshina Das PR target/84521 * config/aarch64/aarch64.h (DONT_USE_BUILTIN_SETJMP): Update comment. * config/aarch64/aarch64.c (aarch64_needs_frame_chain): Add cfun->has_nonlocal_label to force frame chain. (aarch64_builtin_setjmp_frame_value): New.
(TARGET_BUILTIN_SETJMP_FRAME_VALUE): Define. * config/aarch64/aarch64.md (nonlocal_goto): New. *** gcc/testsuite/ChangeLog *** 2018-06-14 Sudakshina Das PR target/84521 * gcc.c-torture/execute/pr84521.c: New test.
Re: [PATCH][AARCH64] PR target/84521 Fix frame pointer corruption with -fomit-frame-pointer with __builtin_setjmp
Hi Eric On 07/06/18 16:33, Eric Botcazou wrote: Sorry this fell off my radar. I have reg-tested it on x86 and tried it on the sparc machine from the gcc farm but I think I couldn't finish the run and now it's showing to be unreachable. The patch is a no-op for SPARC because it defines the nonlocal_goto pattern. But I would nevertheless strongly suggest _not_ fiddling with the generic code like that and just defining the nonlocal_goto pattern for Aarch64 instead. Thank you for the suggestion, I have edited the patch accordingly and defined the nonlocal_goto pattern for AArch64. This has also helped take care of the issue with __builtin_longjmp that Wilco had mentioned in his comment on the PR (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84521#c19). I have also modified the test case according to Wilco's comment to add an extra jump buffer. This test case passes with AArch64 but fails on x86 trunk as follows (It may fail on other targets as well): FAIL: gcc.c-torture/execute/pr84521.c -O1 execution test FAIL: gcc.c-torture/execute/pr84521.c -O2 execution test FAIL: gcc.c-torture/execute/pr84521.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test FAIL: gcc.c-torture/execute/pr84521.c -O3 -g execution test FAIL: gcc.c-torture/execute/pr84521.c -Os execution test FAIL: gcc.c-torture/execute/pr84521.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test FAIL: gcc.c-torture/execute/pr84521.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test Testing: Bootstrapped and regtested on aarch64-none-linux-gnu. Is this ok for trunk? Sudi *** gcc/ChangeLog *** 2018-06-14 Sudakshina Das PR target/84521 * config/aarch64/aarch64.h (DONT_USE_BUILTIN_SETJMP): Update comment. * config/aarch64/aarch64.c (aarch64_needs_frame_chain): Add cfun->has_nonlocal_label to force frame chain. (aarch64_builtin_setjmp_frame_value): New. (TARGET_BUILTIN_SETJMP_FRAME_VALUE): Define.
* config/aarch64/aarch64.md (nonlocal_goto): New. *** gcc/testsuite/ChangeLog *** 2018-06-14 Sudakshina Das PR target/84521 * gcc.c-torture/execute/pr84521.c: New test. diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 976f9af..f042def 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -474,7 +474,9 @@ extern unsigned aarch64_architecture_version; #define EH_RETURN_STACKADJ_RTX gen_rtx_REG (Pmode, R4_REGNUM) #define EH_RETURN_HANDLER_RTX aarch64_eh_return_handler_rtx () -/* Don't use __builtin_setjmp until we've defined it. */ +/* Don't use __builtin_setjmp until we've defined it. + CAUTION: This macro is only used during exception unwinding. + Don't fall for its name. */ #undef DONT_USE_BUILTIN_SETJMP #define DONT_USE_BUILTIN_SETJMP 1 diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index bd0ac2f..95f7fe3 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -3998,7 +3998,7 @@ static bool aarch64_needs_frame_chain (void) { /* Force a frame chain for EH returns so the return address is at FP+8. */ - if (frame_pointer_needed || crtl->calls_eh_return) + if (frame_pointer_needed || crtl->calls_eh_return || cfun->has_nonlocal_label) return true; /* A leaf function cannot have calls or write LR. */ @@ -12213,6 +12213,13 @@ aarch64_expand_builtin_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED) expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL); } +/* Implement TARGET_BUILTIN_SETJMP_FRAME_VALUE. */ +static rtx +aarch64_builtin_setjmp_frame_value (void) +{ + return hard_frame_pointer_rtx; +} + /* Implement TARGET_GIMPLIFY_VA_ARG_EXPR. 
*/ static tree @@ -17829,6 +17836,9 @@ aarch64_run_selftests (void) #undef TARGET_FOLD_BUILTIN #define TARGET_FOLD_BUILTIN aarch64_fold_builtin +#undef TARGET_BUILTIN_SETJMP_FRAME_VALUE +#define TARGET_BUILTIN_SETJMP_FRAME_VALUE aarch64_builtin_setjmp_frame_value + #undef TARGET_FUNCTION_ARG #define TARGET_FUNCTION_ARG aarch64_function_arg diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 830f976..381fd83 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -6081,6 +6081,30 @@ DONE; }) +;; This is broadly similar to the builtins.c except that it uses +;; temporaries to load the incoming SP and FP. +(define_expand "nonlocal_goto" + [(use (match_operand 0 "general_operand")) + (use (match_operand 1 "general_operand")) + (use (match_operand 2 "general_operand")) + (use (match_operand 3 "general_operand"))] + "" +{ +rtx label_in = copy_to_reg (operands[1]); +rtx fp_in = copy_to_reg (operands[3]); +rtx sp_in = copy_to_reg (operands[2]); + +emit_move_insn (hard_frame_pointer_rtx, fp_in); +emit_stack_restore (SAVE_N
Re: [PATCH][AARCH64] PR target/84521 Fix frame pointer corruption with -fomit-frame-pointer with __builtin_setjmp
On 02/05/18 18:28, Jeff Law wrote: On 03/14/2018 11:40 AM, Sudakshina Das wrote: Hi This patch is another partial fix for PR 84521. This is adding a definition to one of the target hooks used in the SJLJ implementation so that AArch64 defines the hard_frame_pointer_rtx as the TARGET_BUILTIN_SETJMP_FRAME_VALUE. As pointed out by Wilco there is still a lot more work to be done for these builtins in the future. Testing: Bootstrapped and regtested on aarch64-none-linux-gnu and added new test. Is this ok for trunk? Sudi *** gcc/ChangeLog *** 2018-03-14 Sudakshina Das * builtins.c (expand_builtin_setjmp_receiver): Update condition to restore frame pointer. * config/aarch64/aarch64.h (DONT_USE_BUILTIN_SETJMP): Update comment. * config/aarch64/aarch64.c (aarch64_builtin_setjmp_frame_value): New. (TARGET_BUILTIN_SETJMP_FRAME_VALUE): Define. *** gcc/testsuite/ChangeLog *** 2018-03-14 Sudakshina Das * gcc.c-torture/execute/pr84521.c: New test. So just to be clear, you do _not_ want the frame pointer restored here? Right? aarch64_builtin_setjmp_frame_value always returns hard_frame_pointer_rtx which will cause the generic code in builtins.c to not restore the frame pointer. Have you looked at other targets which define builtin_setjmp_frame_value to determine if they'll do the right thing? x86 and sparc are the most important. I see that arc, vax and avr also define that hook, but are obviously harder to test. Sorry this fell off my radar. I have reg-tested it on x86 and tried it on the sparc machine from the gcc farm but I think I couldn't finish the run and now it's showing to be unreachable. Sudi jeff
Re: C++ PATCHes to xvalue handling
On 23/05/18 18:21, Jason Merrill wrote: The first patch implements the adjustments from core issues 616 and 1213 to the value category of subobjects of class prvalues: they were considered prvalues themselves, but that was kind of nonsensical. Now they are considered xvalues. Along with this, I've removed the diagnostic distinction between xvalues and prvalues when trying to use one or the other as an lvalue; the important thing is that they are rvalues. The second patch corrects various issues with casts and xvalues/rvalue references: we were treating an xvalue operand to dynamic_cast as an lvalue, and we were objecting to casts from prvalue to rvalue reference type. With the second patch:

commit f7d2790049fd1e59af4b69ee12f7c101cfe4cdab
Author: jason
Date: Wed May 23 17:21:39 2018 +

Fix cast to rvalue reference from prvalue. * cvt.c (diagnose_ref_binding): Handle rvalue reference. * rtti.c (build_dynamic_cast_1): Don't try to build a reference to non-class type. Handle xvalue argument. * typeck.c (build_reinterpret_cast_1): Allow cast from prvalue to rvalue reference. * semantics.c (finish_compound_literal): Do direct-initialization, not cast, to initialize a reference.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@260622 138bc75d-0d04-0410-961f-82ee72b054a4

I have observed the following failure in Spec2017 while building 510.parest_r on aarch64-none-linux-gnu

aarch64-none-linux-gnu-g++ -c -o source/numerics/matrices.all_dimensions.o -DSPEC -DNDEBUG -Iinclude -I.
-DSPEC_AUTO_SUPPRESS_OPENMP -mcpu=cortex-a57+crypto -Ofast -fomit-frame-pointer -fpermissive -DSPEC_LP64 source/numerics/matrices.all_dimensions.cc

source/numerics/matrices.all_dimensions.cc: In static member function 'static void dealii::MatrixTools::apply_boundary_values(const std::map&, dealii::BlockSparseMatrix&, dealii::BlockVector&, dealii::BlockVector&, bool)':
source/numerics/matrices.all_dimensions.cc:469:50: error: lvalue required as unary '&' operand
   [this_sparsity.get_rowstart_indices()[row]];
   ^
source/numerics/matrices.all_dimensions.cc:472:55: error: lvalue required as unary '&' operand
   [this_sparsity.get_rowstart_indices()[row]+1],
   ^
source/numerics/matrices.all_dimensions.cc:474:55: error: lvalue required as unary '&' operand
   [this_sparsity.get_rowstart_indices()[row+1]],
   ^
source/numerics/matrices.all_dimensions.cc:479:49: error: lvalue required as unary '&' operand
   [this_sparsity.get_rowstart_indices()[row]],
   ^
source/numerics/matrices.all_dimensions.cc:481:51: error: lvalue required as unary '&' operand
   [this_sparsity.get_rowstart_indices()[row+1]],
   ^
source/numerics/matrices.all_dimensions.cc:510:50: error: lvalue required as unary '&' operand
   [this_sparsity.get_rowstart_indices()[0]]);

Sudi

Tested x86_64-pc-linux-gnu, applying to trunk.
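The parest failures above are the new xvalue rules working as intended: after core issues 616 and 1213, a subobject of a class prvalue is an xvalue, so taking its address with unary '&' is rejected. A minimal sketch of the pattern and its fix (my own reduction, not the dealii source; the names are made up):

```cpp
#include <type_traits>

struct Sparsity { int rowstart[4]; };
Sparsity make() { return Sparsity{{1, 2, 3, 4}}; }

// Under the DR 616/1213 rules, make().rowstart[0] is an xvalue; decltype
// of a parenthesized xvalue expression is T&&, neither T nor T&:
static_assert(std::is_same<decltype((make().rowstart[0])), int&&>::value,
              "subobject of a class prvalue is an xvalue");

int first_row()
{
  // &make().rowstart[0];          // error: lvalue required as unary '&' operand
  const Sparsity tmp = make();     // materialize the temporary first...
  const int *p = &tmp.rowstart[0]; // ...then its elements are lvalues
  return *p;
}
```

The workaround in user code (as the affected Spec sources would need) is exactly the `tmp` variable above: name the temporary before taking addresses into it.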
Re: [PATCH][RFC] Radically simplify emission of balanced tree for switch statements.
Hi Martin On 25/05/18 10:45, Martin Liška wrote: On 05/21/2018 04:42 PM, Sudakshina Das wrote: On 21/05/18 15:00, Rainer Orth wrote: Hi Martin, Thanks for opened eyes, following patch will fix that. It's quite obvious, I'll install it right after tests will finish. unfortunately, it didn't fix either issue: * The switchlower -> switchlower1 renames in the dg-final* lines (attached) are still necessary to avoid the UNRESOLVED errors. Although obvious, I haven't installed them since ... * ... even so FAIL: gcc.dg/tree-prof/update-loopch.c scan-tree-dump switchlower1 "Removing basic block" remains. [...] You are right, it's using -O2, thus your patch is right. Please install the patch after testing. It's obvious fix. But what about the remaining FAIL? Sorry to add to this, but I have also observed the following failures on aarch64-none-elf, aarch64-none-linux-gnu and aarch64_be-none-elf targets bisected to this commit: FAIL: gcc.dg/sancov/cmp0.c -O0 scan-tree-dump-times optimized "__builtin___sanitizer_cov_trace_const_cmp" 7 FAIL: gcc.dg/sancov/cmp0.c -O0 scan-tree-dump-times optimized "__builtin___sanitizer_cov_trace_switch \\(" 2 FAIL: gcc.dg/sancov/cmp0.c -O0 -g scan-tree-dump-times optimized "__builtin___sanitizer_cov_trace_const_cmp" 7 FAIL: gcc.dg/sancov/cmp0.c -O0 -g scan-tree-dump-times optimized "__builtin___sanitizer_cov_trace_switch \\(" 2 Hi. I've just tested sancov tests on my aarch64 and cmp0.c looks fine. Can you please tell me which -march, -mtune does your board have? FAIL: gcc.dg/tree-ssa/pr77445-2.c scan-tree-dump-not thread3 "not considered" FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump-not vrp2 "Jumps threaded" I can confirm these 2. It's kind of expected, I will clean it up before next release. Jeff is aware of that.. 
Martin From my today's build, I only see the following remaining now: FAIL: gcc.dg/tree-prof/update-loopch.c scan-tree-dump switchlower1 "Removing basic block" FAIL: gcc.dg/tree-ssa/pr77445-2.c scan-tree-dump-not thread3 "not considered" FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump-not vrp2 "Jumps threaded" Sudi Sudi Rainer
Re: [PATCH][AARCH64][PR target/84882] Add mno-strict-align
Hi Richard On 18/05/18 15:48, Richard Earnshaw (lists) wrote: On 27/03/18 13:58, Sudakshina Das wrote: Hi This patch adds the no variant to -mstrict-align and the corresponding function attribute. To enable the function attribute, I have modified aarch64_can_inline_p () to allow checks even when the callee function has no attribute. The need for this is shown by the new test target_attr_18.c. Testing: Bootstrapped, regtested and added new tests that are copies of earlier tests checking -mstrict-align with opposite scan directives. Is this ok for trunk? Sudi *** gcc/ChangeLog *** 2018-03-27 Sudakshina Das <sudi@arm.com> * common/config/aarch64/aarch64-common.c (aarch64_handle_option): Check val before adding MASK_STRICT_ALIGN to opts->x_target_flags. * config/aarch64/aarch64.opt (mstrict-align): Remove RejectNegative. * config/aarch64/aarch64.c (aarch64_attributes): Mark allow_neg as true for strict-align. (aarch64_can_inline_p): Perform checks even when callee has no attributes to check for strict alignment. * doc/extend.texi (AArch64 Function Attributes): Document no-strict-align. * doc/invoke.texi: (AArch64 Options): Likewise. *** gcc/testsuite/ChangeLog *** 2018-03-27 Sudakshina Das <sudi@arm.com> * gcc.target/aarch64/pr84882.c: New test. * gcc.target/aarch64/target_attr_18.c: Likewise. 
strict-align.diff diff --git a/gcc/common/config/aarch64/aarch64-common.c b/gcc/common/config/aarch64/aarch64-common.c index 7fd9305..d5655a0 100644 --- a/gcc/common/config/aarch64/aarch64-common.c +++ b/gcc/common/config/aarch64/aarch64-common.c @@ -97,7 +97,10 @@ aarch64_handle_option (struct gcc_options *opts, return true; case OPT_mstrict_align: - opts->x_target_flags |= MASK_STRICT_ALIGN; + if (val) + opts->x_target_flags |= MASK_STRICT_ALIGN; + else + opts->x_target_flags &= ~MASK_STRICT_ALIGN; return true; case OPT_momit_leaf_frame_pointer: diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 4b5183b..4f35a6c 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -11277,7 +11277,7 @@ static const struct aarch64_attribute_info aarch64_attributes[] = { "fix-cortex-a53-843419", aarch64_attr_bool, true, NULL, OPT_mfix_cortex_a53_843419 }, { "cmodel", aarch64_attr_enum, false, NULL, OPT_mcmodel_ }, - { "strict-align", aarch64_attr_mask, false, NULL, OPT_mstrict_align }, + { "strict-align", aarch64_attr_mask, true, NULL, OPT_mstrict_align }, { "omit-leaf-frame-pointer", aarch64_attr_bool, true, NULL, OPT_momit_leaf_frame_pointer }, { "tls-dialect", aarch64_attr_enum, false, NULL, OPT_mtls_dialect_ }, @@ -11640,16 +11640,13 @@ aarch64_can_inline_p (tree caller, tree callee) tree caller_tree = DECL_FUNCTION_SPECIFIC_TARGET (caller); tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee); - /* If callee has no option attributes, then it is ok to inline. */ - if (!callee_tree) -return true; I think it's still useful to spot the case where both callee_tree and caller_tree are NULL. In that case both options will pick up target_option_default_node and will always be compatible; so you can short-circuit that case, which is the most likely scenario. - struct cl_target_option *caller_opts = TREE_TARGET_OPTION (caller_tree ? 
caller_tree : target_option_default_node); - struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree); - + struct cl_target_option *callee_opts + = TREE_TARGET_OPTION (callee_tree ? callee_tree + : target_option_default_node); /* Callee's ISA flags should be a subset of the caller's. */ if ((caller_opts->x_aarch64_isa_flags & callee_opts->x_aarch64_isa_flags) diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt index 52eaf8c..1426b45 100644 --- a/gcc/config/aarch64/aarch64.opt +++ b/gcc/config/aarch64/aarch64.opt @@ -85,7 +85,7 @@ Target RejectNegative Joined Enum(cmodel) Var(aarch64_cmodel_var) Init(AARCH64_C Specify the code model. mstrict-align -Target Report RejectNegative Mask(STRICT_ALIGN) Save +Target Report Mask(STRICT_ALIGN) Save Don't assume that unaligned accesses are handled by the system. momit-leaf-frame-pointer diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 93a0ebc..dcda216 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -3605,8 +3605,10 @@ for the command line option @option{-mcmodel=}. @item strict-align Other targets add an @itemx for the no-variant. @cindex @code{strict-align} function attribute, AArch64 Indicates that the compiler should not assume that unaligned memory references -are handled by the system. The behavi
Re: [PATCH][RFC] Radically simplify emission of balanced tree for switch statements.
On 21/05/18 15:00, Rainer Orth wrote: Hi Martin, Thanks for opened eyes, following patch will fix that. It's quite obvious, I'll install it right after tests will finish. unfortunately, it didn't fix either issue: * The switchlower -> switchlower1 renames in the dg-final* lines (attached) are still necessary to avoid the UNRESOLVED errors. Although obvious, I haven't installed them since ... * ... even so FAIL: gcc.dg/tree-prof/update-loopch.c scan-tree-dump switchlower1 "Removing basic block" remains. [...] You are right, it's using -O2, thus your patch is right. Please install the patch after testing. It's obvious fix. But what about the remaining FAIL? Sorry to add to this, but I have also observed the following failures on aarch64-none-elf, aarch64-none-linux-gnu and aarch64_be-none-elf targets bisected to this commit: FAIL: gcc.dg/sancov/cmp0.c -O0 scan-tree-dump-times optimized "__builtin___sanitizer_cov_trace_const_cmp" 7 FAIL: gcc.dg/sancov/cmp0.c -O0 scan-tree-dump-times optimized "__builtin___sanitizer_cov_trace_switch \\(" 2 FAIL: gcc.dg/sancov/cmp0.c -O0 -g scan-tree-dump-times optimized "__builtin___sanitizer_cov_trace_const_cmp" 7 FAIL: gcc.dg/sancov/cmp0.c -O0 -g scan-tree-dump-times optimized "__builtin___sanitizer_cov_trace_switch \\(" 2 FAIL: gcc.dg/tree-ssa/pr77445-2.c scan-tree-dump-not thread3 "not considered" FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump-not vrp2 "Jumps threaded" Sudi Rainer
Re: [PATCH][AARCH64][PR target/84882] Add mno-strict-align
Ping! On 27/03/18 13:58, Sudakshina Das wrote: Hi This patch adds the no variant to -mstrict-align and the corresponding function attribute. To enable the function attribute, I have modified aarch64_can_inline_p () to allow checks even when the callee function has no attribute. The need for this is shown by the new test target_attr_18.c. Testing: Bootstrapped, regtested and added new tests that are copies of earlier tests checking -mstrict-align with opposite scan directives. Is this ok for trunk? Sudi *** gcc/ChangeLog *** 2018-03-27 Sudakshina Das <sudi@arm.com> * common/config/aarch64/aarch64-common.c (aarch64_handle_option): Check val before adding MASK_STRICT_ALIGN to opts->x_target_flags. * config/aarch64/aarch64.opt (mstrict-align): Remove RejectNegative. * config/aarch64/aarch64.c (aarch64_attributes): Mark allow_neg as true for strict-align. (aarch64_can_inline_p): Perform checks even when callee has no attributes to check for strict alignment. * doc/extend.texi (AArch64 Function Attributes): Document no-strict-align. * doc/invoke.texi: (AArch64 Options): Likewise. *** gcc/testsuite/ChangeLog *** 2018-03-27 Sudakshina Das <sudi@arm.com> * gcc.target/aarch64/pr84882.c: New test. * gcc.target/aarch64/target_attr_18.c: Likewise.
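As background to what -mstrict-align (and the new no-strict-align variant) controls: on a strict-alignment target the compiler must not emit a plain word load through a possibly misaligned pointer. A portable illustration of the memcpy idiom that stays correct under either setting (the function names are mine):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Portable unaligned 32-bit load: with -mstrict-align the compiler expands
// this into aligned/byte accesses; with -mno-strict-align it may fold the
// memcpy into a single (possibly unaligned) load instruction.
uint32_t load_u32(const unsigned char *p)
{
  uint32_t v;
  std::memcpy(&v, p, sizeof v);  // well-defined even when p is misaligned
  return v;
}

uint32_t load_from_offset(const unsigned char *buf, unsigned off)
{
  return load_u32(buf + off);    // off need not be a multiple of 4
}
```

This is also why the patch makes the setting attachable per function: a routine known to touch only aligned data can opt out of the strict expansion without changing the translation unit's flags.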
Re: [AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics
Hi Sameera On 11/04/18 13:05, Sameera Deshpande wrote: On 11 April 2018 at 15:53, Sudakshina Das <sudi@arm.com> wrote: Hi Sameera On 11/04/18 09:04, Sameera Deshpande wrote: On 10 April 2018 at 20:07, Sudakshina Das <sudi@arm.com> wrote: Hi Sameera On 10/04/18 11:20, Sameera Deshpande wrote: On 7 April 2018 at 01:25, Christophe Lyon <christophe.l...@linaro.org> wrote: Hi, 2018-04-06 12:15 GMT+02:00 Sameera Deshpande <sameera.deshpa...@linaro.org>: Hi Christophe, Please find attached the updated patch with testcases. Ok for trunk? Thanks for the update. Since the new intrinsics are only available on aarch64, you want to prevent the tests from running on arm. Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the two targets. There are several examples on how to do that in that directory. I have also noticed that the tests fail at execution on aarch64_be. I didn't look at the patch in details. Christophe - Thanks and regards, Sameera D. 2017-12-14 22:17 GMT+05:30 Christophe Lyon <christophe.l...@linaro.org>: 2017-12-14 9:29 GMT+01:00 Sameera Deshpande <sameera.deshpa...@linaro.org>: Hi! Please find attached the patch implementing vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics as defined by Neon document. Ok for trunk? - Thanks and regards, Sameera D. gcc/Changelog: 2017-11-14 Sameera Deshpande <sameera.deshpa...@linaro.org> * config/aarch64/aarch64-simd-builtins.def (ld1x3): New. (st1x2): Likewise. (st1x3): Likewise. * config/aarch64/aarch64-simd.md (aarch64_ld1x3): New pattern. (aarch64_ld1_x3_): Likewise (aarch64_st1x2): Likewise (aarch64_st1_x2_): Likewise (aarch64_st1x3): Likewise (aarch64_st1_x3_): Likewise * config/aarch64/arm_neon.h (vld1_u8_x3): New function. (vld1_s8_x3): Likewise. (vld1_u16_x3): Likewise. (vld1_s16_x3): Likewise. (vld1_u32_x3): Likewise. (vld1_s32_x3): Likewise. (vld1_u64_x3): Likewise. (vld1_s64_x3): Likewise. (vld1_fp16_x3): Likewise. (vld1_f32_x3): Likewise. (vld1_f64_x3): Likewise. (vld1_p8_x3): Likewise. 
(vld1_p16_x3): Likewise. (vld1_p64_x3): Likewise. (vld1q_u8_x3): Likewise. (vld1q_s8_x3): Likewise. (vld1q_u16_x3): Likewise. (vld1q_s16_x3): Likewise. (vld1q_u32_x3): Likewise. (vld1q_s32_x3): Likewise. (vld1q_u64_x3): Likewise. (vld1q_s64_x3): Likewise. (vld1q_f16_x3): Likewise. (vld1q_f32_x3): Likewise. (vld1q_f64_x3): Likewise. (vld1q_p8_x3): Likewise. (vld1q_p16_x3): Likewise. (vld1q_p64_x3): Likewise. (vst1_s64_x2): Likewise. (vst1_u64_x2): Likewise. (vst1_f64_x2): Likewise. (vst1_s8_x2): Likewise. (vst1_p8_x2): Likewise. (vst1_s16_x2): Likewise. (vst1_p16_x2): Likewise. (vst1_s32_x2): Likewise. (vst1_u8_x2): Likewise. (vst1_u16_x2): Likewise. (vst1_u32_x2): Likewise. (vst1_f16_x2): Likewise. (vst1_f32_x2): Likewise. (vst1_p64_x2): Likewise. (vst1q_s8_x2): Likewise. (vst1q_p8_x2): Likewise. (vst1q_s16_x2): Likewise. (vst1q_p16_x2): Likewise. (vst1q_s32_x2): Likewise. (vst1q_s64_x2): Likewise. (vst1q_u8_x2): Likewise. (vst1q_u16_x2): Likewise. (vst1q_u32_x2): Likewise. (vst1q_u64_x2): Likewise. (vst1q_f16_x2): Likewise. (vst1q_f32_x2): Likewise. (vst1q_f64_x2): Likewise. (vst1q_p64_x2): Likewise. (vst1_s64_x3): Likewise. (vst1_u64_x3): Likewise. (vst1_f64_x3): Likewise. (vst1_s8_x3): Likewise. (vst1_p8_x3): Likewise. (vst1_s16_x3): Likewise. (vst1_p16_x3): Likewise. (vst1_s32_x3): Likewise. (vst1_u8_x3): Likewise. (vst1_u16_x3): Likewise. (vst1_u32_x3): Likewise. (vst1_f16_x3): Likewise. (vst1_f32_x3): Likewise. (vst1_p64_x3): Likewise. (vst1q_s8_x3): Likewise. (vst1q_p8_x3): Likewise. (vst1q_s16_x3): Likewise. (vst1q_p16_x3): Likewise. (vst1q_s32_x3): Likewise. (vst1q_s64_x3): Likewise. (vst1q_u8_x3): Likewise. (vst1q_u16_x3): Likewise. (vst1q_u32_x3): Likewise. (vst1q_u64_x3): Likewise. (vst1q_f16_x3): Lik
Re: [AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics
Hi Sameera On 11/04/18 09:04, Sameera Deshpande wrote: On 10 April 2018 at 20:07, Sudakshina Das <sudi@arm.com> wrote: Hi Sameera On 10/04/18 11:20, Sameera Deshpande wrote: On 7 April 2018 at 01:25, Christophe Lyon <christophe.l...@linaro.org> wrote: Hi, 2018-04-06 12:15 GMT+02:00 Sameera Deshpande <sameera.deshpa...@linaro.org>: Hi Christophe, Please find attached the updated patch with testcases. Ok for trunk? Thanks for the update. Since the new intrinsics are only available on aarch64, you want to prevent the tests from running on arm. Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the two targets. There are several examples on how to do that in that directory. I have also noticed that the tests fail at execution on aarch64_be. I didn't look at the patch in details. Christophe - Thanks and regards, Sameera D. 2017-12-14 22:17 GMT+05:30 Christophe Lyon <christophe.l...@linaro.org>: 2017-12-14 9:29 GMT+01:00 Sameera Deshpande <sameera.deshpa...@linaro.org>: Hi! Please find attached the patch implementing vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics as defined by Neon document. Ok for trunk? - Thanks and regards, Sameera D. gcc/Changelog: 2017-11-14 Sameera Deshpande <sameera.deshpa...@linaro.org> * config/aarch64/aarch64-simd-builtins.def (ld1x3): New. (st1x2): Likewise. (st1x3): Likewise. * config/aarch64/aarch64-simd.md (aarch64_ld1x3): New pattern. (aarch64_ld1_x3_): Likewise (aarch64_st1x2): Likewise (aarch64_st1_x2_): Likewise (aarch64_st1x3): Likewise (aarch64_st1_x3_): Likewise * config/aarch64/arm_neon.h (vld1_u8_x3): New function. (vld1_s8_x3): Likewise. (vld1_u16_x3): Likewise. (vld1_s16_x3): Likewise. (vld1_u32_x3): Likewise. (vld1_s32_x3): Likewise. (vld1_u64_x3): Likewise. (vld1_s64_x3): Likewise. (vld1_fp16_x3): Likewise. (vld1_f32_x3): Likewise. (vld1_f64_x3): Likewise. (vld1_p8_x3): Likewise. (vld1_p16_x3): Likewise. (vld1_p64_x3): Likewise. (vld1q_u8_x3): Likewise. (vld1q_s8_x3): Likewise. (vld1q_u16_x3): Likewise. 
(vld1q_s16_x3): Likewise. (vld1q_u32_x3): Likewise. (vld1q_s32_x3): Likewise. (vld1q_u64_x3): Likewise. (vld1q_s64_x3): Likewise. (vld1q_f16_x3): Likewise. (vld1q_f32_x3): Likewise. (vld1q_f64_x3): Likewise. (vld1q_p8_x3): Likewise. (vld1q_p16_x3): Likewise. (vld1q_p64_x3): Likewise. (vst1_s64_x2): Likewise. (vst1_u64_x2): Likewise. (vst1_f64_x2): Likewise. (vst1_s8_x2): Likewise. (vst1_p8_x2): Likewise. (vst1_s16_x2): Likewise. (vst1_p16_x2): Likewise. (vst1_s32_x2): Likewise. (vst1_u8_x2): Likewise. (vst1_u16_x2): Likewise. (vst1_u32_x2): Likewise. (vst1_f16_x2): Likewise. (vst1_f32_x2): Likewise. (vst1_p64_x2): Likewise. (vst1q_s8_x2): Likewise. (vst1q_p8_x2): Likewise. (vst1q_s16_x2): Likewise. (vst1q_p16_x2): Likewise. (vst1q_s32_x2): Likewise. (vst1q_s64_x2): Likewise. (vst1q_u8_x2): Likewise. (vst1q_u16_x2): Likewise. (vst1q_u32_x2): Likewise. (vst1q_u64_x2): Likewise. (vst1q_f16_x2): Likewise. (vst1q_f32_x2): Likewise. (vst1q_f64_x2): Likewise. (vst1q_p64_x2): Likewise. (vst1_s64_x3): Likewise. (vst1_u64_x3): Likewise. (vst1_f64_x3): Likewise. (vst1_s8_x3): Likewise. (vst1_p8_x3): Likewise. (vst1_s16_x3): Likewise. (vst1_p16_x3): Likewise. (vst1_s32_x3): Likewise. (vst1_u8_x3): Likewise. (vst1_u16_x3): Likewise. (vst1_u32_x3): Likewise. (vst1_f16_x3): Likewise. (vst1_f32_x3): Likewise. (vst1_p64_x3): Likewise. (vst1q_s8_x3): Likewise. (vst1q_p8_x3): Likewise. (vst1q_s16_x3): Likewise. (vst1q_p16_x3): Likewise. (vst1q_s32_x3): Likewise. (vst1q_s64_x3): Likewise. (vst1q_u8_x3): Likewise. (vst1q_u16_x3): Likewise. (vst1q_u32_x3): Likewise. (vst1q_u64_x3): Likewise. (vst1q_f16_x3): Likewise. (vst1q_f32_x3): Likewise. (vst1q_f64_x3): Likewise. (vst1q_p64_x3): Likewise. Hi, I'm not a maintainer, but I suspect you should add some tests. Christophe -- - Thanks and regards,
Re: [AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics
Hi Sameera On 10/04/18 11:20, Sameera Deshpande wrote: On 7 April 2018 at 01:25, Christophe Lyon wrote: Hi, 2018-04-06 12:15 GMT+02:00 Sameera Deshpande : Hi Christophe, Please find attached the updated patch with testcases. Ok for trunk? Thanks for the update. Since the new intrinsics are only available on aarch64, you want to prevent the tests from running on arm. Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the two targets. There are several examples on how to do that in that directory. I have also noticed that the tests fail at execution on aarch64_be. I didn't look at the patch in details. Christophe - Thanks and regards, Sameera D. 2017-12-14 22:17 GMT+05:30 Christophe Lyon : 2017-12-14 9:29 GMT+01:00 Sameera Deshpande : Hi! Please find attached the patch implementing vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics as defined by Neon document. Ok for trunk? - Thanks and regards, Sameera D. gcc/Changelog: 2017-11-14 Sameera Deshpande * config/aarch64/aarch64-simd-builtins.def (ld1x3): New. (st1x2): Likewise. (st1x3): Likewise. * config/aarch64/aarch64-simd.md (aarch64_ld1x3): New pattern. (aarch64_ld1_x3_): Likewise (aarch64_st1x2): Likewise (aarch64_st1_x2_): Likewise (aarch64_st1x3): Likewise (aarch64_st1_x3_): Likewise * config/aarch64/arm_neon.h (vld1_u8_x3): New function. (vld1_s8_x3): Likewise. (vld1_u16_x3): Likewise. (vld1_s16_x3): Likewise. (vld1_u32_x3): Likewise. (vld1_s32_x3): Likewise. (vld1_u64_x3): Likewise. (vld1_s64_x3): Likewise. (vld1_fp16_x3): Likewise. (vld1_f32_x3): Likewise. (vld1_f64_x3): Likewise. (vld1_p8_x3): Likewise. (vld1_p16_x3): Likewise. (vld1_p64_x3): Likewise. (vld1q_u8_x3): Likewise. (vld1q_s8_x3): Likewise. (vld1q_u16_x3): Likewise.
(vld1q_p64_x3): Likewise. (vst1_s64_x2): Likewise. (vst1_u64_x2): Likewise. (vst1_f64_x2): Likewise. (vst1_s8_x2): Likewise. (vst1_p8_x2): Likewise. (vst1_s16_x2): Likewise. (vst1_p16_x2): Likewise. (vst1_s32_x2): Likewise. (vst1_u8_x2): Likewise. (vst1_u16_x2): Likewise. (vst1_u32_x2): Likewise. (vst1_f16_x2): Likewise. (vst1_f32_x2): Likewise. (vst1_p64_x2): Likewise. (vst1q_s8_x2): Likewise. (vst1q_p8_x2): Likewise. (vst1q_s16_x2): Likewise. (vst1q_p16_x2): Likewise. (vst1q_s32_x2): Likewise. (vst1q_s64_x2): Likewise. (vst1q_u8_x2): Likewise. (vst1q_u16_x2): Likewise. (vst1q_u32_x2): Likewise. (vst1q_u64_x2): Likewise. (vst1q_f16_x2): Likewise. (vst1q_f32_x2): Likewise. (vst1q_f64_x2): Likewise. (vst1q_p64_x2): Likewise. (vst1_s64_x3): Likewise. (vst1_u64_x3): Likewise. (vst1_f64_x3): Likewise. (vst1_s8_x3): Likewise. (vst1_p8_x3): Likewise. (vst1_s16_x3): Likewise. (vst1_p16_x3): Likewise. (vst1_s32_x3): Likewise. (vst1_u8_x3): Likewise. (vst1_u16_x3): Likewise. (vst1_u32_x3): Likewise. (vst1_f16_x3): Likewise. (vst1_f32_x3): Likewise. (vst1_p64_x3): Likewise. (vst1q_s8_x3): Likewise. (vst1q_p8_x3): Likewise. (vst1q_s16_x3): Likewise. (vst1q_p16_x3): Likewise. (vst1q_s32_x3): Likewise. (vst1q_s64_x3): Likewise. (vst1q_u8_x3): Likewise. (vst1q_u16_x3): Likewise. (vst1q_u32_x3): Likewise. (vst1q_u64_x3): Likewise. (vst1q_f16_x3): Likewise. (vst1q_f32_x3): Likewise. (vst1q_f64_x3): Likewise. (vst1q_p64_x3): Likewise. Hi, I'm not a maintainer, but I suspect you should add some tests. Christophe -- - Thanks and regards, Sameera D. Hi Christophe, Please find attached the updated patch. Similar to the testcase vld1x2.c, I have updated the testcases to mark them XFAIL for ARM, as the intrinsics are not implemented yet. I have also added required target to be little endian. I am not a
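For readers unfamiliar with the x2/x3 forms discussed in this thread: per the Neon intrinsics document, vld1_*_x3 loads three consecutive 64-bit registers' worth of elements from a single pointer, and vst1_*_x2/vst1_*_x3 are the store counterparts. A portable sketch of the load semantics, with stand-in type and function names since the real intrinsic exists only on AArch64:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Stand-in for uint8x8x3_t: three 8-byte "D registers" of uint8 lanes.
struct uint8x8x3_sketch { uint8_t val[3][8]; };

// Semantics of vld1_u8_x3(p): load 24 consecutive bytes into val[0..2],
// one register per 8-byte chunk, lowest address first.
uint8x8x3_sketch vld1_u8_x3_sketch(const uint8_t *p)
{
  uint8x8x3_sketch r;
  for (int i = 0; i < 3; ++i)
    std::memcpy(r.val[i], p + 8 * i, 8);
  return r;
}
```

The lane ordering within each chunk is also why the thread notes execution failures on aarch64_be: big-endian lane numbering needs separate handling in the test expectations.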
Re: [Aarch64] Fix conditional branches with target far away.
Hi Sameera On 29/03/18 11:44, Sameera Deshpande wrote: Hi Sudakshina, Thanks for pointing that out. Updated the conditions for attribute length to take care of boundary conditions for offset range. Please find attached the updated patch. I have tested it for gcc testsuite and the failing testcase. Ok for trunk? Thank you so much for fixing the length as well along with your patch. You mention a failing testcase? Maybe it would be helpful to add that to the patch for the gcc testsuite. Sudi On 22 March 2018 at 19:06, Sudakshina Das <sudi@arm.com> wrote: Hi Sameera On 22/03/18 02:07, Sameera Deshpande wrote: Hi Sudakshina, As per the ARMv8 ARM, for the offset range (-1048576, 1048572), the far branch instruction offset is inclusive of both the offsets. Hence, I am using (<= || >=) and not (< || >=) as it was in the previous implementation. I have to admit earlier I was only looking at the patch mechanically and found a difference with the previous implementation in offset comparison. After you pointed out, I looked up the ARMv8 ARM and I have a couple of doubts: 1. My understanding is that any offset in [-1048576, 1048572], both inclusive, qualifies as an 'in range' offset. However, the code for both attribute length and far_branch has been using [-1048576, 1048572), that is, (>= && <). If the far_branch was incorrectly calculated, then maybe the length calculations with similar magic numbers should also be corrected? Of course, I am not an expert in this and maybe this was a conscious decision, so I would ask Ramana to clarify if he remembers. 2. Now to come back to your patch, if my understanding is correct, I think a far_branch would be anything outside of this range, that is, (offset < -1048576 || offset > 1048572), anything that cannot be represented in the 21-bit range. Thanks Sudi On 16 March 2018 at 00:51, Sudakshina Das <sudi@arm.com> wrote: On 15/03/18 15:27, Sameera Deshpande wrote: Ping!
On 28 February 2018 at 16:18, Sameera Deshpande <sameera.deshpa...@linaro.org> wrote: On 27 February 2018 at 18:25, Ramana Radhakrishnan <ramana@googlemail.com> wrote: On Wed, Feb 14, 2018 at 8:30 AM, Sameera Deshpande <sameera.deshpa...@linaro.org> wrote: Hi! Please find attached the patch to fix bug in branches with offsets over 1MiB. There has been an attempt to fix this issue in commit 050af05b9761f1979f11c151519e7244d5becd7c However, the far_branch attribute defined in above patch used insn_length - which computes incorrect offset. Hence, eliminated the attribute completely, and computed the offset from insn_addresses instead. Ok for trunk? gcc/Changelog 2018-02-13 Sameera Deshpande <sameera.deshpa...@linaro.org> * config/aarch64/aarch64.md (far_branch): Remove attribute. Eliminate all the dependencies on the attribute from RTL patterns. I'm not a maintainer but this looks good to me modulo notes about how this was tested. What would be nice is a testcase for the testsuite as well as ensuring that the patch has been bootstrapped and regression tested. AFAIR, the original patch was put in because match.pd failed when bootstrap in another context. regards Ramana -- - Thanks and regards, Sameera D. The patch is tested with GCC testsuite and bootstrapping successfully. Also tested for spec benchmark. I am not a maintainer either. I noticed that the range check you do for the offset has a (<= || >=). The "far_branch" however did (< || >=) for a positive value. Was that also part of the incorrect offset calculation? 
@@ -692,7 +675,11 @@
   {
     if (get_attr_length (insn) == 8)
       {
-	if (get_attr_far_branch (insn) == 1)
+	long long int offset;
+	offset = INSN_ADDRESSES (INSN_UID (XEXP (operands[2], 0)))
+		 - INSN_ADDRESSES (INSN_UID (insn));
+
+	if (offset <= -1048576 || offset >= 1048572)
 	  return aarch64_gen_far_branch (operands, 2, "Ltb",
 					 "\\t%0, %1, ");
 	else
@@ -709,12 +696,7 @@
   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -32768))
		      (lt (minus (match_dup 2) (pc)) (const_int 32764)))
		 (const_int 4)
-		 (const_int 8)))
-   (set (attr "far_branch")
-	(if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
-			   (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
-		      (const_int 0)
-		      (const_int 1)))]
+		 (const_int 8)))]
 )

Thanks Sudi -- - Thanks and regards, Sameera D.
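The range being debated above can be stated directly in code. This is a sketch of the inclusive in-range test from the thread, not the GCC source (the function names are mine): a conditional branch encodes a signed, 4-byte-aligned offset, and the thread cites [-1048576, 1048572] with both ends inclusive.

```cpp
#include <cassert>
#include <cstdint>

// In-range per the discussion: any offset in [-1048576, 1048572],
// both bounds inclusive, fits the branch's immediate field.
bool branch_offset_in_range(int64_t offset)
{
  return offset >= -1048576 && offset <= 1048572;
}

// A "far branch" is then the complement: offsets the immediate
// cannot represent, which must go via aarch64_gen_far_branch.
bool needs_far_branch(int64_t offset)
{
  return !branch_offset_in_range(offset);
}
```

Note this matches point 2 in the thread (far branch iff offset < -1048576 || offset > 1048572), whereas the quoted hunk's `<= -1048576 || >= 1048572` treats the two boundary offsets as far branches too, which is the conservative direction.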
Re: [PATCH, GCC-7, GCC-6][ARM][PR target/84826] Backport Fix ICE in extract_insn, at recog.c:2304 on arm-linux-gnueabihf
Hi Kyrill On 29/03/18 09:41, Kyrill Tkachov wrote: Hi Sudi, On 28/03/18 15:04, Sudakshina Das wrote: Hi This patch is a request to backport r258777 and r258805 to gcc-7-branch and gcc-6-branch. The same ICE occurs in both the branches with -fstack-check. Thus the test case directive has been changed. The discussion on the patch that went into trunk is: https://gcc.gnu.org/ml/gcc-patches/2018-03/msg01120.html Testing : Regtested on both the branches with arm-none-linux-gnueabihf Is this ok for gcc-7 and gcc-6? Ok. Thanks, Kyrill Thanks! Committed to gcc-7-branch as r258948 and gcc-6-branch as r258949. Sudi Sudi ChangeLog entries: *** gcc/ChangeLog *** 2018-03-28 Sudakshina Das <sudi@arm.com> Backport from mainline 2018-03-22 Sudakshina Das <sudi@arm.com> PR target/84826 * config/arm/arm.h (machine_function): Add static_chain_stack_bytes. * config/arm/arm.c (arm_compute_static_chain_stack_bytes): Avoid re-computing once computed. (arm_expand_prologue): Compute machine->static_chain_stack_bytes. (arm_init_machine_status): Initialize machine->static_chain_stack_bytes. *** gcc/testsuite/ChangeLog *** 2018-03-28 Sudakshina Das <sudi@arm.com> * gcc.target/arm/pr84826.c: Change dg-option to -fstack-check. Backport from mainline 2018-03-23 Sudakshina Das <sudi@arm.com> PR target/84826 * gcc.target/arm/pr84826.c: Add dg directive. Backport from mainline 2018-03-22 Sudakshina Das <sudi@arm.com> PR target/84826 * gcc.target/arm/pr84826.c: New test.
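The gist of the backported fix, per the ChangeLog above, is to compute static_chain_stack_bytes once per function and cache it in the machine_function record so that prologue expansion and later queries agree. A sketch of that memoization pattern with hypothetical names (the real code lives in gcc/config/arm/arm.c):

```cpp
#include <cassert>

// Hypothetical stand-in for arm's per-function machine_function record.
struct machine_function_sketch
{
  int static_chain_stack_bytes = -1;  // -1 means "not computed yet"
};

static int compute_calls = 0;

// Placeholder for the expensive layout computation.
static int compute_static_chain_stack_bytes()
{
  ++compute_calls;  // count how often the expensive path actually runs
  return 8;         // placeholder result for the sketch
}

int get_static_chain_stack_bytes(machine_function_sketch &m)
{
  if (m.static_chain_stack_bytes < 0)  // compute once...
    m.static_chain_stack_bytes = compute_static_chain_stack_bytes();
  return m.static_chain_stack_bytes;   // ...then reuse the cached value
}
```

Caching is what "Avoid re-computing once computed" buys: every later caller sees the same answer that the prologue was laid out with, which is what the ICE fix depends on.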