from:"Sudakshina Das"

RE: [PATCH] aarch64: Add backend support for expanding __builtin_memset

2020-11-13 Thread Sudakshina Das via Gcc-patches

Hi Richard

> -Original Message-
> From: Richard Sandiford 
> Sent: 11 November 2020 17:52
> To: Sudakshina Das 
> Cc: Wilco Dijkstra ; gcc-patches@gcc.gnu.org;
> Kyrylo Tkachov ; Richard Earnshaw
> 
> Subject: Re: [PATCH] aarch64: Add backend support for expanding
> __builtin_memset
> 
> Sudakshina Das  writes:
> > Apologies for the delay. I have attached another version of the patch.
> > I have disabled the test cases for ILP32. This is only because
> > function body check fails because there is an addition unsigned extension
> instruction for src pointer in
> > every test (uxtwx0, w0). The actual inlining is not different.
> 
> Yeah, agree that's the best way of handling the ILP32 difference.
> 
> > […]
> > +/* SET_RATIO is similar to CLEAR_RATIO, but for a non-zero constant.
> Without
> > +   -mstrict-align, make decisions in "setmem".  Otherwise follow a sensible
> > +   default:  when optimizing for size adjust the ratio to account for
> > +the
> 
> nit: should just be one space after “:”
> 
> > […]
> > @@ -21289,6 +21292,134 @@ aarch64_expand_cpymem (rtx *operands)
> >return true;
> >  }
> >
> > +/* Like aarch64_copy_one_block_and_progress_pointers, except for
> memset where
> > +   *src is a register we have created with the duplicated value to be
> > +set.  */
> 
> “*src” -> SRC
> since there's no dereference now
> 
> > […]
> > +  /* In case we are optimizing for size or if the core does not
> > + want to use STP Q regs, lower the max_set_size.  */
> > +  max_set_size = (!speed_p
> > + || (aarch64_tune_params.extra_tuning_flags
> > + & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS))
> > + ? max_set_size/2 : max_set_size;
> 
> Formatting nit: should be a space either side of “/”.
> 
> > +  while (n > 0)
> > +{
> > +  /* Find the largest mode in which to do the copy in without
> > +over writing.  */
> 
> s/in without/without/
> 
> > +  opt_scalar_int_mode mode_iter;
> > +  FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_INT)
> > +   if (GET_MODE_BITSIZE (mode_iter.require ()) <= MIN (n, copy_limit))
> > + cur_mode = mode_iter.require ();
> > +
> > +  gcc_assert (cur_mode != BLKmode);
> > +
> > +  mode_bits = GET_MODE_BITSIZE (cur_mode).to_constant ();
> > +  aarch64_set_one_block_and_progress_pointer (src, ,
> > + cur_mode);
> > +
> > +  n -= mode_bits;
> > +
> > +  /* Do certain trailing copies as overlapping if it's going to be
> > +cheaper.  i.e. less instructions to do so.  For instance doing a 15
> > +byte copy it's more efficient to do two overlapping 8 byte copies
> than
> > +8 + 4 + 2 + 1.  */
> > +  if (n > 0 && n < copy_limit / 2)
> > +   {
> > + next_mode = smallest_mode_for_size (n, MODE_INT);
> > + int n_bits = GET_MODE_BITSIZE (next_mode).to_constant ();
> 
> Sorry for the runaround, but looking at this again, I'm a bit worried that we
> only indirectly test that n_bits is within the length of the original set.  I 
> guess
> it is because if n < copy_limit / 2 then n < mode_bits, and so n_bits will 
> never
> exceed mode_bits.  I think it might be worth adding an assert to make that
> “clearer” (maybe only to me, probably obvious to everyone else):
> 
> gcc_assert (n_bits <= mode_bits);
> 
> OK with those changes, thanks.

Thank you! Committed as 54bbde5 with those changes.

Sudi

> 
> Richard
> 
> > + dst = aarch64_move_pointer (dst, (n - n_bits) / BITS_PER_UNIT);
> > + n = n_bits;
> > +   }
> > +}
> > +
> > +  return true;
> > +}
> > +
> > +
> >  /* Split a DImode store of a CONST_INT SRC to MEM DST as two
> > SImode stores.  Handle the case when the constant has identical
> > bottom and top halves.  This is beneficial when the two stores can
> > be

RE: [PATCH] aarch64: Add backend support for expanding __builtin_memset

2020-11-11 Thread Sudakshina Das via Gcc-patches

Hi Richard

> -Original Message-
> From: Richard Sandiford 
> Sent: 03 November 2020 11:34
> To: Sudakshina Das 
> Cc: Wilco Dijkstra ; gcc-patches@gcc.gnu.org;
> Kyrylo Tkachov ; Richard Earnshaw
> 
> Subject: Re: [PATCH] aarch64: Add backend support for expanding
> __builtin_memset
> 
> Sudakshina Das  writes:
> >> -Original Message-
> >> From: Richard Sandiford 
> >> Sent: 30 October 2020 19:56
> >> To: Sudakshina Das 
> >> Cc: Wilco Dijkstra ; gcc-patches@gcc.gnu.org;
> >> Kyrylo Tkachov ; Richard Earnshaw
> >> 
> >> Subject: Re: [PATCH] aarch64: Add backend support for expanding
> >> __builtin_memset
> >>
> >> > +  base = copy_to_mode_reg (Pmode, XEXP (dst, 0));  dst =
> >> > + adjust_automodify_address (dst, VOIDmode, base, 0);
> >> > +
> >> > +  /* Prepare the val using a DUP v0.16B, val.  */  if (CONST_INT_P
> >> > + (val))
> >> > +{
> >> > +  val = force_reg (QImode, val);
> >> > +}
> >> > +  src = gen_reg_rtx (V16QImode);
> >> > +  emit_insn (gen_aarch64_simd_dupv16qi(src, val));
> >>
> >> I think we should use:
> >>
> >>   src = expand_vector_broadcast (V16QImode, val);
> >>
> >> here (without the CONST_INT_P check), so that for constants we just
> >> move a constant directly into a register.
> >>
> >
> > Sorry to bring this up again. When I tried expand_vector_broadcast, I
> > see the following behaviour:
> > for __builtin_memset(p, 1, 24) where the duplicated constant fits
> > moviv0.16b, 0x1
> > mov x1, 72340172838076673
> > str x1, [x0, 16]
> > str q0, [x0]
> > and an ICE for __builtin_memset(p, 1, 32) where I am guessing the
> > duplicated constant does not fit
> > x.c:7:30: error: unrecognizable insn:
> > 7 | { __builtin_memset(p, 1, 32);}
> >   |  ^
> > (insn 8 7 0 2 (parallel [
> > (set (mem:V16QI (reg:DI 94) [0 MEM  [(void 
> > *)p_2(D)]+0
> S16 A8])
> > (const_vector:V16QI [
> > (const_int 1 [0x1]) repeated x16
> > ]))
> > (set (mem:V16QI (plus:DI (reg:DI 94)
> > (const_int 16 [0x10])) [0 MEM  [(void 
> > *)p_2(D)]+16
> S16 A8])
> > (const_vector:V16QI [
> > (const_int 1 [0x1]) repeated x16
> > ]))
> > ]) "x.c":7:3 -1
> >  (nil))
> > during RTL pass: vregs
> 
> Ah, yeah, I guess we need to call force_reg on the result.
> 
> >> So yeah, I'm certainly not questioning the speed_p value of 256.
> >> I'm sure you and Wilco have picked the best value for that.  But -Os
> >> stuff can usually be justified on first principles and I wasn't sure
> >> where the value of 128 came from.
> >>
> >
> > I had another chat with Wilco about the 128byte value for !speed_p. We
> > estimate the average number of instructions upto 128byte would be ~3
> > which is similar to do a memset call. But I did go back and think
> > about the tuning argument of
> AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS a
> > bit more because you are right that based on that the average instructions
> can become double.
> > I would propose using 256/128 based on speed_p but halving the value
> > based on the tune parameter. Obviously the assumption here is that we
> > are respecting the core's choice of avoiding stp of q registers (given
> > that I do not see other uses of
> AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS being changed by -Os).
> 
> Yeah, but I think the lack of an -Os check in the existing code might be a
> mistake.  The point is that STP Q is smaller than two separate STR Qs, so 
> using
> it is a size optimisation even if it's not a speed optimisation.
> And like I say, -Os isn't supposed to be striking a balance between size and
> speed: it's supposed to be going for size quite aggressively.
> 
> So TBH I have slight preference for keeping the current value and only
> checking the tuning flag for speed_p.  But I agree that halving the value
> would be self-consistent, so if you or Wilco believe strongly that halving is
> better, that'd be OK with me too.
> 
> > There might be a debate on how useful
> > AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS
> > is in the context of memset/memcpy but that needs more analysis and I
> > would say should be a separate patch.
>

RE: [PATCH] aarch64: Add backend support for expanding __builtin_memset

2020-11-03 Thread Sudakshina Das via Gcc-patches

Hi Richard

> -Original Message-
> From: Richard Sandiford 
> Sent: 30 October 2020 19:56
> To: Sudakshina Das 
> Cc: Wilco Dijkstra ; gcc-patches@gcc.gnu.org;
> Kyrylo Tkachov ; Richard Earnshaw
> 
> Subject: Re: [PATCH] aarch64: Add backend support for expanding
> __builtin_memset
> 
> > +  base = copy_to_mode_reg (Pmode, XEXP (dst, 0));  dst =
> > + adjust_automodify_address (dst, VOIDmode, base, 0);
> > +
> > +  /* Prepare the val using a DUP v0.16B, val.  */  if (CONST_INT_P
> > + (val))
> > +{
> > +  val = force_reg (QImode, val);
> > +}
> > +  src = gen_reg_rtx (V16QImode);
> > +  emit_insn (gen_aarch64_simd_dupv16qi(src, val));
> 
> I think we should use:
> 
>   src = expand_vector_broadcast (V16QImode, val);
> 
> here (without the CONST_INT_P check), so that for constants we just move a
> constant directly into a register.
>

Sorry to bring this up again. When I tried expand_vector_broadcast, I 
see the following behaviour:
for __builtin_memset(p, 1, 24) where the duplicated constant fits
moviv0.16b, 0x1
mov x1, 72340172838076673
str x1, [x0, 16]
str q0, [x0]
and an ICE for __builtin_memset(p, 1, 32) where I am guessing the duplicated
constant does not fit
x.c:7:30: error: unrecognizable insn:
7 | { __builtin_memset(p, 1, 32);}
  |  ^
(insn 8 7 0 2 (parallel [
(set (mem:V16QI (reg:DI 94) [0 MEM  [(void *)p_2(D)]+0 
S16 A8])
(const_vector:V16QI [
(const_int 1 [0x1]) repeated x16
]))
(set (mem:V16QI (plus:DI (reg:DI 94)
(const_int 16 [0x10])) [0 MEM  [(void 
*)p_2(D)]+16 S16 A8])
(const_vector:V16QI [
(const_int 1 [0x1]) repeated x16
]))
]) "x.c":7:3 -1
 (nil))
during RTL pass: vregs

> Sudakshina Das  writes:
> >> > +
> >> > +  /* "Cast" the *dst to the correct mode.  */  *dst =
> >> > + adjust_address (*dst, mode, 0);
> >> > +  /* Emit the memset.  */
> >> > +  emit_move_insn (*dst, reg);
> >> > +  /* Move the pointer forward.  */  *dst =
> >> > + aarch64_progress_pointer (*dst); }
> >> > +
> >> > +/* Expand setmem, as if from a __builtin_memset.  Return true if
> >> > +   we succeed, otherwise return false.  */
> >> > +
> >> > +bool
> >> > +aarch64_expand_setmem (rtx *operands) {
> >> > +  int n, mode_bits;
> >> > +  unsigned HOST_WIDE_INT len;
> >> > +  rtx dst = operands[0];
> >> > +  rtx val = operands[2], src;
> >> > +  rtx base;
> >> > +  machine_mode cur_mode = BLKmode, next_mode;
> >> > +  bool speed_p = !optimize_function_for_size_p (cfun);
> >> > +  unsigned max_set_size = speed_p ? 256 : 128;
> >>
> >> What's the basis for the size value?  AIUI (and I've probably got
> >> this wrong), that effectively means a worst case of 3+2 stores
> >> (3 STP Qs and 2 mop-up stores).  Then we need one instruction to set
> >> up the constant.  So if that's right, it looks like the worst-case size is 
> >> 6
> instructions.
> >>
> >> AARCH64_CALL_RATIO has a value of 8, but I'm not sure how that
> >> relates to the number of instructions in a call.  I guess the best
> >> case is 4 (3 instructions for the parameters and one for the call itself).
> >>
> >
> > This one I will ask Wilco to chime in. We discussed offline what would
> > be the largest case that this builtin should allow and he suggested
> > 256-bytes. It would actually generate 9 instructions (its in the memset-
> corner-case.c).
> > Personally I am not sure what the best decisions are in this case so I
> > will rely on Wilco's suggestions.
> 
> Ah, sorry, by “the size value”, I meant the !speed_p value of 128.
> I now realise that that was far from clear given that the variable is called
> max_set_size :-)
> 
> So yeah, I'm certainly not questioning the speed_p value of 256.
> I'm sure you and Wilco have picked the best value for that.  But -Os stuff can
> usually be justified on first principles and I wasn't sure where the value of 
> 128
> came from.
>

I had another chat with Wilco about the 128byte value for !speed_p. We
estimate the average number of instructions upto 128byte would be ~3 which
is similar to do a memset call. But I did go back and think about the tuning
argument of  AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS a bit more because
you are right that based on that the average instructions ca

RE: [PATCH] aarch64: Fix PR97638

2020-11-02 Thread Sudakshina Das via Gcc-patches

Hi Richard

> -Original Message-
> From: Richard Sandiford 
> Sent: 02 November 2020 10:31
> To: Sudakshina Das 
> Cc: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw 
> Subject: Re: [PATCH] aarch64: Fix PR97638
> 
> Sudakshina Das  writes:
> > Hi
> >
> > Currently the testcase in the patch was failing to produce a 'bti c'
> > at the beginning of the function. This was because in
> > aarch64_pac_insn_p, we were wrongly returning at the first check. This
> > patch fixes the return value.
> >
> > Bootstrap and regression tested on aarch64-none-linux-gnu.
> > Is this ok for trunk and gcc 10 backport?
> 
> OK for both, thanks.

Thank you! Pushed to trunk. Will wait for a couple of days before backport.

Thanks
Sudi

> 
> Richard

[PATCH] aarch64: Fix PR97638

2020-11-02 Thread Sudakshina Das via Gcc-patches

Hi

Currently the testcase in the patch was failing to produce
a 'bti c' at the beginning of the function. This was because
in aarch64_pac_insn_p, we were wrongly returning at the first
check. This patch fixes the return value.

Bootstrap and regression tested on aarch64-none-linux-gnu.
Is this ok for trunk and gcc 10 backport?

Thanks
Sudi

gcc/ChangeLog:

2020-10-30  Sudakshina Das  

PR target/97638
* config/aarch64/aarch64-bti-insert.c (aarch64_pac_insn_p): Update
return value on INSN_P check.

gcc/testsuite/ChangeLog:

2020-10-30  Sudakshina Das  

PR target/97638
* gcc.target/aarch64/pr97638.c: New test.


### Attachment also inlined for ease of reply###

diff --git a/gcc/config/aarch64/aarch64-bti-insert.c 
b/gcc/config/aarch64/aarch64-bti-insert.c
index 
57663ee23b490162dbe7ffe2f618066e71cea455..98026695fdbbe2eda84e0befad94b5fe4ce22754
 100644
--- a/gcc/config/aarch64/aarch64-bti-insert.c
+++ b/gcc/config/aarch64/aarch64-bti-insert.c
@@ -95,7 +95,7 @@ static bool
 aarch64_pac_insn_p (rtx x)
 {
   if (!INSN_P (x))
-return x;
+return false;
 
   subrtx_var_iterator::array_type array;
   FOR_EACH_SUBRTX_VAR (iter, array, PATTERN (x), ALL)
diff --git a/gcc/testsuite/gcc.target/aarch64/pr97638.c 
b/gcc/testsuite/gcc.target/aarch64/pr97638.c
new file mode 100644
index 
..e5869e86c449aef5606541c4c7a51069a1426793
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr97638.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mbranch-protection=bti" } */
+
+char *foo (const char *s, const int c)
+{
+  const char *p = 0;
+  for (;;)
+  {
+if (*s == c)
+p = s;
+if (p != 0 || *s++ == 0)
+break;
+  }
+  return (char *)p;
+}
+
+/* { dg-final { scan-assembler "hint\t34" } } */


rb13708.patch
Description: rb13708.patch

RE: [PATCH] aarch64: Add backend support for expanding __builtin_memset

2020-10-30 Thread Sudakshina Das via Gcc-patches

Hi Richard

Thank you for the review. Please find my comments inlined.

> -Original Message-
> From: Richard Sandiford 
> Sent: 30 October 2020 15:03
> To: Sudakshina Das 
> Cc: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw 
> Subject: Re: [PATCH] aarch64: Add backend support for expanding
> __builtin_memset
> 
> Sudakshina Das  writes:
> > diff --git a/gcc/config/aarch64/aarch64.h
> > b/gcc/config/aarch64/aarch64.h index
> >
> 00b5f8438863bb52c348cfafd5d4db478fe248a7..bcb654809c9662db0f51fc1368
> e3
> > 7e42969efd29 100644
> > --- a/gcc/config/aarch64/aarch64.h
> > +++ b/gcc/config/aarch64/aarch64.h
> > @@ -1024,16 +1024,18 @@ typedef struct  #define MOVE_RATIO(speed) \
> >(!STRICT_ALIGNMENT ? 2 : (((speed) ? 15 : AARCH64_CALL_RATIO) / 2))
> >
> > -/* For CLEAR_RATIO, when optimizing for size, give a better estimate
> > -   of the length of a memset call, but use the default otherwise.  */
> > +/* Like MOVE_RATIO, without -mstrict-align, make decisions in "setmem"
> when
> > +   we would use more than 3 scalar instructions.
> > +   Otherwise follow a sensible default: when optimizing for size, give a
> better
> > +   estimate of the length of a memset call, but use the default
> > +otherwise.  */
> >  #define CLEAR_RATIO(speed) \
> > -  ((speed) ? 15 : AARCH64_CALL_RATIO)
> > +  (!STRICT_ALIGNMENT ? 4 : (speed) ? 15 : AARCH64_CALL_RATIO)
> >
> >  /* SET_RATIO is similar to CLEAR_RATIO, but for a non-zero constant, so
> when
> > optimizing for size adjust the ratio to account for the overhead of 
> > loading
> > the constant.  */
> >  #define SET_RATIO(speed) \
> > -  ((speed) ? 15 : AARCH64_CALL_RATIO - 2)
> > +  (!STRICT_ALIGNMENT ? 0 : (speed) ? 15 : AARCH64_CALL_RATIO - 2)
> 
> Think it would help to adjust the SET_RATIO comment too, otherwise it's not
> obvious why its !STRICT_ALIGNMNENT value is 0.
> 

Will do.

> >
> >  /* Disable auto-increment in move_by_pieces et al.  Use of auto-
> increment is
> > rarely a good idea in straight-line code since it adds an extra
> > address diff --git a/gcc/config/aarch64/aarch64.c
> > b/gcc/config/aarch64/aarch64.c index
> >
> a8cc545c37044345c3f1d3bf09151c8a9578a032..16ac0c076adcc82627af43473a9
> 3
> > 8e78d3a7ecdc 100644
> > --- a/gcc/config/aarch64/aarch64.c
> > +++ b/gcc/config/aarch64/aarch64.c
> > @@ -7058,6 +7058,9 @@ aarch64_gen_store_pair (machine_mode mode,
> rtx mem1, rtx reg1, rtx mem2,
> >  case E_V4SImode:
> >return gen_vec_store_pairv4siv4si (mem1, reg1, mem2, reg2);
> >
> > +case E_V16QImode:
> > +  return gen_vec_store_pairv16qiv16qi (mem1, reg1, mem2, reg2);
> > +
> >  default:
> >gcc_unreachable ();
> >  }
> > @@ -21373,6 +21376,134 @@ aarch64_expand_cpymem (rtx *operands)
> >return true;
> >  }
> >
> > +/* Like aarch64_copy_one_block_and_progress_pointers, except for
> memset where
> > +   *src is a register we have created with the duplicated value to be
> > +set.  */
> 
> AIUI, *SRC doesn't accumulate across calls in the way that it does for
> aarch64_copy_one_block_and_progress_pointers, so it might be better to
> pass an rtx rather than an “rtx *”.
> 

Will do.

> > +static void
> > +aarch64_set_one_block_and_progress_pointer (rtx *src, rtx *dst,
> > +   machine_mode mode)
> > +{
> > +  /* If we are copying 128bits or 256bits, we can do that straight from
> > +  the SIMD register we prepared.  */
> 
> Nit: excess space before “the”.
>
 
Will do.

> > +  if (known_eq (GET_MODE_BITSIZE (mode), 256))
> > +{
> > +  mode =  GET_MODE (*src);
> 
> Excess space before “GET_MODE”.
>
  
Will do.

> > +  /* "Cast" the *dst to the correct mode.  */
> > +  *dst = adjust_address (*dst, mode, 0);
> > +  /* Emit the memset.  */
> > +  emit_insn (aarch64_gen_store_pair (mode, *dst, *src,
> > +aarch64_progress_pointer (*dst),
> *src));
> > +
> > +  /* Move the pointers forward.  */
> > +  *dst = aarch64_move_pointer (*dst, 32);
> > +  return;
> > +}
> > +  else if (known_eq (GET_MODE_BITSIZE (mode), 128))
> 
> Nit: more usual in GCC not to have an “else” after an early return.
>

Will do.
 
> > +{
> > +  /* "Cast" the *dst to the correct mode.  */
> > +  *dst = adjust_address (*dst, GET_MODE (*src), 0);
> > +  /* Emit the memset.  */
> > +

[PATCH] aarch64: Add backend support for expanding __builtin_memset

2020-10-29 Thread Sudakshina Das via Gcc-patches

Hi

This patch implements aarch64 backend expansion for __builtin_memset. Most of 
the
implementation is based on the expansion of __builtin_memcpy. We change the 
values of
SET_RATIO and MOVE_RATIO for cases where we do not have to strictly align and 
where
we can benefit from NEON instructions in the backend.

So for a test case like:

void foo (void* p) { __builtin_memset (p, 1, 7); }

instead of generating:
mov w3, 16843009
mov w2, 257
mov w1, 1
str w3, [x0]
strhw2, [x0, 4]
strbw1, [x0, 6]
ret
we now generate
moviv0.16b, 0x1
str s0, [x0]
str s0, [x0, 3]
ret

Bootstrapped and regression tested on aarch64-none-linux-gnu.
With this patch I have seen an overall improvement of 0.27% in Spec2017 Int
and 0.19% in Spec2017 FP benchmarks on Neoverse N1.

Is this ok for trunk?

gcc/ChangeLog:

2020-xx-xx  Sudakshina Das  

* config/aarch64/aarch64-protos.h (aarch64_expand_setmem): New
declaration.
* config/aarch64/aarch64.c (aarch64_gen_store_pair): Add case for
E_V16QImode.
(aarch64_set_one_block_and_progress_pointer): New helper for
aarch64_expand_setmem.
(aarch64_expand_setmem): Define the expansion for memset.
* config/aarch64/aarch64.h (CLEAR_RATIO): Tweak to favor
aarch64_expand_setmem when allowed and profitable.
(SET_RATIO): Likewise.
* config/aarch64/aarch64.md: Define pattern for setmemdi.

gcc/testsuite/ChangeLog:

2020-xx-xx  Sudakshina Das  

* g++.dg/tree-ssa/pr90883.C: Remove xfail for aarch64.
* gcc.dg/tree-prof/stringop-2.c: Add xfail for aarch64.
* gcc.target/aarch64/memset-corner-cases.c: New test.
* gcc.target/aarch64/memset-q-reg.c: New test.

Thanks
Sudi

### Attachment also inlined for ease of reply###

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 
7a34c841355bad88365381912b163c61c5a35811..2aa3f1fddaafae58f0bfb26e5b33fe6a94e85e06
 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -510,6 +510,7 @@ bool aarch64_emit_approx_div (rtx, rtx, rtx);
 bool aarch64_emit_approx_sqrt (rtx, rtx, bool);
 void aarch64_expand_call (rtx, rtx, rtx, bool);
 bool aarch64_expand_cpymem (rtx *);
+bool aarch64_expand_setmem (rtx *);
 bool aarch64_float_const_zero_rtx_p (rtx);
 bool aarch64_float_const_rtx_p (rtx);
 bool aarch64_function_arg_regno_p (unsigned);
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 
00b5f8438863bb52c348cfafd5d4db478fe248a7..bcb654809c9662db0f51fc1368e37e42969efd29
 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -1024,16 +1024,18 @@ typedef struct
 #define MOVE_RATIO(speed) \
   (!STRICT_ALIGNMENT ? 2 : (((speed) ? 15 : AARCH64_CALL_RATIO) / 2))
 
-/* For CLEAR_RATIO, when optimizing for size, give a better estimate
-   of the length of a memset call, but use the default otherwise.  */
+/* Like MOVE_RATIO, without -mstrict-align, make decisions in "setmem" when
+   we would use more than 3 scalar instructions.
+   Otherwise follow a sensible default: when optimizing for size, give a better
+   estimate of the length of a memset call, but use the default otherwise.  */
 #define CLEAR_RATIO(speed) \
-  ((speed) ? 15 : AARCH64_CALL_RATIO)
+  (!STRICT_ALIGNMENT ? 4 : (speed) ? 15 : AARCH64_CALL_RATIO)
 
 /* SET_RATIO is similar to CLEAR_RATIO, but for a non-zero constant, so when
optimizing for size adjust the ratio to account for the overhead of loading
the constant.  */
 #define SET_RATIO(speed) \
-  ((speed) ? 15 : AARCH64_CALL_RATIO - 2)
+  (!STRICT_ALIGNMENT ? 0 : (speed) ? 15 : AARCH64_CALL_RATIO - 2)
 
 /* Disable auto-increment in move_by_pieces et al.  Use of auto-increment is
rarely a good idea in straight-line code since it adds an extra address
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
a8cc545c37044345c3f1d3bf09151c8a9578a032..16ac0c076adcc82627af43473a938e78d3a7ecdc
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7058,6 +7058,9 @@ aarch64_gen_store_pair (machine_mode mode, rtx mem1, rtx 
reg1, rtx mem2,
 case E_V4SImode:
   return gen_vec_store_pairv4siv4si (mem1, reg1, mem2, reg2);
 
+case E_V16QImode:
+  return gen_vec_store_pairv16qiv16qi (mem1, reg1, mem2, reg2);
+
 default:
   gcc_unreachable ();
 }
@@ -21373,6 +21376,134 @@ aarch64_expand_cpymem (rtx *operands)
   return true;
 }
 
+/* Like aarch64_copy_one_block_and_progress_pointers, except for memset where
+   *src is a register we have created with the duplicated value to be set.  */
+static void
+aarch64_set_one_block_and_progress_pointer (rtx *src, rtx *dst,
+   machine_mode mode)
+{
+  /* If we are copying 128bits or 256bits,

RE: [PATCH V2] aarch64: Use Q-reg loads/stores in movmem expansion

2020-08-05 Thread Sudakshina Das

Hi Richard

Thank you for fixing this. I apologise for the trouble. I ran bootstrap only on 
an
earlier version of the patch where I should have ran it again on the final one! 
☹
I will be more careful in the future,

Thanks
Sudi

> -Original Message-
> From: Richard Sandiford 
> Sent: 05 August 2020 14:52
> To: Andreas Schwab 
> Cc: Sudakshina Das ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH V2] aarch64: Use Q-reg loads/stores in movmem
> expansion
> 
> Andreas Schwab  writes:
> > This breaks bootstrap.
> 
> I've pushed the below to fix this after bootstrapping & regression testing on
> aarch64-linux-gnu.
> 
> Richard

RE: [PATCH V2] aarch64: Use Q-reg loads/stores in movmem expansion

2020-08-04 Thread Sudakshina Das

Hi Richard

> -Original Message-
> From: Richard Sandiford 
> Sent: 31 July 2020 16:14
> To: Sudakshina Das 
> Cc: gcc-patches@gcc.gnu.org; Kyrylo Tkachov 
> Subject: Re: [PATCH V2] aarch64: Use Q-reg loads/stores in movmem
> expansion
> 
> Sudakshina Das  writes:
> > Hi
> >
> > This is my attempt at reviving the old patch
> > https://gcc.gnu.org/pipermail/gcc-patches/2019-January/514632.html
> >
> > I have followed on Kyrill's comment upstream on the link above and I am
> using the recommended option iii that he mentioned.
> > "1) Adjust the copy_limit to 256 bits after checking
> AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS in the tuning.
> >  2) Adjust aarch64_copy_one_block_and_progress_pointers to handle 256-
> bit moves. by iii:
> >iii) Emit explicit V4SI (or any other 128-bit vector mode) pairs 
> > ldp/stps. This
> wouldn't need any adjustments to
> > MD patterns, but would make
> aarch64_copy_one_block_and_progress_pointers more complex as it would
> now have
> > two paths, where one handles two adjacent memory addresses in one
> calls."
> >
> > With this patch the following test
> >
> > #define N 8
> > extern int src[N], dst[N];
> >
> > void
> > foo (void)
> > {
> >   __builtin_memcpy (dst, src, N * sizeof (int)); }
> >
> > which was originally giving
> > foo:
> > adrpx1, src
> > add x1, x1, :lo12:src
> > ldp x4, x5, [x1]
> > adrpx0, dst
> > add x0, x0, :lo12:dst
> > ldp x2, x3, [x1, 16]
> > stp x4, x5, [x0]
> > stp x2, x3, [x0, 16]
> > ret
> >
> >
> > changes to the following
> > foo:
> > adrpx1, src
> > add x1, x1, :lo12:src
> > adrpx0, dst
> > add x0, x0, :lo12:dst
> > ldp q1, q0, [x1]
> > stp q1, q0, [x0]
> > ret
> >
> > This gives about 1.3% improvement on 523.xalancbmk_r in SPEC2017 and
> > an overall code size reduction on most
> > SPEC2017 Int benchmarks on Neoverse N1 due to more LDP/STP Q pair
> registers.
> 
> Sorry for the slow review.  LGTM with a very minor nit (sorry)…

Thanks. Committed with the change.
> 
> > @@ -21150,9 +21177,12 @@ aarch64_expand_cpymem (rtx *operands)
> >/* Convert n to bits to make the rest of the code simpler.  */
> >n = n * BITS_PER_UNIT;
> >
> > -  /* Maximum amount to copy in one go.  The AArch64 back-end has
> integer modes
> > - larger than TImode, but we should not use them for loads/stores here.
> */
> > -  const int copy_limit = GET_MODE_BITSIZE (TImode);
> > +  /* Maximum amount to copy in one go.  We allow 256-bit chunks based
> on the
> > + AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS tuning parameter and
> > +TARGET_SIMD.  */
> > +  const int copy_limit = ((aarch64_tune_params.extra_tuning_flags
> > +  & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS)
> > + || !TARGET_SIMD)
> > +? GET_MODE_BITSIZE (TImode) :  256;
> 
> Should only be one space before “256”.
> 
> I guess at some point we should consider handling fixed-length SVE too, but
> that's only worth it for -msve-vector-bits=512 and higher.

Yes sure I will add this for future backlog.
> 
> Thanks,
> Richard

[PATCH V2] aarch64: Use Q-reg loads/stores in movmem expansion

2020-07-28 Thread Sudakshina Das

Hi

This is my attempt at reviving the old patch 
https://gcc.gnu.org/pipermail/gcc-patches/2019-January/514632.html

I have followed on Kyrill's comment upstream on the link above and I am using 
the recommended option iii that he mentioned.
"1) Adjust the copy_limit to 256 bits after checking 
AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS in the tuning.
 2) Adjust aarch64_copy_one_block_and_progress_pointers to handle 256-bit 
moves. by iii:
   iii) Emit explicit V4SI (or any other 128-bit vector mode) pairs ldp/stps. 
This wouldn't need any adjustments to
MD patterns, but would make 
aarch64_copy_one_block_and_progress_pointers more complex as it would now have
two paths, where one handles two adjacent memory addresses in one 
calls."

With this patch the following test

#define N 8
extern int src[N], dst[N];

void
foo (void)
{
  __builtin_memcpy (dst, src, N * sizeof (int));
}

which was originally giving
foo:
adrpx1, src
add x1, x1, :lo12:src
ldp x4, x5, [x1]
adrpx0, dst
add x0, x0, :lo12:dst
ldp x2, x3, [x1, 16]
stp x4, x5, [x0]
stp x2, x3, [x0, 16]
ret


changes to the following
foo:
adrpx1, src
add x1, x1, :lo12:src
adrpx0, dst
add x0, x0, :lo12:dst
ldp q1, q0, [x1]
stp q1, q0, [x0]
ret

This gives about 1.3% improvement on 523.xalancbmk_r in SPEC2017 and an overall 
code size reduction on most
SPEC2017 Int benchmarks on Neoverse N1 due to more LDP/STP Q pair registers.

Bootstrapped and regression tested on aarch64-none-linux-gnu.

Is this ok for trunk?

Thanks
Sudi

gcc/ChangeLog:

2020-07-23  Sudakshina Das  
Kyrylo Tkachov  

* config/aarch64/aarch64.c (aarch64_gen_store_pair): Add case
for E_V4SImode.
(aarch64_gen_load_pair): Likewise.
(aarch64_copy_one_block_and_progress_pointers): Handle 256 bit copy.
(aarch64_expand_cpymem): Expand copy_limit to 256bits where
appropriate.

gcc/testsuite/ChangeLog:

2020-07-23  Sudakshina Das  
Kyrylo Tkachov  

* gcc.target/aarch64/cpymem-q-reg_1.c: New test.
* gcc.target/aarch64/large_struct_copy_2.c: Update for ldp q regs.

** Attachment inlined  
**

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
3fe1feaa80ccb0a287ee1c7ea1056e8f0a830532..a38ff39c4d5d53f056bbba3114ebaf8f0414c037
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -6920,6 +6920,9 @@ aarch64_gen_store_pair (machine_mode mode, rtx mem1, rtx 
reg1, rtx mem2,
 case E_TFmode:
   return gen_store_pair_dw_tftf (mem1, reg1, mem2, reg2);
 
+case E_V4SImode:
+  return gen_vec_store_pairv4siv4si (mem1, reg1, mem2, reg2);
+
 default:
   gcc_unreachable ();
 }
@@ -6943,6 +6946,9 @@ aarch64_gen_load_pair (machine_mode mode, rtx reg1, rtx 
mem1, rtx reg2,
 case E_TFmode:
   return gen_load_pair_dw_tftf (reg1, mem1, reg2, mem2);
 
+case E_V4SImode:
+  return gen_load_pairv4siv4si (reg1, mem1, reg2, mem2);
+
 default:
   gcc_unreachable ();
 }
@@ -21097,6 +21103,27 @@ static void
 aarch64_copy_one_block_and_progress_pointers (rtx *src, rtx *dst,
  machine_mode mode)
 {
+  /* Handle 256-bit memcpy separately.  We do this by making 2 adjacent memory
+ address copies using V4SImode so that we can use Q registers.  */
+  if (known_eq (GET_MODE_BITSIZE (mode), 256))
+{
+  mode = V4SImode;
+  rtx reg1 = gen_reg_rtx (mode);
+  rtx reg2 = gen_reg_rtx (mode);
+  /* "Cast" the pointers to the correct mode.  */
+  *src = adjust_address (*src, mode, 0);
+  *dst = adjust_address (*dst, mode, 0);
+  /* Emit the memcpy.  */
+  emit_insn (aarch64_gen_load_pair (mode, reg1, *src, reg2,
+   aarch64_progress_pointer (*src)));
+  emit_insn (aarch64_gen_store_pair (mode, *dst, reg1,
+aarch64_progress_pointer (*dst), 
reg2));
+  /* Move the pointers forward.  */
+  *src = aarch64_move_pointer (*src, 32);
+  *dst = aarch64_move_pointer (*dst, 32);
+  return;
+}
+
   rtx reg = gen_reg_rtx (mode);
 
   /* "Cast" the pointers to the correct mode.  */
@@ -21150,9 +21177,12 @@ aarch64_expand_cpymem (rtx *operands)
   /* Convert n to bits to make the rest of the code simpler.  */
   n = n * BITS_PER_UNIT;
 
-  /* Maximum amount to copy in one go.  The AArch64 back-end has integer modes
- larger than TImode, but we should not use them for loads/stores here.  */
-  const int copy_limit = GET_MODE_BITSIZE (TImode);
+  /* Maximum amount to copy in one go.  We allow 256-bit chunks based on the
+ AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS tuning parameter and TARGET_SIM

RE: [PATCH] Fix handling of OPT_mgeneral_regs_only in attribute.

2020-05-21 Thread Sudakshina Das

Hi Martin

> -Original Message-
> From: Martin Liška 
> Sent: 21 May 2020 16:01
> To: gcc-patches@gcc.gnu.org
> Cc: Sudakshina Das 
> Subject: [PATCH] Fix handling of OPT_mgeneral_regs_only in attribute.
> 
> Hi.
> 
> Similarly to:
> 
>  case OPT_mstrict_align:
>if (val)
>   opts->x_target_flags |= MASK_STRICT_ALIGN;
>else
>   opts->x_target_flags &= ~MASK_STRICT_ALIGN;
>return true;
> 
> the MASK_GENERAL_REGS_ONLY mask should be handled the same way.

My old patch added the -mno-* version of the option and hence needed the change.
Without the _no_ version for mgeneral-regs-only, I would imagine "val" to only 
ever have
1 as a value. Am I missing something here?

Sudi
> 
> @Sudakshina: The 'opts->x_target_flags |= MASK_STRICT_ALIGN' change is
> not backported to all active branches. Can you please do it?
> 
> Ready to be installed?
> 
> gcc/ChangeLog:
> 
> 2020-05-21  Martin Liska  
> 
>   * common/config/aarch64/aarch64-common.c
> (aarch64_handle_option):
>   Properly maask MASK_GENERAL_REGS_ONLY based on val.
> ---
>   gcc/common/config/aarch64/aarch64-common.c | 5 -
>   1 file changed, 4 insertions(+), 1 deletion(-)
>

[Committed, testsuite] Fix PR92870

2019-12-12 Thread Sudakshina Das

Hi

With my recent commit, I added a test that is not passing on all 
targets. My change was valid for targets that have a vector/scalar 
shift/rotate optabs (optab that supports vector shifted by scalar).

Since it does not seem to be easy to find out which targets would 
support it, I am limiting the test to the target that I know pass.

Committed as obvious r279310.

gcc/testsuite/ChangeLog

2019-12-12  Sudakshina Das  

PR testsuite/92870
* gcc.dg/vect/vect-shift-5.c: Add target to scan-tree-dump.
diff --git a/gcc/testsuite/gcc.dg/vect/vect-shift-5.c b/gcc/testsuite/gcc.dg/vect/vect-shift-5.c
index c1fd4f2..68e517e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-shift-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-shift-5.c
@@ -16,4 +16,7 @@ int foo (uint32_t arr[4][4])
 return (((uint16_t)sum) + ((uint32_t)sum >> 16)) >> 1;
 }
 
-/* { dg-final { scan-tree-dump {vectorizable_shift ===[\n\r][^\n]*prologue_cost = 0} "vect" } } */
+/* For a target that has a vector/scalar shift/rotate optab, check
+   that we are not adding the cost of creating a vector from the scalar
+   in the prologue.  */
+/* { dg-final { scan-tree-dump {vectorizable_shift ===[\n\r][^\n]*prologue_cost = 0} "vect" { target { aarch64*-*-* x86_64-*-* } } } } */

Re: Fwd: [PATCH, GCC, Vect] Fix costing for vector shifts

2019-12-10 Thread Sudakshina Das

Hi Christophe

On 10/12/2019 09:01, Christophe Lyon wrote:
> Hi,
> 
> On Mon, 9 Dec 2019 at 11:23, Sudakshina Das  wrote:
>>
>> Hi Jeff
>>
>> On 07/12/2019 17:44, Jeff Law wrote:
>>> On Fri, 2019-12-06 at 14:05 +, Sudakshina Das wrote:
>>>> Hi
>>>>
>>>> While looking at the vectorization for following example, we
>>>> realized
>>>> that even though vectorizable_shift function was distinguishing
>>>> vector
>>>> shifted by vector from vector shifted by scalar, while modeling the
>>>> cost
>>>> it would always add the cost of building a vector constant despite
>>>> not
>>>> needing it for vector shifted by scalar.
>>>>
>>>> This patch fixes this by using scalar_shift_arg to determine whether
>>>> we
>>>> need to build a vector for the second operand or not. This reduces
>>>> prologue cost as shown in the test.
>>>>
>>>> Build and regression tests pass on aarch64-none-elf and
>>>> x86_64-pc-linux-gnu-gcc. This gives a 3.42% boost to 525.x264_r in
>>>> Spec2017 for AArch64.
>>>>
> 
> Looks like you didn't check on arm, where I can see that the new testcase 
> fails:
> FAIL: gcc.dg/vect/vect-shift-5.c -flto -ffat-lto-objects
> scan-tree-dump vect "vectorizable_shift
> ===[\\n\\r][^\\n]*prologue_cost = 0"
> FAIL: gcc.dg/vect/vect-shift-5.c scan-tree-dump vect
> "vectorizable_shift ===[\\n\\r][^\\n]*prologue_cost = 0"
> 
> Seen on arm-none-linux-gnueabihf
> --with-mode arm
> --with-cpu cortex-a9
> --with-fpu neon-fp16
> 
> Christophe

Thanks for reporting this. There is already a bugzilla report PR92870 
for powerpc that I am looking at. Apologies I couldn't find your email 
address there to add you to the cc list.

Thanks
Sudi

> 
>>>> gcc/ChangeLog:
>>>>
>>>> 2019-xx-xx  Sudakshina Das  
>>>>   Richard Sandiford  
>>>>
>>>>   * tree-vect-stmt.c (vectorizable_shift): Condition ndts for
>>>>   vect_model_simple_cost call on scalar_shift_arg.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>> 2019-xx-xx  Sudakshina Das  
>>>>
>>>>   * gcc.dg/vect/vect-shift-5.c: New test.
>>> It's a bit borderline, but it's really just twiddling a cost, so OK.
>>
>> Thanks :) Committed as r279114.
>>
>> Sudi
>>
>>>
>>> jeff
>>>
>>

Re: Fwd: [PATCH, GCC, Vect] Fix costing for vector shifts

2019-12-09 Thread Sudakshina Das

Hi Jeff

On 07/12/2019 17:44, Jeff Law wrote:
> On Fri, 2019-12-06 at 14:05 +0000, Sudakshina Das wrote:
>> Hi
>>
>> While looking at the vectorization for following example, we
>> realized
>> that even though vectorizable_shift function was distinguishing
>> vector
>> shifted by vector from vector shifted by scalar, while modeling the
>> cost
>> it would always add the cost of building a vector constant despite
>> not
>> needing it for vector shifted by scalar.
>>
>> This patch fixes this by using scalar_shift_arg to determine whether
>> we
>> need to build a vector for the second operand or not. This reduces
>> prologue cost as shown in the test.
>>
>> Build and regression tests pass on aarch64-none-elf and
>> x86_64-pc-linux-gnu-gcc. This gives a 3.42% boost to 525.x264_r in
>> Spec2017 for AArch64.
>>
>> gcc/ChangeLog:
>>
>> 2019-xx-xx  Sudakshina Das  
>>  Richard Sandiford  
>>
>>  * tree-vect-stmt.c (vectorizable_shift): Condition ndts for
>>  vect_model_simple_cost call on scalar_shift_arg.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-xx-xx  Sudakshina Das  
>>
>>  * gcc.dg/vect/vect-shift-5.c: New test.
> It's a bit borderline, but it's really just twiddling a cost, so OK.

Thanks :) Committed as r279114.

Sudi

> 
> jeff
>

Fwd: [PATCH, GCC, Vect] Fix costing for vector shifts

2019-12-06 Thread Sudakshina Das

Hi

While looking at the vectorization for following example, we realized 
that even though vectorizable_shift function was distinguishing vector 
shifted by vector from vector shifted by scalar, while modeling the cost 
it would always add the cost of building a vector constant despite not 
needing it for vector shifted by scalar.

This patch fixes this by using scalar_shift_arg to determine whether we 
need to build a vector for the second operand or not. This reduces 
prologue cost as shown in the test.

Build and regression tests pass on aarch64-none-elf and 
x86_64-pc-linux-gnu-gcc. This gives a 3.42% boost to 525.x264_r in 
Spec2017 for AArch64.

gcc/ChangeLog:

2019-xx-xx  Sudakshina Das  
Richard Sandiford  

* tree-vect-stmt.c (vectorizable_shift): Condition ndts for
vect_model_simple_cost call on scalar_shift_arg.

gcc/testsuite/ChangeLog:

2019-xx-xx  Sudakshina Das  

* gcc.dg/vect/vect-shift-5.c: New test.

Is this ok for trunk?

Thanks
Sudi

diff --git a/gcc/testsuite/gcc.dg/vect/vect-shift-5.c b/gcc/testsuite/gcc.dg/vect/vect-shift-5.c
new file mode 100644
index 000..c1fd4f2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-shift-5.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_int } */
+
+typedef unsigned int uint32_t;
+typedef short unsigned int uint16_t;
+
+int foo (uint32_t arr[4][4])
+{
+  int sum = 0;
+  for(int i = 0; i < 4; i++)
+{
+  sum += ((arr[0][i] >> 10) * 20) + ((arr[1][i] >> 11) & 53)
+	 + ((arr[2][i] >> 12) * 7)  + ((arr[3][i] >> 13) ^ 43);
+}
+return (((uint16_t)sum) + ((uint32_t)sum >> 16)) >> 1;
+}
+
+/* { dg-final { scan-tree-dump {vectorizable_shift ===[\n\r][^\n]*prologue_cost = 0} "vect" } } */
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 2cb6b15..396ff15 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -5764,7 +5764,8 @@ vectorizable_shift (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
 {
   STMT_VINFO_TYPE (stmt_info) = shift_vec_info_type;
   DUMP_VECT_SCOPE ("vectorizable_shift");
-  vect_model_simple_cost (stmt_info, ncopies, dt, ndts, slp_node, cost_vec);
+  vect_model_simple_cost (stmt_info, ncopies, dt,
+			  scalar_shift_arg ? 1 : ndts, slp_node, cost_vec);
   return true;
 }

Re: [Patch, GCC] Fix a condition post r278611

2019-12-05 Thread Sudakshina Das

Hi Richard

On 05/12/2019 17:04, Richard Sandiford wrote:
> Sudakshina Das  writes:
>> Hi
>>
>> While looking at vect_model_reduction_cost function, it seems Richard's
>> change in a recent commit r278611 missed an update to the following if
>> condition. Since the check for EXTRACT_LAST_REDUCTION is now split
>> above, the same check in the if condition will never be true.
>>
>> gcc/ChangeLog
>>
>> 2019-xx-xx  Sudakshina Das  
>>
>>  * tree-vect-loop.c (vect_model_reduction_cost): Remove
>>  reduction_type check from if condition.
>>
>> Is this ok for trunk?
> 
> OK, thanks.

Thanks. Committed as r279012.

Sudi

> 
> Richard
> 
>>
>> Thanks
>> Sudi
>>
>> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>> index ca8c818..7469204 100644
>> --- a/gcc/tree-vect-loop.c
>> +++ b/gcc/tree-vect-loop.c
>> @@ -3933,7 +3933,7 @@ vect_model_reduction_cost (stmt_vec_info stmt_info, 
>> internal_fn reduc_fn,
>> /* No extra instructions needed in the prologue.  */
>> prologue_cost = 0;
>>   
>> -  if (reduction_type == EXTRACT_LAST_REDUCTION || reduc_fn != IFN_LAST)
>> +  if (reduc_fn != IFN_LAST)
>>  /* Count one reduction-like operation per vector.  */
>>  inside_cost = record_stmt_cost (cost_vec, ncopies, vec_to_scalar,
>>  stmt_info, 0, vect_body);

[Patch, GCC] Fix a condition post r278611

2019-12-05 Thread Sudakshina Das

Hi

While looking at vect_model_reduction_cost function, it seems Richard's 
change in a recent commit r278611 missed an update to the following if 
condition. Since the check for EXTRACT_LAST_REDUCTION is now split 
above, the same check in the if condition will never be true.

gcc/ChangeLog

2019-xx-xx  Sudakshina Das  

* tree-vect-loop.c (vect_model_reduction_cost): Remove
reduction_type check from if condition.

Is this ok for trunk?

Thanks
Sudi
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index ca8c818..7469204 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -3933,7 +3933,7 @@ vect_model_reduction_cost (stmt_vec_info stmt_info, internal_fn reduc_fn,
   /* No extra instructions needed in the prologue.  */
   prologue_cost = 0;
 
-  if (reduction_type == EXTRACT_LAST_REDUCTION || reduc_fn != IFN_LAST)
+  if (reduc_fn != IFN_LAST)
 	/* Count one reduction-like operation per vector.  */
 	inside_cost = record_stmt_cost (cost_vec, ncopies, vec_to_scalar,
 	stmt_info, 0, vect_body);

[Committed][Arm][testsuite] Fix failure for arm-fp16-ops-*.C

2019-12-02 Thread Sudakshina Das

Hi

Since r275022 which deprecates some uses of volatile, we have seen the 
following failures on arm-none-eabi and arm-none-linux-gnueabihf
FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-1.C  -std=gnu++2a (test for 
excess errors)
FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-2.C  -std=gnu++2a (test for 
excess errors)
FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-3.C  -std=gnu++2a (test for 
excess errors)
FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-4.C  -std=gnu++2a (test for 
excess errors)
FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-5.C  -std=gnu++2a (test for 
excess errors)
FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-6.C  -std=gnu++2a (test for 
excess errors)
FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-7.C  -std=gnu++2a (test for 
excess errors)
FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-8.C  -std=gnu++2a (test for 
excess errors)
Which catches the deprecated uses of volatile variables declared in 
arm-fp16-ops.h.

This patch removes the volatile declarations from the header. Since none 
of the tests are run with any high optimization levels, this should 
change should not prevent the real function of the tests.

Tests with RUNTESTFLAGS="dg.exp=arm-fp16-ops-*.C" now pass with the 
patch on arm-none-eabi.
Committed as obvious r278905

gcc/testsuite/ChangeLog:

2019-xx-xx  Sudakshina Das  

* g++.dg/ext/arm-fp16/arm-fp16-ops.h: Remove volatile keyword.

Thanks
Sudi
diff --git a/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops.h b/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops.h
index 320494e..a92e081 100644
--- a/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops.h
+++ b/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops.h
@@ -7,16 +7,16 @@
 #define TEST(e) assert (e)
 #define TESTNOT(e) assert (!(e))
 
-volatile __fp16 h0 = 0.0;
-volatile __fp16 h1 = 1.0;
-volatile __fp16 h42 = 42.0;
-volatile __fp16 hm2 = -2.0;
-volatile __fp16 temp;
-
-volatile float f0 = 0.0;
-volatile float f1 = 1.0;
-volatile float f42 = 42.0;
-volatile float fm2 = -2.0;
+__fp16 h0 = 0.0;
+__fp16 h1 = 1.0;
+__fp16 h42 = 42.0;
+__fp16 hm2 = -2.0;
+__fp16 temp;
+
+float f0 = 0.0;
+float f1 = 1.0;
+float f42 = 42.0;
+float fm2 = -2.0;
 
 int main (void)
 {

Re: [PATCH, GCC, AArch64] Fix PR88398 for AArch64

2019-11-15 Thread Sudakshina Das

Hi Richard

I apologise I should have given more explanation on my cover letter.
Although the bug was filed for vectorization, the conversation on it talked
about loops with two exits not being supported in the vectorizer and being not
being possible without lto and peeling causing more harm than benefit.

There was also no clear consensus among the discussion about the best way to do
unrolling. So I looked at Wilco's suggestion of unrolling here
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398#c8
Although unroll_stupid does not exactly unroll it as he shows but
it gets closer than unroll_runtime_iterations. So I ran an experiment
to see if unrolliong the loop with unroll_stupid gets any benefit. The code
size benefit was easy to see with the small example but it also gave 
performance
benefit on Spec2017. The benefit comes because unroll_runtime_iteration adds
a switch case at the beginning for iteration check. This is less efficient
because it creates too many branch close together specially for a loop which
has more than 1 exit.
        beq     .L70
        cmp     x12, 1
        beq     .L55
        cmp     x12, 2
        beq     .L57
        cmp     x12, 3
        beq     .L59
        cmp     x12, 4
        beq     .L61
        cmp     x12, 5
        beq     .L63
        cmp     x12, 6
        bne     .L72

Finally I agree that unroll_stupid by default did not touch loops with multiple
exists but that was marked as a "TODO" to change later so I assumed that check
was not a hard requirement for the unrolling alghorithm.
/* Do not unroll loops with branches inside -- it increases number
     of mispredicts. 
     TODO: this heuristic needs tunning; call inside the loop body
     is also relatively good reason to not unroll.  */
unroll_stupid is also not touched unless there is -funroll-all-loops or a loop
pragma incidcating that maybe this could be potentially harmful on certain 
targets.
Since my experiments on AArch64 showed otherwise, I thought the easiest 
starting
point would be to do this in a target hook and only for a specific case
(multiple exits).

Thanks
Sudi

From: Richard Biener 

Sent: Friday, November 15, 2019 9:32 AM

To: Sudakshina Das 

Cc: gcc-patches@gcc.gnu.org ; Kyrill Tkachov 
; James Greenhalgh ; 
Richard Earnshaw ; bin.ch...@linux.alibaba.com 
;
 o...@ucw.cz 

Subject: Re: [PATCH, GCC, AArch64] Fix PR88398 for AArch64

On Thu, Nov 14, 2019 at 4:41 PM Sudakshina Das  wrote:

>

> Hi

>

> This patch is trying to fix PR88398 for AArch64. As discussed in the PR,

> loop unrolling is probably what we can do here. As an easy fix, the

> existing unroll_stupid is unrolling the given example better than the

> unroll_runtime_iterations since the the loop contains a break inside it.

Hm, the bug reference doesn't help me at all in reviewing this - the bug

is about vectorization.

So why is unroll_stupid better than unroll_runtime_iterations for a loop

with a break (or as your implementation, with multiple exists)?

I don't like this target hook, it seems like general heuristics can be

improved here, but it seems unroll-stupid doesn't consider loops

with multiple exits at all?

Richard.

> So all I have done here is:

> 1) Add a target hook so that this is AArch64 specific.

> 2) We are not unrolling the loops that decide_unroll_runtime_iterations

> would reject.

> 3) Out of the ones that decide_unroll_runtime_iterations would accept,

> check if the loop has more than 1 exit (this is done in the new target

> hook) and if it does, try to unroll using unroll_stupid.

>

> Regression tested on AArch64 and added the test from the PR. This gives

> an overall code size reduction of 2.35% and performance gain of 0.498%

> on Spec2017 Intrate.

>

> Is this ok for trunk?

>

> Thanks

> Sudi

>

> gcc/ChangeLog:

>

> 2019-xx-xx  Sudakshina Das  

>

> PR88398

> * cfgloop.h: Include target.h.

> (lpt_dec): Move to...

> * target.h (lpt_dec): ... Here.

> * target.def: Define TARGET_LOOP_DECISION_ADJUST.

> * loop-unroll.c (decide_unroll_runtime_iterations): Use new target 
> hook.

> (decide_unroll_stupid): Likewise.

> * config/aarch64/aarch64.c (aarch64_loop_decision_adjust): New 
> function.

> (TARGET_LOOP_DECISION_ADJUST): Define for AArch64.

> * doc/tm.texi: Regenerated.

> * doc/tm.texi.in: Document TARGET_LOOP_DECISION_ADJUST.

>

> gcc/testsuite/ChangeLog:

>

> 2019-xx-xx  Sudakshina Das  

>

> PR88398

> * gcc.target/aarch64/pr88398.c: New test.

[PATCH, GCC, AArch64] Fix PR88398 for AArch64

2019-11-14 Thread Sudakshina Das

Hi

This patch is trying to fix PR88398 for AArch64. As discussed in the PR, 
loop unrolling is probably what we can do here. As an easy fix, the 
existing unroll_stupid is unrolling the given example better than the 
unroll_runtime_iterations since the the loop contains a break inside it.

So all I have done here is:
1) Add a target hook so that this is AArch64 specific.
2) We are not unrolling the loops that decide_unroll_runtime_iterations
would reject.
3) Out of the ones that decide_unroll_runtime_iterations would accept, 
check if the loop has more than 1 exit (this is done in the new target 
hook) and if it does, try to unroll using unroll_stupid.

Regression tested on AArch64 and added the test from the PR. This gives 
an overall code size reduction of 2.35% and performance gain of 0.498% 
on Spec2017 Intrate.

Is this ok for trunk?

Thanks
Sudi

gcc/ChangeLog:

2019-xx-xx  Sudakshina Das  

PR88398
* cfgloop.h: Include target.h.
(lpt_dec): Move to...
* target.h (lpt_dec): ... Here.
* target.def: Define TARGET_LOOP_DECISION_ADJUST.
* loop-unroll.c (decide_unroll_runtime_iterations): Use new target hook.
(decide_unroll_stupid): Likewise.
* config/aarch64/aarch64.c (aarch64_loop_decision_adjust): New function.
(TARGET_LOOP_DECISION_ADJUST): Define for AArch64.
* doc/tm.texi: Regenerated.
* doc/tm.texi.in: Document TARGET_LOOP_DECISION_ADJUST.

gcc/testsuite/ChangeLog:

2019-xx-xx  Sudakshina Das  

PR88398
* gcc.target/aarch64/pr88398.c: New test.
diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 0b0154ffd7bf031a005de993b101d9db6dd98c43..985c74e3b60728fc8c9d34b69634488cae3451cb 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -21,15 +21,7 @@ along with GCC; see the file COPYING3.  If not see
 #define GCC_CFGLOOP_H
 
 #include "cfgloopmanip.h"
-
-/* Structure to hold decision about unrolling/peeling.  */
-enum lpt_dec
-{
-  LPT_NONE,
-  LPT_UNROLL_CONSTANT,
-  LPT_UNROLL_RUNTIME,
-  LPT_UNROLL_STUPID
-};
+#include "target.h"
 
 struct GTY (()) lpt_decision {
   enum lpt_dec decision;
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 599d07a729e7438080f8b5240ee95037a49fb983..f31ac41d66257c01ead8d5f5b9b22379ecb5d276 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -21093,6 +21093,39 @@ aarch64_sched_can_speculate_insn (rtx_insn *insn)
 }
 }
 
+/* Implement TARGET_LOOP_DECISION_ADJUST.  CONSIDER is the loop decision
+   currently being checked for loop LOOP.  This returns a decision which could
+   either be LPT_UNROLL_STUPID or the current value in LOOP.  */
+static enum lpt_dec
+aarch64_loop_decision_adjust (enum lpt_dec consider, class loop *loop)
+{
+  switch (consider)
+{
+case LPT_UNROLL_CONSTANT:
+  return loop->lpt_decision.decision;
+
+case LPT_UNROLL_RUNTIME:
+/* Fall through.  */
+case LPT_UNROLL_STUPID:
+  {
+	vec edges = get_loop_exit_edges (loop);
+	if (edges.length () > 1)
+	  {
+	if (dump_file)
+	  fprintf (dump_file, ";; Need change in loop decision\n");
+	consider = LPT_UNROLL_STUPID;
+	return consider;
+	  }
+	return loop->lpt_decision.decision;
+  }
+
+case LPT_NONE:
+/* Fall through.  */
+default:
+  gcc_unreachable ();
+}
+}
+
 /* Implement TARGET_COMPUTE_PRESSURE_CLASSES.  */
 
 static int
@@ -21839,6 +21872,9 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_CAN_USE_DOLOOP_P
 #define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost
 
+#undef TARGET_LOOP_DECISION_ADJUST
+#define TARGET_LOOP_DECISION_ADJUST aarch64_loop_decision_adjust
+
 #undef TARGET_SCHED_ADJUST_PRIORITY
 #define TARGET_SCHED_ADJUST_PRIORITY aarch64_sched_adjust_priority
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index cd9aed9874f4e6b2b0e2f8956ed6155975e643a8..61bd00e84c8a2a8865e95ba579c3b94790ab1331 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11857,6 +11857,15 @@ is required only when the target has special constraints like maximum
 number of memory accesses.
 @end deftypefn
 
+@deftypefn {Target Hook} {enum lpt_dec} TARGET_LOOP_DECISION_ADJUST (enum lpt_dec @var{consider}, class loop *@var{loop})
+This target hook returns either a new value for the loop unrolling
+decision or the existing value in @var{loop}. The parameter @var{consider}
+is the loop decision currently being tested. The parameter @var{loop} is a
+pointer to the loop, which is going to be checked for unrolling. This target
+hook is required only when the target wants to override the unrolling
+decisions.
+@end deftypefn
+
 @defmac POWI_MAX_MULTS
 If defined, this macro is interpreted as a signed integer C expression
 that specifies the maximum number of floating point multiplications
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 2739e9ceec5ad7253ff9135da8dbe3bf6010e8d7..7a7f917fb45a6cc22f373ff16f8b78aa3e35f210 100644
--- a/gcc/

Re: [PATCH, GCC] Fix unrolling check.

2019-11-11 Thread Sudakshina Das

On 11/11/2019 14:50, Eric Botcazou wrote:
>> Thanks for the explanation. However, I do not understand why are we
>> returning with the default value.
> 
> The regression you reported should be clear enough though: if we don't do
> that, we will unroll in cases where we would not have before.  Try with a
> compiler that predates the pragma and compare, there should be no changes.
> 
>> What "do we always do"?
> 
> What we do in the absence of specific unrolling directives for the loop.

Yeah fair enough! Sorry for the trouble.

Sudi
>

Re: [PATCH, GCC] Fix unrolling check.

2019-11-11 Thread Sudakshina Das

Hi Eric

On 08/11/2019 19:16, Eric Botcazou wrote:
>> I was fiddling around with the loop unrolling pass and noticed a check
>> in decide_unroll_* functions (in the patch). The comment on top of this
>> check says
>> "/* If we were not asked to unroll this loop, just return back silently.
>>*/"
>> However the check returns when loop->unroll == 0 rather than 1.
>>
>> The check was added in r255106 where the ChangeLog suggests that the
>> actual intention was probably to check the value 1 and not 0.
> 
> No, this is intended, 0 is the default value of the field, not 1.  And note
> that decide_unroll_constant_iterations, decide_unroll_runtime_iterations and
> decide_unroll_stupid *cannot* be called with loop->unroll == 1 because of this
> check in decide_unrolling:

Thanks for the explanation. However, I do not understand why are we 
returning with the default value. The comment for "unroll" is a bit 
ambiguous for value 0.

   /* The number of times to unroll the loop.  0 means no information given,
  just do what we always do.  A value of 1 means do not unroll the loop.
  A value of USHRT_MAX means unroll with no specific unrolling factor.
  Other values means unroll with the given unrolling factor.  */
   unsigned short unroll;

What "do we always do"?

Thanks
Sudi

> 
>if (loop->unroll == 1)
>   {
> if (dump_file)
>   fprintf (dump_file,
>";; Not unrolling loop, user didn't want it unrolled\n");
> continue;
>   }
> 
>> Tested on aarch64-none-elf with one new regression:
>> FAIL: gcc.dg/pr40209.c (test for excess errors)
>> This fails because the changes cause the loop to unroll 3 times using
>> unroll_stupid and that shows up as excess error due -fopt-info. This
>> option was added in r202077 but I am not sure why this particular test
>> was chosen for it.
> 
> That's a regression, there should be no unrolling.
>

[PATCH, GCC] Fix unrolling check.

2019-11-08 Thread Sudakshina Das

Hi

I was fiddling around with the loop unrolling pass and noticed a check 
in decide_unroll_* functions (in the patch). The comment on top of this 
check says
"/* If we were not asked to unroll this loop, just return back silently. 
  */"
However the check returns when loop->unroll == 0 rather than 1.

The check was added in r255106 where the ChangeLog suggests that the 
actual intention was probably to check the value 1 and not 0.

Tested on aarch64-none-elf with one new regression:
FAIL: gcc.dg/pr40209.c (test for excess errors)
This fails because the changes cause the loop to unroll 3 times using 
unroll_stupid and that shows up as excess error due -fopt-info. This 
option was added in r202077 but I am not sure why this particular test 
was chosen for it.

Does this change look ok? Can I just remove the -fopt-info from the test 
or unrolling the loop in the test is not desirable?

Thanks
Sudi

gcc/ChangeLog:

2019-11-07  Sudakshina Das  

* loop-unroll.c (decide_unroll_constant_iterations): Update condition 
to check
loop->unroll.
(decide_unroll_runtime_iterations): Likewise.
(decide_unroll_stupid): Likewise.
diff --git a/gcc/loop-unroll.c b/gcc/loop-unroll.c
index 63fccd23fae38f8918a7d94411aaa43c72830dd3..9f7ab4b5c1c9b2333148e452b84afbf040707456 100644
--- a/gcc/loop-unroll.c
+++ b/gcc/loop-unroll.c
@@ -354,7 +354,7 @@ decide_unroll_constant_iterations (class loop *loop, int flags)
   widest_int iterations;
 
   /* If we were not asked to unroll this loop, just return back silently.  */
-  if (!(flags & UAP_UNROLL) && !loop->unroll)
+  if (!(flags & UAP_UNROLL) && loop->unroll == 1)
 return;
 
   if (dump_enabled_p ())
@@ -674,7 +674,7 @@ decide_unroll_runtime_iterations (class loop *loop, int flags)
   widest_int iterations;
 
   /* If we were not asked to unroll this loop, just return back silently.  */
-  if (!(flags & UAP_UNROLL) && !loop->unroll)
+  if (!(flags & UAP_UNROLL) && loop->unroll == 1)
 return;
 
   if (dump_enabled_p ())
@@ -1159,7 +1159,7 @@ decide_unroll_stupid (class loop *loop, int flags)
   widest_int iterations;
 
   /* If we were not asked to unroll this loop, just return back silently.  */
-  if (!(flags & UAP_UNROLL_ALL) && !loop->unroll)
+  if (!(flags & UAP_UNROLL_ALL) && loop->unroll == 1)
 return;
 
   if (dump_enabled_p ())

[PATCH, GCC, AArch64] Enable Transactional Memory Extension

2019-07-10 Thread Sudakshina Das

Hi

This patch enables the new Transactional Memory Extension announced 
recently as part of Arm's new architecture technologies.
We introduce a new optional extension "tme" to enable this. The 
following instructions are part of the extension:
* tstart 
* ttest 
* tcommit
* tcancel #
The documentation for the above can be found here:
https://developer.arm.com/docs/ddi0602/latest/base-instructions-alphabetic-order

We have also added ACLE intrinsics for the instructions above according to:
https://developer.arm.com/docs/101028/latest/transactional-memory-extension-tme-intrinsics

Builds and regression tested on aarch64-none-linux-gnu and added new 
tests for the new instructions.

Is this okay for trunk?

Thanks
Sudi

*** gcc/ChangeLog ***

2019-xx-xx  Sudakshina Das  

* config/aarch64/aarch64-builtins.c (enum aarch64_builtins): Add
AARCH64_TME_BUILTIN_TSTART, AARCH64_TME_BUILTIN_TCOMMIT,
AARCH64_TME_BUILTIN_TTEST and AARCH64_TME_BUILTIN_TCANCEL.
(aarch64_init_tme_builtins): New.
(aarch64_init_builtins): Call aarch64_init_tme_builtins.
(aarch64_expand_builtin_tme): New.
(aarch64_expand_builtin): Handle TME builtins.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
__ARM_FEATURE_TME when enabled.
* config/aarch64/aarch64-option-extensions.def: Add "tme".
* config/aarch64/aarch64.h (AARCH64_FL_TME, AARCH64_ISA_TME): New.
(TARGET_TME): New.
* config/aarch64/aarch64.md (define_c_enum "unspec"): Add UNSPEC_TTEST.
(define_c_enum "unspecv"): Add UNSPECV_TSTART, UNSPECV_TCOMMIT and
UNSPECV_TCANCEL.
(tstart, ttest, tcommit, tcancel): New instructions.
* config/aarch64/arm_acle.h (__tstart, __tcommit): New.
(__tcancel, __ttest): New.
(_TMFAILURE_REASON, _TMFAILURE_RTRY, _TMFAILURE_CNCL): New macro.
(_TMFAILURE_MEM, _TMFAILURE_IMP, _TMFAILURE_ERR): Likewise.
(_TMFAILURE_SIZE, _TMFAILURE_NEST, _TMFAILURE_DBG): Likewise.
(_TMFAILURE_INT, _TMFAILURE_TRIVIAL): Likewise.
* config/arm/types.md: Add new tme type attr.
* doc/invoke.texi: Document "tme".

*** gcc/testsuite/ChangeLog ***

2019-xx-xx  Sudakshina Das  

* gcc.target/aarch64/acle/tme.c: New test.
* gcc.target/aarch64/pragma_cpp_predefs_2.c: New test.
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 549a6c249243372eacb5d29923b5d1abce4ac79a..16c1d42ea2be0f477692be592e30ba8ce27f05a7 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -438,6 +438,11 @@ enum aarch64_builtins
   /* Special cased Armv8.3-A Complex FMA by Lane quad Builtins.  */
   AARCH64_SIMD_FCMLA_LANEQ_BUILTIN_BASE,
   AARCH64_SIMD_FCMLA_LANEQ_BUILTINS
+  /* TME builtins.  */
+  AARCH64_TME_BUILTIN_TSTART,
+  AARCH64_TME_BUILTIN_TCOMMIT,
+  AARCH64_TME_BUILTIN_TTEST,
+  AARCH64_TME_BUILTIN_TCANCEL,
   AARCH64_BUILTIN_MAX
 };
 
@@ -1067,6 +1072,35 @@ aarch64_init_pauth_hint_builtins (void)
 			NULL_TREE);
 }
 
+/* Initialize the transactional memory extension (TME) builtins.  */
+static void
+aarch64_init_tme_builtins (void)
+{
+  tree ftype_uint64_void
+= build_function_type_list (uint64_type_node, NULL);
+  tree ftype_void_void
+= build_function_type_list (void_type_node, NULL);
+  tree ftype_void_uint64
+= build_function_type_list (void_type_node, uint64_type_node, NULL);
+
+  aarch64_builtin_decls[AARCH64_TME_BUILTIN_TSTART]
+= add_builtin_function ("__builtin_aarch64_tstart", ftype_uint64_void,
+			AARCH64_TME_BUILTIN_TSTART, BUILT_IN_MD,
+			NULL, NULL_TREE);
+  aarch64_builtin_decls[AARCH64_TME_BUILTIN_TTEST]
+= add_builtin_function ("__builtin_aarch64_ttest", ftype_uint64_void,
+			AARCH64_TME_BUILTIN_TTEST, BUILT_IN_MD,
+			NULL, NULL_TREE);
+  aarch64_builtin_decls[AARCH64_TME_BUILTIN_TCOMMIT]
+= add_builtin_function ("__builtin_aarch64_tcommit", ftype_void_void,
+			AARCH64_TME_BUILTIN_TCOMMIT, BUILT_IN_MD,
+			NULL, NULL_TREE);
+  aarch64_builtin_decls[AARCH64_TME_BUILTIN_TCANCEL]
+= add_builtin_function ("__builtin_aarch64_tcancel", ftype_void_uint64,
+			AARCH64_TME_BUILTIN_TCANCEL, BUILT_IN_MD,
+			NULL, NULL_TREE);
+}
+
 void
 aarch64_init_builtins (void)
 {
@@ -1104,6 +1138,9 @@ aarch64_init_builtins (void)
  register them.  */
   if (!TARGET_ILP32)
 aarch64_init_pauth_hint_builtins ();
+
+  if (TARGET_TME)
+aarch64_init_tme_builtins ();
 }
 
 tree
@@ -1507,6 +1544,47 @@ aarch64_expand_fcmla_builtin (tree exp, rtx target, int fcode)
   return target;
 }
 
+/* Function to expand an expression EXP which calls one of the Transactional
+   Memory Extension (TME) builtins FCODE with the result going to TARGET.  */
+static rtx
+aarch64_expand_builtin_tme (int fcode, tree exp, rtx target)
+{

Re: [PATCH][AArch64] Make use of FADDP in simple reductions

2019-05-30 Thread Sudakshina Das

Hi Elen

Thank you for doing this. You will need a maintainer's approval but I 
would like to add a couple of comments. Please find them inline.

On 08/05/2019 14:36, Elen Kalda wrote:
> Hi,
> 
> This patch adds a pattern to support the FADDP (scalar) instruction.
> 
> Before the patch, the C code
> 
> typedef double v2df __attribute__((vector_size (16)));
> 
> double
> foo (v2df x)
> {
>return x[1] + x[0];
> }
> 
> generated:
> foo:
>  dup d1, v0.d[0]
>  dup d0, v0.d[1]
>  faddd0, d1, d0
>  ret
> 
> After patch:
> foo:
>   faddp   d0, v0.2d
>   ret
> 
> 
> Bootstrapped and done regression tests on aarch64-none-linux-gnu -
> no issues found.
> 
> Best wishes,
> Elen
> 
> 
> gcc/ChangeLog:
> 
> 2019-04-24  Elen Kalda  
> 
>   * config/aarch64/aarch64-simd.md (*aarch64_faddp): New.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-04-24  Elen Kalda  
> 
>   * gcc.target/aarch64/simd/scalar_faddp.c: New test.
> 

 > diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
 > index 
e3852c5d182b70978d7603225fce55c0b8ee2894..89fedc6cb3f0c6eb74c6f8d0b21cedb5ae20a095
 
100644
 > --- a/gcc/config/aarch64/aarch64-simd.md
 > +++ b/gcc/config/aarch64/aarch64-simd.md
 > @@ -2372,6 +2372,21 @@
 >[(set_attr "type" "neon_fp_reduc_add_")]
 >  )
 >
 > +(define_insn "*aarch64_faddp"
 > +  [(set (match_operand: 0 "register_operand" "=w")
 > +(plus:
 > +  (vec_select: (match_operand:VHSDF 1 "register_operand" "w")

I do not think the VHSDF mode should be used here. I believe you may 
have taken this from the vector form of this instruction but that seems 
to be different than the scalar one. Someone with more floating point 
instruction experience can chime in here.

 > +(parallel[(match_operand 2 "const_int_operand" "n")]))
 > +  (vec_select: (match_dup:VHSDF 1)
 > +(parallel[(match_operand 3 "const_int_operand" "n")]]
 > +  "TARGET_SIMD
 > +  && ((INTVAL (operands[2]) == 0 && INTVAL (operands[3]) == 1)

Just some minor indentation issue. The && should be below T

 > +|| (INTVAL (operands[2]) == 1 && INTVAL (operands[3]) == 0))"

Likewise this should be below the second opening brace '('

...

 > --- /dev/null
 > +++ b/gcc/testsuite/gcc.target/aarch64/simd/scalar_faddp.c
 > @@ -0,0 +1,31 @@
 > +/* { dg-do assemble } */

This can be dg-do compile since you only want an assembly file

 > +/* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok } */
 > +/* { dg-add-options arm_v8_2a_fp16_scalar } */
 > +/* { dg-additional-options "-save-temps -O1" } */

The --save-temps can then be removed as the dg-do compile will produce 
the .s file for you

 > +/* { dg-final { scan-assembler-not "dup" } } */
...


Thanks
Sudi

RE: [PATCH, GCC, AARCH64] Add GNU note section with BTI and PAC.

2019-04-23 Thread Sudakshina Das

Hi James

-Original Message-
From: James Greenhalgh  
Sent: 18 April 2019 09:56
To: Sudakshina Das 
Cc: Richard Henderson ; H.J. Lu 
; Richard Henderson ; 
gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw 
; Marcus Shawcroft ; 
ni...@redhat.com
Subject: Re: [PATCH, GCC, AARCH64] Add GNU note section with BTI and PAC.

On Thu, Apr 04, 2019 at 05:01:06PM +0100, Sudakshina Das wrote:
> Hi Richard
> 
> On 03/04/2019 11:28, Richard Henderson wrote:
> > On 4/3/19 5:19 PM, Sudakshina Das wrote:
> >> +  /* PT_NOTE header: namesz, descsz, type.
> >> +   namesz = 4 ("GNU\0")
> >> +   descsz = 16 (Size of the program property array)
> >> +   type   = 5 (NT_GNU_PROPERTY_TYPE_0).  */
> >> +  assemble_align (POINTER_SIZE);
> >> +  assemble_integer (GEN_INT (4), 4, 32, 1);
> >> +  assemble_integer (GEN_INT (16), 4, 32, 1);
> > 
> > So, it's 16 only if POINTER_SIZE == 64.
> > 
> > I think ROUND_UP (12, POINTER_BYTES) is what you want here.
> >
> 
> 
> Ah yes. I have made that change now.

This is OK, but instead of:

> diff --git a/gcc/testsuite/gcc.target/aarch64/va_arg_1.c 
> b/gcc/testsuite/gcc.target/aarch64/va_arg_1.c
> index 
> e8e3cdac51350b545e5c2a644a3e1f4d1c37f88d..1fe92ff08935d4c6f08affcbd77e
> a91537030640 100644
> --- a/gcc/testsuite/gcc.target/aarch64/va_arg_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/va_arg_1.c
> @@ -4,7 +4,9 @@
>  int
>  f (int a, ...)
>  {
> -  /* { dg-final { scan-assembler-not "str" } } */
> +  /* Fails on aarch64*-*-linux* if configured with
> +--enable-standard-branch-protection because of the GNU NOTE 
> + section.  */
> +  /* { dg-final { scan-assembler-not "str" { target { ! 
> + aarch64*-*-linux* } || { ! default_branch_protection } } } } */
>return a;
>  }

> Can you just change the regex to check for str followed by a tab, or 
> something that looks else which looks like the instruction and doesn't match 
> against 'string'.

>Thanks,
>James

Ah yes, I have reduced the diff in this test to only update the scan directive 
to look for 'str\t' instead.
Committed as r270515.

Thanks
Sudi

> 
> Thanks
> Sudi
> 
> > 
> > r~
> > 
>

Re: [PATCH, GCC, AARCH64] Add GNU note section with BTI and PAC.

2019-04-12 Thread Sudakshina Das

Ping.

On 04/04/2019 17:01, Sudakshina Das wrote:
> Hi Richard
> 
> On 03/04/2019 11:28, Richard Henderson wrote:
>> On 4/3/19 5:19 PM, Sudakshina Das wrote:
>>> +  /* PT_NOTE header: namesz, descsz, type.
>>> + namesz = 4 ("GNU\0")
>>> + descsz = 16 (Size of the program property array)
>>> + type   = 5 (NT_GNU_PROPERTY_TYPE_0).  */
>>> +  assemble_align (POINTER_SIZE);
>>> +  assemble_integer (GEN_INT (4), 4, 32, 1);
>>> +  assemble_integer (GEN_INT (16), 4, 32, 1);
>>
>> So, it's 16 only if POINTER_SIZE == 64.
>>
>> I think ROUND_UP (12, POINTER_BYTES) is what you want here.
>>
> 
> 
> Ah yes. I have made that change now.
> 
> Thanks
> Sudi
> 
>>
>> r~
>>
>

Re: [PATCH, GCC, AARCH64] Add GNU note section with BTI and PAC.

2019-04-04 Thread Sudakshina Das

Hi Richard

On 03/04/2019 11:28, Richard Henderson wrote:
> On 4/3/19 5:19 PM, Sudakshina Das wrote:
>> +  /* PT_NOTE header: namesz, descsz, type.
>> + namesz = 4 ("GNU\0")
>> + descsz = 16 (Size of the program property array)
>> + type   = 5 (NT_GNU_PROPERTY_TYPE_0).  */
>> +  assemble_align (POINTER_SIZE);
>> +  assemble_integer (GEN_INT (4), 4, 32, 1);
>> +  assemble_integer (GEN_INT (16), 4, 32, 1);
> 
> So, it's 16 only if POINTER_SIZE == 64.
> 
> I think ROUND_UP (12, POINTER_BYTES) is what you want here.
>


Ah yes. I have made that change now.

Thanks
Sudi

> 
> r~
> 

diff --git a/gcc/config/aarch64/aarch64-linux.h b/gcc/config/aarch64/aarch64-linux.h
index 9d0292d64f20939ccedd7ab56027aa1282826b23..5e8b34ded03c78493f868e38647bf57c2da5187c 100644
--- a/gcc/config/aarch64/aarch64-linux.h
+++ b/gcc/config/aarch64/aarch64-linux.h
@@ -83,7 +83,7 @@
 
 #define GNU_USER_TARGET_D_CRITSEC_SIZE 48
 
-#define TARGET_ASM_FILE_END file_end_indicate_exec_stack
+#define TARGET_ASM_FILE_END aarch64_file_end_indicate_exec_stack
 
 /* Uninitialized common symbols in non-PIE executables, even with
strong definitions in dependent shared libraries, will resolve
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b38505b0872688634b2d3f625ab8d313e89cfca0..83b8ef84808c19fa1214fa06c32957936f5eb520 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -18744,6 +18744,57 @@ aarch64_stack_protect_guard (void)
   return NULL_TREE;
 }
 
+/* Implement TARGET_ASM_FILE_END for AArch64.  This adds the AArch64 GNU NOTE
+   section at the end if needed.  */
+#define GNU_PROPERTY_AARCH64_FEATURE_1_AND	0xc000
+#define GNU_PROPERTY_AARCH64_FEATURE_1_BTI	(1U << 0)
+#define GNU_PROPERTY_AARCH64_FEATURE_1_PAC	(1U << 1)
+void
+aarch64_file_end_indicate_exec_stack ()
+{
+  file_end_indicate_exec_stack ();
+
+  unsigned feature_1_and = 0;
+  if (aarch64_bti_enabled ())
+feature_1_and |= GNU_PROPERTY_AARCH64_FEATURE_1_BTI;
+
+  if (aarch64_ra_sign_scope != AARCH64_FUNCTION_NONE)
+feature_1_and |= GNU_PROPERTY_AARCH64_FEATURE_1_PAC;
+
+  if (feature_1_and)
+{
+  /* Generate .note.gnu.property section.  */
+  switch_to_section (get_section (".note.gnu.property",
+  SECTION_NOTYPE, NULL));
+
+  /* PT_NOTE header: namesz, descsz, type.
+	 namesz = 4 ("GNU\0")
+	 descsz = 16 (Size of the program property array)
+		  [(12 + padding) * Number of array elements]
+	 type   = 5 (NT_GNU_PROPERTY_TYPE_0).  */
+  assemble_align (POINTER_SIZE);
+  assemble_integer (GEN_INT (4), 4, 32, 1);
+  assemble_integer (GEN_INT (ROUND_UP (12, POINTER_BYTES)), 4, 32, 1);
+  assemble_integer (GEN_INT (5), 4, 32, 1);
+
+  /* PT_NOTE name.  */
+  assemble_string ("GNU", 4);
+
+  /* PT_NOTE contents for NT_GNU_PROPERTY_TYPE_0:
+	 type   = GNU_PROPERTY_AARCH64_FEATURE_1_AND
+	 datasz = 4
+	 data   = feature_1_and.  */
+  assemble_integer (GEN_INT (GNU_PROPERTY_AARCH64_FEATURE_1_AND), 4, 32, 1);
+  assemble_integer (GEN_INT (4), 4, 32, 1);
+  assemble_integer (GEN_INT (feature_1_and), 4, 32, 1);
+
+  /* Pad the size of the note to the required alignment.  */
+  assemble_align (POINTER_SIZE);
+}
+}
+#undef GNU_PROPERTY_AARCH64_FEATURE_1_PAC
+#undef GNU_PROPERTY_AARCH64_FEATURE_1_BTI
+#undef GNU_PROPERTY_AARCH64_FEATURE_1_AND
 
 /* Target-specific selftests.  */
 
diff --git a/gcc/testsuite/gcc.target/aarch64/bti-1.c b/gcc/testsuite/gcc.target/aarch64/bti-1.c
index a8c60412e310a4f322372f334ae5314f426d310e..5a556b08ed15679b25676a11fe9c7a64641ee671 100644
--- a/gcc/testsuite/gcc.target/aarch64/bti-1.c
+++ b/gcc/testsuite/gcc.target/aarch64/bti-1.c
@@ -61,3 +61,4 @@ lab2:
 }
 /* { dg-final { scan-assembler-times "hint\t34" 1 } } */
 /* { dg-final { scan-assembler-times "hint\t36" 12 } } */
+/* { dg-final { scan-assembler ".note.gnu.property" { target *-*-linux* } } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/va_arg_1.c b/gcc/testsuite/gcc.target/aarch64/va_arg_1.c
index e8e3cdac51350b545e5c2a644a3e1f4d1c37f88d..1fe92ff08935d4c6f08affcbd77ea91537030640 100644
--- a/gcc/testsuite/gcc.target/aarch64/va_arg_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/va_arg_1.c
@@ -4,7 +4,9 @@
 int
 f (int a, ...)
 {
-  /* { dg-final { scan-assembler-not "str" } } */
+  /* Fails on aarch64*-*-linux* if configured with
+--enable-standard-branch-protection because of the GNU NOTE section.  */
+  /* { dg-final { scan-assembler-not "str" { target { ! aarch64*-*-linux* } || { ! default_branch_protection } } } } */
   return a;
 }

Re: [PATCH, GCC, AARCH64] Add GNU note section with BTI and PAC.

2019-04-03 Thread Sudakshina Das

Hi Richard

On 02/04/2019 10:25, Sudakshina Das wrote:
> Hi
> 
> On 02/04/2019 03:27, H.J. Lu wrote:
>> On Tue, Apr 2, 2019 at 10:05 AM Richard Henderson  
>> wrote:
>>>
>>> On 4/1/19 8:53 PM, Sudakshina Das wrote:
>>>>> This could stand to use a comment, a moment's thinking about the 
>>>>> sizes, and to
>>>>> use the existing asm output functions.
>>>>>
>>>>>   /* PT_NOTE header: namesz, descsz, type.
>>>>>  namesz = 4 ("GNU\0")
>>>>>  descsz = 12 (see below)
>>>> I was trying out these changes but the descsz of 12 gets rejected by
>>>> readelf. It hits the following
>>>>
>>>>     unsigned int    size = is_32bit_elf ? 4 : 8;
>>>>
>>>>     printf (_("  Properties: "));
>>>>
>>>>     if (pnote->descsz < 8 || (pnote->descsz % size) != 0)
>>>>   {
>>>>     printf (_("\n"),
>>>> pnote->descsz);
>>>>     return;
>>>>   }
>>>
>>> Hmm, interesting.  The docs say that padding is not to be included in 
>>> descsz
>>> (gabi4.1, page 82).  To my eye this is a bug in binutils, but perhaps 
>>> we will
>>> have to live with it.
>>>
>>> Nick, thoughts?
>>
>> descsz is wrong.  From:
>>
>> https://github.com/hjl-tools/linux-abi/wiki/Linux-Extensions-to-gABI
>>
>> n_desc The note descriptor. The first n_descsz bytes in n_desc is the 
>> pro-
>> gram property array.
>>
>> The program property array
>> Each array element represents one program property with type, data
>> size and data.
>> In 64-bit objects, each element is an array of 8-byte integers in the
>> format of the
>> target processor. In 32-bit objects, each element is an array of
>> 4-byte integers in
>> the format of the target processor.
> 
> Thanks @HJ for clarifying that. I should have been more careful in 
> spotting the difference.
> 
> @Richard I will update my patch according to your suggestions but 
> keeping in mind decssz should be the size of the entire program property 
> array so 16 in this case.
> 

I have updated the patch as per your suggestions. The Changelog is still 
valid from my original patch.

Thanks
Sudi

> Thanks
> Sudi
>>
>>
> 

diff --git a/gcc/config/aarch64/aarch64-linux.h b/gcc/config/aarch64/aarch64-linux.h
index 9d0292d64f20939ccedd7ab56027aa1282826b23..5e8b34ded03c78493f868e38647bf57c2da5187c 100644
--- a/gcc/config/aarch64/aarch64-linux.h
+++ b/gcc/config/aarch64/aarch64-linux.h
@@ -83,7 +83,7 @@
 
 #define GNU_USER_TARGET_D_CRITSEC_SIZE 48
 
-#define TARGET_ASM_FILE_END file_end_indicate_exec_stack
+#define TARGET_ASM_FILE_END aarch64_file_end_indicate_exec_stack
 
 /* Uninitialized common symbols in non-PIE executables, even with
strong definitions in dependent shared libraries, will resolve
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b38505b0872688634b2d3f625ab8d313e89cfca0..f25f7da8f0224167db68e61a2ba88f0943316360 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -18744,6 +18744,56 @@ aarch64_stack_protect_guard (void)
   return NULL_TREE;
 }
 
+/* Implement TARGET_ASM_FILE_END for AArch64.  This adds the AArch64 GNU NOTE
+   section at the end if needed.  */
+#define GNU_PROPERTY_AARCH64_FEATURE_1_AND	0xc000
+#define GNU_PROPERTY_AARCH64_FEATURE_1_BTI	(1U << 0)
+#define GNU_PROPERTY_AARCH64_FEATURE_1_PAC	(1U << 1)
+void
+aarch64_file_end_indicate_exec_stack ()
+{
+  file_end_indicate_exec_stack ();
+
+  unsigned feature_1_and = 0;
+  if (aarch64_bti_enabled ())
+feature_1_and |= GNU_PROPERTY_AARCH64_FEATURE_1_BTI;
+
+  if (aarch64_ra_sign_scope != AARCH64_FUNCTION_NONE)
+feature_1_and |= GNU_PROPERTY_AARCH64_FEATURE_1_PAC;
+
+  if (feature_1_and)
+{
+  /* Generate .note.gnu.property section.  */
+  switch_to_section (get_section (".note.gnu.property",
+  SECTION_NOTYPE, NULL));
+
+  /* PT_NOTE header: namesz, descsz, type.
+	 namesz = 4 ("GNU\0")
+	 descsz = 16 (Size of the program property array)
+	 type   = 5 (NT_GNU_PROPERTY_TYPE_0).  */
+  assemble_align (POINTER_SIZE);
+  assemble_integer (GEN_INT (4), 4, 32, 1);
+  assemble_integer (GEN_INT (16), 4, 32, 1);
+  assemble_integer (GEN_INT (5), 4, 32, 1);
+
+  /* PT_NOTE name.  */
+  assemble_string ("GNU", 4);
+
+  /* PT_NOTE contents for NT_GNU_PROPERTY_TYPE_0:
+	 type   = GNU_PROPERTY_AARCH64_FEATURE_1_AND
+	 datasz = 4
+	 data   = feature_1_and.  */
+  assemble_

Re: [PATCH, GCC, DOCS, AArch64] Add missing documenation for mbranch-protection

2019-04-03 Thread Sudakshina Das

Hi Sandra

On 02/04/2019 16:32, Sandra Loosemore wrote:
> On 4/2/19 6:45 AM, Sudakshina Das wrote:
>> Hi
>>
>> This patch add the missing documentation bits for -mbranch-protection in
>> both extend.texi and invoke.texi.
>>
>> Is this ok for trunk?
>>
>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>> index 
>> ef7adb6a9c0fe1abd769e237fd8d0ce4c614aef8..7e1c28182138aeba163e50f5b7ed60812c1dfe27
>>  
>> 100644
>> --- a/gcc/doc/extend.texi
>> +++ b/gcc/doc/extend.texi
>> @@ -3925,7 +3925,15 @@ same as for the @option{-mcpu=} command-line 
>> option.
>>  @cindex @code{sign-return-address} function attribute, AArch64
>>  Select the function scope on which return address signing will be 
>> applied.  The
>>  behavior and permissible arguments are the same as for the 
>> command-line option
>> -@option{-msign-return-address=}.  The default value is @code{none}.
>> +@option{-msign-return-address=}.  The default value is @code{none}.  
>> This
>> +attribute is @code{deprecated}.  The @code{branch-protection} 
>> attribute should
>> +be used instead.
> 
> s/@code{deprecated}/deprecated/
> 
> The patch is OK with that tweak.

Thanks. I have made the change and committed as r270119.

Sudi
> 
> -Sandra

[PATCH, GCC, DOCS, AArch64] Add missing documenation for mbranch-protection

2019-04-02 Thread Sudakshina Das

Hi

This patch add the missing documentation bits for -mbranch-protection in 
both extend.texi and invoke.texi.

Is this ok for trunk?

Sudi

*** gcc/ChangeLog ***

2019-xx-xx  Sudakshina Das  

* doc/extend.texi: Add deprecated comment on sign-return-address
function attribute and add mbranch-protection.
* doc/invoke.texi: Add bti to the options for mbranch-protection.
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index ef7adb6a9c0fe1abd769e237fd8d0ce4c614aef8..7e1c28182138aeba163e50f5b7ed60812c1dfe27 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -3925,7 +3925,15 @@ same as for the @option{-mcpu=} command-line option.
 @cindex @code{sign-return-address} function attribute, AArch64
 Select the function scope on which return address signing will be applied.  The
 behavior and permissible arguments are the same as for the command-line option
-@option{-msign-return-address=}.  The default value is @code{none}.
+@option{-msign-return-address=}.  The default value is @code{none}.  This
+attribute is @code{deprecated}.  The @code{branch-protection} attribute should
+be used instead.
+
+@item branch-protection
+@cindex @code{branch-protection} function attribute, AArch64
+Select the function scope on which branch protection will be applied.  The
+behavior and permissible arguments are the same as for the command-line option
+@option{-mbranch-protection=}.  The default value is @code{none}.
 
 @end table
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 029b33a688060a558bb7b78312f090c64e6d0a4a..27b51aaab99680180f46383e5a4b22f7f3ceea91 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -632,7 +632,7 @@ Objective-C and Objective-C++ Dialects}.
 -mlow-precision-recip-sqrt  -mlow-precision-sqrt  -mlow-precision-div @gol
 -mpc-relative-literal-loads @gol
 -msign-return-address=@var{scope} @gol
--mbranch-protection=@var{none}|@var{standard}|@var{pac-ret}[+@var{leaf}] @gol
+-mbranch-protection=@var{none}|@var{standard}|@var{pac-ret}[+@var{leaf}]|@var{bti} @gol
 -march=@var{name}  -mcpu=@var{name}  -mtune=@var{name}  @gol
 -moverride=@var{string}  -mverbose-cost-dump @gol
 -mstack-protector-guard=@var{guard} -mstack-protector-guard-reg=@var{sysreg} @gol
@@ -15884,7 +15884,7 @@ functions, and @samp{all}, which enables pointer signing for all functions.  The
 default value is @samp{none}. This option has been deprecated by
 -mbranch-protection.
 
-@item -mbranch-protection=@var{none}|@var{standard}|@var{pac-ret}[+@var{leaf}]
+@item -mbranch-protection=@var{none}|@var{standard}|@var{pac-ret}[+@var{leaf}]|@var{bti}
 @opindex mbranch-protection
 Select the branch protection features to use.
 @samp{none} is the default and turns off all types of branch protection.

Re: [PATCH, GCC, AARCH64] Add GNU note section with BTI and PAC.

2019-04-02 Thread Sudakshina Das

Hi

On 02/04/2019 03:27, H.J. Lu wrote:
> On Tue, Apr 2, 2019 at 10:05 AM Richard Henderson  wrote:
>>
>> On 4/1/19 8:53 PM, Sudakshina Das wrote:
>>>> This could stand to use a comment, a moment's thinking about the sizes, 
>>>> and to
>>>> use the existing asm output functions.
>>>>
>>>>   /* PT_NOTE header: namesz, descsz, type.
>>>>  namesz = 4 ("GNU\0")
>>>>  descsz = 12 (see below)
>>> I was trying out these changes but the descsz of 12 gets rejected by
>>> readelf. It hits the following
>>>
>>> unsigned intsize = is_32bit_elf ? 4 : 8;
>>>
>>> printf (_("  Properties: "));
>>>
>>> if (pnote->descsz < 8 || (pnote->descsz % size) != 0)
>>>   {
>>> printf (_("\n"),
>>> pnote->descsz);
>>> return;
>>>   }
>>
>> Hmm, interesting.  The docs say that padding is not to be included in descsz
>> (gabi4.1, page 82).  To my eye this is a bug in binutils, but perhaps we will
>> have to live with it.
>>
>> Nick, thoughts?
> 
> descsz is wrong.  From:
> 
> https://github.com/hjl-tools/linux-abi/wiki/Linux-Extensions-to-gABI
> 
> n_desc The note descriptor. The first n_descsz bytes in n_desc is the pro-
> gram property array.
> 
> The program property array
> Each array element represents one program property with type, data
> size and data.
> In 64-bit objects, each element is an array of 8-byte integers in the
> format of the
> target processor. In 32-bit objects, each element is an array of
> 4-byte integers in
> the format of the target processor.

Thanks @HJ for clarifying that. I should have been more careful in 
spotting the difference.

@Richard I will update my patch according to your suggestions but 
keeping in mind decssz should be the size of the entire program property 
array so 16 in this case.

Thanks
Sudi
> 
>

Re: [PATCH, wwwdocs] Mention -march=armv8.5-a and other new command line options for AArch64 and Arm for GCC 9

2019-04-01 Thread Sudakshina Das

Hi James

On 29/03/2019 13:41, Sudakshina Das wrote:
> Hi James
> 
> On 22/03/2019 16:25, James Greenhalgh wrote:
>> On Wed, Mar 20, 2019 at 10:17:41AM +0000, Sudakshina Das wrote:
>>> Hi Kyrill
>>>
>>> On 12/03/2019 12:03, Kyrill Tkachov wrote:
>>>> Hi Sudi,
>>>>
>>>> On 2/22/19 10:45 AM, Sudakshina Das wrote:
>>>>> Hi
>>>>>
>>>>> This patch documents the addition of the new Armv8.5-A and 
>>>>> corresponding
>>>>> extensions in the gcc-9/changes.html.
>>>>> As per https://gcc.gnu.org/about.html, I have used W3 Validator.
>>>>> Is this ok for cvs?
>>>>>
>>>>> Thanks
>>>>> Sudi
>>>>
>>>>
>>>> Index: htdocs/gcc-9/changes.html
>>>> ===
>>>> RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-9/changes.html,v
>>>> retrieving revision 1.43
>>>> diff -u -r1.43 changes.html
>>>> --- htdocs/gcc-9/changes.html    21 Feb 2019 10:32:55 -    1.43
>>>> +++ htdocs/gcc-9/changes.html    21 Feb 2019 18:25:09 -
>>>> @@ -283,6 +283,19 @@
>>>>    
>>>>    The intrinsics are defined by the ACLE specification.
>>>>      
>>>> +  
>>>> +    The Armv8.5-A architecture is now supported. This can be used by
>>>> specifying the
>>>> +   -march=armv8.5-a option.
>>>>
>>>>
>>>> I tend to prefer the wording "... is now supported through the
>>>> -march=armv8.5-a option".
>>>> Otherwise it reads as the compiler "using" the architecture, whereas we
>>>> usually talk about "targeting" an architecture.
>>>>
>>>> +  
>>>> +   The Armv8.5-A architecture also adds some security features 
>>>> that
>>>> are optional to all older
>>>> +    architecture versions. These are also supported now and only 
>>>> effect
>>>> the assembler.
>>>> +    
>>>> +     Speculation Barrier instruction using
>>>> -march=armv8-a+sb.
>>>> +     Execution and Data Prediction Restriction instructions using
>>>> -march=armv8-a+predres.
>>>> +     Speculative Store Bypass Safe instruction using
>>>> -march=armv8-a+ssbs. This does not
>>>> + require a compiler option for Arm and thus
>>>> -march=armv8-a+ssbs is a AArch64 specific option.
>>>>
>>>> "AArch64-specific"
>>>>
>>>>
>>>> LGTM otherwise.
>>>> Thanks,
>>>> Kyrill
>>>
>>> Thanks for the review and sorry for the delay in response. I had edited
>>> the language for adding new options in a few other places as well.
>>>
>>> +   The Armv8.5-A architecture also adds some security features 
>>> that are
>>> +    optional to all older architecture versions. These are also 
>>> supported now
>>
>> s/also supported now/now supported/
>>
>>> +    and only effect the assembler.
>>
>> s/effect/affect/
>>
>>> +    
>>> +     Speculation Barrier instruction through the
>>> + -march=armv8-a+sb option.
>>> +     Execution and Data Prediction Restriction instructions through
>>> + the -march=armv8-a+predres option.
>>> +     Speculative Store Bypass Safe instruction through the
>>> + -march=armv8-a+ssbs option. This does not 
>>> require a
>>> + compiler option for Arm and thus 
>>> -march=armv8-a+ssbs
>>> + is an AArch64-specific option.
>>> +    
>>> +  
>>>   
>>>   AArch64 specific
>>> @@ -362,6 +380,23 @@
>>>   The default value is 16 (64Kb) and can be changed at configure
>>>   time using the flag 
>>> --with-stack-clash-protection-guard-size=12|16.
>>>     
>>> +  
>>> +    The option -msign-return-address= has been 
>>> deprecated. This
>>> +    has been replaced by the new -mbranch-protection= 
>>> option. This
>>> +    new option can now be used to enable the return address signing 
>>> as well as
>>> +    the new Branch Target Identification feature of Armv8.5-A 
>>> architecture. For
>>> +    more information on the arguments accepted by this option, 
>>> please refer to
>>> + >> href="https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html#AArch64-Options;>AArch64-Options.
>>>  
>>>
>>> +  
>>> +   The following optional extensions to Armv8.5-A architecture 
>>> are also
>>> +   supported now and only effect the assembler.
>>
>> s/effect/affect/
>>
>>> +    
>>> +     Random Number Generation instructions through the
>>> + -march=armv8.5-a+rng option.
>>> +     Memory Tagging Extension through the
>>> + -march=armv8.5-a+memtag option.
>>> +    
>>> +  
>>>   
>>>   Arm specific
>>
>> Otherwise, OK by me but feel free to wait for people with gooder
>> grammar than me to have their say.
>>
> 
> Thanks for spotting those. So far no one else with gooder grammar has 
> pointed out anything else. I will commit the patch with the changes you 
> suggested on Monday if no one else has any other objections.
> 

Committed as 1.56

Thanks
Sudi

> Thanks
> Sudi
> 
>> Thanks,
>> James
>>
>

Re: [PATCH, GCC, AARCH64] Add GNU note section with BTI and PAC.

2019-04-01 Thread Sudakshina Das

Hi Richard

Thanks for the comments and pointing out the much cleaner existing asm 
output functions!

On 29/03/2019 17:51, Richard Henderson wrote:
>> +#define ASM_LONG "\t.long\t"
> 
> Do not replicate targetm.asm_out.aligned_op.si, or integer_asm_op, really.
> 
>> +aarch64_file_end_indicate_exec_stack ()
>> +{
>> +  file_end_indicate_exec_stack ();
>> +
>> +  if (!aarch64_bti_enabled ()
>> +  && aarch64_ra_sign_scope == AARCH64_FUNCTION_NONE)
>> +{
>> +  return;
>> +}
> 
> This is redundant with...
> 
>> +
>> +  unsigned feature_1_and = 0;
>> +  if (aarch64_bti_enabled ())
>> +feature_1_and |= GNU_PROPERTY_AARCH64_FEATURE_1_BTI;
>> +
>> +  if (aarch64_ra_sign_scope != AARCH64_FUNCTION_NONE)
>> +feature_1_and |= GNU_PROPERTY_AARCH64_FEATURE_1_PAC;
>> +
>> +  if (feature_1_and)
> 
> ... this.  I prefer the second, as it's obvious.
> 
>> +  ASM_OUTPUT_ALIGN (asm_out_file, p2align);
>> +  /* name length.  */
>> +  fprintf (asm_out_file, ASM_LONG " 1f - 0f\n");
>> +  /* data length.  */
>> +  fprintf (asm_out_file, ASM_LONG " 4f - 1f\n");
>> +  /* note type: NT_GNU_PROPERTY_TYPE_0.  */
>> +  fprintf (asm_out_file, ASM_LONG " 5\n");
>> +  fprintf (asm_out_file, "0:\n");
>> +  /* vendor name: "GNU".  */
>> +  fprintf (asm_out_file, STRING_ASM_OP " \"GNU\"\n");
>> +  fprintf (asm_out_file, "1:\n");
>> +  ASM_OUTPUT_ALIGN (asm_out_file, p2align);
>> +  /* pr_type: GNU_PROPERTY_AARCH64_FEATURE_1_AND.  */
>> +  fprintf (asm_out_file, ASM_LONG " 0x%x\n",
>> +   GNU_PROPERTY_AARCH64_FEATURE_1_AND);
>> +  /* pr_datasz.  */\
>> +  fprintf (asm_out_file, ASM_LONG " 3f - 2f\n");
>> +  fprintf (asm_out_file, "2:\n");
>> +  /* GNU_PROPERTY_AARCH64_FEATURE_1_XXX.  */
>> +  fprintf (asm_out_file, ASM_LONG " 0x%x\n", feature_1_and);
>> +  fprintf (asm_out_file, "3:\n");
>> +  ASM_OUTPUT_ALIGN (asm_out_file, p2align);
>> +  fprintf (asm_out_file, "4:\n");
> 
> This could stand to use a comment, a moment's thinking about the sizes, and to
> use the existing asm output functions.
> 
>   /* PT_NOTE header: namesz, descsz, type.
>  namesz = 4 ("GNU\0")
>  descsz = 12 (see below)

I was trying out these changes but the descsz of 12 gets rejected by 
readelf. It hits the following

   unsigned intsize = is_32bit_elf ? 4 : 8;

   printf (_("  Properties: "));

   if (pnote->descsz < 8 || (pnote->descsz % size) != 0)
 {
   printf (_("\n"), 
pnote->descsz);
   return;
 }

Thanks
Sudi

>  type   = 5 (NT_GNU_PROPERTY_TYPE_0).  */
>   assemble_align (POINTER_SIZE);
>   assemble_integer (GEN_INT (4), 4, 32, 1);
>   assemble_integer (GEN_INT (12), 4, 32, 1);
>   assemble_integer (GEN_INT (5), 4, 32, 1);
> 
>   /* PT_NOTE name */
>   assemble_string ("GNU", 4);
> 
>   /* PT_NOTE contents for NT_GNU_PROPERTY_TYPE_0:
>  type   = 0xc000 (GNU_PROPERTY_AARCH64_FEATURE_1_AND),
> datasz = 4
> data   = feature_1_and
> Note that the current section offset is 16,
> and there has been no padding so far.  */
>   assemble_integer (GEN_INT (0xc000), 4, 32, 1);
>   assemble_integer (GEN_INT (4), 4, 32, 1);
>   assemble_integer (GEN_INT (feature_1_and), 4, 32, 1);
> 
>   /* Pad the size of the note to the required alignment. */
>   assemble_align (POINTER_SIZE);
> 
> 
> r~
>

[PATCH, GCC, AARCH64] Add GNU note section with BTI and PAC.

2019-03-29 Thread Sudakshina Das

Hi

This patch adds the GNU NOTE section to the BTI and/or PAC
enabled objects for linux targets.
The ABI document that we published mentioning GNU NOTE section is below:
https://developer.arm.com/docs/ihi0056/latest/elf-for-the-arm-64-bit-architecture-aarch64-abi-2018q4

The patches needed for these in binutils are already approved and committed.
https://sourceware.org/ml/binutils/2019-03/msg00072.html

Bootstrapped and regression tested with aarch64-none-linux-gnu.
Is this ok for trunk?

Thanks
Sudi

*** gcc/ChangeLog ***

2018-xx-xx  Sudakshina Das  

* config/aarch64/aarch64-linux.h (TARGET_ASM_FILE_END): Define for
AArch64.
(aarch64_file_end_indicate_exec_stack): Add gnu note section.

gcc/testsuite/ChangeLog:

2018-xx-xx  Sudakshina Das  

* gcc.target/aarch64/bti-1.c: Add scan directive for gnu note section
for linux targets.
* gcc.target/aarch64/va_arg_1.c: Don't run for aarch64 linux
targets 
with --enable-standard-branch-protection.

diff --git a/gcc/config/aarch64/aarch64-linux.h b/gcc/config/aarch64/aarch64-linux.h
index 9d0292d64f20939ccedd7ab56027aa1282826b23..5e8b34ded03c78493f868e38647bf57c2da5187c 100644
--- a/gcc/config/aarch64/aarch64-linux.h
+++ b/gcc/config/aarch64/aarch64-linux.h
@@ -83,7 +83,7 @@
 
 #define GNU_USER_TARGET_D_CRITSEC_SIZE 48
 
-#define TARGET_ASM_FILE_END file_end_indicate_exec_stack
+#define TARGET_ASM_FILE_END aarch64_file_end_indicate_exec_stack
 
 /* Uninitialized common symbols in non-PIE executables, even with
strong definitions in dependent shared libraries, will resolve
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b38505b0872688634b2d3f625ab8d313e89cfca0..d616c8360b396ebe3ab2ac0fb799b30830df2b3e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -18744,6 +18744,67 @@ aarch64_stack_protect_guard (void)
   return NULL_TREE;
 }
 
+/* Implement TARGET_ASM_FILE_END for AArch64.  This adds the AArch64 GNU NOTE
+   section at the end if needed.  */
+#define ASM_LONG "\t.long\t"
+#define GNU_PROPERTY_AARCH64_FEATURE_1_AND	0xc000
+#define GNU_PROPERTY_AARCH64_FEATURE_1_BTI	(1U << 0)
+#define GNU_PROPERTY_AARCH64_FEATURE_1_PAC	(1U << 1)
+void
+aarch64_file_end_indicate_exec_stack ()
+{
+  file_end_indicate_exec_stack ();
+
+  if (!aarch64_bti_enabled ()
+  && aarch64_ra_sign_scope == AARCH64_FUNCTION_NONE)
+{
+  return;
+}
+
+  unsigned feature_1_and = 0;
+  if (aarch64_bti_enabled ())
+feature_1_and |= GNU_PROPERTY_AARCH64_FEATURE_1_BTI;
+
+  if (aarch64_ra_sign_scope != AARCH64_FUNCTION_NONE)
+feature_1_and |= GNU_PROPERTY_AARCH64_FEATURE_1_PAC;
+
+  if (feature_1_and)
+{
+  int p2align = ptr_mode == SImode ? 2 : 3;
+
+  /* Generate GNU_PROPERTY_AARCH64_FEATURE_1_XXX.  */
+  switch_to_section (get_section (".note.gnu.property",
+  SECTION_NOTYPE, NULL));
+
+  ASM_OUTPUT_ALIGN (asm_out_file, p2align);
+  /* name length.  */
+  fprintf (asm_out_file, ASM_LONG " 1f - 0f\n");
+  /* data length.  */
+  fprintf (asm_out_file, ASM_LONG " 4f - 1f\n");
+  /* note type: NT_GNU_PROPERTY_TYPE_0.  */
+  fprintf (asm_out_file, ASM_LONG " 5\n");
+  fprintf (asm_out_file, "0:\n");
+  /* vendor name: "GNU".  */
+  fprintf (asm_out_file, STRING_ASM_OP " \"GNU\"\n");
+  fprintf (asm_out_file, "1:\n");
+  ASM_OUTPUT_ALIGN (asm_out_file, p2align);
+  /* pr_type: GNU_PROPERTY_AARCH64_FEATURE_1_AND.  */
+  fprintf (asm_out_file, ASM_LONG " 0x%x\n",
+	   GNU_PROPERTY_AARCH64_FEATURE_1_AND);
+  /* pr_datasz.  */\
+  fprintf (asm_out_file, ASM_LONG " 3f - 2f\n");
+  fprintf (asm_out_file, "2:\n");
+  /* GNU_PROPERTY_AARCH64_FEATURE_1_XXX.  */
+  fprintf (asm_out_file, ASM_LONG " 0x%x\n", feature_1_and);
+  fprintf (asm_out_file, "3:\n");
+  ASM_OUTPUT_ALIGN (asm_out_file, p2align);
+  fprintf (asm_out_file, "4:\n");
+}
+}
+#undef GNU_PROPERTY_AARCH64_FEATURE_1_PAC
+#undef GNU_PROPERTY_AARCH64_FEATURE_1_BTI
+#undef GNU_PROPERTY_AARCH64_FEATURE_1_AND
+#undef ASM_LONG
 
 /* Target-specific selftests.  */
 
diff --git a/gcc/testsuite/gcc.target/aarch64/bti-1.c b/gcc/testsuite/gcc.target/aarch64/bti-1.c
index a8c60412e310a4f322372f334ae5314f426d310e..5a556b08ed15679b25676a11fe9c7a64641ee671 100644
--- a/gcc/testsuite/gcc.target/aarch64/bti-1.c
+++ b/gcc/testsuite/gcc.target/aarch64/bti-1.c
@@ -61,3 +61,4 @@ lab2:
 }
 /* { dg-final { scan-assembler-times "hint\t34" 1 } } */
 /* { dg-final { scan-assembler-times "hint\t36" 12 } } */
+/* { dg-final { scan-assembler ".note.gnu.property" { target *-*-linux* } } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/va_arg_1.c b/gcc/testsuite/gcc.target/aarch64/va_arg_1.c
index e8e

Re: [PATCH, wwwdocs] Mention -march=armv8.5-a and other new command line options for AArch64 and Arm for GCC 9

2019-03-29 Thread Sudakshina Das

Hi James

On 22/03/2019 16:25, James Greenhalgh wrote:
> On Wed, Mar 20, 2019 at 10:17:41AM +0000, Sudakshina Das wrote:
>> Hi Kyrill
>>
>> On 12/03/2019 12:03, Kyrill Tkachov wrote:
>>> Hi Sudi,
>>>
>>> On 2/22/19 10:45 AM, Sudakshina Das wrote:
>>>> Hi
>>>>
>>>> This patch documents the addition of the new Armv8.5-A and corresponding
>>>> extensions in the gcc-9/changes.html.
>>>> As per https://gcc.gnu.org/about.html, I have used W3 Validator.
>>>> Is this ok for cvs?
>>>>
>>>> Thanks
>>>> Sudi
>>>
>>>
>>> Index: htdocs/gcc-9/changes.html
>>> ===
>>> RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-9/changes.html,v
>>> retrieving revision 1.43
>>> diff -u -r1.43 changes.html
>>> --- htdocs/gcc-9/changes.html    21 Feb 2019 10:32:55 -    1.43
>>> +++ htdocs/gcc-9/changes.html    21 Feb 2019 18:25:09 -
>>> @@ -283,6 +283,19 @@
>>>    
>>>    The intrinsics are defined by the ACLE specification.
>>>      
>>> +  
>>> +    The Armv8.5-A architecture is now supported. This can be used by
>>> specifying the
>>> +   -march=armv8.5-a option.
>>>
>>>
>>> I tend to prefer the wording "... is now supported through the
>>> -march=armv8.5-a option".
>>> Otherwise it reads as the compiler "using" the architecture, whereas we
>>> usually talk about "targeting" an architecture.
>>>
>>> +  
>>> +   The Armv8.5-A architecture also adds some security features that
>>> are optional to all older
>>> +    architecture versions. These are also supported now and only effect
>>> the assembler.
>>> +    
>>> +     Speculation Barrier instruction using
>>> -march=armv8-a+sb.
>>> +     Execution and Data Prediction Restriction instructions using
>>> -march=armv8-a+predres.
>>> +     Speculative Store Bypass Safe instruction using
>>> -march=armv8-a+ssbs. This does not
>>> + require a compiler option for Arm and thus
>>> -march=armv8-a+ssbs is a AArch64 specific option.
>>>
>>> "AArch64-specific"
>>>
>>>
>>> LGTM otherwise.
>>> Thanks,
>>> Kyrill
>>
>> Thanks for the review and sorry for the delay in response. I had edited
>> the language for adding new options in a few other places as well.
>>
>> +   The Armv8.5-A architecture also adds some security features that are
>> +optional to all older architecture versions. These are also supported 
>> now
> 
> s/also supported now/now supported/
> 
>> +and only effect the assembler.
> 
> s/effect/affect/
> 
>> +
>> + Speculation Barrier instruction through the
>> + -march=armv8-a+sb option.
>> + Execution and Data Prediction Restriction instructions through
>> + the -march=armv8-a+predres option.
>> + Speculative Store Bypass Safe instruction through the
>> + -march=armv8-a+ssbs option. This does not require a
>> + compiler option for Arm and thus -march=armv8-a+ssbs
>> + is an AArch64-specific option.
>> +
>> +  
>>   
>>   
>>   AArch64 specific
>> @@ -362,6 +380,23 @@
>>   The default value is 16 (64Kb) and can be changed at configure
>>   time using the flag 
>> --with-stack-clash-protection-guard-size=12|16.
>> 
>> +  
>> +The option -msign-return-address= has been deprecated. This
>> +has been replaced by the new -mbranch-protection= option. 
>> This
>> +new option can now be used to enable the return address signing as well 
>> as
>> +the new Branch Target Identification feature of Armv8.5-A architecture. 
>> For
>> +more information on the arguments accepted by this option, please refer 
>> to
>> + > href="https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html#AArch64-Options;>AArch64-Options.
>> +  
>> +   The following optional extensions to Armv8.5-A architecture are also
>> +   supported now and only effect the assembler.
> 
> s/effect/affect/
> 
>> +
>> + Random Number Generation instructions through the
>> + -march=armv8.5-a+rng option.
>> + Memory Tagging Extension through the
>> + -march=armv8.5-a+memtag option.
>> +
>> +  
>>   
>>   
>>   Arm specific
> 
> Otherwise, OK by me but feel free to wait for people with gooder
> grammar than me to have their say.
> 

Thanks for spotting those. So far no one else with gooder grammar has 
pointed out anything else. I will commit the patch with the changes you 
suggested on Monday if no one else has any other objections.

Thanks
Sudi

> Thanks,
> James
>

Re: [PATCH, wwwdocs] Mention -march=armv8.5-a and other new command line options for AArch64 and Arm for GCC 9

2019-03-20 Thread Sudakshina Das

Hi Kyrill

On 12/03/2019 12:03, Kyrill Tkachov wrote:
> Hi Sudi,
> 
> On 2/22/19 10:45 AM, Sudakshina Das wrote:
>> Hi
>>
>> This patch documents the addition of the new Armv8.5-A and corresponding
>> extensions in the gcc-9/changes.html.
>> As per https://gcc.gnu.org/about.html, I have used W3 Validator.
>> Is this ok for cvs?
>>
>> Thanks
>> Sudi
> 
> 
> Index: htdocs/gcc-9/changes.html
> ===
> RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-9/changes.html,v
> retrieving revision 1.43
> diff -u -r1.43 changes.html
> --- htdocs/gcc-9/changes.html    21 Feb 2019 10:32:55 -    1.43
> +++ htdocs/gcc-9/changes.html    21 Feb 2019 18:25:09 -
> @@ -283,6 +283,19 @@
>   
>   The intrinsics are defined by the ACLE specification.
>     
> +  
> +    The Armv8.5-A architecture is now supported. This can be used by 
> specifying the
> +   -march=armv8.5-a option.
> 
> 
> I tend to prefer the wording "... is now supported through the 
> -march=armv8.5-a option".
> Otherwise it reads as the compiler "using" the architecture, whereas we 
> usually talk about "targeting" an architecture.
> 
> +  
> +   The Armv8.5-A architecture also adds some security features that 
> are optional to all older
> +    architecture versions. These are also supported now and only effect 
> the assembler.
> +    
> +     Speculation Barrier instruction using 
> -march=armv8-a+sb.
> +     Execution and Data Prediction Restriction instructions using 
> -march=armv8-a+predres.
> +     Speculative Store Bypass Safe instruction using 
> -march=armv8-a+ssbs. This does not
> + require a compiler option for Arm and thus 
> -march=armv8-a+ssbs is a AArch64 specific option.
> 
> "AArch64-specific"
> 
> 
> LGTM otherwise.
> Thanks,
> Kyrill

Thanks for the review and sorry for the delay in response. I had edited 
the language for adding new options in a few other places as well.

Thanks
Sudi

> 
> +    
> +  
>   
> 
>   AArch64 specific
> @@ -298,6 +311,22 @@
>   The default value is 16 (64Kb) and can be changed at configure
>   time using the flag 
> --with-stack-clash-protection-guard-size=12|16.
>     
> +  
> +    The option -msign-return-address= has been deprecated. 
> This has been replaced
> +    by the new -mbranch-protection= option. This new 
> option can now be used to
> +    enable the return address signing as well as the new Branch Target 
> Identification
> +    feature of Armv8.5-A architecture. For more information on the 
> arguments accepted by
> +    this option, please refer to
> +  href="https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html#AArch64-Options;>
>  
> 
> +    AArch64-Options.
> +  
> +   The following optional extensions to Armv8.5-A architecture are 
> also supported now and
> +   only effect the assembler.
> +    
> +     Random Number Generation instructions using 
> -march=armv8.5-a+rng.
> +     Memory Tagging Extension using 
> -march=armv8.5-a+memtag.
> +    
> +  
>   
> 
>   Arm specific
> 

Index: htdocs/gcc-9/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-9/changes.html,v
retrieving revision 1.52
diff -u -r1.52 changes.html
--- htdocs/gcc-9/changes.html	7 Mar 2019 14:40:06 -	1.52
+++ htdocs/gcc-9/changes.html	18 Mar 2019 18:55:24 -
@@ -342,6 +342,24 @@
 
 The intrinsics are defined by the ACLE specification.
   
+  
+The Armv8.5-A architecture is now supported through the
+-march=armv8.5-a option.
+  
+   The Armv8.5-A architecture also adds some security features that are
+optional to all older architecture versions. These are also supported now
+and only effect the assembler.
+
+	 Speculation Barrier instruction through the
+	 -march=armv8-a+sb option.
+	 Execution and Data Prediction Restriction instructions through
+	 the -march=armv8-a+predres option.
+	 Speculative Store Bypass Safe instruction through the
+	 -march=armv8-a+ssbs option. This does not require a
+	 compiler option for Arm and thus -march=armv8-a+ssbs
+	 is an AArch64-specific option.
+
+  
 
 
 AArch64 specific
@@ -362,6 +380,23 @@
 The default value is 16 (64Kb) and can be changed at configure
 time using the flag --with-stack-clash-protection-guard-size=12|16.
   
+  
+The option -msign-return-address= has been deprecated. This
+has been replaced by the new -mbranch-protection= option. This
+new option can now be used to enable the return address signing as well as
+the new Branch Target Identification feature of A

Re: [PATCH, wwwdocs] Mention -march=armv8.5-a and other new command line options for AArch64 and Arm for GCC 9

2019-03-06 Thread Sudakshina Das

Pinging and adding Gerald to the CC list.

On 22/02/2019 10:45, Sudakshina Das wrote:
> Hi
> 
> This patch documents the addition of the new Armv8.5-A and corresponding 
> extensions in the gcc-9/changes.html.
> As per https://gcc.gnu.org/about.html, I have used W3 Validator.
> Is this ok for cvs?
> 
> Thanks
> Sudi

Re: [PATCH, GCC, AArch64] Fix a couple of bugs in BTI

2019-02-22 Thread Sudakshina Das

On 21/02/2019 22:52, James Greenhalgh wrote:
> On Thu, Feb 21, 2019 at 06:19:10AM -0600, Sudakshina Das wrote:
>> Hi
>>
>> While doing more testing I found a couple of issues with my BTI patches.
>> This patch fixes them:
>> 1) Remove a reference to return address key. The original patch was
>> written based on a different not yet committed patch ([PATCH
>> 3/3][GCC][AARCH64] Add support for pointer authentication B key) and I
>> missed out on cleaning this up. This is hidden behind the configuration
>> option and thus went unnoticed.
>> 2) Add a missed case for adding the BTI instruction in thunk functions.
>>
>> Bootstrapped on aarch64-none-linux-gnu and regression tested on
>> aarch64-none-elf with configuration turned on.
> 
> OK.
> 

Thanks committed as r269112.
Sudi

> Thanks,
> James
> 
>>
>> gcc/ChangeLog:
>>
>> 2019-xx-xx  Sudakshina Das  
>>
>>  * config/aarch64/aarch64.c (aarch64_output_mi_thunk): Add bti
>>  instruction if enabled.
>>  (aarch64_override_options): Remove reference to return address
>>  key.
>>
>>
>> Is this ok for trunk?
>> Sudi
>

[PATCH, wwwdocs] Mention -march=armv8.5-a and other new command line options for AArch64 and Arm for GCC 9

2019-02-22 Thread Sudakshina Das

Hi

This patch documents the addition of the new Armv8.5-A and corresponding 
extensions in the gcc-9/changes.html.
As per https://gcc.gnu.org/about.html, I have used W3 Validator.
Is this ok for cvs?

Thanks
Sudi
Index: htdocs/gcc-9/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-9/changes.html,v
retrieving revision 1.43
diff -u -r1.43 changes.html
--- htdocs/gcc-9/changes.html	21 Feb 2019 10:32:55 -	1.43
+++ htdocs/gcc-9/changes.html	21 Feb 2019 18:25:09 -
@@ -283,6 +283,19 @@
 
 The intrinsics are defined by the ACLE specification.
   
+  
+The Armv8.5-A architecture is now supported. This can be used by specifying the
+   -march=armv8.5-a option.
+  
+   The Armv8.5-A architecture also adds some security features that are optional to all older
+architecture versions. These are also supported now and only effect the assembler.
+
+	 Speculation Barrier instruction using -march=armv8-a+sb.
+	 Execution and Data Prediction Restriction instructions using -march=armv8-a+predres.
+	 Speculative Store Bypass Safe instruction using -march=armv8-a+ssbs. This does not
+	 require a compiler option for Arm and thus -march=armv8-a+ssbs is a AArch64 specific option.
+
+  
 
 
 AArch64 specific
@@ -298,6 +311,22 @@
 The default value is 16 (64Kb) and can be changed at configure
 time using the flag --with-stack-clash-protection-guard-size=12|16.
   
+  
+The option -msign-return-address= has been deprecated. This has been replaced
+by the new -mbranch-protection= option. This new option can now be used to
+enable the return address signing as well as the new Branch Target Identification
+feature of Armv8.5-A architecture. For more information on the arguments accepted by
+this option, please refer to
+ https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html#AArch64-Options;>
+	AArch64-Options.
+  
+   The following optional extensions to Armv8.5-A architecture are also supported now and
+   only effect the assembler.
+
+	 Random Number Generation instructions using -march=armv8.5-a+rng.
+	 Memory Tagging Extension using -march=armv8.5-a+memtag.
+
+  
 
 
 Arm specific

[PATCH, GCC, AArch64] Fix a couple of bugs in BTI

2019-02-21 Thread Sudakshina Das

Hi

While doing more testing I found a couple of issues with my BTI patches. 
This patch fixes them:
1) Remove a reference to return address key. The original patch was 
written based on a different not yet committed patch ([PATCH 
3/3][GCC][AARCH64] Add support for pointer authentication B key) and I 
missed out on cleaning this up. This is hidden behind the configuration 
option and thus went unnoticed.
2) Add a missed case for adding the BTI instruction in thunk functions.

Bootstrapped on aarch64-none-linux-gnu and regression tested on 
aarch64-none-elf with configuration turned on.

gcc/ChangeLog:

2019-xx-xx  Sudakshina Das  

* config/aarch64/aarch64.c (aarch64_output_mi_thunk): Add bti
instruction if enabled.
(aarch64_override_options): Remove reference to return address
key.


Is this ok for trunk?
Sudi
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 9f52cc9..7d9824a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5980,6 +5980,9 @@ aarch64_output_mi_thunk (FILE *file, tree thunk ATTRIBUTE_UNUSED,
   rtx this_rtx, temp0, temp1, addr, funexp;
   rtx_insn *insn;
 
+  if (aarch64_bti_enabled ())
+emit_insn (gen_bti_c());
+
   reload_completed = 1;
   emit_note (NOTE_INSN_PROLOGUE_END);
 
@@ -12032,7 +12035,6 @@ aarch64_override_options (void)
 {
 #ifdef TARGET_ENABLE_PAC_RET
   aarch64_ra_sign_scope = AARCH64_FUNCTION_NON_LEAF;
-  aarch64_ra_sign_key = AARCH64_KEY_A;
 #else
   aarch64_ra_sign_scope = AARCH64_FUNCTION_NONE;
 #endif

Re: [PATCH 2/2][GCC][ARM] Implement hint intrinsics for ARM

2019-01-11 Thread Sudakshina Das

Hi Srinath

On 10/01/19 19:20, Srinath Parvathaneni wrote:
> Hi All,
> 
> This patch implements the ACLE hint intrinsics (nop,yield,wfe,wfi,sev
> and sevl), for all ARM targets.
> 
> The intrinsics specification will be published on the Arm website [1].
> 
> [1]
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053c/IHI0053C_acle_2_0.pdf
> 
> Bootstrapped on arm-none-linux-gnueabihf, regression tested on
> arm-none-eabi with no regressions and
> ran the added tests for arm, thumb-1 and thumb-2 modes.
> 
> Ok for trunk? If ok, could someone commit the patch on my behalf, I
> don't have commit rights.
> 
> Thanks,
> Srinath
> 
> gcc/ChangeLog:
> 
> 2019-01-10  Srinath Parvathaneni  
> 
>   * config/arm/arm-builtins.c (NOP_QUALIFIERS): New qualifier.
>   (arm_expand_builtin_args): New case.
>   * config/arm/arm.md (yield): New pattern name.
>   (wfe): Likewise.
>   (wfi): Likewise.
>   (sev): Likewise.
>   (sevl): Likewise.
>   * config/arm/arm_acle.h (__nop ): New inline function.
>   (__yield): Likewise.
>   (__sev): Likewise.
>   (__sevl): Likewise.
>   (__wfi): Likewise.
>   (__wfe): Likewise.
>   * config/arm/arm_acle_builtins.def (VAR1):
>   (nop): New builtin definitions.
>   (yield): Likewise.
>   (sev): Likewise.
>   (sevl): Likewise.
>   (wfi): Likewise.
>   (wfe): Likewise.
>   * config/arm/unspecs.md (unspecv):
>   (VUNSPEC_YIELD): New volatile unspec.
>   (VUNSPEC_SEV): Likewise.
>   (VUNSPEC_SEVL): Likewise.
>   (VUNSPEC_WFI): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-01-10  Srinath Parvathaneni  
> 
>   * gcc.target/arm/acle/nop.c: New test.
>   * gcc.target/arm/acle/sev-1.c: Likewise.
>   * gcc.target/arm/acle/sev-2.c: Likewise.
>   * gcc.target/arm/acle/sev-3.c: Likewise.
>   * gcc.target/arm/acle/sevl-1.c: Likewise.
>   * gcc.target/arm/acle/sevl-2.c: Likewise.
>   * gcc.target/arm/acle/sevl-3.c: Likewise.
>   * gcc.target/arm/acle/wfe-1.c: Likewise.
>   * gcc.target/arm/acle/wfe-2.c: Likewise.
>   * gcc.target/arm/acle/wfe-3.c: Likewise.
>   * gcc.target/arm/acle/wfi-1.c: Likewise.
>   * gcc.target/arm/acle/wfi-2.c: Likewise.
>   * gcc.target/arm/acle/wfi-3.c: Likewise.
>   * gcc.target/arm/acle/yield-1.c: Likewise.
>   * gcc.target/arm/acle/yield-2.c: Likewise.
>   * gcc.target/arm/acle/yield-3.c: Likewise.
> 

Thanks for doing this and I am not a maintainer. I do have a few questions:

...

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 
f6196e9316898e3258e08d8f2ece8fe9640676ca..36b24cfdfa6c61d952a5c704f54d37f2b0fdd34e
 
100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -8906,6 +8906,76 @@
 (set_attr "type" "mov_reg")]
  )

+(define_insn "yield"
+  [(unspec_volatile [(const_int 0)] VUNSPEC_YIELD)]
+  ""
+{
+  if (TARGET_ARM)
+return ".inst\t0xe320f001\t//yield";
+  else if(TARGET_THUMB2)

There should be a space after the if. Likewise for all the other 
instructions.

+return ".inst\t0xf3af8001\t//yield";
+  else /* TARGET_THUMB1 */
+return ".inst\t0xbf10\t//yield";
+}
+  [(set_attr "type" "coproc")]

Can you please explain the coproc attribute. Also I think maybe you can 
use the "length" attribute here. Likewise for all the other instructions.

Finally, for the tests why not combine the tests like the AArch64 patch 
where all the intrinsics were tested in the same file with common 
testing options? You could have only three new files for all the testing?

Thanks
Sudi

+)
+

> 
> 
>

Re: [PATCH 1/2][GCC][AArch64] Implement hint intrinsics for AArch64

2019-01-11 Thread Sudakshina Das

Hi Srinath

On 10/01/19 19:20, Srinath Parvathaneni wrote:
> Hi All,
> 
> This patch implements the ACLE hint intrinsics (nop, yield, wfe, wfi,
> sev and sevl), for AArch64.
> 
> The instructions are documented in the ArmARM[1] and the intrinsics
> specification will be
> published on the Arm website [2].
> 
> [1]
> https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
> [2]
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053c/IHI0053C_acle_2_0.pdf
> 
> Bootstrapped on aarch64-none-linux-gnu and regression tested on
> aarch64-none-elf with no regressions.
> 
> Ok for trunk? If ok, could someone commit the patch on my behalf, I
> don't have commit rights.
> 
> Thanks,
> Srinath
> 
> gcc/ChangeLog:
> 
> 2019-01-10  Srinath Parvathaneni  
> 
>   * config/aarch64/aarch64.md (yield): New pattern name.
>   (wfe): Likewise.
>   (wfi): Likewise.
>   (sev): Likewise.
>   (sevl): Likewise.
>   (UNSPECV_YIELD): New volatile unspec.
>   (UNSPECV_WFE): Likewise.
>   (UNSPECV_WFI): Likewise.
>   (UNSPECV_SEV): Likewise.
>   (UNSPECV_SEVL): Likewise.
>   * config/aarch64/aarch64-builtins.c (aarch64_builtins):
>   AARCH64_SYSHINTOP_BUILTIN_NOP: New builtin.
>   AARCH64_SYSHINTOP_BUILTIN_YIELD: Likewise.
>   AARCH64_SYSHINTOP_BUILTIN_WFE: Likewise.
>   AARCH64_SYSHINTOP_BUILTIN_WFI: Likewise.
>   AARCH64_SYSHINTOP_BUILTIN_SEV: Likewise.
>   AARCH64_SYSHINTOP_BUILTIN_SEVL: Likewise.
>   (aarch64_init_syshintop_builtins): New function.
>   (aarch64_init_builtins): New call statement.
>   (aarch64_expand_builtin): New case.
>   * config/aarch64/arm_acle.h (__nop ): New inline function.
>   (__yield): Likewise.
>   (__sev): Likewise.
>   (__sevl): Likewise.
>   (__wfi): Likewise.
>   (__wfe): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-01-10  Srinath Parvathaneni  
> 
>   * gcc.target/aarch64/acle/hint-1.c: New test.
>   * gcc.target/aarch64/acle/hint-2.c: Likewise.
> 
> 

Thank you for doing this and I am not a maintainer. I have some comments 
bellow:

diff --git a/gcc/config/aarch64/aarch64-builtins.c 
b/gcc/config/aarch64/aarch64-builtins.c
index 
8cced94567008e28b1761ec8771589a3925f2904..d5424f98df1f5c8f206cbded097bdd2dfcd1ca8e
 
100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -399,6 +399,13 @@ enum aarch64_builtins
AARCH64_PAUTH_BUILTIN_AUTIA1716,
AARCH64_PAUTH_BUILTIN_PACIA1716,
AARCH64_PAUTH_BUILTIN_XPACLRI,
+  /* System Hint Operation Builtins for AArch64.  */
+  AARCH64_SYSHINTOP_BUILTIN_NOP,
+  AARCH64_SYSHINTOP_BUILTIN_YIELD,
+  AARCH64_SYSHINTOP_BUILTIN_WFE,
+  AARCH64_SYSHINTOP_BUILTIN_WFI,
+  AARCH64_SYSHINTOP_BUILTIN_SEV,
+  AARCH64_SYSHINTOP_BUILTIN_SEVL,
AARCH64_BUILTIN_MAX
  };

Is there any reason for the naming? They don't seem to be part of any 
extensions? IMHO AARCH64_BUILTIN_NOP, etc looks cleaner and follows 
other builtins which are not part of any extensions.

...
@@ -1395,6 +1436,29 @@ aarch64_expand_builtin (tree exp,
}

return target;
+case AARCH64_SYSHINTOP_BUILTIN_NOP:
+  emit_insn (GEN_FCN (CODE_FOR_nop) ());
+  return gen_reg_rtx (VOIDmode);
+

Needs a newline before the new case.

...
+(define_insn "yield"
+  [(unspec_volatile [(const_int 0)] UNSPECV_YIELD)]
+  ""
+  "yield"
+  [(set_attr "type" "coproc")]
+)

I don't believe setting the type to coproc in AArch64 is correct. 
Likewise for the other instructions.

...
+/* Test the nop ACLE hint intrinsic */
+/* { dg-do compile } */
+/* { dg-additional-options "-O0" } */
+/* { dg-options "-march=armv8-a" } */
+
+#include "arm_acle.h"
+
+void
+test_hint (void)
+{
+ __nop ();
+}
+
+/* { dg-final { scan-assembler-times "\tnop" 3 } } */

Just curious, why are there 3 nops here?

Thanks
Sudi

> 
> 
> 
>

[Committed, GCC, AArch64] Disable tests for ilp32.

2019-01-10 Thread Sudakshina Das

Hi

Currently Return Address Signing is only supported in lp64. Thus the
tests that I added recently (that enables return address signing by the
mbranch-protection=standard option), should also be exempted from testing in
ilp32. This patch adds the needed dg-require-effective-target directive 
in the
tests.

*** gcc/testsuite/ChangeLog ***

2019-01-10  Sudakshina Das  

* gcc.target/aarch64/bti-1.c: Exempt for ilp32.
* gcc.target/aarch64/bti-2.c: Likewise.
* gcc.target/aarch64/bti-3.c: Likewise.

Only test directive change, hence only tested the above tests with:
RUNTESTFLAGS="--target_board \"unix{-mabi=ilp32}\" aarch64.exp="

Committed as obvious as r267818

Thanks
Sudi
diff --git a/gcc/testsuite/gcc.target/aarch64/bti-1.c b/gcc/testsuite/gcc.target/aarch64/bti-1.c
index 975528cbf290af421f20d8c7edaef22a6bd6..5a556b08ed15679b25676a11fe9c7a64641ee671 100644
--- a/gcc/testsuite/gcc.target/aarch64/bti-1.c
+++ b/gcc/testsuite/gcc.target/aarch64/bti-1.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* -Os to create jump table.  */
 /* { dg-options "-Os" } */
+/* { dg-require-effective-target lp64 } */
 /* If configured with --enable-standard-branch-protection, don't use
command line option.  */
 /* { dg-additional-options "-mbranch-protection=standard" { target { ! default_branch_protection } } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/bti-2.c b/gcc/testsuite/gcc.target/aarch64/bti-2.c
index 85943c3d6415b010c858cb948221e33b0d30a310..6ad89284e1b74ec92ff4661e6a71c92230450d58 100644
--- a/gcc/testsuite/gcc.target/aarch64/bti-2.c
+++ b/gcc/testsuite/gcc.target/aarch64/bti-2.c
@@ -1,4 +1,5 @@
 /* { dg-do run } */
+/* { dg-require-effective-target lp64 } */
 /* { dg-require-effective-target aarch64_bti_hw } */
 /* If configured with --enable-standard-branch-protection, don't use
command line option.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/bti-3.c b/gcc/testsuite/gcc.target/aarch64/bti-3.c
index 97cf5d37f42b9313da75481c2ceac884735ac995..9ff9f9d6be1d8708f34f50dc7303a1783c18f204 100644
--- a/gcc/testsuite/gcc.target/aarch64/bti-3.c
+++ b/gcc/testsuite/gcc.target/aarch64/bti-3.c
@@ -1,6 +1,7 @@
 /* This is a copy of gcc/testsuite/gcc.c-torture/execute/pr56982.c to test the
setjmp case of the bti pass.  */
 /* { dg-do run } */
+/* { dg-require-effective-target lp64 } */
 /* { dg-require-effective-target aarch64_bti_hw } */
 /* { dg-options "--save-temps -mbranch-protection=standard" } */

Re: [PATCH, GCC, AARCH64, 5/6] Enable BTI : Add new pass for BTI.

2019-01-09 Thread Sudakshina Das

Hi

On 20/12/18 16:40, Sudakshina Das wrote:
> Hi James
> 
> On 19/12/18 3:40 PM, James Greenhalgh wrote:
>> On Fri, Dec 14, 2018 at 10:09:03AM -0600, Sudakshina Das wrote:
>>
>> 
>>
>>> I have updated the patch according to our discussions offline.
>>> The md pattern is now split into 4 patterns and i have added a new
>>> test for the setjmp case along with some comments where missing.
>>
>> This is OK for trunk.
>>
> 
> Thanks for the approvals. With this my series is ready to go in trunk. I 
> will wait for Sam's options patch to go in trunk before I commit mine.
> 

Series is committed with a rebase without Sam Tebbs's 3rd patch for 
B-Key addition as r267765 to r267770.

Thanks
Sudi

> Thanks
> Sudi
> 
>> Thanks,
>> James
>>
>>> *** gcc/ChangeLog ***
>>>
>>> 2018-xx-xx  Sudakshina Das  
>>>     Ramana Radhakrishnan  
>>>
>>> * config.gcc (aarch64*-*-*): Add aarch64-bti-insert.o.
>>> * gcc/config/aarch64/aarch64.h: Update comment for
>>> TRAMPOLINE_SIZE.
>>> * config/aarch64/aarch64.c (aarch64_asm_trampoline_template):
>>> Update if bti is enabled.
>>> * config/aarch64/aarch64-bti-insert.c: New file.
>>> * config/aarch64/aarch64-passes.def (INSERT_PASS_BEFORE): Insert
>>> bti pass.
>>> * config/aarch64/aarch64-protos.h (make_pass_insert_bti):
>>> Declare the new bti pass.
>>> * config/aarch64/aarch64.md (unspecv): Add UNSPECV_BTI_NOARG,
>>> UNSPECV_BTI_C, UNSPECV_BTI_J and UNSPECV_BTI_JC.
>>> (bti_noarg, bti_j, bti_c, bti_jc): New define_insns.
>>> * config/aarch64/t-aarch64: Add rule for aarch64-bti-insert.o.
>>>
>>> *** gcc/testsuite/ChangeLog ***
>>>
>>> 2018-xx-xx  Sudakshina Das  
>>>
>>> * gcc.target/aarch64/bti-1.c: New test.
>>> * gcc.target/aarch64/bti-2.c: New test.
>>> * gcc.target/aarch64/bti-3.c: New test.
>>> * lib/target-supports.exp
>>> (check_effective_target_aarch64_bti_hw): Add new check for
>>> BTI hw.
>>>
>>> Thanks
>>> Sudi

Re: [PATCH][GCC][Aarch64] Change expected bfxil count in gcc.target/aarch64/combine_bfxil.c to 18 (PR/87763)

2019-01-04 Thread Sudakshina Das

Hi Sam

On 04/01/19 10:26, Sam Tebbs wrote:
> 
> On 12/19/18 4:47 PM, Sam Tebbs wrote:
> 
>> Hi all,
>>
>> Since r265398 (combine: Do not combine moves from hard registers), the bfxil
>> scan in gcc.target/aarch64/combine_bfxil.c has been failing.
>>
>> FAIL: gcc.target/aarch64/combine_bfxil.c scan-assembler-times bfxil\\t 13
>>
>> This is because bfi was generated for the combine_* functions in the
>> above test,
>> but as of r265398, bfxil is preferred over bfi and so the bfxil count has
>> increased. This patch increases the scan count to 18 to account for this so
>> that the test passes.
>>
>> Before r265398
>>
>> combine_zero_extended_int:
>>    bfxil   x0, x1, 0, 16
>>    ret
>>
>> combine_balanced:
>>    bfi x0, x1, 0, 32
>>    ret
>>
>> combine_minimal:
>>    bfi x0, x1, 0, 1
>>    ret
>>
>> combine_unbalanced:
>>    bfi x0, x1, 0, 24
>>    ret
>>
>> combine_balanced_int:
>>    bfi w0, w1, 0, 16
>>    ret
>>
>> combine_unbalanced_int:
>>    bfi w0, w1, 0, 8
>>    ret
>>
>> With r265398
>>
>> combine_zero_extended_int:
>>    bfxil   x0, x1, 0, 16
>>    ret
>>
>> combine_balanced:
>>    bfxil   x0, x1, 0, 32
>>    ret
>>
>> combine_minimal:
>>    bfxil   x0, x1, 0, 1
>>    ret
>>
>> combine_unbalanced:
>>    bfxil   x0, x1, 0, 24
>>    ret
>>
>> combine_balanced_int:
>>    bfxil   w0, w1, 0, 16
>>    ret
>>
>> combine_unbalanced_int:
>>    bfxil   w0, w1, 0, 8
>>    ret
>>
>> These bfxil and bfi invocations are equivalent, so this patch won't hide any
>> incorrect code-gen.
>>
>> Bootstrapped on aarch64-none-linux-gnu and regression tested on
>> aarch64-none-elf with no regressions.
>>
>> OK for trunk?
>>

I am not a maintainer but this looks ok to me on its own. However I see 
that you commented about this patch on PR87763. Can you please add the 
PR tag in your changelog entry. Also since I did not see anyone else 
comment on the PR after your comment, I am adding some of the people 
from the PR to the cc list.

Thanks
Sudi

>> gcc/testsuite/Changelog:
>>
>> 2018-12-19  Sam Tebbs  
>>
>>    * gcc.target/aarch64/combine_bfxil.c: Change
>> scan-assembler-times bfxil count to 18.
> ping
>

Re: Fix devirtualiation in expanded thunks

2018-12-28 Thread Sudakshina Das

Hi Jan

On 21/12/18 7:20 PM, Jan Hubicka wrote:
> Hi,
> this patch fixes polymorphic call analysis in thunks.  Unlike normal
> methods, thunks take THIS pointer offsetted by a known constant. This
> needs t be compensated for when calculating address of outer type.
>
> Bootstrapped/regtested x86_64-linux, also tested with Firefox where this
> bug trigger misoptimization in spellchecker.  I plan to backport it to
> release branches soon.
>
> Honza
>
>   PR ipa/88561
>   * ipa-polymorphic-call.c
>   (ipa_polymorphic_call_context::ipa_polymorphic_call_context): Handle
>   arguments of thunks correctly.
>   (ipa_polymorphic_call_context::get_dynamic_context): Be ready for
>   NULL instance pinter.
>   * lto-cgraph.c (lto_output_node): Always stream thunk info.
>   * g++.dg/tree-prof/devirt.C: New testcase.
> Index: ipa-polymorphic-call.c
> ===
> --- ipa-polymorphic-call.c(revision 267325)
> +++ ipa-polymorphic-call.c(working copy)
> @@ -995,9 +995,22 @@ ipa_polymorphic_call_context::ipa_polymo
>   {
> outer_type
>= TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (base_pointer)));
> +   cgraph_node *node = cgraph_node::get (current_function_decl);
> gcc_assert (TREE_CODE (outer_type) == RECORD_TYPE
> || TREE_CODE (outer_type) == UNION_TYPE);
>   
> +   /* Handle the case we inlined into a thunk.  In this case
> +  thunk has THIS pointer of type bar, but it really receives
> +  address to its base type foo which sits in bar at
> +  0-thunk.fixed_offset.  It starts with code that adds
> +  think.fixed_offset to the pointer to compensate for this.
> +
> +  Because we walked all the way to the begining of thunk, we now
> +  see pointer _offset and need to compensate
> +  for it.  */
> +   if (node->thunk.fixed_offset)
> + offset -= node->thunk.fixed_offset * BITS_PER_UNIT;
> +
> /* Dynamic casting has possibly upcasted the type
>in the hiearchy.  In this case outer type is less
>informative than inner type and we should forget
> @@ -1005,7 +1018,11 @@ ipa_polymorphic_call_context::ipa_polymo
> if ((otr_type
>  && !contains_type_p (outer_type, offset,
>   otr_type))
> -   || !contains_polymorphic_type_p (outer_type))
> +   || !contains_polymorphic_type_p (outer_type)
> +   /* If we compile thunk with virtual offset, the THIS pointer
> +  is adjusted by unknown value.  We can't thus use outer info
> +  at all.  */
> +   || node->thunk.virtual_offset_p)
>   {
> outer_type = NULL;
> if (instance)
> @@ -1030,7 +1047,15 @@ ipa_polymorphic_call_context::ipa_polymo
> maybe_in_construction = false;
>   }
> if (instance)
> - *instance = base_pointer;
> + {
> +   /* If method is expanded thunk, we need to apply thunk offset
> +  to instance pointer.  */
> +   if (node->thunk.virtual_offset_p
> +   || node->thunk.fixed_offset)
> + *instance = NULL;
> +   else
> + *instance = base_pointer;
> + }
> return;
>   }
> /* Non-PODs passed by value are really passed by invisible
> @@ -1547,6 +1572,9 @@ ipa_polymorphic_call_context::get_dynami
> HOST_WIDE_INT instance_offset = offset;
> tree instance_outer_type = outer_type;
>   
> +  if (!instance)
> +return false;
> +
> if (otr_type)
>   otr_type = TYPE_MAIN_VARIANT (otr_type);
>   
> Index: lto-cgraph.c
> ===
> --- lto-cgraph.c  (revision 267325)
> +++ lto-cgraph.c  (working copy)
> @@ -547,7 +547,11 @@ lto_output_node (struct lto_simple_outpu
> streamer_write_bitpack ();
> streamer_write_data_stream (ob->main_stream, section, strlen (section) + 
> 1);
>   
> -  if (node->thunk.thunk_p)
> +  /* Stream thunk info always because we use it in
> + ipa_polymorphic_call_context::ipa_polymorphic_call_context
> + to properly interpret THIS pointers for thunks that has been converted
> + to Gimple.  */
> +  if (node->definition)
>   {
> streamer_write_uhwi_stream
>(ob->main_stream,
> @@ -1295,7 +1299,7 @@ input_node (struct lto_file_decl_data *f
> if (section)
>   node->set_section_for_node (section);
>   
> -  if (node->thunk.thunk_p)
> +  if (node->definition)
>   {
> int type = streamer_read_uhwi (ib);
> HOST_WIDE_INT fixed_offset = streamer_read_uhwi (ib);
> Index: testsuite/g++.dg/tree-prof/devirt.C
> ===
> --- testsuite/g++.dg/tree-prof/devirt.C   (nonexistent)
> +++ testsuite/g++.dg/tree-prof/devirt.C   (working copy)
> @@ -0,0

Re: GCC 8 backports

2018-12-28 Thread Sudakshina Das

Hi Martin

On 27/12/18 12:32 PM, Martin Liška wrote:
> On 11/20/18 11:58 AM, Martin Liška wrote:
>> On 10/3/18 11:23 AM, Martin Liška wrote:
>>> On 9/25/18 8:48 AM, Martin Liška wrote:
 Hi.

 One more tested patch.

 Martin

>>> One more tested patch.
>>>
>>> Martin
>>>
>> Hi.
>>
>> One another tested patch that I'm going to install.
>>
>> Martin
>>
> Hi.
>
> One another tested patch that I'm going to install.
>
> Thanks,
> Martin

The last backport of r267338 causes the following failures on 
arm-none-linux-gnueabihf and aarch64-none-linux-gnu

UNRESOLVED: g++.dg/tree-prof/devirt.C scan-ipa-dump-times dom3 "3" 
folding virtual function call to virtual unsigned int 
mozPersonalDictionary::AddRef
UNRESOLVED: g++.dg/tree-prof/devirt.C scan-ipa-dump-times dom3 "3" 
folding virtual function call to virtual unsigned int 
mozPersonalDictionary::_ZThn16

with

g++.dg/tree-prof/devirt.C: dump file does not exist

Thanks

Sudi

Re: [PATCH] PR fortran/81509 and fortran/45513

2018-12-28 Thread Sudakshina Das

Hi Steve

On 27/12/18 8:58 PM, Steve Kargl wrote:
> On Thu, Dec 27, 2018 at 11:24:07AM +0000, Sudakshina Das wrote:
>> With the failure as:
>>
>> Excess errors:
>> /build/src/gcc/libgomp/testsuite/libgomp.fortran/aligned1.f03:55:14:
>> Error: Arguments of 'iand' have different kind type parameters at (1)
>> /build/src/gcc/libgomp/testsuite/libgomp.fortran/aligned1.f03:59:14:
>> Error: Arguments of 'iand' have different kind type parameters at (1)
>>
> This should be fixed, now.  Sorry about the breakage.

Thanks for the quick fix!

Sudi

Re: [PATCH] PR fortran/81509 and fortran/45513

2018-12-27 Thread Sudakshina Das

Hi Steve

On 23/12/18 6:49 PM, Steve Kargl wrote:
> This is a re-submission of a patch I submitted 15 months ago.
> See https://gcc.gnu.org/ml/fortran/2017-09/msg00124.html
>
> At that time one reviewer OK'd the patch for committing,
> and one reviewer raised objections to the patch as I
> chose to remove dubious extensions to the Fortran standard.
> I withdrew that patch with the expection that Someone
> would fix the bug.  Well, Someone has not materialized.
>
> The patch has been retested on i586-*-freebsd and x86_64-*-freebsd.
>
> OK to commit as-is?
>
> Here's the text from the above URL.
>
> In short, F2008 now allows boz-literal-constants in IAND, IOR, IEOR,
> DSHIFTL, DSHIFTR, and MERGE_BITS.  gfortran currently allows a BOZ
> argument, but she was not enforcing restrictions in F2008.  The
> attach patch causes gfortran to conform to F2008.
>
> As a side effect, the patch removes a questionable GNU Fortran
> extension that allowed arguments to IAND, IOR, and IEOR to have
> different kind type parameters.  The behavior of this extension
> was not documented.
>
> 2017-09-27  Steven G. Kargl  
>
>   PR fortran/45513
>   PR fortran/81509
>   * check.c: Rename function gfc_check_iand to gfc_check_iand_ieor_ior.
>   * check.c (boz_args_check): New function.  Check I and J not both BOZ.
>   (gfc_check_dshift,gfc_check_iand_ieor_ior, gfc_check_ishft,
>gfc_check_and, gfc_check_merge_bits): Use it.
>   * check.c (gfc_check_iand_ieor_ior): Force conversion of BOZ to kind
>   type of other agrument.  Remove silly GNU extension.
>   (gfc_check_ieor, gfc_check_ior): Delete now unused functions.
>   * intrinsic.c (add_functions): Use gfc_check_iand_ieor_ior. Wrap long
>   line.
>   * intrinsic.h: Rename gfc_check_iand to gfc_check_iand_ieor_ior.
>   Delete prototype for bool gfc_check_ieor and gfc_check_ior
>   * intrinsic.texi: Update documentation for boz-literal-constant.
>
> 2017-09-27  Steven G. Kargl  
>
>   PR fortran/45513
>   PR fortran/81509
>   * gfortran.dg/graphite/id-26.f03: Fix non-conforming use of IAND.
>   * gfortran.dg/pr81509_1.f90: New test.
>   * gfortran.dg/pr81509_2.f90: New test.
>
This patch has caused the following failures on aarch64-none-linux-gnu:

FAIL: libgomp.fortran/aligned1.f03   -O0  (test for excess errors)
FAIL: libgomp.fortran/aligned1.f03   -O1  (test for excess errors)
FAIL: libgomp.fortran/aligned1.f03   -O2  (test for excess errors)
FAIL: libgomp.fortran/aligned1.f03   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  (test for 
excess errors)
FAIL: libgomp.fortran/aligned1.f03   -O3 -g  (test for excess errors)
FAIL: libgomp.fortran/aligned1.f03   -Os  (test for excess errors)

With the failure as:

Excess errors:
/build/src/gcc/libgomp/testsuite/libgomp.fortran/aligned1.f03:55:14: 
Error: Arguments of 'iand' have different kind type parameters at (1)
/build/src/gcc/libgomp/testsuite/libgomp.fortran/aligned1.f03:59:14: 
Error: Arguments of 'iand' have different kind type parameters at (1)

Thanks

Sudi

Re: [Committed] XFAIL gfortran.dg/ieee/ieee_9.f90

2018-12-27 Thread Sudakshina Das

Hi

On 25/12/18 5:13 PM, Steve Kargl wrote:
> On Tue, Dec 25, 2018 at 09:51:03AM +0200, Janne Blomqvist wrote:
>> On Mon, Dec 24, 2018 at 9:42 PM Steve Kargl <
>> s...@troutmask.apl.washington.edu> wrote:
>>
>>> On Mon, Dec 24, 2018 at 09:29:50PM +0200, Janne Blomqvist wrote:
 On Mon, Dec 24, 2018 at 8:05 PM Steve Kargl <
 s...@troutmask.apl.washington.edu> wrote:

> I've added the following patch to a recently committed testcase.
>
> Index: gcc/testsuite/gfortran.dg/ieee/ieee_9.f90
> ===
> --- gcc/testsuite/gfortran.dg/ieee/ieee_9.f90   (revision 267413)
> +++ gcc/testsuite/gfortran.dg/ieee/ieee_9.f90   (working copy)
> @@ -1,4 +1,4 @@
> -! { dg-do run }
> +! { dg-do run { xfail arm*-*-gnueabi arm*-*-gnueabihf } }
>   program foo
>  use ieee_arithmetic
>  use iso_fortran_env
>
 The problem seems to be that GFortran says the real128 kind value is > 0
 (i.e. that the target supports quad precision floating point (with
>>> software
 emulation, presumably)), but then trying to use it fails.

 Would be nice if somebody who cares about arm-none-linux-gnueabihf could
 help figure out the proper resolution instead of papering over it with
 XFAIL.

 But I guess XFAIL is good enough until said somebody turns up.

>>> Thanks for chasing down the details.  I have no access to arm*-*-*.
>>>
>>> It's a shame the real128 is defined, and arm*-*-* doesn't
>>> actually use it.  I certainly have no time or interest in
>>> fix this.
>>>
>> I think there are arm systems on the compile farm, but I haven't actually
>> checked myself, just going by the error messages Sudi Das reported.
>>
>> That being said, having slept over it, I actually think there is a problem
>> with the testcase, and not with arm*. So the errors in the testcase occurs
>> in code like
>>
>> if (real128 > 0) then
>>  p = int(ieee_scalb(real(x, real128), int(i, int8)))
>>  if (p /= 64) stop 3
>> end if
>>
>> So if real128 is negative, as it should be if the target doesn't support
>> quad precision float, the branch will never be taken, but the frontend will
>> still generate code for it (though it will later be optimized away as
>> unreachable), and that's where the error occurs. So the testcase would need
>> something like
>>
>> integer, parameter :: large_real = max (real64, real128)
>> ! ...
>> if (real128 > 0) then
>>  p = int(ieee_scalb(real(x, large_real), int(i, int8)))
>>  if (p /= 64) stop 3
>> end if
>>
>> If you concur, please consider a patch fixing the testcase and removing the
>> xfail pre-approved.
>>
> Indeed, you are probably correct that gfortran will generate
> intermediate code and then garbage collect it.  This then will
> give an error for real(..., real128) in the statement for p.
> If real128 /= 4, 8, 10, or 16.  I'll fix the testcase.
>
> Do you know if we can get gfortran to pre-define macros for cpp?
> That is, it would be nice if gfortran would recognize, say,
> HAVE_GFC_REAL_10 and HAVE_GFC_REAL_16 if the target supports those
> types.  Then the testcase could be copied to ieee_9.F90, and
> modified to
>
> #ifdef HAVE_REAL_16
>  p = int(ieee_scalb(real(x, 16), int(i, int8)))
>  if (p /= 64) stop 3
> #endif
>
Thanks for looking into this. Sorry I was on holiday for Christmas. 
CC'ing Arm maintainers in case they have something to add.

Thanks

Sudi

Re: Fix devirtualization with LTO

2018-12-24 Thread Sudakshina Das

Hi Jan

On 22/12/18 8:08 PM, Jan Hubicka wrote:
> Hi,
> while fixing Firefox issues I also noticed that type simplification
> completely disabled type based devirtualization on LTO path. Problem
> is that method pointers now point to simplified type and
> obj_type_ref_class is not ready for that.
>
> I also moved testcases where it makes sense to lto so this does not
> happen again. This is not trivial task since one needs to work out why
> testcases behaves differently when they do, so I will follow up on this
> and convert more.
>
> Bootstrapped/regtested x86_64-linux, comitted.
>
> Honza
>
>   * tree.c: (obj_type_ref_class): Move to...
>   * ipa-devirt.c (obj_type_ref_class): Move to here; lookup main
>   odr type.
>   (get_odr_type): Compensate for type simplification.
>
>   * g++.dg/ipa/devirt-30.C: Add dg-do.
>   * g++.dg/lto/devirt-1_0.C: New testcase.
>   * g++.dg/lto/devirt-2_0.C: New testcase.
>   * g++.dg/lto/devirt-3_0.C: New testcase.
>   * g++.dg/lto/devirt-4_0.C: New testcase.
>   * g++.dg/lto/devirt-5_0.C: New testcase.
>   * g++.dg/lto/devirt-6_0.C: New testcase.
>   * g++.dg/lto/devirt-13_0.C: New testcase.
>   * g++.dg/lto/devirt-14_0.C: New testcase.
>   * g++.dg/lto/devirt-19_0.C: New testcase.
>   * g++.dg/lto/devirt-22_0.C: New testcase.
>   * g++.dg/lto/devirt-23_0.C: New testcase.
>   * g++.dg/lto/devirt-30_0.C: New testcase.
>   * g++.dg/lto/devirt-34_0.C: New testcase.
>   
I am seeing the following failures on aarch64-none-elf, 
aarch64-none-linux-gnu, aarch64_be-none-elf, arm-none-eabi, 
arm-none-linux-gnueabihf:

UNRESOLVED: g++-dg-lto-devirt-13-01.exe scan-tree-dump-times ssa 
"OBJ_TYPE_REF" 0
UNRESOLVED: g++-dg-lto-devirt-13-11.exe scan-tree-dump-times ssa 
"OBJ_TYPE_REF" 0
UNRESOLVED: g++-dg-lto-devirt-13-21.exe scan-tree-dump-times ssa 
"OBJ_TYPE_REF" 0
UNRESOLVED: g++-dg-lto-devirt-14-01.exe scan-tree-dump-not ssa "A.*foo"
UNRESOLVED: g++-dg-lto-devirt-14-11.exe scan-tree-dump-not ssa "A.*foo"
UNRESOLVED: g++-dg-lto-devirt-23-01.exe scan-wpa-ipa-dump cp "Discovered 
a virtual call to"

With an error like:

g++-dg-lto-devirt-14-11.exe: dump file does not exist

In my brief attempt, I can see that the scan-dump* routines are 
computing the wrong base name. I get the following if I edit

diff --git a/gcc/testsuite/lib/scandump.exp 
b/gcc/testsuite/lib/scandump.exp
index 3d42692..5961623 100644
--- a/gcc/testsuite/lib/scandump.exp
+++ b/gcc/testsuite/lib/scandump.exp
@@ -160,7 +160,7 @@ proc scan-dump-not { args } {
  set dumpbase [dump-base $src [lindex $args 3]]
  set output_file "[glob -nocomplain $dumpbase.[lindex $args 2]]"
  if { $output_file == "" } {
-   verbose -log "$testcase: dump file does not exist"
+   verbose -log "$testcase: dump file does not exist $dumpbase"
     unresolved "$testname"
     return
  }

g++-dg-lto-devirt-14-11.exe: dump file does not exist 
g++-dg-lto-devirt-14-11.exe

UNRESOLVED: g++-dg-lto-devirt-14-11.exe scan-tree-dump-not ssa "A.*foo

Thanks

Sudi

> Index: ipa-devirt.c
> ===
> --- ipa-devirt.c  (revision 267337)
> +++ ipa-devirt.c  (working copy)
> @@ -1985,6 +1985,30 @@ add_type_duplicate (odr_type val, tree t
> return build_bases;
>   }
>   
> +/* REF is OBJ_TYPE_REF, return the class the ref corresponds to.  */
> +
> +tree
> +obj_type_ref_class (const_tree ref)
> +{
> +  gcc_checking_assert (TREE_CODE (ref) == OBJ_TYPE_REF);
> +  ref = TREE_TYPE (ref);
> +  gcc_checking_assert (TREE_CODE (ref) == POINTER_TYPE);
> +  ref = TREE_TYPE (ref);
> +  /* We look for type THIS points to.  ObjC also builds
> + OBJ_TYPE_REF with non-method calls, Their first parameter
> + ID however also corresponds to class type. */
> +  gcc_checking_assert (TREE_CODE (ref) == METHOD_TYPE
> +|| TREE_CODE (ref) == FUNCTION_TYPE);
> +  ref = TREE_VALUE (TYPE_ARG_TYPES (ref));
> +  gcc_checking_assert (TREE_CODE (ref) == POINTER_TYPE);
> +  tree ret = TREE_TYPE (ref);
> +  if (!in_lto_p)
> +ret = TYPE_CANONICAL (ret);
> +  else
> +ret = get_odr_type (ret)->type;
> +  return ret;
> +}
> +
>   /* Get ODR type hash entry for TYPE.  If INSERT is true, create
>  possibly new entry.  */
>   
> @@ -2000,6 +2024,8 @@ get_odr_type (tree type, bool insert)
> int base_id = -1;
>   
> type = TYPE_MAIN_VARIANT (type);
> +  if (!in_lto_p)
> +type = TYPE_CANONICAL (type);
>   
> gcc_checking_assert (can_be_name_hashed_p (type)
>  || can_be_vtable_hashed_p (type));
> Index: testsuite/g++.dg/ipa/devirt-30.C
> ===
> --- testsuite/g++.dg/ipa/devirt-30.C  (revision 267337)
> +++ testsuite/g++.dg/ipa/devirt-30.C  (working copy)
> @@ -1,4 +1,5 @@
>   // PR c++/58678
> +// { dg-do compile }
>   // { dg-options "-O3 -fdump-ipa-devirt" }
>   
>   // We shouldn't

Re: [PATCH] fortran/69121 -- Make IEEE_SCALB generic

2018-12-24 Thread Sudakshina Das

Hi Steve

On 21/12/18 8:01 PM, Steve Kargl wrote:
> On Fri, Dec 21, 2018 at 07:39:45PM +, Joseph Myers wrote:
>> On Fri, 21 Dec 2018, Steve Kargl wrote:
>>
>>> scalbln(double x, long n)
>>> {
>>>
>>>  return (scalbn(x, (n > NMAX) ? NMAX : (n < NMIN) ? NMIN : (int)n));
>>> }
>>>
>>> A search for glibc's libm locateshttps://tinyurl.com/ybcy8w4t
>>> which is a bit-twiddling routine.  Not sure it's worth the
>>> effort.  Joseph Myers might have an opinion.
>> Such comparisons are needed in the scalbn / scalbln implementations anyway
>> to deal with large exponents.  I suppose where there's a suitable scalbln
>> implementation, and you don't know if the arguments are within the range
>> of int, calling scalbln at least saves code size in the caller and avoids
>> duplicating those range checks.
>>
> I was thinking along the lines of -ffast-math and whether
> __builtin_scalbn and __builtin_scalbln are then inlined.
> The comparisons may inhibit inlining __builtin_scalbn;
> while, if gfortran used __builtin_scalbln, inlining would
> occur.
>
> As it is, for
>
> function foo(x,i)
>   use ieee_arithmetic
>   real(8) foo, c
>   integer(8) i
>   foo = ieee_scalb(c, i)
> end function foo
>
> the options -ffast-math -O3 -fdump-tree-optimized give
>
> [local count: 1073741824]:
>_gfortran_ieee_procedure_entry ();
>_8 = *i_7(D);
>_1 = MIN_EXPR <_8, 2147483647>;
>_2 = MAX_EXPR <_1, -2147483647>;
>_3 = (integer(kind=4)) _2;
>_4 = __builtin_scalbn (c_9(D), _3);
>_gfortran_ieee_procedure_exit ();
>fpstate.0 ={v} {CLOBBER};
>return _4;
>
> It seems this could be
>
> [local count: 1073741824]:
>_gfortran_ieee_procedure_entry ();
>_3 = (integer(kind=4)) *i_7(D);
>_4 = __builtin_scalbn (c_9(D), _3
>_gfortran_ieee_procedure_exit ();
>fpstate.0 ={v} {CLOBBER};
>
I am observing your new test pr88328.f90 failing on 
arm-none-linux-gnueabihf:
Excess errors:
/build/src/gcc/gcc/testsuite/gfortran.dg/ieee/ieee_9.f90:20:36: Error: 
Invalid kind for REAL at (1)
/build/src/gcc/gcc/testsuite/gfortran.dg/ieee/ieee_9.f90:35:36: Error: 
Invalid kind for REAL at (1)
/build/src/gcc/gcc/testsuite/gfortran.dg/ieee/ieee_9.f90:50:36: Error: 
Invalid kind for REAL at (1)
/build/src/gcc/gcc/testsuite/gfortran.dg/ieee/ieee_9.f90:65:36: Error: 
Invalid kind for REAL at (1)

Re: [PATCH, GCC, AARCH64, 5/6] Enable BTI : Add new pass for BTI.

2018-12-20 Thread Sudakshina Das

Hi James

On 19/12/18 3:40 PM, James Greenhalgh wrote:
> On Fri, Dec 14, 2018 at 10:09:03AM -0600, Sudakshina Das wrote:
> 
> 
> 
>> I have updated the patch according to our discussions offline.
>> The md pattern is now split into 4 patterns and i have added a new
>> test for the setjmp case along with some comments where missing.
> 
> This is OK for trunk.
> 

Thanks for the approvals. With this my series is ready to go in trunk. I 
will wait for Sam's options patch to go in trunk before I commit mine.

Thanks
Sudi

> Thanks,
> James
> 
>> *** gcc/ChangeLog ***
>>
>> 2018-xx-xx  Sudakshina Das  
>>  Ramana Radhakrishnan  
>>
>>  * config.gcc (aarch64*-*-*): Add aarch64-bti-insert.o.
>>  * gcc/config/aarch64/aarch64.h: Update comment for
>>  TRAMPOLINE_SIZE.
>>  * config/aarch64/aarch64.c (aarch64_asm_trampoline_template):
>>  Update if bti is enabled.
>>  * config/aarch64/aarch64-bti-insert.c: New file.
>>  * config/aarch64/aarch64-passes.def (INSERT_PASS_BEFORE): Insert
>>  bti pass.
>>  * config/aarch64/aarch64-protos.h (make_pass_insert_bti):
>>  Declare the new bti pass.
>>  * config/aarch64/aarch64.md (unspecv): Add UNSPECV_BTI_NOARG,
>>  UNSPECV_BTI_C, UNSPECV_BTI_J and UNSPECV_BTI_JC.
>>  (bti_noarg, bti_j, bti_c, bti_jc): New define_insns.
>>  * config/aarch64/t-aarch64: Add rule for aarch64-bti-insert.o.
>>
>> *** gcc/testsuite/ChangeLog ***
>>
>> 2018-xx-xx  Sudakshina Das  
>>
>>  * gcc.target/aarch64/bti-1.c: New test.
>>  * gcc.target/aarch64/bti-2.c: New test.
>>  * gcc.target/aarch64/bti-3.c: New test.
>>  * lib/target-supports.exp
>>  (check_effective_target_aarch64_bti_hw): Add new check for
>>  BTI hw.
>>
>> Thanks
>> Sudi

Re: [PATCH, GCC, AARCH64, 5/6] Enable BTI : Add new pass for BTI.

2018-12-14 Thread Sudakshina Das

Hi James

On 29/11/18 16:47, Sudakshina Das wrote:
> Hi
> 
> On 13/11/18 14:47, Sudakshina Das wrote:
>> Hi
>>
>> On 02/11/18 18:38, Sudakshina Das wrote:
>>> Hi
>>>
>>> This patch is part of a series that enables ARMv8.5-A in GCC and
>>> adds Branch Target Identification Mechanism.
>>> (https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools)
>>>
>>> This patch adds a new pass called "bti" which is triggered by the
>>> command line argument -mbranch-protection whenever "bti" is turned on.
>>>
>>> The pass iterates through the instructions and adds appropriated BTI
>>> instructions based on the following:
>>>* Add a new "BTI C" at the beginning of a function, unless its 
>>> already
>>>  protected by a "PACIASP/PACIBSP". We exempt the functions that are
>>>  only called directly.
>>>* Add a new "BTI J" for every target of an indirect jump, jump table
>>>  targets, non-local goto targets or labels that might be referenced
>>>  by variables, constant pools, etc (NOTE_INSN_DELETED_LABEL)
>>>
>>> Since we have already changed the use of indirect tail calls to only x16
>>> and x17, we do not have to use "BTI JC".
>>> (check patch 3/6).
>>>
>>
>> I missed out on the explanation for the changes to the trampoline code.
>> The patch also updates the trampoline code in case BTI is enabled. Since
>> the trampoline code is a target of an indirect branch, we need to add an
>> appropriate BTI instruction at the beginning of it to avoid a branch
>> target exception.
>>
>>> Bootstrapped and regression tested with aarch64-none-linux-gnu. Added
>>> new tests.
>>> Is this ok for trunk?
>>>
>>> Thanks
>>> Sudi
>>>
>>> *** gcc/ChangeLog ***
>>>
>>> 2018-xx-xx  Sudakshina Das  
>>> Ramana Radhakrishnan  
>>>
>>> * config.gcc (aarch64*-*-*): Add aarch64-bti-insert.o.
>>> * gcc/config/aarch64/aarch64.h: Update comment for
>>> TRAMPOLINE_SIZE.
>>> * config/aarch64/aarch64.c (aarch64_asm_trampoline_template):
>>> Update if bti is enabled.
>>> * config/aarch64/aarch64-bti-insert.c: New file.
>>> * config/aarch64/aarch64-passes.def (INSERT_PASS_BEFORE): Insert
>>> bti pass.
>>> * config/aarch64/aarch64-protos.h (make_pass_insert_bti):
>>> Declare the new bti pass.
>>> * config/aarch64/aarch64.md (bti_nop): Define.
>>> * config/aarch64/t-aarch64: Add rule for aarch64-bti-insert.o.
>>>
>>> *** gcc/testsuite/ChangeLog ***
>>>
>>> 2018-xx-xx  Sudakshina Das  
>>>
>>> * gcc.target/aarch64/bti-1.c: New test.
>>> * gcc.target/aarch64/bti-2.c: New test.
>>> * lib/target-supports.exp
>>> (check_effective_target_aarch64_bti_hw): Add new check for
>>> BTI hw.
>>>
>>
>> Updated patch attached with more comments and a bit of simplification
>> in aarch64-bti-insert.c. ChangeLog still applies.
>>
>> Thanks
>> Sudi
>>
> 
> I found a missed case in the bti pass and edited the patch to include
> it. This made me realize that the only 2 regressions I saw with the
> BTI enabled model can now be avoided. (as quoted below from my 6/6
> patch)
> "Bootstrapped and regression tested with aarch64-none-linux-gnu with
> and without the configure option turned on.
> Also tested on aarch64-none-elf with and without configure option with a
> BTI enabled aem. Only 2 regressions and these were because newlib
> requires patches to protect hand coded libraries with BTI."
> 
> The ChangeLog still applies.
> 
> Sudi
> 
I have updated the patch according to our discussions offline.
The md pattern is now split into 4 patterns and i have added a new
test for the setjmp case along with some comments where missing.

*** gcc/ChangeLog ***

2018-xx-xx  Sudakshina Das  
Ramana Radhakrishnan  

* config.gcc (aarch64*-*-*): Add aarch64-bti-insert.o.
        * gcc/config/aarch64/aarch64.h: Update comment for
TRAMPOLINE_SIZE.
* config/aarch64/aarch64.c (aarch64_asm_trampoline_template):
Update if bti is enabled.
* config/aarch64/aarch64-bti-insert.c: New file.
* config/aarch64/aarch64-passes.def (INSERT_PASS_BEFORE): Insert
bti pass.
* config/aarch64/aarch64-protos.h (make_pass_insert_bti):
Declare the new

Re: [PATCH, GCC, AARCH64, 3/6] Restrict indirect tail calls to x16 and x17

2018-11-29 Thread Sudakshina Das

Hi

On 02/11/18 18:37, Sudakshina Das wrote:
> Hi
> 
> This patch is part of a series that enables ARMv8.5-A in GCC and
> adds Branch Target Identification Mechanism.
> (https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools)
> 
> This patch changes the registers that are allowed for indirect tail
> calls. We are choosing to restrict these to only x16 or x17.
> 
> Indirect tail calls are special in a way that they convert a call
> statement (BLR instruction) to a jump statement (BR instruction). For
> the best possible use of Branch Target Identification Mechanism, we
> would like to place a "BTI C" (call) at the beginning of the function
> which is only compatible with BLRs and BR X16/X17. In order to make
> indirect tail calls compatible with this scenario, we are restricting
> the TAILCALL_ADDR_REGS.
> 
> In order to use x16/x17 for this purpose, we also had to change the use
> of these registers in the epilogue/prologue handling. For this purpose
> we are now using x12 and x13 named as EP0_REGNUM and EP1_REGNUM as
> scratch registers for epilogue and prologue.
> 
> Bootstrapped and regression tested with aarch64-none-linux-gnu. Updated
> test. Ran Spec2017 and no performance hit.
> 
> Is this ok for trunk?
> 
> Thanks
> Sudi
> 
> 
> *** gcc/ChangeLog***
> 
> 2018-xx-xx  Sudakshina Das  
> 
>* config/aarch64/aarch64.c (aarch64_expand_prologue): Use new
>epilogue/prologue scratch registers EP0_REGNUM and EP1_REGNUM.
>(aarch64_expand_epilogue): Likewise.
>(aarch64_output_mi_thunk): Likewise
>* config/aarch64/aarch64.h (REG_CLASS_CONTENTS): Change
>   TAILCALL_ADDR_REGS
>to x16 and x17.
>    * config/aarch64/aarch64.md: Define EP0_REGNUM and EP1_REGNUM.
> 
> *** gcc/testsuite/ChangeLog ***
> 
> 2018-xx-xx  Sudakshina Das  
> 
>* gcc.target/aarch64/test_frame_17.c: Update to check for
>   EP0_REGNUM instead of IP0_REGNUM and add test case.
> 
I have edited the patch to take out a change that was not needed as part
of this patch in aarch64_expand_epilogue. The only change now happening
there is as mentioned in the ChangeLog to replace the uses of IP0/IP1.
ChangeLog still applies.

Thanks
Sudi
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 4bec6bd963d91c475a4e18f883955093e9268cfd..cc95be32d40268d3647c8280188f17ff8212a156 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -586,7 +586,7 @@ enum reg_class
 #define REG_CLASS_CONTENTS		\
 {	\
   { 0x, 0x, 0x },	/* NO_REGS */		\
-  { 0x0004, 0x, 0x },	/* TAILCALL_ADDR_REGS */\
+  { 0x0003, 0x, 0x },	/* TAILCALL_ADDR_REGS */\
   { 0x7fff, 0x, 0x0003 },	/* GENERAL_REGS */	\
   { 0x8000, 0x, 0x },	/* STACK_REG */		\
   { 0x, 0x, 0x0003 },	/* POINTER_REGS */	\
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index da7430f1fd88566c4f017a1b491f8de7dce724e8..f4ff300b883ce832335a4915b22bcbfefe64d9ae 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5357,8 +5357,8 @@ aarch64_expand_prologue (void)
 	aarch64_emit_probe_stack_range (get_stack_check_protect (), frame_size);
 }
 
-  rtx ip0_rtx = gen_rtx_REG (Pmode, IP0_REGNUM);
-  rtx ip1_rtx = gen_rtx_REG (Pmode, IP1_REGNUM);
+  rtx tmp0_rtx = gen_rtx_REG (Pmode, EP0_REGNUM);
+  rtx tmp1_rtx = gen_rtx_REG (Pmode, EP1_REGNUM);
 
   /* In theory we should never have both an initial adjustment
  and a callee save adjustment.  Verify that is the case since the
@@ -5368,7 +5368,7 @@ aarch64_expand_prologue (void)
   /* Will only probe if the initial adjustment is larger than the guard
  less the amount of the guard reserved for use by the caller's
  outgoing args.  */
-  aarch64_allocate_and_probe_stack_space (ip0_rtx, ip1_rtx, initial_adjust,
+  aarch64_allocate_and_probe_stack_space (tmp0_rtx, tmp1_rtx, initial_adjust,
 	  true, false);
 
   if (callee_adjust != 0)
@@ -5386,7 +5386,7 @@ aarch64_expand_prologue (void)
 	}
   aarch64_add_offset (Pmode, hard_frame_pointer_rtx,
 			  stack_pointer_rtx, callee_offset,
-			  ip1_rtx, ip0_rtx, frame_pointer_needed);
+			  tmp1_rtx, tmp0_rtx, frame_pointer_needed);
   if (frame_pointer_needed && !frame_size.is_constant ())
 	{
 	  /* Variable-sized frames need to describe the save slot
@@ -5428,7 +5428,7 @@ aarch64_expand_prologue (void)
 
   /* We may need to probe the final adjustment if it is larger than the guard
  that is assumed by the called.  */
-  aarch64_allocate_and_probe_stack_space (ip1_rtx, ip0_rtx, final_adjust,
+  aarch64_allocate_and_probe_stack_space (tmp1_rtx, tmp0_rtx, final_adjust

Re: [PATCH, GCC, AARCH64, 5/6] Enable BTI : Add new pass for BTI.

2018-11-29 Thread Sudakshina Das

Hi

On 13/11/18 14:47, Sudakshina Das wrote:
> Hi
> 
> On 02/11/18 18:38, Sudakshina Das wrote:
>> Hi
>>
>> This patch is part of a series that enables ARMv8.5-A in GCC and
>> adds Branch Target Identification Mechanism.
>> (https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools)
>>
>> This patch adds a new pass called "bti" which is triggered by the
>> command line argument -mbranch-protection whenever "bti" is turned on.
>>
>> The pass iterates through the instructions and adds appropriated BTI
>> instructions based on the following:
>>   * Add a new "BTI C" at the beginning of a function, unless its already
>> protected by a "PACIASP/PACIBSP". We exempt the functions that are
>> only called directly.
>>   * Add a new "BTI J" for every target of an indirect jump, jump table
>> targets, non-local goto targets or labels that might be referenced
>> by variables, constant pools, etc (NOTE_INSN_DELETED_LABEL)
>>
>> Since we have already changed the use of indirect tail calls to only x16
>> and x17, we do not have to use "BTI JC".
>> (check patch 3/6).
>>
> 
> I missed out on the explanation for the changes to the trampoline code.
> The patch also updates the trampoline code in case BTI is enabled. Since
> the trampoline code is a target of an indirect branch, we need to add an
> appropriate BTI instruction at the beginning of it to avoid a branch
> target exception.
> 
>> Bootstrapped and regression tested with aarch64-none-linux-gnu. Added
>> new tests.
>> Is this ok for trunk?
>>
>> Thanks
>> Sudi
>>
>> *** gcc/ChangeLog ***
>>
>> 2018-xx-xx  Sudakshina Das  
>>  Ramana Radhakrishnan  
>>
>>  * config.gcc (aarch64*-*-*): Add aarch64-bti-insert.o.
>>  * gcc/config/aarch64/aarch64.h: Update comment for
>>  TRAMPOLINE_SIZE.
>>  * config/aarch64/aarch64.c (aarch64_asm_trampoline_template):
>>  Update if bti is enabled.
>>  * config/aarch64/aarch64-bti-insert.c: New file.
>>  * config/aarch64/aarch64-passes.def (INSERT_PASS_BEFORE): Insert
>>  bti pass.
>>  * config/aarch64/aarch64-protos.h (make_pass_insert_bti):
>>  Declare the new bti pass.
>>  * config/aarch64/aarch64.md (bti_nop): Define.
>>  * config/aarch64/t-aarch64: Add rule for aarch64-bti-insert.o.
>>
>> *** gcc/testsuite/ChangeLog ***
>>
>> 2018-xx-xx  Sudakshina Das  
>>
>>  * gcc.target/aarch64/bti-1.c: New test.
>>  * gcc.target/aarch64/bti-2.c: New test.
>>  * lib/target-supports.exp
>>  (check_effective_target_aarch64_bti_hw): Add new check for
>>  BTI hw.
>>
> 
> Updated patch attached with more comments and a bit of simplification
> in aarch64-bti-insert.c. ChangeLog still applies.
> 
> Thanks
> Sudi
> 

I found a missed case in the bti pass and edited the patch to include
it. This made me realize that the only 2 regressions I saw with the
BTI enabled model can now be avoided. (as quoted below from my 6/6
patch)
"Bootstrapped and regression tested with aarch64-none-linux-gnu with
and without the configure option turned on.
Also tested on aarch64-none-elf with and without configure option with a
BTI enabled aem. Only 2 regressions and these were because newlib
requires patches to protect hand coded libraries with BTI."

The ChangeLog still applies.

Sudi
diff --git a/gcc/config.gcc b/gcc/config.gcc
index b108697cfc7b1c9c6dc1f30cca6fd1158182c29e..3e77f9df6ad6ca55fccca50387eab4b2501af647 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -317,7 +317,7 @@ aarch64*-*-*)
 	c_target_objs="aarch64-c.o"
 	cxx_target_objs="aarch64-c.o"
 	d_target_objs="aarch64-d.o"
-	extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o aarch64-speculation.o falkor-tag-collision-avoidance.o"
+	extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o aarch64-speculation.o falkor-tag-collision-avoidance.o aarch64-bti-insert.o"
 	target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.c"
 	target_has_targetm_common=yes
 	;;
diff --git a/gcc/config/aarch64/aarch64-bti-insert.c b/gcc/config/aarch64/aarch64-bti-insert.c
new file mode 100644
index ..be604fb2fd5df052971cc81b7e6d7760880a6b79
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-bti-insert.c
@@ -0,0 +1,236 @@
+/* Branch Target Identification for AArch64 architecture.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   Contributed by Arm Ltd.
+
+   This file is part

Re: [PATCH, GCC, AARCH64, 6/6] Enable BTI: Add configure option for BTI and PAC-RET

2018-11-13 Thread Sudakshina Das

Hi James

On 07/11/18 15:36, James Greenhalgh wrote:
> On Fri, Nov 02, 2018 at 01:38:46PM -0500, Sudakshina Das wrote:
>> Hi
>>
>> This patch is part of a series that enables ARMv8.5-A in GCC and
>> adds Branch Target Identification Mechanism.
>> (https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools)
>>
>> This patch is adding a new configure option for enabling and return
>> address signing by default with --enable-standard-branch-protection.
>> This is equivalent to -mbranch-protection=standard which would
>> imply -mbranch-protection=pac-ret+bti.
>>
>> Bootstrapped and regression tested with aarch64-none-linux-gnu with
>> and without the configure option turned on.
>> Also tested on aarch64-none-elf with and without configure option with a
>> BTI enabled aem. Only 2 regressions and these were because newlib
>> requires patches to protect hand coded libraries with BTI.
>>
>> Is this ok for trunk?
> 
> With a tweak to the comment above your changes in aarch64.c, yes this is OK.
> 
>> *** gcc/ChangeLog ***
>>
>> 2018-xx-xx  Sudakshina Das  
>>
>>  * config/aarch64/aarch64.c (aarch64_override_options): Add case to check
>>  configure option to set BTI and Return Address Signing.
>>  * configure.ac: Add --enable-standard-branch-protection and
>>  --disable-standard-branch-protection.
>>  * configure: Regenerated.
>>  * doc/install.texi: Document the same.
>>
>> *** gcc/testsuite/ChangeLog ***
>>
>> 2018-xx-xx  Sudakshina Das  
>>
>>  * gcc.target/aarch64/bti-1.c: Update test to not add command
>>  line option when configure with bti.
>>  * gcc.target/aarch64/bti-2.c: Likewise.
>>  * lib/target-supports.exp
>>  (check_effective_target_default_branch_protection):
>>  Add configure check for --enable-standard-branch-protection.
>>
> 
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index 
>> 12a55a640de4fdc5df21d313c7ea6841f1daf3f2..a1a5b7b464eaa2ce67ac66d9aea837159590aa07
>>  100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -11558,6 +11558,26 @@ aarch64_override_options (void)
>> if (!selected_tune)
>>   selected_tune = selected_cpu;
>>   
>> +  if (aarch64_enable_bti == 2)
>> +{
>> +#ifdef TARGET_ENABLE_BTI
>> +  aarch64_enable_bti = 1;
>> +#else
>> +  aarch64_enable_bti = 0;
>> +#endif
>> +}
>> +
>> +  /* No command-line option yet.  */
> 
> This is too broad. Can you narrow this down to which command line option this
> relates to, and what the expected default behaviours are (for both LP64 and
> ILP32).
> 

Updated patch attached. Return address signing is not supported for
ILP32 currently. This patch just follows that and hence the extra ILP32
check is added.

Thanks
Sudi

> Thanks,
> James
> 
>> +  if (accepted_branch_protection_string == NULL && !TARGET_ILP32)
>> +{
>> +#ifdef TARGET_ENABLE_PAC_RET
>> +  aarch64_ra_sign_scope = AARCH64_FUNCTION_NON_LEAF;
>> +  aarch64_ra_sign_key = AARCH64_KEY_A;
>> +#else
>> +  aarch64_ra_sign_scope = AARCH64_FUNCTION_NONE;
>> +#endif
>> +}
>> +
>>   #ifndef HAVE_AS_MABI_OPTION
>> /* The compiler may have been configured with 2.23.* binutils, which does
>>not have support for ILP32.  */
> 



diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b97d9e4deecf5ca33761dfd1008c39bb4b849881..e267d3441fd7f21105bfba339b69f2ecdb7595ae 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11579,6 +11579,28 @@ aarch64_override_options (void)
   if (!selected_tune)
 selected_tune = selected_cpu;
 
+  if (aarch64_enable_bti == 2)
+{
+#ifdef TARGET_ENABLE_BTI
+  aarch64_enable_bti = 1;
+#else
+  aarch64_enable_bti = 0;
+#endif
+}
+
+  /* Return address signing is currently not supported for ILP32 targets.  For
+ LP64 targets use the configured option in the absence of a command-line
+ option for -mbranch-protection.  */
+  if (!TARGET_ILP32 && accepted_branch_protection_string == NULL)
+{
+#ifdef TARGET_ENABLE_PAC_RET
+  aarch64_ra_sign_scope = AARCH64_FUNCTION_NON_LEAF;
+  aarch64_ra_sign_key = AARCH64_KEY_A;
+#else
+  aarch64_ra_sign_scope = AARCH64_FUNCTION_NONE;
+#endif
+}
+
 #ifndef HAVE_AS_MABI_OPTION
   /* The compiler may have been configured with 2.23.* binutils, which does
  not have support for ILP32.  */
diff --git a/gcc/configure b/gcc/configure
index 03461f1e2753

Re: [PATCH, GCC, AARCH64, 5/6] Enable BTI : Add new pass for BTI.

2018-11-13 Thread Sudakshina Das

Hi

On 02/11/18 18:38, Sudakshina Das wrote:
> Hi
> 
> This patch is part of a series that enables ARMv8.5-A in GCC and
> adds Branch Target Identification Mechanism.
> (https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools)
> 
> This patch adds a new pass called "bti" which is triggered by the
> command line argument -mbranch-protection whenever "bti" is turned on.
> 
> The pass iterates through the instructions and adds appropriated BTI
> instructions based on the following:
>  * Add a new "BTI C" at the beginning of a function, unless its already
>protected by a "PACIASP/PACIBSP". We exempt the functions that are
>only called directly.
>  * Add a new "BTI J" for every target of an indirect jump, jump table
>targets, non-local goto targets or labels that might be referenced
>by variables, constant pools, etc (NOTE_INSN_DELETED_LABEL)
> 
> Since we have already changed the use of indirect tail calls to only x16
> and x17, we do not have to use "BTI JC".
> (check patch 3/6).
> 

I missed out on the explanation for the changes to the trampoline code.
The patch also updates the trampoline code in case BTI is enabled. Since
the trampoline code is a target of an indirect branch, we need to add an
appropriate BTI instruction at the beginning of it to avoid a branch
target exception.

> Bootstrapped and regression tested with aarch64-none-linux-gnu. Added
> new tests.
> Is this ok for trunk?
> 
> Thanks
> Sudi
> 
> *** gcc/ChangeLog ***
> 
> 2018-xx-xx  Sudakshina Das  
>   Ramana Radhakrishnan  
> 
>   * config.gcc (aarch64*-*-*): Add aarch64-bti-insert.o.
>   * gcc/config/aarch64/aarch64.h: Update comment for
>   TRAMPOLINE_SIZE.
>   * config/aarch64/aarch64.c (aarch64_asm_trampoline_template):
>   Update if bti is enabled.
>   * config/aarch64/aarch64-bti-insert.c: New file.
>   * config/aarch64/aarch64-passes.def (INSERT_PASS_BEFORE): Insert
>   bti pass.
>   * config/aarch64/aarch64-protos.h (make_pass_insert_bti):
>   Declare the new bti pass.
>   * config/aarch64/aarch64.md (bti_nop): Define.
>   * config/aarch64/t-aarch64: Add rule for aarch64-bti-insert.o.
> 
> *** gcc/testsuite/ChangeLog ***
> 
> 2018-xx-xx  Sudakshina Das  
> 
>   * gcc.target/aarch64/bti-1.c: New test.
>   * gcc.target/aarch64/bti-2.c: New test.
>   * lib/target-supports.exp
>   (check_effective_target_aarch64_bti_hw): Add new check for
>   BTI hw.
>

Updated patch attached with more comments and a bit of simplification
in aarch64-bti-insert.c. ChangeLog still applies.

Thanks
Sudi

diff --git a/gcc/config.gcc b/gcc/config.gcc
index b108697cfc7b1c9c6dc1f30cca6fd1158182c29e..3e77f9df6ad6ca55fccca50387eab4b2501af647 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -317,7 +317,7 @@ aarch64*-*-*)
 	c_target_objs="aarch64-c.o"
 	cxx_target_objs="aarch64-c.o"
 	d_target_objs="aarch64-d.o"
-	extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o aarch64-speculation.o falkor-tag-collision-avoidance.o"
+	extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o aarch64-speculation.o falkor-tag-collision-avoidance.o aarch64-bti-insert.o"
 	target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.c"
 	target_has_targetm_common=yes
 	;;
diff --git a/gcc/config/aarch64/aarch64-bti-insert.c b/gcc/config/aarch64/aarch64-bti-insert.c
new file mode 100644
index ..15202e0def3b514bdbd1564b39a121e43e01a67f
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-bti-insert.c
@@ -0,0 +1,226 @@
+/* Branch Target Identification for AArch64 architecture.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   Contributed by Arm Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#define IN_TARGET_CODE 1
+
+#include "config.h"
+#define INCLUDE_STRING
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "target.h"
+#include "rtl.h"
+#include &qu

Re: [PATCH, GCC, AARCH64, 1/6] Enable ARMv8.5-A in gcc

2018-11-13 Thread Sudakshina Das

Hi James

On 07/11/18 15:16, James Greenhalgh wrote:
> On Fri, Nov 02, 2018 at 01:37:33PM -0500, Sudakshina Das wrote:
>> Hi
>>
>> This patch is part of a series that enables ARMv8.5-A in GCC and
>> adds Branch Target Identification Mechanism.
>> (https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools)
>>
>> This patch add the march option for armv8.5-a.
>>
>> Bootstrapped and regression tested with aarch64-none-linux-gnu.
>> Is this ok for trunk?
> 
> One minor tweak, otherwise OK.
> 
>> *** gcc/ChangeLog ***
>>
>> 2018-xx-xx  Sudakshina Das  
>>
>>  * config/aarch64/aarch64-arches.def: Define AARCH64_ARCH for
>>  ARMv8.5-A.
>>  * gcc/config/aarch64/aarch64.h (AARCH64_FL_V8_5): New.
>>  (AARCH64_FL_FOR_ARCH8_5, AARCH64_ISA_V8_5): New.
>>  * gcc/doc/invoke.texi: Document ARMv8.5-A.
> 
>> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
>> index 
>> fa9af26fd40fd23b1c9cd6da9b6300fd77089103..b324cdd2fede33af13c03362750401f9eb1c9a90
>>  100644
>> --- a/gcc/config/aarch64/aarch64.h
>> +++ b/gcc/config/aarch64/aarch64.h
>> @@ -170,6 +170,8 @@ extern unsigned aarch64_architecture_version;
>>   #define AARCH64_FL_SHA3  (1 << 18)  /* Has ARMv8.4-a SHA3 and 
>> SHA512.  */
>>   #define AARCH64_FL_F16FML (1 << 19)  /* Has ARMv8.4-a FP16 extensions. 
>>  */
>>   #define AARCH64_FL_RCPC8_4(1 << 20)  /* Has ARMv8.4-a RCPC extensions. 
>>  */
>> +/* ARMv8.5-A architecture extensions.  */
>> +#define AARCH64_FL_V8_5   (1 << 22)  /* Has ARMv8.5-A features.  */
>>   
>>   /* Statistical Profiling extensions.  */
>>   #define AARCH64_FL_PROFILE(1 << 21)
> 
> Let's keep this in order. 20, 21, 22.
> 

I have the moved the Armv8.5 stuff below. Patch attached.
If this looks ok, I will rebase 2/6 on top. Let me know if you
want me to resend the rebased 2/6 too.

Thanks
Sudi

> Thanks,
> James
> 
> 



diff --git a/gcc/config/aarch64/aarch64-arches.def b/gcc/config/aarch64/aarch64-arches.def
index a37a5553894d6ab1d629017ea204478f69d8773d..7d05cd604093d15f27e5b197803a50c45a260e6e 100644
--- a/gcc/config/aarch64/aarch64-arches.def
+++ b/gcc/config/aarch64/aarch64-arches.def
@@ -35,5 +35,6 @@ AARCH64_ARCH("armv8.1-a", generic,	 8_1A,	8,  AARCH64_FL_FOR_ARCH8_1)
 AARCH64_ARCH("armv8.2-a", generic,	 8_2A,	8,  AARCH64_FL_FOR_ARCH8_2)
 AARCH64_ARCH("armv8.3-a", generic,	 8_3A,	8,  AARCH64_FL_FOR_ARCH8_3)
 AARCH64_ARCH("armv8.4-a", generic,	 8_4A,	8,  AARCH64_FL_FOR_ARCH8_4)
+AARCH64_ARCH("armv8.5-a", generic,	 8_5A,	8,  AARCH64_FL_FOR_ARCH8_5)
 
 #undef AARCH64_ARCH
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 8ab21e7bc37c7d5ffba1a365345f70d9f501b3ac..8ce8445586f29963107848604c5e2bab8e853685 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -177,6 +177,9 @@ extern unsigned aarch64_architecture_version;
 /* Statistical Profiling extensions.  */
 #define AARCH64_FL_PROFILE(1 << 21)
 
+/* ARMv8.5-A architecture extensions.  */
+#define AARCH64_FL_V8_5	  (1 << 22)  /* Has ARMv8.5-A features.  */
+
 /* Has FP and SIMD.  */
 #define AARCH64_FL_FPSIMD (AARCH64_FL_FP | AARCH64_FL_SIMD)
 
@@ -195,6 +198,8 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_FL_FOR_ARCH8_4			\
   (AARCH64_FL_FOR_ARCH8_3 | AARCH64_FL_V8_4 | AARCH64_FL_F16FML \
| AARCH64_FL_DOTPROD | AARCH64_FL_RCPC8_4)
+#define AARCH64_FL_FOR_ARCH8_5			\
+  (AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_V8_5)
 
 /* Macros to test ISA flags.  */
 
@@ -216,6 +221,7 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_ISA_SHA3	   (aarch64_isa_flags & AARCH64_FL_SHA3)
 #define AARCH64_ISA_F16FML	   (aarch64_isa_flags & AARCH64_FL_F16FML)
 #define AARCH64_ISA_RCPC8_4	   (aarch64_isa_flags & AARCH64_FL_RCPC8_4)
+#define AARCH64_ISA_V8_5	   (aarch64_isa_flags & AARCH64_FL_V8_5)
 
 /* Crypto is an optional extension to AdvSIMD.  */
 #define TARGET_CRYPTO (TARGET_SIMD && AARCH64_ISA_CRYPTO)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 3e54087ab98049ba932caa34ba2fb135eda48396..26770c5aafda1524d63a89cacf8cc069b7c8b9b6 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -15118,8 +15118,11 @@ more feature modifiers.  This option has the form
 @option{-march=@var{arch}@r{@{}+@r{[}no@r{]}@var{feature}@r{@}*}}.
 
 The permissible values for @var{arch} are @samp{armv8-a},
-@samp{armv8.1-a}, @samp{armv8.2-a}, @samp{armv8.3-a} or @samp{armv8.4-a}
-or @var{native}.
+@samp{armv8.1-a}, @samp{armv8.2-a}, @samp{armv8.3-a}, @samp{armv8.4-a},
+@samp{armv8.5-a} or @var{native}.
+
+The value @samp{armv8.5-a} implies @samp{armv8.4-a} and enables compiler
+support for the ARMv8.5-A architecture extensions.
 
 The value @samp{armv8.4-a} implies @samp{armv8.3-a} and enables compiler
 support for the ARMv8.4-A architecture extensions.

Re: [PATCH 2/3][GCC][AARCH64] Add new -mbranch-protection option to combine pointer signing and BTI

2018-11-12 Thread Sudakshina Das

Hi Sam

On 02/11/18 17:31, Sam Tebbs wrote:
> Hi all,
> 
> The -mbranch-protection option combines the functionality of
> -msign-return-address and the BTI features new in Armv8.5 to better reflect
> their relationship. This new option therefore supersedes and deprecates the
> existing -msign-return-address option.
> 
> -mbranch-protection=[none|standard|] - Turns on different types of 
> branch
> protection available where:
> 
>   * "none": Turn of all types of branch protection
>   * "standard" : Turns on all the types of protection to their respective
> standard levels.
>   *  can be "+" separated protection types:
> 
>   * "bti" : Branch Target Identification Mechanism.
>   * "pac-ret{+leaf+b-key}": Return Address Signing. The default return
> address signing is enabled by signing functions that save the return
> address to memory (non-leaf functions will practically always do this)
> using the a-key. The optional tuning arguments allow the user to
> extend the scope of return address signing to include leaf functions
> and to change the key to b-key. The tuning arguments must proceed the
> protection type "pac-ret".
> 
> Thus -mbranch-protection=standard -> -mbranch-protection=bti+pac-ret.
> 
> Its mapping to -msign-return-address is as follows:
> 
>   * -mbranch-protection=none -> -msign-return-address=none
>   * -mbranch-protection=standard -> -msign-return-address=leaf
>   * -mbranch-protection=pac-ret -> -msign-return-address=non-leaf
>   * -mbranch-protection=pac-ret+leaf -> -msign-return-address=all
> 
> This patch implements the option's skeleton and the "none", "standard" and
> "pac-ret" types (along with its "leaf" subtype).
> 
> The previous patch in this series is here:
> https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00103.html
> 
> Bootstrapped successfully and tested on aarch64-none-elf with no regressions.
> 
> OK for trunk?
> 

Thank for doing this. I am not a maintainer so you will need a
maintainer's approval. Only nit, that I would add is that it would
be good to have more test coverage, specially for the new parsing
functions that have been added and the errors that are added.

Example checking a few valid and invalid combinations of the options
like:
-mbranch-protection=pac-ret -mbranch-protection=none //disables
everything
-mbranch-protection=leaf  //errors out
-mbranch-protection=none+pac-ret //errors out
... etc

Also instead of removing all the old deprecated options, you can keep
one (or a copy of one) to check for the deprecated warning.


diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 
e290128f535f3e6b515bff5a81fae0aa0d1c8baf..07cfe69dc3dd9161a2dd93089ccf52ef251208d2
 
100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -15221,13 +15222,18 @@ accessed using a single instruction and 
emitted after each function.  This
  limits the maximum size of functions to 1MB.  This is enabled by 
default for
  @option{-mcmodel=tiny}.

-@item -msign-return-address=@var{scope}
-@opindex msign-return-address
-Select the function scope on which return address signing will be applied.
-Permissible values are @samp{none}, which disables return address signing,
-@samp{non-leaf}, which enables pointer signing for functions which are 
not leaf
-functions, and @samp{all}, which enables pointer signing for all 
functions.  The
-default value is @samp{none}.
+@item 
-mbranch-protection=@var{none}|@var{standard}|@var{pac-ret}[+@var{leaf}]
+@opindex mbranch-protection
+Select the branch protection features to use.
+@samp{none} is the default and turns off all types of branch protection.
+@samp{standard} turns on all types of branch protection features.  If a 
feature
+has additional tuning options, then @samp{standard} sets it to its standard
+level.
+@samp{pac-ret[+@var{leaf}]} turns on return address signing to its standard
+level: signing functions that save the return address to memory (non-leaf
+functions will practically always do this) using the a-key.  The optional
+argument @samp{leaf} can be used to extend the signing to include leaf
+functions.

I am not sure if deleting the previous documentation of
-msign-retun-address is the way to go. Maybe add a "this has been
deprecated and refer to -mbranch-protection" to its description.

Thanks
Sudi

> gcc/ChangeLog:
> 
> 2018-11-02  Sam Tebbs
> 
>   * config/aarch64/aarch64.c (BRANCH_PROTEC_STR_MAX,
>   aarch64_parse_branch_protection,
>   struct aarch64_branch_protec_type,
>   aarch64_handle_no_branch_protection,
>   aarch64_handle_standard_branch_protection,
>   aarch64_validate_mbranch_protection,
>   aarch64_handle_pac_ret_protection,
>   aarch64_handle_attr_branch_protection,
>   accepted_branch_protection_string,
>   aarch64_pac_ret_subtypes,
>   aarch64_branch_protec_types,
>   aarch64_handle_pac_ret_leaf): Define.
>   (aarch64_override_options_after_change_1): Add

Re: [PATCH, GCC, ARM] Enable armv8.5-a and add +sb and +predres for previous ARMv8-a in ARM

2018-11-12 Thread Sudakshina Das

Hi Kyrill

On 09/11/18 18:21, Kyrill Tkachov wrote:
> Hi Sudi,
> 
> On 09/11/18 15:33, Sudakshina Das wrote:
>> Hi
>>
>> This patch adds -march=armv8.5-a to the Arm backend.
>> (https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools)
>>  
>>
>> Armv8.5-A also adds two new security features:
>> - Speculation Barrier instruction
>> - Execution and Data Prediction Restriction Instructions
>> These are made optional to all older Armv8-A versions. Thus we are
>> adding two new options "+sb" and "+predres" to all older Armv8-A. These
>> are passed on to the assembler and have no code generation effects and
>> have already gone in the trunk of binutils.
>>
>> Bootstrapped and regression tested with arm-none-linux-gnueabihf.
>>
>> Is this ok for trunk?
>> Sudi
>>
>> *** gcc/ChangeLog ***
>>
>> 2018-xx-xx  Sudakshina Das  
>>
>> * config/arm/arm-cpus.in (armv8_5, sb, predres): New features.
>> (ARMv8_5a): New fgroup.
>> (armv8.5-a): New arch.
>> (armv8-a, armv8.1-a, armv8.2-a, armv8.3-a, armv8.4-a): New
>> options sb and predres.
>> * config/arm/arm-tables.opt: Regenerate.
>> * config/arm/t-aprofile: Add matching rules for -march=armv8.5-a
>> * config/arm/t-arm-elf (all_v8_archs): Add armv8.5-a.
>> * config/arm/t-multilib (v8_5_a_simd_variants): New variable.
>> Add matching rules for -march=armv8.5-a and extensions.
>> * doc/invoke.texi (ARM options): Document -march=armv8.5-a.
>> Add sb and predres to all armv8-a except armv8.5-a.
>>
>> *** gcc/testsuite/ChangeLog ***
>>
>> 2018-xx-xx  Sudakshina Das  
>>
>> * gcc.target/arm/multilib.exp: Add some -march=armv8.5-a
>> combination tests.
> 
> Hi
> 
> This patch adds -march=armv8.5-a to the Arm backend.
> (https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools)
>  
> 
> Armv8.5-A also adds two new security features:
> - Speculation Barrier instruction
> - Execution and Data Prediction Restriction Instructions
> These are made optional to all older Armv8-A versions. Thus we are
> adding two new options "+sb" and "+predres" to all older Armv8-A. These
> are passed on to the assembler and have no code generation effects and
> have already gone in the trunk of binutils.
> 
> Bootstrapped and regression tested with arm-none-linux-gnueabihf.
> 
> Is this ok for trunk?
> Sudi
> 
> *** gcc/ChangeLog ***
> 
> 2018-xx-xx  Sudakshina Das
> 
>  * config/arm/arm-cpus.in (armv8_5, sb, predres): New features.
>  (ARMv8_5a): New fgroup.
>  (armv8.5-a): New arch.
>  (armv8-a, armv8.1-a, armv8.2-a, armv8.3-a, armv8.4-a): New
>  options sb and predres.
>  * config/arm/arm-tables.opt: Regenerate.
>  * config/arm/t-aprofile: Add matching rules for -march=armv8.5-a
>  * config/arm/t-arm-elf (all_v8_archs): Add armv8.5-a.
>  * config/arm/t-multilib (v8_5_a_simd_variants): New variable.
>  Add matching rules for -march=armv8.5-a and extensions.
>  * doc/invoke.texi (ARM options): Document -march=armv8.5-a.
>  Add sb and predres to all armv8-a except armv8.5-a.
> 
> *** gcc/testsuite/ChangeLog ***
> 
> 2018-xx-xx  Sudakshina Das
> 
>  * gcc.target/arm/multilib.exp: Add some -march=armv8.5-a
>  combination tests.
> 
> 
> 
> This is ok modulo a typo fix below.
> 
> Thanks,
> Kyrill
> 

Thanks. Fixed and committed as r266031.

Sudi

> 
> 
> index 
> 25788ad09851daf41038b1578307bf23b7f34a94..eba038f9d20bc54bef7bdb7fa1c0e7028d954ed7
>  
> 100644
> --- a/gcc/config/arm/t-multilib
> +++ b/gcc/config/arm/t-multilib
> @@ -70,7 +70,8 @@ v8_a_simd_variants    := $(call all_feat_combs, simd 
> crypto)
>   v8_1_a_simd_variants    := $(call all_feat_combs, simd crypto)
>   v8_2_a_simd_variants    := $(call all_feat_combs, simd fp16 fp16fml 
> crypto dotprod)
>   v8_4_a_simd_variants    := $(call all_feat_combs, simd fp16 crypto)
> -v8_r_nosimd_variants    := +crc
> +v8_5_a_simd_variants    := $(call all_feat_combs, simd fp16 crypto)
> +v8_r_nosimd_variants    := +cr5
> 
> 
> Typo, should be +crc
> 
> 
>

[PATCH, GCC, AArch64] Branch Dilution Pass

2018-11-09 Thread Sudakshina Das

Hi

I am posting this patch on behalf of Carey (cc'ed). I also have some
review comments that I will make as a reply to this later.


This implements a new AArch64 specific back-end pass that helps optimize
branch-dense code, which can be a bottleneck for performance on some Arm
cores. This is achieved by padding out the branch-dense sections of the
instruction stream with nops.

This has proven to show up to a 2.61%~ improvement on the Cortex A-72
(SPEC CPU 2006: sjeng).

The implementation includes the addition of a new RTX instruction class
FILLER_INSN, which has been white listed to allow placement of NOPs
outside of a basic block. This is to allow padding after unconditional
branches. This is favorable so that any performance gained from
diluting branches is not paid straight back via excessive eating of nops.

It was deemed that a new RTX class was less invasive than modifying
behavior in regards to standard UNSPEC nops.

## Command Line Options

Three new target-specific options are provided:
- mbranch-dilution
- mbranch-dilution-granularity={num}
- mbranch-dilution-max-branches={num}

A number of cores known to be able to benefit from this pass have been
given default tuning values for their granularity and max-branches.
Each affected core has a very specific granule size and associated
max-branch limit. This is a microarchitecture specific optimization. 
Typical usage should be -mdilute-branches with a specificed -mcpu. Cores 
with a granularity tuned to 0 will be ignored. Options are provided for 
experimentation.

## Algorithm and Heuristic

The pass takes a very simple 'sliding window' approach to the problem. 
We crawl through each instruction (starting at the first branch) and 
keep track of the number of branches within the current "granule" (or 
window). When this exceeds the max-branch value, the pass will dilute 
the current granule, inserting nops to push out some of the branches. 
The heuristic will favour unconditonal branches (for performance 
reasons), or branches that are between two other branches (in order to 
decrease the likelihood of another dilution call being needed).

Each branch type required a different method for nop insertion due to 
RTL/basic_block restrictions:

- Returning calls do not end a basic block so can be handled by emitting
a generic nop.
- Unconditional branches must be the end of a basic block, and nops 
cannot be outside of a basic block.
   Thus the need for FILLER_INSN, which allows placement outside of a 
basic block - and translates to a nop.
- For most conditional branches we've taken a simple approach and only 
handle the fallthru edge for simplicity,
   which we do by inserting a "nop block" of nops on the fallthru edge, 
mapping that back to the original destination block.
- asm gotos and pcsets are going to be tricky to analyse from a dilution 
perspective so are ignored at present.


## Changelog

gcc/testsuite/ChangeLog:

2018-11-09  Carey Williams  

* gcc.target/aarch64/branch-dilution-off.c: New test.
* gcc.target/aarch64/branch-dilution-on.c: New test.


gcc/ChangeLog:

2018-11-09  Carey Williams  

* cfgbuild.c (inside_basic_block_p): Add FILLER_INSN case.
* cfgrtl.c (rtl_verify_bb_layout): Whitelist FILLER_INSN outside
basic blocks.
* config.gcc (extra_objs): Add aarch64-branch-dilution.o.
* config/aarch64/aarch64-branch-dilution.c: New file.
* config/aarch64/aarch64-passes.def (branch-dilution): Register
pass.
* config/aarch64/aarch64-protos.h (struct tune_params): Declare
tuning parameters bdilution_gsize and bdilution_maxb.
(make_pass_branch_dilution): New declaration.
* config/aarch64/aarch64.c (generic_tunings,cortexa35_tunings,
cortexa53_tunings,cortexa57_tunings,cortexa72_tunings,
cortexa73_tunings,exynosm1_tunings,thunderxt88_tunings,
thunderx_tunings,tsv110_tunings,xgene1_tunings,
qdf24xx_tunings,saphira_tunings,thunderx2t99_tunings):
Provide default tunings for bdilution_gsize and bdilution_maxb.
* config/aarch64/aarch64.md (filler_insn): Define new insn.
* config/aarch64/aarch64.opt (mbranch-dilution,
mbranch-dilution-granularity,
mbranch-dilution-max-branches): Define new branch dilution
options.
* config/aarch64/t-aarch64 (aarch64-branch-dilution.c): New rule
for aarch64-branch-dilution.c.
* coretypes.h (rtx_filler_insn): New rtx class.
* doc/invoke.texi (mbranch-dilution,
mbranch-dilution-granularity,
mbranch-dilution-max-branches): Document branch dilution
options.
* emit-rtl.c (emit_filler_after): New emit function.
* rtl.def (FILLER_INSN): New RTL EXPR of type RTX_INSN.
* rtl.h (class GTY): New class for rtx_filler_insn.
(is_a_helper ::test): New test helper for rtx_filler_insn.
(macro FILLER_INSN_P(X)): New predicate.
* target-insns.def

[PATCH, GCC, ARM] Enable armv8.5-a and add +sb and +predres for previous ARMv8-a in ARM

2018-11-09 Thread Sudakshina Das

Hi

This patch adds -march=armv8.5-a to the Arm backend.
(https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools)
Armv8.5-A also adds two new security features:
- Speculation Barrier instruction
- Execution and Data Prediction Restriction Instructions
These are made optional to all older Armv8-A versions. Thus we are
adding two new options "+sb" and "+predres" to all older Armv8-A. These
are passed on to the assembler and have no code generation effects and
have already gone in the trunk of binutils.

Bootstrapped and regression tested with arm-none-linux-gnueabihf.

Is this ok for trunk?
Sudi

*** gcc/ChangeLog ***

2018-xx-xx  Sudakshina Das  

* config/arm/arm-cpus.in (armv8_5, sb, predres): New features.
(ARMv8_5a): New fgroup.
(armv8.5-a): New arch.
(armv8-a, armv8.1-a, armv8.2-a, armv8.3-a, armv8.4-a): New
options sb and predres.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/t-aprofile: Add matching rules for -march=armv8.5-a
* config/arm/t-arm-elf (all_v8_archs): Add armv8.5-a.
* config/arm/t-multilib (v8_5_a_simd_variants): New variable.
Add matching rules for -march=armv8.5-a and extensions.
* doc/invoke.texi (ARM options): Document -march=armv8.5-a.
Add sb and predres to all armv8-a except armv8.5-a.

*** gcc/testsuite/ChangeLog ***

2018-xx-xx  Sudakshina Das  

* gcc.target/arm/multilib.exp: Add some -march=armv8.5-a
combination tests.
diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index d82e95a226659948e59b317f07e0fd386ed674a2..e6bcc3c720b64f4c80d9bff101e756de82d760e6 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -114,6 +114,9 @@ define feature armv8_3
 # Architecture rel 8.4.
 define feature armv8_4
 
+# Architecture rel 8.5.
+define feature armv8_5
+
 # M-Profile security extensions.
 define feature cmse
 
@@ -174,6 +177,14 @@ define feature quirk_cm3_ldrd
 # (Very) slow multiply operations.  Should probably be a tuning bit.
 define feature smallmul
 
+# Speculation Barrier Instruction for v8-A architectures, added by
+# default to v8.5-A
+define feature sb
+
+# Execution and Data Prediction Restriction Instruction for
+# v8-A architectures, added by default from v8.5-A
+define feature predres
+
 # Feature groups.  Conventionally all (or mostly) upper case.
 # ALL_FPU lists all the feature bits associated with the floating-point
 # unit; these will all be removed if the floating-point unit is disabled
@@ -235,6 +246,7 @@ define fgroup ARMv8_1aARMv8a crc32 armv8_1
 define fgroup ARMv8_2aARMv8_1a armv8_2
 define fgroup ARMv8_3aARMv8_2a armv8_3
 define fgroup ARMv8_4aARMv8_3a armv8_4
+define fgroup ARMv8_5aARMv8_4a armv8_5 sb predres
 define fgroup ARMv8m_base ARMv6m armv8 cmse tdiv
 define fgroup ARMv8m_main ARMv7m armv8 cmse
 define fgroup ARMv8r  ARMv8a
@@ -505,6 +517,8 @@ begin arch armv8-a
  option crypto add FP_ARMv8 CRYPTO
  option nocrypto remove ALL_CRYPTO
  option nofp remove ALL_FP
+ option sb add sb
+ option predres add predres
 end arch armv8-a
 
 begin arch armv8.1-a
@@ -517,6 +531,8 @@ begin arch armv8.1-a
  option crypto add FP_ARMv8 CRYPTO
  option nocrypto remove ALL_CRYPTO
  option nofp remove ALL_FP
+ option sb add sb
+ option predres add predres
 end arch armv8.1-a
 
 begin arch armv8.2-a
@@ -532,6 +548,8 @@ begin arch armv8.2-a
  option nocrypto remove ALL_CRYPTO
  option nofp remove ALL_FP
  option dotprod add FP_ARMv8 DOTPROD
+ option sb add sb
+ option predres add predres
 end arch armv8.2-a
 
 begin arch armv8.3-a
@@ -547,6 +565,8 @@ begin arch armv8.3-a
  option nocrypto remove ALL_CRYPTO
  option nofp remove ALL_FP
  option dotprod add FP_ARMv8 DOTPROD
+ option sb add sb
+ option predres add predres
 end arch armv8.3-a
 
 begin arch armv8.4-a
@@ -560,8 +580,23 @@ begin arch armv8.4-a
  option crypto add FP_ARMv8 CRYPTO DOTPROD
  option nocrypto remove ALL_CRYPTO
  option nofp remove ALL_FP
+ option sb add sb
+ option predres add predres
 end arch armv8.4-a
 
+begin arch armv8.5-a
+ tune for cortex-a53
+ tune flags CO_PROC
+ base 8A
+ profile A
+ isa ARMv8_5a
+ option simd add FP_ARMv8 DOTPROD
+ option fp16 add fp16 fp16fml FP_ARMv8 DOTPROD
+ option crypto add FP_ARMv8 CRYPTO DOTPROD
+ option nocrypto remove ALL_CRYPTO
+ option nofp remove ALL_FP
+end arch armv8.5-a
+
 begin arch armv8-m.base
  tune for cortex-m23
  base 8M_BASE
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index eacee746a39912d04aa03c636f9a95e0e72ce43b..dde6e137db5598d92df6a1e69a63140146bf7372 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -377,19 +377,22 @@ EnumValue
 Enum(arm_arch) String(armv8.4-a) Value(24)
 
 EnumValue
-Enum(arm_arch) String(armv8-m.base) Value(25)
+Enum(arm_arch) String(armv8.5-a) Value(25)
 
 EnumValue
-Enum(arm_arch) String(armv8-m.main) Value(26)
+Enum(arm_arch)

Re: [PATCH, arm] Backport -- Fix ICE during thunk generation with -mlong-calls

2018-11-08 Thread Sudakshina Das

Hi Mihail

On 08/11/18 10:02, Ramana Radhakrishnan wrote:
> On 07/11/2018 17:49, Mihail Ionescu wrote:
>> Hi All,
>>
>> This is a backport from trunk for GCC 8 and 7.
>>
>> SVN revision: r264595.
>>
>> Regression tested on arm-none-eabi.
>>
>>
>> gcc/ChangeLog
>>
>> 2018-11-02  Mihail Ionescu  
>>
>>  Backport from mainiline
>>  2018-09-26  Eric Botcazou  
>>
>>  * config/arm/arm.c (arm_reorg): Skip Thumb reorg pass for thunks.
>>  (arm32_output_mi_thunk): Deal with long calls.
>>
>> gcc/testsuite/ChangeLog
>>
>> 2018-11-02  Mihail Ionescu  
>>
>>  Backport from mainiline
>>2018-09-17  Eric Botcazou  
>>
>>  * g++.dg/other/thunk2a.C: New test.
>>  * g++.dg/other/thunk2b.C: Likewise.
>>
>>
>> If everything is ok, could someone commit it on my behalf?
>>
>> Best regards,
>>   Mihail
>>
> 
> It is a regression since my rewrite of this code.
> 
> Ok to backport to the release branches, it's been on trunk for a while
> and not shown any issues - please give the release managers a day or so
> to object.
> 
> regards
> Ramana
> 

Does this fix PR87867 you reported? If yes, then it would be easier
to add the PR tag in the ChangeLog so that the ticket gets updated once
committed.

Thanks
Sudi

Re: [PATCH, GCC, AARCH64, 1/6] Enable ARMv8.5-A in gcc

2018-11-02 Thread Sudakshina Das

Hi

On 02/11/18 18:37, Sudakshina Das wrote:
> Hi
> 
> This patch is part of a series that enables ARMv8.5-A in GCC and
> adds Branch Target Identification Mechanism.
> (https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools)
>  
> 
> 
> This patch add the march option for armv8.5-a.
> 
> Bootstrapped and regression tested with aarch64-none-linux-gnu.
> Is this ok for trunk?
> 
> Thanks
> Sudi
> 
> 
> *** gcc/ChangeLog ***
> 
> 2018-xx-xx  Sudakshina Das  
> 
>  * config/aarch64/aarch64-arches.def: Define AARCH64_ARCH for
>  ARMv8.5-A.
>  * gcc/config/aarch64/aarch64.h (AARCH64_FL_V8_5): New.
>  (AARCH64_FL_FOR_ARCH8_5, AARCH64_ISA_V8_5): New.
>  * gcc/doc/invoke.texi: Document ARMv8.5-A.
> 

As per an offline chat earlier with Richard, I was supposed to send
future patch series as a reply on a single thread. Sadly I forgot to
do that this time. So I am adding links of the other patches here to
make it easy to link the series:

[PATCH, GCC, AARCH64, 2/6] Add new arch command line feaures from 
ARMv8.5-A : https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00111.html

[PATCH, GCC, AARCH64, 3/6] Restrict indirect tail calls to x16 and x17:
https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00113.html

[PATCH, GCC, AARCH64, 4/6] Enable BTI: Add new  to 
-mbranch-protection: 
https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00114.html

[PATCH, GCC, AARCH64, 5/6] Enable BTI : Add new pass for BTI:
https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00115.html

[PATCH, GCC, AARCH64, 6/6] Enable BTI: Add configure option for BTI and 
PAC-RET: https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00116.html


Sorry!
Sudi

[PATCH, GCC, AARCH64, 6/6] Enable BTI: Add configure option for BTI and PAC-RET

2018-11-02 Thread Sudakshina Das

Hi

This patch is part of a series that enables ARMv8.5-A in GCC and
adds Branch Target Identification Mechanism.
(https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools)

This patch is adding a new configure option for enabling and return
address signing by default with --enable-standard-branch-protection.
This is equivalent to -mbranch-protection=standard which would
imply -mbranch-protection=pac-ret+bti.

Bootstrapped and regression tested with aarch64-none-linux-gnu with
and without the configure option turned on.
Also tested on aarch64-none-elf with and without configure option with a
BTI enabled aem. Only 2 regressions and these were because newlib
requires patches to protect hand coded libraries with BTI.

Is this ok for trunk?

Thanks
Sudi

*** gcc/ChangeLog ***

2018-xx-xx  Sudakshina Das  

* config/aarch64/aarch64.c (aarch64_override_options): Add case to check
configure option to set BTI and Return Address Signing.
* configure.ac: Add --enable-standard-branch-protection and
--disable-standard-branch-protection.
* configure: Regenerated.
* doc/install.texi: Document the same.

*** gcc/testsuite/ChangeLog ***

2018-xx-xx  Sudakshina Das  

* gcc.target/aarch64/bti-1.c: Update test to not add command
line option when configure with bti.
* gcc.target/aarch64/bti-2.c: Likewise.
* lib/target-supports.exp
(check_effective_target_default_branch_protection):
Add configure check for --enable-standard-branch-protection.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 12a55a640de4fdc5df21d313c7ea6841f1daf3f2..a1a5b7b464eaa2ce67ac66d9aea837159590aa07 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11558,6 +11558,26 @@ aarch64_override_options (void)
   if (!selected_tune)
 selected_tune = selected_cpu;
 
+  if (aarch64_enable_bti == 2)
+{
+#ifdef TARGET_ENABLE_BTI
+  aarch64_enable_bti = 1;
+#else
+  aarch64_enable_bti = 0;
+#endif
+}
+
+  /* No command-line option yet.  */
+  if (accepted_branch_protection_string == NULL && !TARGET_ILP32)
+{
+#ifdef TARGET_ENABLE_PAC_RET
+  aarch64_ra_sign_scope = AARCH64_FUNCTION_NON_LEAF;
+  aarch64_ra_sign_key = AARCH64_KEY_A;
+#else
+  aarch64_ra_sign_scope = AARCH64_FUNCTION_NONE;
+#endif
+}
+
 #ifndef HAVE_AS_MABI_OPTION
   /* The compiler may have been configured with 2.23.* binutils, which does
  not have support for ILP32.  */
diff --git a/gcc/configure b/gcc/configure
index 03461f1e27538a3a0791c2b61b0e75c3ff1a25be..a0f95106c22ee858bbf4516f14cd9d265dede272 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -947,6 +947,7 @@ with_plugin_ld
 enable_gnu_indirect_function
 enable_initfini_array
 enable_comdat
+enable_standard_branch_protection
 enable_fix_cortex_a53_835769
 enable_fix_cortex_a53_843419
 with_glibc_version
@@ -1677,6 +1678,14 @@ Optional Features:
   --enable-initfini-array	use .init_array/.fini_array sections
   --enable-comdat enable COMDAT group support
 
+  --enable-standard-branch-protection
+  enable Branch Target Identification Mechanism and
+  Return Address Signing by default for AArch64
+  --disable-standard-branch-protection
+  disable Branch Target Identification Mechanism and
+  Return Address Signing by default for AArch64
+
+
   --enable-fix-cortex-a53-835769
   enable workaround for AArch64 Cortex-A53 erratum
   835769 by default
@@ -18529,7 +18538,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18532 "configure"
+#line 18541 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -18635,7 +18644,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18638 "configure"
+#line 18647 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -24939,6 +24948,25 @@ $as_echo "#define HAVE_AS_SMALL_PIC_RELOCS 1" >>confdefs.h
 
 fi
 
+# Enable Branch Target Identification Mechanism and Return Address
+# Signing by default.
+# Check whether --enable-standard-branch-protection was given.
+if test "${enable_standard_branch_protection+set}" = set; then :
+  enableval=$enable_standard_branch_protection;
+case $enableval in
+  yes)
+tm_defines="${tm_defines} TARGET_ENABLE_BTI=1 TARGET_ENABLE_PAC_RET=1"
+;;
+  no)
+;;
+  *)
+as_fn_error "'$enableval' is an invalid value for --enable-standard-branch-protection.\
+  Valid choices are 'yes' and 'no'." "$LINENO" 5
+

[PATCH, GCC, AARCH64, 4/6] Enable BTI: Add new to -mbranch-protection.

2018-11-02 Thread Sudakshina Das

Hi

This patch is part of a series that enables ARMv8.5-A in GCC and
adds Branch Target Identification Mechanism.
(https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools)

NOTE: This patch is dependent on Sam Tebbs patch to deprecate
-msign-return-address and add new -mbranch-protection option
https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00104.html

This pass updates the CLI of -mbranch-protection to add "bti" as a new
type of branch protection and also add it its definition of "none" and
"standard". Since the BTI instructions, just like the return address
signing instructions are in the HINT space, this option is not limited
to ARMv8.5-A architecture version.

The option does not really do anything functional.
The functional changes are in the next patch. I am initializing the 
target variable aarch64_enable_bti to 2 since I am also adding a
configure option in a later patch and a value different from 0 and 1
would help identify if its already been updated.

Bootstrapped and regression tested with aarch64-none-linux-gnu.
Is this ok for trunk?

Thanks
Sudi


*** gcc/ChangeLog ***

2018-xx-xx  Sudakshina Das  

* config/aarch64/aarch64-protos.h (aarch64_bti_enabled):
Declare.
* config/aarch64/aarch64.c
(aarch64_handle_no_branch_protection): Disable bti for
-mbranch-protection=none.
(aarch64_handle_standard_branch_protection): Enable bti for
-mbranch-protection=standard.
(aarch64_handle_bti_protection): Enable bti for "bti" in the
string to -mbranch-protection.
(aarch64_bti_enabled): Check if bti is enabled.
* config/aarch64/aarch64.opt: Declare target variable.
* doc/invoke.texi: Add bti to the -mbranch-protection
documentation.


diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index bba8204fa53083da49d00a8c2b29e62849bd233c..a5ccfe534b6c59c90bd91215f89c59d67fd88688 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -525,6 +525,7 @@ void aarch64_register_pragmas (void);
 void aarch64_relayout_simd_types (void);
 void aarch64_reset_previous_fndecl (void);
 bool aarch64_return_address_signing_enabled (void);
+bool aarch64_bti_enabled (void);
 void aarch64_save_restore_target_globals (tree);
 void aarch64_addti_scratch_regs (rtx, rtx, rtx *,
  rtx *, rtx *,
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 039aec828d7dae60918493abb0d044001ac0b366..836275ab58de894529a72be88ff226da503598dc 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1140,6 +1140,7 @@ static enum aarch64_parse_opt_result
 aarch64_handle_no_branch_protection (char* str ATTRIBUTE_UNUSED, char* rest)
 {
   aarch64_ra_sign_scope = AARCH64_FUNCTION_NONE;
+  aarch64_enable_bti = 0;
   if (rest)
 {
   error ("unexpected %<%s%> after %<%s%>", rest, str);
@@ -1154,6 +1155,7 @@ aarch64_handle_standard_branch_protection (char* str ATTRIBUTE_UNUSED,
 {
   aarch64_ra_sign_scope = AARCH64_FUNCTION_NON_LEAF;
   aarch64_ra_sign_key = AARCH64_KEY_A;
+  aarch64_enable_bti = 1;
   if (rest)
 {
   error ("unexpected %<%s%> after %<%s%>", rest, str);
@@ -1187,6 +1189,14 @@ aarch64_handle_pac_ret_b_key (char* str ATTRIBUTE_UNUSED,
   return AARCH64_PARSE_OK;
 }
 
+static enum aarch64_parse_opt_result
+aarch64_handle_bti_protection (char* str ATTRIBUTE_UNUSED,
+char* rest ATTRIBUTE_UNUSED)
+{
+  aarch64_enable_bti = 1;
+  return AARCH64_PARSE_OK;
+}
+
 static const struct aarch64_branch_protec_type aarch64_pac_ret_subtypes[] = {
   { "leaf", aarch64_handle_pac_ret_leaf, NULL, 0 },
   { "b-key", aarch64_handle_pac_ret_b_key, NULL, 0 },
@@ -1198,6 +1208,7 @@ static const struct aarch64_branch_protec_type aarch64_branch_protec_types[] = {
   { "standard", aarch64_handle_standard_branch_protection, NULL, 0 },
   { "pac-ret", aarch64_handle_pac_ret_protection, aarch64_pac_ret_subtypes,
 sizeof (aarch64_pac_ret_subtypes) / sizeof (aarch64_branch_protec_type) },
+  { "bti", aarch64_handle_bti_protection, NULL, 0 },
   { NULL, NULL, NULL, 0 }
 };
 
@@ -4581,6 +4592,13 @@ aarch64_return_address_signing_enabled (void)
 	  && cfun->machine->frame.reg_offset[LR_REGNUM] >= 0));
 }
 
+/* Return TRUE if Branch Target Identification Mechanism is enabled.  */
+bool
+aarch64_bti_enabled (void)
+{
+  return (aarch64_enable_bti == 1);
+}
+
 /* Emit code to save the callee-saved registers from register number START
to LIMIT to the stack at the location starting at offset START_OFFSET,
skipping any write-back candidates if SKIP_WB is true.  */
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 9460636d93b67af1525f028176aa78e6fed4e45f..fc2064bd688490765b977eca777245986274d268 100644

[PATCH, GCC, AARCH64, 5/6] Enable BTI : Add new pass for BTI.

2018-11-02 Thread Sudakshina Das

Hi

This patch is part of a series that enables ARMv8.5-A in GCC and
adds Branch Target Identification Mechanism.
(https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools)

This patch adds a new pass called "bti" which is triggered by the
command line argument -mbranch-protection whenever "bti" is turned on.

The pass iterates through the instructions and adds appropriated BTI 
instructions based on the following:
* Add a new "BTI C" at the beginning of a function, unless its already
  protected by a "PACIASP/PACIBSP". We exempt the functions that are
  only called directly.
* Add a new "BTI J" for every target of an indirect jump, jump table
  targets, non-local goto targets or labels that might be referenced
  by variables, constant pools, etc (NOTE_INSN_DELETED_LABEL)

Since we have already changed the use of indirect tail calls to only x16 
and x17, we do not have to use "BTI JC".
(check patch 3/6).

Bootstrapped and regression tested with aarch64-none-linux-gnu. Added 
new tests.
Is this ok for trunk?

Thanks
Sudi

*** gcc/ChangeLog ***

2018-xx-xx  Sudakshina Das  
Ramana Radhakrishnan  

* config.gcc (aarch64*-*-*): Add aarch64-bti-insert.o.
* gcc/config/aarch64/aarch64.h: Update comment for
TRAMPOLINE_SIZE.
* config/aarch64/aarch64.c (aarch64_asm_trampoline_template):
Update if bti is enabled.
* config/aarch64/aarch64-bti-insert.c: New file.
* config/aarch64/aarch64-passes.def (INSERT_PASS_BEFORE): Insert
bti pass.
* config/aarch64/aarch64-protos.h (make_pass_insert_bti):
Declare the new bti pass.
* config/aarch64/aarch64.md (bti_nop): Define.
* config/aarch64/t-aarch64: Add rule for aarch64-bti-insert.o.

*** gcc/testsuite/ChangeLog ***

2018-xx-xx  Sudakshina Das  

* gcc.target/aarch64/bti-1.c: New test.
* gcc.target/aarch64/bti-2.c: New test.
* lib/target-supports.exp
(check_effective_target_aarch64_bti_hw): Add new check for
BTI hw.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index b108697cfc7b1c9c6dc1f30cca6fd1158182c29e..3e77f9df6ad6ca55fccca50387eab4b2501af647 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -317,7 +317,7 @@ aarch64*-*-*)
 	c_target_objs="aarch64-c.o"
 	cxx_target_objs="aarch64-c.o"
 	d_target_objs="aarch64-d.o"
-	extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o aarch64-speculation.o falkor-tag-collision-avoidance.o"
+	extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o aarch64-speculation.o falkor-tag-collision-avoidance.o aarch64-bti-insert.o"
 	target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.c"
 	target_has_targetm_common=yes
 	;;
diff --git a/gcc/config/aarch64/aarch64-bti-insert.c b/gcc/config/aarch64/aarch64-bti-insert.c
new file mode 100644
index ..efd57620d8803302e03ca643b9f2495e188dc19b
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-bti-insert.c
@@ -0,0 +1,195 @@
+/* Branch Target Identification for AArch64 architecture.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   Contributed by Arm Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#define IN_TARGET_CODE 1
+
+#include "config.h"
+#define INCLUDE_STRING
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "target.h"
+#include "rtl.h"
+#include "tree.h"
+#include "memmodel.h"
+#include "gimple.h"
+#include "tm_p.h"
+#include "stringpool.h"
+#include "attribs.h"
+#include "emit-rtl.h"
+#include "gimplify.h"
+#include "gimple-iterator.h"
+#include "dumpfile.h"
+#include "rtl-iter.h"
+#include "cfgrtl.h"
+#include "tree-pass.h"
+#include "cgraph.h"
+
+namespace {
+
+const pass_data pass_data_insert_bti =
+{
+  RTL_PASS, /* type.  */
+  "bti", /* name.  */
+  OPTGROUP_NONE, /* optinfo_flags.  */
+  TV_MACH_DEP, /* tv_id.  */
+  0, /* properties_required.  */
+  0, /* properties_provided.  */
+  0, /* properties_destroyed.  */
+

[PATCH, GCC, AARCH64, 3/6] Restrict indirect tail calls to x16 and x17

2018-11-02 Thread Sudakshina Das

Hi

This patch is part of a series that enables ARMv8.5-A in GCC and
adds Branch Target Identification Mechanism.
(https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools)

This patch changes the registers that are allowed for indirect tail
calls. We are choosing to restrict these to only x16 or x17.

Indirect tail calls are special in a way that they convert a call
statement (BLR instruction) to a jump statement (BR instruction). For
the best possible use of Branch Target Identification Mechanism, we 
would like to place a "BTI C" (call) at the beginning of the function
which is only compatible with BLRs and BR X16/X17. In order to make
indirect tail calls compatible with this scenario, we are restricting 
the TAILCALL_ADDR_REGS.

In order to use x16/x17 for this purpose, we also had to change the use
of these registers in the epilogue/prologue handling. For this purpose
we are now using x12 and x13 named as EP0_REGNUM and EP1_REGNUM as
scratch registers for epilogue and prologue.

Bootstrapped and regression tested with aarch64-none-linux-gnu. Updated
test. Ran Spec2017 and no performance hit.

Is this ok for trunk?

Thanks
Sudi


*** gcc/ChangeLog***

2018-xx-xx  Sudakshina Das  

  * config/aarch64/aarch64.c (aarch64_expand_prologue): Use new
  epilogue/prologue scratch registers EP0_REGNUM and EP1_REGNUM.
  (aarch64_expand_epilogue): Likewise.
  (aarch64_output_mi_thunk): Likewise
  * config/aarch64/aarch64.h (REG_CLASS_CONTENTS): Change
TAILCALL_ADDR_REGS
  to x16 and x17.
  * config/aarch64/aarch64.md: Define EP0_REGNUM and EP1_REGNUM.

*** gcc/testsuite/ChangeLog ***

2018-xx-xx  Sudakshina Das  

  * gcc.target/aarch64/test_frame_17.c: Update to check for
EP0_REGNUM instead of IP0_REGNUM and add test case.

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 94184049c9c77d858fd5b3e2a8970a48b70f7529..8e7a8d54351cf7eb1774a474bfbfbebf58070e31 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -579,7 +579,7 @@ enum reg_class
 #define REG_CLASS_CONTENTS		\
 {	\
   { 0x, 0x, 0x },	/* NO_REGS */		\
-  { 0x0004, 0x, 0x },	/* TAILCALL_ADDR_REGS */\
+  { 0x0003, 0x, 0x },	/* TAILCALL_ADDR_REGS */\
   { 0x7fff, 0x, 0x0003 },	/* GENERAL_REGS */	\
   { 0x8000, 0x, 0x },	/* STACK_REG */		\
   { 0x, 0x, 0x0003 },	/* POINTER_REGS */	\
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 27f81b654a2bae3ddd87b99e4b7926cc588a95f5..f9a81f1734e6885662f6a9e6c97bdbcdac24211b 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5317,8 +5317,8 @@ aarch64_expand_prologue (void)
 	aarch64_emit_probe_stack_range (get_stack_check_protect (), frame_size);
 }
 
-  rtx ip0_rtx = gen_rtx_REG (Pmode, IP0_REGNUM);
-  rtx ip1_rtx = gen_rtx_REG (Pmode, IP1_REGNUM);
+  rtx tmp0_rtx = gen_rtx_REG (Pmode, EP0_REGNUM);
+  rtx tmp1_rtx = gen_rtx_REG (Pmode, EP1_REGNUM);
 
   /* In theory we should never have both an initial adjustment
  and a callee save adjustment.  Verify that is the case since the
@@ -5328,7 +5328,7 @@ aarch64_expand_prologue (void)
   /* Will only probe if the initial adjustment is larger than the guard
  less the amount of the guard reserved for use by the caller's
  outgoing args.  */
-  aarch64_allocate_and_probe_stack_space (ip0_rtx, ip1_rtx, initial_adjust,
+  aarch64_allocate_and_probe_stack_space (tmp0_rtx, tmp1_rtx, initial_adjust,
 	  true, false);
 
   if (callee_adjust != 0)
@@ -5346,7 +5346,7 @@ aarch64_expand_prologue (void)
 	}
   aarch64_add_offset (Pmode, hard_frame_pointer_rtx,
 			  stack_pointer_rtx, callee_offset,
-			  ip1_rtx, ip0_rtx, frame_pointer_needed);
+			  tmp1_rtx, tmp0_rtx, frame_pointer_needed);
   if (frame_pointer_needed && !frame_size.is_constant ())
 	{
 	  /* Variable-sized frames need to describe the save slot
@@ -5388,7 +5388,7 @@ aarch64_expand_prologue (void)
 
   /* We may need to probe the final adjustment if it is larger than the guard
  that is assumed by the called.  */
-  aarch64_allocate_and_probe_stack_space (ip1_rtx, ip0_rtx, final_adjust,
+  aarch64_allocate_and_probe_stack_space (tmp1_rtx, tmp0_rtx, final_adjust,
 	  !frame_pointer_needed, true);
 }
 
@@ -5426,8 +5426,8 @@ aarch64_expand_epilogue (bool for_sibcall)
   unsigned reg2 = cfun->machine->frame.wb_candidate2;
   rtx cfi_ops = NULL;
   rtx_insn *insn;
-  /* A stack clash protection prologue may not have left IP0_REGNUM or
- IP1_REGNUM in a usable state.  The same is true for allocations
+  /* A stack clash protection prologue may not have left EP0_REGNUM or
+ EP1_REGNUM in a usable state.  The same is true for allocations
  with an SVE component, since we then need both temporary

[PATCH, GCC, AARCH64, 2/6] Add new arch command line feaures from ARMv8.5-A

2018-11-02 Thread Sudakshina Das

Hi

This patch is part of a series that enables ARMv8.5-A in GCC and
adds Branch Target Identification Mechanism.
(https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools)

This patch add all the command line feature that are added by ARMv8.5.
Optional extensions to armv8.5-a:
+rng : Random number Generation Instructions.
+memtag : Memory Tagging Extension.

ARMv8.5-A features that are optional to older arch:
+sb : Speculation barrier instruction.
+ssbs: Speculative Store Bypass Safe instruction.
+predres: Execution and Data Prediction Restriction instructions.

All of the above only effect the assembler and have already (or almost
for a couple of cases) gone in the trunk of binutils.

Bootstrapped and regression tested with aarch64-none-linux-gnu.

Is this ok for trunk?

Thanks
Sudi

*** gcc/ChangeLog ***

2018-xx-xx  Sudakshina Das  

* config/aarch64/aarch64-option-extensions.def: Define
AARCH64_OPT_EXTENSION for memtag, rng, sb, ssbs and predres.
* gcc/config/aarch64/aarch64.h (AARCH64_FL_RNG): New.
(AARCH64_FL_MEMTAG, ARCH64_FL_SB, AARCH64_FL_SSBS): New.
(AARCH64_FL_PREDRES): New.
(AARCH64_FL_FOR_ARCH8_5): Add AARCH64_FL_SB, AARCH64_FL_SSBS and
AARCH64_FL_PREDRES by default.
* gcc/doc/invoke.texi: Document rng, memtag, sb, ssbs and
predres.

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index 69ab796a4e1a959b89ebb55b599919c442cfb088..ed669a63061ba5e1595840943176077af7e69988 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -108,4 +108,19 @@ AARCH64_OPT_EXTENSION("sve", AARCH64_FL_SVE, AARCH64_FL_FP | AARCH64_FL_SIMD | A
 /* Enabling/Disabling "profile" does not enable/disable any other feature.  */
 AARCH64_OPT_EXTENSION("profile", AARCH64_FL_PROFILE, 0, 0, "")
 
+/* Enabling/Disabling "rng" only changes "rng".  */
+AARCH64_OPT_EXTENSION("rng", AARCH64_FL_RNG, 0, 0, "")
+
+/* Enabling/Disabling "memtag" only changes "memtag".  */
+AARCH64_OPT_EXTENSION("memtag", AARCH64_FL_MEMTAG, 0, 0, "")
+
+/* Enabling/Disabling "sb" only changes "sb".  */
+AARCH64_OPT_EXTENSION("sb", AARCH64_FL_SB, 0, 0, "")
+
+/* Enabling/Disabling "ssbs" only changes "ssbs".  */
+AARCH64_OPT_EXTENSION("ssbs", AARCH64_FL_SSBS, 0, 0, "")
+
+/* Enabling/Disabling "predres" only changes "predres".  */
+AARCH64_OPT_EXTENSION("predres", AARCH64_FL_PREDRES, 0, 0, "")
+
 #undef AARCH64_OPT_EXTENSION
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index b324cdd2fede33af13c03362750401f9eb1c9a90..60325bb1b16c71e951ef18319872e8b0911e8d12 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -172,10 +172,22 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_FL_RCPC8_4(1 << 20)  /* Has ARMv8.4-a RCPC extensions.  */
 /* ARMv8.5-A architecture extensions.  */
 #define AARCH64_FL_V8_5	  (1 << 22)  /* Has ARMv8.5-A features.  */
+#define AARCH64_FL_RNG	  (1 << 23)  /* ARMv8.5-A Random Number Insns.  */
+#define AARCH64_FL_MEMTAG (1 << 24)  /* ARMv8.5-A Memory Tagging
+	Extensions.  */
 
 /* Statistical Profiling extensions.  */
 #define AARCH64_FL_PROFILE(1 << 21)
 
+/* Speculation Barrier instruction supported.  */
+#define AARCH64_FL_SB	  (1 << 25)
+
+/* Speculative Store Bypass Safe instruction supported.  */
+#define AARCH64_FL_SSBS	  (1 << 26)
+
+/* Execution and Data Prediction Restriction instructions supported.  */
+#define AARCH64_FL_PREDRES(1 << 27)
+
 /* Has FP and SIMD.  */
 #define AARCH64_FL_FPSIMD (AARCH64_FL_FP | AARCH64_FL_SIMD)
 
@@ -195,7 +207,8 @@ extern unsigned aarch64_architecture_version;
   (AARCH64_FL_FOR_ARCH8_3 | AARCH64_FL_V8_4 | AARCH64_FL_F16FML \
| AARCH64_FL_DOTPROD | AARCH64_FL_RCPC8_4)
 #define AARCH64_FL_FOR_ARCH8_5			\
-  (AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_V8_5)
+  (AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_V8_5	\
+   | AARCH64_FL_SB | AARCH64_FL_SSBS | AARCH64_FL_PREDRES)
 
 /* Macros to test ISA flags.  */
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 0cf568b60dfb0fb260ca3708ea2d7e081d20cc8b..cc7420f3a84f9cd527c582114a9a96f406b63699 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -15287,6 +15287,27 @@ Use of this option with architectures prior to Armv8.2-A is not supported.
 @item profile
 Enable the Statistical Profiling extension.  This option is only to enable the
 extension at the assembler level and does not affect code generation.
+@item rng
+Enable the Armv8.5-a Random Number instructions.  This option is only to
+enable the ex

[PATCH, GCC, AARCH64, 1/6] Enable ARMv8.5-A in gcc

2018-11-02 Thread Sudakshina Das

Hi

This patch is part of a series that enables ARMv8.5-A in GCC and
adds Branch Target Identification Mechanism.
(https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools)

This patch add the march option for armv8.5-a.

Bootstrapped and regression tested with aarch64-none-linux-gnu.
Is this ok for trunk?

Thanks
Sudi


*** gcc/ChangeLog ***

2018-xx-xx  Sudakshina Das  

* config/aarch64/aarch64-arches.def: Define AARCH64_ARCH for
ARMv8.5-A.
* gcc/config/aarch64/aarch64.h (AARCH64_FL_V8_5): New.
(AARCH64_FL_FOR_ARCH8_5, AARCH64_ISA_V8_5): New.
* gcc/doc/invoke.texi: Document ARMv8.5-A.

diff --git a/gcc/config/aarch64/aarch64-arches.def b/gcc/config/aarch64/aarch64-arches.def
index a37a5553894d6ab1d629017ea204478f69d8773d..7d05cd604093d15f27e5b197803a50c45a260e6e 100644
--- a/gcc/config/aarch64/aarch64-arches.def
+++ b/gcc/config/aarch64/aarch64-arches.def
@@ -35,5 +35,6 @@ AARCH64_ARCH("armv8.1-a", generic,	 8_1A,	8,  AARCH64_FL_FOR_ARCH8_1)
 AARCH64_ARCH("armv8.2-a", generic,	 8_2A,	8,  AARCH64_FL_FOR_ARCH8_2)
 AARCH64_ARCH("armv8.3-a", generic,	 8_3A,	8,  AARCH64_FL_FOR_ARCH8_3)
 AARCH64_ARCH("armv8.4-a", generic,	 8_4A,	8,  AARCH64_FL_FOR_ARCH8_4)
+AARCH64_ARCH("armv8.5-a", generic,	 8_5A,	8,  AARCH64_FL_FOR_ARCH8_5)
 
 #undef AARCH64_ARCH
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index fa9af26fd40fd23b1c9cd6da9b6300fd77089103..b324cdd2fede33af13c03362750401f9eb1c9a90 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -170,6 +170,8 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_FL_SHA3	  (1 << 18)  /* Has ARMv8.4-a SHA3 and SHA512.  */
 #define AARCH64_FL_F16FML (1 << 19)  /* Has ARMv8.4-a FP16 extensions.  */
 #define AARCH64_FL_RCPC8_4(1 << 20)  /* Has ARMv8.4-a RCPC extensions.  */
+/* ARMv8.5-A architecture extensions.  */
+#define AARCH64_FL_V8_5	  (1 << 22)  /* Has ARMv8.5-A features.  */
 
 /* Statistical Profiling extensions.  */
 #define AARCH64_FL_PROFILE(1 << 21)
@@ -192,6 +194,8 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_FL_FOR_ARCH8_4			\
   (AARCH64_FL_FOR_ARCH8_3 | AARCH64_FL_V8_4 | AARCH64_FL_F16FML \
| AARCH64_FL_DOTPROD | AARCH64_FL_RCPC8_4)
+#define AARCH64_FL_FOR_ARCH8_5			\
+  (AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_V8_5)
 
 /* Macros to test ISA flags.  */
 
@@ -213,6 +217,7 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_ISA_SHA3	   (aarch64_isa_flags & AARCH64_FL_SHA3)
 #define AARCH64_ISA_F16FML	   (aarch64_isa_flags & AARCH64_FL_F16FML)
 #define AARCH64_ISA_RCPC8_4	   (aarch64_isa_flags & AARCH64_FL_RCPC8_4)
+#define AARCH64_ISA_V8_5	   (aarch64_isa_flags & AARCH64_FL_V8_5)
 
 /* Crypto is an optional extension to AdvSIMD.  */
 #define TARGET_CRYPTO (TARGET_SIMD && AARCH64_ISA_CRYPTO)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 06a00a29de73aa509b6a15ebb34dfc182cf94cd2..c76c4fc223f9c46e517213eb6ad292c70aa1c89f 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -15097,8 +15097,11 @@ more feature modifiers.  This option has the form
 @option{-march=@var{arch}@r{@{}+@r{[}no@r{]}@var{feature}@r{@}*}}.
 
 The permissible values for @var{arch} are @samp{armv8-a},
-@samp{armv8.1-a}, @samp{armv8.2-a}, @samp{armv8.3-a} or @samp{armv8.4-a}
-or @var{native}.
+@samp{armv8.1-a}, @samp{armv8.2-a}, @samp{armv8.3-a}, @samp{armv8.4-a},
+@samp{armv8.5-a} or @var{native}.
+
+The value @samp{armv8.5-a} implies @samp{armv8.4-a} and enables compiler
+support for the ARMv8.5-A architecture extensions.
 
 The value @samp{armv8.4-a} implies @samp{armv8.3-a} and enables compiler
 support for the ARMv8.4-A architecture extensions.

Re: [PATCH][GCC][AArch64] Limit movmem copies to TImode copies.

2018-08-14 Thread Sudakshina Das


Hi Tamar

On 13/08/18 17:27, Tamar Christina wrote:

Hi Thomas,

Thanks for the review.

I’ll correct the typo before committing if I have no other changes required by 
a maintainer.

Regards,
Tamar.



I am not a maintainer but I would like to point out something in your
patch. I think you test case will fail with -mabi=ilp32

FAIL: gcc.target/aarch64/large_struct_copy_2.c (test for excess errors)
Excess errors:
/work/trunk/src/gcc/gcc/testsuite/gcc.target/aarch64/large_struct_copy_2.c:18:27: 
warning: overflow in conversion from 'long

long int' to 'long int' changes value from '4073709551611' to
'2080555003' [-Woverflow]

We have had more such recent failures and James gave a very neat
way to make sure the mode comes out what you intend it to here:
https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00233.html

I would just ask you to change the data types accordingly and test it
with -mabi=ilp32.

Thanks
Sudi


From: Thomas Preudhomme 
Sent: Monday, August 13, 2018 14:37
To: Tamar Christina 
Cc: gcc-patches@gcc.gnu.org; nd ; James Greenhalgh 
; Richard Earnshaw ; Marcus Shawcroft 

Subject: Re: [PATCH][GCC][AArch64] Limit movmem copies to TImode copies.

Hi Tamar,

Thanks for your patch.

Just one comment about your ChangeLog entry for the testsuiet change: shouldn't 
it mention that it is a new testcase? The patch you attached seems to create 
the file.

Best regards,

Thomas

On Mon, 13 Aug 2018 at 10:33, Tamar Christina 
mailto:tamar.christ...@arm.com>> wrote:
Hi All,

On AArch64 we have integer modes larger than TImode, and while we can generate
moves for these they're not as efficient.

So instead make sure we limit the maximum we can copy to TImode.  This means
copying a 16 byte struct will issue 1 TImode copy, which will be done using a
single STP as we expect but an CImode sized copy won't issue CImode operations.

Bootstrapped and regtested on aarch4-none-linux-gnu and no issues.
Crosstested aarch4_be-none-elf and no issues.

Ok for trunk?

Thanks,
Tamar

gcc/
2018-08-13  Tamar Christina  
mailto:tamar.christ...@arm.com>>

 * config/aarch64/aarch64.c (aarch64_expand_movmem): Set TImode max.

gcc/testsuite/
2018-08-13  Tamar Christina  
mailto:tamar.christ...@arm.com>>

 * gcc.target/aarch64/large_struct_copy_2.c: Add assembler scan.

--

Re: [PATCH][GCC][AARCH64] Use STLUR for atomic_store

2018-08-03 Thread Sudakshina Das


Hi Matthew

On 02/08/18 17:26, matthew.malcom...@arm.com wrote:

Use the STLUR instruction introduced in Armv8.4-a.
This insruction has the store-release semantic like STLR but can take a
9-bit unscaled signed immediate offset.

Example test case:
```
void
foo ()
{
 int32_t *atomic_vals = calloc (4, sizeof (int32_t));
 atomic_store_explicit (atomic_vals + 1, 2, memory_order_release);
}
```

Before patch generates
```
foo:
stp x29, x30, [sp, -16]!
mov x1, 4
mov x0, x1
mov x29, sp
bl  calloc
mov w1, 2
add x0, x0, 4
stlrw1, [x0]
ldp x29, x30, [sp], 16
ret
```

After patch generates
```
foo:
stp x29, x30, [sp, -16]!
mov x1, 4
mov x0, x1
mov x29, sp
bl  calloc
mov w1, 2
stlur   w1, [x0, 4]
ldp x29, x30, [sp], 16
ret
```

Full bootstrap and regression test done on aarch64.

Ok for trunk?

gcc/
2018-07-26  Matthew Malcomson  

 * config/aarch64/aarch64-protos.h
 (aarch64_offset_9bit_signed_unscaled_p): New declaration.
 * config/aarch64/aarch64.c
 (aarch64_offset_9bit_signed_unscaled_p): Rename from
 offset_9bit_signed_unscaled_p.
 * config/aarch64/aarch64.h (TARGET_ARMV8_4): Add feature macro.
 * config/aarch64/atomics.md (atomic_store): Allow offset
 and use stlur.
 * config/aarch64/constraints.md (Ust): New constraint.
 * config/aarch64/predicates.md.
 (aarch64_sync_or_stlur_memory_operand): New predicate.

gcc/testsuite/
2018-07-26  Matthew Malcomson  

* gcc.target/aarch64/atomic-store.c: New.



Thank you for doing this. I am not a maintainer but I have a few nits on
this patch:

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 
af5db9c595385f7586692258f750b6aceb3ed9c8..630a75bf776fcdc374aa9ffa4bb020fea3719320 
100644

--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -393,6 +393,7 @@ void aarch64_split_add_offset (scalar_int_mode, rtx, 
rtx, rtx, rtx, rtx);

 bool aarch64_mov_operand_p (rtx, machine_mode);
...
-static inline bool
-offset_9bit_signed_unscaled_p (machine_mode mode ATTRIBUTE_UNUSED,
+bool
+aarch64_offset_9bit_signed_unscaled_p (machine_mode mode ATTRIBUTE_UNUSED,
   poly_int64 offset)

This needs to be aligned with the first argument

...

@@ -5837,7 +5837,7 @@ aarch64_classify_address (struct 
aarch64_address_info *info,

 ldr/str instructions (only big endian will get here).  */
  if (mode == CImode)
return (aarch64_offset_7bit_signed_scaled_p (TImode, offset)
-   && (offset_9bit_signed_unscaled_p (V16QImode, offset + 32)
+   && (aarch64_offset_9bit_signed_unscaled_p (V16QImode, 
offset + 32)

This is not less that 80 characters

...

+;; STLUR instruction constraint requires Armv8.4
+(define_special_memory_constraint "Ust"
+ "@internal
+ A memory address suitable for use with an stlur instruction."
+  (and (match_operand 0 "aarch64_sync_or_stlur_memory_operand")
+   (match_test "TARGET_ARMV8_4")))
+

You are already checking for TARGET_ARMV8_4 inside
aarch64_sync_or_stlur_memory_operand. Also see my comment below for this
function.

...

+;; True if the operand is memory reference valid for one of a str or stlur
+;; operation.
+(define_predicate "aarch64_sync_or_stlur_memory_operand"
+  (ior (match_operand 0 "aarch64_sync_memory_operand")
+   (and (match_operand 0 "memory_operand")
+   (match_code "plus" "0")
+   (match_code "reg" "00")
+   (match_code "const_int" "01")))
+{
+  if (aarch64_sync_memory_operand (op, mode))
+return true;
+
+  if (!TARGET_ARMV8_4)
+return false;
+
+  rtx mem_op = XEXP (op, 0);
+  rtx plus_op0 = XEXP (mem_op, 0);
+  rtx plus_op1 = XEXP (mem_op, 1);
+
+  if (GET_MODE (plus_op0) != DImode)
+return false;
+
+  poly_int64 offset;
+  poly_int_rtx_p (plus_op1, );
+  return aarch64_offset_9bit_signed_unscaled_p (mode, offset);
+})
+

This predicate body makes it a bit mixed up with the two type of
operands that you want to test especially looking at it from the
constraint check perspective. I am assuming you would not want to use
the non-immediate form of stlur and instead only use it in the
form:
STLUR , [, #]
and use stlr for no immediate alternative. Thus the constraint does not
need to check for aarch64_sync_memory_operand.

My suggestion would be to make this operand check separate. Something
like:

+(define_predicate "aarch64_sync_or_stlur_memory_operand"
+  (ior (match_operand 0 "aarch64_sync_memory_operand")
+   (match_operand 0 "aarch64_stlur_memory_operand")))

Where you define aarch64_stlur_memory_operand as

+bool aarch64_stlur_memory_operand (rtx op)
+{
+  if (!TARGET_ARMV8_4)
+return false;
+
+  rtx mem_op = XEXP

Re: [PATCH][GCC] Correct name of file in ChangeLog

2018-08-02 Thread Sudakshina Das


Hi Matthew

On 01/08/18 10:25, matthew.malcom...@arm.com wrote:

My first patch included an incorrect ChangeLog entry -- the filename was
misspelt. This corrects it.



I think this counts as an obvious change. I have committed this on your 
behalf.


Thanks
Sudi

Re: [PATCH][AARCH64] PR target/84521 Fix frame pointer corruption with -fomit-frame-pointer with __builtin_setjmp

2018-08-01 Thread Sudakshina Das


Hi

On 31/07/18 22:48, Andrew Pinski wrote:

On Tue, Jul 31, 2018 at 2:43 PM James Greenhalgh
 wrote:


On Thu, Jul 12, 2018 at 12:01:09PM -0500, Sudakshina Das wrote:

Hi Eric

On 27/06/18 12:22, Wilco Dijkstra wrote:

Eric Botcazou wrote:


This test can easily be changed not to use optimize since it doesn't look
like it needs it. We really need to tests these builtins properly,
otherwise they will continue to fail on most targets.


As far as I can see PR target/84521 has been reported only for Aarch64 so I'd
just leave the other targets alone (and avoid propagating FUD if possible).


It's quite obvious from PR84521 that this is an issue affecting all targets.
Adding better generic tests for __builtin_setjmp can only be a good thing.

Wilco



This conversation seems to have died down and I would like to
start it again. I would agree with Wilco's suggestion about
keeping the test in the generic folder. I have removed the
optimize attribute and the effect is still the same. It passes
on AArch64 with this patch and it currently fails on x86
trunk (gcc version 9.0.0 20180712 (experimental) (GCC))
on -O1 and above.



I don't see where the FUD comes in here; either this builtin has a defined
semantics across targets and they are adhered to, or the builtin doesn't have
well defined semantics, or the targets fail to implement those semantics.


The problem comes from the fact the builtins are not documented at all.
See PR59039 for the issue on them not being documented.



Thanks @James for bringing this up again.
I tried to revive the conversation on PR59039 while working on this as
well but that conversation mainly focused on documenting if we are
allowed to use __builtin_setjmp and __builtin_longjmp on the same
function and with the same jmp buffer or not. This patch and this test
case however does not involve that issue. There are other holes in the
documentation/implementation of these builtins. For now as advised by
James, I have posted the test case on the PR. I personally don't see why
this test case should go on the AArch64 tests when it clearly fails on
other targets as well. But if we can not come to an agreement on that, I
am willing to move it to AArch64 tests and maybe open a new bug report
which is not marked as "target" with the same test case.

Thanks
Sudi


Thanks,
Andrew




I think this should go in as is. If other targets are unhappy with the
failing test they should fix their target or skip the test if it is not
appropriate.

You may want to CC some of the maintainers of platforms you know to fail as
a courtesy on the PR (add your testcase, and add failing targets and their
maintainers to that PR) before committing so it doesn't come as a complete
surprise.

This is OK with some attempt to get target maintainers involved in the
conversation before commit.

Thanks,
James


diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index f284e74..9792d28 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -473,7 +473,9 @@ extern unsigned aarch64_architecture_version;
  #define EH_RETURN_STACKADJ_RTX   gen_rtx_REG (Pmode, R4_REGNUM)
  #define EH_RETURN_HANDLER_RTX  aarch64_eh_return_handler_rtx ()

-/* Don't use __builtin_setjmp until we've defined it.  */
+/* Don't use __builtin_setjmp until we've defined it.
+   CAUTION: This macro is only used during exception unwinding.
+   Don't fall for its name.  */
  #undef DONT_USE_BUILTIN_SETJMP
  #define DONT_USE_BUILTIN_SETJMP 1

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 01f35f8..4266a3d 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3998,7 +3998,7 @@ static bool
  aarch64_needs_frame_chain (void)
  {
/* Force a frame chain for EH returns so the return address is at FP+8.  */
-  if (frame_pointer_needed || crtl->calls_eh_return)
+  if (frame_pointer_needed || crtl->calls_eh_return || 
cfun->has_nonlocal_label)
  return true;

/* A leaf function cannot have calls or write LR.  */
@@ -12218,6 +12218,13 @@ aarch64_expand_builtin_va_start (tree valist, rtx 
nextarg ATTRIBUTE_UNUSED)
expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL);
  }

+/* Implement TARGET_BUILTIN_SETJMP_FRAME_VALUE.  */
+static rtx
+aarch64_builtin_setjmp_frame_value (void)
+{
+  return hard_frame_pointer_rtx;
+}
+
  /* Implement TARGET_GIMPLIFY_VA_ARG_EXPR.  */

  static tree
@@ -17744,6 +17751,9 @@ aarch64_run_selftests (void)
  #undef TARGET_FOLD_BUILTIN
  #define TARGET_FOLD_BUILTIN aarch64_fold_builtin

+#undef TARGET_BUILTIN_SETJMP_FRAME_VALUE
+#define TARGET_BUILTIN_SETJMP_FRAME_VALUE aarch64_builtin_setjmp_frame_value
+
  #undef TARGET_FUNCTION_ARG
  #define TARGET_FUNCTION_ARG aarch64_function_arg

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a014a01..d5f33d8 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -6087,6 +6087,30 @@
DONE;
  })

Re: [GCC][PATCH][Aarch64] Stop redundant zero-extension after UMOV when in DI mode

2018-08-01 Thread Sudakshina Das


Hi Sam

On 01/08/18 10:12, Sam Tebbs wrote:



On 07/31/2018 11:16 PM, James Greenhalgh wrote:

On Thu, Jul 26, 2018 at 11:52:15AM -0500, Sam Tebbs wrote:




Thanks for making the changes and adding more test cases. I do however
see that you are only covering 2 out of 4 new
*aarch64_get_lane_zero_extenddi<> patterns. The
*aarch64_get_lane_zero_extendsi<> were already existing. I don't mind
those tests. I would just ask you to add the other two new patterns
as well. Also since the different versions of the instruction generate
same instructions (like foo_16qi and foo_8qi both give out the same
instruction), I would suggest using a -fdump-rtl-final (or any relevant
rtl dump) with the dg-options and using a scan-rtl-dump to scan the
pattern name. Something like:
/* { dg-do compile } */
/* { dg-options "-O3 -fdump-rtl-final" } */
...
...
/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv16qi"
"final" } } */

Thanks
Sudi

Hi Sudi,

Thanks again. Here's an update that adds 4 more tests, so all 8 patterns
generated are now tested for!
This is OK for trunk, thanks for the patch (and thanks Sudi for the 
review!)


Thanks,
James


Thank you James! I'd appreciate it if someone could commit it as I don't 
have commit rights yet.




I have committed this on your behalf as r263200.

Thanks
Sudi


Sam




Below is the updated changelog

gcc/
2018-07-26  Sam Tebbs  

      * config/aarch64/aarch64-simd.md
      (*aarch64_get_lane_zero_extendsi):
      Rename to...
(*aarch64_get_lane_zero_extend): ... This.
      Use GPI iterator instead of SI mode.

gcc/testsuite
2018-07-26  Sam Tebbs  

      * gcc.target/aarch64/extract_zero_extend.c: New file

Re: [GCC][PATCH][Aarch64] Stop redundant zero-extension after UMOV when in DI mode

2018-07-27 Thread Sudakshina Das


Hi Sam

On 25/07/18 14:08, Sam Tebbs wrote:

On 07/23/2018 05:01 PM, Sudakshina Das wrote:

Hi Sam


On Monday 23 July 2018 11:39 AM, Sam Tebbs wrote:

Hi all,

This patch extends the aarch64_get_lane_zero_extendsi instruction 
definition to

also cover DI mode. This prevents a redundant AND instruction from being
generated due to the pattern failing to be matched.

Example:

typedef char v16qi __attribute__ ((vector_size (16)));

unsigned long long
foo (v16qi a)
{
  return a[0];
}

Previously generated:

foo:
    umov    w0, v0.b[0]
    and x0, x0, 255
    ret

And now generates:

foo:
    umov    w0, v0.b[0]
    ret

Bootstrapped on aarch64-none-linux-gnu and tested on aarch64-none-elf 
with no

regressions.

gcc/
2018-07-23  Sam Tebbs 

    * config/aarch64/aarch64-simd.md
    (*aarch64_get_lane_zero_extendsi):
    Rename to...
(*aarch64_get_lane_zero_extend): ... This.
    Use GPI iterator instead of SI mode.

gcc/testsuite
2018-07-23  Sam Tebbs 

    * gcc.target/aarch64/extract_zero_extend.c: New file

You will need an approval from a maintainer, but I would only add one 
request to this:


diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md

index 89e38e6..15fb661 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3032,15 +3032,16 @@
   [(set_attr "type" "neon_to_gp")]
 )

-(define_insn "*aarch64_get_lane_zero_extendsi"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-    (zero_extend:SI
+(define_insn "*aarch64_get_lane_zero_extend"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+    (zero_extend:GPI

Since you are adding 4 new patterns with this change, could you add
more cases in your test as well to make sure you have coverage for 
each of them.


Thanks
Sudi


Hi Sudi,

Thanks for the feedback. Here is an updated patch that adds more 
testcases to cover the patterns generated by the different mode 
combinations. The changelog and description from my original email still 
apply.




Thanks it looks good to me! You will still need a maintainer to approve.

Sudi



   (vec_select:
     (match_operand:VDQQH 1 "register_operand" "w")
     (parallel [(match_operand:SI 2 "immediate_operand" "i")]]
   "TARGET_SIMD"
   {
-    operands[2] = aarch64_endian_lane_rtx (mode, INTVAL 
(operands[2]));

+    operands[2] = aarch64_endian_lane_rtx (mode,
+                       INTVAL (operands[2]));
 return "umov\\t%w0, %1.[%2]";
   }
   [(set_attr "type" "neon_to_gp")]

Re: [GCC][PATCH][Aarch64] Stop redundant zero-extension after UMOV when in DI mode

2018-07-25 Thread Sudakshina Das


Hi Sam

On 25/07/18 14:08, Sam Tebbs wrote:

On 07/23/2018 05:01 PM, Sudakshina Das wrote:

Hi Sam


On Monday 23 July 2018 11:39 AM, Sam Tebbs wrote:

Hi all,

This patch extends the aarch64_get_lane_zero_extendsi instruction 
definition to

also cover DI mode. This prevents a redundant AND instruction from being
generated due to the pattern failing to be matched.

Example:

typedef char v16qi __attribute__ ((vector_size (16)));

unsigned long long
foo (v16qi a)
{
  return a[0];
}

Previously generated:

foo:
    umov    w0, v0.b[0]
    and x0, x0, 255
    ret

And now generates:

foo:
    umov    w0, v0.b[0]
    ret

Bootstrapped on aarch64-none-linux-gnu and tested on aarch64-none-elf 
with no

regressions.

gcc/
2018-07-23  Sam Tebbs 

    * config/aarch64/aarch64-simd.md
    (*aarch64_get_lane_zero_extendsi):
    Rename to...
(*aarch64_get_lane_zero_extend): ... This.
    Use GPI iterator instead of SI mode.

gcc/testsuite
2018-07-23  Sam Tebbs 

    * gcc.target/aarch64/extract_zero_extend.c: New file

You will need an approval from a maintainer, but I would only add one 
request to this:


diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md

index 89e38e6..15fb661 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3032,15 +3032,16 @@
   [(set_attr "type" "neon_to_gp")]
 )

-(define_insn "*aarch64_get_lane_zero_extendsi"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-    (zero_extend:SI
+(define_insn "*aarch64_get_lane_zero_extend"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+    (zero_extend:GPI

Since you are adding 4 new patterns with this change, could you add
more cases in your test as well to make sure you have coverage for 
each of them.


Thanks
Sudi


Hi Sudi,

Thanks for the feedback. Here is an updated patch that adds more 
testcases to cover the patterns generated by the different mode 
combinations. The changelog and description from my original email still 
apply.




Thanks for making the changes and adding more test cases. I do however
see that you are only covering 2 out of 4 new
*aarch64_get_lane_zero_extenddi<> patterns. The
*aarch64_get_lane_zero_extendsi<> were already existing. I don't mind
those tests. I would just ask you to add the other two new patterns
as well. Also since the different versions of the instruction generate
same instructions (like foo_16qi and foo_8qi both give out the same
instruction), I would suggest using a -fdump-rtl-final (or any relevant
rtl dump) with the dg-options and using a scan-rtl-dump to scan the
pattern name. Something like:
/* { dg-do compile } */
/* { dg-options "-O3 -fdump-rtl-final" } */
...
...
/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv16qi" 
"final" } } */


Thanks
Sudi



   (vec_select:
     (match_operand:VDQQH 1 "register_operand" "w")
     (parallel [(match_operand:SI 2 "immediate_operand" "i")]]
   "TARGET_SIMD"
   {
-    operands[2] = aarch64_endian_lane_rtx (mode, INTVAL 
(operands[2]));

+    operands[2] = aarch64_endian_lane_rtx (mode,
+                       INTVAL (operands[2]));
 return "umov\\t%w0, %1.[%2]";
   }
   [(set_attr "type" "neon_to_gp")]

Re: [PATCH][AArch64] Implement new intrinsics vabsd_s64 and vnegd_s64

2018-07-23 Thread Sudakshina Das


Hi Vlad


On Friday 20 July 2018 10:37 AM, Vlad Lazar wrote:

Hi,

The patch adds implementations for the NEON intrinsics vabsd_s64 and 
vnegd_s64.
(https://developer.arm.com/products/architecture/cpu-architecture/a-profile/docs/ihi0073/latest/arm-neon-intrinsics-reference-architecture-specification) 



Bootstrapped and regtested on aarch64-none-linux-gnu and there are no 
regressions.


OK for trunk?


Thanks for doing this. This looks good to me but you will a maintainer's 
approval.


Thanks
Sudi


Thanks,
Vlad

gcc/
2018-07-02  Vlad Lazar  

* config/aarch64/arm_neon.h (vabsd_s64, vnegd_s64): New.

gcc/testsuite/
2018-07-02  Vlad Lazar  

* gcc.target/aarch64/scalar_intrinsics.c (test_vabsd_s64, 
test_vabsd_s64): New.


---

diff --git a/gcc/config/aarch64/arm_neon.h 
b/gcc/config/aarch64/arm_neon.h
index 
2d18400040f031dfcdaf60269ad484647804e1be..19e22431a85bcd09d0ea759b42b0a52420b6c43c 
100644

--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -11822,6 +11822,13 @@ vabsq_s64 (int64x2_t __a)
   return __builtin_aarch64_absv2di (__a);
 }

+__extension__ extern __inline int64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vabsd_s64 (int64_t __a)
+{
+  return __builtin_aarch64_absdi (__a);
+}
+
 /* vadd */

 __extension__ extern __inline int64_t
@@ -22907,6 +22914,12 @@ vneg_s64 (int64x1_t __a)
   return -__a;
 }

+__extension__ extern __inline int64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vnegd_s64 (int64_t __a)
+{
+  return -__a;
+}
 __extension__ extern __inline float32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vnegq_f32 (float32x4_t __a)
diff --git a/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c 
b/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c
index 
ea29066e369b967d0781d31c8a5208bda9e4f685..45afeec373971838e0cd107038b4aa51a2d4998f 
100644

--- a/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c
+++ b/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c
@@ -603,6 +603,14 @@ test_vsqaddd_u64 (uint64_t a, int64_t b)
   return vsqaddd_u64 (a, b);
 }

+/* { dg-final { scan-assembler-times "\\tabs\\td\[0-9\]+" 1 } } */
+
+int64_t
+test_vabsd_s64 (int64_t a)
+{
+  return vabsd_s64 (a);
+}
+
 /* { dg-final { scan-assembler-times "\\tsqabs\\tb\[0-9\]+" 1 } } */

 int8_t
@@ -627,6 +635,14 @@ test_vqabss_s32 (int32_t a)
   return vqabss_s32 (a);
 }

+/* { dg-final { scan-assembler-times "\\tneg\\tx\[0-9\]+" 1 } } */
+
+int64_t
+test_vnegd_s64 (int64_t a)
+{
+  return vnegd_s64 (a);
+}
+
 /* { dg-final { scan-assembler-times "\\tsqneg\\tb\[0-9\]+" 1 } } */

 int8_t

Re: [GCC][PATCH][Aarch64] Stop redundant zero-extension after UMOV when in DI mode

2018-07-23 Thread Sudakshina Das


Hi Sam


On Monday 23 July 2018 11:39 AM, Sam Tebbs wrote:

Hi all,

This patch extends the aarch64_get_lane_zero_extendsi instruction 
definition to

also cover DI mode. This prevents a redundant AND instruction from being
generated due to the pattern failing to be matched.

Example:

typedef char v16qi __attribute__ ((vector_size (16)));

unsigned long long
foo (v16qi a)
{
  return a[0];
}

Previously generated:

foo:
    umov    w0, v0.b[0]
    and x0, x0, 255
    ret

And now generates:

foo:
    umov    w0, v0.b[0]
    ret

Bootstrapped on aarch64-none-linux-gnu and tested on aarch64-none-elf 
with no

regressions.

gcc/
2018-07-23  Sam Tebbs 

    * config/aarch64/aarch64-simd.md
    (*aarch64_get_lane_zero_extendsi):
    Rename to...
(*aarch64_get_lane_zero_extend): ... This.
    Use GPI iterator instead of SI mode.

gcc/testsuite
2018-07-23  Sam Tebbs 

    * gcc.target/aarch64/extract_zero_extend.c: New file

You will need an approval from a maintainer, but I would only add one 
request to this:


diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md

index 89e38e6..15fb661 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3032,15 +3032,16 @@
   [(set_attr "type" "neon_to_gp")]
 )

-(define_insn "*aarch64_get_lane_zero_extendsi"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-    (zero_extend:SI
+(define_insn "*aarch64_get_lane_zero_extend"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+    (zero_extend:GPI

Since you are adding 4 new patterns with this change, could you add
more cases in your test as well to make sure you have coverage for each 
of them.


Thanks
Sudi

   (vec_select:
     (match_operand:VDQQH 1 "register_operand" "w")
     (parallel [(match_operand:SI 2 "immediate_operand" "i")]]
   "TARGET_SIMD"
   {
-    operands[2] = aarch64_endian_lane_rtx (mode, INTVAL 
(operands[2]));

+    operands[2] = aarch64_endian_lane_rtx (mode,
+                       INTVAL (operands[2]));
 return "umov\\t%w0, %1.[%2]";
   }
   [(set_attr "type" "neon_to_gp")]

Re: [GCC][PATCH][Aarch64] Exploiting BFXIL when OR-ing two AND-operations with appropriate bitmasks

2018-07-16 Thread Sudakshina Das


Hi Sam

On 13/07/18 17:09, Sam Tebbs wrote:

Hi all,

This patch adds an optimisation that exploits the AArch64 BFXIL instruction
when or-ing the result of two bitwise and operations with non-overlapping
bitmasks (e.g. (a & 0x) | (b & 0x)).

Example:

unsigned long long combine(unsigned long long a, unsigned long long b) {
   return (a & 0xll) | (b & 0xll);
}

void read2(unsigned long long a, unsigned long long b, unsigned long long *c,
   unsigned long long *d) {
   *c = combine(a, b); *d = combine(b, a);
}

When compiled with -O2, read2 would result in:

read2:
   and   x5, x1, #0x
   and   x4, x0, #0x
   orr   x4, x4, x5
   and   x1, x1, #0x
   and   x0, x0, #0x
   str   x4, [x2]
   orr   x0, x0, x1
   str   x0, [x3]
   ret

But with this patch results in:

read2:
   mov   x4, x1
   bfxil x4, x0, 0, 32
   str   x4, [x2]
   bfxil x0, x1, 0, 32
   str   x0, [x3]
   ret
   
Bootstrapped and regtested on aarch64-none-linux-gnu and aarch64-none-elf with no regressions.


I am not a maintainer but I have a question about this patch. I may be 
missing something or reading

it wrong. So feel free to point it out:

+(define_insn "*aarch64_bfxil"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+    (ior:DI (and:DI (match_operand:DI 1 "register_operand" "r")
+            (match_operand 3 "const_int_operand"))
+        (and:DI (match_operand:DI 2 "register_operand" "0")
+            (match_operand 4 "const_int_operand"]
+  "INTVAL (operands[3]) == ~INTVAL (operands[4])
+    && aarch64_is_left_consecutive (INTVAL (operands[3]))"
+  {
+    HOST_WIDE_INT op4 = INTVAL (operands[4]);
+    operands[3] = GEN_INT (64 - ceil_log2 (op4));
+    output_asm_insn ("bfxil\\t%0, %1, 0, %3", operands);

In the BFXIL you are reading %3 LSB bits from operand 1 and putting it 
in the LSBs of %0.

This means that the pattern should be masking the 32-%3 MSB of %0 and
%3 LSB of %1. So shouldn't operand 4 is LEFT_CONSECUTIVE>

Can you please compare a simpler version of the above example you gave to
make sure the generated assembly is equivalent before and after the patch:

void read2(unsigned long long a, unsigned long long b, unsigned long long *c) {
  *c = combine(a, b);
}


From the above text

read2:
  and   x5, x1, #0x
  and   x4, x0, #0x
  orr   x4, x4, x5

read2:
  mov   x4, x1
  bfxil x4, x0, 0, 32

This does not seem equivalent to me.

Thanks
Sudi

+    return "";
+  }
+  [(set_attr "type" "bfx")]
+)

gcc/
2018-07-11  Sam Tebbs  

     * config/aarch64/aarch64.md (*aarch64_bfxil, *aarch64_bfxil_alt):
     Define.
     * config/aarch64/aarch64-protos.h (aarch64_is_left_consecutive):
     Define.
     * config/aarch64/aarch64.c (aarch64_is_left_consecutive): New function.

gcc/testsuite
2018-07-11  Sam Tebbs  

     * gcc.target/aarch64/combine_bfxil.c: New file.
     * gcc.target/aarch64/combine_bfxil_2.c: New file.

Re: [PATCH][AARCH64] PR target/84521 Fix frame pointer corruption with -fomit-frame-pointer with __builtin_setjmp

2018-07-12 Thread Sudakshina Das


Hi Eric

On 27/06/18 12:22, Wilco Dijkstra wrote:

Eric Botcazou wrote:


This test can easily be changed not to use optimize since it doesn't look
like it needs it. We really need to tests these builtins properly,
otherwise they will continue to fail on most targets.


As far as I can see PR target/84521 has been reported only for Aarch64 so I'd
just leave the other targets alone (and avoid propagating FUD if possible).


It's quite obvious from PR84521 that this is an issue affecting all targets.
Adding better generic tests for __builtin_setjmp can only be a good thing.

Wilco



This conversation seems to have died down and I would like to
start it again. I would agree with Wilco's suggestion about
keeping the test in the generic folder. I have removed the
optimize attribute and the effect is still the same. It passes
on AArch64 with this patch and it currently fails on x86
trunk (gcc version 9.0.0 20180712 (experimental) (GCC))
on -O1 and above.

Thanks
Sudi
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index f284e74..9792d28 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -473,7 +473,9 @@ extern unsigned aarch64_architecture_version;
 #define EH_RETURN_STACKADJ_RTX	gen_rtx_REG (Pmode, R4_REGNUM)
 #define EH_RETURN_HANDLER_RTX  aarch64_eh_return_handler_rtx ()
 
-/* Don't use __builtin_setjmp until we've defined it.  */
+/* Don't use __builtin_setjmp until we've defined it.
+   CAUTION: This macro is only used during exception unwinding.
+   Don't fall for its name.  */
 #undef DONT_USE_BUILTIN_SETJMP
 #define DONT_USE_BUILTIN_SETJMP 1
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 01f35f8..4266a3d 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3998,7 +3998,7 @@ static bool
 aarch64_needs_frame_chain (void)
 {
   /* Force a frame chain for EH returns so the return address is at FP+8.  */
-  if (frame_pointer_needed || crtl->calls_eh_return)
+  if (frame_pointer_needed || crtl->calls_eh_return || cfun->has_nonlocal_label)
 return true;
 
   /* A leaf function cannot have calls or write LR.  */
@@ -12218,6 +12218,13 @@ aarch64_expand_builtin_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED)
   expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL);
 }
 
+/* Implement TARGET_BUILTIN_SETJMP_FRAME_VALUE.  */
+static rtx
+aarch64_builtin_setjmp_frame_value (void)
+{
+  return hard_frame_pointer_rtx;
+}
+
 /* Implement TARGET_GIMPLIFY_VA_ARG_EXPR.  */
 
 static tree
@@ -17744,6 +17751,9 @@ aarch64_run_selftests (void)
 #undef TARGET_FOLD_BUILTIN
 #define TARGET_FOLD_BUILTIN aarch64_fold_builtin
 
+#undef TARGET_BUILTIN_SETJMP_FRAME_VALUE
+#define TARGET_BUILTIN_SETJMP_FRAME_VALUE aarch64_builtin_setjmp_frame_value
+
 #undef TARGET_FUNCTION_ARG
 #define TARGET_FUNCTION_ARG aarch64_function_arg
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a014a01..d5f33d8 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -6087,6 +6087,30 @@
   DONE;
 })
 
+;; This is broadly similar to the builtins.c except that it uses
+;; temporaries to load the incoming SP and FP.
+(define_expand "nonlocal_goto"
+  [(use (match_operand 0 "general_operand"))
+   (use (match_operand 1 "general_operand"))
+   (use (match_operand 2 "general_operand"))
+   (use (match_operand 3 "general_operand"))]
+  ""
+{
+rtx label_in = copy_to_reg (operands[1]);
+rtx fp_in = copy_to_reg (operands[3]);
+rtx sp_in = copy_to_reg (operands[2]);
+
+emit_move_insn (hard_frame_pointer_rtx, fp_in);
+emit_stack_restore (SAVE_NONLOCAL, sp_in);
+
+emit_use (hard_frame_pointer_rtx);
+emit_use (stack_pointer_rtx);
+
+emit_indirect_jump (label_in);
+
+DONE;
+})
+
 ;; Helper for aarch64.c code.
 (define_expand "set_clobber_cc"
   [(parallel [(set (match_operand 0)
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr84521.c b/gcc/testsuite/gcc.c-torture/execute/pr84521.c
new file mode 100644
index 000..564ef14
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr84521.c
@@ -0,0 +1,53 @@
+/* { dg-require-effective-target indirect_jumps } */
+
+#include 
+#include 
+#include 
+
+jmp_buf buf;
+
+int uses_longjmp (void)
+{
+  jmp_buf buf2;
+  memcpy (buf2, buf, sizeof (buf));
+  __builtin_longjmp (buf2, 1);
+}
+
+int gl;
+void after_longjmp (void)
+{
+  gl = 5;
+}
+
+int
+test_1 (int n)
+{
+  volatile int *p = alloca (n);
+  if (__builtin_setjmp (buf))
+{
+  after_longjmp ();
+}
+  else
+{
+  uses_longjmp ();
+}
+
+  return 0;
+}
+
+int
+test_2 (int n)
+{
+  int i;
+  int *ptr = (int *)__builtin_alloca (sizeof (int) * n);
+  for (i = 0; i < n; i++)
+ptr[i] = i;
+  test_1 (n);
+  return 0;
+}
+
+int main (int argc, const char **argv)
+{
+  __builtin_memset (, 0xaf, sizeof (buf));
+  test_2 (100);
+}

Re: [PATCH][GCC][AARCH64] Canonicalize aarch64 widening simd plus insns

2018-07-12 Thread Sudakshina Das


Hi Matthew

On 12/07/18 11:18, Richard Sandiford wrote:

Looks good to me FWIW (not a maintainer), just a minor formatting thing:

Matthew Malcomson  writes:

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
aac5fa146ed8dde4507a0eb4ad6a07ce78d2f0cd..67b29cbe2cad91e031ee23be656ec61a403f2cf9
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3302,38 +3302,78 @@
DONE;
  })
  
-(define_insn "aarch64_w"

+(define_insn "aarch64_subw"
[(set (match_operand: 0 "register_operand" "=w")
-(ADDSUB: (match_operand: 1 "register_operand" "w")
-   (ANY_EXTEND:
- (match_operand:VD_BHSI 2 "register_operand" "w"]
+   (minus:
+(match_operand: 1 "register_operand" "w")
+(ANY_EXTEND:
+  (match_operand:VD_BHSI 2 "register_operand" "w"]


The (minus should be under the "(match_operand":

(define_insn "aarch64_subw"
   [(set (match_operand: 0 "register_operand" "=w")
(minus: (match_operand: 1 "register_operand" "w")
   (ANY_EXTEND:
 (match_operand:VD_BHSI 2 "register_operand" "w"]

Same for the other patterns.

Thanks,
Richard



You will need a maintainer's approval but this looks good to me.
Thanks for doing this. I would only point out one other nit which you
can choose to ignore:

+/* Ensure
+   saddw2 and one saddw for the function add()
+   ssubw2 and one ssubw for the function subtract()
+   uaddw2 and one uaddw for the function uadd()
+   usubw2 and one usubw for the function usubtract() */
+
+/* { dg-final { scan-assembler-times "\[ \t\]ssubw2\[ \t\]+" 1 } } */
+/* { dg-final { scan-assembler-times "\[ \t\]ssubw\[ \t\]+" 1 } } */
+/* { dg-final { scan-assembler-times "\[ \t\]saddw2\[ \t\]+" 1 } } */
+/* { dg-final { scan-assembler-times "\[ \t\]saddw\[ \t\]+" 1 } } */
+/* { dg-final { scan-assembler-times "\[ \t\]usubw2\[ \t\]+" 1 } } */
+/* { dg-final { scan-assembler-times "\[ \t\]usubw\[ \t\]+" 1 } } */
+/* { dg-final { scan-assembler-times "\[ \t\]uaddw2\[ \t\]+" 1 } } */
+/* { dg-final { scan-assembler-times "\[ \t\]uaddw\[ \t\]+" 1 } } */

The scan-assembly directives for the different
functions can be placed right below each of them and that would
make it easier to read the expected results in the test and you
can get rid of the comments saying the same.

Thanks
Sudi

Re: [AArch64] Generate load-pairs when the last load clobbers the address register [2/2]

2018-07-12 Thread Sudakshina Das


Hi Jackson

On 11/07/18 17:48, Jackson Woodruff wrote:

Hi Sudi,

On 07/10/2018 02:29 PM, Sudakshina Das wrote:

Hi Jackson


On Tuesday 10 July 2018 09:37 AM, Jackson Woodruff wrote:

Hi all,

This patch resolves PR86014.  It does so by noticing that the last 
load may clobber the address register without issue (regardless of 
where it exists in the final ldp/stp sequence). That check has been 
changed so that the last register may be clobbered and the testcase 
(gcc.target/aarch64/ldp_stp_10.c) now passes.


Bootstrap and regtest OK.

OK for trunk?

Jackson

Changelog:

gcc/

2018-06-25  Jackson Woodruff  

    PR target/86014
    * config/aarch64/aarch64.c 
(aarch64_operands_adjust_ok_for_ldpstp):

    Remove address clobber check on last register.

This looks good to me but you will need a maintainer to approve it. 
The only
thing I would add is that if you could move the comment on top of the 
for loop

to this patch. That is, keep the original
/* Check if the addresses are clobbered by load.  */
in your [1/2] and make the comment change in [2/2].

Thanks, change made.  OK for trunk?



Looks good to me but you will need approval from
a maintainer to commit it!

Thanks
Sudi


Thanks,

Jackson

Re: [AArch64] Use arrays and loops rather than numbered variables in aarch64_operands_adjust_ok_for_ldpstp [1/2]

2018-07-12 Thread Sudakshina Das


Hi Jackson

On 11/07/18 17:48, Jackson Woodruff wrote:

Hi Sudi,

Thanks for the review.


On 07/10/2018 10:56 AM, Sudakshina wrote:

Hi Jackson


-  if (!MEM_P (mem_1) || aarch64_mem_pair_operand (mem_1, mode))
+  if (!MEM_P (mem[1]) || aarch64_mem_pair_operand (mem[1], mode))

mem_1 == mem[1]?

Oops, yes... That should be mem[0].


 return false;

-  /* The mems cannot be volatile.  */
...

/* If we have SImode and slow unaligned ldp,
  check the alignment to be at least 8 byte. */
   if (mode == SImode
   && (aarch64_tune_params.extra_tuning_flags
-  & AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW)
+      & AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW)
   && !optimize_size
-  && MEM_ALIGN (mem_1) < 8 * BITS_PER_UNIT)
+  && MEM_ALIGN (mem[1]) < 8 * BITS_PER_UNIT)

Likewise

Done

...
   /* Check if the registers are of same class.  */
-  if (rclass_1 != rclass_2 || rclass_2 != rclass_3 || rclass_3 != 
rclass_4)

-    return false;
+  for (int i = 0; i < 3; i++)

num_instructions -1 instead of 3 would be more consistent.

Done


+    if (rclass[i] != rclass[i + 1])
+  return false;

It looks good otherwise.

Thanks
Sudi


Re-regtested and boostrapped.

OK for trunk?


Looks good to me but you will need approval from
a maintainer to commit it!

Thanks
Sudi



Thanks,

Jackson

Re: [AArch64] Generate load-pairs when the last load clobbers the address register [2/2]

2018-07-10 Thread Sudakshina Das


Hi Jackson


On Tuesday 10 July 2018 09:37 AM, Jackson Woodruff wrote:

Hi all,

This patch resolves PR86014.  It does so by noticing that the last 
load may clobber the address register without issue (regardless of 
where it exists in the final ldp/stp sequence).  That check has been 
changed so that the last register may be clobbered and the testcase 
(gcc.target/aarch64/ldp_stp_10.c) now passes.


Bootstrap and regtest OK.

OK for trunk?

Jackson

Changelog:

gcc/

2018-06-25  Jackson Woodruff  

    PR target/86014
    * config/aarch64/aarch64.c 
(aarch64_operands_adjust_ok_for_ldpstp):

    Remove address clobber check on last register.


This looks good to me but you will need a maintainer to approve it. The only
thing I would add is that if you could move the comment on top of the 
for loop

to this patch. That is, keep the original
/* Check if the addresses are clobbered by load.  */
in your [1/2] and make the comment change in [2/2].

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
d0e9b2d464183eecc8cc7639ca3e981d2ff243ba..feffe8ebdbd4efd0ffc09834547767ceec46f4e4
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -17074,7 +17074,7 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx *operands, 
bool load,
   /* Only the last register in the order in which they occur
  may be clobbered by the load.  */
   if (load)
-for (int i = 0; i < num_instructions; i++)
+for (int i = 0; i < num_instructions - 1; i++)
   if (reg_mentioned_p (reg[i], mem[i]))
return false;


Thanks
Sudi

Re: [PATCH][AARCH64] PR target/84521 Fix frame pointer corruption with -fomit-frame-pointer with __builtin_setjmp

2018-06-25 Thread Sudakshina Das


PING!

On 14/06/18 12:10, Sudakshina Das wrote:

Hi Eric

On 07/06/18 16:33, Eric Botcazou wrote:

Sorry this fell off my radar. I have reg-tested it on x86 and tried it
on the sparc machine from the gcc farm but I think I couldn't finished
the run and now its showing to he unreachable.


The patch is a no-op for SPARC because it defines the nonlocal_goto 
pattern.


But I would nevertheless strongly suggest _not_ fiddling with the 
generic code
like that and just defining the nonlocal_goto pattern for Aarch64 
instead.




Thank you for the suggestion, I have edited the patch accordingly and
defined the nonlocal_goto pattern for AArch64. This has also helped take
care of the issue with __builtin_longjmp that Wilco had mentioned in his
comment on the PR (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84521#c19).

I have also modified the test case according to Wilco's comment to add 
an extra jump buffer. This test case passes with AArch64 but fails on

x86 trunk as follows (It may fail on other targets as well):

FAIL: gcc.c-torture/execute/pr84521.c   -O1  execution test
FAIL: gcc.c-torture/execute/pr84521.c   -O2  execution test
FAIL: gcc.c-torture/execute/pr84521.c   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gcc.c-torture/execute/pr84521.c   -O3 -g  execution test
FAIL: gcc.c-torture/execute/pr84521.c   -Os  execution test
FAIL: gcc.c-torture/execute/pr84521.c   -O2 -flto -fno-use-linker-plugin
-flto-partition=none  execution test
FAIL: gcc.c-torture/execute/pr84521.c   -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects  execution test

Testing: Bootstrapped and regtested on aarch64-none-linux-gnu.

Is this ok for trunk?

Sudi

*** gcc/ChangeLog ***

2018-06-14  Sudakshina Das  

 PR target/84521
 * config/aarch64/aarch64.h (DONT_USE_BUILTIN_SETJMP): Update comment.
 * config/aarch64/aarch64.c (aarch64_needs_frame_chain): Add
 cfun->has_nonlocal_label to force frame chain.
 (aarch64_builtin_setjmp_frame_value): New.
 (TARGET_BUILTIN_SETJMP_FRAME_VALUE): Define.
 * config/aarch64/aarch64.md (nonlocal_goto): New.

*** gcc/testsuite/ChangeLog ***

2018-06-14  Sudakshina Das  

 PR target/84521
 * gcc.c-torture/execute/pr84521.c: New test.

Re: [PATCH][AARCH64] PR target/84521 Fix frame pointer corruption with -fomit-frame-pointer with __builtin_setjmp

2018-06-14 Thread Sudakshina Das


Hi Eric

On 07/06/18 16:33, Eric Botcazou wrote:

Sorry this fell off my radar. I have reg-tested it on x86 and tried it
on the sparc machine from the gcc farm but I think I couldn't finished
the run and now its showing to he unreachable.


The patch is a no-op for SPARC because it defines the nonlocal_goto pattern.

But I would nevertheless strongly suggest _not_ fiddling with the generic code
like that and just defining the nonlocal_goto pattern for Aarch64 instead.



Thank you for the suggestion, I have edited the patch accordingly and
defined the nonlocal_goto pattern for AArch64. This has also helped take
care of the issue with __builtin_longjmp that Wilco had mentioned in his
comment on the PR (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84521#c19).

I have also modified the test case according to Wilco's comment to add 
an extra jump buffer. This test case passes with AArch64 but fails on

x86 trunk as follows (It may fail on other targets as well):

FAIL: gcc.c-torture/execute/pr84521.c   -O1  execution test
FAIL: gcc.c-torture/execute/pr84521.c   -O2  execution test
FAIL: gcc.c-torture/execute/pr84521.c   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gcc.c-torture/execute/pr84521.c   -O3 -g  execution test
FAIL: gcc.c-torture/execute/pr84521.c   -Os  execution test
FAIL: gcc.c-torture/execute/pr84521.c   -O2 -flto -fno-use-linker-plugin
-flto-partition=none  execution test
FAIL: gcc.c-torture/execute/pr84521.c   -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects  execution test

Testing: Bootstrapped and regtested on aarch64-none-linux-gnu.

Is this ok for trunk?

Sudi

*** gcc/ChangeLog ***

2018-06-14  Sudakshina Das  

PR target/84521
* config/aarch64/aarch64.h (DONT_USE_BUILTIN_SETJMP): Update comment.
* config/aarch64/aarch64.c (aarch64_needs_frame_chain): Add
cfun->has_nonlocal_label to force frame chain.
(aarch64_builtin_setjmp_frame_value): New.
(TARGET_BUILTIN_SETJMP_FRAME_VALUE): Define.
* config/aarch64/aarch64.md (nonlocal_goto): New.

*** gcc/testsuite/ChangeLog ***

2018-06-14  Sudakshina Das  

PR target/84521
* gcc.c-torture/execute/pr84521.c: New test.
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 976f9af..f042def 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -474,7 +474,9 @@ extern unsigned aarch64_architecture_version;
 #define EH_RETURN_STACKADJ_RTX	gen_rtx_REG (Pmode, R4_REGNUM)
 #define EH_RETURN_HANDLER_RTX  aarch64_eh_return_handler_rtx ()
 
-/* Don't use __builtin_setjmp until we've defined it.  */
+/* Don't use __builtin_setjmp until we've defined it.
+   CAUTION: This macro is only used during exception unwinding.
+   Don't fall for its name.  */
 #undef DONT_USE_BUILTIN_SETJMP
 #define DONT_USE_BUILTIN_SETJMP 1
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index bd0ac2f..95f7fe3 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3998,7 +3998,7 @@ static bool
 aarch64_needs_frame_chain (void)
 {
   /* Force a frame chain for EH returns so the return address is at FP+8.  */
-  if (frame_pointer_needed || crtl->calls_eh_return)
+  if (frame_pointer_needed || crtl->calls_eh_return || cfun->has_nonlocal_label)
 return true;
 
   /* A leaf function cannot have calls or write LR.  */
@@ -12213,6 +12213,13 @@ aarch64_expand_builtin_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED)
   expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL);
 }
 
+/* Implement TARGET_BUILTIN_SETJMP_FRAME_VALUE.  */
+static rtx
+aarch64_builtin_setjmp_frame_value (void)
+{
+  return hard_frame_pointer_rtx;
+}
+
 /* Implement TARGET_GIMPLIFY_VA_ARG_EXPR.  */
 
 static tree
@@ -17829,6 +17836,9 @@ aarch64_run_selftests (void)
 #undef TARGET_FOLD_BUILTIN
 #define TARGET_FOLD_BUILTIN aarch64_fold_builtin
 
+#undef TARGET_BUILTIN_SETJMP_FRAME_VALUE
+#define TARGET_BUILTIN_SETJMP_FRAME_VALUE aarch64_builtin_setjmp_frame_value
+
 #undef TARGET_FUNCTION_ARG
 #define TARGET_FUNCTION_ARG aarch64_function_arg
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 830f976..381fd83 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -6081,6 +6081,30 @@
   DONE;
 })
 
+;; This is broadly similar to the builtins.c except that it uses
+;; temporaries to load the incoming SP and FP.
+(define_expand "nonlocal_goto"
+  [(use (match_operand 0 "general_operand"))
+   (use (match_operand 1 "general_operand"))
+   (use (match_operand 2 "general_operand"))
+   (use (match_operand 3 "general_operand"))]
+  ""
+{
+rtx label_in = copy_to_reg (operands[1]);
+rtx fp_in = copy_to_reg (operands[3]);
+rtx sp_in = copy_to_reg (operands[2]);
+
+emit_move_insn (hard_frame_pointer_rtx, fp_in);
+emit_stack_restore (SAVE_N

Re: [PATCH][AARCH64] PR target/84521 Fix frame pointer corruption with -fomit-frame-pointer with __builtin_setjmp

2018-06-07 Thread Sudakshina Das


On 02/05/18 18:28, Jeff Law wrote:

On 03/14/2018 11:40 AM, Sudakshina Das wrote:

Hi

This patch is another partial fix for PR 84521. This is adding a
definition to one of the target hooks used in the SJLJ implemetation so
that AArch64 defines the hard_frame_pointer_rtx as the
TARGET_BUILTIN_SETJMP_FRAME_VALUE. As pointed out by Wilco there is
still a lot more work to be done for these builtins in the future.

Testing: Bootstrapped and regtested on aarch64-none-linux-gnu and added
new test.

Is this ok for trunk?

Sudi


*** gcc/ChangeLog ***

2018-03-14  Sudakshina Das  

 * builtins.c (expand_builtin_setjmp_receiver): Update condition
 to restore frame pointer.
 * config/aarch64/aarch64.h (DONT_USE_BUILTIN_SETJMP): Update
 comment.
 * config/aarch64/aarch64.c (aarch64_builtin_setjmp_frame_value):
 New.
 (TARGET_BUILTIN_SETJMP_FRAME_VALUE): Define.

*** gcc/testsuite/ChangeLog ***

2018-03-14  Sudakshina Das  

 * gcc.c-torture/execute/pr84521.c: New test.

So just to be clear, you do _not_ want the frame pointer restored here?
Right?

aarch64_builtin_setjmp_frame_value always returns hard_frame_pointer_rtx
which will cause the generic code in builtins.c to not restore the frame
pointer.

Have you looked at other targets which define builtin_setjmp_frame_value
to determine if they'll do the right thing.  x86 and sparc are the most
important.  I see that arc, vax and avr also define that hook, but are
obviously harder to test.



Sorry this fell off my radar. I have reg-tested it on x86 and tried it
on the sparc machine from the gcc farm but I think I couldn't finished
the run and now its showing to he unreachable.

Sudi


jeff

Re: C++ PATCHes to xvalue handling

2018-05-25 Thread Sudakshina Das


On 23/05/18 18:21, Jason Merrill wrote:

The first patch implements the adjustments from core issues 616 and
1213 to the value category of subobjects of class prvalues: they were
considered prvalues themselves, but that was kind of nonsensical.  Now
they are considered xvalues.  Along with this, I've removed the
diagnostic distinction between xvalues and prvalues when trying to use
one or the other as an lvalue; the important thing is that they are
rvalues.

The second patch corrects various issues with casts and xvalues/rvalue
references: we were treating an xvalue operand to dynamic_cast as an
lvalue, and we were objecting to casts from prvalue to rvalue
reference type.



With the second patch:
commit f7d2790049fd1e59af4b69ee12f7c101cfe4cdab
Author: jason 
Date:   Wed May 23 17:21:39 2018 +

Fix cast to rvalue reference from prvalue.

* cvt.c (diagnose_ref_binding): Handle rvalue reference.
* rtti.c (build_dynamic_cast_1): Don't try to build a reference to
non-class type.  Handle xvalue argument.
* typeck.c (build_reinterpret_cast_1): Allow cast from prvalue to
rvalue reference.
* semantics.c (finish_compound_literal): Do direct-initialization,
not cast, to initialize a reference.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@260622 
138bc75d-0d04-0410-961f-82ee72b054a4


I have observed the following failure in Spec2017 while building 
510.parest_r on aarch64-none-linux-gnu


aarch64-none-linux-gnu-g++ -c -o 
source/numerics/matrices.all_dimensions.o -DSPEC -DNDEBUG -Iinclude -I. 
-DSPEC_AUTO_SUPPRESS_OPENMP  -mcpu=cortex-a57+crypto -Ofast 
-fomit-frame-pointer -fpermissive-DSPEC_LP64 
source/numerics/matrices.all_dimensions.cc


source/numerics/matrices.all_dimensions.cc: In static member function 
'static void dealii::MatrixTools::apply_boundary_values(const 
std::map&, dealii::BlockSparseMatrix&, 
dealii::BlockVector&, dealii::BlockVector&, bool)':


source/numerics/matrices.all_dimensions.cc:469:50: error: lvalue 
required as unary '&' operand


[this_sparsity.get_rowstart_indices()[row]];

  ^

source/numerics/matrices.all_dimensions.cc:472:55: error: lvalue 
required as unary '&' operand


   [this_sparsity.get_rowstart_indices()[row]+1],

   ^

source/numerics/matrices.all_dimensions.cc:474:55: error: lvalue 
required as unary '&' operand


   [this_sparsity.get_rowstart_indices()[row+1]],

   ^

source/numerics/matrices.all_dimensions.cc:479:49: error: lvalue 
required as unary '&' operand


   [this_sparsity.get_rowstart_indices()[row]],

 ^

source/numerics/matrices.all_dimensions.cc:481:51: error: lvalue 
required as unary '&' operand


   [this_sparsity.get_rowstart_indices()[row+1]],

   ^

source/numerics/matrices.all_dimensions.cc:510:50: error: lvalue 
required as unary '&' operand


  [this_sparsity.get_rowstart_indices()[0]]);

Sudi


Tested x86_64-pc-linux-gnu, applying to trunk.

Re: [PATCH][RFC] Radically simplify emission of balanced tree for switch statements.

2018-05-25 Thread Sudakshina Das


Hi Martin

On 25/05/18 10:45, Martin Liška wrote:

On 05/21/2018 04:42 PM, Sudakshina Das wrote:

On 21/05/18 15:00, Rainer Orth wrote:

Hi Martin,


Thanks for opened eyes, following patch will fix that.
It's quite obvious, I'll install it right after tests will finish.


unfortunately, it didn't fix either issue:

* The switchlower -> switchlower1 renames in the dg-final* lines
    (attached) are still necessary to avoid the UNRESOLVED errors.
    Although obvious, I haven't installed them since ...

* ... even so

FAIL: gcc.dg/tree-prof/update-loopch.c scan-tree-dump switchlower1 "Removing basic 
block"

    remains.

[...]

You are right, it's using -O2, thus your patch is right. Please install the
patch
after testing. It's obvious fix.


But what about the remaining FAIL?



Sorry to add to this, but I have also observed the following failures on  
aarch64-none-elf, aarch64-none-linux-gnu and aarch64_be-none-elf targets 
bisected to this commit:

FAIL: gcc.dg/sancov/cmp0.c   -O0   scan-tree-dump-times optimized 
"__builtin___sanitizer_cov_trace_const_cmp" 7

FAIL: gcc.dg/sancov/cmp0.c   -O0   scan-tree-dump-times optimized 
"__builtin___sanitizer_cov_trace_switch \\(" 2

FAIL: gcc.dg/sancov/cmp0.c   -O0 -g   scan-tree-dump-times optimized 
"__builtin___sanitizer_cov_trace_const_cmp" 7

FAIL: gcc.dg/sancov/cmp0.c   -O0 -g   scan-tree-dump-times optimized 
"__builtin___sanitizer_cov_trace_switch \\(" 2


Hi.

I've just tested sancov tests on my aarch64 and cmp0.c looks fine. Can you 
please tell me which -march, -mtune does
your board have?



FAIL: gcc.dg/tree-ssa/pr77445-2.c scan-tree-dump-not thread3 "not considered"

FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump-not vrp2 "Jumps 
threaded"


I can confirm these 2. It's kind of expected, I will clean it up before next 
release. Jeff is aware
of that..

Martin



From my today's build, I only see the following remaining now:

FAIL: gcc.dg/tree-prof/update-loopch.c scan-tree-dump switchlower1 
"Removing basic block"


FAIL: gcc.dg/tree-ssa/pr77445-2.c scan-tree-dump-not thread3 "not 
considered"


FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump-not vrp2 "Jumps 
threaded"



Sudi



Sudi


 Rainer

Re: [PATCH][AARCH64][PR target/84882] Add mno-strict-align

2018-05-23 Thread Sudakshina Das


Hi Richard

On 18/05/18 15:48, Richard Earnshaw (lists) wrote:

On 27/03/18 13:58, Sudakshina Das wrote:

Hi

This patch adds the no variant to -mstrict-align and the corresponding
function attribute. To enable the function attribute, I have modified
aarch64_can_inline_p () to allow checks even when the callee function
has no attribute. The need for this is shown by the new test
target_attr_18.c.

Testing: Bootstrapped, regtested and added new tests that are copies
of earlier tests checking -mstrict-align with opposite scan directives.

Is this ok for trunk?

Sudi


*** gcc/ChangeLog ***

2018-03-27  Sudakshina Das  <sudi@arm.com>

 * common/config/aarch64/aarch64-common.c (aarch64_handle_option):
 Check val before adding MASK_STRICT_ALIGN to opts->x_target_flags.
 * config/aarch64/aarch64.opt (mstrict-align): Remove RejectNegative.
 * config/aarch64/aarch64.c (aarch64_attributes): Mark allow_neg
 as true for strict-align.
 (aarch64_can_inline_p): Perform checks even when callee has no
 attributes to check for strict alignment.
 * doc/extend.texi (AArch64 Function Attributes): Document
 no-strict-align.
 * doc/invoke.texi: (AArch64 Options): Likewise.

*** gcc/testsuite/ChangeLog ***

2018-03-27  Sudakshina Das  <sudi@arm.com>

 * gcc.target/aarch64/pr84882.c: New test.
 * gcc.target/aarch64/target_attr_18.c: Likewise.

strict-align.diff


diff --git a/gcc/common/config/aarch64/aarch64-common.c 
b/gcc/common/config/aarch64/aarch64-common.c
index 7fd9305..d5655a0 100644
--- a/gcc/common/config/aarch64/aarch64-common.c
+++ b/gcc/common/config/aarch64/aarch64-common.c
@@ -97,7 +97,10 @@ aarch64_handle_option (struct gcc_options *opts,
return true;
  
  case OPT_mstrict_align:

-  opts->x_target_flags |= MASK_STRICT_ALIGN;
+  if (val)
+   opts->x_target_flags |= MASK_STRICT_ALIGN;
+  else
+   opts->x_target_flags &= ~MASK_STRICT_ALIGN;
return true;
  
  case OPT_momit_leaf_frame_pointer:

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 4b5183b..4f35a6c 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11277,7 +11277,7 @@ static const struct aarch64_attribute_info 
aarch64_attributes[] =
{ "fix-cortex-a53-843419", aarch64_attr_bool, true, NULL,
   OPT_mfix_cortex_a53_843419 },
{ "cmodel", aarch64_attr_enum, false, NULL, OPT_mcmodel_ },
-  { "strict-align", aarch64_attr_mask, false, NULL, OPT_mstrict_align },
+  { "strict-align", aarch64_attr_mask, true, NULL, OPT_mstrict_align },
{ "omit-leaf-frame-pointer", aarch64_attr_bool, true, NULL,
   OPT_momit_leaf_frame_pointer },
{ "tls-dialect", aarch64_attr_enum, false, NULL, OPT_mtls_dialect_ },
@@ -11640,16 +11640,13 @@ aarch64_can_inline_p (tree caller, tree callee)
tree caller_tree = DECL_FUNCTION_SPECIFIC_TARGET (caller);
tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee);
  
-  /* If callee has no option attributes, then it is ok to inline.  */

-  if (!callee_tree)
-return true;


I think it's still useful to spot the case where both callee_tree and
caller_tree are NULL.  In that case both options will pick up
target_option_default_node and will always be compatible; so you can
short-circuit that case, which is the most likely scenario.


-
struct cl_target_option *caller_opts
= TREE_TARGET_OPTION (caller_tree ? caller_tree
   : target_option_default_node);
  
-  struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree);

-
+  struct cl_target_option *callee_opts
+   = TREE_TARGET_OPTION (callee_tree ? callee_tree
+  : target_option_default_node);
  
/* Callee's ISA flags should be a subset of the caller's.  */

if ((caller_opts->x_aarch64_isa_flags & callee_opts->x_aarch64_isa_flags)
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 52eaf8c..1426b45 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -85,7 +85,7 @@ Target RejectNegative Joined Enum(cmodel) 
Var(aarch64_cmodel_var) Init(AARCH64_C
  Specify the code model.
  
  mstrict-align

-Target Report RejectNegative Mask(STRICT_ALIGN) Save
+Target Report Mask(STRICT_ALIGN) Save
  Don't assume that unaligned accesses are handled by the system.
  
  momit-leaf-frame-pointer

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 93a0ebc..dcda216 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -3605,8 +3605,10 @@ for the command line option @option{-mcmodel=}.
  @item strict-align


Other targets add an @itemx for the no-variant.


  @cindex @code{strict-align} function attribute, AArch64
  Indicates that the compiler should not assume that unaligned memory references
-are handled by the system.  The behavi

Re: [PATCH][RFC] Radically simplify emission of balanced tree for switch statements.

2018-05-21 Thread Sudakshina Das


On 21/05/18 15:00, Rainer Orth wrote:

Hi Martin,


Thanks for opened eyes, following patch will fix that.
It's quite obvious, I'll install it right after tests will finish.


unfortunately, it didn't fix either issue:

* The switchlower -> switchlower1 renames in the dg-final* lines
   (attached) are still necessary to avoid the UNRESOLVED errors.
   Although obvious, I haven't installed them since ...

* ... even so

FAIL: gcc.dg/tree-prof/update-loopch.c scan-tree-dump switchlower1 "Removing basic 
block"

   remains.

[...]

You are right, it's using -O2, thus your patch is right. Please install the
patch
after testing. It's obvious fix.


But what about the remaining FAIL?



Sorry to add to this, but I have also observed the following failures on 
 aarch64-none-elf, aarch64-none-linux-gnu and aarch64_be-none-elf 
targets bisected to this commit:


FAIL: gcc.dg/sancov/cmp0.c   -O0   scan-tree-dump-times optimized 
"__builtin___sanitizer_cov_trace_const_cmp" 7


FAIL: gcc.dg/sancov/cmp0.c   -O0   scan-tree-dump-times optimized 
"__builtin___sanitizer_cov_trace_switch \\(" 2


FAIL: gcc.dg/sancov/cmp0.c   -O0 -g   scan-tree-dump-times optimized 
"__builtin___sanitizer_cov_trace_const_cmp" 7


FAIL: gcc.dg/sancov/cmp0.c   -O0 -g   scan-tree-dump-times optimized 
"__builtin___sanitizer_cov_trace_switch \\(" 2


FAIL: gcc.dg/tree-ssa/pr77445-2.c scan-tree-dump-not thread3 "not 
considered"


FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump-not vrp2 "Jumps 
threaded"


Sudi


Rainer

Re: [PATCH][AARCH64][PR target/84882] Add mno-strict-align

2018-05-10 Thread Sudakshina Das


Ping!

On 27/03/18 13:58, Sudakshina Das wrote:

Hi

This patch adds the no variant to -mstrict-align and the corresponding
function attribute. To enable the function attribute, I have modified
aarch64_can_inline_p () to allow checks even when the callee function
has no attribute. The need for this is shown by the new test
target_attr_18.c.

Testing: Bootstrapped, regtested and added new tests that are copies
of earlier tests checking -mstrict-align with opposite scan directives.

Is this ok for trunk?

Sudi


*** gcc/ChangeLog ***

2018-03-27  Sudakshina Das  <sudi@arm.com>

 * common/config/aarch64/aarch64-common.c (aarch64_handle_option):
 Check val before adding MASK_STRICT_ALIGN to opts->x_target_flags.
 * config/aarch64/aarch64.opt (mstrict-align): Remove RejectNegative.
 * config/aarch64/aarch64.c (aarch64_attributes): Mark allow_neg
 as true for strict-align.
 (aarch64_can_inline_p): Perform checks even when callee has no
 attributes to check for strict alignment.
 * doc/extend.texi (AArch64 Function Attributes): Document
 no-strict-align.
 * doc/invoke.texi: (AArch64 Options): Likewise.

*** gcc/testsuite/ChangeLog ***

2018-03-27  Sudakshina Das  <sudi@arm.com>

 * gcc.target/aarch64/pr84882.c: New test.
 * gcc.target/aarch64/target_attr_18.c: Likewise.

Re: [AARCH64] Neon vld1__x3, vst1__x2 and vst1_*_x3 intrinsics

2018-04-11 Thread Sudakshina Das


Hi Sameera

On 11/04/18 13:05, Sameera Deshpande wrote:

On 11 April 2018 at 15:53, Sudakshina Das <sudi@arm.com> wrote:

Hi Sameera


On 11/04/18 09:04, Sameera Deshpande wrote:


On 10 April 2018 at 20:07, Sudakshina Das <sudi@arm.com> wrote:


Hi Sameera


On 10/04/18 11:20, Sameera Deshpande wrote:



On 7 April 2018 at 01:25, Christophe Lyon <christophe.l...@linaro.org>
wrote:



Hi,

2018-04-06 12:15 GMT+02:00 Sameera Deshpande
<sameera.deshpa...@linaro.org>:



Hi Christophe,

Please find attached the updated patch with testcases.

Ok for trunk?




Thanks for the update.

Since the new intrinsics are only available on aarch64, you want to
prevent the tests from running on arm.
Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the two
targets.
There are several examples on how to do that in that directory.

I have also noticed that the tests fail at execution on aarch64_be.

I didn't look at the patch in details.

Christophe




- Thanks and regards,
 Sameera D.

2017-12-14 22:17 GMT+05:30 Christophe Lyon
<christophe.l...@linaro.org>:



2017-12-14 9:29 GMT+01:00 Sameera Deshpande
<sameera.deshpa...@linaro.org>:



Hi!

Please find attached the patch implementing vld1_*_x3, vst1_*_x2 and
vst1_*_x3 intrinsics as defined by Neon document.

Ok for trunk?

- Thanks and regards,
 Sameera D.

gcc/Changelog:

2017-11-14  Sameera Deshpande  <sameera.deshpa...@linaro.org>


   * config/aarch64/aarch64-simd-builtins.def (ld1x3): New.
   (st1x2): Likewise.
   (st1x3): Likewise.
   * config/aarch64/aarch64-simd.md
(aarch64_ld1x3): New pattern.
   (aarch64_ld1_x3_): Likewise
   (aarch64_st1x2): Likewise
   (aarch64_st1_x2_): Likewise
   (aarch64_st1x3): Likewise
   (aarch64_st1_x3_): Likewise
   * config/aarch64/arm_neon.h (vld1_u8_x3): New function.
   (vld1_s8_x3): Likewise.
   (vld1_u16_x3): Likewise.
   (vld1_s16_x3): Likewise.
   (vld1_u32_x3): Likewise.
   (vld1_s32_x3): Likewise.
   (vld1_u64_x3): Likewise.
   (vld1_s64_x3): Likewise.
   (vld1_fp16_x3): Likewise.
   (vld1_f32_x3): Likewise.
   (vld1_f64_x3): Likewise.
   (vld1_p8_x3): Likewise.
   (vld1_p16_x3): Likewise.
   (vld1_p64_x3): Likewise.
   (vld1q_u8_x3): Likewise.
   (vld1q_s8_x3): Likewise.
   (vld1q_u16_x3): Likewise.
   (vld1q_s16_x3): Likewise.
   (vld1q_u32_x3): Likewise.
   (vld1q_s32_x3): Likewise.
   (vld1q_u64_x3): Likewise.
   (vld1q_s64_x3): Likewise.
   (vld1q_f16_x3): Likewise.
   (vld1q_f32_x3): Likewise.
   (vld1q_f64_x3): Likewise.
   (vld1q_p8_x3): Likewise.
   (vld1q_p16_x3): Likewise.
   (vld1q_p64_x3): Likewise.
   (vst1_s64_x2): Likewise.
   (vst1_u64_x2): Likewise.
   (vst1_f64_x2):

Likewise.patchurl=http://people.linaro.org/~christophe.lyon/armv8_2-fp16-scalar-2.patch3



patchname=armv8_2-fp16-scalar-2.patch3
refrev=259064
email_to=christophe.l...@linaro.org


   (vst1_s8_x2): Likewise.
   (vst1_p8_x2): Likewise.
   (vst1_s16_x2): Likewise.
   (vst1_p16_x2): Likewise.
   (vst1_s32_x2): Likewise.
   (vst1_u8_x2): Likewise.
   (vst1_u16_x2): Likewise.
   (vst1_u32_x2): Likewise.
   (vst1_f16_x2): Likewise.
   (vst1_f32_x2): Likewise.
   (vst1_p64_x2): Likewise.
   (vst1q_s8_x2): Likewise.
   (vst1q_p8_x2): Likewise.
   (vst1q_s16_x2): Likewise.
   (vst1q_p16_x2): Likewise.
   (vst1q_s32_x2): Likewise.
   (vst1q_s64_x2): Likewise.
   (vst1q_u8_x2): Likewise.
   (vst1q_u16_x2): Likewise.
   (vst1q_u32_x2): Likewise.
   (vst1q_u64_x2): Likewise.
   (vst1q_f16_x2): Likewise.
   (vst1q_f32_x2): Likewise.
   (vst1q_f64_x2): Likewise.
   (vst1q_p64_x2): Likewise.
   (vst1_s64_x3): Likewise.
   (vst1_u64_x3): Likewise.
   (vst1_f64_x3): Likewise.
   (vst1_s8_x3): Likewise.
   (vst1_p8_x3): Likewise.
   (vst1_s16_x3): Likewise.
   (vst1_p16_x3): Likewise.
   (vst1_s32_x3): Likewise.
   (vst1_u8_x3): Likewise.
   (vst1_u16_x3): Likewise.
   (vst1_u32_x3): Likewise.
   (vst1_f16_x3): Likewise.
   (vst1_f32_x3): Likewise.
   (vst1_p64_x3): Likewise.
   (vst1q_s8_x3): Likewise.
   (vst1q_p8_x3): Likewise.
   (vst1q_s16_x3): Likewise.
   (vst1q_p16_x3): Likewise.
   (vst1q_s32_x3): Likewise.
   (vst1q_s64_x3): Likewise.
   (vst1q_u8_x3): Likewise.
   (vst1q_u16_x3): Likewise.
   (vst1q_u32_x3): Likewise.
   (vst1q_u64_x3): Likewise.
   (vst1q_f16_x3): Lik

Re: [AARCH64] Neon vld1__x3, vst1__x2 and vst1_*_x3 intrinsics

2018-04-11 Thread Sudakshina Das


Hi Sameera

On 11/04/18 09:04, Sameera Deshpande wrote:

On 10 April 2018 at 20:07, Sudakshina Das <sudi@arm.com> wrote:

Hi Sameera


On 10/04/18 11:20, Sameera Deshpande wrote:


On 7 April 2018 at 01:25, Christophe Lyon <christophe.l...@linaro.org>
wrote:


Hi,

2018-04-06 12:15 GMT+02:00 Sameera Deshpande
<sameera.deshpa...@linaro.org>:


Hi Christophe,

Please find attached the updated patch with testcases.

Ok for trunk?



Thanks for the update.

Since the new intrinsics are only available on aarch64, you want to
prevent the tests from running on arm.
Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the two
targets.
There are several examples on how to do that in that directory.

I have also noticed that the tests fail at execution on aarch64_be.

I didn't look at the patch in details.

Christophe




- Thanks and regards,
Sameera D.

2017-12-14 22:17 GMT+05:30 Christophe Lyon <christophe.l...@linaro.org>:


2017-12-14 9:29 GMT+01:00 Sameera Deshpande
<sameera.deshpa...@linaro.org>:


Hi!

Please find attached the patch implementing vld1_*_x3, vst1_*_x2 and
vst1_*_x3 intrinsics as defined by Neon document.

Ok for trunk?

- Thanks and regards,
Sameera D.

gcc/Changelog:

2017-11-14  Sameera Deshpande  <sameera.deshpa...@linaro.org>


  * config/aarch64/aarch64-simd-builtins.def (ld1x3): New.
  (st1x2): Likewise.
  (st1x3): Likewise.
  * config/aarch64/aarch64-simd.md
(aarch64_ld1x3): New pattern.
  (aarch64_ld1_x3_): Likewise
  (aarch64_st1x2): Likewise
  (aarch64_st1_x2_): Likewise
  (aarch64_st1x3): Likewise
  (aarch64_st1_x3_): Likewise
  * config/aarch64/arm_neon.h (vld1_u8_x3): New function.
  (vld1_s8_x3): Likewise.
  (vld1_u16_x3): Likewise.
  (vld1_s16_x3): Likewise.
  (vld1_u32_x3): Likewise.
  (vld1_s32_x3): Likewise.
  (vld1_u64_x3): Likewise.
  (vld1_s64_x3): Likewise.
  (vld1_fp16_x3): Likewise.
  (vld1_f32_x3): Likewise.
  (vld1_f64_x3): Likewise.
  (vld1_p8_x3): Likewise.
  (vld1_p16_x3): Likewise.
  (vld1_p64_x3): Likewise.
  (vld1q_u8_x3): Likewise.
  (vld1q_s8_x3): Likewise.
  (vld1q_u16_x3): Likewise.
  (vld1q_s16_x3): Likewise.
  (vld1q_u32_x3): Likewise.
  (vld1q_s32_x3): Likewise.
  (vld1q_u64_x3): Likewise.
  (vld1q_s64_x3): Likewise.
  (vld1q_f16_x3): Likewise.
  (vld1q_f32_x3): Likewise.
  (vld1q_f64_x3): Likewise.
  (vld1q_p8_x3): Likewise.
  (vld1q_p16_x3): Likewise.
  (vld1q_p64_x3): Likewise.
  (vst1_s64_x2): Likewise.
  (vst1_u64_x2): Likewise.
  (vst1_f64_x2):
Likewise.patchurl=http://people.linaro.org/~christophe.lyon/armv8_2-fp16-scalar-2.patch3


patchname=armv8_2-fp16-scalar-2.patch3
refrev=259064
email_to=christophe.l...@linaro.org


  (vst1_s8_x2): Likewise.
  (vst1_p8_x2): Likewise.
  (vst1_s16_x2): Likewise.
  (vst1_p16_x2): Likewise.
  (vst1_s32_x2): Likewise.
  (vst1_u8_x2): Likewise.
  (vst1_u16_x2): Likewise.
  (vst1_u32_x2): Likewise.
  (vst1_f16_x2): Likewise.
  (vst1_f32_x2): Likewise.
  (vst1_p64_x2): Likewise.
  (vst1q_s8_x2): Likewise.
  (vst1q_p8_x2): Likewise.
  (vst1q_s16_x2): Likewise.
  (vst1q_p16_x2): Likewise.
  (vst1q_s32_x2): Likewise.
  (vst1q_s64_x2): Likewise.
  (vst1q_u8_x2): Likewise.
  (vst1q_u16_x2): Likewise.
  (vst1q_u32_x2): Likewise.
  (vst1q_u64_x2): Likewise.
  (vst1q_f16_x2): Likewise.
  (vst1q_f32_x2): Likewise.
  (vst1q_f64_x2): Likewise.
  (vst1q_p64_x2): Likewise.
  (vst1_s64_x3): Likewise.
  (vst1_u64_x3): Likewise.
  (vst1_f64_x3): Likewise.
  (vst1_s8_x3): Likewise.
  (vst1_p8_x3): Likewise.
  (vst1_s16_x3): Likewise.
  (vst1_p16_x3): Likewise.
  (vst1_s32_x3): Likewise.
  (vst1_u8_x3): Likewise.
  (vst1_u16_x3): Likewise.
  (vst1_u32_x3): Likewise.
  (vst1_f16_x3): Likewise.
  (vst1_f32_x3): Likewise.
  (vst1_p64_x3): Likewise.
  (vst1q_s8_x3): Likewise.
  (vst1q_p8_x3): Likewise.
  (vst1q_s16_x3): Likewise.
  (vst1q_p16_x3): Likewise.
  (vst1q_s32_x3): Likewise.
  (vst1q_s64_x3): Likewise.
  (vst1q_u8_x3): Likewise.
  (vst1q_u16_x3): Likewise.
  (vst1q_u32_x3): Likewise.
  (vst1q_u64_x3): Likewise.
  (vst1q_f16_x3): Likewise.
  (vst1q_f32_x3): Likewise.
  (vst1q_f64_x3): Likewise.
  (vst1q_p64_x3): Likewise.



Hi,
I'm not a maintainer, but I suspect you should add some tests.

Christophe





--
- Thanks and regards,

Re: [AARCH64] Neon vld1__x3, vst1__x2 and vst1_*_x3 intrinsics

2018-04-10 Thread Sudakshina Das


Hi Sameera

On 10/04/18 11:20, Sameera Deshpande wrote:

On 7 April 2018 at 01:25, Christophe Lyon  wrote:

Hi,

2018-04-06 12:15 GMT+02:00 Sameera Deshpande :

Hi Christophe,

Please find attached the updated patch with testcases.

Ok for trunk?


Thanks for the update.

Since the new intrinsics are only available on aarch64, you want to
prevent the tests from running on arm.
Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the two targets.
There are several examples on how to do that in that directory.

I have also noticed that the tests fail at execution on aarch64_be.

I didn't look at the patch in details.

Christophe




- Thanks and regards,
   Sameera D.

2017-12-14 22:17 GMT+05:30 Christophe Lyon :

2017-12-14 9:29 GMT+01:00 Sameera Deshpande :

Hi!

Please find attached the patch implementing vld1_*_x3, vst1_*_x2 and
vst1_*_x3 intrinsics as defined by Neon document.

Ok for trunk?

- Thanks and regards,
   Sameera D.

gcc/Changelog:

2017-11-14  Sameera Deshpande  


 * config/aarch64/aarch64-simd-builtins.def (ld1x3): New.
 (st1x2): Likewise.
 (st1x3): Likewise.
 * config/aarch64/aarch64-simd.md
(aarch64_ld1x3): New pattern.
 (aarch64_ld1_x3_): Likewise
 (aarch64_st1x2): Likewise
 (aarch64_st1_x2_): Likewise
 (aarch64_st1x3): Likewise
 (aarch64_st1_x3_): Likewise
 * config/aarch64/arm_neon.h (vld1_u8_x3): New function.
 (vld1_s8_x3): Likewise.
 (vld1_u16_x3): Likewise.
 (vld1_s16_x3): Likewise.
 (vld1_u32_x3): Likewise.
 (vld1_s32_x3): Likewise.
 (vld1_u64_x3): Likewise.
 (vld1_s64_x3): Likewise.
 (vld1_fp16_x3): Likewise.
 (vld1_f32_x3): Likewise.
 (vld1_f64_x3): Likewise.
 (vld1_p8_x3): Likewise.
 (vld1_p16_x3): Likewise.
 (vld1_p64_x3): Likewise.
 (vld1q_u8_x3): Likewise.
 (vld1q_s8_x3): Likewise.
 (vld1q_u16_x3): Likewise.
 (vld1q_s16_x3): Likewise.
 (vld1q_u32_x3): Likewise.
 (vld1q_s32_x3): Likewise.
 (vld1q_u64_x3): Likewise.
 (vld1q_s64_x3): Likewise.
 (vld1q_f16_x3): Likewise.
 (vld1q_f32_x3): Likewise.
 (vld1q_f64_x3): Likewise.
 (vld1q_p8_x3): Likewise.
 (vld1q_p16_x3): Likewise.
 (vld1q_p64_x3): Likewise.
 (vst1_s64_x2): Likewise.
 (vst1_u64_x2): Likewise.
 (vst1_f64_x2): 
Likewise.patchurl=http://people.linaro.org/~christophe.lyon/armv8_2-fp16-scalar-2.patch3

patchname=armv8_2-fp16-scalar-2.patch3
refrev=259064
email_to=christophe.l...@linaro.org


 (vst1_s8_x2): Likewise.
 (vst1_p8_x2): Likewise.
 (vst1_s16_x2): Likewise.
 (vst1_p16_x2): Likewise.
 (vst1_s32_x2): Likewise.
 (vst1_u8_x2): Likewise.
 (vst1_u16_x2): Likewise.
 (vst1_u32_x2): Likewise.
 (vst1_f16_x2): Likewise.
 (vst1_f32_x2): Likewise.
 (vst1_p64_x2): Likewise.
 (vst1q_s8_x2): Likewise.
 (vst1q_p8_x2): Likewise.
 (vst1q_s16_x2): Likewise.
 (vst1q_p16_x2): Likewise.
 (vst1q_s32_x2): Likewise.
 (vst1q_s64_x2): Likewise.
 (vst1q_u8_x2): Likewise.
 (vst1q_u16_x2): Likewise.
 (vst1q_u32_x2): Likewise.
 (vst1q_u64_x2): Likewise.
 (vst1q_f16_x2): Likewise.
 (vst1q_f32_x2): Likewise.
 (vst1q_f64_x2): Likewise.
 (vst1q_p64_x2): Likewise.
 (vst1_s64_x3): Likewise.
 (vst1_u64_x3): Likewise.
 (vst1_f64_x3): Likewise.
 (vst1_s8_x3): Likewise.
 (vst1_p8_x3): Likewise.
 (vst1_s16_x3): Likewise.
 (vst1_p16_x3): Likewise.
 (vst1_s32_x3): Likewise.
 (vst1_u8_x3): Likewise.
 (vst1_u16_x3): Likewise.
 (vst1_u32_x3): Likewise.
 (vst1_f16_x3): Likewise.
 (vst1_f32_x3): Likewise.
 (vst1_p64_x3): Likewise.
 (vst1q_s8_x3): Likewise.
 (vst1q_p8_x3): Likewise.
 (vst1q_s16_x3): Likewise.
 (vst1q_p16_x3): Likewise.
 (vst1q_s32_x3): Likewise.
 (vst1q_s64_x3): Likewise.
 (vst1q_u8_x3): Likewise.
 (vst1q_u16_x3): Likewise.
 (vst1q_u32_x3): Likewise.
 (vst1q_u64_x3): Likewise.
 (vst1q_f16_x3): Likewise.
 (vst1q_f32_x3): Likewise.
 (vst1q_f64_x3): Likewise.
 (vst1q_p64_x3): Likewise.


Hi,
I'm not a maintainer, but I suspect you should add some tests.

Christophe




--
- Thanks and regards,
   Sameera D.


Hi Christophe,

Please find attached the updated patch. Similar to the testcase
vld1x2.c, I have updated the testcases to mark them XFAIL for ARM, as
the intrinsics are not implemented yet. I have also added required
target to be little endian.


I am not a

Re: [Aarch64] Fix conditional branches with target far away.

2018-03-29 Thread Sudakshina Das


Hi Sameera

On 29/03/18 11:44, Sameera Deshpande wrote:

Hi Sudakshina,

Thanks for pointing that out. Updated the conditions for attribute
length to take care of boundary conditions for offset range.

Please find attached the updated patch.

I have tested it for gcc testsuite and the failing testcase. Ok for trunk?


Thank you so much for fixing the length as well along with you patch.
You mention a failing testcase? Maybe it would be helpful to add that
to the patch for the gcc testsuite.

Sudi



On 22 March 2018 at 19:06, Sudakshina Das <sudi@arm.com> wrote:

Hi Sameera

On 22/03/18 02:07, Sameera Deshpande wrote:


Hi Sudakshina,

As per the ARMv8 ARM, for the offset range (-1048576 ,1048572), the
far branch instruction offset is inclusive of both the offsets. Hence,
I am using <=||=> and not <||>= as it was in previous implementation.



I have to admit earlier I was only looking at the patch mechanically and
found a difference with the previous implementation in offset comparison.
After you pointed out, I looked up the ARMv8 ARM and I have a couple of
doubts:

1. My understanding is that any offset in [-1048576 ,1048572] both inclusive
qualifies as an 'in range' offset. However, the code for both attribute
length and far_branch has been using [-1048576 ,1048572), that is, ( >= && <
). If the far_branch was incorrectly calculated, then maybe the length
calculations with similar magic numbers should also be corrected? Of course,
I am not an expert in this and maybe this was a conscience decision so I
would ask Ramana to maybe clarify if he remembers.

2. Now to come back to your patch, if my understanding is correct, I think a
far_branch would be anything outside of this range, that is,
(offset < -1048576 || offset > 1048572), anything that can not be
represented in the 21-bit range.

Thanks
Sudi




On 16 March 2018 at 00:51, Sudakshina Das <sudi@arm.com> wrote:


On 15/03/18 15:27, Sameera Deshpande wrote:



Ping!

On 28 February 2018 at 16:18, Sameera Deshpande
<sameera.deshpa...@linaro.org> wrote:



On 27 February 2018 at 18:25, Ramana Radhakrishnan
<ramana@googlemail.com> wrote:



On Wed, Feb 14, 2018 at 8:30 AM, Sameera Deshpande
<sameera.deshpa...@linaro.org> wrote:



Hi!

Please find attached the patch to fix bug in branches with offsets
over
1MiB.
There has been an attempt to fix this issue in commit
050af05b9761f1979f11c151519e7244d5becd7c

However, the far_branch attribute defined in above patch used
insn_length - which computes incorrect offset. Hence, eliminated the
attribute completely, and computed the offset from insn_addresses
instead.

Ok for trunk?

gcc/Changelog

2018-02-13 Sameera Deshpande <sameera.deshpa...@linaro.org>
   * config/aarch64/aarch64.md (far_branch): Remove attribute.
Eliminate
   all the dependencies on the attribute from RTL patterns.



I'm not a maintainer but this looks good to me modulo notes about how
this was tested. What would be nice is a testcase for the testsuite as
well as ensuring that the patch has been bootstrapped and regression
tested. AFAIR, the original patch was put in because match.pd failed
when bootstrap in another context.


regards
Ramana


--
- Thanks and regards,
 Sameera D.




The patch is tested with GCC testsuite and bootstrapping successfully.
Also tested for spec benchmark.



I am not a maintainer either. I noticed that the range check you do for
the offset has a (<= || >=). The "far_branch" however did (< || >=) for a
positive value. Was that also part of the incorrect offset calculation?

@@ -692,7 +675,11 @@
  {
if (get_attr_length (insn) =3D=3D 8)
  {
-   if (get_attr_far_branch (insn) =3D=3D 1)
+   long long int offset;
+   offset =3D INSN_ADDRESSES (INSN_UID (XEXP (operands[2], 0)))
+ - INSN_ADDRESSES (INSN_UID (insn));
+
+   if (offset <=3D -1048576 || offset >=3D 1048572)
 return aarch64_gen_far_branch (operands, 2, "Ltb",
"\\t%0, %1, ");
   else
@@ -709,12 +696,7 @@
   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int
-32768))
  (lt (minus (match_dup 2) (pc)) (const_int
32764)))
 (const_int 4)
- (const_int 8)))
-   (set (attr "far_branch")
-   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int
-1048576))
-  (lt (minus (match_dup 2) (pc)) (const_int
1048572)))
- (const_int 0)
- (const_int 1)))]
+ (const_int 8)))]

)

Thanks
Sudi


--
- Thanks and regards,
 Sameera D.

Re: [PATCH, GCC-7, GCC-6][ARM][PR target/84826] Backport Fix ICE in extract_insn, at recog.c:2304 on arm-linux-gnueabihf

2018-03-29 Thread Sudakshina Das


Hi Kyrill

On 29/03/18 09:41, Kyrill Tkachov wrote:

Hi Sudi,

On 28/03/18 15:04, Sudakshina Das wrote:

Hi

This patch is a request to backport r258777 and r258805 to gcc-7-branch
and gcc-6-branch. The same ICE occurs in both the branches with
-fstack-check. Thus the test case directive has been changed.

The discussion on the patch that went into trunk is:
https://gcc.gnu.org/ml/gcc-patches/2018-03/msg01120.html

Testing : Regtested on both the branches with arm-none-linux-gnueabihf

Is this ok for gcc-7 and gcc-6?



Ok.
Thanks,
Kyrill


Thanks! Committed to gcc-7-branch as r258948 and gcc-6-branch as r258949.

Sudi




Sudi

ChangeLog entries:

*** gcc/ChangeLog ***

2018-03-28  Sudakshina Das  <sudi@arm.com>

    Backport from mainline
    2018-03-22  Sudakshina Das  <sudi@arm.com>

    PR target/84826
    * config/arm/arm.h (machine_function): Add 
static_chain_stack_bytes.

    * config/arm/arm.c (arm_compute_static_chain_stack_bytes): Avoid
    re-computing once computed.
    (arm_expand_prologue): Compute machine->static_chain_stack_bytes.
    (arm_init_machine_status): Initialize
    machine->static_chain_stack_bytes.

*** gcc/testsuite/ChangeLog ***

2018-03-28  Sudakshina Das  <sudi@arm.com>

    * gcc.target/arm/pr84826.c: Change dg-option to -fstack-check.

    Backport from mainline
    2018-03-23  Sudakshina Das  <sudi@arm.com>

    PR target/84826
    * gcc.target/arm/pr84826.c: Add dg directive.

    Backport from mainline
        2018-03-22  Sudakshina Das  <sudi@arm.com>

    PR target/84826
    * gcc.target/arm/pr84826.c: New test.

1 2 >

1 - 100 of 159 matches

Mail list logo