Re: [PATCH, v2] Recognize a missed usage of a sbfiz instruction

2018-05-15 Thread Kyrill Tkachov

Hi Luis,

On 14/05/18 21:41, Luis Machado wrote:

On 05/11/2018 06:46 AM, Kyrill Tkachov wrote:

Hi Luis,

On 10/05/18 11:31, Luis Machado wrote:


On 05/09/2018 10:44 AM, Kyrill Tkachov wrote:


On 09/05/18 13:30, Luis Machado wrote:

Hi Kyrill,

On 05/08/2018 11:09 AM, Kyrill Tkachov wrote:

Hi Luis,

On 07/05/18 15:28, Luis Machado wrote:

Hi,

On 02/08/2018 10:45 AM, Luis Machado wrote:

Hi Kyrill,

On 02/08/2018 09:48 AM, Kyrill Tkachov wrote:

Hi Luis,

On 06/02/18 15:04, Luis Machado wrote:

Thanks for the feedback Kyrill. I've adjusted the v2 patch based on your
suggestions and re-tested the changes. Everything is still sane.


Thanks! This looks pretty good to me.


Since this is ARM-specific and fairly narrow in scope, I wonder if it would be
reasonable to consider it for inclusion at the current stage.


It is true that the target maintainers can choose to take such patches at any
stage. However, any patch at this stage increases the risk of regressions being
introduced, and these regressions can come back to bite us in ways that are
very hard to anticipate.

Have a look at some of the bugs in bugzilla (or a quick scan of the gcc-bugs
list) for examples of the ways that things can go wrong with any of the myriad
of GCC components and the unexpected ways in which they can interact.

For example, I am now working on what I initially thought was a one-liner fix
for PR 84164, but it has expanded into a 3-patch series with a midend component
and target-specific changes for 2 ports.

These issues are very hard to catch during review and normal testing, and can
sometimes take months of deep testing by fuzzing and massive codebase rebuilds
to expose, so the closer the commit is to a release, the higher the risk that
an obscure edge case will go unnoticed and unfixed in the release.

So the priority at this stage is to minimise the risk of destabilising the
codebase, as opposed to taking in new features and desirable performance
improvements (like your patch!).

That is the rationale for delaying committing such changes until the start
of GCC 9 development. But again, this is up to the aarch64 maintainers.
I'm sure the patch will be a perfectly fine and desirable commit for GCC 9.
This is just my perspective as maintainer of the arm port.


Thanks. Your explanation makes the situation pretty clear and it sounds very
reasonable. I'll put the patch on hold until development is open again.

Regards,
Luis


With GCC 9 development open, I take it this patch is worth considering again?



Yes, I believe the latest version is at:
https://gcc.gnu.org/ml/gcc-patches/2018-02/msg00239.html ?

+(define_insn "*ashift_extv_bfiz"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+(ashift:GPI (sign_extract:GPI (match_operand:GPI 1 "register_operand" "r")
+  (match_operand 2 "aarch64_simd_shift_imm_offset_" 
"n")
+  (match_operand 3 "aarch64_simd_shift_imm_" "n"))
+ (match_operand 4 "aarch64_simd_shift_imm_" "n")))]
+  ""
+  "sbfiz\\t%0, %1, %4, %2"
+  [(set_attr "type" "bfx")]
+)
+


Indeed.



Can you give a bit more information about what the values of operands 2, 3
and 4 are in your example testcases?


For sbfiz32 we have 3, 0 and 19 respectively. For sbfiz64 we have 6, 0 and 38.
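
For reference, a function shaped like this exercises the 32-bit case (a
sketch with my own naming; the shifts encode width 3, position 0, shift 19):

int
sbfiz32 (int x)
{
  /* (x << 29) >> 10 is sign_extract (x, 3, 0) << 19, which should
     combine into: sbfiz w0, w0, 19, 3.  */
  return (x << 29) >> 10;
}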


I'm trying to understand why the value of operand 3 (the bit position the
sign-extract starts from) doesn't get validated in any way and doesn't play
any role in the output...


This may be an oversight. It seems operand 3 will always be 0 in this
particular case I'm covering: the field starts at bit 0, gets shifted x bits
to the left and then y < x bits to the right. The operation is essentially an
ashift of the bitfield followed by a sign-extension of the MSB of the bitfield
being extracted.

Having a non-zero operand 3 in the RTL means the shift amount won't translate
directly to operand 3 of sbfiz (the position). Then we'd need to do a
calculation that takes the RTL's operand 3 into account.

I'm wondering when such an RTL pattern, with a non-zero operand 3, would be
generated, though.


I think it's best to enforce that operand 3 is zero. Maybe just match a
const_int 0 here directly.
Better safe than sorry with these things.


Indeed. I've updated the original patch with that change now.

Bootstrapped and regtested on aarch64-linux.



I think you might also want to enforce that the sum of operands 2 and 3
doesn't exceed the width of the mode (see other patterns in aarch64.md that
generate bfx-style instructions for similar restrictions).
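
A runnable model of one reading of this check (my naming; with the position
already forced to zero, the remaining constraint is that the extracted width
plus the shift stays within the mode, mirroring the IN_RANGE checks of the
other bfx-style patterns in aarch64.md):

#include <cassert>

/* WIDTH = number of bits extracted, LSB = amount the field is shifted
   left, MODE_BITS = width of the mode.  Hypothetical helper.  */
static bool
sbfiz_field_ok (int width, int lsb, int mode_bits)
{
  return width + lsb >= 1 && width + lsb <= mode_bits - 1;
}

int
main ()
{
  assert (sbfiz_field_ok (3, 19, 32));    /* sbfiz w0, w0, 19, 3  */
  assert (!sbfiz_field_ok (20, 20, 32));  /* field would spill past the mode  */
  return 0;
}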


Updated now in the attached patch. Everything is still sane with bootstrap
and testsuite on aarch64-linux.



Thanks!
This looks good to me now.
You'll need a final approval from a maintainer.

Kyrill



Otherwise the patch looks good to me (it will still need approval from a
maintainer).

For the future I think there's also an opportunity to match the SImode version
of this pattern that zero-extends to DImode, making use of the implicit
zero-extension that AArch64 performs on writes to W registers.

Re: [PR63185][RFC] Improve DSE with branches

2018-05-15 Thread Richard Biener
On Mon, 14 May 2018, Kugan Vivekanandarajah wrote:

> Hi,
> 
> Attached patch handles PR63185 when we reach a PHI with temp != NULL.
> When we see the PHI, if there aren't any interesting uses of it,
> could we ignore it?
> 
> Bootstrapped and regression tested on x86_64-linux-gnu.
> Is this OK?

No, as Jeff said we can't do it this way.

If we end up with multiple VDEFs in the walk of defvar immediate
uses we know we are dealing with a CFG fork.  We can't really
ignore any of the paths but we have to

 a) find the merge point (and the associated VDEF)
 b) verify for each chain of VDEFs with associated VUSEs
up to that merge VDEF that we have no uses of the to-be-classified
store and collect (partial) kills
 c) intersect kill info and continue walking from the merge point

in b) there's the optional possibility to find sinking opportunities
in case we have kills on some paths but uses on others.  This is why
DSE should really be merged with (store) sinking.
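
As a concrete picture of the fork-and-merge shape (this is essentially the
testcase that appears later in this thread as ssa-dse-31.c):

void g (void);

void
f (int n, char *p)
{
  *p = 0;   /* Dead: killed by *p = 1 on the branch and by *p = 2
               after the merge point.  */
  if (n)
    {
      *p = 1;
      g ();
    }
  *p = 2;
}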

So if we want to enhance DSE's handling of branches then we need
to refactor the simple dse_classify_store function.  Let me take
an attempt at this today.

Richard.

> Thanks,
> Kugan
> 
> 
> gcc/ChangeLog:
> 
> 2018-05-14  Kugan Vivekanandarajah  
> 
> * tree-ssa-dse.c (phi_dosent_define_nor_use_p): New.
> (dse_classify_store): Use phi_dosent_define_nor_use_p.
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-05-14  Kugan Vivekanandarajah  
> 
> * gcc.dg/tree-ssa/ssa-dse-33.c: New test.
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH][AArch64] Implement usadv16qi and ssadv16qi standard names

2018-05-15 Thread Kyrill Tkachov

I realised I had forgotten to copy the maintainers...

https://gcc.gnu.org/ml/gcc-patches/2018-05/msg00613.html

Thanks,
Kyrill

On 14/05/18 14:38, Kyrill Tkachov wrote:

Hi all,

This patch implements the usadv16qi and ssadv16qi standard names.
See the thread on g...@gcc.gnu.org [1] for background.

The V16QImode variant is important to get right as it is the most commonly
used pattern: reducing vectors of bytes into an int.
The midend expects the optab to compute the absolute differences of operands
1 and 2 and reduce them while widening along the way up to SImode. So the
inputs are V16QImode and the output is V4SImode.

I've tried out a few different strategies for that; the one I settled on is
to emit:
UABDL2  tmp.8h, op1.16b, op2.16b
UABAL   tmp.8h, op1.16b, op2.16b
UADALP  op3.4s, tmp.8h

To work through the semantics let's say operands 1 and 2 are:
op1 { a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15 }
op2 { b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, b10, b11, b12, b13, b14, b15 }
op3 { c0, c1, c2, c3 }

The UABDL2 takes the upper V8QI elements, computes their absolute
differences, widens them and stores them into the V8HImode tmp:

tmp { ABS(a[8]-b[8]),   ABS(a[9]-b[9]),   ABS(a[10]-b[10]), ABS(a[11]-b[11]),
      ABS(a[12]-b[12]), ABS(a[13]-b[13]), ABS(a[14]-b[14]), ABS(a[15]-b[15]) }

The UABAL after that takes the lower V8QI elements, computes their absolute
differences, widens them and accumulates them into the V8HImode tmp from the
previous step:

tmp { ABS(a[8]-b[8])+ABS(a[0]-b[0]),     ABS(a[9]-b[9])+ABS(a[1]-b[1]),
      ABS(a[10]-b[10])+ABS(a[2]-b[2]),   ABS(a[11]-b[11])+ABS(a[3]-b[3]),
      ABS(a[12]-b[12])+ABS(a[4]-b[4]),   ABS(a[13]-b[13])+ABS(a[5]-b[5]),
      ABS(a[14]-b[14])+ABS(a[6]-b[6]),   ABS(a[15]-b[15])+ABS(a[7]-b[7]) }

Finally the UADALP does a pairwise widening reduction and accumulation into
the V4SImode op3:

op3 { c0+ABS(a[8]-b[8])+ABS(a[0]-b[0])+ABS(a[9]-b[9])+ABS(a[1]-b[1]),
      c1+ABS(a[10]-b[10])+ABS(a[2]-b[2])+ABS(a[11]-b[11])+ABS(a[3]-b[3]),
      c2+ABS(a[12]-b[12])+ABS(a[4]-b[4])+ABS(a[13]-b[13])+ABS(a[5]-b[5]),
      c3+ABS(a[14]-b[14])+ABS(a[6]-b[6])+ABS(a[15]-b[15])+ABS(a[7]-b[7]) }

(sorry for the text dump)

Remember, according to [1] the exact reduction sequence doesn't matter (for
integer arithmetic at least).
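
Ignoring the lane structure, a scalar model of the net arithmetic is (a
sketch, with my own naming):

/* Sum of absolute differences of 16 bytes plus an accumulator; the
   optab computes this spread across the four SImode lanes of op3.  */
unsigned int
usad_model (const unsigned char *a, const unsigned char *b, unsigned int acc)
{
  for (int i = 0; i < 16; i++)
    acc += a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
  return acc;
}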
I've considered other sequences as well (thanks Wilco), for example
* UABD + UADDLP + UADALP
* UABDL2 + UABDL + UADALP + UADALP

I ended up settling on the sequence in this patch as it's short (3
instructions), and in the future we can potentially look to optimise multiple
occurrences of these into something even faster (for example, accumulating
into H registers for longer before doing a single UADALP at the end to
accumulate into the final S register).

If your microarchitecture has some strong preferences for a particular
sequence, please let me know or, even better, propose a patch to parametrise
the generation sequence by code (or the appropriate RTX cost).


This expansion allows the vectoriser to avoid unpacking the bytes in two
steps and performing V4SI arithmetic on them.
So, for the code:

unsigned char pix1[N], pix2[N];

int foo (void)
{
  int i_sum = 0;
  int i;

  for (i = 0; i < 16; i++)
i_sum += __builtin_abs (pix1[i] - pix2[i]);

  return i_sum;
}

we now generate on aarch64:
foo:
adrp    x1, pix1
add x1, x1, :lo12:pix1
movi    v0.4s, 0
adrp    x0, pix2
add x0, x0, :lo12:pix2
ldr q2, [x1]
ldr q3, [x0]
uabdl2  v1.8h, v2.16b, v3.16b
uabal   v1.8h, v2.8b, v3.8b
uadalp  v0.4s, v1.8h
addv    s0, v0.4s
umov    w0, v0.s[0]
ret


instead of:
foo:
adrp    x1, pix1
adrp    x0, pix2
add x1, x1, :lo12:pix1
add x0, x0, :lo12:pix2
ldr q0, [x1]
ldr q4, [x0]
ushll   v1.8h, v0.8b, 0
ushll2  v0.8h, v0.16b, 0
ushll   v2.8h, v4.8b, 0
ushll2  v4.8h, v4.16b, 0
usubl   v3.4s, v1.4h, v2.4h
usubl2  v1.4s, v1.8h, v2.8h
usubl   v2.4s, v0.4h, v4.4h
usubl2  v0.4s, v0.8h, v4.8h
abs v3.4s, v3.4s
abs v1.4s, v1.4s
abs v2.4s, v2.4s
abs v0.4s, v0.4s
add v1.4s, v3.4s, v1.4s
add v1.4s, v2.4s, v1.4s
add v0.4s, v0.4s, v1.4s
addv    s0, v0.4s
umov    w0, v0.s[0]
ret

So I expect this new expansion to be better than the status quo in any case.
Bootstrapped and tested on aarch64-none-linux-gnu.
This gives about 8% on 525.x264_r from SPEC2017 on a Cortex-A72.

Ok for trunk?

Thanks,
Kyrill

[1] https://gcc.gnu.org/ml/gcc/2018-05/msg00070.html


2018-05-11  Kyrylo Tkachov  

* config/aarch64/aarch64.md ("unspec"): Define UNSPEC_SABAL,
UNSPEC_SABDL2, UNSPEC_SADALP, UNSPEC_UABAL, UNSPEC_UABDL2,
UNSPEC_UADALP values.
* config/aarch64/iterators.md (ABAL): New int iterator.
(ABDL2): Likewise.
(ADALP): Likewise.

[patch AArch64] Do not perform a vector splat for vector initialisation if it is not useful

2018-05-15 Thread Kyrill Tkachov

Hi all,

This is a respin of James's patch from:
https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00614.html
The original patch was approved and committed but was later reverted because
of failures on big-endian.
This tweaked version fixes the big-endian failures in
aarch64_expand_vector_init by picking the right element of VALS to move into
the low part of the vector register depending on endianness. The rest of the
patch stays the same. I'm looking for approval on the aarch64 parts, as they
are the ones that have changed since the last approved version of the patch.

---

In the testcase in this patch we create an SLP vector with only two
elements. Our current vector initialisation code will first duplicate
the first element to both lanes, then overwrite the top lane with a new
value.

This duplication can be clunky and wasteful.

Better would be to simply use the fact that we will always be overwriting
the remaining bits, and move the first element to the correct place
(implicitly zeroing all other bits).

This reduces the code generated for this case, and can allow more efficient
addressing modes and other second-order benefits for AArch64 code which has
been vectorized to V2DI mode.
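
A minimal example of the kind of initialisation in question (illustrative
only; not the committed vect-slp-dup.c testcase):

typedef long long v2di __attribute__ ((vector_size (16)));

/* With the patch, moving A into lane 0 implicitly zeroes the rest of
   the register, so no initial DUP of A to both lanes is needed before
   the INS of B into lane 1.  */
v2di
make_v2di (long long a, long long b)
{
  v2di res = { a, b };
  return res;
}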

Note that the change is generic enough to catch the case for any vector
mode, but is expected to be most useful for 2x64-bit vectorization.

Unfortunately, on its own, this would cause failures in
gcc.target/aarch64/load_v2vec_lanes_1.c and
gcc.target/aarch64/store_v2vec_lanes.c, which expect to see many more
vec_merge and vec_duplicate for their simplifications to apply. To fix this,
add a special case to the AArch64 code if we are loading from two memory
addresses, and use the load_pair_lanes patterns directly.

We also need a new pattern in simplify-rtx.c:simplify_ternary_operation
to catch:

  (vec_merge:OUTER
 (vec_duplicate:OUTER x:INNER)
 (subreg:OUTER y:INNER 0)
 (const_int N))

And simplify it to:

  (vec_concat:OUTER x:INNER y:INNER) or (vec_concat y x)

This is similar to the existing patterns which are tested in this function,
without requiring the second operand to also be a vec_duplicate.

Bootstrapped and tested on aarch64-none-linux-gnu and tested on
aarch64-none-elf.

Note that this requires
https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00614.html
if we don't want to ICE creating broken vector zero extends.

Are the non-AArch64 parts OK?

Thanks,
James

---
2018-05-15  James Greenhalgh  
Kyrylo Tkachov  

* config/aarch64/aarch64.c (aarch64_expand_vector_init): Modify
code generation for cases where splatting a value is not useful.
* simplify-rtx.c (simplify_ternary_operation): Simplify
vec_merge across a vec_duplicate and a paradoxical subreg forming a
vector mode to a vec_concat.

2018-05-15  James Greenhalgh  

* gcc.target/aarch64/vect-slp-dup.c: New.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 4b5183b602b8786307deb8e3d8056323028b50a2..562eb315f881a1e0d8aa3ba946c99b4c6f25949b 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -13901,9 +13901,54 @@ aarch64_expand_vector_init (rtx target, rtx vals)
 	maxv = matches[i][1];
 	  }
 
-  /* Create a duplicate of the most common element.  */
-  rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, maxelement));
-  aarch64_emit_move (target, gen_vec_duplicate (mode, x));
+  /* Create a duplicate of the most common element, unless all elements
+	 are equally useless to us, in which case just immediately set the
+	 vector register using the first element.  */
+
+  if (maxv == 1)
+	{
+	  /* For vectors of two 64-bit elements, we can do even better.  */
+	  if (n_elts == 2
+	  && (inner_mode == E_DImode
+		  || inner_mode == E_DFmode))
+
+	{
+	  rtx x0 = XVECEXP (vals, 0, 0);
+	  rtx x1 = XVECEXP (vals, 0, 1);
+	  /* Combine can pick up this case, but handling it directly
+		 here leaves clearer RTL.
+
+		 This is load_pair_lanes, and also gives us a clean-up
+		 for store_pair_lanes.  */
+	  if (memory_operand (x0, inner_mode)
+		  && memory_operand (x1, inner_mode)
+		  && !STRICT_ALIGNMENT
+		  && rtx_equal_p (XEXP (x1, 0),
+  plus_constant (Pmode,
+		 XEXP (x0, 0),
+		 GET_MODE_SIZE (inner_mode))))
+		{
+		  rtx t;
+		  if (inner_mode == DFmode)
+		t = gen_load_pair_lanesdf (target, x0, x1);
+		  else
+		t = gen_load_pair_lanesdi (target, x0, x1);
+		  emit_insn (t);
+		  return;
+		}
+	}
+	  /* The subreg-move sequence below will move into lane zero of the
+	 vector register.  For big-endian we want that position to hold
+	 the last element of VALS.  */
+	  maxelement = BYTES_BIG_ENDIAN ? n_elts - 1 : 0;
+	  rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, maxelement));
+	  aarch64_emit_move (target, lowpart_subreg (mode, x, inner_mode));
+	}
+  else
+

[PATCH][AArch64] Unify vec_set patterns, support floating-point vector modes properly

2018-05-15 Thread Kyrill Tkachov

Hi all,

We've a deficiency in our vec_set family of patterns.
We don't support directly loading a vector lane using LD1 for V2DImode and
all the vector floating-point modes.
We do it correctly for the other integer vector modes (V4SI, V8HI etc)
though.

The alternatives on the relevant floating-point patterns only allow a
register-to-register INS instruction.
That means if we want to load a value into a vector lane we must first load
it into a scalar register and then perform an INS, which is wasteful.

There is also an explicit V2DI vec_set expander dangling around for no
reason that I can see. It seems to do the exact same things as the other
vec_set expanders. This patch removes that.
It now unifies all vec_set expansions into a single "vec_set<mode>"
define_expand using the catch-all VALL_F16 iterator.
I decided to leave two aarch64_simd_vec_set<mode> define_insns. One for the
integer vector modes (that now include V2DI) and one for the floating-point
vector modes. That is so that we can avoid specifying "w,r" alternatives for
floating-point modes in case the register allocator gets confused and starts
gratuitously moving registers between the two banks.
So the floating-point pattern has only two alternatives, one for
SIMD-to-SIMD INS and one for LD1.

With this patch we avoid loading values into scalar registers and then doing
an explicit INS on them to move them into the desired vector lanes. For
example, for:

typedef float v4sf __attribute__ ((vector_size (16)));
typedef long long v2di __attribute__ ((vector_size (16)));

v2di
foo_v2di (long long *a, long long *b)
{
  v2di res = { *a, *b };
  return res;
}

v4sf
foo_v4sf (float *a, float *b, float *c, float *d)
{
  v4sf res = { *a, *b, *c, *d };
  return res;
}

we currently generate:

foo_v2di:
ldr d0, [x0]
ldr x0, [x1]
ins v0.d[1], x0
ret

foo_v4sf:
ldr s0, [x0]
ldr s3, [x1]
ldr s2, [x2]
ldr s1, [x3]
ins v0.s[1], v3.s[0]
ins v0.s[2], v2.s[0]
ins v0.s[3], v1.s[0]
ret

but with this patch we generate the much cleaner:
foo_v2di:
ldr d0, [x0]
ld1 {v0.d}[1], [x1]
ret

foo_v4sf:
ldr s0, [x0]
ld1 {v0.s}[1], [x1]
ld1 {v0.s}[2], [x2]
ld1 {v0.s}[3], [x3]
ret


Bootstrapped and tested on aarch64-none-linux-gnu and also tested on 
aarch64_be-none-elf.

Ok for trunk?
Thanks,
Kyrill

2018-05-15  Kyrylo Tkachov  

* config/aarch64/aarch64-simd.md (vec_set<mode>): Use VALL_F16 mode
iterator.  Delete separate integer-mode vec_set<mode> expander.
(aarch64_simd_vec_setv2di): Delete.
(vec_setv2di): Delete.
(aarch64_simd_vec_set<mode>, VDQF_F16): Move earlier in file.
Add second memory alternative to emit an LD1.
(aarch64_simd_vec_set<mode>, VDQ_BHSI): Change mode iterator to
VDQ_I.  Use vwcore mode attribute in first alternative for operand 1
constraint.

2018-05-15  Kyrylo Tkachov  

* gcc.target/aarch64/vect-init-ld1.c: New test.
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 1154fc3d58deaa33413ea3050ff7feec37f092a6..df3fad2d71ed4096accdfdf725e194bf555d40d2 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -694,11 +694,11 @@ (define_insn "one_cmpl2"
 )
 
 (define_insn "aarch64_simd_vec_set"
-  [(set (match_operand:VDQ_BHSI 0 "register_operand" "=w,w,w")
-(vec_merge:VDQ_BHSI
-	(vec_duplicate:VDQ_BHSI
+  [(set (match_operand:VDQ_I 0 "register_operand" "=w,w,w")
+	(vec_merge:VDQ_I
+	(vec_duplicate:VDQ_I
 		(match_operand: 1 "aarch64_simd_general_operand" "r,w,Utv"))
-	(match_operand:VDQ_BHSI 3 "register_operand" "0,0,0")
+	(match_operand:VDQ_I 3 "register_operand" "0,0,0")
 	(match_operand:SI 2 "immediate_operand" "i,i,i")))]
   "TARGET_SIMD"
   {
@@ -707,7 +707,7 @@ (define_insn "aarch64_simd_vec_set"
switch (which_alternative)
  {
  case 0:
-	return "ins\\t%0.<Vetype>[%p2], %w1";
+	return "ins\\t%0.<Vetype>[%p2], %<vwcore>1";
 case 1:
 	return "ins\\t%0.<Vetype>[%p2], %1.<Vetype>[0]";
  case 2:
@@ -719,6 +719,30 @@ (define_insn "aarch64_simd_vec_set"
   [(set_attr "type" "neon_from_gp, neon_ins, neon_load1_one_lane")]
 )
 
+(define_insn "aarch64_simd_vec_set"
+  [(set (match_operand:VDQF_F16 0 "register_operand" "=w,w")
+	(vec_merge:VDQF_F16
+	(vec_duplicate:VDQF_F16
+		(match_operand:<VEL> 1 "aarch64_simd_general_operand" "w,Utv"))
+	(match_operand:VDQF_F16 3 "register_operand" "0,0")
+	(match_operand:SI 2 "immediate_operand" "i,i")))]
+  "TARGET_SIMD"
+  {
+   int elt = ENDIAN_LANE_N (<nunits>, exact_log2 (INTVAL (operands[2])));
+   operands[2] = GEN_INT ((HOST_WIDE_INT) 1 << elt);
+   switch (which_alternative)
+ {
+ case 0:
+	return "ins\\t%0.<Vetype>[%p2], %1.<Vetype>[0]";
+ case 1:
+	return "ld1\\t{%0.<Vetype>}[%p2], %1";
+ default:
+	gcc_unreachable ();
+ }
+  }
+  [(set_attr "type" "neon_ins, neon_load1_one_lane")]
+)
+
 (define_insn

Re: [PR63185][RFC] Improve DSE with branches

2018-05-15 Thread Richard Biener
On Tue, 15 May 2018, Richard Biener wrote:

> On Mon, 14 May 2018, Kugan Vivekanandarajah wrote:
> 
> > Hi,
> > 
> > Attached patch handles PR63185 when we reach a PHI with temp != NULL.
> > When we see the PHI, if there aren't any interesting uses of it,
> > could we ignore it?
> > 
> > Bootstrapped and regression tested on x86_64-linux-gnu.
> > Is this OK?
> 
> No, as Jeff said we can't do it this way.
> 
> If we end up with multiple VDEFs in the walk of defvar immediate
> uses we know we are dealing with a CFG fork.  We can't really
> ignore any of the paths but we have to
> 
>  a) find the merge point (and the associated VDEF)
>  b) verify for each chain of VDEFs with associated VUSEs
> up to that merge VDEF that we have no uses of the to-be-classified
> store and collect (partial) kills
>  c) intersect kill info and continue walking from the merge point
> 
> in b) there's the optional possibility to find sinking opportunities
> in case we have kills on some paths but uses on others.  This is why
> DSE should really be merged with (store) sinking.
> 
> So if we want to enhance DSE's handling of branches then we need
> to refactor the simple dse_classify_store function.  Let me take
> an attempt at this today.

First (baby) step is the following - it arranges to collect the defs we
need to continue walking from and implements trivial reduction by stopping
at (full) kills.  This allows us to handle the new testcase (which was
already handled in the very late DSE pass with the help of sinking the
store).

I took the opportunity to kill the use_stmt parameter of dse_classify_store,
as the only user was only checking whether the kills were all clobbers, which
I added a new parameter for.

I didn't adjust the byte-tracking case fully (I'm not fully understanding
the code in the case of a use, and I'm not sure whether it's worth doing
the def reduction with byte-tracking).

Your testcase can be handled by reducing the PHI and the call def by seeing
that the only use of a candidate def is another def we have already
processed.  Not sure if it's worth special-casing though; I'd rather have a
go at "recursing".  That will be the next patch.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Richard.

2018-05-15  Richard Biener  

* tree-ssa-dse.c (dse_classify_store): Remove use_stmt parameter,
add by_clobber_p one.  Change algorithm to collect all defs
representing uses we need to walk and try reducing them to
a single one before failing.
(dse_dom_walker::dse_optimize_stmt): Adjust.

* gcc.dg/tree-ssa/ssa-dse-31.c: New testcase.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-31.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-31.c
new file mode 100644
index 000..9ea9268fe1d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-31.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-dse1-details" } */
+
+void g();
+void f(int n, char *p)
+{
+  *p = 0;
+  if (n)
+{
+  *p = 1;
+  g();
+}
+  *p = 2;
+}
+
+/* { dg-final { scan-tree-dump-times "Deleted dead store" 1 "dse1" } } */
diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
index 9220fea7f2e..126592a1d13 100644
--- a/gcc/tree-ssa-dse.c
+++ b/gcc/tree-ssa-dse.c
@@ -516,18 +516,21 @@ live_bytes_read (ao_ref use_ref, ao_ref *ref, sbitmap live)
 }
 
 /* A helper of dse_optimize_stmt.
-   Given a GIMPLE_ASSIGN in STMT that writes to REF, find a candidate
-   statement *USE_STMT that may prove STMT to be dead.
-   Return TRUE if the above conditions are met, otherwise FALSE.  */
+   Given a GIMPLE_ASSIGN in STMT that writes to REF, classify it
+   according to downstream uses and defs.  Sets *BY_CLOBBER_P to true
+   if only clobber statements influenced the classification result.
+   Returns the classification.  */
 
 static dse_store_status
-dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt,
-   bool byte_tracking_enabled, sbitmap live_bytes)
+dse_classify_store (ao_ref *ref, gimple *stmt,
+   bool byte_tracking_enabled, sbitmap live_bytes,
+   bool *by_clobber_p = NULL)
 {
   gimple *temp;
   unsigned cnt = 0;
 
-  *use_stmt = NULL;
+  if (by_clobber_p)
+*by_clobber_p = true;
 
   /* Find the first dominated statement that clobbers (part of) the
  memory stmt stores to with no intermediate statement that may use
@@ -551,7 +554,7 @@ dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt,
   else
defvar = gimple_vdef (temp);
   defvar_def = temp;
-  temp = NULL;
+  auto_vec<gimple *> defs;
   FOR_EACH_IMM_USE_STMT (use_stmt, ui, defvar)
{
  cnt++;
@@ -569,9 +572,8 @@ dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt,
 See gcc.c-torture/execute/20051110-*.c.  */
  else if (gimple_code (use_stmt) == GIMPLE_PHI)
{
- if (temp
- /* Make sure we are not in a l

Re: [PATCH 1/2] Introduce prefetch-minimum stride option

2018-05-15 Thread Kyrill Tkachov

Hi Luis,

On 14/05/18 22:18, Luis Machado wrote:

Hi,

Here's an updated version of the patch (now reverted) that addresses the
previous bootstrap problem (signedness and long long/int conversion).

I've checked that it bootstraps properly on both aarch64-linux and
x86_64-linux and that tests look sane.

James, would you please give this one a try to see if you can still
reproduce PR85682? I couldn't reproduce it in multiple attempts.



The patch doesn't hit the regressions in PR85682 from what I can see.
I have a comment on the patch below.


Thanks,
Luis

On 01/22/2018 11:46 AM, Luis Machado wrote:

This patch adds a new option to control the minimum stride, for a memory
reference, at or above which the loop prefetch pass may issue software
prefetch hints. There are two motivations:

* Make the pass less aggressive, only issuing prefetch hints for bigger
strides that are more likely to benefit from prefetching. I've noticed a
case in cpu2017 where we were issuing thousands of hints, for example.

* For processors that have a hardware prefetcher, like Falkor, it allows the
loop prefetch pass to defer prefetching of smaller (less than the threshold)
strides to the hardware prefetcher instead. This prevents conflicts between
the software prefetcher and the hardware prefetcher (a sketch follows below).
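
As a sketch of the intent (a hypothetical example, not from the patch):

/* With the new minimum-stride parameter set above sizeof (int), this
   unit-stride loop would get no software prefetch hints and would be
   left to the hardware prefetcher instead.  */
void
add_arrays (int *a, const int *b, int n)
{
  for (int i = 0; i < n; i++)
    a[i] += b[i];   /* stride of sizeof (int) bytes per iteration  */
}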

I've noticed a considerable reduction in the number of prefetch hints and
slightly positive performance numbers. This aligns GCC and LLVM in terms of
prefetch behavior for Falkor.

The default settings should guarantee no changes for existing targets. Those
are free to tweak the settings as necessary.

No regressions in the testsuite and bootstrapped ok on aarch64-linux.

Ok?





--- a/gcc/tree-ssa-loop-prefetch.c
+++ b/gcc/tree-ssa-loop-prefetch.c
@@ -992,6 +992,23 @@ prune_by_reuse (struct mem_ref_group *groups)
 static bool
 should_issue_prefetch_p (struct mem_ref *ref)
 {
+  /* Some processors may have a hardware prefetcher that may conflict with
+ prefetch hints for a range of strides.  Make sure we don't issue
+ prefetches for such cases if the stride is within this particular
+ range.  */
+  if (cst_and_fits_in_hwi (ref->group->step)
+  && abs_hwi (int_cst_value (ref->group->step)) <
+ (HOST_WIDE_INT) PREFETCH_MINIMUM_STRIDE)
+{

The '<' should go on the line below together with PREFETCH_MINIMUM_STRIDE.

Thanks,
Kyrill




Re: [patch AArch64] Do not perform a vector splat for vector initialisation if it is not useful

2018-05-15 Thread Richard Biener
On Tue, May 15, 2018 at 10:20 AM Kyrill Tkachov wrote:

> Hi all,

> This is a respin of James's patch from:
> https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00614.html
> The original patch was approved and committed but was later reverted
> because of failures on big-endian.
> This tweaked version fixes the big-endian failures in
> aarch64_expand_vector_init by picking the right element of VALS to move
> into the low part of the vector register depending on endianness. The
> rest of the patch stays the same. I'm looking for approval on the
> aarch64 parts, as they are the ones that have changed since the last
> approved version of the patch.

> ---

> In the testcase in this patch we create an SLP vector with only two
> elements. Our current vector initialisation code will first duplicate
> the first element to both lanes, then overwrite the top lane with a new
> value.

> This duplication can be clunky and wasteful.

> Better would be to simply use the fact that we will always be
> overwriting the remaining bits, and move the first element to the
> correct place (implicitly zeroing all other bits).

> This reduces the code generation for this case, and can allow more
> efficient addressing modes, and other second order benefits for AArch64
> code which has been vectorized to V2DI mode.

> Note that the change is generic enough to catch the case for any vector
> mode, but is expected to be most useful for 2x64-bit vectorization.

> Unfortunately, on its own, this would cause failures in
> gcc.target/aarch64/load_v2vec_lanes_1.c and
> gcc.target/aarch64/store_v2vec_lanes.c, which expect to see many more
> vec_merge and vec_duplicate for their simplifications to apply. To fix
> this, add a special case to the AArch64 code if we are loading from two
> memory addresses, and use the load_pair_lanes patterns directly.

> We also need a new pattern in simplify-rtx.c:simplify_ternary_operation
> to catch:

>   (vec_merge:OUTER
>      (vec_duplicate:OUTER x:INNER)
>      (subreg:OUTER y:INNER 0)
>      (const_int N))

> And simplify it to:

>   (vec_concat:OUTER x:INNER y:INNER) or (vec_concat y x)

> This is similar to the existing patterns which are tested in this
> function, without requiring the second operand to also be a
> vec_duplicate.

> Bootstrapped and tested on aarch64-none-linux-gnu and tested on
> aarch64-none-elf.

> Note that this requires
> https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00614.html
> if we don't want to ICE creating broken vector zero extends.

> Are the non-AArch64 parts OK?

Is (vec_merge (subreg ..) (vec_duplicate)) canonicalized to the form
you handle?  I see the (vec_merge (vec_duplicate...) (vec_concat)) case
also doesn't handle the swapped operand case.

Otherwise the middle-end parts look ok.

Thanks,
Richard.

> Thanks,
> James

> ---
> 2018-05-15  James Greenhalgh  
>   Kyrylo Tkachov  

>   * config/aarch64/aarch64.c (aarch64_expand_vector_init): Modify
>   code generation for cases where splatting a value is not useful.
>   * simplify-rtx.c (simplify_ternary_operation): Simplify
>   vec_merge across a vec_duplicate and a paradoxical subreg forming
>   a vector mode to a vec_concat.

> 2018-05-15  James Greenhalgh  

>   * gcc.target/aarch64/vect-slp-dup.c: New.


Re: Handle vector boolean types when calculating the SLP unroll factor

2018-05-15 Thread Richard Biener
On Thu, May 10, 2018 at 8:31 AM Richard Sandiford <richard.sandif...@linaro.org> wrote:

> Richard Biener  writes:
> > On Wed, May 9, 2018 at 1:29 PM, Richard Sandiford
> >  wrote:
> >> Richard Biener  writes:
> >>> On Wed, May 9, 2018 at 12:34 PM, Richard Sandiford
> >>>  wrote:
>  The SLP unrolling factor is calculated by finding the smallest
>  scalar type for each SLP statement and taking the number of required
>  lanes from the vector versions of those scalar types.  E.g. for an
>  int32->int64 conversion, it's the vector of int32s rather than the
>  vector of int64s that determines the unroll factor.
> 
>  We rely on tree-vect-patterns.c to replace boolean operations like:
> 
> bool a, b, c;
> a = b & c;
> 
>  with integer operations of whatever the best size is in context.
>  E.g. if b and c are fed by comparisons of ints, a, b and c will become
>  the appropriate size for an int comparison.  For most targets this means
>  that a, b and c will end up as int-sized themselves, but on targets like
>  SVE and AVX512 with packed vector booleans, they'll instead become a
>  small bitfield like :1, padded to a byte for memory purposes.
>  The SLP code would then take these scalar types and try to calculate
>  the vector type for them, causing the unroll factor to be much higher
>  than necessary.
> 
>  This patch makes SLP use the cached vector boolean type if that's
>  appropriate.  Tested on aarch64-linux-gnu (with and without SVE),
>  aarch64_be-none-elf and x86_64-linux-gnu.  OK to install?
> 
>  Richard
> 
> 
>  2018-05-09  Richard Sandiford  
> 
>  gcc/
>  * tree-vect-slp.c (get_vectype_for_smallest_scalar_type): New function.
>  (vect_build_slp_tree_1): Use it when calculating the unroll factor.
> 
>  gcc/testsuite/
>  * gcc.target/aarch64/sve/vcond_10.c: New test.
>  * gcc.target/aarch64/sve/vcond_10_run.c: Likewise.
>  * gcc.target/aarch64/sve/vcond_11.c: Likewise.
>  * gcc.target/aarch64/sve/vcond_11_run.c: Likewise.
> 
>  Index: gcc/tree-vect-slp.c
>  ===
>  --- gcc/tree-vect-slp.c 2018-05-08 09:42:03.526648115 +0100
>  +++ gcc/tree-vect-slp.c 2018-05-09 11:30:41.061096063 +0100
>  @@ -608,6 +608,41 @@ vect_record_max_nunits (vec_info *vinfo,
> return true;
>   }
> 
>  +/* Return the vector type associated with the smallest scalar type in STMT.  */
>  +
>  +static tree
>  +get_vectype_for_smallest_scalar_type (gimple *stmt)
>  +{
>  +  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>  +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
>  +  if (vectype != NULL_TREE
>  +  && VECTOR_BOOLEAN_TYPE_P (vectype))
> >>>
> >>> Hum.  At this point you can't really rely on vector types being set...
> >>
> >> Not for everything, but here we only care about the result of the
> >> pattern replacements, and pattern replacements do set the vector type
> >> up-front.  vect_determine_vectorization_factor (which runs earlier
> >> for loop vectorisation) also relies on this.
> >>
>  +{
>  +  /* The result of a vector boolean operation has the smallest scalar
>  +type unless the statement is extending an even narrower boolean.  */
>  +  if (!gimple_assign_cast_p (stmt))
>  +   return vectype;
>  +
>  +  tree src = gimple_assign_rhs1 (stmt);
>  +  gimple *def_stmt;
>  +  enum vect_def_type dt;
>  +  tree src_vectype = NULL_TREE;
>  +  if (vect_is_simple_use (src, stmt_info->vinfo, &def_stmt, &dt,
>  + &src_vectype)
>  + && src_vectype
>  + && VECTOR_BOOLEAN_TYPE_P (src_vectype))
>  +   {
>  + if (TYPE_PRECISION (TREE_TYPE (src_vectype))
>  + < TYPE_PRECISION (TREE_TYPE (vectype)))
>  +   return src_vectype;
>  + return vectype;
>  +   }
>  +}
>  +  HOST_WIDE_INT dummy;
>  +  tree scalar_type = vect_get_smallest_scalar_type (stmt, &dummy, &dummy);
>  +  return get_vectype_for_scalar_type (scalar_type);
>  +}
>  +
>   /* Verify if the scalar stmts STMTS are isomorphic, require data
>  permutation or are of unsupported types of operation.  Return
>  true if they are, otherwise return false and indicate in *MATCHES
>  @@ -636,12 +671,11 @@ vect_build_slp_tree_1 (vec_info *vinfo,
> enum tree_code first_cond_code = ERROR_MARK;
> tree lhs;
> bool need_same_oprnds = false;
>  -  tree vectype = NULL_TREE, scalar_type, first_op1 = NULL_TREE;
>  +  tree vectype = NULL_TREE, first_op1 = NULL_TREE;
> optab optab;
> int icode;
> machine_mode optab_op2_mode;
>   

Re: Replace FMA_EXPR with one internal fn per optab

2018-05-15 Thread Richard Biener
On Fri, May 11, 2018 at 7:15 PM Richard Sandiford <richard.sandif...@linaro.org> wrote:

> There are four optabs for various forms of fused multiply-add:
> fma, fms, fnma and fnms.  Of these, only fma had a direct gimple
> representation.  For the other three we relied on special pattern-
> matching during expand, although tree-ssa-math-opts.c did have
> some code to try to second-guess what expand would do.

> This patch removes the old FMA_EXPR representation of fma and
> introduces four new internal functions, one for each optab.
> IFN_FMA is tied to BUILT_IN_FMA* while the other three are
> independent directly-mapped internal functions.  It's then
> possible to do the pattern-matching in match.pd and
> tree-ssa-math-opts.c (via folding) can select the exact
> FMA-based operation.

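For reference, the four forms in GCC's optab convention, as a minimal scalar
model (my function names; the fused versions round only once):

double do_fma  (double a, double b, double c) { return  a * b + c; }
double do_fms  (double a, double b, double c) { return  a * b - c; }
double do_fnma (double a, double b, double c) { return -a * b + c; }
double do_fnms (double a, double b, double c) { return -a * b - c; }
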
> The patch removes the gimple FE support for __FMA rather than mapping
> it to the internal function.  There's no reason now to treat it
> differently from other internal functions (although the FE doesn't
> handle those yet).

> The BRIG & HSA parts are a best guess, but seem relatively simple.

> The genmatch.c changes are structured to allow ternary ops in which
> the second two rather than the first two operands are commutative.
> A later patch makes use of this.

> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf,
> x86_64-linux-gnu and powerpc64le-linux-gnu.  OK to install?

Comment below

> Richard


> 2018-05-11  Richard Sandiford  

> gcc/
>  * doc/sourcebuild.texi (all_scalar_fma): Document.
>  * tree.def (FMA_EXPR): Delete.
>  * internal-fn.def (FMA, FMS, FNMA, FNMS): New internal functions.
>  * internal-fn.c (ternary_direct): New macro.
>  (expand_ternary_optab_fn): Likewise.
>  (direct_ternary_optab_supported_p): Likewise.
>  * Makefile.in (build/genmatch.o): Depend on case-fn-macros.h.
>  * builtins.c (fold_builtin_fma): Delete.
>  (fold_builtin_3): Don't call it.
>  * cfgexpand.c (expand_debug_expr): Remove FMA_EXPR handling.
>  * expr.c (expand_expr_real_2): Likewise.
>  * fold-const.c (operand_equal_p): Likewise.
>  (fold_ternary_loc): Likewise.
>  * gimple-pretty-print.c (dump_ternary_rhs): Likewise.
>  * gimple.c (DEFTREECODE): Likewise.
>  * gimplify.c (gimplify_expr): Likewise.
>  * optabs-tree.c (optab_for_tree_code): Likewise.
>  * tree-cfg.c (verify_gimple_assign_ternary): Likewise.
>  * tree-eh.c (operation_could_trap_p): Likewise.
>  (stmt_could_throw_1_p): Likewise.
>  * tree-inline.c (estimate_operator_cost): Likewise.
>  * tree-pretty-print.c (dump_generic_node): Likewise.
>  (op_code_prio): Likewise.
>  * tree-ssa-loop-im.c (stmt_cost): Likewise.
>  * tree-ssa-operands.c (get_expr_operands): Likewise.
>  * tree.c (commutative_ternary_tree_code, add_expr): Likewise.
>  * fold-const-call.h (fold_fma): Delete.
>  * fold-const-call.c (fold_const_call_): Handle CFN_FMS,
>  CFN_FNMA and CFN_FNMS.
>  (fold_fma): Delete.
>  * genmatch.c (combined_fn): New enum.
>  (commutative_ternary_tree_code): Remove FMA_EXPR handling.
>  (commutative_op): New function.
>  (commutate): Use it.  Handle more than 2 operands.
>  (dt_operand::gen_gimple_expr): Use commutative_op.
>  (parser::parse_expr): Allow :c to be used with non-binary
>  operators if the commutative operand is known.
>  * gimple-ssa-backprop.c (backprop::process_builtin_call_use): Handle
>  CFN_FMS, CFN_FNMA and CFN_FNMS.
>  (backprop::process_assign_use): Remove FMA_EXPR handling.
>  * hsa-gen.c (gen_hsa_insns_for_operation_assignment): Likewise.
>  (gen_hsa_fma): New function.
>  (gen_hsa_insn_for_internal_fn_call): Use it for IFN_FMA, IFN_FMS,
>  IFN_FNMA and IFN_FNMS.
>  * match.pd: Add folds for IFN_FMS, IFN_FNMA and IFN_FNMS.
>  * tree-ssa-math-opts.c (aggressive_valueize): New function.
>  (convert_mult_to_fma_1): Use the gimple_build interface and use
>  aggressive_valueize to fold the result.
>  (convert_mult_to_fma): Use direct_internal_fn_supported_p
>  instead of checking for optabs directly.
>  * config/i386/i386.c (ix86_add_stmt_cost): Recognize FMAs as calls
>  rather than FMA_EXPRs.
>  * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Create a
>  call to IFN_FMA instead of an FMA_EXPR.

> gcc/brig/
>  * brigfrontend/brig-function.cc
>  (brig_function::get_builtin_for_hsa_opcode): Use BUILT_IN_FMA
>  for BRIG_OPCODE_FMA.
>  (brig_function::get_tree_code_for_hsa_opcode): Treat BUILT_IN_FMA
>  as a call.

> gcc/c/
>  * gimple-parser.c (c_parser_gimple_postfix_expression): Remove
>  __FMA_EXPR handlng.

> gcc/cp/
>  * co

[PATCH ARM] Fix armv8-m multilib build failure with stdint.h

2018-05-15 Thread Jérôme Lambourg
Hello gcc-patch list,

While trying to build an x86_64-linux hosted cross arm bare metal compiler with
both --with-multilib-list=rmprofile and --without-headers, the libgcc build
fails while trying to build the armv8-m variant with the following message:

In file included from .../build/gcc/include/arm_cmse.h:38,
 from .../gcc/libgcc/config/arm/cmse.c:27:
.../build/gcc/include/stdint.h:9:16: fatal error: stdint.h: No such file or directory
 # include_next <stdint.h>
^~

Attached is a patch that removes the dependency on stdint.h (to respect
--without-headers) and replaces uintptr_t and UINTPTR_MAX in cmse.c with
their equivalent GCC predefined macros, respectively __UINTPTR_TYPE__ and
__UINTPTR_MAX__.
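
The substitution boils down to the following (a sketch; the local names here
are mine, and GCC predefines both macros even with --without-headers):

typedef __UINTPTR_TYPE__ uintptr_type;      /* replaces uintptr_t   */
#define UINTPTR_MAX_VALUE __UINTPTR_MAX__   /* replaces UINTPTR_MAX */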

For the record, I configured gcc (r260250) with the following command:
../gcc/configure  --enable-languages=c --with-gnu-as --with-gnu-ld \
  --with-multilib-list=rmprofile --disable-nls --without-libiconv-prefix \
  --enable-multilib --target=arm-eabi --without-headers

Unfortunately, I don't have an armv8-m testing environment, so I could only
check that libgcc now builds fine. But the patch is small and simple enough
that I'm confident the generated code is equivalent to the original intent.

2018-05-15  Jerome Lambourg  

gcc/
* config/arm/arm_cmse.h: Remove #include <stdint.h>.

libgcc/
* config/arm/cmse.c (cmse_check_address_range): Replace
UINTPTR_MAX with __UINTPTR_MAX__ and uintptr_t with __UINTPTR_TYPE__.



gcc.patch
Description: Binary data


Re: [PATCH ARM] Fix armv8-m multilib build failure with stdint.h

2018-05-15 Thread Kyrill Tkachov

Hi Jérôme,

On 15/05/18 11:53, Jérôme Lambourg wrote:

Hello gcc-patch list,

While trying to build an x86_64-linux hosted cross arm bare metal compiler with
both --with-multilib-list=rmprofile and --without-headers, the libgcc build
fails while trying to build the armv8-m variant with the following message:

In file included from .../build/gcc/include/arm_cmse.h:38,
 from .../gcc/libgcc/config/arm/cmse.c:27:
.../build/gcc/include/stdint.h:9:16: fatal error: stdint.h: No such file or directory
 # include_next <stdint.h>
^~

Attached is a patch that removes the dependency on stdint.h (to respect
--without-headers) and replaces uintptr_t and UINTPTR_MAX in cmse.c with
their equivalent GCC predefined macros, respectively __UINTPTR_TYPE__ and
__UINTPTR_MAX__.

For the record, I configured gcc (r260250) with the following command:
../gcc/configure  --enable-languages=c --with-gnu-as --with-gnu-ld \
  --with-multilib-list=rmprofile --disable-nls --without-libiconv-prefix \
  --enable-multilib --target=arm-eabi --without-headers

Unfortunately, I don't have an armv8-m testing environment, so I could only
check that libgcc now builds fine. But the patch is small and simple enough
that I'm confident the generated code is equivalent to the original intent.



Thanks for the patch! To validate your changes you can also look at the
disassembly of the cmse.c binary in the build tree. If the binary changes
with your patch then that would indicate some trouble. I've got another
comment below.


2018-05-15  Jerome Lambourg  

gcc/
* config/arm/arm_cmse.h: Remove #include <stdint.h>.

libgcc/
* config/arm/cmse.c (cmse_check_address_range): Replace
UINTPTR_MAX with __UINTPTR_MAX__ and uintptr_t with __UINTPTR_TYPE__.




Index: gcc/config/arm/arm_cmse.h
===
--- gcc/config/arm/arm_cmse.h   (revision 260250)
+++ gcc/config/arm/arm_cmse.h   (working copy)
@@ -35,7 +35,6 @@
 #if __ARM_FEATURE_CMSE & 1
 
 #include <stddef.h>
-#include <stdint.h>
 


There are places in arm_cmse.h that use intptr_t. You should replace those
as well.
Look for the cmse_nsfptr_create and cmse_is_nsfptr macros...

Thanks,
Kyrill




Re: [PATCH 1/2] Introduce prefetch-minimum stride option

2018-05-15 Thread Luis Machado

Hi,

On 05/15/2018 06:37 AM, Kyrill Tkachov wrote:

Hi Luis,

On 14/05/18 22:18, Luis Machado wrote:

Hi,

Here's an updated version of the patch (now reverted) that addresses the
previous bootstrap problem (signedness and long long/int conversion).

I've checked that it bootstraps properly on both aarch64-linux and
x86_64-linux and that tests look sane.

James, would you please give this one a try to see if you can still
reproduce PR85682? I couldn't reproduce it in multiple attempts.




The patch doesn't hit the regressions in PR85682 from what I can see.
I have a comment on the patch below.



Great. Thanks for checking Kyrill.


--- a/gcc/tree-ssa-loop-prefetch.c
+++ b/gcc/tree-ssa-loop-prefetch.c
@@ -992,6 +992,23 @@ prune_by_reuse (struct mem_ref_group *groups)
  static bool
  should_issue_prefetch_p (struct mem_ref *ref)
  {
+  /* Some processors may have a hardware prefetcher that may conflict with
+ prefetch hints for a range of strides.  Make sure we don't issue
+ prefetches for such cases if the stride is within this particular
+ range.  */
+  if (cst_and_fits_in_hwi (ref->group->step)
+  && abs_hwi (int_cst_value (ref->group->step)) <
+  (HOST_WIDE_INT) PREFETCH_MINIMUM_STRIDE)
+    {

The '<' should go on the line below together with PREFETCH_MINIMUM_STRIDE.


I've fixed this locally now.


Re: Replace FMA_EXPR with one internal fn per optab

2018-05-15 Thread Martin Jambor
Hi,

On Fri, May 11 2018, Richard Sandiford wrote:
> There are four optabs for various forms of fused multiply-add:
> fma, fms, fnma and fnms.  Of these, only fma had a direct gimple
> representation.  For the other three we relied on special pattern-
> matching during expand, although tree-ssa-math-opts.c did have
> some code to try to second-guess what expand would do.
>
> This patch removes the old FMA_EXPR representation of fma and
> introduces four new internal functions, one for each optab.
> IFN_FMA is tied to BUILT_IN_FMA* while the other three are
> independent directly-mapped internal functions.  It's then
> possible to do the pattern-matching in match.pd and
> tree-ssa-math-opts.c (via folding) can select the exact
> FMA-based operation.
>
> The patch removes the gimple FE support for __FMA rather than mapping
> it to the internal function.  There's no reason now to treat it
> differently from other internal functions (although the FE doesn't
> handle those yet).
>
> The BRIG & HSA parts are a best guess, but seem relatively simple.

Both parts are OK.

Thanks,

Martin


>
> The genmatch.c changes are structured to allow ternary ops in which
> the second two rather than the first two operands are commutative.
> A later patch makes use of this.
>
> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf,
> x86_64-linux-gnu and powerpc64le-linux-gnu.  OK to install?
>
> Richard
>
>
> 2018-05-11  Richard Sandiford  
>
> gcc/
>   * doc/sourcebuild.texi (all_scalar_fma): Document.
>   * tree.def (FMA_EXPR): Delete.
>   * internal-fn.def (FMA, FMS, FNMA, FNMS): New internal functions.
>   * internal-fn.c (ternary_direct): New macro.
>   (expand_ternary_optab_fn): Likewise.
>   (direct_ternary_optab_supported_p): Likewise.
>   * Makefile.in (build/genmatch.o): Depend on case-fn-macros.h.
>   * builtins.c (fold_builtin_fma): Delete.
>   (fold_builtin_3): Don't call it.
>   * cfgexpand.c (expand_debug_expr): Remove FMA_EXPR handling.
>   * expr.c (expand_expr_real_2): Likewise.
>   * fold-const.c (operand_equal_p): Likewise.
>   (fold_ternary_loc): Likewise.
>   * gimple-pretty-print.c (dump_ternary_rhs): Likewise.
>   * gimple.c (DEFTREECODE): Likewise.
>   * gimplify.c (gimplify_expr): Likewise.
>   * optabs-tree.c (optab_for_tree_code): Likewise.
>   * tree-cfg.c (verify_gimple_assign_ternary): Likewise.
>   * tree-eh.c (operation_could_trap_p): Likewise.
>   (stmt_could_throw_1_p): Likewise.
>   * tree-inline.c (estimate_operator_cost): Likewise.
>   * tree-pretty-print.c (dump_generic_node): Likewise.
>   (op_code_prio): Likewise.
>   * tree-ssa-loop-im.c (stmt_cost): Likewise.
>   * tree-ssa-operands.c (get_expr_operands): Likewise.
>   * tree.c (commutative_ternary_tree_code, add_expr): Likewise.
>   * fold-const-call.h (fold_fma): Delete.
>   * fold-const-call.c (fold_const_call_): Handle CFN_FMS,
>   CFN_FNMA and CFN_FNMS.
>   (fold_fma): Delete.
>   * genmatch.c (combined_fn): New enum.
>   (commutative_ternary_tree_code): Remove FMA_EXPR handling.
>   (commutative_op): New function.
>   (commutate): Use it.  Handle more than 2 operands.
>   (dt_operand::gen_gimple_expr): Use commutative_op.
>   (parser::parse_expr): Allow :c to be used with non-binary
>   operators if the commutative operand is known.
>   * gimple-ssa-backprop.c (backprop::process_builtin_call_use): Handle
>   CFN_FMS, CFN_FNMA and CFN_FNMS.
>   (backprop::process_assign_use): Remove FMA_EXPR handling.
>   * hsa-gen.c (gen_hsa_insns_for_operation_assignment): Likewise.
>   (gen_hsa_fma): New function.
>   (gen_hsa_insn_for_internal_fn_call): Use it for IFN_FMA, IFN_FMS,
>   IFN_FNMA and IFN_FNMS.
>   * match.pd: Add folds for IFN_FMS, IFN_FNMA and IFN_FNMS.
>   * tree-ssa-math-opts.c (aggressive_valueize): New function.
>   (convert_mult_to_fma_1): Use the gimple_build interface and use
>   aggressive_valueize to fold the result.
>   (convert_mult_to_fma): Use direct_internal_fn_supported_p
>   instead of checking for optabs directly.
>   * config/i386/i386.c (ix86_add_stmt_cost): Recognize FMAs as calls
>   rather than FMA_EXPRs.
>   * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Create a
>   call to IFN_FMA instead of an FMA_EXPR.
>
> gcc/brig/
>   * brigfrontend/brig-function.cc
>   (brig_function::get_builtin_for_hsa_opcode): Use BUILT_IN_FMA
>   for BRIG_OPCODE_FMA.
>   (brig_function::get_tree_code_for_hsa_opcode): Treat BUILT_IN_FMA
>   as a call.
>
> gcc/c/
>   * gimple-parser.c (c_parser_gimple_postfix_expression): Remove
>   __FMA_EXPR handlng.
>
> gcc/cp/
>   * constexpr.c (cxx_eval_constant_expression): Remove FMA_EXPR handling.
>   (potential_constant_expression_1): Likewise.
>
> gcc/testsuite/
>   * lib/target-supports.exp (check_effectiv

[PATCH] Qualify std::__invoke in <variant> to prevent ADL

2018-05-15 Thread Jonathan Wakely

Calling unqualified __invoke can't find the wrong function, because
users can't use that name in their own namespaces, but qualifying it
should make name lookup slightly faster.
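
To illustrate the lookup being avoided, here is a self-contained sketch
using the public std::invoke (inside libstdc++ the call in question is to
the internal helper __invoke):

#include <functional>

namespace user {
  struct Visitor { int operator()(int i) const { return i + 1; } };
}

int
main ()
{
  user::Visitor v;
  // An unqualified call such as  __invoke(v, 41)  inside the library
  // triggers argument-dependent lookup, which also searches namespace
  // user (an associated namespace of Visitor).  It cannot find a wrong
  // __invoke there -- the name is reserved -- but the search itself
  // costs compile time.  Qualifying the call, as std::invoke is
  // qualified below, names the function directly and skips ADL.
  return std::invoke(v, 41) == 42 ? 0 : 1;
}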

* include/std/variant (__gen_vtable_impl::__visit_invoke): Qualify
__invoke to prevent ADL.

Tested powerpc64le-linux, committed to trunk.

commit c2f226ebd16d0f2f6cba92cd87f55282697bbd3a
Author: Jonathan Wakely 
Date:   Tue May 15 01:05:39 2018 +0100

Qualify std::__invoke in <variant> to prevent ADL

* include/std/variant (__gen_vtable_impl::__visit_invoke): Qualify
__invoke to prevent ADL.

diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
index 40b3b566938..c0212404bb2 100644
--- a/libstdc++-v3/include/std/variant
+++ b/libstdc++-v3/include/std/variant
@@ -838,9 +838,8 @@ namespace __variant
   decltype(auto)
   static constexpr __visit_invoke(_Visitor&& __visitor, _Variants... __vars)
   {
-   return __invoke(std::forward<_Visitor>(__visitor),
-   std::get<__indices>(
-   std::forward<_Variants>(__vars))...);
+   return std::__invoke(std::forward<_Visitor>(__visitor),
+   std::get<__indices>(std::forward<_Variants>(__vars))...);
   }
 
   static constexpr auto


[PATCH] PR libstdc++/84159 fix appending strings to paths

2018-05-15 Thread Jonathan Wakely

The path::operator/=(const Source&) and path::append overloads were
still following the semantics of the Filesystem TS, not C++17. Only
the path::operator/=(const path&) overload was correct.

This change adds more tests for path::operator/=(const path&) and adds
new tests to verify that the other append operations have equivalent
behaviour.
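
A small example of the C++17 semantics in question (a sketch; POSIX
separators assumed):

#include <cassert>
#include <filesystem>

int
main ()
{
  std::filesystem::path p = "/foo/bar";
  p /= "/baz";            // rhs is absolute: C++17 replaces the path...
  assert (p == "/baz");   // ...where the TS appended, giving "/foo/bar/baz"

  std::filesystem::path q = "foo";
  q /= "bar";             // rhs is relative: append with a separator
  assert (q == "foo/bar");
  return 0;
}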

PR libstdc++/84159
* include/bits/fs_path.h (path::operator/=, path::append): Construct
temporary path before calling _M_append.
(path::_M_append): Change parameter to path and implement C++17
semantics.
* testsuite/27_io/filesystem/path/append/path.cc: Add helper function
and more examples from the standard.
* testsuite/27_io/filesystem/path/append/source.cc: New.
* testsuite/27_io/filesystem/path/decompose/filename.cc: Add comment.
* testsuite/27_io/filesystem/path/nonmember/append.cc: New.

Tested powerpc64le-linux, committed to trunk and will backport to
gcc-8-branch too.


commit 0dbff221ca95e7fa4260fded79373c36b90019de
Author: Jonathan Wakely 
Date:   Tue May 15 01:09:20 2018 +0100

PR libstdc++/84159 fix appending strings to paths

The path::operator/=(const Source&) and path::append overloads were
still following the semantics of the Filesystem TS not C++17. Only
the path::operator/=(const path&) overload was correct.

This change adds more tests for path::operator/=(const path&) and adds
new tests to verify that the other append operations have equivalent
behaviour.

PR libstdc++/84159
* include/bits/fs_path.h (path::operator/=, path::append): Construct
temporary path before calling _M_append.
(path::_M_append): Change parameter to path and implement C++17
semantics.
* testsuite/27_io/filesystem/path/append/path.cc: Add helper function
and more examples from the standard.
* testsuite/27_io/filesystem/path/append/source.cc: New.
* testsuite/27_io/filesystem/path/decompose/filename.cc: Add comment.
* testsuite/27_io/filesystem/path/nonmember/append.cc: New.

diff --git a/libstdc++-v3/include/bits/fs_path.h b/libstdc++-v3/include/bits/fs_path.h
index 6703e9167e4..53bf237b547 100644
--- a/libstdc++-v3/include/bits/fs_path.h
+++ b/libstdc++-v3/include/bits/fs_path.h
@@ -265,20 +265,17 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 template<typename _Source>
   _Path<_Source>&
   operator/=(_Source const& __source)
-  { return append(__source); }
+  { return _M_append(path(__source)); }
 
 template<typename _Source>
   _Path<_Source>&
   append(_Source const& __source)
-  {
-   return _M_append(_S_convert(_S_range_begin(__source),
-   _S_range_end(__source)));
-  }
+  { return _M_append(path(__source)); }
 
 template<typename _InputIterator>
   _Path<_InputIterator, _InputIterator>&
   append(_InputIterator __first, _InputIterator __last)
-  { return _M_append(_S_convert(__first, __last)); }
+  { return _M_append(path(__first, __last)); }
 
 // concatenation
 
@@ -406,17 +403,17 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 
 enum class _Split { _Stem, _Extension };
 
-path& _M_append(string_type&& __str)
+path&
+_M_append(path __p)
 {
+  if (__p.is_absolute())
+   operator=(std::move(__p));
 #ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
-  operator/=(path(std::move(__str)));
-#else
-  if (!_M_pathname.empty() && !_S_is_dir_sep(_M_pathname.back())
- && (__str.empty() || !_S_is_dir_sep(__str.front())))
-   _M_pathname += preferred_separator;
-  _M_pathname += __str;
-  _M_split_cmpts();
+  else if (__p.has_root_name() && __p.root_name() != root_name())
+   operator=(std::move(__p));
 #endif
+  else
+   operator/=(const_cast<const path&>(__p));
   return *this;
 }
 
diff --git a/libstdc++-v3/testsuite/27_io/filesystem/path/append/path.cc b/libstdc++-v3/testsuite/27_io/filesystem/path/append/path.cc
index 3bc135c6cd2..0330bcf6c88 100644
--- a/libstdc++-v3/testsuite/27_io/filesystem/path/append/path.cc
+++ b/libstdc++-v3/testsuite/27_io/filesystem/path/append/path.cc
@@ -19,7 +19,7 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// 30.10.7.4.3 path appends [fs.path.append]
+// C++17 30.10.8.4.3 path appends [fs.path.append]
 
 #include 
 #include 
@@ -28,56 +28,75 @@
 using std::filesystem::path;
 using __gnu_test::compare_paths;
 
+// path::operator/=(const path&)
+
+path append(path l, const path& r)
+{
+  l /= r;
+  return l;
+}
+
 void
 test01()
 {
-  const path p("/foo/bar");
+  compare_paths( append("/foo/bar", "/foo/"), "/foo/" );
 
-  path pp = p;
-  pp /= p;
-  compare_paths( pp, p );
+#ifndef _GLIBCXX_FILESYSTEM_IS_WINDOWS
+  compare_paths( append("baz", "baz"), "baz/baz" );
+#else
+  compare_paths( append("baz", "baz"), "baz\\baz" );
+#endif
+  compare_paths( append("baz/", "baz"), "baz/ba

[C++ Patch] Add DECL_MAYBE_IN_CHARGE_CDTOR_P

2018-05-15 Thread Paolo Carlini

Hi,

Noticed a few times while working on various issues; maybe we want to
add the macro now? Tested x86_64-linux.


Thanks, Paolo.



2018-05-15  Paolo Carlini  

* cp-tree.h (DECL_MAYBE_IN_CHARGE_CDTOR_P): New.
(FOR_EACH_CLONE): Update.
* decl.c (grokdeclarator): Use it.
* decl2.c (vague_linkage_p): Likewise.
* mangle.c (mangle_decl): Likewise.
* method.c (lazily_declare_fn): Likewise.
* optimize.c (can_alias_cdtor, maybe_clone_body): Likewise.
* repo.c (repo_emit_p): Likewise.
* tree.c (decl_linkage): Likewise.
Index: cp-tree.h
===
--- cp-tree.h   (revision 260252)
+++ cp-tree.h   (working copy)
@@ -2783,6 +2783,12 @@ struct GTY(()) lang_decl {
 #define DECL_DELETING_DESTRUCTOR_P(NODE)   \
   (DECL_NAME (NODE) == deleting_dtor_identifier)
 
+/* Nonzero if either DECL_MAYBE_IN_CHARGE_CONSTRUCTOR_P or
+   DECL_MAYBE_IN_CHARGE_DESTRUCTOR_P is true of NODE.  */
+#define DECL_MAYBE_IN_CHARGE_CDTOR_P(NODE)  \
+  (DECL_MAYBE_IN_CHARGE_CONSTRUCTOR_P (NODE)\
+   || DECL_MAYBE_IN_CHARGE_DESTRUCTOR_P (NODE))
+
 /* Nonzero if NODE (a FUNCTION_DECL) is a cloned constructor or
destructor.  */
 #define DECL_CLONED_FUNCTION_P(NODE) (!!decl_cloned_function_p (NODE, true))
@@ -2800,8 +2806,7 @@ struct GTY(()) lang_decl {
   */
 #define FOR_EACH_CLONE(CLONE, FN)  \
   if (!(TREE_CODE (FN) == FUNCTION_DECL\
-   && (DECL_MAYBE_IN_CHARGE_CONSTRUCTOR_P (FN) \
-   || DECL_MAYBE_IN_CHARGE_DESTRUCTOR_P (FN)))) \
+   && DECL_MAYBE_IN_CHARGE_CDTOR_P (FN)))  \
 ;  \
   else \
 for (CLONE = DECL_CHAIN (FN);  \
Index: decl.c
===
--- decl.c  (revision 260252)
+++ decl.c  (working copy)
@@ -11721,9 +11721,7 @@ grokdeclarator (const cp_declarator *declarator,
{
  if (!current_function_decl)
DECL_CONTEXT (decl) = FROB_CONTEXT (current_namespace);
- else if (DECL_MAYBE_IN_CHARGE_CONSTRUCTOR_P (current_function_decl)
-  || (DECL_MAYBE_IN_CHARGE_DESTRUCTOR_P
-  (current_function_decl)))
+ else if (DECL_MAYBE_IN_CHARGE_CDTOR_P (current_function_decl))
/* The TYPE_DECL is "abstract" because there will be
   clones of this constructor/destructor, and there will
   be copies of this TYPE_DECL generated in those
Index: decl2.c
===
--- decl2.c (revision 260252)
+++ decl2.c (working copy)
@@ -1933,8 +1933,7 @@ vague_linkage_p (tree decl)
 maybe-in-charge 'tor variants; in that case we need to check one of
 the "clones" for the real linkage.  But only in that case; before
 maybe_clone_body we haven't yet copied the linkage to the clones.  */
-  if ((DECL_MAYBE_IN_CHARGE_DESTRUCTOR_P (decl)
-  || DECL_MAYBE_IN_CHARGE_CONSTRUCTOR_P (decl))
+  if (DECL_MAYBE_IN_CHARGE_CDTOR_P (decl)
  && !DECL_ABSTRACT_P (decl)
  && DECL_CHAIN (decl)
  && DECL_CLONED_FUNCTION_P (DECL_CHAIN (decl)))
Index: mangle.c
===
--- mangle.c(revision 260252)
+++ mangle.c(working copy)
@@ -3862,8 +3862,7 @@ mangle_decl (const tree decl)
   if (id != DECL_NAME (decl)
   /* Don't do this for a fake symbol we aren't going to emit anyway.  */
   && TREE_CODE (decl) != TYPE_DECL
-  && !DECL_MAYBE_IN_CHARGE_CONSTRUCTOR_P (decl)
-  && !DECL_MAYBE_IN_CHARGE_DESTRUCTOR_P (decl))
+  && !DECL_MAYBE_IN_CHARGE_CDTOR_P (decl))
 {
   int save_ver = flag_abi_version;
   tree id2 = NULL_TREE;
Index: method.c
===
--- method.c(revision 260252)
+++ method.c(working copy)
@@ -2422,8 +2422,7 @@ lazily_declare_fn (special_function_kind sfk, tree
   fixup_type_variants (type);
 
   maybe_add_class_template_decl_list (type, fn, /*friend_p=*/0);
-  if (DECL_MAYBE_IN_CHARGE_CONSTRUCTOR_P (fn)
-  || DECL_MAYBE_IN_CHARGE_DESTRUCTOR_P (fn))
+  if (DECL_MAYBE_IN_CHARGE_CDTOR_P (fn))
 /* Create appropriate clones.  */
 clone_function_decl (fn, /*update_methods=*/true);
 
Index: optimize.c
===
--- optimize.c  (revision 260252)
+++ optimize.c  (working copy)
@@ -194,8 +194,7 @@ can_alias_cdtor (tree fn)
   /* ??? Why not use aliases with -frepo?  */
   if (flag_use_repository)
 return false;
-  gcc_assert (DECL_MAYBE_IN_CHARGE_CONSTRUCTOR_P (fn)
- || DECL_MAYBE_IN_CHARGE_DESTRUCTOR_P (fn));
+  gcc_assert (DECL_MAYBE_IN_CHARGE_CDTOR_P (fn));
   /* 

Re: [PR63185][RFC] Improve DSE with branches

2018-05-15 Thread Richard Biener
On Tue, 15 May 2018, Richard Biener wrote:

> On Tue, 15 May 2018, Richard Biener wrote:
> 
> > On Mon, 14 May 2018, Kugan Vivekanandarajah wrote:
> > 
> > > Hi,
> > > 
> > > Attached patch handles PR63185 when we reach PHI with temp != NULL.
> > > We could see the PHI and if there isn't any uses for PHI that is
> > > interesting, we could ignore that ?
> > > 
> > > Bootstrapped and regression tested on x86_64-linux-gnu.
> > > Is this OK?
> > 
> > No, as Jeff said we can't do it this way.
> > 
> > If we end up with multiple VDEFs in the walk of defvar immediate
> > uses we know we are dealing with a CFG fork.  We can't really
> > ignore any of the paths but we have to
> > 
> >  a) find the merge point (and the associated VDEF)
> >  b) verify for each chain of VDEFs with associated VUSEs
> > up to that merge VDEF that we have no uses of the to-be-classified
> > store and collect (partial) kills
> >  c) intersect kill info and continue walking from the merge point
> > 
> > in b) there's the optional possibility to find sinking opportunities
> > in case we have kills on some paths but uses on others.  This is why
> > DSE should be really merged with (store) sinking.
> > 
> > So if we want to enhance DSEs handling of branches then we need
> > to refactor the simple dse_classify_store function.  Let me take
> > an attempt at this today.
> 
> First (baby) step is the following - it arranges to collect the
> defs we need to continue walking from and implements trivial
> reduction by stopping at (full) kills.  This allows us to handle
> the new testcase (which was already handled in the very late DSE
> pass with the help of sinking the store).
> 
> I took the opportunity to kill the use_stmt parameter of
> dse_classify_store as the only user is only looking for whether
> the kills were all clobbers which I added a new parameter for.
> 
> I didn't adjust the byte-tracking case fully (I'm not fully understanding
> the code in the case of a use and I'm not sure whether it's worth
> doing the def reduction with byte-tracking).
> 
> Your testcase can be handled by reducing the PHI and the call def
> by seeing that the only use of a candidate def is another def
> we have already processed.  Not sure if worth special-casing though,
> I'd rather have a go at "recursing".  That will be the next
> patch.
> 
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Applied.

Another intermediate one below, fixing the byte-tracking for
stmt with uses.  This also re-does the PHI handling by simply
avoiding recursion by means of a visited bitmap and stopping
at the DSE classify stmt when re-visiting it instead of failing.
> This covers Prathamesh's loop case for which I added ssa-dse-33.c.
For the do-while loop this still runs into the inability to
handle two defs to walk from.
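
The shape of that cycle handling, roughly (an illustrative condensation
of the tree-ssa-dse.c changes in this thread, not new code):

  auto_bitmap visited;
  ...
  if (gimple_code (temp) == GIMPLE_PHI)
    {
      defvar = PHI_RESULT (temp);
      /* Remember every PHI def we walk through; when the walk comes
	 back to an already-visited def we stop at the statement being
	 classified instead of giving up.  */
      bitmap_set_bit (visited, SSA_NAME_VERSION (defvar));
    }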

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Richard.

2018-05-15  Richard Biener  

* tree-ssa-dse.c (dse_classify_store): Track cycles via a visited
bitmap of PHI defs and simplify handling of in-loop and across
loop dead stores.  Handle byte-tracking with multiple defs.

* gcc.dg/tree-ssa/ssa-dse-32.c: New testcase.
* gcc.dg/tree-ssa/ssa-dse-33.c: Likewise.

commit 7129756be201db1bd90e4191ed32084cbb22130d
Author: Richard Guenther 
Date:   Tue May 15 14:03:32 2018 +0200

dse#2

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c
new file mode 100644
index 000..eea0d8d5cf0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-dse1-details" } */
+
+void f(int n)
+{
+  char *p = __builtin_malloc (1);
+  int i = 0;
+  do
+*p = 0;
+  while (++i < n);
+}
+
+/* { dg-final { scan-tree-dump-times "Deleted dead store" 1 "dse1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-33.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-33.c
new file mode 100644
index 000..02e3e6a582c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-33.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-dse1-details" } */
+
+void f(char *p, int n)
+{
+  for (int i = 0; i < n; ++i)
+*p = 0;  /* Removed by DSE.  */
+  *p = 1;
+}
+
+void g(char *p, int n)
+{
+  int i = 0;
+  do
+*p = 0;  /* DSE cannot yet handle this case.  */
+  while (++i < n);
+  *p = 1;
+}
+
+
+/* { dg-final { scan-tree-dump-times "Deleted dead store" 2 "dse1" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "Deleted dead store" 1 "dse1" } } */
diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
index 126592a1d13..8836e84dd63 100644
--- a/gcc/tree-ssa-dse.c
+++ b/gcc/tree-ssa-dse.c
@@ -528,6 +528,7 @@ dse_classify_store (ao_ref *ref, gimple *stmt,
 {
   gimple *temp;
   unsigned cnt = 0;
+  auto_bitmap visited;
 
   if (by_clobber_p)
 *by_clobber_p = true;
@@ -539,7 +540,7 @@ dse_classify_store (ao_ref *ref, gimple *stmt,
   temp = stmt;
   do
 {

[PATCH] Remove unused headers from tests

2018-05-15 Thread Jonathan Wakely

I think these tests #include  because I just copied an
existing test to a new file, and didn't remove the header.

* testsuite/27_io/filesystem/path/decompose/extension.cc: Remove
unused  header.
* testsuite/27_io/filesystem/path/query/empty.cc: Likewise.
* testsuite/27_io/filesystem/path/query/has_extension.cc: Likewise.
* testsuite/27_io/filesystem/path/query/has_filename.cc: Likewise.
* testsuite/27_io/filesystem/path/query/has_parent_path.cc: Likewise.
* testsuite/27_io/filesystem/path/query/has_relative_path.cc:
Likewise.
* testsuite/27_io/filesystem/path/query/has_root_directory.cc:
Likewise.
* testsuite/27_io/filesystem/path/query/has_root_name.cc: Likewise.
* testsuite/27_io/filesystem/path/query/has_root_path.cc: Likewise.
* testsuite/27_io/filesystem/path/query/has_stem.cc: Likewise.
* testsuite/27_io/filesystem/path/query/is_relative.cc: Likewise.
* testsuite/experimental/filesystem/path/decompose/extension.cc:
Likewise.
* testsuite/experimental/filesystem/path/query/empty.cc: Likewise.
* testsuite/experimental/filesystem/path/query/has_extension.cc:
Likewise.
* testsuite/experimental/filesystem/path/query/has_filename.cc:
Likewise.
* testsuite/experimental/filesystem/path/query/has_parent_path.cc:
Likewise.
* testsuite/experimental/filesystem/path/query/has_relative_path.cc:
Likewise.
* testsuite/experimental/filesystem/path/query/has_root_directory.cc:
Likewise.
* testsuite/experimental/filesystem/path/query/has_root_name.cc:
Likewise.
* testsuite/experimental/filesystem/path/query/has_root_path.cc:
Likewise.
* testsuite/experimental/filesystem/path/query/has_stem.cc: Likewise.
* testsuite/experimental/filesystem/path/query/is_relative.cc:
Likewise.

Tested x86_64-linux, committed to trunk and gcc-8-branch.


commit b42543f7227904503a7c308ce3c406861b88a05f
Author: Jonathan Wakely 
Date:   Tue May 15 13:16:18 2018 +0100

Remove unused headers from tests

* testsuite/27_io/filesystem/path/decompose/extension.cc: Remove
unused  header.
* testsuite/27_io/filesystem/path/query/empty.cc: Likewise.
* testsuite/27_io/filesystem/path/query/has_extension.cc: Likewise.
* testsuite/27_io/filesystem/path/query/has_filename.cc: Likewise.
* testsuite/27_io/filesystem/path/query/has_parent_path.cc: 
Likewise.
* testsuite/27_io/filesystem/path/query/has_relative_path.cc:
Likewise.
* testsuite/27_io/filesystem/path/query/has_root_directory.cc:
Likewise.
* testsuite/27_io/filesystem/path/query/has_root_name.cc: Likewise.
* testsuite/27_io/filesystem/path/query/has_root_path.cc: Likewise.
* testsuite/27_io/filesystem/path/query/has_stem.cc: Likewise.
* testsuite/27_io/filesystem/path/query/is_relative.cc: Likewise.
* testsuite/experimental/filesystem/path/decompose/extension.cc:
Likewise.
* testsuite/experimental/filesystem/path/query/empty.cc: Likewise.
* testsuite/experimental/filesystem/path/query/has_extension.cc:
Likewise.
* testsuite/experimental/filesystem/path/query/has_filename.cc:
Likewise.
* testsuite/experimental/filesystem/path/query/has_parent_path.cc:
Likewise.
* testsuite/experimental/filesystem/path/query/has_relative_path.cc:
Likewise.
* 
testsuite/experimental/filesystem/path/query/has_root_directory.cc:
Likewise.
* testsuite/experimental/filesystem/path/query/has_root_name.cc:
Likewise.
* testsuite/experimental/filesystem/path/query/has_root_path.cc:
Likewise.
* testsuite/experimental/filesystem/path/query/has_stem.cc: 
Likewise.
* testsuite/experimental/filesystem/path/query/is_relative.cc:
Likewise.

diff --git 
a/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/extension.cc 
b/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/extension.cc
index 2a314f24548..9084318aac0 100644
--- a/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/extension.cc
+++ b/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/extension.cc
@@ -22,7 +22,6 @@
 // 8.4.9 path decomposition [path.decompose]
 
 #include 
-#include 
 #include 
 #include 
 
diff --git a/libstdc++-v3/testsuite/27_io/filesystem/path/query/empty.cc 
b/libstdc++-v3/testsuite/27_io/filesystem/path/query/empty.cc
index dd04ac14c15..1b0118f79d2 100644
--- a/libstdc++-v3/testsuite/27_io/filesystem/path/query/empty.cc
+++ b/libstdc++-v3/testsuite/27_io/filesystem/path/query/empty.cc
@@ -22,7 +22,6 @@
 // 8.4.9 path decomposition [path.decompose]
 
 #includ

Re: [PATCH][AArch64] Improve register allocation of fma

2018-05-15 Thread Wilco Dijkstra

ping


From: Wilco Dijkstra
Sent: 04 January 2018 17:46
To: GCC Patches
Cc: nd
Subject: [PATCH][AArch64] Improve register allocation of fma
  

This patch improves register allocation of fma by preferring to update the
accumulator register.  This is done by adding fma insns with operand 1 as the
accumulator.  The register allocator considers copy preferences only in operand
order, so if the first operand is dead, it has the highest chance of being
reused as the destination.  As a result code using fma often has a better
register allocation.  Performance of SPECFP2017 improves by over 0.5% on some
implementations, while it had no effect on other implementations.  Fma is more
readable too, in a simple example we now generate:

    fmadd   s16, s2, s1, s16
    fmadd   s7, s17, s16, s7
    fmadd   s6, s16, s7, s6
    fmadd   s5, s7, s6, s5

instead of:

    fmadd   s16, s16, s2, s1
    fmadd   s7, s7, s16, s6
    fmadd   s6, s6, s7, s5
    fmadd   s5, s5, s6, s4
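
For reference, a small C kernel of this shape (an illustrative example,
not taken from SPEC) shows the effect when compiled at -O2 with the
default -ffp-contract=fast on AArch64:

/* Each update has the form acc = acc + x * y, so after this patch the
   FMA result can be allocated straight into the dead accumulator's
   register.  */
float
chain (float x, float y, float a, float b, float c, float d)
{
  a += x * y;
  b += a * x;
  c += b * a;
  d += c * b;
  return d;
}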

Bootstrap OK. OK for commit?

ChangeLog:
2018-01-04  Wilco Dijkstra  

    gcc/
    * config/aarch64/aarch64.md (fma<mode>4): Change into expand pattern.
    (fnma<mode>4): Likewise.
    (fms<mode>4): Likewise.
    (fnms<mode>4): Likewise.
    (aarch64_fma<mode>4): Rename insn, reorder accumulator operand.
    (aarch64_fnma<mode>4): Likewise.
    (aarch64_fms<mode>4): Likewise.
    (aarch64_fnms<mode>4): Likewise.
    (aarch64_fnmadd<mode>4): Likewise.
--

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 
382953e6ec42ae4475d66143be1e25d22e48571f..e773ec0c41559e47cf38e719dcb8c42d5bb4da49
 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4743,57 +4743,94 @@ (define_insn 
"*aarch64_fcvt2_mult"
   [(set_attr "type" "f_cvtf2i")]
 )
 
-;; fma - no throw
+;; fma - expand fma into patterns with the accumulator operand first since
+;; reusing the accumulator results in better register allocation.
+;; The register allocator considers copy preferences in operand order,
+;; so this prefers fmadd s0, s1, s2, s0 over fmadd s1, s1, s2, s0.
+
+(define_expand "fma<mode>4"
+  [(set (match_operand:GPF_F16 0 "register_operand")
+   (fma:GPF_F16 (match_operand:GPF_F16 1 "register_operand")
+    (match_operand:GPF_F16 2 "register_operand")
+    (match_operand:GPF_F16 3 "register_operand")))]
+  "TARGET_FLOAT"
+)
 
-(define_insn "fma<mode>4"
+(define_insn "*aarch64_fma<mode>4"
   [(set (match_operand:GPF_F16 0 "register_operand" "=w")
-    (fma:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w")
-    (match_operand:GPF_F16 2 "register_operand" "w")
-    (match_operand:GPF_F16 3 "register_operand" "w")))]
+   (fma:GPF_F16 (match_operand:GPF_F16 2 "register_operand" "w")
+    (match_operand:GPF_F16 3 "register_operand" "w")
+    (match_operand:GPF_F16 1 "register_operand" "w")))]
   "TARGET_FLOAT"
-  "fmadd\\t%<s>0, %<s>1, %<s>2, %<s>3"
+  "fmadd\\t%<s>0, %<s>2, %<s>3, %<s>1"
   [(set_attr "type" "fmac")]
 )
 
-(define_insn "fnma<mode>4"
+(define_expand "fnma<mode>4"
+  [(set (match_operand:GPF_F16 0 "register_operand")
+   (fma:GPF_F16
+ (neg:GPF_F16 (match_operand:GPF_F16 1 "register_operand"))
+ (match_operand:GPF_F16 2 "register_operand")
+ (match_operand:GPF_F16 3 "register_operand")))]
+  "TARGET_FLOAT"
+)
+
+(define_insn "*aarch64_fnma<mode>4"
   [(set (match_operand:GPF_F16 0 "register_operand" "=w")
 (fma:GPF_F16
- (neg:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w"))
- (match_operand:GPF_F16 2 "register_operand" "w")
- (match_operand:GPF_F16 3 "register_operand" "w")))]
+ (neg:GPF_F16 (match_operand:GPF_F16 2 "register_operand" "w"))
+ (match_operand:GPF_F16 3 "register_operand" "w")
+ (match_operand:GPF_F16 1 "register_operand" "w")))]
   "TARGET_FLOAT"
-  "fmsub\\t%<s>0, %<s>1, %<s>2, %<s>3"
+  "fmsub\\t%<s>0, %<s>2, %<s>3, %<s>1"
   [(set_attr "type" "fmac")]
 )
 
-(define_insn "fms<mode>4"
+
+(define_expand "fms<mode>4"
+  [(set (match_operand:GPF 0 "register_operand")
+   (fma:GPF (match_operand:GPF 1 "register_operand")
+    (match_operand:GPF 2 "register_operand")
+    (neg:GPF (match_operand:GPF 3 "register_operand"))))]
+  "TARGET_FLOAT"
+)
+
+(define_insn "*aarch64_fms<mode>4"
   [(set (match_operand:GPF 0 "register_operand" "=w")
-    (fma:GPF (match_operand:GPF 1 "register_operand" "w")
-    (match_operand:GPF 2 "register_operand" "w")
-    (neg:GPF (match_operand:GPF 3 "register_operand" "w"))))]
+   (fma:GPF (match_operand:GPF 2 "register_operand" "w")
+    (match_operand:GPF 3 "register_operand" "w")
+    (neg:GPF (match_operand:GPF 1 "register_operand" "w"))))]
   "TARGET_FLOAT"
-  "fnmsub\\t%<s>0, %<s>1, %<s>2, %<s>3"
+  "fnmsub\\t%<s>0, %<s>2, %<s>3, %<s>1"
   [(set_attr "type" "fmac")]
 )
 
-(define_insn "fnms<mode>4"
+(define_expand "fnms<mode>4"
+  [(set (match_operand:GPF 0 "register_operand")
+   (fma:GPF (neg:

Re: [PR63185][RFC] Improve DSE with branches

2018-05-15 Thread Richard Biener
On Tue, 15 May 2018, Richard Biener wrote:

> On Tue, 15 May 2018, Richard Biener wrote:
> 
> > On Tue, 15 May 2018, Richard Biener wrote:
> > 
> > > On Mon, 14 May 2018, Kugan Vivekanandarajah wrote:
> > > 
> > > > Hi,
> > > > 
> > > > Attached patch handles PR63185 when we reach PHI with temp != NULL.
> > > > We could see the PHI and if there isn't any uses for PHI that is
> > > > interesting, we could ignore that ?
> > > > 
> > > > Bootstrapped and regression tested on x86_64-linux-gnu.
> > > > Is this OK?
> > > 
> > > No, as Jeff said we can't do it this way.
> > > 
> > > If we end up with multiple VDEFs in the walk of defvar immediate
> > > uses we know we are dealing with a CFG fork.  We can't really
> > > ignore any of the paths but we have to
> > > 
> > >  a) find the merge point (and the associated VDEF)
> > >  b) verify for each chain of VDEFs with associated VUSEs
> > > up to that merge VDEF that we have no uses of the to-be-classified
> > > store and collect (partial) kills
> > >  c) intersect kill info and continue walking from the merge point
> > > 
> > > in b) there's the optional possibility to find sinking opportunities
> > > in case we have kills on some paths but uses on others.  This is why
> > > DSE should be really merged with (store) sinking.
> > > 
> > > So if we want to enhance DSEs handling of branches then we need
> > > to refactor the simple dse_classify_store function.  Let me take
> > > an attempt at this today.
> > 
> > First (baby) step is the following - it arranges to collect the
> > defs we need to continue walking from and implements trivial
> > reduction by stopping at (full) kills.  This allows us to handle
> > the new testcase (which was already handled in the very late DSE
> > pass with the help of sinking the store).
> > 
> > I took the opportunity to kill the use_stmt parameter of
> > dse_classify_store as the only user is only looking for whether
> > the kills were all clobbers which I added a new parameter for.
> > 
> > I didn't adjust the byte-tracking case fully (I'm not fully understanding
> > the code in the case of a use and I'm not sure whether it's worth
> > doing the def reduction with byte-tracking).
> > 
> > Your testcase can be handled by reducing the PHI and the call def
> > by seeing that the only use of a candidate def is another def
> > we have already processed.  Not sure if worth special-casing though,
> > I'd rather have a go at "recursing".  That will be the next
> > patch.
> > 
> > Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> 
> Applied.
> 
> Another intermediate one below, fixing the byte-tracking for
> stmt with uses.  This also re-does the PHI handling by simply
> avoiding recursion by means of a visited bitmap and stopping
> at the DSE classify stmt when re-visiting it instead of failing.
> This covers Prathamesh's loop case for which I added ssa-dse-33.c.
> For the do-while loop this still runs into the inability to
> handle two defs to walk from.
> 
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Ok, loop handling doesn't work in general since we run into the
issue that SSA form across the backedge is not representing the
same values.  Consider

 <bb 3>:
 # .MEM_22 = PHI <.MEM_12(D)(2), .MEM_13(4)>
 # n_20 = PHI <0(2), n_7(4)>
 # .MEM_13 = VDEF <.MEM_22>
 bytes[n_20] = _4;
 if (n_20 > 7)
   goto <bb 5>;

 <bb 4>:
 n_7 = n_20 + 1;
 # .MEM_15 = VDEF <.MEM_13>
 bytes[n_20] = _5;
 goto <bb 3>;

then when classifying the store in bb4, visiting the PHI node
gets us to the store in bb3 which appears to be killing.
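
For reference, a C function of roughly this shape produces such a loop
(an illustrative reconstruction from the dump above, not the actual
testcase):

void
f (char *bytes, char _4, char _5)
{
  int n = 0;
  for (;;)
    {
      bytes[n] = _4;	/* bb 3: on the next iteration this writes
			   bytes[n + 1], so it must not be treated as
			   killing the bb 4 store below, even though
			   both index "n_20" in the SSA dump.  */
      if (n > 7)
	break;
      bytes[n] = _5;	/* bb 4: the store being classified.  */
      n = n + 1;
    }
}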

   if (gimple_code (temp) == GIMPLE_PHI)
-   defvar = PHI_RESULT (temp);
+   {
+ /* If we visit this PHI by following a backedge then reset
+any info in ref that may refer to SSA names which we'd need
+to PHI translate.  */
+ if (gimple_bb (temp) == gimple_bb (stmt)
+ || dominated_by_p (CDI_DOMINATORS, gimple_bb (stmt),
+gimple_bb (temp)))
+   /* ???  ref->ref may not refer to SSA names or it may only
+  refer to SSA names that are invariant with respect to the
+  loop represented by this PHI node.  */
+   ref->ref = NULL_TREE;
+ defvar = PHI_RESULT (temp);
+ bitmap_set_bit (visited, SSA_NAME_VERSION (defvar));
+   }

should be a workable solution for that.  I'm checking that, but
eventually you can think of other things that might prevent us from
handling backedges.  Note the current code tries to allow
looking across loops but not handle backedges of loops the
original stmt belongs to.

Richard.


> Richard.
> 
> 2018-05-15  Richard Biener  
> 
>   * tree-ssa-dse.c (dse_classify_store): Track cycles via a visited
>   bitmap of PHI defs and simplify handling of in-loop and across
>   loop dead stores.  Handle byte-tracking with multiple defs.
> 
>   * gcc.dg/tree-ssa/ssa-dse-32.c: New testcase.
>   * gcc.dg/tree-ssa/ssa-dse-33.c: Likewise.
> 
> commit 71

Re: [PATCH][AArch64] Simplify frame pointer logic

2018-05-15 Thread Wilco Dijkstra

ping


From: Wilco Dijkstra
Sent: 25 October 2017 16:29
To: GCC Patches
Cc: nd
Subject: [PATCH][AArch64] Simplify frame pointer logic
  

Simplify frame pointer logic based on review comments here
(https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01727.html).

This patch incrementally adds to these frame pointer cleanup patches:
https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00377.html
https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00381.html

Add aarch64_needs_frame_chain to decide when to emit the frame
chain using clearer logic. Introduce aarch64_use_frame_pointer
which contains the value of -fno-omit-frame-pointer
(flag_omit_frame_pointer is set to a magic value so that the mid-end
won't force the frame pointer in all cases, and leaf frame pointer
emission can't be supported).

OK for commit?

2017-10-25  Wilco Dijkstra  

    * config/aarch64/aarch64.c (aarch64_use_frame_pointer):
    Add new boolean.
    (aarch64_needs_frame_chain): New function.
    (aarch64_override_options_after_change_1): Set aarch64_use_frame_pointer.
--

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
0c0edc100ee4197d027e2d80da56a12b9129fca4..9abde0d66d52ac4fd17d1dc2193a6bbb5f9c65fc
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -163,6 +163,9 @@ unsigned long aarch64_tune_flags = 0;
 /* Global flag for PC relative loads.  */
 bool aarch64_pcrelative_literal_loads;
 
+/* Global flag for whether frame pointer is enabled.  */
+bool aarch64_use_frame_pointer;
+
 /* Support for command line parsing of boolean flags in the tuning
    structures.  */
 struct aarch64_flag_desc
@@ -2871,6 +2874,25 @@ aarch64_output_probe_stack_range (rtx reg1, rtx reg2)
   return "";
 }
 
+/* Determine whether a frame chain needs to be generated.  */
+static bool
+aarch64_needs_frame_chain (void)
+{
+  /* Force a frame chain for EH returns so the return address is at FP+8.  */
+  if (frame_pointer_needed || crtl->calls_eh_return)
+    return true;
+
+  /* A leaf function cannot have calls or write LR.  */
+  bool is_leaf = crtl->is_leaf && !df_regs_ever_live_p (LR_REGNUM);
+
+  /* Don't use a frame chain in leaf functions if leaf frame pointers
+ are disabled.  */
+  if (flag_omit_leaf_frame_pointer && is_leaf)
+    return false;
+
+  return aarch64_use_frame_pointer;
+}
+
 /* Mark the registers that need to be saved by the callee and calculate
    the size of the callee-saved registers area and frame record (both FP
    and LR may be omitted).  */
@@ -2883,17 +2905,7 @@ aarch64_layout_frame (void)
   if (reload_completed && cfun->machine->frame.laid_out)
 return;
 
-  /* Force a frame chain for EH returns so the return address is at FP+8.  */
-  cfun->machine->frame.emit_frame_chain
-    = frame_pointer_needed || crtl->calls_eh_return;
-
-  /* Emit a frame chain if the frame pointer is enabled.
- If -momit-leaf-frame-pointer is used, do not use a frame chain
- in leaf functions which do not use LR.  */
-  if (flag_omit_frame_pointer == 2
-  && !(flag_omit_leaf_frame_pointer && crtl->is_leaf
-  && !df_regs_ever_live_p (LR_REGNUM)))
-    cfun->machine->frame.emit_frame_chain = true;
+  cfun->machine->frame.emit_frame_chain = aarch64_needs_frame_chain ();
 
 #define SLOT_NOT_REQUIRED (-2)
 #define SLOT_REQUIRED (-1)
@@ -9152,12 +9164,12 @@ aarch64_override_options_after_change_1 (struct 
gcc_options *opts)
   /* PR 70044: We have to be careful about being called multiple times for the
  same function.  This means all changes should be repeatable.  */
 
-  /* If the frame pointer is enabled, set it to a special value that behaves
- similar to frame pointer omission.  If we don't do this all leaf functions
- will get a frame pointer even if flag_omit_leaf_frame_pointer is set.
- If flag_omit_frame_pointer has this special value, we must force the
- frame pointer if not in a leaf function.  We also need to force it in a
- leaf function if flag_omit_frame_pointer is not set or if LR is used.  */
+  /* Set aarch64_use_frame_pointer based on -fno-omit-frame-pointer.
+ Disable the frame pointer flag so the mid-end will not use a frame
+ pointer in leaf functions in order to support -fomit-leaf-frame-pointer.
+ Set x_flag_omit_frame_pointer to the special value 2 to differentiate
+ between -fomit-frame-pointer (1) and -fno-omit-frame-pointer (2).  */
+  aarch64_use_frame_pointer = opts->x_flag_omit_frame_pointer != 1;
   if (opts->x_flag_omit_frame_pointer == 0)
 opts->x_flag_omit_frame_pointer = 2;
 


Re: [PATCH][AArch64] Set SLOW_BYTE_ACCESS

2018-05-15 Thread Wilco Dijkstra

ping




From: Wilco Dijkstra
Sent: 17 November 2017 15:21
To: GCC Patches
Cc: nd
Subject: [PATCH][AArch64] Set SLOW_BYTE_ACCESS
  

Contrary to all documentation, SLOW_BYTE_ACCESS simply means accessing
bitfields by their declared type, which results in better code generation
on practically any target.
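
A small example of the kind of code this affects (an illustration, not
from the patch):

struct flags
{
  unsigned int a : 3;
  unsigned int b : 5;
  unsigned int c : 7;
};

int
sum (struct flags *f)
{
  /* With SLOW_BYTE_ACCESS set to 1, all three bitfields can be
     extracted from one load of the declared (unsigned int) type
     instead of several narrow loads.  */
  return f->a + f->b + f->c;
}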

I'm thinking we should completely remove all trace of SLOW_BYTE_ACCESS
from GCC as it's confusing and useless.

OK for commit until we get rid of it?

ChangeLog:
2017-11-17  Wilco Dijkstra  

    gcc/
    * config/aarch64/aarch64.h (SLOW_BYTE_ACCESS): Set to 1.
--
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 
056110afb228fb919e837c04aa5e5552a4868ec3..d8f4d129a02fb89eb00d256aba8c4764d6026078
 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -769,14 +769,9 @@ typedef struct
    if given data not on the nominal alignment.  */
 #define STRICT_ALIGNMENT    TARGET_STRICT_ALIGN
 
-/* Define this macro to be non-zero if accessing less than a word of
-   memory is no faster than accessing a word of memory, i.e., if such
-   accesses require more than one instruction or if there is no
-   difference in cost.
-   Although there's no difference in instruction count or cycles,
-   in AArch64 we don't want to expand to a sub-word to a 64-bit access
-   if we don't have to, for power-saving reasons.  */
-#define SLOW_BYTE_ACCESS   0
+/* Contrary to all documentation, this enables wide bitfield accesses,
+   which results in better code when accessing multiple bitfields.  */
+#define SLOW_BYTE_ACCESS   1
 
 #define NO_FUNCTION_CSE 1



[PATCH] PR libstdc++/83891 fix path::is_absolute() for non-POSIX targets

2018-05-15 Thread Jonathan Wakely

The correct definition seems to be has_root_directory() for all systems
we care about.
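
A minimal sketch of the resulting behaviour (assuming a C++17 compiler;
it mirrors the new tests below):

  #include <cassert>
  #include <filesystem>

  int main()
  {
    using std::filesystem::path;
    assert( path("/foo").is_absolute() );	// has a root directory
    assert( !path("foo/bar").is_absolute() );
    // On Windows, "c:foo" has a root name but no root directory, so
    // under the new definition it is (correctly) not absolute.
    assert( !path("c:foo").is_absolute() );
  }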

PR libstdc++/83891
* include/bits/fs_path.h (path::is_absolute()): Use same definition
for all operating systems.
* include/experimental/bits/fs_path.h (path::is_absolute()): Likewise.
* testsuite/27_io/filesystem/path/query/is_absolute.cc: New.
* testsuite/27_io/filesystem/path/query/is_relative.cc: Fix comment.
* testsuite/experimental/filesystem/path/query/is_absolute.cc: New.

Tested powerpc64le-linux and x86_64-linux, committed to trunk and
gcc-8-branch.


commit 82eb5d9bd748708bd0069e2045c2665f56c0e268
Author: Jonathan Wakely 
Date:   Tue May 15 14:32:43 2018 +0100

PR libstdc++/83891 fix path::is_absolute() for non-POSIX targets

The correct definition seems to be has_root_directory() for all systems
we care about.

PR libstdc++/83891
* include/bits/fs_path.h (path::is_absolute()): Use same definition
for all operating systems.
* include/experimental/bits/fs_path.h (path::is_absolute()): 
Likewise.
* testsuite/27_io/filesystem/path/query/is_absolute.cc: New.
* testsuite/27_io/filesystem/path/query/is_relative.cc: Fix comment.
* testsuite/experimental/filesystem/path/query/is_absolute.cc: New.

diff --git a/libstdc++-v3/include/bits/fs_path.h 
b/libstdc++-v3/include/bits/fs_path.h
index 53bf237b547..51af2891647 100644
--- a/libstdc++-v3/include/bits/fs_path.h
+++ b/libstdc++-v3/include/bits/fs_path.h
@@ -376,7 +376,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 bool has_filename() const;
 bool has_stem() const;
 bool has_extension() const;
-bool is_absolute() const;
+bool is_absolute() const { return has_root_directory(); }
 bool is_relative() const { return !is_absolute(); }
 
 // generation
@@ -1071,16 +1071,6 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 return ext.first && ext.second != string_type::npos;
   }
 
-  inline bool
-  path::is_absolute() const
-  {
-#ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
-return has_root_name();
-#else
-return has_root_directory();
-#endif
-  }
-
   inline path::iterator
   path::begin() const
   {
diff --git a/libstdc++-v3/include/experimental/bits/fs_path.h 
b/libstdc++-v3/include/experimental/bits/fs_path.h
index 3b4011e6414..ada7c1791aa 100644
--- a/libstdc++-v3/include/experimental/bits/fs_path.h
+++ b/libstdc++-v3/include/experimental/bits/fs_path.h
@@ -368,7 +368,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 bool has_filename() const;
 bool has_stem() const;
 bool has_extension() const;
-bool is_absolute() const;
+bool is_absolute() const { return has_root_directory(); }
 bool is_relative() const { return !is_absolute(); }
 
 // iterators
@@ -999,16 +999,6 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 return ext.first && ext.second != string_type::npos;
   }
 
-  inline bool
-  path::is_absolute() const
-  {
-#ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
-return has_root_name();
-#else
-return has_root_directory();
-#endif
-  }
-
   inline path::iterator
   path::begin() const
   {
diff --git a/libstdc++-v3/testsuite/27_io/filesystem/path/query/is_absolute.cc 
b/libstdc++-v3/testsuite/27_io/filesystem/path/query/is_absolute.cc
new file mode 100644
index 000..6b5c098489a
--- /dev/null
+++ b/libstdc++-v3/testsuite/27_io/filesystem/path/query/is_absolute.cc
@@ -0,0 +1,62 @@
+// { dg-options "-std=gnu++17 -lstdc++fs" }
+// { dg-do run { target c++17 } }
+// { dg-require-filesystem-ts "" }
+
+// Copyright (C) 2018 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// 30.11.7.4.9 path decomposition [fs.path.decompose]
+
+#include 
+#include 
+
+using std::filesystem::path;
+
+void
+test01()
+{
+  VERIFY( path("/").is_absolute() );
+  VERIFY( path("/foo").is_absolute() );
+  VERIFY( path("/foo/").is_absolute() );
+  VERIFY( path("/foo/bar").is_absolute() );
+  VERIFY( path("/foo/bar/").is_absolute() );
+  VERIFY( ! path("foo").is_absolute() );
+  VERIFY( ! path("foo/").is_absolute() );
+  VERIFY( ! path("foo/bar").is_absolute() );
+  VERIFY( ! path("foo/bar/").is_absolute() );
+  VERIFY( ! path("c:").is_absolute() );
+  VERIFY( ! path("c:foo").is_absolute() );
+  VERIFY( ! path("c:foo/").is_absolute() );

Re: [C++ Patch] Add DECL_MAYBE_IN_CHARGE_CDTOR_P

2018-05-15 Thread Jason Merrill
OK.

On Tue, May 15, 2018 at 8:07 AM, Paolo Carlini  wrote:
> Hi,
>
> noticed a few times while working on various issues, maybe we want to add
> the macro now? Tested x86_64-linux.
>
> Thanks, Paolo.
>
> 
>


Fix PR85782: C++ ICE with continue statements inside acc loops

2018-05-15 Thread Cesar Philippidis
This patch resolves the issue in PR85782, which involves a C++ ICE
caused by OpenACC loops which contain continue statements. The problem
is that genericize_continue_stmt expects a continue label for the loop,
but that wasn't getting set up for acc loops. This patch fixes that by
calling genericize_omp_for_stmt for OACC_LOOP statements, because
OACC_LOOP and OMP_FOR statements are almost identical at this point of
parsing.

I regression tested it on x86_64-linux with nvptx offloading.

Is this ok for trunk and the stable branches?  The patch for the stable
branches is slightly different, because cp_genericize_r uses if
statements to check for statement types instead of a huge switch statement.

Cesar
2018-05-15  Cesar Philippidis  

	PR c++/85782

	gcc/cp/
	* cp-gimplify.c (cp_genericize_r): Call genericize_omp_for_stmt for
	OACC_LOOPs.

	gcc/testsuite/
	* c-c++-common/goacc/pr85782.c: New test.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/pr85782.c: New test.


diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c
index eda5f05..55aef86 100644
--- a/gcc/cp/cp-gimplify.c
+++ b/gcc/cp/cp-gimplify.c
@@ -1463,6 +1463,7 @@ cp_genericize_r (tree *stmt_p, int *walk_subtrees, void *data)
 case OMP_FOR:
 case OMP_SIMD:
 case OMP_DISTRIBUTE:
+case OACC_LOOP:
   genericize_omp_for_stmt (stmt_p, walk_subtrees, data);
   break;
 
diff --git a/gcc/testsuite/c-c++-common/goacc/pr85782.c b/gcc/testsuite/c-c++-common/goacc/pr85782.c
new file mode 100644
index 000..f213a24
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/pr85782.c
@@ -0,0 +1,11 @@
+/* PR c++/85782 */
+
+void
+foo ()
+{
+  int i;
+  
+  #pragma acc parallel loop
+  for (i = 0; i < 100; i++)
+continue;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85782.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85782.c
new file mode 100644
index 000..6f84dfc
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85782.c
@@ -0,0 +1,32 @@
+/* PR c++/85782 */
+
+#include 
+
+#define N 100
+
+int
+main ()
+{
+  int i, a[N];
+
+  for (i = 0; i < N; i++)
+a[i] = i+1;
+
+  #pragma acc parallel loop copy(a)
+  for (i = 0; i < N; i++)
+{
+  if (i % 2)
+	continue;
+  a[i] = 0;
+}
+
+  for (i = 0; i < N; i++)
+{
+  if (i % 2)
+	assert (a[i] == i+1);
+  else
+	assert (a[i] == 0);
+}
+
+return 0;
+}


Re: [PR63185][RFC] Improve DSE with branches

2018-05-15 Thread Richard Biener
On Tue, 15 May 2018, Richard Biener wrote:

> On Tue, 15 May 2018, Richard Biener wrote:
> 
> > On Tue, 15 May 2018, Richard Biener wrote:
> > 
> > > On Tue, 15 May 2018, Richard Biener wrote:
> > > 
> > > > On Mon, 14 May 2018, Kugan Vivekanandarajah wrote:
> > > > 
> > > > > Hi,
> > > > > 
> > > > > Attached patch handles PR63185 when we reach PHI with temp != NULL.
> > > > > We could see the PHI and if there isn't any uses for PHI that is
> > > > > interesting, we could ignore that ?
> > > > > 
> > > > > Bootstrapped and regression tested on x86_64-linux-gnu.
> > > > > Is this OK?
> > > > 
> > > > No, as Jeff said we can't do it this way.
> > > > 
> > > > If we end up with multiple VDEFs in the walk of defvar immediate
> > > > uses we know we are dealing with a CFG fork.  We can't really
> > > > ignore any of the paths but we have to
> > > > 
> > > >  a) find the merge point (and the associated VDEF)
> > > >  b) verify for each chain of VDEFs with associated VUSEs
> > > > up to that merge VDEF that we have no uses of the to-be-classified
> > > > store and collect (partial) kills
> > > >  c) intersect kill info and continue walking from the merge point
> > > > 
> > > > in b) there's the optional possibility to find sinking opportunities
> > > > in case we have kills on some paths but uses on others.  This is why
> > > > DSE should be really merged with (store) sinking.
> > > > 
> > > > So if we want to enhance DSEs handling of branches then we need
> > > > to refactor the simple dse_classify_store function.  Let me take
> > > > an attempt at this today.
> > > 
> > > First (baby) step is the following - it arranges to collect the
> > > defs we need to continue walking from and implements trivial
> > > reduction by stopping at (full) kills.  This allows us to handle
> > > the new testcase (which was already handled in the very late DSE
> > > pass with the help of sinking the store).
> > > 
> > > I took the opportunity to kill the use_stmt parameter of
> > > dse_classify_store as the only user is only looking for whether
> > > the kills were all clobbers which I added a new parameter for.
> > > 
> > > I didn't adjust the byte-tracking case fully (I'm not fully understanding
> > > the code in the case of a use and I'm not sure whether it's worth
> > > doing the def reduction with byte-tracking).
> > > 
> > > Your testcase can be handled by reducing the PHI and the call def
> > > by seeing that the only use of a candidate def is another def
> > > we have already processed.  Not sure if worth special-casing though,
> > > I'd rather have a go at "recursing".  That will be the next
> > > patch.
> > > 
> > > Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> > 
> > Applied.
> > 
> > Another intermediate one below, fixing the byte-tracking for
> > stmt with uses.  This also re-does the PHI handling by simply
> > avoiding recursion by means of a visited bitmap and stopping
> > at the DSE classify stmt when re-visiting it instead of failing.
> > This covers Prathamesh's loop case for which I added ssa-dse-33.c.
> > For the do-while loop this still runs into the inability to
> > handle two defs to walk from.
> > 
> > Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> 
> Ok, loop handling doesn't work in general since we run into the
> issue that SSA form across the backedge is not representing the
> same values.  Consider
> 
>  <bb 3>:
>  # .MEM_22 = PHI <.MEM_12(D)(2), .MEM_13(4)>
>  # n_20 = PHI <0(2), n_7(4)>
>  # .MEM_13 = VDEF <.MEM_22>
>  bytes[n_20] = _4;
>  if (n_20 > 7)
>    goto <bb 5>;
> 
>  <bb 4>:
>  n_7 = n_20 + 1;
>  # .MEM_15 = VDEF <.MEM_13>
>  bytes[n_20] = _5;
>  goto <bb 3>;
> 
> then when classifying the store in bb4, visiting the PHI node
> gets us to the store in bb3 which appears to be killing.
> 
>if (gimple_code (temp) == GIMPLE_PHI)
> -   defvar = PHI_RESULT (temp);
> +   {
> + /* If we visit this PHI by following a backedge then reset
> +any info in ref that may refer to SSA names which we'd need
> +to PHI translate.  */
> + if (gimple_bb (temp) == gimple_bb (stmt)
> + || dominated_by_p (CDI_DOMINATORS, gimple_bb (stmt),
> +gimple_bb (temp)))
> +   /* ???  ref->ref may not refer to SSA names or it may only
> +  refer to SSA names that are invariant with respect to the
> +  loop represented by this PHI node.  */
> +   ref->ref = NULL_TREE;
> + defvar = PHI_RESULT (temp);
> + bitmap_set_bit (visited, SSA_NAME_VERSION (defvar));
> +   }
> 
> should be a workable solution for that.  I'm checking that, but
> eventually you can think of other things that might prevent us from
> handling backedges.  Note the current code tries to allow
> looking across loops but not handle backedges of loops the
> original stmt belongs to.

Just to mention before I leave for the day I think I've identified
a latent issue where I just fail to produce a testcase

Re: [PR63185][RFC] Improve DSE with branches

2018-05-15 Thread Jeff Law
On 05/15/2018 02:15 AM, Richard Biener wrote:
> On Mon, 14 May 2018, Kugan Vivekanandarajah wrote:
> 
>> Hi,
>>
>> Attached patch handles PR63185 when we reach PHI with temp != NULL.
>> We could see the PHI and if there isn't any uses for PHI that is
>> interesting, we could ignore that ?
>>
>> Bootstrapped and regression tested on x86_64-linux-gnu.
>> Is this OK?
> 
> No, as Jeff said we can't do it this way.
> 
> If we end up with multiple VDEFs in the walk of defvar immediate
> uses we know we are dealing with a CFG fork.  We can't really
> ignore any of the paths but we have to
> 
>  a) find the merge point (and the associated VDEF)
>  b) verify for each chain of VDEFs with associated VUSEs
> up to that merge VDEF that we have no uses of the to-be-classified
> store and collect (partial) kills
>  c) intersect kill info and continue walking from the merge point
Which makes me come back to "why is this structured as a forward walk?"
DSE seems much more naturally structured as a walk of the post-dominator
tree, walking instructions in reverse order.

Cases like 63185 would naturally fall out of that structure as the
clobber at the end of the function post-dominates the memset and there's
no intervening aliased loads.
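
In pseudocode the backward structure would look something like this
(names are purely illustrative, not the actual GCC API):

/* Process statements in reverse; a store is dead if a store already
   seen (i.e. one that executes later) fully covers it with no
   intervening read.  */
static void
dse_walk_block (struct block *bb, struct store_set *later_stores)
{
  for (struct stmt *s = last_stmt (bb); s; s = prev_stmt (s))
    {
      if (is_store (s) && fully_covered_by (later_stores, s))
	delete_dead_store (s);
      else if (is_store (s))
	add_store (later_stores, s);
      else if (may_read_memory (s))
	remove_aliased_stores (later_stores, s);
    }
  /* ... then recurse into BB's children in the post-dominator tree,
     each child starting from a copy of LATER_STORES.  */
}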

In fact, I thought that's how I initially structured DSE.  I'm not
sure when/why it changed.

Jeff


[PATCH] PR libstdc++/85749 constrain seed sequences for random number engines

2018-05-15 Thread Jonathan Wakely

Constrain constructors and member functions of random number engines so
that functions taking seed sequences can only be called with types that
meet the seed sequence requirements.
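
The user-visible effect, sketched (assuming the constrained overloads
from this patch):

  #include <random>

  int main()
  {
    std::seed_seq s{ 1, 2, 3 };
    std::mt19937 g(s);	 // OK: seed_seq meets the requirements

    std::mt19937 h(42u); // OK: unambiguously the result_type
			 // constructor, not the seed-sequence template
  }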

PR libstdc++/85749
* include/bits/random.h (__detail::__is_seed_seq): New SFINAE helper.
(linear_congruential_engine, mersenne_twister_engine)
(subtract_with_carry_engine, discard_block_engine)
(independent_bits_engine, shuffle_order_engine): Use __is_seed_seq to
constrain function templates taking seed sequences.
* include/bits/random.tcc (linear_congruential_engine::seed(_Sseq&))
(mersenne_twister_engine::seed(_Sseq&))
(subtract_with_carry_engine::seed(_Sseq&)): Change return types to
match declarations.
* include/ext/random (simd_fast_mersenne_twister_engine): Use
__is_seed_seq to constrain function templates taking seed sequences.
* include/ext/random.tcc (simd_fast_mersenne_twister_engine::seed):
Change return type to match declaration.
* testsuite/26_numerics/random/discard_block_engine/cons/seed_seq2.cc:
New.
* testsuite/26_numerics/random/independent_bits_engine/cons/
seed_seq2.cc: New.
* testsuite/26_numerics/random/linear_congruential_engine/cons/
seed_seq2.cc: New.
* testsuite/26_numerics/random/mersenne_twister_engine/cons/
seed_seq2.cc: New.
* testsuite/26_numerics/random/pr60037-neg.cc: Adjust dg-error lineno.
* testsuite/26_numerics/random/shuffle_order_engine/cons/seed_seq2.cc:
New.
* testsuite/26_numerics/random/subtract_with_carry_engine/cons/
seed_seq2.cc: New.
* testsuite/ext/random/simd_fast_mersenne_twister_engine/cons/
seed_seq2.cc: New.

Tested powerpc64le-linux, committed to trunk.


commit 2197343e0cb9439687e453f37bf9ea2908bd499d
Author: Jonathan Wakely 
Date:   Sat May 12 00:22:23 2018 +0100

PR libstdc++/85749 constrain seed sequences for random number engines

Constrain constructors and member functions of random number engines so
that functions taking seed sequences can only be called with types that
meet the seed sequence requirements.

PR libstdc++/85749
* include/bits/random.h (__detail::__is_seed_seq): New SFINAE helper.
(linear_congruential_engine, mersenne_twister_engine)
(subtract_with_carry_engine, discard_block_engine)
(independent_bits_engine, shuffle_order_engine): Use __is_seed_seq to
constrain function templates taking seed sequences.
* include/bits/random.tcc (linear_congruential_engine::seed(_Sseq&))
(mersenne_twister_engine::seed(_Sseq&))
(subtract_with_carry_engine::seed(_Sseq&)): Change return types to
match declarations.
* include/ext/random (simd_fast_mersenne_twister_engine): Use
__is_seed_seq to constrain function templates taking seed sequences.
* include/ext/random.tcc (simd_fast_mersenne_twister_engine::seed):
Change return type to match declaration.
* testsuite/26_numerics/random/discard_block_engine/cons/seed_seq2.cc:
New.
* testsuite/26_numerics/random/independent_bits_engine/cons/
seed_seq2.cc: New.
* testsuite/26_numerics/random/linear_congruential_engine/cons/
seed_seq2.cc: New.
* testsuite/26_numerics/random/mersenne_twister_engine/cons/
seed_seq2.cc: New.
* testsuite/26_numerics/random/pr60037-neg.cc: Adjust dg-error lineno.
* testsuite/26_numerics/random/shuffle_order_engine/cons/seed_seq2.cc:
New.
* testsuite/26_numerics/random/subtract_with_carry_engine/cons/
seed_seq2.cc: New.
* testsuite/ext/random/simd_fast_mersenne_twister_engine/cons/
seed_seq2.cc: New.

diff --git a/libstdc++-v3/include/bits/random.h b/libstdc++-v3/include/bits/random.h
index f812bbf18b1..b76cfbb558e 100644
--- a/libstdc++-v3/include/bits/random.h
+++ b/libstdc++-v3/include/bits/random.h
@@ -185,6 +185,21 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	_Engine& _M_g;
   };
 
+template<typename _Sseq>
+  using __seed_seq_generate_t = decltype(
+	  std::declval<_Sseq&>().generate(std::declval<uint_least32_t*>(),
+	  std::declval<uint_least32_t*>()));
+
+// Detect whether _Sseq is a valid seed sequence for
+// a random number engine _Engine with result type _Res.
+template<typename _Sseq, typename _Engine, typename _Res,
+	 typename = __seed_seq_generate_t<_Sseq>>
+  using __is_seed_seq = __and_<
+    __not_<is_same<__remove_cvref_t<_Sseq>, _Engine>>,
+	is_unsigned<typename _Sseq::result_type>,
+	__not_<is_convertible<_Sseq, _Res>>
+  >;
+
   } // namespace __detail
 
   /**
@@ -233,6 +248,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   static_assert(__m == 0u || (__a < __m && __c < __m),
 		"template argument substituting __m out of bounds");
 
+  template<typename _Sseq>
+	using _If_seed_seq = typename enable_if<__detail::__is_seed_seq<
+	  _Sseq, linear_congruential_engine, _UIntType>::value>::type;
+
  

Re: [PATCH GCC][4/6]Support regional coalesce and live range computation

2018-05-15 Thread Bin.Cheng
On Fri, May 11, 2018 at 1:53 PM, Richard Biener
 wrote:
> On Fri, May 4, 2018 at 6:23 PM, Bin Cheng  wrote:
>> Hi,
>> Following Jeff's suggestion, I am now using existing tree-ssa-live.c and
>> tree-ssa-coalesce.c to compute register pressure, rather than inventing
>> another live range solver.
>>
>> The major change is to record region's basic blocks in var_map and use that
>> information in computation, rather than FOR_EACH_BB_FN.  For now only loop
>> and function type regions are supported.  The default one is function type
>> region which is used in out-of-ssa.  Loop type region will be used in next
>> patch to compute information for a loop.
>>
>> Bootstrap and test on x86_64 and AArch64 ongoing.  Any comments?
>
> I believe your changes to create_outofssa_var_map should be done differently
> by simply only calling it from the coalescing context and passing in the
> var_map rather than initializing it therein and returning it.
>
> This also means the coalesce_vars_p flag in the var_map structure looks
> somewhat out-of-place.  That is, it looks you could do with many less
> changes if you refactored what calls what slightly?  For example
> the extra arg to gimple_can_coalesce_p looks unneeded.
>
> Just as a note I do have a CFG helper pending that computes RPO order
> for SEME regions (attached).  loops are SEME regions, so your RTYPE_SESE
> is somewhat odd - I guess RTYPE_LOOP exists only because of the
> convenience of passing in a loop * to the "constructor".  I'd rather
> drop this region_type thing and always assume a SEME region - at least
> I didn't see anything in the patch that depends on any of the forms
> apart from the initial BB gathering.

Hi Richard,

Thanks for reviewing.  I refactored tree-ssa-live.c and
tree-ssa-coalesce.c following your comments.
Basically I did following changes:
1) Remove region_type and only support loop region live range computation.
Also I added one boolean field in var_map indicating whether we are
computing loop region live range or out-of-ssa.
2) Refactored create_outofssa_var_map into create_coalesce_list_for_region
and populate_coalesce_list_for_outofssa.  Actually the original function
name doesn't quite make sense because it has nothing to do with var_map.
3) Hoist init_var_map up in the call stack.  Now it's called by the caller
(out-of-ssa or live range) and the returned var_map is passed to the
coalesce_* stuff.
4) Move global functions to tree-outof-ssa.c and make them static.
5) Save/restore flag_tree_coalesce_vars in order to avoid updating
checks on the flag.

So how is this one?  Patch attached.

Thanks,
bin
2018-05-15  Bin Cheng  

* tree-outof-ssa.c (tree-ssa.h, tree-dfa.h): Include header files.
(create_default_def, for_all_parms): Moved from tree-ssa-coalesce.c.
(parm_default_def_partition_arg): Ditto.
(set_parm_default_def_partition): Ditto.
(get_parm_default_def_partitions): Ditto and make it static.
(get_undefined_value_partitions): Ditto and make it static.
(remove_ssa_form): Refactor call to init_var_map here.
* tree-ssa-coalesce.c (build_ssa_conflict_graph): Support live range
computation for loop region.
(coalesce_partitions, compute_optimized_partition_bases): Ditto.
(register_default_def): Delete.
(for_all_parms, create_default_def): Move to tree-outof-ssa.c.
(parm_default_def_partition_arg): Ditto.
(set_parm_default_def_partition): Ditto.
(get_parm_default_def_partitions): Ditto and make it static.
(get_undefined_value_partitions): Ditto and make it static.
(coalesce_with_default, coalesce_with_default): Update comment.
(create_coalesce_list_for_region): New func factored out from
create_outofssa_var_map.
(populate_coalesce_list_for_outofssa): New func factored out from
create_outofssa_var_map and coalesce_ssa_name.
(create_outofssa_var_map): Delete.
(coalesce_ssa_name): Refactor to support live range computation.
* tree-ssa-coalesce.h (coalesce_ssa_name): Change decl.
(get_parm_default_def_partitions): Delete.
(get_undefined_value_partitions): Ditto.
* tree-ssa-live.c (init_var_map, delete_var_map): Support live range
computation for loop region.
(new_tree_live_info, loe_visit_block): Ditto.
(live_worklist, set_var_live_on_entry): Ditto.
(calculate_live_on_exit, verify_live_on_entry): Ditto.
* tree-ssa-live.h (struct _var_map): New fields.
(init_var_map): Change decl.
(region_contains_p): New.
From 1043540cd5325c65e60df25cad22c343e4312fd4 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Fri, 4 May 2018 09:39:17 +0100
Subject: [PATCH 4/6] liverange-support-region-20180504

---
 gcc/tree-outof-ssa.c| 102 +++-
 gcc/tree-ssa-coalesce.c | 317 +---
 gcc/tree-ssa-coalesce.h |   4 +-
 gcc/tree-ssa-live.c |  78 
 gcc/tree-ssa-live.h |  30 -
 5 files changed, 298 insertions(+), 233 deletions(-)

diff --git a/gcc/tree-

Re: [PATCH][AArch64] Set SLOW_BYTE_ACCESS

2018-05-15 Thread Richard Earnshaw (lists)
On 15/05/18 14:24, Wilco Dijkstra wrote:
> 
> ping
> 
> 

I see nothing about you addressing James' comment from 17th November...


> 
> 
> From: Wilco Dijkstra
> Sent: 17 November 2017 15:21
> To: GCC Patches
> Cc: nd
> Subject: [PATCH][AArch64] Set SLOW_BYTE_ACCESS
>   
> 
> Contrary to all documentation, SLOW_BYTE_ACCESS simply means accessing
> bitfields by their declared type, which results in better code generation
> on practically any target.
> 
> I'm thinking we should completely remove all trace of SLOW_BYTE_ACCESS
> from GCC as it's confusing and useless.
> 
> OK for commit until we get rid of it?
> 
> ChangeLog:
> 2017-11-17  Wilco Dijkstra  
> 
>     gcc/
>     * config/aarch64/aarch64.h (SLOW_BYTE_ACCESS): Set to 1.
> --
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index 
> 056110afb228fb919e837c04aa5e5552a4868ec3..d8f4d129a02fb89eb00d256aba8c4764d6026078
>  100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -769,14 +769,9 @@ typedef struct
>     if given data not on the nominal alignment.  */
>  #define STRICT_ALIGNMENT    TARGET_STRICT_ALIGN
>  
> -/* Define this macro to be non-zero if accessing less than a word of
> -   memory is no faster than accessing a word of memory, i.e., if such
> -   accesses require more than one instruction or if there is no
> -   difference in cost.
> -   Although there's no difference in instruction count or cycles,
> -   in AArch64 we don't want to expand to a sub-word to a 64-bit access
> -   if we don't have to, for power-saving reasons.  */
> -#define SLOW_BYTE_ACCESS   0
> +/* Contrary to all documentation, this enables wide bitfield accesses,
> +   which results in better code when accessing multiple bitfields.  */
> +#define SLOW_BYTE_ACCESS   1
>  
>  #define NO_FUNCTION_CSE 1
> 
> 
> 



Fix partitioning ICE

2018-05-15 Thread Jan Hubicka
Hi,
this patch silences a sanity check that makes sure that everything is in
partition 0 and thus boundary cost is 0 when there is only one partition.
In the testcase there is an external initializer which does not get
partitioned and thus we end up with cost 1.  Fixed by not accounting
references from external initializers.

Bootstrapped/regtested x86_64-linux, will commit it after finishing LTO bootstrap.

Honza

* torture/pr85583.C: New testcase.

* lto-partition.c (account_reference_p): Do not account
references from aliases; do not account references from
external initializers.
Index: testsuite/g++.dg/torture/pr85583.C
===
--- testsuite/g++.dg/torture/pr85583.C  (nonexistent)
+++ testsuite/g++.dg/torture/pr85583.C  (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do link } */
+class b {
+public:
+  virtual ~b();
+};
+template <typename> class c : b {};
+class B {
+  c<int> d;
+};
+extern template class c<int>;
+int
+main(void) { B a; return 0; }
+
Index: lto/lto-partition.c
===
--- lto/lto-partition.c (revision 260258)
+++ lto/lto-partition.c (working copy)
@@ -439,13 +439,28 @@
 {
   if (cgraph_node *cnode = dyn_cast  (n1))
 n1 = cnode;
+  /* Do not account references from aliases - they are never split across
+ partitions.  */
+  if (n1->alias)
+return false;
   /* Do not account recursion - the code below will handle it incorrectly
- otherwise.  Also do not account references to external symbols.
- They will never become local.  */
+ otherwise.  Do not account references to external symbols: they will
+ never become local.  Finally do not account references to duplicated
+ symbols: they will be always local.  */
   if (n1 == n2 
-  || DECL_EXTERNAL (n2->decl)
-  || !n2->definition)
+  || !n2->definition
+  || n2->get_partitioning_class () != SYMBOL_PARTITION)
 return false;
+  /* If referring node is external symbol do not account it to boundary
+ cost. Those are added into units only to enable possible constant
+ folding and devirtualization.
+
+ Here we do not know if it will ever be added to some partition
+ (this is decided by compute_ltrans_boundary) and second it is not
+ that likely that constant folding will actually use the reference.  */
+  if (contained_in_symbol (n1)
+   ->get_partitioning_class () == SYMBOL_EXTERNAL)
+return false;
   return true;
 }
 


Re: [PATCH][AArch64] Set SLOW_BYTE_ACCESS

2018-05-15 Thread Wilco Dijkstra
Hi,
 
> I see nothing about you addressing James' comment from 17th November...

I addressed that in a separate patch, see 
https://patchwork.ozlabs.org/patch/839126/

Wilco

Re: [wwwdocs] Describe how to validate wwwdocs changes

2018-05-15 Thread David Malcolm
On Sun, 2018-05-13 at 22:19 -0600, Gerald Pfeifer wrote:
> This is triggered by a report from Martin who rightfully pointed 
> out that it's not straightforward to validate wwwdocs changes before 
> committing a change.
> 
> Martin, what do you think?  Would that have avoided the challenges
> you ran into?  Anything to better clarify or otherwise improve?

Possibly a silly question, but why can't we just change all the headers
in the site source to be this?

Is it something to do with the conversion toolchain?

> Gerald
> 
> Index: about.html
> ===================================================================
> RCS file: /cvs/gcc/wwwdocs/htdocs/about.html,v
> retrieving revision 1.28
> diff -u -r1.28 about.html
> --- about.html	6 May 2018 16:19:24 -0000	1.28
> +++ about.html	14 May 2018 04:11:25 -0000
> +Validating a change
> +
> +To validate any changes, you can use the <a
> +href="https://validator.w3.org">W3 Validator</a>. Just replace the
> +<html> tag at the beginning of the document with the following
> +snippet and use the "Validate by File Upload" functionality.
> +
> +<!DOCTYPE html
> +  PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
> +  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
> +<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
> +
> +
>  Checking in a change
>  
>  We recommend you list files explicitly to avoid accidental
> checkins


Re: [PATCH] PR libstdc++/85749 constrain seed sequences for random number engines

2018-05-15 Thread Jonathan Wakely

On 15/05/18 16:36 +0100, Jonathan Wakely wrote:

Constrain constructors and member functions of random number engines so
that functions taking seed sequences can only be called with types that
meet the seed sequence requirements.
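
For illustration, here is hypothetical user code (not part of the patch)
showing the calls the constraint must keep working and the one it exists to
keep away from the seed-sequence overload:

  #include <random>

  std::seed_seq s{1, 2, 3};
  std::mt19937 e1(s);    // OK: s satisfies the seed sequence requirements
  std::mt19937 e2(42u);  // OK: picks the result_type constructor
  std::mt19937 e3(e1);   // copy construction; without the constraint a
                         // non-const engine lvalue could match the
                         // seed-sequence constructor template instead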

PR libstdc++/85749
* include/bits/random.h (__detail::__is_seed_seq): New SFINAE helper.
(linear_congruential_engine, mersenne_twister_engine)
(subtract_with_carry_engine, discard_block_engine)
(independent_bits_engine, shuffle_order_engine): Use __is_seed_seq to
constrain function templates taking seed sequences.
* include/bits/random.tcc (linear_congruential_engine::seed(_Sseq&))
(mersenne_twister_engine::seed(_Sseq&))
(subtract_with_carry_engine::seed(_Sseq&)): Change return types to
match declarations.
* include/ext/random (simd_fast_mersenne_twister_engine): Use
__is_seed_seq to constrain function templates taking seed sequences.
* include/ext/random.tcc (simd_fast_mersenne_twister_engine::seed):
Change return type to match declaration.
* testsuite/26_numerics/random/discard_block_engine/cons/seed_seq2.cc:
New.
* testsuite/26_numerics/random/independent_bits_engine/cons/
seed_seq2.cc: New.
* testsuite/26_numerics/random/linear_congruential_engine/cons/
seed_seq2.cc: New.
* testsuite/26_numerics/random/mersenne_twister_engine/cons/
seed_seq2.cc: New.
* testsuite/26_numerics/random/pr60037-neg.cc: Adjust dg-error lineno.
* testsuite/26_numerics/random/shuffle_order_engine/cons/seed_seq2.cc:
New.
* testsuite/26_numerics/random/subtract_with_carry_engine/cons/
seed_seq2.cc: New.
* testsuite/ext/random/simd_fast_mersenne_twister_engine/cons/
seed_seq2.cc: New.

Tested powerpc64le-linux, committed to trunk.





commit 2197343e0cb9439687e453f37bf9ea2908bd499d
Author: Jonathan Wakely 
Date:   Sat May 12 00:22:23 2018 +0100

   PR libstdc++/85749 constrain seed sequences for random number engines
   
   Constrain constructors and member functions of random number engines so
   that functions taking seed sequences can only be called with types that
   meet the seed sequence requirements.
   

diff --git a/libstdc++-v3/include/bits/random.h b/libstdc++-v3/include/bits/random.h
index f812bbf18b1..b76cfbb558e 100644
--- a/libstdc++-v3/include/bits/random.h
+++ b/libstdc++-v3/include/bits/random.h
@@ -185,6 +185,21 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
_Engine& _M_g;
  };

+    template<typename _Sseq>
+      using __seed_seq_generate_t = decltype(
+	  std::declval<_Sseq&>().generate(std::declval<uint_least32_t*>(),
+					  std::declval<uint_least32_t*>()));
+
+    // Detect whether _Sseq is a valid seed sequence for
+    // a random number engine _Engine with result type _Res.
+    template<typename _Sseq, typename _Engine, typename _Res,
+	     typename = __seed_seq_generate_t<_Sseq>>
+      using __is_seed_seq = __and_<
+        __not_<is_same<__remove_cvref_t<_Sseq>, _Engine>>,
+	is_unsigned<typename _Sseq::result_type>,
+	__not_<is_convertible<_Sseq, _Res>>
+      >;
+
  } // namespace __detail


I'm considering backporting this, with a simpler constraint:

   // Detect whether _Sseq is a valid seed sequence for
   // a random number engine _Engine with result type _Res.
   template<typename _Sseq, typename _Engine, typename _Res>
 using __is_seed_seq = __and_< __not_<is_convertible<_Sseq, _Res>>,
   

Re: [PATCH][AArch64] Set SLOW_BYTE_ACCESS

2018-05-15 Thread Richard Earnshaw (lists)
On 15/05/18 17:01, Wilco Dijkstra wrote:
> Hi,
>  
>> I see nothing about you addressing James' comment from 17th November...
> 
> I addressed that in a separate patch, see 
> https://patchwork.ozlabs.org/patch/839126/
> 
> Wilco
> 

Which doesn't appear to have been approved.  Did you follow up with Jeff?

R.


[PATCH] libstdc++/85184 remove __glibcxx_assert assertions from <variant>

2018-05-15 Thread Jonathan Wakely

As I said in the bugzilla PR, these assertions are all to catch our
own mistakes, not user error.

If we're comfortable the code is correct then we should remove them.

Should we wait until near the end of stage 1, to get more time with
these checks in place?


diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog
index 8359f4f5335..a3d74966435 100644
--- a/libstdc++-v3/ChangeLog
+++ b/libstdc++-v3/ChangeLog
@@ -1,3 +1,8 @@
+2018-05-15  Jonathan Wakely  
+
+	PR libstdc++/85184
+	* include/std/variant: Remove all uses of __glibcxx_assert.
+
 2018-05-15  Jonathan Wakely  
 
 	PR libstdc++/85749
diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
index c0212404bb2..efc1d3bf1e0 100644
--- a/libstdc++-v3/include/std/variant
+++ b/libstdc++-v3/include/std/variant
@@ -555,7 +555,6 @@ namespace __variant
 		__throw_exception_again;
 	  }
 	  }
-	__glibcxx_assert(this->_M_index == __rhs._M_index);
 	return *this;
   }
 
@@ -624,7 +623,6 @@ namespace __variant
 		__throw_exception_again;
 	  }
 	  }
-	__glibcxx_assert(this->_M_index == __rhs._M_index);
 	return *this;
   }
 
@@ -1105,7 +1103,7 @@ namespace __variant
 	noexcept(is_nothrow_constructible_v<__accepted_type<_Tp&&>, _Tp&&>)
 	: variant(in_place_index<__accepted_index<_Tp&&>>,
 		  std::forward<_Tp>(__t))
-	{ __glibcxx_assert(holds_alternative<__accepted_type<_Tp&&>>(*this)); }
+	{ }
 
   template
@@ -1114,7 +1112,7 @@ namespace __variant
 	variant(in_place_type_t<_Tp>, _Args&&... __args)
 	: variant(in_place_index<__index_of<_Tp>>,
 		  std::forward<_Args>(__args)...)
-	{ __glibcxx_assert(holds_alternative<_Tp>(*this)); }
+	{ }
 
   template
@@ -1125,7 +1123,7 @@ namespace __variant
 		_Args&&... __args)
 	: variant(in_place_index<__index_of<_Tp>>, __il,
 		  std::forward<_Args>(__args)...)
-	{ __glibcxx_assert(holds_alternative<_Tp>(*this)); }
+	{ }
 
   template, _Args&&... __args)
 	: _Base(in_place_index<_Np>, std::forward<_Args>(__args)...),
 	_Default_ctor_enabler(_Enable_default_constructor_tag{})
-	{ __glibcxx_assert(index() == _Np); }
+	{ }
 
   template,
@@ -1144,7 +1142,7 @@ namespace __variant
 		_Args&&... __args)
 	: _Base(in_place_index<_Np>, __il, std::forward<_Args>(__args)...),
 	_Default_ctor_enabler(_Enable_default_constructor_tag{})
-	{ __glibcxx_assert(index() == _Np); }
+	{ }
 
   template
 	enable_if_t<__exactly_once<__accepted_type<_Tp&&>>
@@ -1160,7 +1158,6 @@ namespace __variant
 	std::get<__index>(*this) = std::forward<_Tp>(__rhs);
 	  else
 	this->emplace<__index>(std::forward<_Tp>(__rhs));
-	  __glibcxx_assert(holds_alternative<__accepted_type<_Tp&&>>(*this));
 	  return *this;
 	}
 
@@ -1171,7 +1168,6 @@ namespace __variant
 	{
 	  auto& ret =
 	this->emplace<__index_of<_Tp>>(std::forward<_Args>(__args)...);
-	  __glibcxx_assert(holds_alternative<_Tp>(*this));
 	  return ret;
 	}
 
@@ -1184,7 +1180,6 @@ namespace __variant
 	  auto& ret =
 	this->emplace<__index_of<_Tp>>(__il,
 	   std::forward<_Args>(__args)...);
-	  __glibcxx_assert(holds_alternative<_Tp>(*this));
 	  return ret;
 	}
 
@@ -1207,7 +1202,6 @@ namespace __variant
 	  this->_M_index = variant_npos;
 	  __throw_exception_again;
 	}
-	  __glibcxx_assert(index() == _Np);
 	  return std::get<_Np>(*this);
 	}
 
@@ -1230,7 +1224,6 @@ namespace __variant
 	  this->_M_index = variant_npos;
 	  __throw_exception_again;
 	}
-	  __glibcxx_assert(index() == _Np);
 	  return std::get<_Np>(*this);
 	}
 


Re: [PATCH, v2] Recognize a missed usage of a sbfiz instruction

2018-05-15 Thread James Greenhalgh
On Mon, May 14, 2018 at 03:41:34PM -0500, Luis Machado wrote:
> On 05/11/2018 06:46 AM, Kyrill Tkachov wrote:
> > Hi Luis,
> > 
> > On 10/05/18 11:31, Luis Machado wrote:
> >>
> >> On 05/09/2018 10:44 AM, Kyrill Tkachov wrote:
> >>>
> >>> On 09/05/18 13:30, Luis Machado wrote:
>  Hi Kyrill,
> 
>  On 05/08/2018 11:09 AM, Kyrill Tkachov wrote:
> > Hi Luis,
> >
> > On 07/05/18 15:28, Luis Machado wrote:
> >> Hi,
> >>
> >> On 02/08/2018 10:45 AM, Luis Machado wrote:
> >>> Hi Kyrill,
> >>>
> >>> On 02/08/2018 09:48 AM, Kyrill Tkachov wrote:
>  Hi Luis,
> 
>  On 06/02/18 15:04, Luis Machado wrote:
> > Thanks for the feedback Kyrill. I've adjusted the v2 patch 
> > based on your
> > suggestions and re-tested the changes. Everything is still sane.
> 
>  Thanks! This looks pretty good to me.
> 
> > Since this is ARM-specific and fairly specific, i wonder if it 
> > would be
> > reasonable to consider it for inclusion at the current stage.
> 
>  It is true that the target maintainers can choose to take
>  such patches at any stage. However, any patch at this stage 
>  increases
>  the risk of regressions being introduced and these regressions
>  can come bite us in ways that are very hard to anticipate.
> 
>  Have a look at some of the bugs in bugzilla (or a quick scan of 
>  the gcc-bugs list)
>  for examples of the ways that things can go wrong with any of 
>  the myriad of GCC components
>  and the unexpected ways in which they can interact.
> 
>  For example, I am now working on what I initially thought was a 
>  one-liner fix for
>  PR 84164 but it has expanded into a 3-patch series with a midend 
>  component and
>  target-specific changes for 2 ports.
> 
>  These issues are very hard to catch during review and normal 
>  testing, and can sometimes take months of deep testing by
>  fuzzing and massive codebase rebuilds to expose, so the closer 
>  the commit is to a release
>  the higher the risk is that an obscure edge case will be 
>  unnoticed and unfixed in the release.
> 
>  So the priority at this stage is to minimise the risk of 
>  destabilising the codebase,
>  as opposed to taking in new features and desirable performance 
>  improvements (like your patch!)
> 
>  That is the rationale for delaying committing such changes until 
>  the start
>  of GCC 9 development. But again, this is up to the aarch64 
>  maintainers.
>  I'm sure the patch will be a perfectly fine and desirable commit 
>  for GCC 9.
>  This is just my perspective as maintainer of the arm port.
> >>>
> >>> Thanks. Your explanation makes the situation pretty clear and it 
> >>> sounds very reasonable. I'll put the patch on hold until 
> >>> development is open again.
> >>>
> >>> Regards,
> >>> Luis
> >>
> >> With GCC 9 development open, i take it this patch is worth 
> >> considering again?
> >>
> >
> > Yes, I believe the latest version is at:
> > https://gcc.gnu.org/ml/gcc-patches/2018-02/msg00239.html ?
> >
> > +(define_insn "*ashift<mode>_extv_bfiz"
> > +  [(set (match_operand:GPI 0 "register_operand" "=r")
> > +	(ashift:GPI (sign_extract:GPI (match_operand:GPI 1 "register_operand" "r")
> > +				      (match_operand 2 "aarch64_simd_shift_imm_offset_<mode>" "n")
> > +				      (match_operand 3 "aarch64_simd_shift_imm_<mode>" "n"))
> > +		    (match_operand 4 "aarch64_simd_shift_imm_<mode>" "n")))]
> > +  ""
> > +  "sbfiz\\t%<w>0, %<w>1, %4, %2"
> > +  [(set_attr "type" "bfx")]
> > +)
> > +
> 
>  Indeed.
> 
> >
> > Can you give a bit more information about what are the values for 
> > operands 2,3 and 4 in your example testcases?
> 
>  For sbfiz32 we have 3, 0 and 19 respectively. For sbfiz64 we have 6, 
>  0 and 38.
> 
> > I'm trying to understand why the value of operand 3 (the bit 
> > position the sign-extract starts from) doesn't get validated
> > in any way and doesn't play any role in the output...
> 
>  This may be an oversight. It seems operand 3 will always be 0 in 
>  this particular case I'm covering (it starts from 0, gets shifted x 
>  bits to the left and then y < x bits to the right). The operation is 
>  essentially an ashift of the bitfield followed by a sign-extension 
>  of the MSB of the bitfield being extracted.
> 
>  Having a non-zero operand 3 from RTL means the shift amount won't 
>  translate directly to operand 3 of sbfiz (the position). Then we'd 
>

Re: [PATCH][AArch64] Set SLOW_BYTE_ACCESS

2018-05-15 Thread Wilco Dijkstra
Hi,

> Which doesn't appear to have been approved.  Did you follow up with Jeff?

I'll get back to that one at some point - it'll take some time to agree on a way
forward with the callback.

Wilco


Re: [PATCH][AArch64] Improve register allocation of fma

2018-05-15 Thread James Greenhalgh
On Tue, May 15, 2018 at 08:00:49AM -0500, Wilco Dijkstra wrote:
> 
> ping

This seems like a fairly horrible hack around the register allocator
behaviour.

But, OK.

James

> This patch improves register allocation of fma by preferring to update the
> accumulator register.  This is done by adding fma insns with operand 1 as the
> accumulator.  The register allocator considers copy preferences only in 
> operand
> order, so if the first operand is dead, it has the highest chance of being
> reused as the destination.  As a result code using fma often has a better
> register allocation.  Performance of SPECFP2017 improves by over 0.5% on some
> implementations, while it had no effect on other implementations.  Fma is more
> readable too, in a simple example we now generate:
> 
>     fmadd   s16, s2, s1, s16
>     fmadd   s7, s17, s16, s7
>     fmadd   s6, s16, s7, s6
>     fmadd   s5, s7, s6, s5
> 
> instead of:
> 
>     fmadd   s16, s16, s2, s1
>     fmadd   s7, s7, s16, s6
>     fmadd   s6, s6, s7, s5
>     fmadd   s5, s5, s6, s4
> 
> Bootstrap OK. OK for commit?
> 
> ChangeLog:
> 2018-01-04  Wilco Dijkstra  
> 
>     gcc/
>     * config/aarch64/aarch64.md (fma<mode>4): Change into expand pattern.
>     (fnma<mode>4): Likewise.
>     (fms<mode>4): Likewise.
>     (fnms<mode>4): Likewise.
>     (aarch64_fma<mode>4): Rename insn, reorder accumulator operand.
>     (aarch64_fnma<mode>4): Likewise.
>     (aarch64_fms<mode>4): Likewise.
>     (aarch64_fnms<mode>4): Likewise.
>     (aarch64_fnmadd<mode>4): Likewise.
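
As an illustration of the copy-preference effect (hypothetical source, not
from the patch), each old accumulator below dies after feeding the next fma,
so listing the accumulator first lets the allocator reuse its register as
the destination:

  float
  fma_chain (float a, float b, float c, float d, float acc)
  {
    acc = __builtin_fmaf (a, b, acc);   /* previous acc dies here...    */
    acc = __builtin_fmaf (b, c, acc);   /* ...so the allocator can give */
    acc = __builtin_fmaf (c, d, acc);   /* the result the same register */
    return acc;
  }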
  


Re: [PR63185][RFC] Improve DSE with branches

2018-05-15 Thread Richard Biener
On May 15, 2018 5:04:53 PM GMT+02:00, Jeff Law  wrote:
>On 05/15/2018 02:15 AM, Richard Biener wrote:
>> On Mon, 14 May 2018, Kugan Vivekanandarajah wrote:
>> 
>>> Hi,
>>>
>>> Attached patch handles PR63185 when we reach PHI with temp != NULLL.
>>> We could see the PHI and if there isn't any uses for PHI that is
>>> interesting, we could ignore that ?
>>>
>>> Bootstrapped and regression tested on x86_64-linux-gnu.
>>> Is this OK?
>> 
>> No, as Jeff said we can't do it this way.
>> 
>> If we end up with multiple VDEFs in the walk of defvar immediate
>> uses we know we are dealing with a CFG fork.  We can't really
>> ignore any of the paths but we have to
>> 
>>  a) find the merge point (and the associated VDEF)
>>  b) verify for each each chain of VDEFs with associated VUSEs
>> up to that merge VDEF that we have no uses of the to classify
>> store and collect (partial) kills
>>  c) intersect kill info and continue walking from the merge point
>Which makes me come back to "why is this structured as a forward walk?"

Probably historic reasons. We _do_ have a backward-walk DSE embedded inside DCE,
but I had trouble making it as powerful as DSE.

>DSE seems much more naturally structured as a walk of the
>post-dominator
>tree, walking instructions in reverse order.

DSE performs an analysis for each possibly dead store and walks those in postdom
order. But as the analysis decides whether a candidate is dead it needs to find
later stores that make it so; thus a forward walk. The canonical way would
possibly be to find earlier stores that a store makes dead, which would structure
this more like a dataflow problem (but the kill sets would be expensive to
manage, I'd guess).

>Cases like 63185 would naturally fall out of that structure as the
>clobber at the end of the function post-dominates the memset and
>there are no intervening aliased loads.
>
>In fact, I thought that's how I initially structured DSE.  I'm not
>sure when/why it changed.

IIRC it was improved, became too expensive and then ended up the current way. 

Richard. 

>Jeff



Re: [PATCH][AArch64] Unify vec_set patterns, support floating-point vector modes properly

2018-05-15 Thread Richard Sandiford
Kyrill  Tkachov  writes:
> Hi all,
>
> We've a deficiency in our vec_set family of patterns.  We don't
> support directly loading a vector lane using LD1 for V2DImode and all
> the vector floating-point modes.  We do do it correctly for the other
> integer vector modes (V4SI, V8HI etc) though.
>
> The alternatives on the relevant floating-point patterns only allow a
> register-to-register INS instruction.  That means if we want to load a
> value into a vector lane we must first load it into a scalar register
> and then perform an INS, which is wasteful.
>
> There is also an explicit V2DI vec_set expander dangling around for no
> reason that I can see. It seems to do the exact same things as the
> other vec_set expanders. This patch removes that.  It now unifies all
> vec_set expansions into a single "vec_set" define_expand using
> the catch-all VALL_F16 iterator.
>
> I decided to leave two aarch64_simd_vec_set define_insns. One
> for the integer vector modes (that now include V2DI) and one for the
> floating-point vector modes. That is so that we can avoid specifying
> "w,r" alternatives for floating-point modes in case the
> register-allocator gets confused and starts gratuitously moving
> registers between the two banks.  So the floating-point pattern has only
> two alternatives, one for SIMD-to-SIMD INS and one for LD1.

Did you see any cases in which this was necessary?  In some ways it
seems to run counter to Wilco's recent patches, which tended to remove
the * markers from the "unnatural" register class and trust the register
allocator to make a sensible decision.

I think our default position should be to trust the allocator here.
If the consumers all require "w" registers then the RA will surely
try to use "w" registers if at all possible.  But if the consumers
don't care then it seems reasonable to offer both, since in those
cases it doesn't really make much difference whether the payload
happens to be SF or SI (say).

There are also cases in which the consumer could actively require
an integer register.  E.g. some code uses unions to bitcast floats
to ints and then do bitwise arithmetic on them.
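
A tiny sketch of that pattern (illustrative only):

  /* A consumer that wants the SF payload in a general register: bitwise
     arithmetic on the representation of a float, assuming a 32-bit
     unsigned int and IEEE binary32.  */
  static inline float
  clear_sign_bit (float f)
  {
    union { float f; unsigned int u; } v;
    v.f = f;
    v.u &= 0x7fffffffu;   /* clear the sign bit */
    return v.f;
  }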

> With this patch we avoid loading values into scalar registers and then
> doing an explicit INS on them to move them into the desired vector
> lanes. For example for:
>
> typedef float v4sf __attribute__ ((vector_size (16)));
> typedef long long v2di __attribute__ ((vector_size (16)));
>
> v2di
> foo_v2di (long long *a, long long *b)
> {
>v2di res = { *a, *b };
>return res;
> }
>
> v4sf
> foo_v4sf (float *a, float *b, float *c, float *d)
> {
>v4sf res = { *a, *b, *c, *d };
>return res;
> }
>
> we currently generate:
>
> foo_v2di:
>  ldr d0, [x0]
>  ldr x0, [x1]
>  ins v0.d[1], x0
>  ret
>
> foo_v4sf:
>  ldr s0, [x0]
>  ldr s3, [x1]
>  ldr s2, [x2]
>  ldr s1, [x3]
>  ins v0.s[1], v3.s[0]
>  ins v0.s[2], v2.s[0]
>  ins v0.s[3], v1.s[0]
>  ret
>
> but with this patch we generate the much cleaner:
> foo_v2di:
>  ldr d0, [x0]
>  ld1 {v0.d}[1], [x1]
>  ret
>
> foo_v4sf:
>  ldr s0, [x0]
>  ld1 {v0.s}[1], [x1]
>  ld1 {v0.s}[2], [x2]
>  ld1 {v0.s}[3], [x3]
>  ret

Nice!  The original reason for:

  /* FIXME: At the moment the cost model seems to underestimate the
     cost of using elementwise accesses.  This check preserves the
     traditional behavior until that can be fixed.  */
  if (*memory_access_type == VMAT_ELEMENTWISE
      && !STMT_VINFO_STRIDED_P (stmt_info)
      && !(stmt == GROUP_FIRST_ELEMENT (stmt_info)
	   && !GROUP_NEXT_ELEMENT (stmt_info)
	   && !pow2p_hwi (GROUP_SIZE (stmt_info))))
    {
      if (dump_enabled_p ())
	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
			 "not falling back to elementwise accesses\n");
      return false;
    }

was that we seemed to be too optimistic about how cheap it was to
construct a vector from scalars.  Maybe this patch brings the code
closer to the cost (for AArch64 only of course).

FWIW, the patch looks good to me bar the GPR/FPR split.

Thanks,
Richard


Re: [PATCH][AArch64] Improve register allocation of fma

2018-05-15 Thread Wilco Dijkstra
Hi,

James Greenhalgh wrote:
>
> This seems like a fairly horrible hack around the register allocator
> behaviour.

That is why I proposed to improve the register allocator so one can explicitly
specify the copy preference in the md syntax. However that wasn't accepted,
so we'll have to use a hack instead...

Wilco

Re: [PR63185][RFC] Improve DSE with branches

2018-05-15 Thread Jeff Law
On 05/15/2018 03:20 AM, Richard Biener wrote:
> 
> First (baby) step is the following - it arranges to collect the
> defs we need to continue walking from and implements trivial
> reduction by stopping at (full) kills.  This allows us to handle
> the new testcase (which was already handled in the very late DSE
> pass with the help of sinking the store).
Seems like a notable improvement.  The way we were handling defs to
continue walking forward from was, umm, fugly.


> 
> I didn't adjust the byte-tracking case fully (I'm not fully understanding
> the code in the case of a use and I'm not sure whether it's worth
> doing the def reduction with byte-tracking).
The byte tracking case is pretty simple.  We're tracking the bytes from
the original store that are still live.  If we encounter a subsequent
store that over-writes a subset of bytes, we consider those bytes from
the original store as "dead".

If we then encounter a load and the load only reads from "dead" bytes,
then we can ignore the load for the purposes of DSE.

It's something you suggested.  I did some instrumentation when working
on DSE for gcc6/gcc7 and while it triggers, it's not terribly often.
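
A minimal illustration of the byte-tracking idea (hypothetical example, not
from the patch):

  char buf[8];

  char
  f (void)
  {
    __builtin_memset (buf, 0, 8);   /* candidate dead store             */
    buf[0] = 1;                     /* kills byte 0 of the memset       */
    char c = buf[0];                /* reads only already-dead bytes, so
                                       it does not keep the memset alive */
    __builtin_memset (buf, 2, 8);   /* full kill: first memset is dead  */
    return c;
  }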



> 
> Your testcase can be handled by reducing the PHI and the call def
> by seeing that the only use of a candidate def is another def
> we have already processed.  Not sure if worth special-casing though,
> I'd rather have a go at "recursing".  That will be the next
> patch.
> 
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> 
> Richard.
> 
> 2018-05-15  Richard Biener  
> 
>   * tree-ssa-dse.c (dse_classify_store): Remove use_stmt parameter,
>   add by_clobber_p one.  Change algorithm to collect all defs
>   representing uses we need to walk and try reducing them to
>   a single one before failing.
>   (dse_dom_walker::dse_optimize_stmt): Adjust.
> 
>   * gcc.dg/tree-ssa/ssa-dse-31.c: New testcase.
Seems quite reasonable to me as long as we're continuing with a forward
algorithm.

jeff


Re: [PATCH, rs6000] Fix expected BE counts for vsx-vector-6.h

2018-05-15 Thread Carl Love
Segher:

I removed the Power 6 test file.  I also went back through the tests
and checked that each test had an instruction count check.  I found a
few that were missing the instruction check.  I added the instruction
checks and updated the expected instruction counts for the LE and BE cases.

The patch was tested on:

powerpc64le-unknown-linux-gnu (Power 8 LE)   
    powerpc64le-unknown-linux-gnu (Power 9 LE)
    powerpc64-unknown-linux-gnu (Power 8 BE)  configured for power7
    powerpc64-unknown-linux-gnu (Power 8 BE)  configured for power8

Please let me know if the patch looks OK for GCC mainline.

 Carl Love
-
gcc/testsuite/ChangeLog:

2018-05-14 Carl Love  
* gcc.target/powerpc/vsx-vector-6-be.c: Remove file
* gcc.target/powerpc/vsx-vector-6-be.p7.c (dg-final): New test file for
Power 7.
* gcc.target/powerpc/vsx-vector-6-be.p8.c (dg-final): New test file for
Power 8.
* gcc.target/powerpc/vsx-vector-6-le.c (dg-final): Update counts for
.xvcmpeqdp., xvcmpgtdp., xvcmpgedp., xxlxor, xvrdpi.
---
 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-be.c | 32 
 .../gcc.target/powerpc/vsx-vector-6-be.p7.c| 43 ++
 .../gcc.target/powerpc/vsx-vector-6-be.p8.c| 43 ++
 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-le.c |  9 +
 4 files changed, 95 insertions(+), 32 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-be.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-be.p7.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-be.p8.c

diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-be.c b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-be.c
deleted file mode 100644
index 3305781..0000000
--- a/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-be.c
+++ /dev/null
@@ -1,32 +0,0 @@
-/* { dg-do compile { target { powerpc64-*-* && lp64 } } } */
-/* { dg-skip-if "" { powerpc*-*-darwin* } } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
-/* { dg-options "-mvsx -O2" } */
-
-/* Expected instruction counts for Big Endian */
-
-/* { dg-final { scan-assembler-times "xvabsdp" 1 } } */
-/* { dg-final { scan-assembler-times "xvadddp" 1 } } */
-/* { dg-final { scan-assembler-times "xxlnor" 7 } } */
-/* { dg-final { scan-assembler-times "xvcmpeqdp" 6 } } */
-/* { dg-final { scan-assembler-times "xvcmpgtdp" 8 } } */
-/* { dg-final { scan-assembler-times "xvcmpgedp" 7 } } */
-/* { dg-final { scan-assembler-times "xvrdpim" 1 } } */
-/* { dg-final { scan-assembler-times "xvmaddadp" 1 } } */
-/* { dg-final { scan-assembler-times "xvmsubadp" 1 } } */
-/* { dg-final { scan-assembler-times "xvsubdp" 1 } } */
-/* { dg-final { scan-assembler-times "xvmaxdp" 1 } } */
-/* { dg-final { scan-assembler-times "xvmindp" 1 } } */
-/* { dg-final { scan-assembler-times "xvmuldp" 1 } } */
-/* { dg-final { scan-assembler-times "vperm" 1 } } */
-/* { dg-final { scan-assembler-times "xvrdpic" 1 } } */
-/* { dg-final { scan-assembler-times "xvsqrtdp" 1 } } */
-/* { dg-final { scan-assembler-times "xvrdpiz" 1 } } */
-/* { dg-final { scan-assembler-times "xvmsubasp" 1 } } */
-/* { dg-final { scan-assembler-times "xvnmaddasp" 1 } } */
-/* { dg-final { scan-assembler-times "vmsumshs" 1 } } */
-/* { dg-final { scan-assembler-times "xxland" 13 } } */
-/* { dg-final { scan-assembler-times "xxsel" 2 } } */
-
-/* Source code for the test in vsx-vector-6.h */
-#include "vsx-vector-6.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-be.p7.c b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-be.p7.c
new file mode 100644
index 0000000..835b24f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-be.p7.c
@@ -0,0 +1,43 @@
+/* { dg-do compile { target { powerpc64-*-* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-mvsx -O2 -mcpu=power7" } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power7" } } */
+
+
+/* Expected instruction counts for Big Endian */
+
+/* { dg-final { scan-assembler-times "xvabsdp" 1 } } */
+/* { dg-final { scan-assembler-times "xvadddp" 1 } } */
+/* { dg-final { scan-assembler-times "xxlnor" 7 } } */
+/* { dg-final { scan-assembler-times "xvcmpeqdp" 6 } } */
+/* { dg-final { scan-assembler-times "xvcmpeqdp." 6 } } */
+/* { dg-final { scan-assembler-times "xvcmpgtdp" 8 } } */
+/* { dg-final { scan-assembler-times "xvcmpgtdp." 8 } } */
+/* { dg-final { scan-assembler-times "xvcmpgedp" 7 } } */
+/* { dg-final { scan-assembler-times "xvcmpgedp." 7 } } */
+/* { dg-final { scan-assembler-times "xvrdpim" 1 } } */
+/* { dg-final { scan-assembler-times "xvmaddadp" 1 } } */
+/* { dg-final { scan-assembler-times "xvmsubadp" 1 } } */
+/* { dg-final { scan-assembler-times "xvsubdp" 1 } } */
+/* { dg-final {

C++ PATCH for c++/64372/DR 1560, value category of ?: with one throw

2018-05-15 Thread Jason Merrill
DR 1560 in C++14 fixed this rule to not do a gratuitous lvalue-rvalue
conversion in this case.
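
The classic consequence (illustrative example, not from the patch) is that
the non-throw arm now keeps its value category, so the conditional can even
be assigned through:

  // Valid C++14 under DR 1560: the conditional is an lvalue of type int.
  // Under the old rule the lvalue-rvalue conversion made it a prvalue.
  int f (bool b, int &x)
  {
    return (b ? x : throw 0) = 5;
  }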

Tested x86_64-pc-linux-gnu, applying to trunk.
commit c7b8b79f72865740ead84aa602a6bc8651a60a93
Author: Jason Merrill 
Date:   Tue May 15 15:46:51 2018 -0400

PR c++/64372 - CWG 1560, gratuitous lvalue-rvalue conversion in ?:

* call.c (build_conditional_expr_1): Don't force_rvalue when one arm
is a throw-expression.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index f620c0d86e8..09a3618b007 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -4969,56 +4969,33 @@ build_conditional_expr_1 (location_t loc, tree arg1, tree arg2, tree arg3,
   arg3_type = unlowered_expr_type (arg3);
   if (VOID_TYPE_P (arg2_type) || VOID_TYPE_P (arg3_type))
 {
-  /* Do the conversions.  We don't these for `void' type arguments
-	 since it can't have any effect and since decay_conversion
-	 does not handle that case gracefully.  */
-  if (!VOID_TYPE_P (arg2_type))
-	arg2 = decay_conversion (arg2, complain);
-  if (!VOID_TYPE_P (arg3_type))
-	arg3 = decay_conversion (arg3, complain);
-  arg2_type = TREE_TYPE (arg2);
-  arg3_type = TREE_TYPE (arg3);
-
   /* [expr.cond]
 
 	 One of the following shall hold:
 
 	 --The second or the third operand (but not both) is a
-	   throw-expression (_except.throw_); the result is of the
-	   type of the other and is an rvalue.
+	   throw-expression (_except.throw_); the result is of the type
+	   and value category of the other.
 
 	 --Both the second and the third operands have type void; the
-	   result is of type void and is an rvalue.
-
-	 We must avoid calling force_rvalue for expressions of type
-	 "void" because it will complain that their value is being
-	 used.  */
+	   result is of type void and is a prvalue.  */
   if (TREE_CODE (arg2) == THROW_EXPR
 	  && TREE_CODE (arg3) != THROW_EXPR)
 	{
-	  if (!VOID_TYPE_P (arg3_type))
-	{
-	  arg3 = force_rvalue (arg3, complain);
-	  if (arg3 == error_mark_node)
-		return error_mark_node;
-	}
-	  arg3_type = TREE_TYPE (arg3);
 	  result_type = arg3_type;
+	  is_glvalue = glvalue_p (arg3);
 	}
   else if (TREE_CODE (arg2) != THROW_EXPR
 	   && TREE_CODE (arg3) == THROW_EXPR)
 	{
-	  if (!VOID_TYPE_P (arg2_type))
-	{
-	  arg2 = force_rvalue (arg2, complain);
-	  if (arg2 == error_mark_node)
-		return error_mark_node;
-	}
-	  arg2_type = TREE_TYPE (arg2);
 	  result_type = arg2_type;
+	  is_glvalue = glvalue_p (arg2);
 	}
   else if (VOID_TYPE_P (arg2_type) && VOID_TYPE_P (arg3_type))
-	result_type = void_type_node;
+	{
+	  result_type = void_type_node;
+	  is_glvalue = false;
+	}
   else
 	{
   if (complain & tf_error)
@@ -5037,7 +5014,6 @@ build_conditional_expr_1 (location_t loc, tree arg1, tree arg2, tree arg3,
 	  return error_mark_node;
 	}
 
-  is_glvalue = false;
   goto valid_operands;
 }
   /* [expr.cond]
@@ -5155,10 +5131,6 @@ build_conditional_expr_1 (location_t loc, tree arg1, tree arg2, tree arg3,
   && same_type_p (arg2_type, arg3_type))
 {
   result_type = arg2_type;
-  if (processing_template_decl)
-	/* Let lvalue_kind know this was a glvalue.  */
-	result_type = cp_build_reference_type (result_type, xvalue_p (arg2));
-
   arg2 = mark_lvalue_use (arg2);
   arg3 = mark_lvalue_use (arg3);
   goto valid_operands;
@@ -5352,6 +5324,13 @@ build_conditional_expr_1 (location_t loc, tree arg1, tree arg2, tree arg3,
 return error_mark_node;
 
  valid_operands:
+  if (processing_template_decl && is_glvalue)
+{
+  /* Let lvalue_kind know this was a glvalue.  */
+  tree arg = (result_type == arg2_type ? arg2 : arg3);
+  result_type = cp_build_reference_type (result_type, xvalue_p (arg));
+}
+
   result = build3_loc (loc, COND_EXPR, result_type, arg1, arg2, arg3);
 
   /* If the ARG2 and ARG3 are the same and don't have side-effects,
diff --git a/gcc/testsuite/g++.dg/cpp1y/dr1560.C b/gcc/testsuite/g++.dg/cpp1y/dr1560.C
new file mode 100644
index 00000000000..b21ca98e279
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/dr1560.C
@@ -0,0 +1,14 @@
+// Core 1560
+// { dg-do compile { target c++14 } }
+
+struct A
+{
+  A();
+  A(const A&) = delete;
+};
+
+void f(bool b)
+{
+  A a;
+  b ? a : throw 42;
+}


Re: [PR63185][RFC] Improve DSE with branches

2018-05-15 Thread Kugan Vivekanandarajah
Hi Richard,

On 15 May 2018 at 19:20, Richard Biener  wrote:
> On Tue, 15 May 2018, Richard Biener wrote:
>
>> On Mon, 14 May 2018, Kugan Vivekanandarajah wrote:
>>
>> > Hi,
>> >
>> > Attached patch handles PR63185 when we reach PHI with temp != NULLL.
>> > We could see the PHI and if there isn't any uses for PHI that is
>> > interesting, we could ignore that ?
>> >
>> > Bootstrapped and regression tested on x86_64-linux-gnu.
>> > Is this OK?
>>
>> No, as Jeff said we can't do it this way.
>>
>> If we end up with multiple VDEFs in the walk of defvar immediate
>> uses we know we are dealing with a CFG fork.  We can't really
>> ignore any of the paths but we have to
>>
>>  a) find the merge point (and the associated VDEF)
>>  b) verify for each each chain of VDEFs with associated VUSEs
>> up to that merge VDEF that we have no uses of the to classify
>> store and collect (partial) kills
>>  c) intersect kill info and continue walking from the merge point
>>
>> in b) there's the optional possibility to find sinking opportunities
>> in case we have kills on some paths but uses on others.  This is why
>> DSE should be really merged with (store) sinking.
>>
>> So if we want to enhance DSEs handling of branches then we need
>> to refactor the simple dse_classify_store function.  Let me take
>> an attempt at this today.
>
> First (baby) step is the following - it arranges to collect the
> defs we need to continue walking from and implements trivial
> reduction by stopping at (full) kills.  This allows us to handle
> the new testcase (which was already handled in the very late DSE
> pass with the help of sinking the store).

Thanks for the explanation and thanks for working on this.

>
> I took the opportunity to kill the use_stmt parameter of
> dse_classify_store as the only user is only looking for whether
> the kills were all clobbers which I added a new parameter for.
>
> I didn't adjust the byte-tracking case fully (I'm not fully understanding
> the code in the case of a use and I'm not sure whether it's worth
> doing the def reduction with byte-tracking).

Since byte tracking tracks the killed subset of bytes in the
original def, in your patch can we calculate the bytes killed by each
defs[i] and then find the intersection for the iteration?  If we don't
have a complete kill, we can use this to set the live bytes and iterate
until all the live_bytes are cleared, as was done earlier.
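
Roughly, as a sketch (assumed names: compute_killed_bytes and
live_bytes_size are hypothetical; the sbitmap routines are the ones the
pass already uses):

  auto_sbitmap path_kill (live_bytes_size);
  auto_sbitmap common_kill (live_bytes_size);
  bitmap_ones (common_kill);
  for (unsigned i = 0; i < defs.length (); ++i)
    {
      /* Bytes of the original store killed along this path.  */
      compute_killed_bytes (defs[i], ref, path_kill);
      bitmap_and (common_kill, common_kill, path_kill);
    }
  /* Only bytes killed on every path can be removed from live_bytes.  */
  bitmap_and_compl (live_bytes, live_bytes, common_kill);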

Thanks,
Kugan

>
> Your testcase can be handled by reducing the PHI and the call def
> by seeing that the only use of a candidate def is another def
> we have already processed.  Not sure if worth special-casing though,
> I'd rather have a go at "recursing".  That will be the next
> patch.
>
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
>
> Richard.
>
> 2018-05-15  Richard Biener  
>
> * tree-ssa-dse.c (dse_classify_store): Remove use_stmt parameter,
> add by_clobber_p one.  Change algorithm to collect all defs
> representing uses we need to walk and try reducing them to
> a single one before failing.
> (dse_dom_walker::dse_optimize_stmt): Adjust.
>
> * gcc.dg/tree-ssa/ssa-dse-31.c: New testcase.
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-31.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-31.c
> new file mode 100644
> index 00000000000..9ea9268fe1d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-31.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-dse1-details" } */
> +
> +void g();
> +void f(int n, char *p)
> +{
> +  *p = 0;
> +  if (n)
> +{
> +  *p = 1;
> +  g();
> +}
> +  *p = 2;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "Deleted dead store" 1 "dse1" } } */
> diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
> index 9220fea7f2e..126592a1d13 100644
> --- a/gcc/tree-ssa-dse.c
> +++ b/gcc/tree-ssa-dse.c
> @@ -516,18 +516,21 @@ live_bytes_read (ao_ref use_ref, ao_ref *ref, sbitmap 
> live)
>  }
>
>  /* A helper of dse_optimize_stmt.
> -   Given a GIMPLE_ASSIGN in STMT that writes to REF, find a candidate
> -   statement *USE_STMT that may prove STMT to be dead.
> -   Return TRUE if the above conditions are met, otherwise FALSE.  */
> +   Given a GIMPLE_ASSIGN in STMT that writes to REF, classify it
> +   according to downstream uses and defs.  Sets *BY_CLOBBER_P to true
> +   if only clobber statements influenced the classification result.
> +   Returns the classification.  */
>
>  static dse_store_status
> -dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt,
> -   bool byte_tracking_enabled, sbitmap live_bytes)
> +dse_classify_store (ao_ref *ref, gimple *stmt,
> +   bool byte_tracking_enabled, sbitmap live_bytes,
> +   bool *by_clobber_p = NULL)
>  {
>gimple *temp;
>unsigned cnt = 0;
>
> -  *use_stmt = NULL;
> +  if (by_clobber_p)
> +*by_clobber_p = true;
>
>/* Find the first dominated statement that clob

Re: [PATCH, rs6000] Fix expected BE counts for vsx-vector-6.h

2018-05-15 Thread Segher Boessenkool
Hi Carl,

On Tue, May 15, 2018 at 01:43:18PM -0700, Carl Love wrote:
>   * gcc.target/powerpc/vsx-vector-6-be.c: Remove file

Full stop.

>   * gcc.target/powerpc/vsx-vector-6-be.p7.c (dg-final): New test file for
>   Power 7.

The whole file is new, so just
* gcc.target/powerpc/vsx-vector-6-be.p7.c: New test file for Power 7.
or even
* gcc.target/powerpc/vsx-vector-6-be.p7.c: New test.

>   * gcc.target/powerpc/vsx-vector-6-le.c (dg-final): Update counts for
>   .xvcmpeqdp., xvcmpgtdp., xvcmpgedp., xxlxor, xvrdpi.

Stray leading dot on this last line?

Okay for trunk, thanks!


Segher


C++ PATCH to make cxx_eval_vec_init_1 respect the quiet flag

2018-05-15 Thread Jason Merrill
While working on something else I noticed that this function was
generating warnings when it's supposed to be quiet.

Tested x86_64-pc-linux-gnu, applied to trunk.
commit ea990203b6a7562e399c22ffd70e7391106d6dc5
Author: Jason Merrill 
Date:   Tue May 15 15:30:43 2018 -0400

* constexpr.c (cxx_eval_vec_init_1): Pass tf_none if ctx->quiet.

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 9ee37de88d3..8c6ec555ca9 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -2936,6 +2936,7 @@ cxx_eval_vec_init_1 (const constexpr_ctx *ctx, tree atype, tree init,
   vec **p = &CONSTRUCTOR_ELTS (ctx->ctor);
   bool pre_init = false;
   unsigned HOST_WIDE_INT i;
+  tsubst_flags_t complain = ctx->quiet ? tf_none : tf_warning_or_error;
 
   /* For the default constructor, build up a call to the default
  constructor of the element type.  We only need to handle class types
@@ -2946,7 +2947,7 @@ cxx_eval_vec_init_1 (const constexpr_ctx *ctx, tree atype, tree init,
 /* We only do this at the lowest level.  */;
   else if (value_init)
 {
-  init = build_value_init (elttype, tf_warning_or_error);
+  init = build_value_init (elttype, complain);
   pre_init = true;
 }
   else if (!init)
@@ -2954,7 +2955,7 @@ cxx_eval_vec_init_1 (const constexpr_ctx *ctx, tree atype, tree init,
   vec *argvec = make_tree_vector ();
   init = build_special_member_call (NULL_TREE, complete_ctor_identifier,
 	&argvec, elttype, LOOKUP_NORMAL,
-	tf_warning_or_error);
+	complain);
   release_tree_vector (argvec);
   init = build_aggr_init_expr (TREE_TYPE (init), init);
   pre_init = true;
@@ -2981,8 +2982,7 @@ cxx_eval_vec_init_1 (const constexpr_ctx *ctx, tree atype, tree init,
 	  reuse = i == 0;
 	}
 	  else
-	eltinit = cp_build_array_ref (input_location, init, idx,
-	  tf_warning_or_error);
+	eltinit = cp_build_array_ref (input_location, init, idx, complain);
 	  eltinit = cxx_eval_vec_init_1 (&new_ctx, elttype, eltinit, value_init,
 	 lval,
 	 non_constant_p, overflow_p);
@@ -3000,11 +3000,10 @@ cxx_eval_vec_init_1 (const constexpr_ctx *ctx, tree atype, tree init,
 	  /* Copying an element.  */
 	  gcc_assert (same_type_ignoring_top_level_qualifiers_p
 		  (atype, TREE_TYPE (init)));
-	  eltinit = cp_build_array_ref (input_location, init, idx,
-	tf_warning_or_error);
+	  eltinit = cp_build_array_ref (input_location, init, idx, complain);
 	  if (!lvalue_p (init))
 	eltinit = move (eltinit);
-	  eltinit = force_rvalue (eltinit, tf_warning_or_error);
+	  eltinit = force_rvalue (eltinit, complain);
 	  eltinit = cxx_eval_constant_expression (&new_ctx, eltinit, lval,
 		  non_constant_p, overflow_p);
 	}


RFA (ipa-prop): PATCHes to avoid use of deprecated copy ctor and op=

2018-05-15 Thread Jason Merrill
In C++11 and up, the implicitly-declared copy constructor and
assignment operator are deprecated if one of them, or the destructor,
is user-provided.  Implementing that in G++ turned up a few dodgy uses
in the compiler.

In general it's unsafe to copy an ipa_edge_args, because if one of the
pointers is non-null you get two copies of a vec pointer, and when one
of the objects is destroyed it frees the vec and leaves the other
object pointing to freed memory.  This specific example is safe
because it only copies from an object with null pointers, but it would
be better to avoid the copy.  OK for trunk?

It's unsafe to copy a releasing_vec for the same reason.  There are a
few places where the copy constructor is nominally used to initialize
a releasing_vec variable from a temporary returned from a function; in
these cases no actual copy is done, and the function directly
initializes the variable, so it's safe.  I made this clearer by
declaring the copy constructor but not defining it, so uses that get
elided are accepted, but uses that actually want to copy will fail to
link.
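
A minimal sketch of that idiom (assumed example, not the GCC code):

  // The copy constructor is declared but never defined: initializations
  // that GCC elides emit no reference to it and link fine, while a real
  // copy would fail at link time.  (Elision is only guaranteed in C++17;
  // here we rely on GCC actually performing it.)
  struct holder
  {
    holder () {}
    holder (const holder &);   // declared, deliberately never defined
  };

  holder make () { return holder (); }

  int
  main ()
  {
    holder h = make ();   // elided: accepted and links
    // holder h2 = h;     // would compile but fail to link
    return 0;
  }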

In cp_expr we defined the copy constructor to do the same thing that
the implicit definition would do, causing the copy assignment operator
to be deprecated.  We don't need the copy constructor, so let's remove
it.

Tested x86_64-pc-linux-gnu.  Are the ipa-prop bits OK for trunk?
commit 560b104324b10814dcaaf65606943d4ca55647d6
Author: Jason Merrill 
Date:   Tue May 15 20:47:41 2018 -0400

* ipa-prop.h (ipa_edge_args::release): Factor out of destructor.

* ipa-prop.c (ipa_free_edge_args_substructures): Use it.

diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index 38441cc49bc..11bbf356ce1 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -3714,8 +3714,7 @@ ipa_check_create_edge_args (void)
 void
 ipa_free_edge_args_substructures (struct ipa_edge_args *args)
 {
-  vec_free (args->jump_functions);
-  *args = ipa_edge_args ();
+  args->release ();
 }
 
 /* Free all ipa_edge structures.  */
diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
index a61e06135e3..08919acb46c 100644
--- a/gcc/ipa-prop.h
+++ b/gcc/ipa-prop.h
@@ -569,13 +569,18 @@ class GTY((for_user)) ipa_edge_args
   ipa_edge_args () : jump_functions (NULL), polymorphic_call_contexts (NULL)
 {}
 
-  /* Destructor.  */
-  ~ipa_edge_args ()
+  void release ()
 {
   vec_free (jump_functions);
   vec_free (polymorphic_call_contexts);
 }
 
+  /* Destructor.  */
+  ~ipa_edge_args ()
+{
+  release ();
+}
+
   /* Vectors of the callsite's jump function and polymorphic context
  information of each parameter.  */
   vec *jump_functions;
commit 648ffd02e23ac2695de04ab266b4f8862df6c2ed
Author: Jason Merrill 
Date:   Tue May 15 20:46:54 2018 -0400

* cp-tree.h (cp_expr): Remove copy constructor.

* mangle.c (struct releasing_vec): Declare copy constructor.

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 9a2eb3be4d1..cab926028b8 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -59,9 +59,6 @@ public:
   cp_expr (tree value, location_t loc):
 m_value (value), m_loc (loc) {}
 
-  cp_expr (const cp_expr &other) :
-m_value (other.m_value), m_loc (other.m_loc) {}
-
   /* Implicit conversions to tree.  */
   operator tree () const { return m_value; }
   tree & operator* () { return m_value; }
diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c
index 6a7df804caf..59a3111fba2 100644
--- a/gcc/cp/mangle.c
+++ b/gcc/cp/mangle.c
@@ -1555,6 +1555,10 @@ struct releasing_vec
   releasing_vec (vec_t *v): v(v) { }
   releasing_vec (): v(make_tree_vector ()) { }
 
+  /* Copy constructor is deliberately declared but not defined,
+ copies must always be elided.  */
+  releasing_vec (const releasing_vec &);
+
   vec_t &operator* () const { return *v; }
   vec_t *operator-> () const { return v; }
   vec_t *get () const { return v; }


RFA (tree.c): PATCH to make warn_deprecated_use return bool

2018-05-15 Thread Jason Merrill
The function "warning" returns bool to indicated whether or not any
diagnostic was actually emitted; warn_deprecated_use should as well.

It's also unnecessary to duplicate the warning code between the cases
of null or non-null "decl", since the actual warnings were the same.
The only thing that's different is whether we indicate the source
location of "decl".

Tested x86_64-pc-linux-gnu.  OK for trunk?
commit b49f292814693de97b218f6d8b32b20dd68fb8c8
Author: Jason Merrill 
Date:   Tue May 15 17:41:19 2018 -0400

* tree.c (warn_deprecated_use): Return bool.  Simplify logic.

diff --git a/gcc/tree.c b/gcc/tree.c
index 77a73b4495e..68165f4deed 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -12420,14 +12420,16 @@ typedef_variant_p (const_tree type)
   return is_typedef_decl (TYPE_NAME (type));
 }
 
-/* Warn about a use of an identifier which was marked deprecated.  */
-void
+/* Warn about a use of an identifier which was marked deprecated.  Returns
+   whether a warning was given.  */
+
+bool
 warn_deprecated_use (tree node, tree attr)
 {
   const char *msg;
 
   if (node == 0 || !warn_deprecated_decl)
-return;
+return false;
 
   if (!attr)
 {
@@ -12450,7 +12452,7 @@ warn_deprecated_use (tree node, tree attr)
   else
 msg = NULL;
 
-  bool w;
+  bool w = false;
   if (DECL_P (node))
 {
   if (msg)
@@ -12476,49 +12478,29 @@ warn_deprecated_use (tree node, tree attr)
 	what = DECL_NAME (TYPE_NAME (node));
 	}
 
-  if (decl)
+  if (what)
 	{
-	  if (what)
-	{
-	  if (msg)
-		w = warning (OPT_Wdeprecated_declarations,
-			 "%qE is deprecated: %s", what, msg);
-	  else
-		w = warning (OPT_Wdeprecated_declarations,
-			 "%qE is deprecated", what);
-	}
+	  if (msg)
+	w = warning (OPT_Wdeprecated_declarations,
+			 "%qE is deprecated: %s", what, msg);
 	  else
-	{
-	  if (msg)
-		w = warning (OPT_Wdeprecated_declarations,
-			 "type is deprecated: %s", msg);
-	  else
-		w = warning (OPT_Wdeprecated_declarations,
-			 "type is deprecated");
-	}
-	  if (w)
-	inform (DECL_SOURCE_LOCATION (decl), "declared here");
+	w = warning (OPT_Wdeprecated_declarations,
+			 "%qE is deprecated", what);
 	}
   else
 	{
-	  if (what)
-	{
-	  if (msg)
-		warning (OPT_Wdeprecated_declarations, "%qE is deprecated: %s",
-			 what, msg);
-	  else
-		warning (OPT_Wdeprecated_declarations, "%qE is deprecated", what);
-	}
+	  if (msg)
+	w = warning (OPT_Wdeprecated_declarations,
+			 "type is deprecated: %s", msg);
 	  else
-	{
-	  if (msg)
-		warning (OPT_Wdeprecated_declarations, "type is deprecated: %s",
-			 msg);
-	  else
-		warning (OPT_Wdeprecated_declarations, "type is deprecated");
-	}
+	w = warning (OPT_Wdeprecated_declarations,
+			 "type is deprecated");
 	}
+  if (w && decl)
+	inform (DECL_SOURCE_LOCATION (decl), "declared here");
 }
+
+  return w;
 }
 
 /* Return true if REF has a COMPONENT_REF with a bit-field field declaration
diff --git a/gcc/tree.h b/gcc/tree.h
index 74a0d1881a6..ef8bff405fe 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -4828,7 +4828,7 @@ extern tree tree_strip_sign_nop_conversions (tree);
 extern const_tree strip_invariant_refs (const_tree);
 extern tree lhd_gcc_personality (void);
 extern void assign_assembler_name_if_needed (tree);
-extern void warn_deprecated_use (tree, tree);
+extern bool warn_deprecated_use (tree, tree);
 extern void cache_integer_cst (tree);
 extern const char *combined_fn_name (combined_fn);
 


Re: [PATCH] improve -Wrestrict for struct members (PR 85753)

2018-05-15 Thread Martin Sebor

On 05/14/2018 03:21 PM, Jeff Law wrote:

On 05/11/2018 05:09 PM, Martin Sebor wrote:

The attached patch extends -Wrestrict to constrain valid offset
ranges into objects of struct types to the bounds of the object
type, the same way the warning already handles arrays.  This
makes it possible to detect overlapping accesses in cases like
the second call to memcpy below:

  char a[16];

  struct { char a[16]; } x;

  void f (int i, int j)
  {
memcpy (&a[i], &a[j], 9);   // -Wrestrict (good)

memcpy (&x.a[i], &x.a[j], 9);   // missing -Wrestrict
  }

There is no way to copy 9 bytes within a 16-byte array without
either overlapping or accessing memory outside the boundaries
of the object.

This is for GCC 9.

Thanks
Martin

gcc-85753.diff


PR tree-optimization/85753 - missing -Wrestrict on memcpy into a member array

gcc/ChangeLog:

PR tree-optimization/85753
* gimple-ssa-warn-restrict.c (builtin_memref::builtin_memref): Handle
RECORD_TYPE in addition to ARRAY_TYPE.

gcc/testsuite/ChangeLog:

PR tree-optimization/85753
* gcc.dg/Wrestrict-10.c: Adjust.
* gcc.dg/Wrestrict-16.c: New test.

OK.
jeff


It occurred to me that it doesn't matter if the type is an array,
struct, or a basic type like int.  They should all be treated
the same, so after retesting I ended up committing a slightly
simplified version of the original: r260280.
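
For instance (hypothetical illustration, assuming a 4-byte int), the same
bounds reasoning applies to a plain scalar:

  int x;

  void
  g (int i, int j)
  {
    /* Copying 3 bytes within a 4-byte object cannot be done without
       overlap or an out-of-bounds access, so -Wrestrict triggers.  */
    __builtin_memcpy ((char *) &x + i, (char *) &x + j, 3);
  }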

Martin