Re: [PATCH][RFC] tree-optimization/100801 - perform final value replacement from VRP

2021-06-01 Thread Aldy Hernandez via Gcc-patches
I have some old numbers from late April.

VRP vs ranger was more difficult to compare than evrp, since the
gimple is different (ASSERT_EXPRs).  What I did was run a late evrp
pass before each VRP pass and compared branches that were folded, for
an estimate.

On it's own VRP1 could fold 5482 branches versus 6338 for a late evrp
pass running before VRP1, so 15% more.  If late evrp runs before VRP1,
VRP1 can find something late evrp can't in 77 out of 388 .ii files
distilled from a bootstrap.  So, VRP1 can fold a branch in 1 out of 5
files.

For VRP2, the numbers are 376 branches on its own versus 467 for late
evrp2.  That's 24% more branches for late evrp2.  VRP2 can find
something late evrp2 can't in 199 out of 388 files, so one in every 2
files.

I didn't dig deep into the VRP[12] foldable branches.  Some of them
looked like duplicates, and others looked like stuff we could get with
minor tweaks with range-ops.  But I really don't have hard data here.

And yes, I agree, we should re-run these stats when the dust settles
in the next week or two.

Aldy



Re: [PATCH][RFC] tree-optimization/100801 - perform final value replacement from VRP

2021-06-01 Thread Andrew MacLeod via Gcc-patches

On 6/1/21 6:01 AM, Richard Biener wrote:

On Mon, 31 May 2021, Andrew MacLeod wrote:


On 5/28/21 11:25 AM, Richard Biener wrote:

This makes sure to perform final value replacement of constants
when we also are sure to propagate those, like in VRP.  This avoids
spurious diagnostics when doing so only from within SCCP which can
leave unreachable loops in the IL triggering bogus diagnostics.

The choice is VRP because it already defers to SCEV for PHI nodes
so the following makes it analyze final values for loop exit PHIs.
To make that work it arranges for loop-closed SSA to be built only
after inserting ASSERT_EXPRs.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

It does

FAIL: gcc.dg/ubsan/pr94423.c   -O2  (internal compiler error)

where VRP somehow propagates abnormals in a bogus way.

OK, so I analyzed this some more and it results from the hunk moving
loop-closed SSA construction after ASSERT_EXPR insertion in
execute_vrp.  The motivation for this is that we end up splitting
the loop exit edge when inserting the ASSERT_EXPR, creating
non-loop-closed SSA and thus fail to pick up the final value.

Now with swapping insert and loop-closed SSA build we get LC
SSA PHIs on an abnormal loop exit in the above testcase which
messes up assert expr removal which does

 /* Propagate the RHS into every use of the LHS.  For SSA names
also propagate abnormals as it merely restores the original
IL in this case (an replace_uses_by would assert).  */

in remove_range_assertions, explicitely ignoring constraints around
abnormals.  But since LC SSA PHIs remain we fail IL verification.
Note the LC SSA PHI is only required because we insert a SSA def
via the ASSERT_EXPR in the loop body.  We can fix up the IL detail
by marking the ASSERT_EXPR source appropriately via

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 450926d5f9b..705e2489eb1 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -3809,6 +3809,8 @@ vrp_asserts::remove_range_assertions ()
 FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs)
   FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
 SET_USE (use_p, var);
+   if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
+ SSA_NAME_OCCURS_IN_ABNORMAL_PHI (var) = 1;
   }
 else
   replace_uses_by (lhs, var);

but we also never get rid of a SSA_NAME_OCCURS_IN_ABNORMAL_PHI marking.

One option would be to keep the order as-is but fixup assert expr
insertion to update/honor loop-closed SSA.  But then - how far are you
with removing all this ASSERT_EXPR stuff?

Thanks,
Richard.

I hope we can replace the VRP pass as well this release.. which will 
then kill off all the need for the assert expr stuff.  Aldy did some 
preliminary comparisons with VRP on our branch about a month ago... 
running a ranger EVRP pass right before vrp1 and vrp2 then seeing what 
VRP got that we missed.   and I believe we were doing pretty well, but 
we havent had a chance to look at it in detail to see if there is 
anything major we miss.  Maybe he can comment on where we were.  Once I 
get the rest of the branch into trunk, we'll take another stab at the 
comparison because what is going into trunk is more consistent than what 
was on the branch.


Right now I only have 3 remaining bits to bring over .. relations, 
application of equivalences, and recomputations.  With any luck I'll get 
them in this week. we can then do another run by of how we compare to VRP.


Andrew




Re: [PATCH][RFC] tree-optimization/100801 - perform final value replacement from VRP

2021-06-01 Thread Richard Biener
On Mon, 31 May 2021, Andrew MacLeod wrote:

> On 5/28/21 11:25 AM, Richard Biener wrote:
> > This makes sure to perform final value replacement of constants
> > when we also are sure to propagate those, like in VRP.  This avoids
> > spurious diagnostics when doing so only from within SCCP which can
> > leave unreachable loops in the IL triggering bogus diagnostics.
> >
> > The choice is VRP because it already defers to SCEV for PHI nodes
> > so the following makes it analyze final values for loop exit PHIs.
> > To make that work it arranges for loop-closed SSA to be built only
> > after inserting ASSERT_EXPRs.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > It does
> >
> > FAIL: gcc.dg/ubsan/pr94423.c   -O2  (internal compiler error)
> >
> > where VRP somehow propagates abnormals in a bogus way.

OK, so I analyzed this some more and it results from the hunk moving
loop-closed SSA construction after ASSERT_EXPR insertion in
execute_vrp.  The motivation for this is that we end up splitting
the loop exit edge when inserting the ASSERT_EXPR, creating
non-loop-closed SSA and thus fail to pick up the final value.

Now with swapping insert and loop-closed SSA build we get LC
SSA PHIs on an abnormal loop exit in the above testcase which
messes up assert expr removal which does

/* Propagate the RHS into every use of the LHS.  For SSA names
   also propagate abnormals as it merely restores the original
   IL in this case (an replace_uses_by would assert).  */

in remove_range_assertions, explicitely ignoring constraints around
abnormals.  But since LC SSA PHIs remain we fail IL verification.
Note the LC SSA PHI is only required because we insert a SSA def
via the ASSERT_EXPR in the loop body.  We can fix up the IL detail
by marking the ASSERT_EXPR source appropriately via

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 450926d5f9b..705e2489eb1 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -3809,6 +3809,8 @@ vrp_asserts::remove_range_assertions ()
FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs)
  FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
SET_USE (use_p, var);
+   if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
+ SSA_NAME_OCCURS_IN_ABNORMAL_PHI (var) = 1;
  }
else
  replace_uses_by (lhs, var);

but we also never get rid of a SSA_NAME_OCCURS_IN_ABNORMAL_PHI marking.

One option would be to keep the order as-is but fixup assert expr
insertion to update/honor loop-closed SSA.  But then - how far are you
with removing all this ASSERT_EXPR stuff?

Thanks,
Richard.


> >  It also
> > needs adjustment for a couple of fails like
> >
> > FAIL: gcc.dg/vect/no-scevccp-outer-11.c scan-tree-dump-times vect "OUTER
> > LOOP VECTORIZED." 1
> >
> > where the testcases do -fno-tree-scev-cprop but that no longer has
> > the desired effect then (might just guard the final value replacement
> > in VRP with this).
> >
> > Any comments?  I probably can plug final_value_at_exit elsewhere
> > than VRP (to avoid the issues with asserts) but it somewhat felt
> > naturally because that already uses SCEV.
> 
> l think its OK.  I'll keep track this change as we may need to incorporate the
> changes into fold_using_range::range_of_phi() when we move towards being on
> par with VRP,  but the testsuite will show a difference if we miss it then
> anyway.
> 
> 
> > Thanks,
> > Richard.
> >
> > 2021-05-28  Richard Biener  
> >
> >  PR tree-optimization/100801
> >  * tree-scalar-evolution.h (final_value_at_exit): Declare.
> >  * tree-scalar-evolution.c (final_value_at_exit): New API,
> >  split out from ...
> >  (final_value_replacement_loop): ... here.
> >  * tree-vrp.c (execute_vrp): Build loop-closed SSA only
> >  after inserting assert expressions.
> >  (vrp_asserts::process_assert_insertions_for): Avoid inserting
> >  asserts for RESULT_DECLs, when building loop-closed SSA
> >  after assert insertion we're not able to propagate out all
> >  PHIs and thus trigger IL validation.
> >  * vr-values.c (vr_values::extract_range_from_phi_node):
> >  For loop exit PHIs also perform final value analysis using
> >  SCEV.
> >
> > * gcc.target/i386/pr100801.c: New testcase.
> > ---
> >   gcc/testsuite/gcc.target/i386/pr100801.c | 29 
> >   gcc/tree-scalar-evolution.c  | 28 +++
> >   gcc/tree-scalar-evolution.h  |  1 +
> >   gcc/tree-vrp.c   | 12 --
> >   gcc/vr-values.c  | 23 +++
> >   5 files changed, 77 insertions(+), 16 deletions(-)
> >   create mode 100644 gcc/testsuite/gcc.target/i386/pr100801.c
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/pr100801.c
> > b/gcc/testsuite/gcc.target/i386/pr100801.c
> > new file mode 100644
> > index 000..c58f9be6898
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr100801.c
> > @@ -0,0 +1,29 

Re: [PATCH][RFC] tree-optimization/100801 - perform final value replacement from VRP

2021-05-31 Thread Andrew MacLeod via Gcc-patches

On 5/28/21 11:25 AM, Richard Biener wrote:

This makes sure to perform final value replacement of constants
when we also are sure to propagate those, like in VRP.  This avoids
spurious diagnostics when doing so only from within SCCP which can
leave unreachable loops in the IL triggering bogus diagnostics.

The choice is VRP because it already defers to SCEV for PHI nodes
so the following makes it analyze final values for loop exit PHIs.
To make that work it arranges for loop-closed SSA to be built only
after inserting ASSERT_EXPRs.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

It does

FAIL: gcc.dg/ubsan/pr94423.c   -O2  (internal compiler error)

where VRP somehow propagates abnormals in a bogus way.  It also
needs adjustment for a couple of fails like

FAIL: gcc.dg/vect/no-scevccp-outer-11.c scan-tree-dump-times vect "OUTER
LOOP VECTORIZED." 1

where the testcases do -fno-tree-scev-cprop but that no longer has
the desired effect then (might just guard the final value replacement
in VRP with this).

Any comments?  I probably can plug final_value_at_exit elsewhere
than VRP (to avoid the issues with asserts) but it somewhat felt
naturally because that already uses SCEV.


l think its OK.  I'll keep track this change as we may need to 
incorporate the changes into fold_using_range::range_of_phi() when we 
move towards being on par with VRP,  but the testsuite will show a 
difference if we miss it then anyway.




Thanks,
Richard.

2021-05-28  Richard Biener  

PR tree-optimization/100801
* tree-scalar-evolution.h (final_value_at_exit): Declare.
* tree-scalar-evolution.c (final_value_at_exit): New API,
split out from ...
(final_value_replacement_loop): ... here.
* tree-vrp.c (execute_vrp): Build loop-closed SSA only
after inserting assert expressions.
(vrp_asserts::process_assert_insertions_for): Avoid inserting
asserts for RESULT_DECLs, when building loop-closed SSA
after assert insertion we're not able to propagate out all
PHIs and thus trigger IL validation.
* vr-values.c (vr_values::extract_range_from_phi_node):
For loop exit PHIs also perform final value analysis using
SCEV.

* gcc.target/i386/pr100801.c: New testcase.
---
  gcc/testsuite/gcc.target/i386/pr100801.c | 29 
  gcc/tree-scalar-evolution.c  | 28 +++
  gcc/tree-scalar-evolution.h  |  1 +
  gcc/tree-vrp.c   | 12 --
  gcc/vr-values.c  | 23 +++
  5 files changed, 77 insertions(+), 16 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/i386/pr100801.c

diff --git a/gcc/testsuite/gcc.target/i386/pr100801.c 
b/gcc/testsuite/gcc.target/i386/pr100801.c
new file mode 100644
index 000..c58f9be6898
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100801.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx" } */
+
+#include 
+
+void copy_32_unaligned(unsigned int* dest, const unsigned int* src, size_t 
count)
+{
+  // invariant/nop
+  __m128i shufmask =  _mm_set_epi8(15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 
3, 2, 1, 0);
+
+  size_t i;
+  for (i = 0; i + 4 <= count; i += 4) {
+__m128i input = _mm_loadu_si128((const __m128i*)([i]));
+__m128i output = input;
+// no warning without the shuffle:
+output = _mm_shuffle_epi8(input, shufmask);
+_mm_storeu_si128((__m128i*)([i]), output);
+  }
+  for (; i < count; ++i) {  // handle residual elements
+dest[i] = src[i]; /* { dg-bogus "invokes undefined" } */
+  }
+}
+
+void test(unsigned int* buf1, unsigned int* buf2)
+{
+  copy_32_unaligned(buf2, buf1,
+   // multiples of 4 and greater or equal then 12 trigger it:
+   12);
+}
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index b22d49a0ab6..6917b8c9dab 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -3491,6 +3491,23 @@ expression_expensive_p (tree expr)
  || expanded_size > cache.elements ());
  }
  
+/* Compute the final value of DEF, a DEF inside the loop E exits from.  */

+
+tree
+final_value_at_exit (tree def, edge e, bool *folded_casts)
+{
+  class loop *loop = e->src->loop_father;
+  gcc_assert (loop_exit_edge_p (loop, e));
+  class loop *ex_loop
+= superloop_at_depth (loop, loop_depth (e->dest->loop_father) + 1);
+  tree ev = analyze_scalar_evolution_in_loop (ex_loop, loop, def, 
folded_casts);
+  ev = compute_overall_effect_of_inner_loop (ex_loop, ev);
+  if (!tree_does_not_contain_chrecs (def)
+  || chrec_contains_symbols_defined_in_loop (ev, ex_loop->num))
+ev = chrec_dont_know;
+  return ev;
+}
+
  /* Do final value replacement for LOOP, return true if we did anything.  */
  
  bool

@@ -3513,10 +3530,6 @@ final_value_replacement_loop (class loop *loop)
/* Set stmt insertion pointer.  All stmts are inserted before this 

[PATCH][RFC] tree-optimization/100801 - perform final value replacement from VRP

2021-05-28 Thread Richard Biener
This makes sure to perform final value replacement of constants
when we also are sure to propagate those, like in VRP.  This avoids
spurious diagnostics when doing so only from within SCCP which can
leave unreachable loops in the IL triggering bogus diagnostics.

The choice is VRP because it already defers to SCEV for PHI nodes
so the following makes it analyze final values for loop exit PHIs.
To make that work it arranges for loop-closed SSA to be built only
after inserting ASSERT_EXPRs.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

It does

FAIL: gcc.dg/ubsan/pr94423.c   -O2  (internal compiler error)

where VRP somehow propagates abnormals in a bogus way.  It also
needs adjustment for a couple of fails like

FAIL: gcc.dg/vect/no-scevccp-outer-11.c scan-tree-dump-times vect "OUTER 
LOOP VECTORIZED." 1

where the testcases do -fno-tree-scev-cprop but that no longer has
the desired effect then (might just guard the final value replacement
in VRP with this).

Any comments?  I probably can plug final_value_at_exit elsewhere
than VRP (to avoid the issues with asserts) but it somewhat felt
naturally because that already uses SCEV.

Thanks,
Richard.

2021-05-28  Richard Biener  

PR tree-optimization/100801
* tree-scalar-evolution.h (final_value_at_exit): Declare.
* tree-scalar-evolution.c (final_value_at_exit): New API,
split out from ...
(final_value_replacement_loop): ... here.
* tree-vrp.c (execute_vrp): Build loop-closed SSA only
after inserting assert expressions.
(vrp_asserts::process_assert_insertions_for): Avoid inserting
asserts for RESULT_DECLs, when building loop-closed SSA
after assert insertion we're not able to propagate out all
PHIs and thus trigger IL validation.
* vr-values.c (vr_values::extract_range_from_phi_node):
For loop exit PHIs also perform final value analysis using
SCEV.

* gcc.target/i386/pr100801.c: New testcase.
---
 gcc/testsuite/gcc.target/i386/pr100801.c | 29 
 gcc/tree-scalar-evolution.c  | 28 +++
 gcc/tree-scalar-evolution.h  |  1 +
 gcc/tree-vrp.c   | 12 --
 gcc/vr-values.c  | 23 +++
 5 files changed, 77 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100801.c

diff --git a/gcc/testsuite/gcc.target/i386/pr100801.c 
b/gcc/testsuite/gcc.target/i386/pr100801.c
new file mode 100644
index 000..c58f9be6898
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100801.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx" } */
+
+#include 
+
+void copy_32_unaligned(unsigned int* dest, const unsigned int* src, size_t 
count)
+{
+  // invariant/nop
+  __m128i shufmask =  _mm_set_epi8(15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 
3, 2, 1, 0);
+
+  size_t i;
+  for (i = 0; i + 4 <= count; i += 4) {
+__m128i input = _mm_loadu_si128((const __m128i*)([i]));
+__m128i output = input;
+// no warning without the shuffle:
+output = _mm_shuffle_epi8(input, shufmask);
+_mm_storeu_si128((__m128i*)([i]), output);
+  }
+  for (; i < count; ++i) {  // handle residual elements
+dest[i] = src[i]; /* { dg-bogus "invokes undefined" } */
+  }
+}
+
+void test(unsigned int* buf1, unsigned int* buf2)
+{
+  copy_32_unaligned(buf2, buf1,
+   // multiples of 4 and greater or equal then 12 trigger it:
+   12);
+}
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index b22d49a0ab6..6917b8c9dab 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -3491,6 +3491,23 @@ expression_expensive_p (tree expr)
  || expanded_size > cache.elements ());
 }
 
+/* Compute the final value of DEF, a DEF inside the loop E exits from.  */
+
+tree
+final_value_at_exit (tree def, edge e, bool *folded_casts)
+{
+  class loop *loop = e->src->loop_father;
+  gcc_assert (loop_exit_edge_p (loop, e));
+  class loop *ex_loop
+= superloop_at_depth (loop, loop_depth (e->dest->loop_father) + 1);
+  tree ev = analyze_scalar_evolution_in_loop (ex_loop, loop, def, 
folded_casts);
+  ev = compute_overall_effect_of_inner_loop (ex_loop, ev);
+  if (!tree_does_not_contain_chrecs (def)
+  || chrec_contains_symbols_defined_in_loop (ev, ex_loop->num))
+ev = chrec_dont_know;
+  return ev;
+}
+
 /* Do final value replacement for LOOP, return true if we did anything.  */
 
 bool
@@ -3513,10 +3530,6 @@ final_value_replacement_loop (class loop *loop)
   /* Set stmt insertion pointer.  All stmts are inserted before this point.  */
   gimple_stmt_iterator gsi = gsi_after_labels (exit->dest);
 
-  class loop *ex_loop
-= superloop_at_depth (loop,
- loop_depth (exit->dest->loop_father) + 1);
-
   bool any = false;
   gphi_iterator psi;
   for (psi = gsi_start_phis (exit->dest); !gsi_end_p (psi); )