Re: [PATCH][RFC] tree-optimization/100801 - perform final value replacement from VRP
I have some old numbers from late April. VRP vs ranger was more difficult to compare than evrp, since the gimple is different (ASSERT_EXPRs). What I did was run a late evrp pass before each VRP pass and compared branches that were folded, for an estimate. On it's own VRP1 could fold 5482 branches versus 6338 for a late evrp pass running before VRP1, so 15% more. If late evrp runs before VRP1, VRP1 can find something late evrp can't in 77 out of 388 .ii files distilled from a bootstrap. So, VRP1 can fold a branch in 1 out of 5 files. For VRP2, the numbers are 376 branches on its own versus 467 for late evrp2. That's 24% more branches for late evrp2. VRP2 can find something late evrp2 can't in 199 out of 388 files, so one in every 2 files. I didn't dig deep into the VRP[12] foldable branches. Some of them looked like duplicates, and others looked like stuff we could get with minor tweaks with range-ops. But I really don't have hard data here. And yes, I agree, we should re-run these stats when the dust settles in the next week or two. Aldy
Re: [PATCH][RFC] tree-optimization/100801 - perform final value replacement from VRP
On 6/1/21 6:01 AM, Richard Biener wrote: On Mon, 31 May 2021, Andrew MacLeod wrote: On 5/28/21 11:25 AM, Richard Biener wrote: This makes sure to perform final value replacement of constants when we also are sure to propagate those, like in VRP. This avoids spurious diagnostics when doing so only from within SCCP which can leave unreachable loops in the IL triggering bogus diagnostics. The choice is VRP because it already defers to SCEV for PHI nodes so the following makes it analyze final values for loop exit PHIs. To make that work it arranges for loop-closed SSA to be built only after inserting ASSERT_EXPRs. Bootstrapped and tested on x86_64-unknown-linux-gnu. It does FAIL: gcc.dg/ubsan/pr94423.c -O2 (internal compiler error) where VRP somehow propagates abnormals in a bogus way. OK, so I analyzed this some more and it results from the hunk moving loop-closed SSA construction after ASSERT_EXPR insertion in execute_vrp. The motivation for this is that we end up splitting the loop exit edge when inserting the ASSERT_EXPR, creating non-loop-closed SSA and thus fail to pick up the final value. Now with swapping insert and loop-closed SSA build we get LC SSA PHIs on an abnormal loop exit in the above testcase which messes up assert expr removal which does /* Propagate the RHS into every use of the LHS. For SSA names also propagate abnormals as it merely restores the original IL in this case (an replace_uses_by would assert). */ in remove_range_assertions, explicitely ignoring constraints around abnormals. But since LC SSA PHIs remain we fail IL verification. Note the LC SSA PHI is only required because we insert a SSA def via the ASSERT_EXPR in the loop body. We can fix up the IL detail by marking the ASSERT_EXPR source appropriately via diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c index 450926d5f9b..705e2489eb1 100644 --- a/gcc/tree-vrp.c +++ b/gcc/tree-vrp.c @@ -3809,6 +3809,8 @@ vrp_asserts::remove_range_assertions () FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs) FOR_EACH_IMM_USE_ON_STMT (use_p, iter) SET_USE (use_p, var); + if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs)) + SSA_NAME_OCCURS_IN_ABNORMAL_PHI (var) = 1; } else replace_uses_by (lhs, var); but we also never get rid of a SSA_NAME_OCCURS_IN_ABNORMAL_PHI marking. One option would be to keep the order as-is but fixup assert expr insertion to update/honor loop-closed SSA. But then - how far are you with removing all this ASSERT_EXPR stuff? Thanks, Richard. I hope we can replace the VRP pass as well this release.. which will then kill off all the need for the assert expr stuff. Aldy did some preliminary comparisons with VRP on our branch about a month ago... running a ranger EVRP pass right before vrp1 and vrp2 then seeing what VRP got that we missed. and I believe we were doing pretty well, but we havent had a chance to look at it in detail to see if there is anything major we miss. Maybe he can comment on where we were. Once I get the rest of the branch into trunk, we'll take another stab at the comparison because what is going into trunk is more consistent than what was on the branch. Right now I only have 3 remaining bits to bring over .. relations, application of equivalences, and recomputations. With any luck I'll get them in this week. we can then do another run by of how we compare to VRP. Andrew
Re: [PATCH][RFC] tree-optimization/100801 - perform final value replacement from VRP
On Mon, 31 May 2021, Andrew MacLeod wrote: > On 5/28/21 11:25 AM, Richard Biener wrote: > > This makes sure to perform final value replacement of constants > > when we also are sure to propagate those, like in VRP. This avoids > > spurious diagnostics when doing so only from within SCCP which can > > leave unreachable loops in the IL triggering bogus diagnostics. > > > > The choice is VRP because it already defers to SCEV for PHI nodes > > so the following makes it analyze final values for loop exit PHIs. > > To make that work it arranges for loop-closed SSA to be built only > > after inserting ASSERT_EXPRs. > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu. > > > > It does > > > > FAIL: gcc.dg/ubsan/pr94423.c -O2 (internal compiler error) > > > > where VRP somehow propagates abnormals in a bogus way. OK, so I analyzed this some more and it results from the hunk moving loop-closed SSA construction after ASSERT_EXPR insertion in execute_vrp. The motivation for this is that we end up splitting the loop exit edge when inserting the ASSERT_EXPR, creating non-loop-closed SSA and thus fail to pick up the final value. Now with swapping insert and loop-closed SSA build we get LC SSA PHIs on an abnormal loop exit in the above testcase which messes up assert expr removal which does /* Propagate the RHS into every use of the LHS. For SSA names also propagate abnormals as it merely restores the original IL in this case (an replace_uses_by would assert). */ in remove_range_assertions, explicitely ignoring constraints around abnormals. But since LC SSA PHIs remain we fail IL verification. Note the LC SSA PHI is only required because we insert a SSA def via the ASSERT_EXPR in the loop body. We can fix up the IL detail by marking the ASSERT_EXPR source appropriately via diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c index 450926d5f9b..705e2489eb1 100644 --- a/gcc/tree-vrp.c +++ b/gcc/tree-vrp.c @@ -3809,6 +3809,8 @@ vrp_asserts::remove_range_assertions () FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs) FOR_EACH_IMM_USE_ON_STMT (use_p, iter) SET_USE (use_p, var); + if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs)) + SSA_NAME_OCCURS_IN_ABNORMAL_PHI (var) = 1; } else replace_uses_by (lhs, var); but we also never get rid of a SSA_NAME_OCCURS_IN_ABNORMAL_PHI marking. One option would be to keep the order as-is but fixup assert expr insertion to update/honor loop-closed SSA. But then - how far are you with removing all this ASSERT_EXPR stuff? Thanks, Richard. > > It also > > needs adjustment for a couple of fails like > > > > FAIL: gcc.dg/vect/no-scevccp-outer-11.c scan-tree-dump-times vect "OUTER > > LOOP VECTORIZED." 1 > > > > where the testcases do -fno-tree-scev-cprop but that no longer has > > the desired effect then (might just guard the final value replacement > > in VRP with this). > > > > Any comments? I probably can plug final_value_at_exit elsewhere > > than VRP (to avoid the issues with asserts) but it somewhat felt > > naturally because that already uses SCEV. > > l think its OK. I'll keep track this change as we may need to incorporate the > changes into fold_using_range::range_of_phi() when we move towards being on > par with VRP, but the testsuite will show a difference if we miss it then > anyway. > > > > Thanks, > > Richard. > > > > 2021-05-28 Richard Biener > > > > PR tree-optimization/100801 > > * tree-scalar-evolution.h (final_value_at_exit): Declare. > > * tree-scalar-evolution.c (final_value_at_exit): New API, > > split out from ... > > (final_value_replacement_loop): ... here. > > * tree-vrp.c (execute_vrp): Build loop-closed SSA only > > after inserting assert expressions. > > (vrp_asserts::process_assert_insertions_for): Avoid inserting > > asserts for RESULT_DECLs, when building loop-closed SSA > > after assert insertion we're not able to propagate out all > > PHIs and thus trigger IL validation. > > * vr-values.c (vr_values::extract_range_from_phi_node): > > For loop exit PHIs also perform final value analysis using > > SCEV. > > > > * gcc.target/i386/pr100801.c: New testcase. > > --- > > gcc/testsuite/gcc.target/i386/pr100801.c | 29 > > gcc/tree-scalar-evolution.c | 28 +++ > > gcc/tree-scalar-evolution.h | 1 + > > gcc/tree-vrp.c | 12 -- > > gcc/vr-values.c | 23 +++ > > 5 files changed, 77 insertions(+), 16 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/i386/pr100801.c > > > > diff --git a/gcc/testsuite/gcc.target/i386/pr100801.c > > b/gcc/testsuite/gcc.target/i386/pr100801.c > > new file mode 100644 > > index 000..c58f9be6898 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/i386/pr100801.c > > @@ -0,0 +1,29
Re: [PATCH][RFC] tree-optimization/100801 - perform final value replacement from VRP
On 5/28/21 11:25 AM, Richard Biener wrote: This makes sure to perform final value replacement of constants when we also are sure to propagate those, like in VRP. This avoids spurious diagnostics when doing so only from within SCCP which can leave unreachable loops in the IL triggering bogus diagnostics. The choice is VRP because it already defers to SCEV for PHI nodes so the following makes it analyze final values for loop exit PHIs. To make that work it arranges for loop-closed SSA to be built only after inserting ASSERT_EXPRs. Bootstrapped and tested on x86_64-unknown-linux-gnu. It does FAIL: gcc.dg/ubsan/pr94423.c -O2 (internal compiler error) where VRP somehow propagates abnormals in a bogus way. It also needs adjustment for a couple of fails like FAIL: gcc.dg/vect/no-scevccp-outer-11.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 where the testcases do -fno-tree-scev-cprop but that no longer has the desired effect then (might just guard the final value replacement in VRP with this). Any comments? I probably can plug final_value_at_exit elsewhere than VRP (to avoid the issues with asserts) but it somewhat felt naturally because that already uses SCEV. l think its OK. I'll keep track this change as we may need to incorporate the changes into fold_using_range::range_of_phi() when we move towards being on par with VRP, but the testsuite will show a difference if we miss it then anyway. Thanks, Richard. 2021-05-28 Richard Biener PR tree-optimization/100801 * tree-scalar-evolution.h (final_value_at_exit): Declare. * tree-scalar-evolution.c (final_value_at_exit): New API, split out from ... (final_value_replacement_loop): ... here. * tree-vrp.c (execute_vrp): Build loop-closed SSA only after inserting assert expressions. (vrp_asserts::process_assert_insertions_for): Avoid inserting asserts for RESULT_DECLs, when building loop-closed SSA after assert insertion we're not able to propagate out all PHIs and thus trigger IL validation. * vr-values.c (vr_values::extract_range_from_phi_node): For loop exit PHIs also perform final value analysis using SCEV. * gcc.target/i386/pr100801.c: New testcase. --- gcc/testsuite/gcc.target/i386/pr100801.c | 29 gcc/tree-scalar-evolution.c | 28 +++ gcc/tree-scalar-evolution.h | 1 + gcc/tree-vrp.c | 12 -- gcc/vr-values.c | 23 +++ 5 files changed, 77 insertions(+), 16 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr100801.c diff --git a/gcc/testsuite/gcc.target/i386/pr100801.c b/gcc/testsuite/gcc.target/i386/pr100801.c new file mode 100644 index 000..c58f9be6898 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100801.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx" } */ + +#include + +void copy_32_unaligned(unsigned int* dest, const unsigned int* src, size_t count) +{ + // invariant/nop + __m128i shufmask = _mm_set_epi8(15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); + + size_t i; + for (i = 0; i + 4 <= count; i += 4) { +__m128i input = _mm_loadu_si128((const __m128i*)([i])); +__m128i output = input; +// no warning without the shuffle: +output = _mm_shuffle_epi8(input, shufmask); +_mm_storeu_si128((__m128i*)([i]), output); + } + for (; i < count; ++i) { // handle residual elements +dest[i] = src[i]; /* { dg-bogus "invokes undefined" } */ + } +} + +void test(unsigned int* buf1, unsigned int* buf2) +{ + copy_32_unaligned(buf2, buf1, + // multiples of 4 and greater or equal then 12 trigger it: + 12); +} diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c index b22d49a0ab6..6917b8c9dab 100644 --- a/gcc/tree-scalar-evolution.c +++ b/gcc/tree-scalar-evolution.c @@ -3491,6 +3491,23 @@ expression_expensive_p (tree expr) || expanded_size > cache.elements ()); } +/* Compute the final value of DEF, a DEF inside the loop E exits from. */ + +tree +final_value_at_exit (tree def, edge e, bool *folded_casts) +{ + class loop *loop = e->src->loop_father; + gcc_assert (loop_exit_edge_p (loop, e)); + class loop *ex_loop += superloop_at_depth (loop, loop_depth (e->dest->loop_father) + 1); + tree ev = analyze_scalar_evolution_in_loop (ex_loop, loop, def, folded_casts); + ev = compute_overall_effect_of_inner_loop (ex_loop, ev); + if (!tree_does_not_contain_chrecs (def) + || chrec_contains_symbols_defined_in_loop (ev, ex_loop->num)) +ev = chrec_dont_know; + return ev; +} + /* Do final value replacement for LOOP, return true if we did anything. */ bool @@ -3513,10 +3530,6 @@ final_value_replacement_loop (class loop *loop) /* Set stmt insertion pointer. All stmts are inserted before this
[PATCH][RFC] tree-optimization/100801 - perform final value replacement from VRP
This makes sure to perform final value replacement of constants when we also are sure to propagate those, like in VRP. This avoids spurious diagnostics when doing so only from within SCCP which can leave unreachable loops in the IL triggering bogus diagnostics. The choice is VRP because it already defers to SCEV for PHI nodes so the following makes it analyze final values for loop exit PHIs. To make that work it arranges for loop-closed SSA to be built only after inserting ASSERT_EXPRs. Bootstrapped and tested on x86_64-unknown-linux-gnu. It does FAIL: gcc.dg/ubsan/pr94423.c -O2 (internal compiler error) where VRP somehow propagates abnormals in a bogus way. It also needs adjustment for a couple of fails like FAIL: gcc.dg/vect/no-scevccp-outer-11.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 where the testcases do -fno-tree-scev-cprop but that no longer has the desired effect then (might just guard the final value replacement in VRP with this). Any comments? I probably can plug final_value_at_exit elsewhere than VRP (to avoid the issues with asserts) but it somewhat felt naturally because that already uses SCEV. Thanks, Richard. 2021-05-28 Richard Biener PR tree-optimization/100801 * tree-scalar-evolution.h (final_value_at_exit): Declare. * tree-scalar-evolution.c (final_value_at_exit): New API, split out from ... (final_value_replacement_loop): ... here. * tree-vrp.c (execute_vrp): Build loop-closed SSA only after inserting assert expressions. (vrp_asserts::process_assert_insertions_for): Avoid inserting asserts for RESULT_DECLs, when building loop-closed SSA after assert insertion we're not able to propagate out all PHIs and thus trigger IL validation. * vr-values.c (vr_values::extract_range_from_phi_node): For loop exit PHIs also perform final value analysis using SCEV. * gcc.target/i386/pr100801.c: New testcase. --- gcc/testsuite/gcc.target/i386/pr100801.c | 29 gcc/tree-scalar-evolution.c | 28 +++ gcc/tree-scalar-evolution.h | 1 + gcc/tree-vrp.c | 12 -- gcc/vr-values.c | 23 +++ 5 files changed, 77 insertions(+), 16 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr100801.c diff --git a/gcc/testsuite/gcc.target/i386/pr100801.c b/gcc/testsuite/gcc.target/i386/pr100801.c new file mode 100644 index 000..c58f9be6898 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100801.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx" } */ + +#include + +void copy_32_unaligned(unsigned int* dest, const unsigned int* src, size_t count) +{ + // invariant/nop + __m128i shufmask = _mm_set_epi8(15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); + + size_t i; + for (i = 0; i + 4 <= count; i += 4) { +__m128i input = _mm_loadu_si128((const __m128i*)([i])); +__m128i output = input; +// no warning without the shuffle: +output = _mm_shuffle_epi8(input, shufmask); +_mm_storeu_si128((__m128i*)([i]), output); + } + for (; i < count; ++i) { // handle residual elements +dest[i] = src[i]; /* { dg-bogus "invokes undefined" } */ + } +} + +void test(unsigned int* buf1, unsigned int* buf2) +{ + copy_32_unaligned(buf2, buf1, + // multiples of 4 and greater or equal then 12 trigger it: + 12); +} diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c index b22d49a0ab6..6917b8c9dab 100644 --- a/gcc/tree-scalar-evolution.c +++ b/gcc/tree-scalar-evolution.c @@ -3491,6 +3491,23 @@ expression_expensive_p (tree expr) || expanded_size > cache.elements ()); } +/* Compute the final value of DEF, a DEF inside the loop E exits from. */ + +tree +final_value_at_exit (tree def, edge e, bool *folded_casts) +{ + class loop *loop = e->src->loop_father; + gcc_assert (loop_exit_edge_p (loop, e)); + class loop *ex_loop += superloop_at_depth (loop, loop_depth (e->dest->loop_father) + 1); + tree ev = analyze_scalar_evolution_in_loop (ex_loop, loop, def, folded_casts); + ev = compute_overall_effect_of_inner_loop (ex_loop, ev); + if (!tree_does_not_contain_chrecs (def) + || chrec_contains_symbols_defined_in_loop (ev, ex_loop->num)) +ev = chrec_dont_know; + return ev; +} + /* Do final value replacement for LOOP, return true if we did anything. */ bool @@ -3513,10 +3530,6 @@ final_value_replacement_loop (class loop *loop) /* Set stmt insertion pointer. All stmts are inserted before this point. */ gimple_stmt_iterator gsi = gsi_after_labels (exit->dest); - class loop *ex_loop -= superloop_at_depth (loop, - loop_depth (exit->dest->loop_father) + 1); - bool any = false; gphi_iterator psi; for (psi = gsi_start_phis (exit->dest); !gsi_end_p (psi); )