Thanks, Richard! I'll follow up with these changes over the next day or two. Appreciate all the help!
Bill On Wed, 2016-11-16 at 16:08 +0100, Richard Biener wrote: > On Tue, Nov 15, 2016 at 9:03 PM, Bill Schmidt > <wschm...@linux.vnet.ibm.com> wrote: > > Hi, > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77848 identifies a situation > > where if-conversion causes degradation when the if-converted loop is not > > subsequently vectorized. The if-conversion pass does not have a cost > > model to avoid such degradations. However, it does have a capability to > > version the if-converted loop, so that the vectorizer can choose the > > if-converted version if vectorization occurs, or the unmodified version > > if vectorization does not occur. Currently versioning is only done under > > special circumstances. > > > > This patch does two things: It requires loop versioning whenever loop > > vectorization is enabled so that such degradations can't occur; and it > > extends loop versioning to outer loops when such loops are of the right > > form for outer loop vectorization. The latter is needed to avoid > > introducing degradations with versioning of inner loops, which disturbs > > the pattern that outer loop vectorization expects. > > > > This is an embarrassingly simple patch, given how much time I spent going > > down other paths. The most surprising thing is that versioning the outer > > loop doesn't require any additional handshaking with the vectorizer. It > > just works. I've verified this on some examples, and we end up with the > > correct vectorization and with the unused loop nest discarded. > > > > The one remaining problem with this bug is that it precludes SLP from > > seeing if-converted loops to work on. With this patch, if the vectorizer > > can't vectorize an if-converted loop, the original version survives. We > > have one test case that fails when that happens, because it expected to > > do SLP vectorization on the if-converted statements: > > > >> FAIL: gcc.dg/vect/bb-slp-cond-1.c -flto -ffat-lto-objects > >> scan-tree-dump-times slp1 "basic block vectorized" 1 > >> FAIL: gcc.dg/vect/bb-slp-cond-1.c scan-tree-dump-times slp1 "basic block > >> vectorized" 1 > > > > Arguably, this shows a deficiency in SLP vectorization, since it won't > > see if-converted statements in non-loop code in any event. Eventually > > SLP should learn to handle these kinds of PHI statements itself. > > > > Bootstrapped and tested on powerpc64le-unknown-linux-gnu, with only the > > specified regression. Is this ok for trunk? > > Thanks for working on this. Comments below. > > > Thanks, > > Bill > > > > > > [gcc] > > > > 2016-11-15 Bill Schmidt <wschm...@linux.vnet.ibm.com> > > > > PR tree-optimization/77848 > > * tree-if-conv.c (version_loop_for_if_conversion): When versioning > > an outer loop, only save basic block aux information for the inner > > loop. > > (versionable_outer_loop_p): New function. > > (tree_if_conversion): Always version a loop when vectorization > > is enabled; version the outer loop instead of the inner one > > if the pattern will be recognized for outer-loop vectorization. > > > > [gcc/testsuite] > > > > 2016-11-15 Bill Schmidt <wschm...@linux.vnet.ibm.com> > > > > PR tree-optimization/77848 > > * gfortran.dg/vect/pr78848.f: New test. > > > > > > Index: gcc/testsuite/gfortran.dg/vect/pr77848.f > > =================================================================== > > --- gcc/testsuite/gfortran.dg/vect/pr77848.f (revision 0) > > +++ gcc/testsuite/gfortran.dg/vect/pr77848.f (working copy) > > @@ -0,0 +1,24 @@ > > +! PR 77848: Verify versioning is on when vectorization fails > > +! { dg-do compile } > > +! { dg-options "-O3 -ffast-math -fdump-tree-ifcvt > > -fdump-tree-vect-details" } > > + > > + subroutine sub(x,a,n,m) > > + implicit none > > + real*8 x(*),a(*),atemp > > + integer i,j,k,m,n > > + real*8 s,t,u,v > > + do j=1,m > > + atemp=0.d0 > > + do i=1,n > > + if (abs(a(i)).gt.atemp) then > > + atemp=a(i) > > + k = i > > + end if > > + enddo > > + call dummy(atemp,k) > > + enddo > > + return > > + end > > + > > +! { dg-final { scan-tree-dump "LOOP_VECTORIZED" "ifcvt" } } > > +! { dg-final { scan-tree-dump "vectorized 0 loops in function" "vect" } } > > Index: gcc/tree-if-conv.c > > =================================================================== > > --- gcc/tree-if-conv.c (revision 242412) > > +++ gcc/tree-if-conv.c (working copy) > > @@ -2533,6 +2533,7 @@ version_loop_for_if_conversion (struct loop *loop) > > struct loop *new_loop; > > gimple *g; > > gimple_stmt_iterator gsi; > > + unsigned int save_length; > > > > g = gimple_build_call_internal (IFN_LOOP_VECTORIZED, 2, > > build_int_cst (integer_type_node, > > loop->num), > > @@ -2540,8 +2541,9 @@ version_loop_for_if_conversion (struct loop *loop) > > gimple_call_set_lhs (g, cond); > > > > /* Save BB->aux around loop_version as that uses the same field. */ > > - void **saved_preds = XALLOCAVEC (void *, loop->num_nodes); > > - for (unsigned i = 0; i < loop->num_nodes; i++) > > + save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes; > > + void **saved_preds = XALLOCAVEC (void *, save_length); > > + for (unsigned i = 0; i < save_length; i++) > > saved_preds[i] = ifc_bbs[i]->aux; > > > > initialize_original_copy_tables (); > > @@ -2550,7 +2552,7 @@ version_loop_for_if_conversion (struct loop *loop) > > REG_BR_PROB_BASE, true); > > free_original_copy_tables (); > > > > - for (unsigned i = 0; i < loop->num_nodes; i++) > > + for (unsigned i = 0; i < save_length; i++) > > ifc_bbs[i]->aux = saved_preds[i]; > > > > if (new_loop == NULL) > > @@ -2565,6 +2567,40 @@ version_loop_for_if_conversion (struct loop *loop) > > return true; > > } > > > > +/* Return true when LOOP satisfies the follow conditions that will > > + allow it to be recognized by the vectorizer for outer-loop > > + vectorization: > > + - The loop has exactly one inner loop. > > + - The loop has a single exit. > > + - The loop header has a single successor, which is the inner > > + loop header. > > + - The loop exit block has a single predecessor, which is the > > + inner loop's exit block. */ > > + > > +static bool > > +versionable_outer_loop_p (struct loop *loop) > > +{ > > + if (!loop->inner > > + || loop->inner->next > > + || !single_exit (loop) > > + || !single_succ_p (loop->header) > > + || single_succ (loop->header) != loop->inner->header) > > + return false; > > not sure if the single_exit test would cover it but please add > > || ! loop_outer (loop) > > as a test that 'loop' isn't the root node of the loop tree. > > > + > > + gcc_assert (single_pred_p (loop->latch)); > > This assert is redundant. > > > + basic_block outer_exit = single_pred (loop->latch); > > + gcc_assert (single_pred_p (loop->inner->latch)); > > likewise (single_pred performs them). > > > + basic_block inner_exit = single_pred (loop->inner->latch); > > + > > + if (!single_pred_p (outer_exit) || single_pred (outer_exit) != > > inner_exit) > > + return false; > > + > > + if (dump_file) > > + fprintf (dump_file, "Found versionable outer loop\n"); > > "Found vectorizable outer loop for versioning" > > > + > > + return true; > > +} > > + > > /* Performs splitting of critical edges. Skip splitting and return false > > if LOOP will not be converted because: > > > > @@ -2767,8 +2803,16 @@ tree_if_conversion (struct loop *loop) > > || loop->dont_vectorize)) > > goto cleanup; > > > > - if ((any_pred_load_store || any_complicated_phi) > > - && !version_loop_for_if_conversion (loop)) > > + /* Since we have no cost model, always version loops if vectorization > > + is enabled. Either version this loop, or if the pattern is right > > + for outer-loop vectorization, version the outer loop. In the > > + latter case we will still if-convert the original inner loop. */ > > + /* FIXME: When SLP vectorization can handle if-conversion on its own, > > + predicate all of if-conversion on flag_tree_loop_vectorize. */ > > + if ((any_pred_load_store || any_complicated_phi || > > flag_tree_loop_vectorize) > > I'd say given fun->has_force_vectorize_loops this should be > > if (flag_tree_loop_if_convert != 1 > > + && !version_loop_for_if_conversion > > + (versionable_outer_loop_p (loop_outer (loop)) > > + ? loop_outer (loop) : loop)) > > goto cleanup; > > and thus always version if the user didnt' specify -ftree-loop-if-convert > (-ftree-loop-if-convert-stores is dead, I forgot to remove uses). > > Can you as a first patch (after fixing the minor things above) commit > the patch w/o changing the condition under which we version > (but _do_ version the outer loop if possible?). This should be a strict > improvement enabling more outer loop vectorization. > > The 2nd patch can then fix the PR and change the condition. > > Thus, ok with the nits fixed and the condition unchanged. > > Can you re-test the 2nd part with my suggested changed predicate? > > Thanks, > Richard. > > > /* Now all statements are if-convertible. Combine all the basic > > >