This example which I wrote to test ifconversion, currently fails to if-convert or vectorize:

int foo ()
{
  for (int i = 0; i < 32 ; i++)
    {
      int m = (a[i] & i) ? 5 : 4;
      b[i] = a[i] * m;
    }
}

...because jump-threading in dom1 rearranged the loop into a form that neither if-conversion nor vectorization would attempt. Discussion at https://gcc.gnu.org/ml/gcc/2015-04/msg00343.html lead to the suggestion that I should rerun loop-header copying (an earlier attempt to fix ifconversion, https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01743.html, still did not enable vectorization.)

This patch does so (and makes slightly less conservative, to tackle the example above). I found I had to make this a separate pass, so that the phi nodes were cleaned up at the end of the pass before running tree_if_conversion. Also at this stage in the compiler (inside loop opts) it was not possible to run loop_optimizer_init+finalize, or other loop_optimizer data structures needed later would be deleted; hence, I have two nearly-but-not-quite-identical passes, the new "ch_vect" avoiding the init/finalize. I tried to tackle this with some C++ subclassing, which removes the duplication, but the result feels a little ugly; suggestions for any neater approach welcome.

This patch causes failure of the scan-tree-dump of dom2 in gcc.dg/ssa/pr21417.c. This looks for jump-threading to perform an optimization, but no longer finds the expected line in the log - as the loop-header-copying phase has already done an equivalent transformation *before* dom2. The final CFG is thus in the desired form, but I'm not sure how to determine this (scanning the CFG itself is very difficult, well beyond what we can do with regex, requiring looking at multiple lines and basic blocks). Can anyone advise? [The test issue can be worked around by preserving the old do_while_p logic for the first header-copying pass, and using the new logic only for the second, but this is more awkward inside the compiler, which feels wrong.]

Besides the new vect-ifcvt-11.c, the testsuite actually has a couple of other examples where this patch enables (undesired!) vectorization. I've dealt with these, but for the record: * gcc.dg/vect/slp-perm-7.c: the initialization loop in main, contained a check that input[i] < 200; this was already optimized out (because input[i] was set to i%256, where i<N with N #defined to 16), but that loop was not vectorized because: /work/alalaw01/oban/srcfsf/gcc/gcc/testsuite/gcc.dg/vect/slp-perm-7.c:54:3: note: not vectorized: latch block not empty. /work/alalaw01/oban/srcfsf/gcc/gcc/testsuite/gcc.dg/vect/slp-perm-7.c:54:3: note: bad loop form.

* gcc.dg/vect/vect-strided-a-u16-i4.c: the main1() function has three loops; the first (initialization) has an 'if (y) abort() /* Avoid vectorization. */'. However, the 'volatile int y = 0' this was meant to reference, is actually shadowed by a local non-volatile; the test is thus peeled off and absent from the body of the loop. The loop only avoided vectorization because of non-empty latch and bad loop form, as previous.

With this patch, both those loops now have good form, hence I have fixed both to check a global volatile to prevent these extraneous parts from being vectorized.

Tested with bootstrap + check-gcc on x86_64 and AArch64 (linux). As noted above, this causes a spurious PASS->FAIL of a scan-tree-dump test, which I'm unsure how to fix, but no other regressions.

gcc/ChangeLog:

        * tree-pass.h (make_pass_ch_vect): New.
        * passes.def: Add pass_ch_vect just before pass_if_conversion.

        * tree-ssa-loop-ch.c (do_while_loop_p): For single-exit loops,
        look for blocks with exit edge and code after them.
        (pass_data_ch_vect, class pass_ch_vect, make_pass_ch_vect): New.
        (class pass_ch): Extend pass_ch_vect.
        (pass_ch::execute): Move all but loop_optimizer_init/finalize to...
        (pass_ch_vect::execute): ...here.

        * tree-ssa-loop.c (pass_tree_loop_init::execute): Add flags
        LOOPS_HAVE_PREHEADERS and LOOPS_HAVE_SIMPLE_LATCHES.

gcc/testsuite/ChangeLog:

        * gcc.dg/vect/slp-perm-7.c (zero): New.
        (main): Test zero rather than input[i], to avoid vectorization.
        * gcc.dg/vect/vect-strided-a-u16-i4.c (main1): Narrow scope of x,y,z,w.
        of unsigned
        * gcc.dg/vect/vect-ifcvt-11.c: New.

diff --git a/gcc/passes.def b/gcc/passes.def
index 1d598b2..87cfe2a 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -247,6 +247,7 @@ along with GCC; see the file COPYING3.  If not see
 	  PUSH_INSERT_PASSES_WITHIN (pass_parallelize_loops)
 	      NEXT_PASS (pass_expand_omp_ssa);
 	  POP_INSERT_PASSES ()
+	  NEXT_PASS (pass_ch_vect);
 	  NEXT_PASS (pass_if_conversion);
 	  /* pass_vectorize must immediately follow pass_if_conversion.
 	     Please do not add any other passes in between.  */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-7.c b/gcc/testsuite/gcc.dg/vect/slp-perm-7.c
index 6291096..e0ec55a 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-7.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-7.c
@@ -20,6 +20,8 @@
 
 #define N 16
 
+volatile int zero = 0;
+
 /* SLP with load permutation and loop-based vectorization.  */
 void foo (int *__restrict__ pInput, int *__restrict__ pOutput,
           int *__restrict__ pInput2, int *__restrict__ pOutput2)
@@ -57,7 +59,7 @@ int main (int argc, const char* argv[])
       input2[i] = i%256;
       output[i] = 0;
       output2[i] = 0;
-      if (input[i] > 200)
+      if (zero) /* Avoid vectorization.  */
         abort ();
     }
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-11.c b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-11.c
new file mode 100644
index 0000000..53b139d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-11.c
@@ -0,0 +1,37 @@
+/* { dg-require-effective-target vect_condition } */
+/* { dg-require-effective-target vect_int } */
+
+#include "tree-vect.h"
+
+#define N 16
+
+extern void abort(void);
+
+int A[N] = {36, 39, 42, 45, 43, 32, 21, 12, 23, 34, 45, 56, 67, 78, 81, 11};
+int B[N] = {144,195,210,225,172,128,105,60, 92, 136,225,280,268,390,324,55};
+
+__attribute__((noinline))
+void foo ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      int m = (A[i] & i) ? 5 : 4;
+      A[i] = A[i] * m;
+    }
+}
+
+int main ()
+{
+
+  check_vect ();
+  foo ();
+  /* check results:  */
+  for (int i = 0; i < N; i++)
+    if (A[i] != B[i])
+      abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i4.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i4.c
index 68114a6..0656fb7 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i4.c
@@ -21,7 +21,6 @@ main1 ()
   s *ptr = arr;
   s res[N];
   int i;
-  unsigned short x, y, z, w;
 
   for (i = 0; i < N; i++)
     {
@@ -35,6 +34,7 @@ main1 ()
 
   for (i = 0; i < N; i++)
     {
+      unsigned short x, y, z, w;
       x = ptr->b - ptr->a;
       y = ptr->d - ptr->c;
       res[i].c = x + y;
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index bc8763d..cb0f986 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -379,6 +379,7 @@ extern gimple_opt_pass *make_pass_loop_prefetch (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_iv_optimize (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tree_loop_done (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ch (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_ch_vect (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_phi_only_cprop (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt);
diff --git a/gcc/tree-ssa-loop-ch.c b/gcc/tree-ssa-loop-ch.c
index c6441b8..0fb7155 100644
--- a/gcc/tree-ssa-loop-ch.c
+++ b/gcc/tree-ssa-loop-ch.c
@@ -140,6 +140,29 @@ do_while_loop_p (struct loop *loop)
       && gimple_code (stmt) == GIMPLE_COND)
     return false;
 
+  /* Leave anything with multiple exits, given it satisfies the above.  */
+  if (!single_exit (loop))
+    return true;
+
+  /* If any block in the loop has an exit edge, and code after it, it is
+     not a do-while loop.  */
+  basic_block *body = get_loop_body (loop);
+  for (unsigned i = 0; i < loop->num_nodes; i++)
+    {
+      bool block_has_exit = false, block_precedes_code = false;
+      edge_iterator ei;
+      edge e;
+      FOR_EACH_EDGE (e, ei, body[i]->succs)
+	if (loop_exit_edge_p (loop, e))
+	  block_has_exit = true;
+	else if (e->dest != loop->header
+		 && e->dest != loop->latch)
+	  block_precedes_code = true;
+      if (block_has_exit && block_precedes_code)
+        return false;
+    }
+  free (body);
+
   return true;
 }
 
@@ -162,21 +185,48 @@ const pass_data pass_data_ch =
   0, /* todo_flags_finish */
 };
 
-class pass_ch : public gimple_opt_pass
+const pass_data pass_data_ch_vect =
+{
+  GIMPLE_PASS, /* type */
+  "ch_vect", /* name */
+  OPTGROUP_LOOP, /* optinfo_flags */
+  TV_TREE_CH, /* tv_id */
+  ( PROP_cfg | PROP_ssa ), /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_ch_vect : public gimple_opt_pass
 {
 public:
-  pass_ch (gcc::context *ctxt)
-    : gimple_opt_pass (pass_data_ch, ctxt)
+  pass_ch_vect (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_ch_vect, ctxt)
   {}
 
   /* opt_pass methods: */
   virtual bool gate (function *) { return flag_tree_ch != 0; }
   virtual unsigned int execute (function *);
+protected:
+  pass_ch_vect (const pass_data data, gcc::context *ctxt)
+    : gimple_opt_pass (data, ctxt)
+  {}
+}; // class pass_ch_vect
 
+/* Adds a call to loop_optimizer_init before it executes,
+   and loop_optimizer_finalize after.  */
+class pass_ch : public pass_ch_vect
+{
+public:
+  pass_ch (gcc::context *ctxt) : pass_ch_vect (pass_data_ch, ctxt)
+  {}
+
+  virtual unsigned int execute (function *);
 }; // class pass_ch
 
 unsigned int
-pass_ch::execute (function *fun)
+pass_ch_vect::execute (function *fun)
 {
   struct loop *loop;
   basic_block header;
@@ -186,13 +236,8 @@ pass_ch::execute (function *fun)
   unsigned bbs_size;
   bool changed = false;
 
-  loop_optimizer_init (LOOPS_HAVE_PREHEADERS
-		       | LOOPS_HAVE_SIMPLE_LATCHES);
   if (number_of_loops (fun) <= 1)
-    {
-      loop_optimizer_finalize ();
       return 0;
-    }
 
   bbs = XNEWVEC (basic_block, n_basic_blocks_for_fn (fun));
   copied_bbs = XNEWVEC (basic_block, n_basic_blocks_for_fn (fun));
@@ -296,17 +341,35 @@ pass_ch::execute (function *fun)
       changed = true;
     }
 
-  update_ssa (TODO_update_ssa);
+  if (changed)
+    update_ssa (TODO_update_ssa);
   free (bbs);
   free (copied_bbs);
 
-  loop_optimizer_finalize ();
   return changed ? TODO_cleanup_cfg : 0;
 }
 
+unsigned int
+pass_ch::execute (function *fun)
+{
+  loop_optimizer_init (LOOPS_HAVE_PREHEADERS
+		       | LOOPS_HAVE_SIMPLE_LATCHES);
+
+  unsigned int res = pass_ch_vect::execute (fun);
+
+  loop_optimizer_finalize ();
+  return res;
+}
+
 } // anon namespace
 
 gimple_opt_pass *
+make_pass_ch_vect (gcc::context *ctxt)
+{
+  return new pass_ch_vect (ctxt);
+}
+
+gimple_opt_pass *
 make_pass_ch (gcc::context *ctxt)
 {
   return new pass_ch (ctxt);
diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
index ccb8f97..a5029bb 100644
--- a/gcc/tree-ssa-loop.c
+++ b/gcc/tree-ssa-loop.c
@@ -234,7 +234,9 @@ unsigned int
 pass_tree_loop_init::execute (function *fun ATTRIBUTE_UNUSED)
 {
   loop_optimizer_init (LOOPS_NORMAL
-		       | LOOPS_HAVE_RECORDED_EXITS);
+		       | LOOPS_HAVE_RECORDED_EXITS
+		       | LOOPS_HAVE_PREHEADERS
+		       | LOOPS_HAVE_SIMPLE_LATCHES);
   rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa);
 
   /* We might discover new loops, e.g. when turning irreducible

Reply via email to