On 11/11/15 12:02, Richard Biener wrote:
On Mon, 9 Nov 2015, Tom de Vries wrote:

On 09/11/15 16:35, Tom de Vries wrote:
Hi,

this patch series for stage1 trunk adds support to:
- parallelize oacc kernels regions using parloops, and
- map the loops onto the oacc gang dimension.

The patch series contains these patches:

       1    Insert new exit block only when needed in
          transform_to_exit_first_loop_alt
       2    Make create_parallel_loop return void
       3    Ignore reduction clause on kernels directive
       4    Implement -foffload-alias
       5    Add in_oacc_kernels_region in struct loop
       6    Add pass_oacc_kernels
       7    Add pass_dominator_oacc_kernels
       8    Add pass_ch_oacc_kernels
       9    Add pass_parallelize_loops_oacc_kernels
      10    Add pass_oacc_kernels pass group in passes.def
      11    Update testcases after adding kernels pass group
      12    Handle acc loop directive
      13    Add c-c++-common/goacc/kernels-*.c
      14    Add gfortran.dg/goacc/kernels-*.f95
      15    Add libgomp.oacc-c-c++-common/kernels-*.c
      16    Add libgomp.oacc-fortran/kernels-*.f95

The first 9 patches are more or less independent, but patches 10-16 are
intended to be committed at the same time.

Bootstrapped and reg-tested on x86_64.

Built and reg-tested with an nvidia accelerator, in combination with a
patch that enables accelerator testing (submitted at
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).

I'll post the individual patches in reply to this message.


This patch adds the pass_oacc_kernels pass group to the pass list in
passes.def.

Note the repetition of pass_lim/pass_copy_prop. The first pair is for an inner
loop in a loop nest, the second for an outer loop in a loop nest.

@@ -86,6 +86,27 @@ along with GCC; see the file COPYING3.  If not see
           /* pass_build_ealias is a dummy pass that ensures that we
              execute TODO_rebuild_alias at this point.  */
           NEXT_PASS (pass_build_ealias);
+         /* Pass group that runs when there are oacc kernels in the
+            function.  */
+         NEXT_PASS (pass_oacc_kernels);
+         PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
+             NEXT_PASS (pass_dominator_oacc_kernels);
+             NEXT_PASS (pass_ch_oacc_kernels);
+             NEXT_PASS (pass_dominator_oacc_kernels);
+             NEXT_PASS (pass_tree_loop_init);
+             NEXT_PASS (pass_lim);
+             NEXT_PASS (pass_copy_prop);
+             NEXT_PASS (pass_lim);
+             NEXT_PASS (pass_copy_prop);

iterate lim/copyprop twice?!  Why's that needed?


I've managed to eliminate the last pass_copy_prop, but not pass_lim. I've added a comment:
...
  /* We use pass_lim to rewrite in-memory iteration and reduction
     variable accesses in loops into local variable accesses.
     However, a single pass instantiation manages to do this only for
     one loop level, so we use pass_lim twice to at least be able to
     handle a loop nest with a depth of two.  */
  NEXT_PASS (pass_lim);
  NEXT_PASS (pass_copy_prop);
  NEXT_PASS (pass_lim);
...
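
To illustrate the comment above, here is a source-level sketch of what the
two pass_lim instances are meant to achieve on a loop nest of depth two.
The function names and the test data are hypothetical and not taken from
the patch or its testsuite. The real transformation happens on GIMPLE, and
the sketch assumes that sum does not alias a.

```cpp
#include <cassert>

// Original form: the reduction variable lives in memory (*sum) and is
// loaded and stored in every iteration of the inner loop.
void
sum_in_memory (const int *a, int n, int m, int *sum)
{
  for (int i = 0; i < n; i++)
    for (int j = 0; j < m; j++)
      *sum += a[i * m + j];
}

// Roughly the effect of the first pass_lim: the memory access is moved
// out of the inner loop only, so the outer loop still touches memory.
void
sum_after_first_lim (const int *a, int n, int m, int *sum)
{
  for (int i = 0; i < n; i++)
    {
      int tmp = *sum;
      for (int j = 0; j < m; j++)
        tmp += a[i * m + j];
      *sum = tmp;
    }
}

// Roughly the effect of the second pass_lim: the access is moved out of
// the outer loop as well, leaving a purely local reduction variable.
void
sum_after_second_lim (const int *a, int n, int m, int *sum)
{
  int tmp = *sum;
  for (int i = 0; i < n; i++)
    for (int j = 0; j < m; j++)
      tmp += a[i * m + j];
  *sum = tmp;
}

// Hypothetical usage: all three variants compute the same result.
void
check_sum_variants ()
{
  const int a[6] = { 1, 2, 3, 4, 5, 6 };
  int s0 = 10, s1 = 10, s2 = 10;
  sum_in_memory (a, 2, 3, &s0);
  sum_after_first_lim (a, 2, 3, &s1);
  sum_after_second_lim (a, 2, 3, &s2);
  assert (s0 == 31 && s1 == 31 && s2 == 31);
}
```

A nest of depth three would, by the same reasoning, still have an in-memory
access in its outermost loop after two pass_lim instances.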

+             NEXT_PASS (pass_scev_cprop);

What's that for?  It's supposed to help removing loops - I don't
expect kernels to vanish.

I'm using pass_scev_cprop for the "final value replacement" functionality. Added comment.
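
Final value replacement can be pictured as follows. This is a hypothetical
source-level sketch with made-up function names. pass_scev_cprop itself
works on GIMPLE and derives the closed form from scalar evolution analysis.

```cpp
#include <cassert>

// Before: the value of i after the loop is produced by the loop itself.
int
fill_and_count (int *a, int n)
{
  int i;
  for (i = 0; i < n; i++)
    a[i] = 0;
  return i;
}

// After: the use of i after the loop is replaced by an expression that
// does not depend on the loop, here max (n, 0).
int
fill_and_count_replaced (int *a, int n)
{
  for (int i = 0; i < n; i++)
    a[i] = 0;
  return n > 0 ? n : 0;
}

// Hypothetical usage: both forms agree, including for n <= 0.
void
check_final_value ()
{
  int a[4], b[4];
  for (int n = -1; n <= 4; n++)
    assert (fill_and_count (a, n) == fill_and_count_replaced (b, n));
}
```

In the second form no scalar computed inside the loop is used after it.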


+             NEXT_PASS (pass_tree_loop_done);
+             NEXT_PASS (pass_dominator_oacc_kernels);

Three times DOM?  No please.  I wonder why you don't run oacc_kernels
after FRE and drop the initial DOM(s).


Done. There's just one pass_dominator_oacc_kernels left now.

+             NEXT_PASS (pass_dce);
+             NEXT_PASS (pass_tree_loop_init);
+             NEXT_PASS (pass_parallelize_loops_oacc_kernels);
+             NEXT_PASS (pass_expand_omp_ssa);
+             NEXT_PASS (pass_tree_loop_done);

The switches into/out of tree_loop also look odd to me, but well
(they'll be controlled by -ftree-loop-optimize).


I've eliminated all uses of pass_tree_loop_init/pass_tree_loop_done in the
pass group. Instead, I've added conditional loop optimizer setup in:
- pass_lim and pass_scev_cprop (added in this patch), and
- pass_parallelize_loops_oacc_kernels (added in patch "Add
  pass_parallelize_loops_oacc_kernels").
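
The conditional setup follows a check-then-initialize pattern. The toy model
below only illustrates that pattern: the type, the flag names and the
functions are hypothetical stand-ins, not GCC's loop optimizer API, and the
sketch assumes that a state "satisfies" a request when all requested flags
are set.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-ins for the loop state flags (hypothetical names).
enum : std::uint32_t
{
  kNormal = 1u << 0,
  kRecordedExits = 1u << 1,
  kClosedSsa = 1u << 2
};

struct ToyLoopInfo
{
  std::uint32_t state = 0;
  int init_count = 0; // how many times the setup actually ran
};

// True if every requested flag is already set.
bool
state_satisfies (const ToyLoopInfo &li, std::uint32_t flags)
{
  return (li.state & flags) == flags;
}

// Set up the loop state only if an enclosing pass has not done so.
void
ensure_loop_state (ToyLoopInfo &li)
{
  const std::uint32_t wanted = kNormal | kRecordedExits | kClosedSsa;
  if (!state_satisfies (li, wanted))
    {
      li.state |= wanted;
      ++li.init_count;
    }
  assert (state_satisfies (li, wanted));
}
```

With this shape a pass behaves the same whether or not the state was
already set up before it ran: the second and later calls to
ensure_loop_state do nothing.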

Thanks,
- Tom

Add pass_oacc_kernels pass group in passes.def

2015-11-09  Tom de Vries  <t...@codesourcery.com>

	* omp-low.c (pass_expand_omp_ssa::clone): New function.
	* passes.def: Add pass_oacc_kernels pass group.
	* tree-ssa-loop-ch.c (pass_ch::clone): New function.
	* tree-ssa-loop-im.c (tree_ssa_lim): Allow to run outside
	pass_tree_loop.
	* tree-ssa-loop.c (pass_scev_cprop::clone): New function.
	(pass_scev_cprop::execute): Allow to run outside pass_tree_loop.

---
 gcc/omp-low.c          |  1 +
 gcc/passes.def         | 25 +++++++++++++++++++++++++
 gcc/tree-ssa-loop-ch.c |  2 ++
 gcc/tree-ssa-loop-im.c | 14 ++++++++++++++
 gcc/tree-ssa-loop.c    | 22 +++++++++++++++++++++-
 5 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 9eae09a..8078afb 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -13385,6 +13385,7 @@ public:
       return !(fun->curr_properties & PROP_gimple_eomp);
     }
   virtual unsigned int execute (function *) { return execute_expand_omp (); }
+  opt_pass * clone () { return new pass_expand_omp_ssa (m_ctxt); }
 
 }; // class pass_expand_omp_ssa
 
diff --git a/gcc/passes.def b/gcc/passes.def
index db822d3..d76cfd3 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -87,6 +87,31 @@ along with GCC; see the file COPYING3.  If not see
 	     execute TODO_rebuild_alias at this point.  */
 	  NEXT_PASS (pass_build_ealias);
 	  NEXT_PASS (pass_fre);
+	  /* Pass group that runs when the function is an offloaded function
+	     containing oacc kernels loops.  */
+	  NEXT_PASS (pass_oacc_kernels);
+	  PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
+	      /* We need pass_ch here, because pass_lim has no effect on
+	         exit-first loops (PR65442).  Ideally we want to remove both
+		 this pass instantiation, and the reverse transformation
+		 transform_to_exit_first_loop_alt, which is done in
+		 pass_parallelize_loops_oacc_kernels. */
+	      NEXT_PASS (pass_ch);
+	      /* We use pass_lim to rewrite in-memory iteration and reduction
+	         variable accesses in loops into local variable accesses.
+		 However, a single pass instantiation manages to do this only
+		 for one loop level, so we use pass_lim twice to at least be
+		 able to handle a loop nest with a depth of two.  */
+	      NEXT_PASS (pass_lim);
+	      NEXT_PASS (pass_copy_prop);
+	      NEXT_PASS (pass_lim);
+	      /* We use pass_scev_cprop here for final value replacement.  */
+	      NEXT_PASS (pass_scev_cprop);
+	      NEXT_PASS (pass_dominator_oacc_kernels);
+	      NEXT_PASS (pass_dce);
+	      NEXT_PASS (pass_parallelize_loops_oacc_kernels);
+	      NEXT_PASS (pass_expand_omp_ssa);
+	  POP_INSERT_PASSES ()
 	  NEXT_PASS (pass_merge_phi);
           NEXT_PASS (pass_dse);
 	  NEXT_PASS (pass_cd_dce);
diff --git a/gcc/tree-ssa-loop-ch.c b/gcc/tree-ssa-loop-ch.c
index 7e618bf..6493fcc 100644
--- a/gcc/tree-ssa-loop-ch.c
+++ b/gcc/tree-ssa-loop-ch.c
@@ -165,6 +165,8 @@ public:
   /* Initialize and finalize loop structures, copying headers inbetween.  */
   virtual unsigned int execute (function *);
 
+  opt_pass * clone () { return new pass_ch (m_ctxt); }
+
 protected:
   /* ch_base method: */
   virtual bool process_loop_p (struct loop *loop);
diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index 30b53ce..48810f3 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-propagate.h"
 #include "trans-mem.h"
 #include "gimple-fold.h"
+#include "tree-scalar-evolution.h"
 
 /* TODO:  Support for predicated code motion.  I.e.
 
@@ -2501,6 +2502,19 @@ tree_ssa_lim (void)
 {
   unsigned int todo;
 
+  if (!loops_state_satisfies_p (LOOPS_NORMAL
+				| LOOPS_HAVE_RECORDED_EXITS
+				| LOOP_CLOSED_SSA))
+    {
+      loop_optimizer_init (LOOPS_NORMAL
+			   | LOOPS_HAVE_RECORDED_EXITS);
+      rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa);
+
+      /* We might discover new loops, e.g. when turning irreducible
+	 regions into reducible.  */
+      scev_initialize ();
+    }
+
   tree_ssa_lim_initialize ();
 
   /* Gathers information about memory accesses in the loops.  */
diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
index b51cac2..570406f 100644
--- a/gcc/tree-ssa-loop.c
+++ b/gcc/tree-ssa-loop.c
@@ -373,10 +373,30 @@ public:
 
   /* opt_pass methods: */
   virtual bool gate (function *) { return flag_tree_scev_cprop; }
-  virtual unsigned int execute (function *) { return scev_const_prop (); }
+  virtual unsigned int execute (function *);
+  opt_pass * clone () { return new pass_scev_cprop (m_ctxt); }
 
 }; // class pass_scev_cprop
 
+unsigned int
+pass_scev_cprop::execute (function *)
+{
+  if (!loops_state_satisfies_p (LOOPS_NORMAL
+				| LOOPS_HAVE_RECORDED_EXITS
+				| LOOP_CLOSED_SSA))
+    {
+      loop_optimizer_init (LOOPS_NORMAL
+			   | LOOPS_HAVE_RECORDED_EXITS);
+      rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa);
+
+      /* We might discover new loops, e.g. when turning irreducible
+	 regions into reducible.  */
+      scev_initialize ();
+    }
+
+  return scev_const_prop ();
+}
+
 } // anon namespace
 
 gimple_opt_pass *
