Hi Jakub,

Thanks for the review! I believe I've addressed all your comments in
the attached version of the patch.

On Mon, 3 Jun 2019 18:23:00 +0200
Jakub Jelinek <ja...@redhat.com> wrote:

> Why vec<tree> * rather than vec<tree>?

> > @@ -878,6 +884,7 @@ new_omp_context (gimple *stmt, omp_context
> > *outer_ctx) }
> >  
> >    ctx->cb.decl_map = new hash_map<tree, tree>;
> > +  ctx->oacc_addressable_var_decls = new vec<tree> ();  
> 
> You then don't have to new it here and delete below.  As the context
> is cleared with XCNEW, you don't need to do anything here, and just
> release when deleting.  Note, even if using a pointer for some reason
> was needed (not in this case), using unconditional new for something
> only used for small subset of contexts is unacceptable, it would be
> then desirable to only create when needed.

Fixed.

> > +/* Record vars listed in private clauses in CLAUSES in CTX.  This
> > information
> > +   is used to mark up variables that should be made private
> > per-gang.  */ +
> > +static void
> > +oacc_record_private_var_clauses (omp_context *ctx, tree clauses)
> > +{
> > +  tree c;
> > +
> > +  if (!ctx)
> > +    return;
> > +
> > +  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
> > +    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE)
> > +      {
> > +   tree decl = OMP_CLAUSE_DECL (c);
> > +   if (VAR_P (decl) && TREE_ADDRESSABLE (decl))
> > +     ctx->oacc_addressable_var_decls->safe_push (decl);
> > +      }
> > +}  
> 
> You don't want to do this for all GOMP_FOR or GOMP_TARGET context,
> I'd hope you only want to do that for OpenACC contexts.  Perhaps it
> is ok to bail out early if the context isn't OpenACC one.  On the
> other side, the if (!ctx) condition makes no sense, the callers of
> course guarantee that ctx is non-NULL.

I'm not sure where that came from -- ctx can be NULL at the top-level
of lower_omp as called from execute_lower_omp. Maybe that was left over
from an earlier version of the patch. Anyway, I've removed that bit
and fixed the patch to only call oacc_record_private_var_clauses in
OpenACC contexts.

> > @@ -10665,6 +10774,7 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p,
> > omp_context *ctx) ctx);
> >        break;
> >      case GIMPLE_BIND:
> > +      oacc_record_vars_in_bind (ctx, gimple_bind_vars (as_a <gbind
> > *> (stmt)));  
> 
> Again, why is this done unconditionally?  It should be relevant to
> gather it only in some subset of context, so guard that and don't do
> it otherwise.

And here (where ctx *can* be NULL).

> >        lower_omp (gimple_bind_body_ptr (as_a <gbind *> (stmt)),
> > ctx); maybe_remove_omp_member_access_dummy_vars (as_a <gbind *>
> > (stmt)); break;
> > @@ -10905,6 +11015,7 @@ execute_lower_omp (void)
> >  
> >    if (all_contexts)
> >      {
> > +      splay_tree_foreach (all_contexts,
> > process_oacc_gangprivate_1, NULL);  
> 
> Similarly.  Either guard with if (flag_openacc), or have some flag
> cleared at the start of the pass and set only if you find something
> interesting so that the splay_tree_foreach does something.

I've introduced maybe_oacc_gangprivate_vars, and the splay tree walk is
only called if that's true. It's set whenever something's put in
oacc_addressable_var_decls in some omp context.

Re-tested with offloading to NVPTX. OK?

Thanks,

Julian

commit 6c2a018b940d0b132395048b0600f7d897319ee2
Author: Julian Brown <jul...@codesourcery.com>
Date:   Thu Aug 9 20:27:04 2018 -0700

    [OpenACC] Add support for gang local storage allocation in shared memory
    
    2019-06-03  Julian Brown  <jul...@codesourcery.com>
                Chung-Lin Tang  <clt...@codesourcery.com>
    
            gcc/
            * config/nvptx/nvptx.c (tree-hash-traits.h): Include.
            (gangprivate_shared_size): New global variable.
            (gangprivate_shared_align): Likewise.
            (gangprivate_shared_sym): Likewise.
            (gangprivate_shared_hmap): Likewise.
            (nvptx_option_override): Initialize gangprivate_shared_sym,
            gangprivate_shared_align.
            (nvptx_file_end): Output gangprivate_shared_sym.
            (nvptx_goacc_expand_accel_var): New function.
            (nvptx_set_current_function): Initialise gangprivate_shared_hmap. Add
            function comment.
            (TARGET_GOACC_EXPAND_ACCEL): Likewise.
            * doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
            * doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
            * expr.c (expand_expr_real_1): Remap VAR_DECLs marked with the
            "oacc gangprivate" attribute.
            * omp-low.c (omp_context): Add oacc_partitioning_level and
            oacc_addressable_var_decls fields.
            (maybe_oacc_gangprivate_vars): New global variable.
            (delete_omp_context): Release oacc_addressable_var_decls in old
            omp_context.
            (lower_oacc_head_tail): Record partitioning-level count in omp context.
            (oacc_record_private_var_clauses, oacc_record_vars_in_bind,
            mark_oacc_gangprivate): New functions.
            (lower_omp_for): Call oacc_record_private_var_clauses with OpenACC "for"
            clauses.
            (lower_omp_target): Likewise, for OpenACC "target" clauses.
            Call mark_oacc_gangprivate for offloaded target regions.
            (process_oacc_gangprivate): New function.
            (lower_omp_1): Call oacc_record_vars_in_bind for GIMPLE_BIND within
            OpenACC regions.
            (execute_lower_omp): Call process_oacc_gangprivate for each OMP
            context.
            * target.def (expand_accel_var): New hook.
    
            libgomp/
            * testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test.
            * testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New test.
            * testsuite/libgomp.oacc-c/pr85465.c: New test.
            * testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: New test.
            * testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90: New test.

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index a28099ac89d..c93fb926609 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -74,6 +74,7 @@
 #include "cfgloop.h"
 #include "fold-const.h"
 #include "intl.h"
+#include "tree-hash-traits.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -166,6 +167,12 @@ static unsigned vector_red_align;
 static unsigned vector_red_partition;
 static GTY(()) rtx vector_red_sym;
 
+/* Shared memory block for gang-private variables.  */
+static unsigned gangprivate_shared_size;
+static unsigned gangprivate_shared_align;
+static GTY(()) rtx gangprivate_shared_sym;
+static hash_map<tree_decl_hash, unsigned int> gangprivate_shared_hmap;
+
 /* Global lock variable, needed for 128bit worker & gang reductions.  */
 static GTY(()) tree global_lock_var;
 
@@ -247,6 +254,10 @@ nvptx_option_override (void)
   vector_red_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
   vector_red_partition = 0;
 
+  gangprivate_shared_sym = gen_rtx_SYMBOL_REF (Pmode, "__gangprivate_shared");
+  SET_SYMBOL_DATA_AREA (gangprivate_shared_sym, DATA_AREA_SHARED);
+  gangprivate_shared_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
+
   diagnose_openacc_conflict (TARGET_GOMP, "-mgomp");
   diagnose_openacc_conflict (TARGET_SOFT_STACK, "-msoft-stack");
   diagnose_openacc_conflict (TARGET_UNIFORM_SIMT, "-muniform-simt");
@@ -5237,6 +5248,10 @@ nvptx_file_end (void)
     write_shared_buffer (asm_out_file, vector_red_sym,
 			 vector_red_align, vector_red_size);
 
+  if (gangprivate_shared_size)
+    write_shared_buffer (asm_out_file, gangprivate_shared_sym,
+			 gangprivate_shared_align, gangprivate_shared_size);
+
   if (need_softstack_decl)
     {
       write_var_marker (asm_out_file, false, true, "__nvptx_stacks");
@@ -6430,14 +6445,49 @@ nvptx_can_change_mode_class (machine_mode, machine_mode, reg_class_t)
   return false;
 }
 
+/* Implement TARGET_GOACC_EXPAND_ACCEL_VAR.  Place "oacc gangprivate"
+   variables in shared memory.  */
+
+static rtx
+nvptx_goacc_expand_accel_var (tree var)
+{
+  if (VAR_P (var)
+      && lookup_attribute ("oacc gangprivate", DECL_ATTRIBUTES (var)))
+    {
+      unsigned int offset, *poffset;
+      poffset = gangprivate_shared_hmap.get (var);
+      if (poffset)
+	offset = *poffset;
+      else
+	{
+	  unsigned HOST_WIDE_INT align = DECL_ALIGN (var);
+	  gangprivate_shared_size
+	    = (gangprivate_shared_size + align - 1) & ~(align - 1);
+	  if (gangprivate_shared_align < align)
+	    gangprivate_shared_align = align;
+
+	  offset = gangprivate_shared_size;
+	  bool existed = gangprivate_shared_hmap.put (var, offset);
+	  gcc_assert (!existed);
+	  gangprivate_shared_size += tree_to_uhwi (DECL_SIZE_UNIT (var));
+	}
+      rtx addr = plus_constant (Pmode, gangprivate_shared_sym, offset);
+      return gen_rtx_MEM (TYPE_MODE (TREE_TYPE (var)), addr);
+    }
+  return NULL_RTX;
+}
+
 static GTY(()) tree nvptx_previous_fndecl;
 
+/* Implement TARGET_SET_CURRENT_FUNCTION.  Reset per-function context.  */
+
 static void
 nvptx_set_current_function (tree fndecl)
 {
   if (!fndecl || fndecl == nvptx_previous_fndecl)
     return;
 
+  gangprivate_shared_hmap.empty ();
   nvptx_previous_fndecl = fndecl;
   vector_red_partition = 0;
   oacc_bcast_partition = 0;
@@ -6579,6 +6629,9 @@ nvptx_set_current_function (tree fndecl)
 #undef TARGET_HAVE_SPECULATION_SAFE_VALUE
 #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
 
+#undef TARGET_GOACC_EXPAND_ACCEL_VAR
+#define TARGET_GOACC_EXPAND_ACCEL_VAR nvptx_goacc_expand_accel_var
+
 #undef TARGET_SET_CURRENT_FUNCTION
 #define TARGET_SET_CURRENT_FUNCTION nvptx_set_current_function
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 622e8cf240f..61da9709268 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6161,6 +6161,14 @@ like @code{cond_add@var{m}}.  The default implementation returns a zero
 constant of type @var{type}.
 @end deftypefn
 
+@deftypefn {Target Hook} rtx TARGET_GOACC_EXPAND_ACCEL_VAR (tree @var{var})
+This hook, if defined, is used by accelerator target back-ends to expand
+specially handled kinds of VAR_DECL expressions.  A particular use is to
+place variables with specific attributes inside special accelarator
+memories.  A return value of NULL indicates that the target does not
+handle this VAR_DECL, and normal RTL expanding is resumed.
+@end deftypefn
+
 @node Anchored Addresses
 @section Anchored Addresses
 @cindex anchored addresses
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 17560fce6b7..5579623e331 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4210,6 +4210,8 @@ address;  but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_PREFERRED_ELSE_VALUE
 
+@hook TARGET_GOACC_EXPAND_ACCEL_VAR
+
 @node Anchored Addresses
 @section Anchored Addresses
 @cindex anchored addresses
diff --git a/gcc/expr.c b/gcc/expr.c
index c78bc74c0d9..34510aab55d 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9974,8 +9974,19 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
       exp = SSA_NAME_VAR (ssa_name);
       goto expand_decl_rtl;
 
-    case PARM_DECL:
     case VAR_DECL:
+      /* Allow accel compiler to handle specific cases of variables,
+	 specifically those tagged with the "oacc gangprivate" attribute,
+	 which may be intended to be placed in special memory in GPUs.  */
+      if (flag_openacc && targetm.goacc.expand_accel_var)
+	{
+	  temp = targetm.goacc.expand_accel_var (exp);
+	  if (temp)
+	    return temp;
+	}
+      /* ... fall through ...  */
+
+    case PARM_DECL:
       /* If a static var's type was incomplete when the decl was written,
 	 but the type is complete now, lay out the decl now.  */
       if (DECL_SIZE (exp) == 0
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index a7f35ffe416..67e1e82ec00 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -141,6 +141,12 @@ struct omp_context
   /* True if lower_omp_1 should look up lastprivate conditional in parent
      context.  */
   bool combined_into_simd_safelen0;
+
+  /* The number of levels of OpenACC partitioning invoked in this context.  */
+  unsigned oacc_partitioning_levels;
+
+  /* Addressable variable decls in this context.  */
+  vec<tree> oacc_addressable_var_decls;
 };
 
 static splay_tree all_contexts;
@@ -148,6 +154,7 @@ static int taskreg_nesting_level;
 static int target_nesting_level;
 static bitmap task_shared_vars;
 static vec<omp_context *> taskreg_contexts;
+static bool maybe_oacc_gangprivate_vars;
 
 static void scan_omp (gimple_seq *, omp_context *);
 static tree scan_omp_1_op (tree *, int *, void *);
@@ -964,6 +971,7 @@ delete_omp_context (splay_tree_value value)
     }
 
   delete ctx->lastprivate_conditional_map;
+  ctx->oacc_addressable_var_decls.release ();
 
   XDELETE (ctx);
 }
@@ -6881,6 +6889,9 @@ lower_oacc_head_tail (location_t loc, tree clauses,
   tree join_kind = build_int_cst (unsigned_type_node, IFN_UNIQUE_OACC_JOIN);
 
   gcc_assert (count);
+
+  ctx->oacc_partitioning_levels = count;
+
   for (unsigned done = 1; count; count--, done++)
     {
       gimple_seq fork_seq = NULL;
@@ -8582,6 +8593,77 @@ lower_omp_for_lastprivate (struct omp_for_data *fd, gimple_seq *body_p,
     }
 }
 
+/* Record vars listed in private clauses in CLAUSES in CTX.  This information
+   is used to mark up variables that should be made private per-gang.  */
+
+static void
+oacc_record_private_var_clauses (omp_context *ctx, tree clauses)
+{
+  for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE)
+      {
+	tree decl = OMP_CLAUSE_DECL (c);
+	if (VAR_P (decl) && TREE_ADDRESSABLE (decl))
+	  {
+	    ctx->oacc_addressable_var_decls.safe_push (decl);
+	    maybe_oacc_gangprivate_vars = true;
+	  }
+      }
+}
+
+/* Record addressable vars declared in BINDVARS in CTX.  This information is
+   used to mark up variables that should be made private per-gang.  */
+
+static void
+oacc_record_vars_in_bind (omp_context *ctx, tree bindvars)
+{
+  for (tree v = bindvars; v; v = DECL_CHAIN (v))
+    if (VAR_P (v) && TREE_ADDRESSABLE (v))
+      {
+	ctx->oacc_addressable_var_decls.safe_push (v);
+	maybe_oacc_gangprivate_vars = true;
+      }
+}
+
+/* Mark addressable variables which are declared implicitly or explicitly as
+   gang private with a special attribute.  These may need to have their
+   declarations altered later on in compilation (e.g. in
+   execute_oacc_device_lower or the backend, depending on how the OpenACC
+   execution model is implemented on a given target) to ensure that sharing
+   semantics are correct.  */
+
+static void
+mark_oacc_gangprivate (vec<tree> *decls, omp_context *ctx)
+{
+  int i;
+  tree decl;
+
+  FOR_EACH_VEC_ELT (*decls, i, decl)
+    {
+      for (omp_context *thisctx = ctx; thisctx; thisctx = thisctx->outer)
+	{
+	  tree inner_decl = maybe_lookup_decl (decl, thisctx);
+	  if (inner_decl)
+	    {
+	      decl = inner_decl;
+	      break;
+	    }
+	}
+      if (!lookup_attribute ("oacc gangprivate", DECL_ATTRIBUTES (decl)))
+	{
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    {
+	      fprintf (dump_file,
+		       "Setting 'oacc gangprivate' attribute for decl:");
+	      print_generic_decl (dump_file, decl, TDF_SLIM);
+	      fputc ('\n', dump_file);
+	    }
+	  DECL_ATTRIBUTES (decl)
+	    = tree_cons (get_identifier ("oacc gangprivate"),
+			 NULL, DECL_ATTRIBUTES (decl));
+	}
+    }
+}
 
 /* Lower code for an OMP loop directive.  */
 
@@ -8599,6 +8681,9 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   push_gimplify_context ();
 
+  if (is_gimple_omp_oacc (ctx->stmt))
+    oacc_record_private_var_clauses (ctx, gimple_omp_for_clauses (stmt));
+
   lower_omp (gimple_omp_for_pre_body_ptr (stmt), ctx);
 
   block = make_node (BLOCK);
@@ -9544,6 +9629,9 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   clauses = gimple_omp_target_clauses (stmt);
 
+  if (is_gimple_omp_oacc (ctx->stmt))
+    oacc_record_private_var_clauses (ctx, clauses);
+
   gimple_seq dep_ilist = NULL;
   gimple_seq dep_olist = NULL;
   if (omp_find_clause (clauses, OMP_CLAUSE_DEPEND))
@@ -9794,6 +9882,8 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   if (offloaded)
     {
+      mark_oacc_gangprivate (&ctx->oacc_addressable_var_decls, ctx);
+
       /* Declare all the variables created by mapping and the variables
 	 declared in the scope of the target body.  */
       record_vars_into (ctx->block_vars, child_fn);
@@ -10645,6 +10735,25 @@ lower_omp_grid_body (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 		       gimple_build_omp_return (false));
 }
 
+/* Find gang-private variables in a context.  */
+
+static int
+process_oacc_gangprivate (splay_tree_node node, void * ARG_UNUSED (data))
+{
+  omp_context *ctx = (omp_context *) node->value;
+  unsigned level_total = 0;
+  omp_context *thisctx;
+
+  for (thisctx = ctx; thisctx; thisctx = thisctx->outer)
+    level_total += thisctx->oacc_partitioning_levels;
+
+  /* If the current context and parent contexts are distributed over a
+     total of one parallelism level, we have gang partitioning.  */
+  if (level_total == 1)
+    mark_oacc_gangprivate (&ctx->oacc_addressable_var_decls, ctx);
+
+  return 0;
+}
 
 /* Callback for lower_omp_1.  Return non-NULL if *tp needs to be
    regimplified.  If DATA is non-NULL, lower_omp_1 is outside
@@ -10789,6 +10898,9 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 		 ctx);
       break;
     case GIMPLE_BIND:
+      if (ctx && is_gimple_omp_oacc (ctx->stmt))
+	oacc_record_vars_in_bind (ctx,
+				  gimple_bind_vars (as_a <gbind *> (stmt)));
       lower_omp (gimple_bind_body_ptr (as_a <gbind *> (stmt)), ctx);
       maybe_remove_omp_member_access_dummy_vars (as_a <gbind *> (stmt));
       break;
@@ -11024,6 +11136,7 @@ execute_lower_omp (void)
   FOR_EACH_VEC_ELT (taskreg_contexts, i, ctx)
     finish_taskreg_scan (ctx);
   taskreg_contexts.release ();
+  maybe_oacc_gangprivate_vars = false;
 
   if (all_contexts->root)
     {
@@ -11036,6 +11149,8 @@ execute_lower_omp (void)
 
   if (all_contexts)
     {
+      if (maybe_oacc_gangprivate_vars)
+	splay_tree_foreach (all_contexts, process_oacc_gangprivate, NULL);
       splay_tree_delete (all_contexts);
       all_contexts = NULL;
     }
diff --git a/gcc/target.def b/gcc/target.def
index 7d52102c815..5334c206afa 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1719,6 +1719,16 @@ for allocating any storage for reductions when necessary.",
 void, (gcall *call),
 default_goacc_reduction)
 
+DEFHOOK
+(expand_accel_var,
+"This hook, if defined, is used by accelerator target back-ends to expand\n\
+specially handled kinds of VAR_DECL expressions.  A particular use is to\n\
+place variables with specific attributes inside special accelarator\n\
+memories.  A return value of NULL indicates that the target does not\n\
+handle this VAR_DECL, and normal RTL expanding is resumed.",
+rtx, (tree var),
+NULL)
+
 HOOK_VECTOR_END (goacc)
 
 /* Functions relating to vectorization.  */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-private-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-private-1.c
new file mode 100644
index 00000000000..f378346ed0a
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-private-1.c
@@ -0,0 +1,38 @@
+#include <assert.h>
+
+int main (void)
+{
+  int ret;
+
+  #pragma acc parallel num_gangs(1) num_workers(32) copyout(ret)
+  {
+    int w = 0;
+
+    #pragma acc loop worker
+    for (int i = 0; i < 32; i++)
+      {
+        #pragma acc atomic update
+	w++;
+      }
+
+    ret = (w == 32);
+  }
+  assert (ret);
+
+  #pragma acc parallel num_gangs(1) vector_length(32) copyout(ret)
+  {
+    int v = 0;
+
+    #pragma acc loop vector
+    for (int i = 0; i < 32; i++)
+      {
+        #pragma acc atomic update
+	v++;
+      }
+
+    ret = (v == 32);
+  }
+  assert (ret);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
new file mode 100644
index 00000000000..a4f81a39e24
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
@@ -0,0 +1,95 @@
+#include <stdio.h>
+#include <openacc.h>
+#include <alloca.h>
+#include <string.h>
+#include <gomp-constants.h>
+#include <stdlib.h>
+
+#if 0
+#define DEBUG(DIM, IDX, VAL) \
+  fprintf (stderr, "%sdist[%d] = %d\n", (DIM), (IDX), (VAL))
+#else
+#define DEBUG(DIM, IDX, VAL)
+#endif
+
+#define N (32*32*32)
+
+int
+check (const char *dim, int *dist, int dimsize)
+{
+  int ix;
+  int exit = 0;
+
+  for (ix = 0; ix < dimsize; ix++)
+    {
+      DEBUG(dim, ix, dist[ix]);
+      if (dist[ix] < (N) / (dimsize + 0.5)
+	  || dist[ix] > (N) / (dimsize - 0.5))
+	{
+	  fprintf (stderr, "did not distribute to %ss (%d not between %d "
+		   "and %d)\n", dim, dist[ix], (int) ((N) / (dimsize + 0.5)),
+		   (int) ((N) / (dimsize - 0.5)));
+	  exit |= 1;
+	}
+    }
+
+  return exit;
+}
+
+int main ()
+{
+  int ary[N];
+  int ix;
+  int exit = 0;
+  int gangsize = 0, workersize = 0, vectorsize = 0;
+  int *gangdist, *workerdist, *vectordist;
+
+  for (ix = 0; ix < N;ix++)
+    ary[ix] = -1;
+
+#pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+	    copy(ary) copyout(gangsize, workersize, vectorsize)
+  {
+#pragma acc loop gang worker vector
+    for (unsigned ix = 0; ix < N; ix++)
+      {
+	int g, w, v;
+
+	g = __builtin_goacc_parlevel_id (GOMP_DIM_GANG);
+	w = __builtin_goacc_parlevel_id (GOMP_DIM_WORKER);
+	v = __builtin_goacc_parlevel_id (GOMP_DIM_VECTOR);
+
+	ary[ix] = (g << 16) | (w << 8) | v;
+      }
+
+    gangsize = __builtin_goacc_parlevel_size (GOMP_DIM_GANG);
+    workersize = __builtin_goacc_parlevel_size (GOMP_DIM_WORKER);
+    vectorsize = __builtin_goacc_parlevel_size (GOMP_DIM_VECTOR);
+  }
+
+  gangdist = (int *) alloca (gangsize * sizeof (int));
+  workerdist = (int *) alloca (workersize * sizeof (int));
+  vectordist = (int *) alloca (vectorsize * sizeof (int));
+  memset (gangdist, 0, gangsize * sizeof (int));
+  memset (workerdist, 0, workersize * sizeof (int));
+  memset (vectordist, 0, vectorsize * sizeof (int));
+
+  /* Test that work is shared approximately equally amongst each active
+     gang/worker/vector.  */
+  for (ix = 0; ix < N; ix++)
+    {
+      int g = (ary[ix] >> 16) & 255;
+      int w = (ary[ix] >> 8) & 255;
+      int v = ary[ix] & 255;
+
+      gangdist[g]++;
+      workerdist[w]++;
+      vectordist[v]++;
+    }
+
+  exit = check ("gang", gangdist, gangsize);
+  exit |= check ("worker", workerdist, workersize);
+  exit |= check ("vector", vectordist, vectorsize);
+
+  return exit;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/pr85465.c b/libgomp/testsuite/libgomp.oacc-c/pr85465.c
new file mode 100644
index 00000000000..329e8a09cf9
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/pr85465.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-w" } */
+
+int
+main (void)
+{
+#pragma acc parallel
+  foo ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90
new file mode 100644
index 00000000000..5f8a5e650ea
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90
@@ -0,0 +1,25 @@
+! Test for "oacc gangprivate" attribute on gang-private variables
+
+! { dg-do run }
+! { dg-additional-options "-fdump-tree-omplower-details" }
+! { dg-final { scan-tree-dump-times "Setting 'oacc gangprivate' attribute for decl:  integer\\(kind=4\\) w;" 1 "omplower" } } */
+
+program main
+  integer :: w, arr(0:31)
+
+  !$acc parallel num_gangs(32) num_workers(32) copyout(arr)
+    !$acc loop gang private(w)
+    do j = 0, 31
+      w = 0
+      !$acc loop seq
+      do i = 0, 31
+        !$acc atomic update
+        w = w + 1
+        !$acc end atomic
+      end do
+      arr(j) = w
+    end do
+  !$acc end parallel
+
+  if (any (arr .ne. 32)) stop 1
+end program main
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90 b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90
new file mode 100644
index 00000000000..d147229d91e
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90
@@ -0,0 +1,23 @@
+! Test for lack of "oacc gangprivate" attribute on worker-private variables
+
+! { dg-do run }
+! { dg-additional-options "-fdump-tree-omplower-details" }
+! { dg-final { scan-tree-dump-times "Setting 'oacc gangprivate' attribute for decl" 0 "omplower" } } */
+
+program main
+  integer :: w, arr(0:31)
+
+  !$acc parallel num_gangs(32) num_workers(32) copyout(arr)
+    !$acc loop gang worker private(w)
+    do j = 0, 31
+      w = 0
+      !$acc loop seq
+      do i = 0, 31
+        w = w + 1
+      end do
+      arr(j) = w
+    end do
+  !$acc end parallel
+
+  if (any (arr .ne. 32)) stop 1
+end program main

Reply via email to