Hi!

On Fri, 22 Jan 2016 14:31:35 +0100, Bernd Schmidt <bschm...@redhat.com> wrote:
> On 01/22/2016 02:25 PM, Jakub Jelinek wrote:
> 
> > What about #pragma oacc parallel?  That would never do that?
> 
> It shouldn't, no (IMO).

Correct.


Here is the patch re-worked for trunk.  Instead of passing
-foffload-force in the affected libgomp test cases, I instead chose to
have them expect the warning.  This way, we're testing more in line to
what users will be doing, and we'll notice how the OpenACC kernels
handling improves, when parloops gets able to parallelize more offloaded
code (and the "avoid offloading" handling will no longer trigger).  OK to
commit?

commit acd66946777671486a0f69706b25a3ec5f877306
Author: Thomas Schwinge <tho...@codesourcery.com>
Date:   Tue Feb 2 20:41:42 2016 +0100

    Un-parallelized OpenACC kernels constructs with nvptx offloading: "avoid 
offloading"
    
        gcc/
        * common.opt: Add -foffload-force.
        * lto-wrapper.c (merge_and_complain, append_compiler_options):
        Handle it.
        * doc/invoke.texi: Document it.
        * config/nvptx/mkoffload.c (struct id_map): Add "flags" member.
        (record_id): Parse, and set it.
        (process): Use it.
        * config/nvptx/nvptx.c (nvptx_attribute_table): Add "omp avoid
        offloading".
        (nvptx_record_offload_symbol): Use it.
        (nvptx_goacc_validate_dims): Set it.
        libgomp/
        * libgomp.h (gomp_offload_target_available_p): New function
        declaration.
        * target.c (gomp_offload_target_available_p): New function
        definition.
        (GOMP_offload_register_ver, GOMP_offload_unregister_ver)
        (gomp_init_device, gomp_unload_device): Handle and document "avoid
        offloading" flag ("host_table == NULL").
        (resolve_device): Document "avoid offloading".
        * oacc-init.c (resolve_device): Likewise.
        * libgomp.texi (Enabling OpenACC): Likewise.
        * testsuite/lib/libgomp.exp
        (check_effective_target_nvptx_offloading_configured): New proc
        definition.
        * testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c: New
        file.
        * testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c:
        Likewise.
        * testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c:
        Likewise.
        * testsuite/libgomp.oacc-fortran/avoid-offloading-1.f: Likewise.
        * testsuite/libgomp.oacc-fortran/avoid-offloading-2.f: Likewise.
        * testsuite/libgomp.oacc-fortran/avoid-offloading-3.f: Likewise.
        * testsuite/libgomp.oacc-c-c++-common/abort-3.c: Expect warning.
        * testsuite/libgomp.oacc-c-c++-common/abort-4.c: Likewise.
        * testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c:
        Likewise.
        * testsuite/libgomp.oacc-c-c++-common/default-1.c: Likewise.
        * testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c: Likewise.
        * testsuite/libgomp.oacc-c-c++-common/kernels-1.c: Likewise.
        * testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c:
        Likewise.
        * testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c:
        Likewise.
        * testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c:
        Likewise.
        * testsuite/libgomp.oacc-c-c++-common/kernels-empty.c: Likewise.
        * testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c:
        Likewise.
        * testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c:
        Likewise.
        * testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c:
        Likewise.
        * testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c:
        Likewise.
        * testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c:
        Likewise.
        * testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c:
        Likewise.
        * testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c:
        Likewise.
        * testsuite/libgomp.oacc-fortran/combined-directives-1.f90:
        Likewise.
        * testsuite/libgomp.oacc-fortran/non-scalar-data.f90: Likewise.
    
        libgomp/
        * testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c: Set
        "-ftree-parallelize-loops=32".
        * testsuite/libgomp.oacc-c-c++-common/default-1.c: Likewise.
        * testsuite/libgomp.oacc-c-c++-common/host_data-1.c: Likewise.
        * testsuite/libgomp.oacc-c-c++-common/kernels-1.c: Likewise.
        * testsuite/libgomp.oacc-c-c++-common/nested-2.c: Likewise.
---
 gcc/common.opt                                     |    4 +
 gcc/config/nvptx/mkoffload.c                       |   73 +++++++++++-
 gcc/config/nvptx/nvptx.c                           |   42 ++++++-
 gcc/doc/invoke.texi                                |   12 +-
 gcc/lto-wrapper.c                                  |    2 +
 libgomp/libgomp.h                                  |    1 +
 libgomp/libgomp.texi                               |    8 ++
 libgomp/oacc-init.c                                |   19 ++-
 libgomp/target.c                                   |  122 ++++++++++++++++----
 libgomp/testsuite/lib/libgomp.exp                  |   10 ++
 .../testsuite/libgomp.oacc-c-c++-common/abort-3.c  |    4 +-
 .../testsuite/libgomp.oacc-c-c++-common/abort-4.c  |    4 +-
 .../libgomp.oacc-c-c++-common/avoid-offloading-1.c |   28 +++++
 .../libgomp.oacc-c-c++-common/avoid-offloading-2.c |   38 ++++++
 .../libgomp.oacc-c-c++-common/avoid-offloading-3.c |   29 +++++
 .../combined-directives-1.c                        |    4 +-
 .../libgomp.oacc-c-c++-common/default-1.c          |    4 +-
 .../libgomp.oacc-c-c++-common/deviceptr-1.c        |    4 +-
 .../libgomp.oacc-c-c++-common/host_data-1.c        |    1 +
 .../libgomp.oacc-c-c++-common/kernels-1.c          |   10 +-
 .../kernels-alias-ipa-pta-2.c                      |    4 +-
 .../kernels-alias-ipa-pta-3.c                      |    4 +-
 .../kernels-alias-ipa-pta.c                        |    4 +-
 .../libgomp.oacc-c-c++-common/kernels-empty.c      |    2 +-
 .../kernels-loop-and-seq-2.c                       |    3 +-
 .../kernels-loop-and-seq-3.c                       |    4 +-
 .../kernels-loop-and-seq-4.c                       |    3 +-
 .../kernels-loop-and-seq-5.c                       |    3 +-
 .../kernels-loop-and-seq-6.c                       |    3 +-
 .../kernels-loop-and-seq.c                         |    4 +-
 .../kernels-loop-collapse.c                        |    3 +-
 .../testsuite/libgomp.oacc-c-c++-common/nested-2.c |    2 +-
 .../libgomp.oacc-fortran/avoid-offloading-1.f      |   32 +++++
 .../libgomp.oacc-fortran/avoid-offloading-2.f      |   41 +++++++
 .../libgomp.oacc-fortran/avoid-offloading-3.f      |   31 +++++
 .../libgomp.oacc-fortran/combined-directives-1.f90 |    5 +-
 .../libgomp.oacc-fortran/non-scalar-data.f90       |    5 +-
 37 files changed, 494 insertions(+), 78 deletions(-)

diff --git gcc/common.opt gcc/common.opt
index 520fa9c..2cf798d 100644
--- gcc/common.opt
+++ gcc/common.opt
@@ -1779,6 +1779,10 @@ Enum(offload_abi) String(ilp32) Value(OFFLOAD_ABI_ILP32)
 EnumValue
 Enum(offload_abi) String(lp64) Value(OFFLOAD_ABI_LP64)
 
+foffload-force
+Common Var(flag_offload_force)
+Force offloading if the compiler wanted to avoid it.
+
 fomit-frame-pointer
 Common Report Var(flag_omit_frame_pointer) Optimization
 When possible do not generate stack frames.
diff --git gcc/config/nvptx/mkoffload.c gcc/config/nvptx/mkoffload.c
index c8eed45..586ee8b 100644
--- gcc/config/nvptx/mkoffload.c
+++ gcc/config/nvptx/mkoffload.c
@@ -41,9 +41,19 @@ const char tool_name[] = "nvptx mkoffload";
 
 #define COMMENT_PREFIX "#"
 
+enum id_map_flag
+  {
+    /* All clear.  */
+    ID_MAP_FLAG_NONE = 0,
+    /* Avoid offloading.  For example, because there is no sufficient
+       parallelism.  */
+    ID_MAP_FLAG_AVOID_OFFLOADING = 1
+  };
+
 struct id_map
 {
   id_map *next;
+  int flags;
   char *ptx_name;
 };
 
@@ -107,6 +117,38 @@ record_id (const char *p1, id_map ***where)
     fatal_error (input_location, "malformed ptx file");
 
   id_map *v = XNEW (id_map);
+
+  /* Do we have any flags?  */
+  v->flags = ID_MAP_FLAG_NONE;
+  if (p1[0] == '(')
+    {
+      /* Current flag.  */
+      const char *cur = p1 + 1;
+
+      /* Seek to the beginning of ") ".  */
+      p1 = strchr (cur, ')');
+      if (!p1 || p1 > end || p1[1] != ' ')
+       fatal_error (input_location, "malformed ptx file: "
+                    "expected \") \" at \"%s\"", cur);
+
+      while (cur < p1)
+       {
+         const char *next = strchr (cur, ',');
+         if (!next || next > p1)
+           next = p1;
+
+         if (strncmp (cur, "avoid offloading", next - cur - 1) == 0)
+           v->flags |= ID_MAP_FLAG_AVOID_OFFLOADING;
+         else
+           fatal_error (input_location, "malformed ptx file: "
+                        "unknown flag at \"%s\"", cur);
+
+         cur = next;
+       }
+
+      /* Skip past ") ".  */
+      p1 += 2;
+    }
   size_t len = end - p1;
   v->ptx_name = XNEWVEC (char, len + 1);
   memcpy (v->ptx_name, p1, len);
@@ -296,12 +338,17 @@ process (FILE *in, FILE *out)
   fprintf (out, "\n};\n\n");
 
   /* Dump out function idents.  */
+  bool avoid_offloading_p = false;
   fprintf (out, "static const struct nvptx_fn {\n"
           "  const char *name;\n"
           "  unsigned short dim[%d];\n"
           "} func_mappings[] = {\n", GOMP_DIM_MAX);
   for (comma = "", id = func_ids; id; comma = ",", id = id->next)
-    fprintf (out, "%s\n\t{%s}", comma, id->ptx_name);
+    {
+      if (id->flags & ID_MAP_FLAG_AVOID_OFFLOADING)
+       avoid_offloading_p = true;
+      fprintf (out, "%s\n\t{%s}", comma, id->ptx_name);
+    }
   fprintf (out, "\n};\n\n");
 
   fprintf (out,
@@ -318,7 +365,11 @@ process (FILE *in, FILE *out)
           "  sizeof (var_mappings) / sizeof (var_mappings[0]),\n"
           "  func_mappings,"
           "  sizeof (func_mappings) / sizeof (func_mappings[0])\n"
-          "};\n\n");
+          "};\n");
+  if (avoid_offloading_p)
+    /* Need a unique handle for target_data.  */
+    fprintf (out, "static int target_data_avoid_offloading;\n");
+  fprintf (out, "\n");
 
   fprintf (out, "#ifdef __cplusplus\n"
           "extern \"C\" {\n"
@@ -338,18 +389,28 @@ process (FILE *in, FILE *out)
   fprintf (out, "static __attribute__((constructor)) void init (void)\n"
           "{\n"
           "  GOMP_offload_register_ver (%#x, __OFFLOAD_TABLE__,"
-          "%d/*NVIDIA_PTX*/, &target_data);\n"
-          "};\n",
+          "%d/*NVIDIA_PTX*/, &target_data);\n",
           GOMP_VERSION_PACK (GOMP_VERSION, GOMP_VERSION_NVIDIA_PTX),
           GOMP_DEVICE_NVIDIA_PTX);
+  if (avoid_offloading_p)
+    fprintf (out, "  GOMP_offload_register_ver (%#x, (void *) 0,"
+            "%d/*NVIDIA_PTX*/, &target_data_avoid_offloading);\n",
+            GOMP_VERSION_PACK (GOMP_VERSION, GOMP_VERSION_NVIDIA_PTX),
+            GOMP_DEVICE_NVIDIA_PTX);
+  fprintf (out, "};\n");
 
   fprintf (out, "static __attribute__((destructor)) void fini (void)\n"
           "{\n"
           "  GOMP_offload_unregister_ver (%#x, __OFFLOAD_TABLE__,"
-          "%d/*NVIDIA_PTX*/, &target_data);\n"
-          "};\n",
+          "%d/*NVIDIA_PTX*/, &target_data);\n",
           GOMP_VERSION_PACK (GOMP_VERSION, GOMP_VERSION_NVIDIA_PTX),
           GOMP_DEVICE_NVIDIA_PTX);
+  if (avoid_offloading_p)
+    fprintf (out, "  GOMP_offload_unregister_ver (%#x, (void *) 0,"
+            "%d/*NVIDIA_PTX*/, &target_data_avoid_offloading);\n",
+            GOMP_VERSION_PACK (GOMP_VERSION, GOMP_VERSION_NVIDIA_PTX),
+            GOMP_DEVICE_NVIDIA_PTX);
+  fprintf (out, "};\n");
 }
 
 static void
diff --git gcc/config/nvptx/nvptx.c gcc/config/nvptx/nvptx.c
index 78614f8..fe28154 100644
--- gcc/config/nvptx/nvptx.c
+++ gcc/config/nvptx/nvptx.c
@@ -3803,6 +3803,9 @@ static const struct attribute_spec 
nvptx_attribute_table[] =
   /* { name, min_len, max_len, decl_req, type_req, fn_type_req, handler,
        affects_type_identity } */
   { "kernel", 0, 0, true, false,  false, nvptx_handle_kernel_attribute, false 
},
+  /* Avoid offloading.  For example, because there is no sufficient
+     parallelism.  */
+  { "omp avoid offloading", 0, 0, true, false, false, NULL, false },
   { NULL, 0, 0, false, false, false, NULL, false }
 };
 
@@ -3867,7 +3870,10 @@ nvptx_record_offload_symbol (tree decl)
        tree dims = TREE_VALUE (attr);
        unsigned ix;
 
-       fprintf (asm_out_file, "//:FUNC_MAP \"%s\"",
+       fprintf (asm_out_file, "//:FUNC_MAP %s\"%s\"",
+                (lookup_attribute ("omp avoid offloading",
+                                   DECL_ATTRIBUTES (decl))
+                 ? "(avoid offloading) " : ""),
                 IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)));
 
        for (ix = 0; ix != GOMP_DIM_MAX; ix++, dims = TREE_CHAIN (dims))
@@ -4124,6 +4130,40 @@ nvptx_expand_builtin (tree exp, rtx target, rtx 
ARG_UNUSED (subtarget),
 static bool
 nvptx_goacc_validate_dims (tree decl, int dims[], int fn_level)
 {
+  /* Detect if a function is unsuitable for offloading.  */
+  if (!flag_offload_force && decl)
+    {
+      tree oacc_function_attr = get_oacc_fn_attrib (decl);
+      if (oacc_function_attr
+         && oacc_fn_attrib_kernels_p (oacc_function_attr))
+       {
+         bool avoid_offloading_p = true;
+         for (unsigned ix = 0; ix != GOMP_DIM_MAX; ix++)
+           {
+             if (dims[ix] > 1)
+               {
+                 avoid_offloading_p = false;
+                 break;
+               }
+           }
+         if (avoid_offloading_p)
+           {
+             /* OpenACC kernels constructs will never be parallelized for
+                optimization levels smaller than -O2; avoid the diagnostic in
+                this case.  */
+             if (optimize >= 2)
+               warning_at (DECL_SOURCE_LOCATION (decl), 0,
+                           "OpenACC kernels construct will be executed "
+                           "sequentially; will by default avoid offloading "
+                           "to prevent data copy penalty");
+             DECL_ATTRIBUTES (decl)
+               = tree_cons (get_identifier ("omp avoid offloading"),
+                            NULL_TREE, DECL_ATTRIBUTES (decl));
+
+           }
+       }
+    }
+
   bool changed = false;
 
   /* The vector size must be 32, unless this is a SEQ routine.  */
diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi
index fcc404e..c09fbc5 100644
--- gcc/doc/invoke.texi
+++ gcc/doc/invoke.texi
@@ -180,7 +180,8 @@ in the following sections.
 @gccoptlist{-ansi  -std=@var{standard}  -fgnu89-inline @gol
 -aux-info @var{filename} -fallow-parameterless-variadic-functions @gol
 -fno-asm  -fno-builtin  -fno-builtin-@var{function} @gol
--fhosted  -ffreestanding -fopenacc -fopenmp -fopenmp-simd @gol
+-fhosted  -ffreestanding @gol
+-foffload-force -fopenacc -fopenacc-dim=@var{geom} -fopenmp -fopenmp-simd @gol
 -fms-extensions -fplan9-extensions -fsso-struct=@var{endianness}
 -fallow-single-precision  -fcond-mismatch -flax-vector-conversions @gol
 -fsigned-bitfields  -fsigned-char @gol
@@ -1953,6 +1954,15 @@ This is equivalent to @option{-fno-hosted}.
 @xref{Standards,,Language Standards Supported by GCC}, for details of
 freestanding and hosted environments.
 
+@item -foffload-force
+@opindex -foffload-force
+The option @option{-foffload-force} forces offloading if the compiler
+wanted to avoid it.  For example, when there isn't sufficient
+parallelism in certain offloading constructs, the compiler may come to
+the conclusion that offloading incurs too much overhead (for data
+transfers, for example), and unless overridden with this flag, it then
+suggests to the runtime (libgomp) to avoid offloading.
+
 @item -fopenacc
 @opindex fopenacc
 @cindex OpenACC accelerator programming
diff --git gcc/lto-wrapper.c gcc/lto-wrapper.c
index ced6f2f..702ae47 100644
--- gcc/lto-wrapper.c
+++ gcc/lto-wrapper.c
@@ -275,6 +275,7 @@ merge_and_complain (struct cl_decoded_option 
**decoded_options,
        case OPT_fsigned_zeros:
        case OPT_ftrapping_math:
        case OPT_fwrapv:
+       case OPT_foffload_force:
        case OPT_fopenmp:
        case OPT_fopenacc:
        case OPT_fcilkplus:
@@ -517,6 +518,7 @@ append_compiler_options (obstack *argv_obstack, struct 
cl_decoded_option *opts,
        case OPT_fsigned_zeros:
        case OPT_ftrapping_math:
        case OPT_fwrapv:
+       case OPT_foffload_force:
        case OPT_fopenmp:
        case OPT_fopenacc:
        case OPT_fopenacc_dim_:
diff --git libgomp/libgomp.h libgomp/libgomp.h
index 7108a6d..8747b72 100644
--- libgomp/libgomp.h
+++ libgomp/libgomp.h
@@ -984,6 +984,7 @@ extern void gomp_unmap_vars (struct target_mem_desc *, 
bool);
 extern void gomp_init_device (struct gomp_device_descr *);
 extern void gomp_free_memmap (struct splay_tree_s *);
 extern void gomp_unload_device (struct gomp_device_descr *);
+extern bool gomp_offload_target_available_p (int);
 
 /* work.c */
 
diff --git libgomp/libgomp.texi libgomp/libgomp.texi
index 987ee5f..5795c00 100644
--- libgomp/libgomp.texi
+++ libgomp/libgomp.texi
@@ -1815,6 +1815,14 @@ flag @option{-fopenacc} must be specified.  This enables 
the OpenACC directive
 arranges for automatic linking of the OpenACC runtime library 
 (@ref{OpenACC Runtime Library Routines}).
 
+Offloading is enabled by default.  In some cases, the compiler may
+come to the conclusion that offloading incurs too much overhead, and
+suggest to the runtime to avoid it.  To counteract that, you can use
+the option @option{-foffload-force} to force offloading in such cases.
+Alternatively, offloading is also enabled if a specific device type is
+requested, in a call to @code{acc_init} or by setting the
+@env{ACC_DEVICE_TYPE} environment variable, for example.
+
 A complete description of all OpenACC directives accepted may be found in 
 the @uref{http://www.openacc.org/, OpenACC} Application Programming
 Interface manual, version 2.0.
diff --git libgomp/oacc-init.c libgomp/oacc-init.c
index 42d005d..2f053f3 100644
--- libgomp/oacc-init.c
+++ libgomp/oacc-init.c
@@ -122,7 +122,10 @@ resolve_device (acc_device_t d, bool fail_is_error)
       {
        if (goacc_device_type)
          {
-           /* Lookup the named device.  */
+           /* Lookup the device that has been explicitly named, so do not pay
+              attention to gomp_offload_target_available_p.  (That is,
+              enforced usage even with an "avoid offloading" flag set, and
+              hard error if not actually available.)  */
            while (++d != _ACC_device_hwm)
              if (dispatchers[d]
                  && !strcasecmp (goacc_device_type,
@@ -148,8 +151,15 @@ resolve_device (acc_device_t d, bool fail_is_error)
     case acc_device_not_host:
       /* Find the first available device after acc_device_not_host.  */
       while (++d != _ACC_device_hwm)
-       if (dispatchers[d] && dispatchers[d]->get_num_devices_func () > 0)
+       if (dispatchers[d]
+           && dispatchers[d]->get_num_devices_func () > 0
+           /* No device has been explicitly named, so pay attention to
+              gomp_offload_target_available_p, to not decide on an offload
+              target that we don't have offload data available for, or have an
+              "avoid offloading" flag set for.  */
+           && gomp_offload_target_available_p (dispatchers[d]->type))
          goto found;
+      /* No non-host device found.  */
       if (d_arg == acc_device_default)
        {
          d = acc_device_host;
@@ -168,7 +178,7 @@ resolve_device (acc_device_t d, bool fail_is_error)
       break;
 
     default:
-      if (d > _ACC_device_hwm)
+      if (d >= _ACC_device_hwm)
        {
          if (fail_is_error)
            goto unsupported_device;
@@ -181,7 +191,8 @@ resolve_device (acc_device_t d, bool fail_is_error)
 
   assert (d != acc_device_none
          && d != acc_device_default
-         && d != acc_device_not_host);
+         && d != acc_device_not_host
+         && d < _ACC_device_hwm);
 
   if (dispatchers[d] == NULL && fail_is_error)
     {
diff --git libgomp/target.c libgomp/target.c
index 96fe3d5..afcbedb 100644
--- libgomp/target.c
+++ libgomp/target.c
@@ -1165,12 +1165,19 @@ gomp_unload_image_from_device (struct gomp_device_descr 
*devicep,
 
 /* This function should be called from every offload image while loading.
    It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of
-   the target, and TARGET_DATA needed by target plugin.  */
+   the target, and TARGET_DATA needed by target plugin.
+
+   If HOST_TABLE is NULL, this image (TARGET_DATA) is stored as an "avoid
+   offloading" flag, and the TARGET_TYPE will not be considered by default
+   until this image gets unregistered.  */
 
 void
 GOMP_offload_register_ver (unsigned version, const void *host_table,
                           int target_type, const void *target_data)
 {
+  gomp_debug (0, "%s (%u, %p, %d, %p)\n", __FUNCTION__,
+             version, host_table, target_type, target_data);
+
   int i;
 
   if (GOMP_VERSION_LIB (version) > GOMP_VERSION)
@@ -1179,16 +1186,19 @@ GOMP_offload_register_ver (unsigned version, const void 
*host_table,
   
   gomp_mutex_lock (&register_lock);
 
-  /* Load image to all initialized devices.  */
-  for (i = 0; i < num_devices; i++)
+  if (host_table != NULL)
     {
-      struct gomp_device_descr *devicep = &devices[i];
-      gomp_mutex_lock (&devicep->lock);
-      if (devicep->type == target_type
-         && devicep->state == GOMP_DEVICE_INITIALIZED)
-       gomp_load_image_to_device (devicep, version,
-                                  host_table, target_data, true);
-      gomp_mutex_unlock (&devicep->lock);
+      /* Load image to all initialized devices.  */
+      for (i = 0; i < num_devices; i++)
+       {
+         struct gomp_device_descr *devicep = &devices[i];
+         gomp_mutex_lock (&devicep->lock);
+         if (devicep->type == target_type
+             && devicep->state == GOMP_DEVICE_INITIALIZED)
+           gomp_load_image_to_device (devicep, version,
+                                      host_table, target_data, true);
+         gomp_mutex_unlock (&devicep->lock);
+       }
     }
 
   /* Insert image to array of pending images.  */
@@ -1214,26 +1224,36 @@ GOMP_offload_register (const void *host_table, int 
target_type,
 
 /* This function should be called from every offload image while unloading.
    It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of
-   the target, and TARGET_DATA needed by target plugin.  */
+   the target, and TARGET_DATA needed by target plugin.
+
+   If HOST_TABLE is NULL, the "avoid offloading" flag gets cleared for this
+   image (TARGET_DATA), and this TARGET_TYPE may again be considered by
+   default.  */
 
 void
 GOMP_offload_unregister_ver (unsigned version, const void *host_table,
                             int target_type, const void *target_data)
 {
+  gomp_debug (0, "%s (%u, %p, %d, %p)\n", __FUNCTION__,
+             version, host_table, target_type, target_data);
+
   int i;
 
   gomp_mutex_lock (&register_lock);
 
-  /* Unload image from all initialized devices.  */
-  for (i = 0; i < num_devices; i++)
+  if (host_table != NULL)
     {
-      struct gomp_device_descr *devicep = &devices[i];
-      gomp_mutex_lock (&devicep->lock);
-      if (devicep->type == target_type
-         && devicep->state == GOMP_DEVICE_INITIALIZED)
-       gomp_unload_image_from_device (devicep, version,
-                                      host_table, target_data);
-      gomp_mutex_unlock (&devicep->lock);
+      /* Unload image from all initialized devices.  */
+      for (i = 0; i < num_devices; i++)
+       {
+         struct gomp_device_descr *devicep = &devices[i];
+         gomp_mutex_lock (&devicep->lock);
+         if (devicep->type == target_type
+             && devicep->state == GOMP_DEVICE_INITIALIZED)
+           gomp_unload_image_from_device (devicep, version,
+                                          host_table, target_data);
+         gomp_mutex_unlock (&devicep->lock);
+       }
     }
 
   /* Remove image from array of pending images.  */
@@ -1267,7 +1287,8 @@ gomp_init_device (struct gomp_device_descr *devicep)
   for (i = 0; i < num_offload_images; i++)
     {
       struct offload_image_descr *image = &offload_images[i];
-      if (image->type == devicep->type)
+      if (image->type == devicep->type
+         && image->host_table != NULL)
        gomp_load_image_to_device (devicep, image->version,
                                   image->host_table, image->target_data,
                                   false);
@@ -1287,7 +1308,8 @@ gomp_unload_device (struct gomp_device_descr *devicep)
       for (i = 0; i < num_offload_images; i++)
        {
          struct offload_image_descr *image = &offload_images[i];
-         if (image->type == devicep->type)
+         if (image->type == devicep->type
+             && image->host_table != NULL)
            gomp_unload_image_from_device (devicep, image->version,
                                           image->host_table,
                                           image->target_data);
@@ -1311,6 +1333,62 @@ gomp_free_memmap (struct splay_tree_s *mem_map)
     }
 }
 
+/* Do we have offload data available for the given offload target type?
+   Instead of verifying that *all* offload data is available that could
+   possibly be required, we instead just look for *any*.  If we later find any
+   offload data missing, that's user error.  If any offload data of this target
+   type is tagged with an "avoid offloading" flag, do not consider this target
+   type available unless it has been initialized already.  */
+
+attribute_hidden bool
+gomp_offload_target_available_p (int type)
+{
+  bool available = false;
+
+  /* Has the offload target type already been initialized?  */
+  for (int i = 0; !available && i < num_devices; i++)
+    {
+      struct gomp_device_descr *devicep = &devices[i];
+      gomp_mutex_lock (&devicep->lock);
+      if (devicep->type == type
+         && devicep->state == GOMP_DEVICE_INITIALIZED)
+       available = true;
+      gomp_mutex_unlock (&devicep->lock);
+    }
+
+  /* If the offload target type has been initialized already, we ignore "avoid
+     offloading" flags.  This is important, because data/state may be present
+     on the device, that we must continue to use.  */
+  if (!available)
+    {
+      gomp_mutex_lock (&register_lock);
+      if (num_offload_images == 0)
+       /* If there is no offload data available at all, there is no way to
+          later fail to find any of it for a specific offload target type.
+          This is the case where there are no offloaded code regions in user
+          code, but the target type can be initialized successfully, and
+          executable directqives be used, or runtime library calls be
+          made.  */
+       available = true;
+      else
+       {
+         /* Can the offload target be initialized?  */
+         for (int i = 0; !available && i < num_offload_images; i++)
+           if (offload_images[i].type == type
+               && offload_images[i].host_table != NULL)
+             available = true;
+         /* If yes, is an "avoid offloading" flag set?  */
+         for (int i = 0; available && i < num_offload_images; i++)
+           if (offload_images[i].type == type
+               && offload_images[i].host_table == NULL)
+             available = false;
+       }
+      gomp_mutex_unlock (&register_lock);
+    }
+
+  return available;
+}
+
 /* Host fallback for GOMP_target{,_ext} routines.  */
 
 static void
diff --git libgomp/testsuite/lib/libgomp.exp libgomp/testsuite/lib/libgomp.exp
index a4c9d83..8d2be80 100644
--- libgomp/testsuite/lib/libgomp.exp
+++ libgomp/testsuite/lib/libgomp.exp
@@ -344,6 +344,16 @@ proc check_effective_target_offload_device_nonshared_as { 
} {
     } ]
 }
 
+# Return 1 if the compiler has been configured for nvptx offloading.
+
+proc check_effective_target_nvptx_offloading_configured { } {
+    # PR libgomp/65099: Currently, we only support offloading in 64-bit
+    # configurations.
+    global offload_targets
+    return [expr [string match "*,nvptx,*" ",$offload_targets,"] \
+               && [is-effective-target lp64] ]
+}
+
 # Return 1 if at least one nvidia board is present.
 
 proc check_effective_target_openacc_nvidia_accel_present { } {
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/abort-3.c 
libgomp/testsuite/libgomp.oacc-c-c++-common/abort-3.c
index bca425e..23156d8 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/abort-3.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/abort-3.c
@@ -1,5 +1,3 @@
-/* { dg-do run } */
-
 #include <stdio.h>
 #include <stdlib.h>
 
@@ -7,7 +5,7 @@ int
 main (void)
 {
   fprintf (stderr, "CheCKpOInT\n");
-#pragma acc kernels
+#pragma acc kernels /* { dg-warning "OpenACC kernels construct will be 
executed sequentially; will by default avoid offloading to prevent data copy 
penalty" "" { target nvptx_offloading_configured } } */
   {
     abort ();
   }
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/abort-4.c 
libgomp/testsuite/libgomp.oacc-c-c++-common/abort-4.c
index c29ca3f..f4d6a07 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/abort-4.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/abort-4.c
@@ -1,12 +1,10 @@
-/* { dg-do run } */
-
 #include <stdlib.h>
 
 int
 main (int argc, char **argv)
 {
 
-#pragma acc kernels
+#pragma acc kernels /* { dg-warning "OpenACC kernels construct will be 
executed sequentially; will by default avoid offloading to prevent data copy 
penalty" "" { target nvptx_offloading_configured } } */
   {
     if (argc != 1)
       abort ();
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c 
libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c
new file mode 100644
index 0000000..08745fc
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c
@@ -0,0 +1,28 @@
+/* Test that the compiler decides to "avoid offloading".  */
+
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* The ACC_DEVICE_TYPE environment variable gets set in the testing
+   framework, and that overrides the "avoid offloading" flag at run time.
+   { dg-xfail-run-if "TODO" { openacc_nvidia_accel_selected } } */
+
+#include <openacc.h>
+
+int main(void)
+{
+  int x, y;
+
+#pragma acc data copyout(x, y)
+#pragma acc kernels /* { dg-warning "OpenACC kernels construct will be 
executed sequentially; will by default avoid offloading to prevent data copy 
penalty" "" { target nvptx_offloading_configured } } */
+  *((volatile int *) &x) = 33, y = acc_on_device (acc_device_host);
+
+  if (x != 33)
+    __builtin_abort();
+#if defined ACC_DEVICE_TYPE_host || defined ACC_DEVICE_TYPE_nvidia
+  if (y != 1)
+    __builtin_abort();
+#else
+# error Not ported to this ACC_DEVICE_TYPE
+#endif
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c 
libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c
new file mode 100644
index 0000000..724228a
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c
@@ -0,0 +1,38 @@
+/* Test that a user can override the compiler's "avoid offloading"
+   decision at run time.  */
+
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+
+#include <openacc.h>
+
+int main(void)
+{
+  /* Override the compiler's "avoid offloading" decision.  */
+  acc_device_t d;
+#if defined ACC_DEVICE_TYPE_nvidia
+  d = acc_device_nvidia;
+#elif defined ACC_DEVICE_TYPE_host
+  d = acc_device_host;
+#else
+# error Not ported to this ACC_DEVICE_TYPE
+#endif
+  acc_init (d);
+
+  int x, y;
+
+#pragma acc data copyout(x, y)
+#pragma acc kernels /* { dg-warning "OpenACC kernels construct will be 
executed sequentially; will by default avoid offloading to prevent data copy 
penalty" "" { target nvptx_offloading_configured } } */
+  *((volatile int *) &x) = 33, y = acc_on_device (acc_device_host);
+
+  if (x != 33)
+    __builtin_abort();
+#if defined ACC_DEVICE_TYPE_nvidia
+  if (y != 0)
+    __builtin_abort();
+#else
+  if (y != 1)
+    __builtin_abort();
+#endif
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c 
libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c
new file mode 100644
index 0000000..2fb5196
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c
@@ -0,0 +1,29 @@
+/* Test that a user can override the compiler's "avoid offloading"
+   decision at compile time.  */
+
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* Override the compiler's "avoid offloading" decision.
+   { dg-additional-options "-foffload-force" } */
+
+#include <openacc.h>
+
+int main(void)
+{
+  int x, y;
+
+#pragma acc data copyout(x, y)
+#pragma acc kernels
+  *((volatile int *) &x) = 33, y = acc_on_device (acc_device_host);
+
+  if (x != 33)
+    __builtin_abort();
+#if defined ACC_DEVICE_TYPE_nvidia
+  if (y != 0)
+    __builtin_abort();
+#else
+  if (y != 1)
+    __builtin_abort();
+#endif
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c 
libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c
index dad6d13..87ca378 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c
@@ -1,6 +1,6 @@
 /* This test exercises combined directives.  */
 
-/* { dg-do run } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 
 #include <stdlib.h>
 
@@ -33,7 +33,7 @@ main (int argc, char **argv)
        abort ();
     }
 
-#pragma acc kernels loop copy (a[0:N]) copy (b[0:N])
+#pragma acc kernels loop copy (a[0:N]) copy (b[0:N]) /* { dg-warning "OpenACC 
kernels construct will be executed sequentially; will by default avoid 
offloading to prevent data copy penalty" "" { target 
nvptx_offloading_configured } } */
   for (i = 0; i < N; i++)
     {
       b[i] = 3.0;
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c 
libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c
index 1ac0b95..8f0144c 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c
@@ -1,4 +1,4 @@
-/* { dg-do run } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 
 #include  <openacc.h>
 
@@ -51,7 +51,7 @@ int test_kernels ()
     ary[i] = ~0;
 
   /* val defaults to copy, ary defaults to copy.  */
-#pragma acc kernels copy(ondev)
+#pragma acc kernels copy(ondev) /* { dg-warning "OpenACC kernels construct 
will be executed sequentially; will by default avoid offloading to prevent data 
copy penalty" "" { target nvptx_offloading_configured } } */
   {
     ondev = acc_on_device (acc_device_not_host);
 #pragma acc loop 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c 
libgomp/testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c
index e271a37..9a5f7b1 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c
@@ -1,5 +1,3 @@
-/* { dg-do run } */
-
 #include <stdlib.h>
 
 int main (void)
@@ -10,7 +8,7 @@ int main (void)
   a = A;
 
 #pragma acc data copyout (a_1, a_2)
-#pragma acc kernels deviceptr (a)
+#pragma acc kernels deviceptr (a) /* { dg-warning "OpenACC kernels construct 
will be executed sequentially; will by default avoid offloading to prevent data 
copy penalty" "" { target nvptx_offloading_configured } } */
   {
     a_1 = a;
     a_2 = &a;
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c 
libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c
index 51745ba..3ef6f9b 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c
@@ -1,4 +1,5 @@
 /* { dg-do run { target openacc_nvidia_accel_selected } } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 /* { dg-additional-options "-lcuda -lcublas -lcudart" } */
 
 #include <stdlib.h>
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c 
libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c
index 3acfdf5..614ad33 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c
@@ -1,4 +1,4 @@
-/* { dg-do run } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 
 #include <stdlib.h>
 
@@ -73,7 +73,7 @@ int main (void)
   i = -1;
   j = -2;
   v = 0;
-#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_copyin (i, 
j)
+#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_copyin (i, 
j) /* { dg-warning "OpenACC kernels construct will be executed sequentially; 
will by default avoid offloading to prevent data copy penalty" "" { target 
nvptx_offloading_configured } } */
   {
     if (i != -1 || j != -2)
       abort ();
@@ -96,7 +96,7 @@ int main (void)
   i = -1;
   j = -2;
   v = 0;
-#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_copyout 
(i, j)
+#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_copyout 
(i, j) /* { dg-warning "OpenACC kernels construct will be executed 
sequentially; will by default avoid offloading to prevent data copy penalty" "" 
{ target nvptx_offloading_configured } } */
   {
     i = 2;
     j = 1;
@@ -110,7 +110,7 @@ int main (void)
   i = -1;
   j = -2;
   v = 0;
-#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_copy (i, j)
+#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_copy (i, 
j) /* { dg-warning "OpenACC kernels construct will be executed sequentially; 
will by default avoid offloading to prevent data copy penalty" "" { target 
nvptx_offloading_configured } } */
   {
     if (i != -1 || j != -2)
       abort ();
@@ -126,7 +126,7 @@ int main (void)
   i = -1;
   j = -2;
   v = 0;
-#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_create (i, 
j)
+#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_create (i, 
j) /* { dg-warning "OpenACC kernels construct will be executed sequentially; 
will by default avoid offloading to prevent data copy penalty" "" { target 
nvptx_offloading_configured } } */
   {
     i = 2;
     j = 1;
diff --git 
libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c 
libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c
index 0f323c8..8d5101d 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c
@@ -1,4 +1,4 @@
-/* { dg-additional-options "-O2 -fipa-pta" } */
+/* { dg-additional-options "-fipa-pta" } */
 
 #include <stdlib.h>
 
@@ -11,7 +11,7 @@ main (void)
   unsigned int *b = (unsigned int *)malloc (N * sizeof (unsigned int));
   unsigned int *c = (unsigned int *)malloc (N * sizeof (unsigned int));
 
-#pragma acc kernels pcopyout (a[0:N], b[0:N], c[0:N])
+#pragma acc kernels pcopyout (a[0:N], b[0:N], c[0:N]) /* { dg-warning "OpenACC 
kernels construct will be executed sequentially; will by default avoid 
offloading to prevent data copy penalty" "" { target 
nvptx_offloading_configured } } */
   {
     a[0] = 0;
     b[0] = 1;
diff --git 
libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c 
libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c
index 654e750..3726b0c 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c
@@ -1,4 +1,4 @@
-/* { dg-additional-options "-O2 -fipa-pta" } */
+/* { dg-additional-options "-fipa-pta" } */
 
 #include <stdlib.h>
 
@@ -11,7 +11,7 @@ main (void)
   unsigned int *b = a;
   unsigned int *c = (unsigned int *)malloc (N * sizeof (unsigned int));
 
-#pragma acc kernels pcopyout (a[0:N], b[0:N], c[0:N])
+#pragma acc kernels pcopyout (a[0:N], b[0:N], c[0:N]) /* { dg-warning "OpenACC 
kernels construct will be executed sequentially; will by default avoid 
offloading to prevent data copy penalty" "" { target 
nvptx_offloading_configured } } */
   {
     a[0] = 0;
     b[0] = 1;
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c 
libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c
index 44d4fd2..eea4f76 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c
@@ -1,4 +1,4 @@
-/* { dg-additional-options "-O2 -fipa-pta" } */
+/* { dg-additional-options "-fipa-pta" } */
 
 #include <stdlib.h>
 
@@ -11,7 +11,7 @@ main (void)
   unsigned int b[N];
   unsigned int c[N];
 
-#pragma acc kernels pcopyout (a, b, c)
+#pragma acc kernels pcopyout (a, b, c) /* { dg-warning "OpenACC kernels 
construct will be executed sequentially; will by default avoid offloading to 
prevent data copy penalty" "" { target nvptx_offloading_configured } } */
   {
     a[0] = 0;
     b[0] = 1;
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-empty.c 
libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-empty.c
index a68a7cd..860b6da 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-empty.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-empty.c
@@ -1,6 +1,6 @@
 int
 main (void)
 {
-#pragma acc kernels
+#pragma acc kernels /* { dg-warning "OpenACC kernels construct will be 
executed sequentially; will by default avoid offloading to prevent data copy 
penalty" "" { target nvptx_offloading_configured } } */
   ;
 }
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c 
libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c
index 2e4100f..5cdc200 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c
@@ -1,4 +1,3 @@
-/* { dg-do run } */
 /* { dg-additional-options "-ftree-parallelize-loops=32" } */
 
 #include <stdlib.h>
@@ -8,7 +7,7 @@
 unsigned int
 foo (int n, unsigned int *a)
 {
-#pragma acc kernels copy (a[0:N])
+#pragma acc kernels copy (a[0:N]) /* { dg-warning "OpenACC kernels construct 
will be executed sequentially; will by default avoid offloading to prevent data 
copy penalty" "" { target nvptx_offloading_configured } } */
   {
     a[0] = a[0] + 1;
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c 
libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c
index b3e736b..2e4d4d2 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c
@@ -1,4 +1,3 @@
-/* { dg-do run } */
 /* { dg-additional-options "-ftree-parallelize-loops=32" } */
 
 #include <stdlib.h>
@@ -8,8 +7,7 @@
 unsigned int
 foo (int n, unsigned int *a)
 {
-
-#pragma acc kernels copy (a[0:N])
+#pragma acc kernels copy (a[0:N]) /* { dg-warning "OpenACC kernels construct 
will be executed sequentially; will by default avoid offloading to prevent data 
copy penalty" "" { target nvptx_offloading_configured } } */
   {
     for (int i = 0; i < n; i++)
       a[i] = 1;
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c 
libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c
index 8b9affa..5bf00db 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c
@@ -1,4 +1,3 @@
-/* { dg-do run } */
 /* { dg-additional-options "-ftree-parallelize-loops=32" } */
 
 #include <stdlib.h>
@@ -8,7 +7,7 @@
 unsigned int
 foo (int n, unsigned int *a)
 {
-#pragma acc kernels copy (a[0:N])
+#pragma acc kernels copy (a[0:N]) /* { dg-warning "OpenACC kernels construct 
will be executed sequentially; will by default avoid offloading to prevent data 
copy penalty" "" { target nvptx_offloading_configured } } */
   {
     a[0] = 2;
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c 
libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c
index 83d4e7f..d39b667 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c
@@ -1,4 +1,3 @@
-/* { dg-do run } */
 /* { dg-additional-options "-ftree-parallelize-loops=32" } */
 
 #include <stdlib.h>
@@ -9,7 +8,7 @@ unsigned int
 foo (int n, unsigned int *a)
 {
   int r;
-#pragma acc kernels copyout(r) copy (a[0:N])
+#pragma acc kernels copyout(r) copy (a[0:N]) /* { dg-warning "OpenACC kernels 
construct will be executed sequentially; will by default avoid offloading to 
prevent data copy penalty" "" { target nvptx_offloading_configured } } */
   {
     r = a[0];
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c 
libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c
index 01d5e5e..bb2e85b 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c
@@ -1,4 +1,3 @@
-/* { dg-do run } */
 /* { dg-additional-options "-ftree-parallelize-loops=32" } */
 
 #include <stdlib.h>
@@ -8,7 +7,7 @@
 unsigned int
 foo (int n, unsigned int *a)
 {
-#pragma acc kernels copy (a[0:N])
+#pragma acc kernels copy (a[0:N]) /* { dg-warning "OpenACC kernels construct 
will be executed sequentially; will by default avoid offloading to prevent data 
copy penalty" "" { target nvptx_offloading_configured } } */
   {
     int r = a[0];
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c 
libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c
index 61d1283..e513827 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c
@@ -1,4 +1,3 @@
-/* { dg-do run } */
 /* { dg-additional-options "-ftree-parallelize-loops=32" } */
 
 #include <stdlib.h>
@@ -8,8 +7,7 @@
 unsigned int
 foo (int n, unsigned int *a)
 {
-
-#pragma acc kernels copy (a[0:N])
+#pragma acc kernels copy (a[0:N]) /* { dg-warning "OpenACC kernels construct 
will be executed sequentially; will by default avoid offloading to prevent data 
copy penalty" "" { target nvptx_offloading_configured } } */
   {
     for (int i = 0; i < n; i++)
       a[i] = 1;
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c 
libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c
index f7f04cb..c4791a4 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c
@@ -1,4 +1,3 @@
-/* { dg-do run } */
 /* { dg-additional-options "-ftree-parallelize-loops=32" } */
 
 #include <stdlib.h>
@@ -11,7 +10,7 @@ void __attribute__((noinline, noclone))
 foo (int m, int n)
 {
   int i, j;
-  #pragma acc kernels
+  #pragma acc kernels /* { dg-warning "OpenACC kernels construct will be 
executed sequentially; will by default avoid offloading to prevent data copy 
penalty" "" { target nvptx_offloading_configured } } */
   {
 #pragma acc loop collapse(2)
     for (i = 0; i < m; i++)
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c 
libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c
index c164598..94a5ae2 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c
@@ -1,4 +1,4 @@
-/* { dg-do run } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 
 #include <stdlib.h>
 
diff --git libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-1.f 
libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-1.f
new file mode 100644
index 0000000..5f18b94
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-1.f
@@ -0,0 +1,32 @@
+! Test that the compiler decides to "avoid offloading".
+
+! { dg-do run }
+! { dg-additional-options "-cpp" }
+! { dg-additional-options "-ftree-parallelize-loops=32" }
+! The "avoid offloading" warning is only triggered for -O2 and higher.
+! { dg-xfail-if "n/a" { nvptx_offloading_configured } { "-O0" "-O1" } { "" } }
+! The ACC_DEVICE_TYPE environment variable gets set in the testing
+! framework, and that overrides the "avoid offloading" flag at run time.
+! { dg-xfail-run-if "TODO" { openacc_nvidia_accel_selected } }
+
+      IMPLICIT NONE
+      INCLUDE "openacc_lib.h"
+
+      INTEGER, VOLATILE :: X
+      LOGICAL :: Y
+
+!$ACC DATA COPYOUT(X, Y)
+!$ACC KERNELS ! { dg-warning "OpenACC kernels construct will be executed 
sequentially; will by default avoid offloading to prevent data copy penalty" "" 
{ target nvptx_offloading_configured } }
+      X = 33
+      Y = ACC_ON_DEVICE (ACC_DEVICE_HOST);
+!$ACC END KERNELS
+!$ACC END DATA
+
+      IF (X .NE. 33) CALL ABORT
+#if defined ACC_DEVICE_TYPE_host || defined ACC_DEVICE_TYPE_nvidia
+      IF (.NOT. Y) CALL ABORT
+#else
+# error Not ported to this ACC_DEVICE_TYPE
+#endif
+
+      END
diff --git libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-2.f 
libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-2.f
new file mode 100644
index 0000000..51801ad
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-2.f
@@ -0,0 +1,41 @@
+! Test that a user can override the compiler's "avoid offloading"
+! decision at run time.
+
+! { dg-do run }
+! { dg-additional-options "-cpp" }
+! { dg-additional-options "-ftree-parallelize-loops=32" }
+! The "avoid offloading" warning is only triggered for -O2 and higher.
+! { dg-xfail-if "n/a" { nvptx_offloading_configured } { "-O0" "-O1" } { "" } }
+
+      IMPLICIT NONE
+      INCLUDE "openacc_lib.h"
+
+      INTEGER :: D
+      INTEGER, VOLATILE :: X
+      LOGICAL :: Y
+
+!     Override the compiler's "avoid offloading" decision.
+#if defined ACC_DEVICE_TYPE_nvidia
+      D = ACC_DEVICE_NVIDIA
+#elif defined ACC_DEVICE_TYPE_host
+      D = ACC_DEVICE_HOST
+#else
+# error Not ported to this ACC_DEVICE_TYPE
+#endif
+      CALL ACC_INIT (D)
+
+!$ACC DATA COPYOUT(X, Y)
+!$ACC KERNELS ! { dg-warning "OpenACC kernels construct will be executed 
sequentially; will by default avoid offloading to prevent data copy penalty" "" 
{ target nvptx_offloading_configured } }
+      X = 33
+      Y = ACC_ON_DEVICE (ACC_DEVICE_HOST)
+!$ACC END KERNELS
+!$ACC END DATA
+
+      IF (X .NE. 33) CALL ABORT
+#if defined ACC_DEVICE_TYPE_nvidia
+      IF (Y) CALL ABORT
+#else
+      IF (.NOT. Y) CALL ABORT
+#endif
+
+      END
diff --git libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-3.f 
libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-3.f
new file mode 100644
index 0000000..bea6ab8
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-3.f
@@ -0,0 +1,31 @@
+! Test that a user can override the compiler's "avoid offloading"
+! decision at compile time.
+
+! { dg-do run }
+! { dg-additional-options "-cpp" }
+! { dg-additional-options "-ftree-parallelize-loops=32" }
+! Override the compiler's "avoid offloading" decision.
+! { dg-additional-options "-foffload-force" }
+
+      IMPLICIT NONE
+      INCLUDE "openacc_lib.h"
+
+      INTEGER :: D
+      INTEGER, VOLATILE :: X
+      LOGICAL :: Y
+
+!$ACC DATA COPYOUT(X, Y)
+!$ACC KERNELS
+      X = 33
+      Y = ACC_ON_DEVICE (ACC_DEVICE_HOST)
+!$ACC END KERNELS
+!$ACC END DATA
+
+      IF (X .NE. 33) CALL ABORT
+#if defined ACC_DEVICE_TYPE_nvidia
+      IF (Y) CALL ABORT
+#else
+      IF (.NOT. Y) CALL ABORT
+#endif
+
+      END
diff --git libgomp/testsuite/libgomp.oacc-fortran/combined-directives-1.f90 
libgomp/testsuite/libgomp.oacc-fortran/combined-directives-1.f90
index 94100b2..4b52579 100644
--- libgomp/testsuite/libgomp.oacc-fortran/combined-directives-1.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/combined-directives-1.f90
@@ -1,6 +1,9 @@
 ! This test exercises combined directives.
 
 ! { dg-do run }
+! { dg-additional-options "-ftree-parallelize-loops=32" }
+! The "avoid offloading" warning is only triggered for -O2 and higher.
+! { dg-xfail-if "n/a" { nvptx_offloading_configured } { "-O0" "-O1" } { "" } }
 
 program main
   integer, parameter :: n = 32
@@ -27,7 +30,7 @@ program main
   !$acc kernels loop copy (a(1:n)) copy (b(1:n))
   do i = 1, n
     b(i) = 3.0;
-    a(i) = a(i) + b(i)
+    a(i) = a(i) + b(i) ! { dg-warning "OpenACC kernels construct will be 
executed sequentially; will by default avoid offloading to prevent data copy 
penalty" "" { target nvptx_offloading_configured } }
   end do
 
   do i = 1, n
diff --git libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90 
libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90
index 4afb562..b9298c7 100644
--- libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90
@@ -2,6 +2,9 @@
 ! offloaded regions are properly mapped using present_or_copy.
 
 ! { dg-do run }
+! { dg-additional-options "-ftree-parallelize-loops=32" }
+! The "avoid offloading" warning is only triggered for -O2 and higher.
+! { dg-xfail-if "n/a" { nvptx_offloading_configured } { "-O0" "-O1" } { "" } }
 
 program main
   implicit none
@@ -30,7 +33,7 @@ subroutine kernels (array, n)
   integer, dimension (n) :: array
   integer :: n, i
 
-  !$acc kernels
+  !$acc kernels ! { dg-warning "OpenACC kernels construct will be executed 
sequentially; will by default avoid offloading to prevent data copy penalty" "" 
{ target nvptx_offloading_configured } }
   do i = 1, n
      array(i) = i
   end do


Grüße
 Thomas

Reply via email to