Re: [PATCH 0/6, OpenACC, libgomp] Async re-work

2018-12-17 Thread Thomas Schwinge
Hi Chung-Lin!

On Fri, 14 Dec 2018 22:28:58 +0800, Chung-Lin Tang  
wrote:
> On 2018/12/13 11:51 PM, Thomas Schwinge wrote:
> > On Thu, 13 Dec 2018 23:28:49 +0800, Chung-Lin 
> > Tang  wrote:
> >> On 2018/12/7 6:26 AM, Julian Brown wrote:
> >>> On Thu, 6 Dec 2018 22:22:46 +
> >>> Julian Brown  wrote:
> >>>
>  On Thu, 6 Dec 2018 21:42:14 +0100
>  Thomas Schwinge  wrote:
> 
> > [...]
> > ..., where the "Invalid read of size 8" happens, and which
> > eventually would try to "free (tgt)" again, via
> > libgomp/target.c:gomp_unmap_tgt:
> >
> >   attribute_hidden void
> >   gomp_unmap_tgt (struct target_mem_desc *tgt)
> >   {
> > /* Deallocate on target the tgt->tgt_start .. tgt->tgt_end
> > region.  */ if (tgt->tgt_end)
> >   gomp_free_device_memory (tgt->device_descr, tgt->to_free);
> >   
> > free (tgt->array);
> > free (tgt);
> >   }
> >
> > Is the "free (tgt)" in libgomp/target.c:gomp_unmap_vars_async wrong,
> > or something else?
> 
> I think I understand the problem now. In gomp_unmap_vars_async(), in the case 
> of
> tgt->list_count == 0 (i.e. no map arguments at all) the code should simply 
> free the tgt
> and return, while the code in goacc_async_copyout_unmap_vars() didn't handle 
> this case
> and always scheduled an asynchronous free of the tgt later, causing that 
> valgrind error
> you see.
> 
> I am still testing the attached patch, but I think it is the right fix: I 
> reviewed what I
> wrote and it seemed the way I organized things into a 
> goacc_async_copyout_unmap_vars() routine,
> including the hackish refcount++, etc. is simply unneeded. I have deleted 
> those stuff
> and consolidated things back into gomp_unmap_vars_async().
> 
> I'll update the whole patches later after complete testing, the attached is 
> the patch atop
> of the prior async patches. (the small program you gave above does pass 
> valgrind now)

Thanks, confirmed.


Grüße
 Thomas


> diff -ru trunk-orig/libgomp/oacc-async.c trunk-work/libgomp/oacc-async.c
> --- trunk-orig/libgomp/oacc-async.c   2018-12-14 21:06:06.649794724 +0800
> +++ trunk-work/libgomp/oacc-async.c   2018-12-14 22:11:29.252251925 +0800
> @@ -238,31 +238,6 @@
>thr->default_async = async;
>  }
>  
> -static void
> -goacc_async_unmap_tgt (void *ptr)
> -{
> -  struct target_mem_desc *tgt = (struct target_mem_desc *) ptr;
> -
> -  if (tgt->refcount > 1)
> -tgt->refcount--;
> -  else
> -gomp_unmap_tgt (tgt);
> -}
> -
> -attribute_hidden void
> -goacc_async_copyout_unmap_vars (struct target_mem_desc *tgt,
> - struct goacc_asyncqueue *aq)
> -{
> -  struct gomp_device_descr *devicep = tgt->device_descr;
> -
> -  /* Increment reference to delay freeing of device memory until callback
> - has triggered.  */
> -  tgt->refcount++;
> -  gomp_unmap_vars_async (tgt, true, aq);
> -  devicep->openacc.async.queue_callback_func (aq, goacc_async_unmap_tgt,
> -   (void *) tgt);
> -}
> -
>  attribute_hidden void
>  goacc_async_free (struct gomp_device_descr *devicep,
> struct goacc_asyncqueue *aq, void *ptr)
> diff -ru trunk-orig/libgomp/oacc-int.h trunk-work/libgomp/oacc-int.h
> --- trunk-orig/libgomp/oacc-int.h 2018-12-14 21:06:06.649794724 +0800
> +++ trunk-work/libgomp/oacc-int.h 2018-12-14 22:11:43.379947915 +0800
> @@ -104,8 +104,6 @@
>  
>  void goacc_init_asyncqueues (struct gomp_device_descr *);
>  bool goacc_fini_asyncqueues (struct gomp_device_descr *);
> -void goacc_async_copyout_unmap_vars (struct target_mem_desc *,
> -  struct goacc_asyncqueue *);
>  void goacc_async_free (struct gomp_device_descr *, struct goacc_asyncqueue *,
>  void *);
>  struct goacc_asyncqueue *get_goacc_asyncqueue (int);
> diff -ru trunk-orig/libgomp/oacc-mem.c trunk-work/libgomp/oacc-mem.c
> --- trunk-orig/libgomp/oacc-mem.c 2018-12-14 21:06:06.649794724 +0800
> +++ trunk-work/libgomp/oacc-mem.c 2018-12-14 22:10:08.325998369 +0800
> @@ -911,7 +911,7 @@
>else
>   {
> goacc_aq aq = get_goacc_asyncqueue (async);
> -   goacc_async_copyout_unmap_vars (t, aq);
> +   gomp_unmap_vars_async (t, true, aq);
>   }
>  }
>  
> diff -ru trunk-orig/libgomp/oacc-parallel.c trunk-work/libgomp/oacc-parallel.c
> --- trunk-orig/libgomp/oacc-parallel.c2018-12-14 21:06:06.649794724 
> +0800
> +++ trunk-work/libgomp/oacc-parallel.c2018-12-14 22:09:51.918353575 
> +0800
> @@ -245,7 +245,7 @@
>  {
>acc_dev->openacc.async.exec_func (tgt_fn, mapnum, hostaddrs, devaddrs,
>   dims, tgt, aq);
> -  goacc_async_copyout_unmap_vars (tgt, aq);
> +  gomp_unmap_vars_async (tgt, true, aq);
>  }
>  }
>  
> diff -ru trunk-orig/libgomp/target.c trunk-work/libgomp/target.c
> --- trunk-orig/libgomp/target.c   

Re: [PATCH 0/6, OpenACC, libgomp] Async re-work

2018-12-14 Thread Thomas Schwinge
Hi Chung-Lin!

On Tue, 25 Sep 2018 21:09:49 +0800, Chung-Lin Tang  
wrote:
> This patch is a re-organization of OpenACC asynchronous queues.

Again, many thanks for that!

In addition to the review emails I just posted, I've also put all that
stuff into a GitHub branch:
.

This also includes some more "into async re-work: replicate [...]"
commits to adjust your work for preparational things that I plan to
commit before.  I split these out intentionally, so that you can easily
see/review these changes.


Grüße
 Thomas


Re: [PATCH 0/6, OpenACC, libgomp] Async re-work

2018-12-14 Thread Chung-Lin Tang

On 2018/12/13 11:51 PM, Thomas Schwinge wrote:

On Thu, 13 Dec 2018 23:28:49 +0800, Chung-Lin Tang  
wrote:

On 2018/12/7 6:26 AM, Julian Brown wrote:

On Thu, 6 Dec 2018 22:22:46 +
Julian Brown  wrote:


On Thu, 6 Dec 2018 21:42:14 +0100
Thomas Schwinge  wrote:


[...]
..., where the "Invalid read of size 8" happens, and which
eventually would try to "free (tgt)" again, via
libgomp/target.c:gomp_unmap_tgt:

  attribute_hidden void
  gomp_unmap_tgt (struct target_mem_desc *tgt)
  {
/* Deallocate on target the tgt->tgt_start .. tgt->tgt_end
region.  */ if (tgt->tgt_end)
  gomp_free_device_memory (tgt->device_descr, tgt->to_free);
  
free (tgt->array);

free (tgt);
  }

Is the "free (tgt)" in libgomp/target.c:gomp_unmap_vars_async wrong,
or something else?


I think I understand the problem now. In gomp_unmap_vars_async(), in the case of
tgt->list_count == 0 (i.e. no map arguments at all) the code should simply free 
the tgt
and return, while the code in goacc_async_copyout_unmap_vars() didn't handle 
this case
and always scheduled an asynchronous free of the tgt later, causing that 
valgrind error
you see.

I am still testing the attached patch, but I think it is the right fix: I 
reviewed what I
wrote and it seemed the way I organized things into a 
goacc_async_copyout_unmap_vars() routine,
including the hackish refcount++, etc. is simply unneeded. I have deleted those 
stuff
and consolidated things back into gomp_unmap_vars_async().

I'll update the whole patches later after complete testing, the attached is the 
patch atop
of the prior async patches. (the small program you gave above does pass 
valgrind now)

Julian, I didn't try the OG8 refcount changes, it's just too large a set of 
changes to
reason about in so short time, maybe later when we are prepared to fix things 
completely as
you noted what those patches were capable of.

Chung-Lin






diff -ru trunk-orig/libgomp/oacc-async.c trunk-work/libgomp/oacc-async.c
--- trunk-orig/libgomp/oacc-async.c 2018-12-14 21:06:06.649794724 +0800
+++ trunk-work/libgomp/oacc-async.c 2018-12-14 22:11:29.252251925 +0800
@@ -238,31 +238,6 @@
   thr->default_async = async;
 }
 
-static void
-goacc_async_unmap_tgt (void *ptr)
-{
-  struct target_mem_desc *tgt = (struct target_mem_desc *) ptr;
-
-  if (tgt->refcount > 1)
-tgt->refcount--;
-  else
-gomp_unmap_tgt (tgt);
-}
-
-attribute_hidden void
-goacc_async_copyout_unmap_vars (struct target_mem_desc *tgt,
-   struct goacc_asyncqueue *aq)
-{
-  struct gomp_device_descr *devicep = tgt->device_descr;
-
-  /* Increment reference to delay freeing of device memory until callback
- has triggered.  */
-  tgt->refcount++;
-  gomp_unmap_vars_async (tgt, true, aq);
-  devicep->openacc.async.queue_callback_func (aq, goacc_async_unmap_tgt,
- (void *) tgt);
-}
-
 attribute_hidden void
 goacc_async_free (struct gomp_device_descr *devicep,
  struct goacc_asyncqueue *aq, void *ptr)
diff -ru trunk-orig/libgomp/oacc-int.h trunk-work/libgomp/oacc-int.h
--- trunk-orig/libgomp/oacc-int.h   2018-12-14 21:06:06.649794724 +0800
+++ trunk-work/libgomp/oacc-int.h   2018-12-14 22:11:43.379947915 +0800
@@ -104,8 +104,6 @@
 
 void goacc_init_asyncqueues (struct gomp_device_descr *);
 bool goacc_fini_asyncqueues (struct gomp_device_descr *);
-void goacc_async_copyout_unmap_vars (struct target_mem_desc *,
-struct goacc_asyncqueue *);
 void goacc_async_free (struct gomp_device_descr *, struct goacc_asyncqueue *,
   void *);
 struct goacc_asyncqueue *get_goacc_asyncqueue (int);
diff -ru trunk-orig/libgomp/oacc-mem.c trunk-work/libgomp/oacc-mem.c
--- trunk-orig/libgomp/oacc-mem.c   2018-12-14 21:06:06.649794724 +0800
+++ trunk-work/libgomp/oacc-mem.c   2018-12-14 22:10:08.325998369 +0800
@@ -911,7 +911,7 @@
   else
{
  goacc_aq aq = get_goacc_asyncqueue (async);
- goacc_async_copyout_unmap_vars (t, aq);
+ gomp_unmap_vars_async (t, true, aq);
}
 }
 
diff -ru trunk-orig/libgomp/oacc-parallel.c trunk-work/libgomp/oacc-parallel.c
--- trunk-orig/libgomp/oacc-parallel.c  2018-12-14 21:06:06.649794724 +0800
+++ trunk-work/libgomp/oacc-parallel.c  2018-12-14 22:09:51.918353575 +0800
@@ -245,7 +245,7 @@
 {
   acc_dev->openacc.async.exec_func (tgt_fn, mapnum, hostaddrs, devaddrs,
dims, tgt, aq);
-  goacc_async_copyout_unmap_vars (tgt, aq);
+  gomp_unmap_vars_async (tgt, true, aq);
 }
 }
 
diff -ru trunk-orig/libgomp/target.c trunk-work/libgomp/target.c
--- trunk-orig/libgomp/target.c 2018-12-14 21:06:06.653794622 +0800
+++ trunk-work/libgomp/target.c 2018-12-14 20:42:03.629154346 +0800
@@ -1072,6 +1072,17 @@
   return is_tgt_unmapped;
 }
 
+static void
+gomp_unref_tgt (void *ptr)
+{
+  struct target_mem_desc *tgt = 

Re: [PATCH 0/6, OpenACC, libgomp] Async re-work

2018-12-14 Thread Thomas Schwinge
Hi Chung-Lin!

A little bit of documentation starter update for you to include.  Please
make sure that all relevant functions have such comments addded.

commit 7e0896281d155e1544751f43c1eaace8e005e019
Author: Thomas Schwinge 
Date:   Thu Dec 13 17:59:46 2018 +0100

[WIP] into async re-work: documentation
---
 libgomp/libgomp.h | 3 +++
 libgomp/oacc-async.c  | 7 +++
 libgomp/plugin/plugin-nvptx.c | 4 ++--
 libgomp/target.c  | 3 +++
 4 files changed, 15 insertions(+), 2 deletions(-)

diff --git libgomp/libgomp.h libgomp/libgomp.h
index 8b74d6368389..574fcd1ee4ad 100644
--- libgomp/libgomp.h
+++ libgomp/libgomp.h
@@ -949,6 +949,9 @@ typedef struct acc_dispatch_t
   __typeof (GOMP_OFFLOAD_openacc_exec) *exec_func;
 
   struct {
+/* Once created and put into the "active" list, asyncqueues are then never
+   destructed and removed from the "active" list, other than if the TODO
+   device is shut down.  */
 gomp_mutex_t lock;
 int nasyncqueue;
 struct goacc_asyncqueue **asyncqueue;
diff --git libgomp/oacc-async.c libgomp/oacc-async.c
index b091ba2460ac..0f5f74bdf836 100644
--- libgomp/oacc-async.c
+++ libgomp/oacc-async.c
@@ -280,6 +280,10 @@ goacc_async_free (struct gomp_device_descr *devicep,
 devicep->openacc.async.queue_callback_func (aq, free, ptr);
 }
 
+/* This function initializes the asyncqueues for the device specified by
+   DEVICEP.  TODO DEVICEP must be locked on entry, and remains locked on
+   return.  */
+
 attribute_hidden void
 goacc_init_asyncqueues (struct gomp_device_descr *devicep)
 {
@@ -289,6 +293,9 @@ goacc_init_asyncqueues (struct gomp_device_descr *devicep)
   devicep->openacc.async.active = NULL;
 }
 
+/* This function finalizes the asyncqueues for the device specified by DEVICEP.
+   TODO DEVICEP must be locked on entry, and remains locked on return.  */
+
 attribute_hidden bool
 goacc_fini_asyncqueues (struct gomp_device_descr *devicep)
 {
diff --git libgomp/plugin/plugin-nvptx.c libgomp/plugin/plugin-nvptx.c
index 7b658264b8e7..577ed39ef3f6 100644
--- libgomp/plugin/plugin-nvptx.c
+++ libgomp/plugin/plugin-nvptx.c
@@ -1340,14 +1340,14 @@ GOMP_OFFLOAD_openacc_cuda_get_current_context (void)
   return nvptx_get_current_cuda_context ();
 }
 
-/* NOTE: This returns a CUstream, not a ptx_stream pointer.  */
+/* This returns a CUstream.  */
 void *
 GOMP_OFFLOAD_openacc_cuda_get_stream (struct goacc_asyncqueue *aq)
 {
   return (void *) aq->cuda_stream;
 }
 
-/* NOTE: This takes a CUstream, not a ptx_stream pointer.  */
+/* This takes a CUstream.  */
 int
 GOMP_OFFLOAD_openacc_cuda_set_stream (struct goacc_asyncqueue *aq, void 
*stream)
 {
diff --git libgomp/target.c libgomp/target.c
index e67d9248ae0b..96df1890a729 100644
--- libgomp/target.c
+++ libgomp/target.c
@@ -1506,6 +1506,9 @@ gomp_init_device (struct gomp_device_descr *devicep)
   devicep->state = GOMP_DEVICE_INITIALIZED;
 }
 
+/* This function finalizes the target device, specified by DEVICEP.  DEVICEP
+   must be locked on entry, and remains locked on return.  */
+
 attribute_hidden bool
 gomp_fini_device (struct gomp_device_descr *devicep)
 {


Grüße
 Thomas


Re: [PATCH 0/6, OpenACC, libgomp] Async re-work

2018-12-14 Thread Thomas Schwinge
Hi Chung-Lin!

On Thu, 06 Dec 2018 21:42:14 +0100, I wrote:
> On Tue, 25 Sep 2018 21:09:49 +0800, Chung-Lin Tang  
> wrote:
> > Also included in this patch is the code for the acc_get/set_default_async 
> > API functions in OpenACC 2.5.
> > It's a minor part of this patch, but since some code was merge together, 
> > I'm submitting it together here.
> 
> As I requested, I'm reviewing those changes separately, and have backed
> out those changes in my working copy.

... as follows:

commit 79b89a5214dc2624a52f0593bbfad5cefed0c025
Author: Thomas Schwinge 
Date:   Thu Dec 6 15:57:46 2018 +0100

into async re-work: revert default_async changes
---
 include/gomp-constants.h   |   1 -
 libgomp/libgomp.map|   4 -
 libgomp/oacc-async.c   |  19 +-
 libgomp/oacc-init.c|   2 -
 libgomp/oacc-int.h |   3 -
 libgomp/openacc.f90|  22 +-
 libgomp/openacc.h  |   3 -
 libgomp/openacc_lib.h  |  13 -
 .../libgomp.oacc-c-c++-common/asyncwait-2.c| 904 -
 9 files changed, 2 insertions(+), 969 deletions(-)

diff --git include/gomp-constants.h include/gomp-constants.h
index acd25851bcc7..1021306ed661 100644
--- include/gomp-constants.h
+++ include/gomp-constants.h
@@ -160,7 +160,6 @@ enum gomp_map_kind
 /* Asynchronous behavior.  Keep in sync with
libgomp/{openacc.h,openacc.f90,openacc_lib.h}:acc_async_t.  */
 
-#define GOMP_ASYNC_DEFAULT 0
 #define GOMP_ASYNC_NOVAL   -1
 #define GOMP_ASYNC_SYNC-2
 
diff --git libgomp/libgomp.map libgomp/libgomp.map
index c5e1b876fccd..d2381da3bf07 100644
--- libgomp/libgomp.map
+++ libgomp/libgomp.map
@@ -464,12 +464,8 @@ OACC_2.5 {
acc_delete_finalize_async_32_h_;
acc_delete_finalize_async_64_h_;
acc_delete_finalize_async_array_h_;
-   acc_get_default_async;
-   acc_get_default_async_h_;
acc_memcpy_from_device_async;
acc_memcpy_to_device_async;
-   acc_set_default_async;
-   acc_set_default_async_h_;
acc_update_device_async;
acc_update_device_async_32_h_;
acc_update_device_async_64_h_;
diff --git libgomp/oacc-async.c libgomp/oacc-async.c
index 68aaf199a27e..553082fe3d4a 100644
--- libgomp/oacc-async.c
+++ libgomp/oacc-async.c
@@ -60,7 +60,7 @@ lookup_goacc_asyncqueue (struct goacc_thread *thr, bool 
create, int async)
   /* The special value acc_async_noval (-1) maps to the thread-specific
  default async stream.  */
   if (async == acc_async_noval)
-async = thr->default_async;
+async = 0; //TODO thr->default_async;
 
   if (async == acc_async_sync)
 return NULL;
@@ -221,23 +221,6 @@ acc_wait_all_async (int async)
   gomp_mutex_unlock (>dev->openacc.async.lock);
 }
 
-int
-acc_get_default_async (void)
-{
-  struct goacc_thread *thr = get_goacc_thread ();
-  return thr->default_async;
-}
-
-void
-acc_set_default_async (int async)
-{
-  if (async < acc_async_sync)
-gomp_fatal ("invalid async argument: %d", async);
-
-  struct goacc_thread *thr = get_goacc_thread ();
-  thr->default_async = async;
-}
-
 static void
 goacc_async_unmap_tgt (void *ptr)
 {
diff --git libgomp/oacc-init.c libgomp/oacc-init.c
index 2c2f91ce3c2c..c40f48829078 100644
--- libgomp/oacc-init.c
+++ libgomp/oacc-init.c
@@ -426,8 +426,6 @@ goacc_attach_host_thread_to_device (int ord)
   
   thr->target_tls
 = acc_dev->openacc.create_thread_data_func (ord);
-
-  thr->default_async = acc_async_default;
 }
 
 /* OpenACC 2.0a (3.2.12, 3.2.13) doesn't specify whether the serialization of
diff --git libgomp/oacc-int.h libgomp/oacc-int.h
index 3354eb654ce9..97f3fc8a61ed 100644
--- libgomp/oacc-int.h
+++ libgomp/oacc-int.h
@@ -73,9 +73,6 @@ struct goacc_thread
 
   /* Target-specific data (used by plugin).  */
   void *target_tls;
-
-  /* Default OpenACC async queue for current thread, exported to plugin.  */
-  int default_async;
 };
 
 #if defined HAVE_TLS || defined USE_EMUTLS
diff --git libgomp/openacc.f90 libgomp/openacc.f90
index 7d31ee689479..7c809fe00738 100644
--- libgomp/openacc.f90
+++ libgomp/openacc.f90
@@ -51,10 +51,9 @@ module openacc_kinds
 
   integer, parameter :: acc_handle_kind = int32
 
-  public :: acc_async_default, acc_async_noval, acc_async_sync
+  public :: acc_async_noval, acc_async_sync
 
   ! Keep in sync with include/gomp-constants.h.
-  integer (acc_handle_kind), parameter :: acc_async_default = 0
   integer (acc_handle_kind), parameter :: acc_async_noval = -1
   integer (acc_handle_kind), parameter :: acc_async_sync = -2
 
@@ -93,16 +92,6 @@ module openacc_internal
   integer (acc_device_kind) d
 end function
 
-subroutine acc_set_default_async_h (a)
-  import
-  integer a
-end subroutine
-
-function acc_get_default_async_h ()
-  import
-  integer 

Re: [PATCH 0/6, OpenACC, libgomp] Async re-work

2018-12-13 Thread Thomas Schwinge
Hi!

On Thu, 13 Dec 2018 23:28:49 +0800, Chung-Lin Tang  
wrote:
> On 2018/12/7 6:26 AM, Julian Brown wrote:
> > On Thu, 6 Dec 2018 22:22:46 +
> > Julian Brown  wrote:
> > 
> >> On Thu, 6 Dec 2018 21:42:14 +0100
> >> Thomas Schwinge  wrote:
> >>
> >>> [...]
> >>> ..., where the "Invalid read of size 8" happens, and which
> >>> eventually would try to "free (tgt)" again, via
> >>> libgomp/target.c:gomp_unmap_tgt:
> >>>
> >>>  attribute_hidden void
> >>>  gomp_unmap_tgt (struct target_mem_desc *tgt)
> >>>  {
> >>>/* Deallocate on target the tgt->tgt_start .. tgt->tgt_end
> >>> region.  */ if (tgt->tgt_end)
> >>>  gomp_free_device_memory (tgt->device_descr, tgt->to_free);
> >>>  
> >>>free (tgt->array);
> >>>free (tgt);
> >>>  }
> >>>
> >>> Is the "free (tgt)" in libgomp/target.c:gomp_unmap_vars_async wrong,
> >>> or something else?
> >>
> >> It might be worth trying this with the refcounting changes in the
> >> attach/detach patch.

Well, which exactly?

> > ...oh, also make sure you have this patch in the series you're testing
> > with:
> > 
> > https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01973.html
> > 
> > else your "wait" will be ignored, IIUC.

Thanks, and right, and yes, I got that one included.

> just first asking if you tried Julian's patch during this time, and if so did 
> it do anything different?

I did not test with all the attach/detach patches on top of this one
here.  That's too many changes at once.


Grüße
 Thomas


Re: [PATCH 0/6, OpenACC, libgomp] Async re-work

2018-12-13 Thread Chung-Lin Tang

On 2018/12/7 6:26 AM, Julian Brown wrote:

On Thu, 6 Dec 2018 22:22:46 +
Julian Brown  wrote:


On Thu, 6 Dec 2018 21:42:14 +0100
Thomas Schwinge  wrote:


[...]
..., where the "Invalid read of size 8" happens, and which
eventually would try to "free (tgt)" again, via
libgomp/target.c:gomp_unmap_tgt:

 attribute_hidden void
 gomp_unmap_tgt (struct target_mem_desc *tgt)
 {
   /* Deallocate on target the tgt->tgt_start .. tgt->tgt_end
region.  */ if (tgt->tgt_end)
 gomp_free_device_memory (tgt->device_descr, tgt->to_free);
 
   free (tgt->array);

   free (tgt);
 }

Is the "free (tgt)" in libgomp/target.c:gomp_unmap_vars_async wrong,
or something else?


It might be worth trying this with the refcounting changes in the
attach/detach patch.


...oh, also make sure you have this patch in the series you're testing
with:

https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01973.html

else your "wait" will be ignored, IIUC.

Julian


Hi Thomas,
just first asking if you tried Julian's patch during this time, and if so did 
it do anything different?
(and apologies for missing responding this part for so long :P )

Chung-Lin


Re: [PATCH 0/6, OpenACC, libgomp] Async re-work

2018-12-06 Thread Julian Brown
On Thu, 6 Dec 2018 22:22:46 +
Julian Brown  wrote:

> On Thu, 6 Dec 2018 21:42:14 +0100
> Thomas Schwinge  wrote:
> 
> > [...]
> > ..., where the "Invalid read of size 8" happens, and which
> > eventually would try to "free (tgt)" again, via
> > libgomp/target.c:gomp_unmap_tgt:
> > 
> > attribute_hidden void
> > gomp_unmap_tgt (struct target_mem_desc *tgt)
> > {
> >   /* Deallocate on target the tgt->tgt_start .. tgt->tgt_end
> > region.  */ if (tgt->tgt_end)
> > gomp_free_device_memory (tgt->device_descr, tgt->to_free);
> > 
> >   free (tgt->array);
> >   free (tgt);
> > }
> > 
> > Is the "free (tgt)" in libgomp/target.c:gomp_unmap_vars_async wrong,
> > or something else?  
> 
> It might be worth trying this with the refcounting changes in the
> attach/detach patch.

...oh, also make sure you have this patch in the series you're testing
with:

https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01973.html

else your "wait" will be ignored, IIUC.

Julian


Re: [PATCH 0/6, OpenACC, libgomp] Async re-work

2018-12-06 Thread Julian Brown
On Thu, 6 Dec 2018 21:42:14 +0100
Thomas Schwinge  wrote:

> [...]
> ..., where the "Invalid read of size 8" happens, and which eventually
> would try to "free (tgt)" again, via libgomp/target.c:gomp_unmap_tgt:
> 
> attribute_hidden void
> gomp_unmap_tgt (struct target_mem_desc *tgt)
> {
>   /* Deallocate on target the tgt->tgt_start .. tgt->tgt_end
> region.  */ if (tgt->tgt_end)
> gomp_free_device_memory (tgt->device_descr, tgt->to_free);
> 
>   free (tgt->array);
>   free (tgt);
> }
> 
> Is the "free (tgt)" in libgomp/target.c:gomp_unmap_vars_async wrong,
> or something else?

It might be worth trying this with the refcounting changes in the
attach/detach patch.

Julian


Re: [PATCH 0/6, OpenACC, libgomp] Async re-work

2018-12-06 Thread Thomas Schwinge
Hi Chung-Lin!

On Tue, 25 Sep 2018 21:09:49 +0800, Chung-Lin Tang  
wrote:
> This patch is a re-organization of OpenACC asynchronous queues.

Thanks!

> The previous style of implementation
> was essentially re-defining the entire async API inside the plugin-interface, 
> and relaying all such
> API calls to the target plugin, which is awkward in design; it requires 
> (each) target plugin to
> essentially re-implement large portions of the async functionality to support 
> OpenACC, and the
> way it uses a state-setting style to "select/de-select" asynchronous queues 
> for operations litters
> a lot of code paths.
> 
> The new design proposed here in this patch declares a "struct 
> goacc_asyncqueue*" opaque type in libgomp.h,
> and re-defines the plugin interface to a few operations (e.g. 
> construct/destruct/test/synchronize/etc.)
> on this async-queue type, all details are target-dependent inside the 
> specific plugin/plugin-.c file.

Conceptually, ACK.


> Also included in this patch is the code for the acc_get/set_default_async API 
> functions in OpenACC 2.5.
> It's a minor part of this patch, but since some code was merge together, I'm 
> submitting it together here.

As I requested, I'm reviewing those changes separately, and have backed
out those changes in my working copy.


> Testing has been done with offloading enabled. The results are mostly okay, 
> but with a few issues
> with either yet incomplete submission of our testsuite adjustment patches, or 
> other independent problems.

We'll need to understand these.  


> Seeking permission to commit this to trunk first.

A few things will need to be clarified.


For example, for the simple program:

int main(void)
{
#pragma acc parallel async(1)
  ;
#pragma acc wait

  return 0;
}

..., I'm seeing memory corruption, which (oaccasionally...) shows up as
an abort due to "free" complaining, but also reproduces more reliably
with "valgrind".  It also reproduces on openacc-gcc-8-branch:

$ valgrind ./a.out
[...]
==26392== Invalid read of size 8
==26392==at 0x4E653B0: goacc_async_unmap_tgt (oacc-async.c:368)
==26392==by 0x5C90901: cuda_callback_wrapper (plugin-nvptx.c:1648)
==26392==by 0x6066B8D: ??? (in 
/usr/lib/x86_64-linux-gnu/libcuda.so.390.77)
==26392==by 0x607A10F: ??? (in 
/usr/lib/x86_64-linux-gnu/libcuda.so.390.77)
==26392==by 0x50816DA: start_thread (pthread_create.c:463)
==26392==by 0x53BA88E: clone (clone.S:95)
==26392==  Address 0x8d19f50 is 0 bytes inside a block of size 64 free'd
==26392==at 0x4C30D3B: free (vg_replace_malloc.c:530)
==26392==by 0x4E65BEE: goacc_async_copyout_unmap_vars (oacc-async.c:383)
==26392==by 0x4E607C9: GOACC_parallel_keyed_internal 
(oacc-parallel.c:403)
==26392==by 0x4E60EAA: GOACC_parallel_keyed_v2 (oacc-parallel.c:439)
==26392==by 0x40094F: ??? (in [...]/a.out)
==26392==by 0x52BAB96: (below main) (libc-start.c:310)
==26392==  Block was alloc'd at
==26392==at 0x4C2FB0F: malloc (vg_replace_malloc.c:299)
==26392==by 0x4E47538: gomp_malloc (alloc.c:37)
==26392==by 0x4E5AEEB: gomp_map_vars_async (target.c:731)
==26392==by 0x4E60C2B: GOACC_parallel_keyed_internal 
(oacc-parallel.c:345)
==26392==by 0x4E60EAA: GOACC_parallel_keyed_v2 (oacc-parallel.c:439)
==26392==by 0x40094F: ??? (in [...]/a.out)
==26392==by 0x52BAB96: (below main) (libc-start.c:310)
[...]

Per my understanding, the problem is that, called from
libgomp/oacc-async.c:goacc_async_copyout_unmap_vars,
libgomp/target.c:gomp_unmap_vars_async runs into:

  if (tgt->list_count == 0)
{
  free (tgt);
  return;
}

..., and then goacc_async_copyout_unmap_vars does:

  devicep->openacc.async.queue_callback_func (aq, goacc_async_unmap_tgt,
  (void *) tgt);

..., which will then call libgomp/oacc-async.c:goacc_async_unmap_tgt:

static void
goacc_async_unmap_tgt (void *ptr)
{
  struct target_mem_desc *tgt = (struct target_mem_desc *) ptr;

  if (tgt->refcount > 1)
tgt->refcount--;
  else
gomp_unmap_tgt (tgt);
}

..., where the "Invalid read of size 8" happens, and which eventually
would try to "free (tgt)" again, via libgomp/target.c:gomp_unmap_tgt:

attribute_hidden void
gomp_unmap_tgt (struct target_mem_desc *tgt)
{
  /* Deallocate on target the tgt->tgt_start .. tgt->tgt_end region.  */
  if (tgt->tgt_end)
gomp_free_device_memory (tgt->device_descr, tgt->to_free);

  free (tgt->array);
  free (tgt);
}

Is the "free (tgt)" in libgomp/target.c:gomp_unmap_vars_async wrong, or
something else?


Grüße
 Thomas


[PATCH 0/6, OpenACC, libgomp] Async re-work

2018-09-25 Thread Chung-Lin Tang

This patch is a re-organization of OpenACC asynchronous queues. The previous 
style of implementation
was essentially re-defining the entire async API inside the plugin-interface, 
and relaying all such
API calls to the target plugin, which is awkward in design; it requires (each) 
target plugin to
essentially re-implement large portions of the async functionality to support 
OpenACC, and the
way it uses a state-setting style to "select/de-select" asynchronous queues for 
operations litters
a lot of code paths.

The new design proposed here in this patch declares a "struct 
goacc_asyncqueue*" opaque type in libgomp.h,
and re-defines the plugin interface to a few operations (e.g. 
construct/destruct/test/synchronize/etc.)
on this async-queue type, all details are target-dependent inside the specific 
plugin/plugin-.c file.

Also included in this patch is the code for the acc_get/set_default_async API 
functions in OpenACC 2.5.
It's a minor part of this patch, but since some code was merge together, I'm 
submitting it together here.

Testing has been done with offloading enabled. The results are mostly okay, but 
with a few issues
with either yet incomplete submission of our testsuite adjustment patches, or 
other independent problems.
Seeking permission to commit this to trunk first.

Thanks,
Chung-Lin

2018-09-25  Chung-Lin Tang  

include/
* gomp-constants.h (GOMP_ASYNC_DEFAULT): Define.
(GOMP_VERSION): Increment for new plugin interface changes.

libgomp/
* libgomp-plugin.h (struct goacc_asyncqueue): Declare.
(struct goacc_asyncqueue_list): Likewise.
(goacc_aq): Likewise.
(goacc_aq_list): Likewise.
(GOMP_OFFLOAD_openacc_register_async_cleanup): Remove.
(GOMP_OFFLOAD_openacc_async_test): Remove.
(GOMP_OFFLOAD_openacc_async_test_all): Remove.
(GOMP_OFFLOAD_openacc_async_wait): Remove.
(GOMP_OFFLOAD_openacc_async_wait_async): Remove.
(GOMP_OFFLOAD_openacc_async_wait_all): Remove.
(GOMP_OFFLOAD_openacc_async_wait_all_async): Remove.
(GOMP_OFFLOAD_openacc_async_set_async): Remove.
(GOMP_OFFLOAD_openacc_exec): Adjust declaration.
(GOMP_OFFLOAD_openacc_cuda_get_stream): Likewise.
(GOMP_OFFLOAD_openacc_cuda_set_stream): Likewise.

(GOMP_OFFLOAD_openacc_async_exec): Declare.
(GOMP_OFFLOAD_openacc_async_construct): Declare.
(GOMP_OFFLOAD_openacc_async_destruct): Declare.
(GOMP_OFFLOAD_openacc_async_test): Declare.
(GOMP_OFFLOAD_openacc_async_synchronize): Declare.
(GOMP_OFFLOAD_openacc_async_serialize): Declare.
(GOMP_OFFLOAD_openacc_async_queue_callback): Declare.
(GOMP_OFFLOAD_openacc_async_host2dev): Declare.
(GOMP_OFFLOAD_openacc_async_dev2host): Declare.

* libgomp.h (struct acc_dispatch_t): Define 'async' sub-struct.
(gomp_acc_insert_pointer): Adjust declaration.
(gomp_copy_host2dev): New declaration.
(gomp_copy_dev2host): Likewise.
(gomp_map_vars_async): Likewise.
(gomp_unmap_tgt): Likewise.
(gomp_unmap_vars_async): Likewise.
(gomp_fini_device): Likewise.

* libgomp.map (OACC_2.5): Add acc_get_default_async,
acc_get_default_async_h_, acc_set_default_async, and
acc_set_default_async_h_.
(GOMP_PLUGIN_1.0): Remove GOMP_PLUGIN_async_unmap_vars.

* oacc-async.c (get_goacc_thread): New function.
(get_goacc_thread_device): New function.
(lookup_goacc_asyncqueue): New function.
(get_goacc_asyncqueue): New function.
(acc_async_test): Adjust code to use new async design.
(acc_async_test_all): Likewise.
(acc_wait): Likewise.
(acc_wait_async): Likewise.
(acc_wait_all): Likewise.
(acc_wait_all_async): Likewise.
(acc_get_default_async): New API function.
(acc_set_default_async): Likewise.
(goacc_async_unmap_tgt): New function.
(goacc_async_copyout_unmap_vars): Likewise.
(goacc_async_free): Likewise.
(goacc_init_asyncqueues): Likewise.
(goacc_fini_asyncqueues): Likewise.
* oacc-cuda.c (acc_get_cuda_stream): Adjust code to use new async
design.
(acc_set_cuda_stream): Likewise.
* oacc-host.c (host_openacc_exec): Adjust parameters, remove 'async'.
(host_openacc_register_async_cleanup): Remove.
(host_openacc_async_exec): New function.
(host_openacc_async_test): Adjust parameters.
(host_openacc_async_test_all): Remove.
(host_openacc_async_wait): Remove.
(host_openacc_async_wait_async): Remove.
(host_openacc_async_wait_all): Remove.
(host_openacc_async_wait_all_async): Remove.
(host_openacc_async_set_async): Remove.
(host_openacc_async_synchronize): New function.
(host_openacc_async_serialize): New function.
(host_openacc_async_host2dev): New function.