Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts (revised, v2)

2019-01-07 Thread Thomas Schwinge
Hi Chung-Lin!

On Wed, 2 Jan 2019 20:46:12 +0800, Chung-Lin Tang  
wrote:
> Hi Thomas, Happy New Year,

Thanks!  If I remember right, you still have a few weeks until "your" New
Year/Spring Festival, right?


> On 2018/12/19 5:03 AM, Thomas Schwinge wrote:
> >> +
> >> +  if (!dev->openacc.async.asyncqueue[async])
> >> +{
> >> +  dev->openacc.async.asyncqueue[async] = 
> >> dev->openacc.async.construct_func ();
> >> +
> >> +  if (!dev->openacc.async.asyncqueue[async])
> >> +  {
> >> +gomp_mutex_unlock (>openacc.async.lock);
> >> +gomp_fatal ("async %d creation failed", async);
> >> +  }
> > That will now always fail for host fallback, where
> > "host_openacc_async_construct" just always does "return NULL".
> > 
> > Actually, if the device doesn't support asyncqueues, this whole function
> > should turn into some kind of no-op, so that we don't again and again try
> > to create a new one for every call to "lookup_goacc_asyncqueue".
> > 
> > I'm attaching one possible solution.  I think it's fine to assume that
> > the majority of devices will support asyncqueues, and for those that
> > don't, this is just a one-time overhead per async-argument.  So, no
> > special handling required in "lookup_goacc_asyncqueue".
> 
> > --- a/libgomp/oacc-host.c
> > +++ b/libgomp/oacc-host.c
> > @@ -212,7 +212,8 @@ host_openacc_async_queue_callback (struct 
> > goacc_asyncqueue *aq
> >   static struct goacc_asyncqueue *
> >   host_openacc_async_construct (void)
> >   {
> > -  return NULL;
> > +  /* We have to return non-NULL here, but it's OK to use a dummy.  */
> > +  return (struct goacc_asyncqueue *) -1;
> >   }
> 
> I'm not sure I understand the meaning of this? Is there any use to 
> segfaulting somewhere else
> due to this 0x... pointer?

There will be no such dereferencing (and thus no such segfault), as you
(quite nicely!) made this is an opaque data type to the generic code.
The concrete type is specific to, and only ever dereferenced inside each
plugin, and the "host plugin" never dereferences it, so returning minus
one here only serves as a non-NULL value/identifier to the generic code.

> A feature of a NULL asyncqueue should mean that it is simply synchronous

OK, then that should be documented, and as I mentioned above, the
"lookup" code be adjusted so that it doesn't again and again try to
create an asyncqueue when the "construct" function returns NULL.

> however this does somewhat
> conflict with the case of async.construct_func() returning NULL on error...
> 
> Perhaps, again using an explicit success code as the return value (and return 
> asyncqueue using
> an out parameter)?

Sure, that's also fine.  I just did it as presented above, because of its
simplicity, and to avoid adjusting the "lookup" code, as mentioned above.


Grüße
 Thomas


Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts (revised, v2)

2019-01-02 Thread Chung-Lin Tang

Hi Thomas, Happy New Year,

On 2018/12/19 5:03 AM, Thomas Schwinge wrote:

+
+  if (!dev->openacc.async.asyncqueue[async])
+{
+  dev->openacc.async.asyncqueue[async] = dev->openacc.async.construct_func 
();
+
+  if (!dev->openacc.async.asyncqueue[async])
+   {
+ gomp_mutex_unlock (>openacc.async.lock);
+ gomp_fatal ("async %d creation failed", async);
+   }

That will now always fail for host fallback, where
"host_openacc_async_construct" just always does "return NULL".

Actually, if the device doesn't support asyncqueues, this whole function
should turn into some kind of no-op, so that we don't again and again try
to create a new one for every call to "lookup_goacc_asyncqueue".

I'm attaching one possible solution.  I think it's fine to assume that
the majority of devices will support asyncqueues, and for those that
don't, this is just a one-time overhead per async-argument.  So, no
special handling required in "lookup_goacc_asyncqueue".



--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -212,7 +212,8 @@ host_openacc_async_queue_callback (struct goacc_asyncqueue 
*aq
  static struct goacc_asyncqueue *
  host_openacc_async_construct (void)
  {
-  return NULL;
+  /* We have to return non-NULL here, but it's OK to use a dummy.  */
+  return (struct goacc_asyncqueue *) -1;
  }


I'm not sure I understand the meaning of this? Is there any use to segfaulting 
somewhere else
due to this 0x... pointer?

A feature of a NULL asyncqueue should mean that it is simply synchronous, 
however this does somewhat
conflict with the case of async.construct_func() returning NULL on error...

Perhaps, again using an explicit success code as the return value (and return 
asyncqueue using
an out parameter)?

Chung-Lin


Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts (revised, v2)

2018-12-18 Thread Thomas Schwinge
Hi Chung-Lin!

On Tue, 18 Dec 2018 23:06:38 +0800, Chung-Lin Tang  
wrote:
> this part includes some of the lookup_goacc_asyncqueue fixes we talked about.
> I am still thinking about how the queue lock problem should really be solved, 
> so regard
> this patch as just fixing some of the problems.

Sure, thanks.

Two comments, though:

> --- libgomp/oacc-async.c  (revision 267226)
> +++ libgomp/oacc-async.c  (working copy)

> +attribute_hidden struct goacc_asyncqueue *
> +lookup_goacc_asyncqueue (struct goacc_thread *thr, bool create, int async)
> +{
> +  /* The special value acc_async_noval (-1) maps to the thread-specific
> + default async stream.  */
> +  if (async == acc_async_noval)
> +async = thr->default_async;
> +
> +  if (async == acc_async_sync)
> +return NULL;
> +
> +  if (async < 0)
> +gomp_fatal ("bad async %d", async);
> +
> +  struct gomp_device_descr *dev = thr->dev;
> +
> +  gomp_mutex_lock (>openacc.async.lock);
> +
> +  if (!create
> +  && (async >= dev->openacc.async.nasyncqueue
> +   || !dev->openacc.async.asyncqueue[async]))
> +{
> +  gomp_mutex_unlock (>openacc.async.lock);
> +  return NULL;
> +}
> +
> +  if (async >= dev->openacc.async.nasyncqueue)
> +{
> +  int diff = async + 1 - dev->openacc.async.nasyncqueue;
> +  dev->openacc.async.asyncqueue
> + = gomp_realloc (dev->openacc.async.asyncqueue,
> + sizeof (goacc_aq) * (async + 1));
> +  memset (dev->openacc.async.asyncqueue + dev->openacc.async.nasyncqueue,
> +   0, sizeof (goacc_aq) * diff);
> +  dev->openacc.async.nasyncqueue = async + 1;
> +}
> +
> +  if (!dev->openacc.async.asyncqueue[async])
> +{
> +  dev->openacc.async.asyncqueue[async] = 
> dev->openacc.async.construct_func ();
> +
> +  if (!dev->openacc.async.asyncqueue[async])
> + {
> +   gomp_mutex_unlock (>openacc.async.lock);
> +   gomp_fatal ("async %d creation failed", async);
> + }

That will now always fail for host fallback, where
"host_openacc_async_construct" just always does "return NULL".

Actually, if the device doesn't support asyncqueues, this whole function
should turn into some kind of no-op, so that we don't again and again try
to create a new one for every call to "lookup_goacc_asyncqueue".

I'm attaching one possible solution.  I think it's fine to assume that
the majority of devices will support asyncqueues, and for those that
don't, this is just a one-time overhead per async-argument.  So, no
special handling required in "lookup_goacc_asyncqueue".

> +  /* Link new async queue into active list.  */
> +  goacc_aq_list n = gomp_malloc (sizeof (struct goacc_asyncqueue_list));
> +  n->aq = dev->openacc.async.asyncqueue[async];
> +  n->next = dev->openacc.async.active;
> +  dev->openacc.async.active = n;
> +}
> +  gomp_mutex_unlock (>openacc.async.lock);

You still need to keep "async" locked during...

> +  return dev->openacc.async.asyncqueue[async];

... this dereference.

> +}


Oh, and:

> --- libgomp/oacc-plugin.c (revision 267226)
> +++ libgomp/oacc-plugin.c (working copy)
> @@ -31,14 +31,10 @@
>  #include "oacc-int.h"
>  
>  void
> -GOMP_PLUGIN_async_unmap_vars (void *ptr, int async)
> +GOMP_PLUGIN_async_unmap_vars (void *ptr __attribute__((unused)),
> +   int async __attribute__((unused)))
>  {
> -  struct target_mem_desc *tgt = ptr;
> -  struct gomp_device_descr *devicep = tgt->device_descr;
> -
> -  devicep->openacc.async_set_async_func (async);
> -  gomp_unmap_vars (tgt, true);
> -  devicep->openacc.async_set_async_func (acc_async_sync);
> +  gomp_fatal ("invalid plugin function");
>  }

Please add a comment here, something like: "Obsolete entry point, no
longer used."


Grüße
 Thomas


>From 4cb99c3691f95b6b299e7cb2603af36f723f9e8e Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 18 Dec 2018 21:58:41 +0100
Subject: [PATCH] into async re-work: adjust host_openacc_async_construct

---
 libgomp/oacc-host.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index 727f8866f45c..cfd8a24f0674 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -212,7 +212,8 @@ host_openacc_async_queue_callback (struct goacc_asyncqueue *aq
 static struct goacc_asyncqueue *
 host_openacc_async_construct (void)
 {
-  return NULL;
+  /* We have to return non-NULL here, but it's OK to use a dummy.  */
+  return (struct goacc_asyncqueue *) -1;
 }
 
 static bool
-- 
2.17.1



Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts (revised, v2)

2018-12-18 Thread Chung-Lin Tang

On 2018/9/25 9:10 PM, Chung-Lin Tang wrote:

Hi Thomas,
These are the OpenACC specific changes, mostly the re-implementation of 
async-related acc_* runtime
library API functions to use the new backend plugin interfaces, in a non-target 
specific way.



Hi Thomas,
this part includes some of the lookup_goacc_asyncqueue fixes we talked about.
I am still thinking about how the queue lock problem should really be solved, 
so regard
this patch as just fixing some of the problems.


diff -ru trunk-orig/libgomp/oacc-async.c trunk-work/libgomp/oacc-async.c
--- trunk-orig/libgomp/oacc-async.c 2018-12-14 22:11:29.252251925 +0800
+++ trunk-work/libgomp/oacc-async.c 2018-12-18 22:19:51.923102938 +0800
@@ -70,12 +70,16 @@
 
   struct gomp_device_descr *dev = thr->dev;
 
+  gomp_mutex_lock (>openacc.async.lock);
+
   if (!create
   && (async >= dev->openacc.async.nasyncqueue
  || !dev->openacc.async.asyncqueue[async]))
-return NULL;
+{
+  gomp_mutex_unlock (>openacc.async.lock);
+  return NULL;
+}
 
-  gomp_mutex_lock (>openacc.async.lock);
   if (async >= dev->openacc.async.nasyncqueue)
 {
   int diff = async + 1 - dev->openacc.async.nasyncqueue;
@@ -91,6 +95,12 @@
 {
   dev->openacc.async.asyncqueue[async] = dev->openacc.async.construct_func 
();
 
+  if (!dev->openacc.async.asyncqueue[async])
+   {
+ gomp_mutex_unlock (>openacc.async.lock);
+ gomp_fatal ("async %d creation failed", async);
+   }
+  
   /* Link new async queue into active list.  */
   goacc_aq_list n = gomp_malloc (sizeof (struct goacc_asyncqueue_list));
   n->aq = dev->openacc.async.asyncqueue[async];
diff -ru trunk-orig/libgomp/oacc-host.c trunk-work/libgomp/oacc-host.c
--- trunk-orig/libgomp/oacc-host.c  2018-12-14 18:31:07.487203770 +0800
+++ trunk-work/libgomp/oacc-host.c  2018-12-18 22:23:26.771807667 +0800
@@ -266,6 +266,9 @@
 
   .exec_func = host_openacc_exec,
 
+  .create_thread_data_func = host_openacc_create_thread_data,
+  .destroy_thread_data_func = host_openacc_destroy_thread_data,
+
   .async = {
.construct_func = host_openacc_async_construct,
.destruct_func = host_openacc_async_destruct,
@@ -278,9 +281,6 @@
.host2dev_func = host_openacc_async_host2dev,
   },
 
-  .create_thread_data_func = host_openacc_create_thread_data,
-  .destroy_thread_data_func = host_openacc_destroy_thread_data,
-
   .cuda = {
.get_current_device_func = NULL,
.get_current_context_func = NULL,
diff -ru trunk-orig/libgomp/oacc-plugin.c trunk-work/libgomp/oacc-plugin.c
--- trunk-orig/libgomp/oacc-plugin.c2018-12-14 18:31:07.491203745 +0800
+++ trunk-work/libgomp/oacc-plugin.c2018-12-18 22:27:46.047722004 +0800
@@ -30,6 +30,13 @@
 #include "oacc-plugin.h"
 #include "oacc-int.h"
 
+void
+GOMP_PLUGIN_async_unmap_vars (void *ptr __attribute__((unused)),
+ int async __attribute__((unused)))
+{
+  gomp_fatal ("invalid plugin function");
+}
+
 /* Return the target-specific part of the TLS data for the current thread.  */
 
 void *
diff -ru trunk-orig/libgomp/plugin/plugin-nvptx.c 
trunk-work/libgomp/plugin/plugin-nvptx.c
Index: libgomp/oacc-async.c
===
--- libgomp/oacc-async.c(revision 267226)
+++ libgomp/oacc-async.c(working copy)
@@ -27,10 +27,97 @@
.  */
 
 #include 
+#include 
 #include "openacc.h"
 #include "libgomp.h"
 #include "oacc-int.h"
 
+static struct goacc_thread *
+get_goacc_thread (void)
+{
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (!thr || !thr->dev)
+gomp_fatal ("no device active");
+
+  return thr;
+}
+
+static struct gomp_device_descr *
+get_goacc_thread_device (void)
+{
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (!thr || !thr->dev)
+gomp_fatal ("no device active");
+
+  return thr->dev;
+}
+
+attribute_hidden struct goacc_asyncqueue *
+lookup_goacc_asyncqueue (struct goacc_thread *thr, bool create, int async)
+{
+  /* The special value acc_async_noval (-1) maps to the thread-specific
+ default async stream.  */
+  if (async == acc_async_noval)
+async = thr->default_async;
+
+  if (async == acc_async_sync)
+return NULL;
+
+  if (async < 0)
+gomp_fatal ("bad async %d", async);
+
+  struct gomp_device_descr *dev = thr->dev;
+
+  gomp_mutex_lock (>openacc.async.lock);
+
+  if (!create
+  && (async >= dev->openacc.async.nasyncqueue
+ || !dev->openacc.async.asyncqueue[async]))
+{
+  gomp_mutex_unlock (>openacc.async.lock);
+  return NULL;
+}
+
+  if (async >= dev->openacc.async.nasyncqueue)
+{
+  int diff = async + 1 - dev->openacc.async.nasyncqueue;
+  dev->openacc.async.asyncqueue
+   = gomp_realloc (dev->openacc.async.asyncqueue,
+   sizeof (goacc_aq) * (async + 1));
+  memset (dev->openacc.async.asyncqueue +