On 22/02/2021 1:49 pm, Jakub Jelinek wrote:
I see three issues with the union of completion_sem and detach_team done
that way.

1) while linux --enable-futex and accel gomp_sem_t is small (int), rtems
    and especially posix gomp_sem_t is large; so while it might be a good
    idea to inline gomp_sem_t on config/{linux,accel} into the union, for
    the rest it might be better to use indirection; if it is only for the
    undeferred tasks, it could be just using an automatic variable and
    put into the struct address of that; could be done either always,
    or define some macro in config/{linux,accel}/sem.h that gomp_sem_t is
    small and decide on the indirection based on that macro

I think a pointer to an automatic variable would be simplest.

2) kind == GOMP_TASK_UNDEFERRED is true also for the deferred tasks while
    running the cpyfn callback; guess this could be dealt with making sure
    the detach handling is done only after
       thr->task = task;
       if (cpyfn)
         {
           cpyfn (arg, data);
           task->copy_ctors_done = true;
         }
       else
         memcpy (arg, data, arg_size);
       thr->task = parent;
       task->kind = GOMP_TASK_WAITING;
       task->fn = fn;
       task->fn_data = arg;
       task->final_task = (flags & GOMP_TASK_FLAG_FINAL) >> 1;
    I see you've instead removed the GOMP_TASK_UNDEFERRED but the rationale
    for that is that the copy constructors are being run synchronously

Can anything in cpyfn make use of the fact that kind==GOMP_TASK_UNDEFERRED while executing it? Anyway, if we want to keep this, then I suppose we could just add an extra field deferred_p that does not change for the lifetime of the task to indicate that the task is 'really' a deferred task.

3) kind is not constant, for the deferred tasks it can change over the
    lifetime of the task, as you've said in the comments, it is kind ==
    GOMP_TASK_UNDEFERRED vs. other values; while the changes of task->kind
    are done while holding the task lock, omp_fulfill_event reads it before
    locking that lock, so I think it needs to be done using
    if (__atomic_load_n (&task->kind, MEMMODEL_RELAXED) == GOMP_TASK_UNDEFERRED)
    Pedantically the stores to task->kind also need to be done
    with __atomic_store_n MEMMODEL_RELAXED.

If we check task->deferred_p instead (which never changes for a task after instantiation), is that still necessary?

Now, similarly for 3) on task->kind, task->detach_team is similar case,
again, some other omp_fulfill_event can clear it (under lock, but still read
outside of the lock), so it
probably should be read with
   struct gomp_team *team
     = __atomic_load_n (&task->detach_team, MEMMODEL_RELAXED);
And again, pedantically the detach_team stores should be atomic relaxed
stores too.


Done.

Looking at gomp_task_run_post_remove_parent, doesn't that function
already handle the in_taskwait and in_depend_wait gomp_sem_posts?

And into gomp_task_run_post_remove_taskgroup, doesn't that already
handle the in_taskgroup_wait gomp_sem_post?

The extra code has been removed.

- in gomp_barrier_handle_tasks the reason for if (new_tasks > 1)
is that if there is a single dependent task, the current thread
just finished handling one task and so can take that single task and so no
need to wake up.  While in the omp_fulfill_event case, even if there
is just one new task, we need to schedule it to some thread and so
is desirable to wake some thread.

In that case, we could just do 'if (new_tasks > 0)' instead?

> All we know
> (if team == gomp_thread ()->ts.team) is that at least one thread is doing
> something else but that one could be busy for quite some time.

Well, it should still get around to the new task eventually, so there is no problem in terms of correctness here. I suppose we could always wake up one more thread than strictly necessary, but that might have knock-on effects on performance elsewhere?

And the other case is the omp_fulfill_event call from unshackeled thread,
i.e. team != gomp_thread ()->ts.team.
Here, e.g. what gomp_target_task_completion talks about applies:
   /* I'm afraid this can't be done after releasing team->task_lock,
      as gomp_target_task_completion is run from unrelated thread and
      therefore in between gomp_mutex_unlock and gomp_team_barrier_wake
      the team could be gone already.  */
Even there are 2 different cases.
One is where team->task_running_count > 0, at that point we know
at least one task is running and so the only thing that is unsafe
gomp_team_barrier_wake (&team->barrier, do_wake);
after gomp_mutex_unlock (&team->task_lock); - there is a possibility
that in between the two calls the thread running omp_fulfill_event
gets interrupted or just delayed and the team finishes barrier and
is freed too.  So the gomp_team_barrier_wake needs to be done before
the unlock in that case.

The lock is now freed after the call for unshackeled threads, before otherwise.

And then there is the case where all tasks finish on a barrier but some
haven't been fulfilled yet.
In that case, when the last thread calls
...
So, I think for the team != gomp_thread ()->ts.team
&& !do_wake
&& gomp_team_barrier_waiting_for_tasks (&team->barrier)
&& team->task_detach_count == 0
case, we need to wake up 1 thread anyway and arrange for it to do:
               gomp_team_barrier_done (&team->barrier, state);
               gomp_mutex_unlock (&team->task_lock);
               gomp_team_barrier_wake (&team->barrier, 0);
Possibly in
       if (!priority_queue_empty_p (&team->task_queue, MEMMODEL_RELAXED))
add
       else if (team->task_count == 0
               && gomp_team_barrier_waiting_for_tasks (&team->barrier))
        {
          gomp_team_barrier_done (&team->barrier, state);
          gomp_mutex_unlock (&team->task_lock);
          gomp_team_barrier_wake (&team->barrier, 0);
          if (to_free)
            {
              gomp_finish_task (to_free);
              free (to_free);
            }
          return;
        }
but the:
           if (--team->task_count == 0
               && gomp_team_barrier_waiting_for_tasks (&team->barrier))
             {
               gomp_team_barrier_done (&team->barrier, state);
               gomp_mutex_unlock (&team->task_lock);
               gomp_team_barrier_wake (&team->barrier, 0);
               gomp_mutex_lock (&team->task_lock);
             }
in that case would then be incorrect, we don't want to do that twice.
So, either that second if would need to do the to_free handling
and return instead of taking the lock again and looping, or
perhaps we can just do
          --team->task_count;
there instead and let the above added else if handle that?

I have applied your patch to move the gomp_team_barrier_done, and in omp_fulfill_event, I ensure that a single thread is woken up so that gomp_barrier_handle_tasks can signal for the barrier to finish.

I'm having some trouble coming up with a testcase to test this scenario though. I tried having a testcase like this to have threads in separate teams:

  #pragma omp teams num_teams (2) shared (event, started)
    #pragma omp parallel num_threads (1)
      if (omp_get_team_num () == 0)
        {
          #pragma omp task detach (event)
            started = 1;
        }
      else
        // Wait for started to become 1
        omp_fulfill_event (event);

but it does not work because GOMP_teams_reg launches the enclosed block sequentially:

  for (gomp_team_num = 0; gomp_team_num < num_teams; gomp_team_num++)
    fn (data);

and when the first team launches, it blocks waiting for the detach event in GOMP_parallel_end->gomp_team_end->gomp_team_barrier_wait_end, and never gets around to launching the second team. If I omit the 'omp parallel' (to try to get an undeferred task), GCC refuses to compile (only 'distribute', 'parallel' or 'loop' regions are allowed to be strictly nested inside 'teams' region). And you can't nest 'omp teams' inside an 'omp parallel' either. Is there any way of doing this within OpenMP or do we have to resort to creating threads outside of OpenMP?

Thanks

Kwok
From 0fa4deb89f3778ccacd64b01de377ba2b7879db1 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung <k...@codesourcery.com>
Date: Thu, 21 Jan 2021 05:38:47 -0800
Subject: [PATCH] openmp: Fix intermittent hanging of task-detach-6 libgomp
 tests [PR98738]

This adds support for the task detach clause to taskwait and taskgroup, and
simplifies the handling of the detach clause by moving most of the extra
handling required for detach tasks to omp_fulfill_event.

2021-02-24  Kwok Cheung Yeung  <k...@codesourcery.com>
            Jakub Jelinek  <ja...@redhat.com>

        libgomp/

        PR libgomp/98738
        * libgomp.h (enum gomp_task_kind): Add GOMP_TASK_DETACHED.
        (struct gomp_task): Replace detach and completion_sem fields with
        union containing completion_sem and detach_team.  Add deferred_p
        field.
        (struct gomp_team): Remove task_detach_queue.
        * task.c: Include assert.h.
        (gomp_init_task): Initialize deferred_p and detach_team fields.
        (task_fulfilled_p): Delete.
        (GOMP_task): Use address of task as the event handle.  Remove
        initialization of detach field.  Initialize deferred_p field.
        Use automatic local for completion_sem.  Initialize detach_team field
        for deferred tasks.
        (gomp_barrier_handle_tasks): Remove handling of task_detach_queue.
        Set kind of suspended detach task to GOMP_TASK_DETACHED and
        decrement task_running_count.  Move finish_cancelled block out of
        else branch.  Relocate call to gomp_team_barrier_done.
        (GOMP_taskwait): Handle tasks with completion events that have not
        been fulfilled.
        (GOMP_taskgroup_end): Likewise.
        (omp_fulfill_event): Use address of task as event handle.  Post to
        completion_sem for undeferred tasks.  Clear detach_team if task
        has not finished.  For finished tasks, handle post-execution tasks,
        call gomp_team_barrier_wake if necessary, and free task.
        * team.c (gomp_new_team): Remove initialization of task_detach_queue.
        (free_team): Remove free of task_detach_queue.
        * testsuite/libgomp.c-c++-common/task-detach-1.c: Fix formatting.
        * testsuite/libgomp.c-c++-common/task-detach-2.c: Fix formatting.
        * testsuite/libgomp.c-c++-common/task-detach-3.c: Fix formatting.
        * testsuite/libgomp.c-c++-common/task-detach-4.c: Fix formatting.
        * testsuite/libgomp.c-c++-common/task-detach-5.c: Fix formatting.
        Change data-sharing of detach events on enclosing parallel to private.
        * testsuite/libgomp.c-c++-common/task-detach-6.c: Likewise.  Remove
        taskwait directive.
        * testsuite/libgomp.c-c++-common/task-detach-7.c: New.
        * testsuite/libgomp.c-c++-common/task-detach-8.c: New.
        * testsuite/libgomp.c-c++-common/task-detach-9.c: New.
        * testsuite/libgomp.c-c++-common/task-detach-10.c: New.
        * testsuite/libgomp.c-c++-common/task-detach-11.c: New.
        * testsuite/libgomp.c-c++-common/task-detach-1.f90: Fix formatting.
        * testsuite/libgomp.c-c++-common/task-detach-2.f90: Fix formatting.
        * testsuite/libgomp.c-c++-common/task-detach-3.f90: Fix formatting.
        * testsuite/libgomp.c-c++-common/task-detach-4.f90: Fix formatting.
        * testsuite/libgomp.fortran/task-detach-5.f90: Fix formatting.
        Change data-sharing of detach events on enclosing parallel to private.
        * testsuite/libgomp.fortran/task-detach-6.f90: Likewise.  Remove
        taskwait directive.
        * testsuite/libgomp.c-c++-common/task-detach-7.f90: New.
        * testsuite/libgomp.c-c++-common/task-detach-8.f90: New.
        * testsuite/libgomp.c-c++-common/task-detach-9.f90: New.
        * testsuite/libgomp.c-c++-common/task-detach-10.f90: New.
        * testsuite/libgomp.c-c++-common/task-detach-11.f90: New.
---
 libgomp/libgomp.h                                  |  19 +-
 libgomp/task.c                                     | 236 ++++++++++++++-------
 libgomp/team.c                                     |   2 -
 .../testsuite/libgomp.c-c++-common/task-detach-1.c |   4 +-
 .../libgomp.c-c++-common/task-detach-10.c          |  45 ++++
 .../libgomp.c-c++-common/task-detach-11.c          |  13 ++
 .../testsuite/libgomp.c-c++-common/task-detach-2.c |   6 +-
 .../testsuite/libgomp.c-c++-common/task-detach-3.c |   6 +-
 .../testsuite/libgomp.c-c++-common/task-detach-4.c |   4 +-
 .../testsuite/libgomp.c-c++-common/task-detach-5.c |   8 +-
 .../testsuite/libgomp.c-c++-common/task-detach-6.c |   8 +-
 .../testsuite/libgomp.c-c++-common/task-detach-7.c |  45 ++++
 .../testsuite/libgomp.c-c++-common/task-detach-8.c |  47 ++++
 .../testsuite/libgomp.c-c++-common/task-detach-9.c |  43 ++++
 .../testsuite/libgomp.fortran/task-detach-1.f90    |   4 +-
 .../testsuite/libgomp.fortran/task-detach-10.f90   |  44 ++++
 .../testsuite/libgomp.fortran/task-detach-11.f90   |  13 ++
 .../testsuite/libgomp.fortran/task-detach-2.f90    |   6 +-
 .../testsuite/libgomp.fortran/task-detach-3.f90    |   6 +-
 .../testsuite/libgomp.fortran/task-detach-4.f90    |   4 +-
 .../testsuite/libgomp.fortran/task-detach-5.f90    |   8 +-
 .../testsuite/libgomp.fortran/task-detach-6.f90    |  16 +-
 .../testsuite/libgomp.fortran/task-detach-7.f90    |  42 ++++
 .../testsuite/libgomp.fortran/task-detach-8.f90    |  45 ++++
 .../testsuite/libgomp.fortran/task-detach-9.f90    |  41 ++++
 25 files changed, 584 insertions(+), 131 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/task-detach-10.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/task-detach-11.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/task-detach-7.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/task-detach-8.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/task-detach-9.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/task-detach-10.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/task-detach-11.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/task-detach-7.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/task-detach-8.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/task-detach-9.f90

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index b4d0c93..cd10d12 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -481,7 +481,10 @@ enum gomp_task_kind
      but not yet completed.  Once that completes, they will be readded
      into the queues as GOMP_TASK_WAITING in order to perform the var
      unmapping.  */
-  GOMP_TASK_ASYNC_RUNNING
+  GOMP_TASK_ASYNC_RUNNING,
+  /* Task that has finished executing but is waiting for its
+     completion event to be fulfilled.  */
+  GOMP_TASK_DETACHED
 };
 
 struct gomp_task_depend_entry
@@ -545,8 +548,15 @@ struct gomp_task
      entries and the gomp_task in which they reside.  */
   struct priority_node pnode[3];
 
-  bool detach;
-  gomp_sem_t completion_sem;
+  union {
+    /* Valid only if deferred_p is false.  */
+    gomp_sem_t *completion_sem;
+    /* Valid only if deferred_p is true.  Set to the team that executes the
+       task if the task is detached and the completion event has yet to be
+       fulfilled.  */
+    struct gomp_team *detach_team;
+  };
+  bool deferred_p;
 
   struct gomp_task_icv icv;
   void (*fn) (void *);
@@ -688,8 +698,7 @@ struct gomp_team
   int work_share_cancelled;
   int team_cancelled;
 
-  /* Tasks waiting for their completion event to be fulfilled.  */
-  struct priority_queue task_detach_queue;
+  /* Number of tasks waiting for their completion event to be fulfilled.  */
   unsigned int task_detach_count;
 
   /* This array contains structures for implicit tasks.  */
diff --git a/libgomp/task.c b/libgomp/task.c
index b242e7c..79df733 100644
--- a/libgomp/task.c
+++ b/libgomp/task.c
@@ -29,6 +29,7 @@
 #include "libgomp.h"
 #include <stdlib.h>
 #include <string.h>
+#include <assert.h>
 #include "gomp-constants.h"
 
 typedef struct gomp_task_depend_entry *hash_entry_type;
@@ -86,7 +87,8 @@ gomp_init_task (struct gomp_task *task, struct gomp_task 
*parent_task,
   task->dependers = NULL;
   task->depend_hash = NULL;
   task->depend_count = 0;
-  task->detach = false;
+  task->deferred_p = true;
+  task->detach_team = NULL;
 }
 
 /* Clean up a task, after completing it.  */
@@ -327,12 +329,6 @@ gomp_task_handle_depend (struct gomp_task *task, struct 
gomp_task *parent,
     }
 }
 
-static bool
-task_fulfilled_p (struct gomp_task *task)
-{
-  return gomp_sem_getcount (&task->completion_sem) > 0;
-}
-
 /* Called when encountering an explicit task directive.  If IF_CLAUSE is
    false, then we must not delay in executing the task.  If UNTIED is true,
    then the task may be executed by any member of the team.
@@ -398,6 +394,7 @@ GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) 
(void *, void *),
       || team->task_count > 64 * team->nthreads)
     {
       struct gomp_task task;
+      gomp_sem_t completion_sem;
 
       /* If there are depend clauses and earlier deferred sibling tasks
         with depend clauses, check if there isn't a dependency.  If there
@@ -414,16 +411,18 @@ GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) 
(void *, void *),
       task.final_task = (thr->task && thr->task->final_task)
                        || (flags & GOMP_TASK_FLAG_FINAL);
       task.priority = priority;
+      task.deferred_p = false;
 
       if ((flags & GOMP_TASK_FLAG_DETACH) != 0)
        {
-         task.detach = true;
-         gomp_sem_init (&task.completion_sem, 0);
-         *(void **) detach = &task.completion_sem;
+         gomp_sem_init (&completion_sem, 0);
+         task.completion_sem = &completion_sem;
+         *(void **) detach = &task;
          if (data)
-           *(void **) data = &task.completion_sem;
+           *(void **) data = &task;
 
-         gomp_debug (0, "New event: %p\n", &task.completion_sem);
+         gomp_debug (0, "Thread %d: new event: %p\n",
+                     thr->ts.team_id, &task);
        }
 
       if (thr->task)
@@ -443,8 +442,8 @@ GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) 
(void *, void *),
       else
        fn (data);
 
-      if (task.detach && !task_fulfilled_p (&task))
-       gomp_sem_wait (&task.completion_sem);
+      if ((flags & GOMP_TASK_FLAG_DETACH) != 0 && detach)
+       gomp_sem_wait (&completion_sem);
 
       /* Access to "children" is normally done inside a task_lock
         mutex region, but the only way this particular task.children
@@ -484,15 +483,16 @@ GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) 
(void *, void *),
       task->kind = GOMP_TASK_UNDEFERRED;
       task->in_tied_task = parent->in_tied_task;
       task->taskgroup = taskgroup;
+      task->deferred_p = true;
       if ((flags & GOMP_TASK_FLAG_DETACH) != 0)
        {
-         task->detach = true;
-         gomp_sem_init (&task->completion_sem, 0);
-         *(void **) detach = &task->completion_sem;
+         task->detach_team = team;
+
+         *(void **) detach = task;
          if (data)
-           *(void **) data = &task->completion_sem;
+           *(void **) data = task;
 
-         gomp_debug (0, "New event: %p\n", &task->completion_sem);
+         gomp_debug (0, "Thread %d: new event: %p\n", thr->ts.team_id, task);
        }
       thr->task = task;
       if (cpyfn)
@@ -1362,27 +1362,6 @@ gomp_barrier_handle_tasks (gomp_barrier_state_t state)
     {
       bool cancelled = false;
 
-      /* Look for a queued detached task with a fulfilled completion event
-        that is ready to finish.  */
-      child_task = priority_queue_find (PQ_TEAM, &team->task_detach_queue,
-                                       task_fulfilled_p);
-      if (child_task)
-       {
-         priority_queue_remove (PQ_TEAM, &team->task_detach_queue,
-                                child_task, MEMMODEL_RELAXED);
-         --team->task_detach_count;
-         gomp_debug (0, "thread %d: found task with fulfilled event %p\n",
-                     thr->ts.team_id, &child_task->completion_sem);
-
-       if (to_free)
-         {
-           gomp_finish_task (to_free);
-           free (to_free);
-           to_free = NULL;
-         }
-         goto finish_cancelled;
-       }
-
       if (!priority_queue_empty_p (&team->task_queue, MEMMODEL_RELAXED))
        {
          bool ignored;
@@ -1405,6 +1384,19 @@ gomp_barrier_handle_tasks (gomp_barrier_state_t state)
          team->task_running_count++;
          child_task->in_tied_task = true;
        }
+      else if (team->task_count == 0
+              && gomp_team_barrier_waiting_for_tasks (&team->barrier))
+       {
+         gomp_team_barrier_done (&team->barrier, state);
+         gomp_mutex_unlock (&team->task_lock);
+         gomp_team_barrier_wake (&team->barrier, 0);
+         if (to_free)
+           {
+             gomp_finish_task (to_free);
+             free (to_free);
+           }
+         return;
+       }
       gomp_mutex_unlock (&team->task_lock);
       if (do_wake)
        {
@@ -1450,44 +1442,37 @@ gomp_barrier_handle_tasks (gomp_barrier_state_t state)
       gomp_mutex_lock (&team->task_lock);
       if (child_task)
        {
-         if (child_task->detach && !task_fulfilled_p (child_task))
+         if (child_task->detach_team)
            {
-             priority_queue_insert (PQ_TEAM, &team->task_detach_queue,
-                                    child_task, child_task->priority,
-                                    PRIORITY_INSERT_END,
-                                    false, false);
+             assert (child_task->detach_team == team);
+             child_task->kind = GOMP_TASK_DETACHED;
              ++team->task_detach_count;
-             gomp_debug (0, "thread %d: queueing task with event %p\n",
-                         thr->ts.team_id, &child_task->completion_sem);
+             --team->task_running_count;
+             gomp_debug (0,
+                         "thread %d: task with event %p finished without "
+                         "completion event fulfilled in team barrier\n",
+                         thr->ts.team_id, child_task);
              child_task = NULL;
+             continue;
            }
-         else
+
+        finish_cancelled:;
+         size_t new_tasks
+           = gomp_task_run_post_handle_depend (child_task, team);
+         gomp_task_run_post_remove_parent (child_task);
+         gomp_clear_parent (&child_task->children_queue);
+         gomp_task_run_post_remove_taskgroup (child_task);
+         to_free = child_task;
+         if (!cancelled)
+           team->task_running_count--;
+         child_task = NULL;
+         if (new_tasks > 1)
            {
-            finish_cancelled:;
-             size_t new_tasks
-               = gomp_task_run_post_handle_depend (child_task, team);
-             gomp_task_run_post_remove_parent (child_task);
-             gomp_clear_parent (&child_task->children_queue);
-             gomp_task_run_post_remove_taskgroup (child_task);
-             to_free = child_task;
-             child_task = NULL;
-             if (!cancelled)
-               team->task_running_count--;
-             if (new_tasks > 1)
-               {
-                 do_wake = team->nthreads - team->task_running_count;
-                 if (do_wake > new_tasks)
-                   do_wake = new_tasks;
-               }
-             if (--team->task_count == 0
-                 && gomp_team_barrier_waiting_for_tasks (&team->barrier))
-               {
-                 gomp_team_barrier_done (&team->barrier, state);
-                 gomp_mutex_unlock (&team->task_lock);
-                 gomp_team_barrier_wake (&team->barrier, 0);
-                 gomp_mutex_lock (&team->task_lock);
-               }
+             do_wake = team->nthreads - team->task_running_count;
+             if (do_wake > new_tasks)
+               do_wake = new_tasks;
            }
+         --team->task_count;
        }
     }
 }
@@ -1559,7 +1544,8 @@ GOMP_taskwait (void)
       else
        {
        /* All tasks we are waiting for are either running in other
-          threads, or they are tasks that have not had their
+          threads, are detached and waiting for the completion event to be
+          fulfilled, or they are tasks that have not had their
           dependencies met (so they're not even in the queue).  Wait
           for them.  */
          if (task->taskwait == NULL)
@@ -1614,6 +1600,19 @@ GOMP_taskwait (void)
       gomp_mutex_lock (&team->task_lock);
       if (child_task)
        {
+         if (child_task->detach_team)
+           {
+             assert (child_task->detach_team == team);
+             child_task->kind = GOMP_TASK_DETACHED;
+             ++team->task_detach_count;
+             gomp_debug (0,
+                         "thread %d: task with event %p finished without "
+                         "completion event fulfilled in taskwait\n",
+                         thr->ts.team_id, child_task);
+             child_task = NULL;
+             continue;
+           }
+
         finish_cancelled:;
          size_t new_tasks
            = gomp_task_run_post_handle_depend (child_task, team);
@@ -2069,6 +2068,19 @@ GOMP_taskgroup_end (void)
       gomp_mutex_lock (&team->task_lock);
       if (child_task)
        {
+         if (child_task->detach_team)
+           {
+             assert (child_task->detach_team == team);
+             child_task->kind = GOMP_TASK_DETACHED;
+             ++team->task_detach_count;
+             gomp_debug (0,
+                         "thread %d: task with event %p finished without "
+                         "completion event fulfilled in taskgroup\n",
+                         thr->ts.team_id, child_task);
+             child_task = NULL;
+             continue;
+           }
+
         finish_cancelled:;
          size_t new_tasks
            = gomp_task_run_post_handle_depend (child_task, team);
@@ -2402,17 +2414,77 @@ ialias (omp_in_final)
 void
 omp_fulfill_event (omp_event_handle_t event)
 {
-  gomp_sem_t *sem = (gomp_sem_t *) event;
-  struct gomp_thread *thr = gomp_thread ();
-  struct gomp_team *team = thr ? thr->ts.team : NULL;
+  struct gomp_task *task = (struct gomp_task *) event;
+  if (!task->deferred_p)
+    {
+      if (gomp_sem_getcount (task->completion_sem) > 0)
+       gomp_fatal ("omp_fulfill_event: %p event already fulfilled!\n", task);
 
-  if (gomp_sem_getcount (sem) > 0)
-    gomp_fatal ("omp_fulfill_event: %p event already fulfilled!\n", sem);
+      gomp_debug (0, "omp_fulfill_event: %p event for undeferred task\n",
+                 task);
+      gomp_sem_post (task->completion_sem);
+      return;
+    }
 
-  gomp_debug (0, "omp_fulfill_event: %p\n", sem);
-  gomp_sem_post (sem);
-  if (team)
-    gomp_team_barrier_wake (&team->barrier, 1);
+  struct gomp_team *team = __atomic_load_n (&task->detach_team,
+                                           MEMMODEL_RELAXED);
+  if (!team)
+    gomp_fatal ("omp_fulfill_event: %p event is invalid or has already "
+               "been fulfilled!\n", task);
+
+  gomp_mutex_lock (&team->task_lock);
+  if (task->kind != GOMP_TASK_DETACHED)
+    {
+      /* The task has not finished running yet.  */
+      gomp_debug (0,
+                 "omp_fulfill_event: %p event fulfilled for unfinished "
+                 "task\n", task);
+      __atomic_store_n (&task->detach_team, NULL, MEMMODEL_RELAXED);
+      gomp_mutex_unlock (&team->task_lock);
+      return;
+    }
+
+  gomp_debug (0, "omp_fulfill_event: %p event fulfilled for finished task\n",
+             task);
+  size_t new_tasks = gomp_task_run_post_handle_depend (task, team);
+  gomp_task_run_post_remove_parent (task);
+  gomp_clear_parent (&task->children_queue);
+  gomp_task_run_post_remove_taskgroup (task);
+  team->task_count--;
+  team->task_detach_count--;
+
+  int do_wake = 0;
+  bool shackled_thread_p = team == gomp_thread ()->ts.team;
+  if (new_tasks > 0)
+    {
+      /* Wake up threads to run new tasks.  */
+      do_wake = team->nthreads - team->task_running_count;
+      if (do_wake > new_tasks)
+       do_wake = new_tasks;
+    }
+
+  if (!shackled_thread_p
+      && !do_wake
+      && gomp_team_barrier_waiting_for_tasks (&team->barrier)
+      && team->task_detach_count == 0)
+    {
+      /* Ensure that at least one thread is woken up to signal that the
+        barrier can finish.  */
+      do_wake = 1;
+    }
+
+  /* If we are running in an unshackled thread, the team might vanish before
+     gomp_team_barrier_wake is run if we release the lock first, so keep the
+     lock for the call in that case.  */
+  if (shackled_thread_p)
+    gomp_mutex_unlock (&team->task_lock);
+  if (do_wake)
+    gomp_team_barrier_wake (&team->barrier, do_wake);
+  if (!shackled_thread_p)
+    gomp_mutex_unlock (&team->task_lock);
+
+  gomp_finish_task (task);
+  free (task);
 }
 
 ialias (omp_fulfill_event)
diff --git a/libgomp/team.c b/libgomp/team.c
index 0f3707c..9662234 100644
--- a/libgomp/team.c
+++ b/libgomp/team.c
@@ -206,7 +206,6 @@ gomp_new_team (unsigned nthreads)
   team->work_share_cancelled = 0;
   team->team_cancelled = 0;
 
-  priority_queue_init (&team->task_detach_queue);
   team->task_detach_count = 0;
 
   return team;
@@ -224,7 +223,6 @@ free_team (struct gomp_team *team)
   gomp_barrier_destroy (&team->barrier);
   gomp_mutex_destroy (&team->task_lock);
   priority_queue_free (&team->task_queue);
-  priority_queue_free (&team->task_detach_queue);
   team_free (team);
 }
 
diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-1.c 
b/libgomp/testsuite/libgomp.c-c++-common/task-detach-1.c
index 8583e37..14932b0 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/task-detach-1.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-1.c
@@ -14,10 +14,10 @@ int main (void)
   #pragma omp parallel
     #pragma omp single
     {
-      #pragma omp task detach(detach_event1)
+      #pragma omp task detach (detach_event1)
        x++;
 
-      #pragma omp task detach(detach_event2)
+      #pragma omp task detach (detach_event2)
       {
        y++;
        omp_fulfill_event (detach_event1);
diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-10.c 
b/libgomp/testsuite/libgomp.c-c++-common/task-detach-10.c
new file mode 100644
index 0000000..10d6746
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-10.c
@@ -0,0 +1,45 @@
+/* { dg-do run } */
+
+#include <omp.h>
+#include <assert.h>
+
+/* Test tasks with detach clause on an offload device.  Each device
+   thread spawns off a chain of tasks in a taskgroup, that can then
+   be executed by any available thread.  */
+
+int main (void)
+{
+  int x = 0, y = 0, z = 0;
+  int thread_count;
+  omp_event_handle_t detach_event1, detach_event2;
+
+  #pragma omp target map (tofrom: x, y, z) map (from: thread_count)
+    #pragma omp parallel private (detach_event1, detach_event2)
+      #pragma omp taskgroup
+       {
+         #pragma omp single
+           thread_count = omp_get_num_threads ();
+
+         #pragma omp task detach (detach_event1) untied
+           #pragma omp atomic update
+             x++;
+
+         #pragma omp task detach (detach_event2) untied
+         {
+           #pragma omp atomic update
+             y++;
+           omp_fulfill_event (detach_event1);
+         }
+
+         #pragma omp task untied
+         {
+           #pragma omp atomic update
+             z++;
+           omp_fulfill_event (detach_event2);
+         }
+       }
+
+  assert (x == thread_count);
+  assert (y == thread_count);
+  assert (z == thread_count);
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-11.c 
b/libgomp/testsuite/libgomp.c-c++-common/task-detach-11.c
new file mode 100644
index 0000000..dd002dc
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-11.c
@@ -0,0 +1,13 @@
+/* { dg-do run } */
+
+#include <omp.h>
+
+/* Test the detach clause when the task is undeferred.  */
+
+int main (void)
+{
+  omp_event_handle_t event;
+
+  #pragma omp task detach (event)
+    omp_fulfill_event (event);
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-2.c 
b/libgomp/testsuite/libgomp.c-c++-common/task-detach-2.c
index 943ac2a..3e33c40 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/task-detach-2.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-2.c
@@ -12,13 +12,13 @@ int main (void)
   omp_event_handle_t detach_event1, detach_event2;
   int x = 0, y = 0, z = 0;
 
-  #pragma omp parallel num_threads(1)
+  #pragma omp parallel num_threads (1)
     #pragma omp single
     {
-      #pragma omp task detach(detach_event1)
+      #pragma omp task detach (detach_event1)
        x++;
 
-      #pragma omp task detach(detach_event2)
+      #pragma omp task detach (detach_event2)
       {
        y++;
        omp_fulfill_event (detach_event1);
diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-3.c 
b/libgomp/testsuite/libgomp.c-c++-common/task-detach-3.c
index 2609fb1..c85857d 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/task-detach-3.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-3.c
@@ -14,16 +14,16 @@ int main (void)
   #pragma omp parallel
     #pragma omp single
     {
-      #pragma omp task depend(out:dep) detach(detach_event)
+      #pragma omp task depend (out:dep) detach (detach_event)
        x++;
 
       #pragma omp task
       {
        y++;
-       omp_fulfill_event(detach_event);
+       omp_fulfill_event (detach_event);
       }
 
-      #pragma omp task depend(in:dep)
+      #pragma omp task depend (in:dep)
        z++;
     }
 
diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-4.c 
b/libgomp/testsuite/libgomp.c-c++-common/task-detach-4.c
index eeb9554..cd0d2b3 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/task-detach-4.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-4.c
@@ -14,10 +14,10 @@ int main (void)
 
   #pragma omp parallel
     #pragma omp single
-      #pragma omp task detach(detach_event)
+      #pragma omp task detach (detach_event)
       {
        x++;
-       omp_fulfill_event(detach_event);
+       omp_fulfill_event (detach_event);
       }
 
   assert (x == 1);
diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-5.c 
b/libgomp/testsuite/libgomp.c-c++-common/task-detach-5.c
index 5a01517..382f377 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/task-detach-5.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-5.c
@@ -12,16 +12,16 @@ int main (void)
   int thread_count;
   omp_event_handle_t detach_event1, detach_event2;
 
-  #pragma omp parallel firstprivate(detach_event1, detach_event2)
+  #pragma omp parallel private (detach_event1, detach_event2)
   {
     #pragma omp single
-      thread_count = omp_get_num_threads();
+      thread_count = omp_get_num_threads ();
 
-    #pragma omp task detach(detach_event1) untied
+    #pragma omp task detach (detach_event1) untied
       #pragma omp atomic update
        x++;
 
-    #pragma omp task detach(detach_event2) untied
+    #pragma omp task detach (detach_event2) untied
     {
       #pragma omp atomic update
        y++;
diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-6.c 
b/libgomp/testsuite/libgomp.c-c++-common/task-detach-6.c
index b5f68cc..e5c2291 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/task-detach-6.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-6.c
@@ -13,11 +13,11 @@ int main (void)
   int thread_count;
   omp_event_handle_t detach_event1, detach_event2;
 
-  #pragma omp target map(tofrom: x, y, z) map(from: thread_count)
-    #pragma omp parallel firstprivate(detach_event1, detach_event2)
+  #pragma omp target map (tofrom: x, y, z) map (from: thread_count)
+    #pragma omp parallel private (detach_event1, detach_event2)
       {
        #pragma omp single
-         thread_count = omp_get_num_threads();
+         thread_count = omp_get_num_threads ();
 
        #pragma omp task detach(detach_event1) untied
          #pragma omp atomic update
@@ -36,8 +36,6 @@ int main (void)
            z++;
          omp_fulfill_event (detach_event2);
        }
-
-       #pragma omp taskwait
       }
 
   assert (x == thread_count);
diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-7.c 
b/libgomp/testsuite/libgomp.c-c++-common/task-detach-7.c
new file mode 100644
index 0000000..3f025d6
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-7.c
@@ -0,0 +1,45 @@
+/* { dg-do run } */
+
+#include <omp.h>
+#include <assert.h>
+
+/* Test tasks with detach clause.  Each thread spawns off a chain of tasks,
+   that can then be executed by any available thread.  Each thread uses
+   taskwait to wait for the child tasks to complete.  */
+
+int main (void)
+{
+  int x = 0, y = 0, z = 0;
+  int thread_count;
+  omp_event_handle_t detach_event1, detach_event2;
+
+  #pragma omp parallel private (detach_event1, detach_event2)
+  {
+    #pragma omp single
+      thread_count = omp_get_num_threads ();
+
+    #pragma omp task detach (detach_event1) untied
+      #pragma omp atomic update
+       x++;
+
+    #pragma omp task detach (detach_event2) untied
+    {
+      #pragma omp atomic update
+       y++;
+      omp_fulfill_event (detach_event1);
+    }
+
+    #pragma omp task untied
+    {
+      #pragma omp atomic update
+       z++;
+      omp_fulfill_event (detach_event2);
+    }
+
+    #pragma omp taskwait
+  }
+
+  assert (x == thread_count);
+  assert (y == thread_count);
+  assert (z == thread_count);
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-8.c 
b/libgomp/testsuite/libgomp.c-c++-common/task-detach-8.c
new file mode 100644
index 0000000..6f77f12
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-8.c
@@ -0,0 +1,47 @@
+/* { dg-do run } */
+
+#include <omp.h>
+#include <assert.h>
+
+/* Test tasks with detach clause on an offload device.  Each device
+   thread spawns off a chain of tasks, that can then be executed by
+   any available thread.  Each thread uses taskwait to wait for the
+   child tasks to complete.  */
+
+int main (void)
+{
+  int x = 0, y = 0, z = 0;
+  int thread_count;
+  omp_event_handle_t detach_event1, detach_event2;
+
+  #pragma omp target map (tofrom: x, y, z) map (from: thread_count)
+    #pragma omp parallel private (detach_event1, detach_event2)
+      {
+       #pragma omp single
+         thread_count = omp_get_num_threads ();
+
+       #pragma omp task detach (detach_event1) untied
+         #pragma omp atomic update
+           x++;
+
+       #pragma omp task detach (detach_event2) untied
+       {
+         #pragma omp atomic update
+           y++;
+         omp_fulfill_event (detach_event1);
+       }
+
+       #pragma omp task untied
+       {
+         #pragma omp atomic update
+           z++;
+         omp_fulfill_event (detach_event2);
+       }
+
+       #pragma omp taskwait
+      }
+
+  assert (x == thread_count);
+  assert (y == thread_count);
+  assert (z == thread_count);
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-9.c 
b/libgomp/testsuite/libgomp.c-c++-common/task-detach-9.c
new file mode 100644
index 0000000..5316ca5
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-9.c
@@ -0,0 +1,43 @@
+/* { dg-do run } */
+
+#include <omp.h>
+#include <assert.h>
+
+/* Test tasks with detach clause.  Each thread spawns off a chain of tasks
+   in a taskgroup, that can then be executed by any available thread.  */
+
+int main (void)
+{
+  int x = 0, y = 0, z = 0;
+  int thread_count;
+  omp_event_handle_t detach_event1, detach_event2;
+
+  #pragma omp parallel private (detach_event1, detach_event2)
+    #pragma omp taskgroup
+    {
+      #pragma omp single
+       thread_count = omp_get_num_threads ();
+
+      #pragma omp task detach (detach_event1) untied
+       #pragma omp atomic update
+         x++;
+
+      #pragma omp task detach (detach_event2) untied
+      {
+       #pragma omp atomic update
+         y++;
+       omp_fulfill_event (detach_event1);
+      }
+
+      #pragma omp task untied
+      {
+       #pragma omp atomic update
+         z++;
+       omp_fulfill_event (detach_event2);
+      }
+    }
+
+  assert (x == thread_count);
+  assert (y == thread_count);
+  assert (z == thread_count);
+}
diff --git a/libgomp/testsuite/libgomp.fortran/task-detach-1.f90 
b/libgomp/testsuite/libgomp.fortran/task-detach-1.f90
index 217bf65..c53b1ca 100644
--- a/libgomp/testsuite/libgomp.fortran/task-detach-1.f90
+++ b/libgomp/testsuite/libgomp.fortran/task-detach-1.f90
@@ -11,11 +11,11 @@ program task_detach_1
 
   !$omp parallel
     !$omp single
-      !$omp task detach(detach_event1)
+      !$omp task detach (detach_event1)
         x = x + 1
       !$omp end task
 
-      !$omp task detach(detach_event2)
+      !$omp task detach (detach_event2)
         y = y + 1
        call omp_fulfill_event (detach_event1)
       !$omp end task
diff --git a/libgomp/testsuite/libgomp.fortran/task-detach-10.f90 
b/libgomp/testsuite/libgomp.fortran/task-detach-10.f90
new file mode 100644
index 0000000..61f0ea8
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/task-detach-10.f90
@@ -0,0 +1,44 @@
+! { dg-do run }
+
+! Test tasks with detach clause on an offload device.  Each device
+! thread spawns off a chain of tasks in a taskgroup, that can then
+! be executed by any available thread.
+
+program task_detach_10
+  use omp_lib
+
+  integer (kind=omp_event_handle_kind) :: detach_event1, detach_event2
+  integer :: x = 0, y = 0, z = 0
+  integer :: thread_count
+
+  !$omp target map (tofrom: x, y, z) map (from: thread_count)
+    !$omp parallel private (detach_event1, detach_event2)
+      !$omp taskgroup
+       !$omp single
+         thread_count = omp_get_num_threads ()
+       !$omp end single
+
+       !$omp task detach (detach_event1) untied
+         !$omp atomic update
+           x = x + 1
+       !$omp end task
+
+       !$omp task detach (detach_event2) untied
+         !$omp atomic update
+           y = y + 1
+         call omp_fulfill_event (detach_event1)
+       !$omp end task
+
+       !$omp task untied
+         !$omp atomic update
+           z = z + 1
+         call omp_fulfill_event (detach_event2)
+       !$omp end task
+      !$omp end taskgroup
+    !$omp end parallel
+  !$omp end target
+
+  if (x /= thread_count) stop 1
+  if (y /= thread_count) stop 2
+  if (z /= thread_count) stop 3
+end program
diff --git a/libgomp/testsuite/libgomp.fortran/task-detach-11.f90 
b/libgomp/testsuite/libgomp.fortran/task-detach-11.f90
new file mode 100644
index 0000000..b33baff
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/task-detach-11.f90
@@ -0,0 +1,13 @@
+! { dg-do run }
+
+! Test the detach clause when the task is undeferred.
+
+program task_detach_11
+  use omp_lib
+
+  integer (kind=omp_event_handle_kind) :: detach_event
+
+  !$omp task detach (detach_event)
+    call omp_fulfill_event (detach_event)
+  !$omp end task
+end program
diff --git a/libgomp/testsuite/libgomp.fortran/task-detach-2.f90 
b/libgomp/testsuite/libgomp.fortran/task-detach-2.f90
index ecb4829..68e3ff2 100644
--- a/libgomp/testsuite/libgomp.fortran/task-detach-2.f90
+++ b/libgomp/testsuite/libgomp.fortran/task-detach-2.f90
@@ -10,13 +10,13 @@ program task_detach_2
   integer (kind=omp_event_handle_kind) :: detach_event1, detach_event2
   integer :: x = 0, y = 0, z = 0
 
-  !$omp parallel num_threads(1)
+  !$omp parallel num_threads (1)
     !$omp single
-      !$omp task detach(detach_event1)
+      !$omp task detach (detach_event1)
         x = x + 1
       !$omp end task
 
-      !$omp task detach(detach_event2)
+      !$omp task detach (detach_event2)
         y = y + 1
        call omp_fulfill_event (detach_event1)
       !$omp end task
diff --git a/libgomp/testsuite/libgomp.fortran/task-detach-3.f90 
b/libgomp/testsuite/libgomp.fortran/task-detach-3.f90
index bdf93a5..5ac68d5 100644
--- a/libgomp/testsuite/libgomp.fortran/task-detach-3.f90
+++ b/libgomp/testsuite/libgomp.fortran/task-detach-3.f90
@@ -12,16 +12,16 @@ program task_detach_3
 
   !$omp parallel
     !$omp single
-      !$omp task depend(out:dep) detach(detach_event)
+      !$omp task depend (out:dep) detach (detach_event)
         x = x + 1
       !$omp end task
 
       !$omp task
         y = y + 1
-       call omp_fulfill_event(detach_event)
+       call omp_fulfill_event (detach_event)
       !$omp end task
 
-      !$omp task depend(in:dep)
+      !$omp task depend (in:dep)
         z = z + 1
       !$omp end task
     !$omp end single
diff --git a/libgomp/testsuite/libgomp.fortran/task-detach-4.f90 
b/libgomp/testsuite/libgomp.fortran/task-detach-4.f90
index 6d0843c..159624c 100644
--- a/libgomp/testsuite/libgomp.fortran/task-detach-4.f90
+++ b/libgomp/testsuite/libgomp.fortran/task-detach-4.f90
@@ -11,9 +11,9 @@ program task_detach_4
 
   !$omp parallel
     !$omp single
-      !$omp task detach(detach_event)
+      !$omp task detach (detach_event)
         x = x + 1
-       call omp_fulfill_event(detach_event)
+       call omp_fulfill_event (detach_event)
       !$omp end task
     !$omp end single
   !$omp end parallel
diff --git a/libgomp/testsuite/libgomp.fortran/task-detach-5.f90 
b/libgomp/testsuite/libgomp.fortran/task-detach-5.f90
index 955d687..95bd132 100644
--- a/libgomp/testsuite/libgomp.fortran/task-detach-5.f90
+++ b/libgomp/testsuite/libgomp.fortran/task-detach-5.f90
@@ -10,17 +10,17 @@ program task_detach_5
   integer :: x = 0, y = 0, z = 0
   integer :: thread_count
 
-  !$omp parallel firstprivate(detach_event1, detach_event2)
+  !$omp parallel private (detach_event1, detach_event2)
     !$omp single
-      thread_count = omp_get_num_threads()
+      thread_count = omp_get_num_threads ()
     !$omp end single
 
-    !$omp task detach(detach_event1) untied
+    !$omp task detach (detach_event1) untied
       !$omp atomic update
        x = x + 1
     !$omp end task
 
-    !$omp task detach(detach_event2) untied
+    !$omp task detach (detach_event2) untied
       !$omp atomic update
        y = y + 1
       call omp_fulfill_event (detach_event1);
diff --git a/libgomp/testsuite/libgomp.fortran/task-detach-6.f90 
b/libgomp/testsuite/libgomp.fortran/task-detach-6.f90
index 0fe2155..b2c476f 100644
--- a/libgomp/testsuite/libgomp.fortran/task-detach-6.f90
+++ b/libgomp/testsuite/libgomp.fortran/task-detach-6.f90
@@ -11,30 +11,28 @@ program task_detach_6
   integer :: x = 0, y = 0, z = 0
   integer :: thread_count
 
-  !$omp target map(tofrom: x, y, z) map(from: thread_count)
-    !$omp parallel firstprivate(detach_event1, detach_event2)
+  !$omp target map (tofrom: x, y, z) map (from: thread_count)
+    !$omp parallel private (detach_event1, detach_event2)
       !$omp single
-       thread_count = omp_get_num_threads()
+       thread_count = omp_get_num_threads ()
       !$omp end single
 
-      !$omp task detach(detach_event1) untied
+      !$omp task detach (detach_event1) untied
        !$omp atomic update
          x = x + 1
       !$omp end task
 
-      !$omp task detach(detach_event2) untied
+      !$omp task detach (detach_event2) untied
        !$omp atomic update
          y = y + 1
-       call omp_fulfill_event (detach_event1);
+       call omp_fulfill_event (detach_event1)
       !$omp end task
 
       !$omp task untied
        !$omp atomic update
          z = z + 1
-       call omp_fulfill_event (detach_event2);
+       call omp_fulfill_event (detach_event2)
       !$omp end task
-
-      !$omp taskwait
     !$omp end parallel
   !$omp end target
 
diff --git a/libgomp/testsuite/libgomp.fortran/task-detach-7.f90 
b/libgomp/testsuite/libgomp.fortran/task-detach-7.f90
new file mode 100644
index 0000000..32e715e
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/task-detach-7.f90
@@ -0,0 +1,42 @@
+! { dg-do run }
+
+! Test tasks with detach clause.  Each thread spawns off a chain of tasks,
+! that can then be executed by any available thread.  Each thread uses
+! taskwait to wait for the child tasks to complete.
+
+program task_detach_7
+  use omp_lib
+
+  integer (kind=omp_event_handle_kind) :: detach_event1, detach_event2
+  integer :: x = 0, y = 0, z = 0
+  integer :: thread_count
+
+  !$omp parallel private (detach_event1, detach_event2)
+    !$omp single
+      thread_count = omp_get_num_threads()
+    !$omp end single
+
+    !$omp task detach (detach_event1) untied
+      !$omp atomic update
+       x = x + 1
+    !$omp end task
+
+    !$omp task detach (detach_event2) untied
+      !$omp atomic update
+       y = y + 1
+      call omp_fulfill_event (detach_event1)
+    !$omp end task
+
+    !$omp task untied
+      !$omp atomic update
+       z = z + 1
+      call omp_fulfill_event (detach_event2)
+    !$omp end task
+
+    !$omp taskwait
+  !$omp end parallel
+
+  if (x /= thread_count) stop 1
+  if (y /= thread_count) stop 2
+  if (z /= thread_count) stop 3
+end program
diff --git a/libgomp/testsuite/libgomp.fortran/task-detach-8.f90 
b/libgomp/testsuite/libgomp.fortran/task-detach-8.f90
new file mode 100644
index 0000000..e760eab
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/task-detach-8.f90
@@ -0,0 +1,45 @@
+! { dg-do run }
+
+! Test tasks with detach clause on an offload device.  Each device
+! thread spawns off a chain of tasks, that can then be executed by
+! any available thread.  Each thread uses taskwait to wait for the
+! child tasks to complete.
+
+program task_detach_8
+  use omp_lib
+
+  integer (kind=omp_event_handle_kind) :: detach_event1, detach_event2
+  integer :: x = 0, y = 0, z = 0
+  integer :: thread_count
+
+  !$omp target map (tofrom: x, y, z) map (from: thread_count)
+    !$omp parallel private (detach_event1, detach_event2)
+      !$omp single
+       thread_count = omp_get_num_threads ()
+      !$omp end single
+
+      !$omp task detach (detach_event1) untied
+       !$omp atomic update
+         x = x + 1
+      !$omp end task
+
+      !$omp task detach (detach_event2) untied
+       !$omp atomic update
+         y = y + 1
+       call omp_fulfill_event (detach_event1)
+      !$omp end task
+
+      !$omp task untied
+       !$omp atomic update
+         z = z + 1
+       call omp_fulfill_event (detach_event2)
+      !$omp end task
+
+      !$omp taskwait
+    !$omp end parallel
+  !$omp end target
+
+  if (x /= thread_count) stop 1
+  if (y /= thread_count) stop 2
+  if (z /= thread_count) stop 3
+end program
diff --git a/libgomp/testsuite/libgomp.fortran/task-detach-9.f90 
b/libgomp/testsuite/libgomp.fortran/task-detach-9.f90
new file mode 100644
index 0000000..540c6de
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/task-detach-9.f90
@@ -0,0 +1,41 @@
+! { dg-do run }
+
+! Test tasks with detach clause.  Each thread spawns off a chain of tasks
+! in a taskgroup, that can then be executed by any available thread.
+
+program task_detach_9
+  use omp_lib
+
+  integer (kind=omp_event_handle_kind) :: detach_event1, detach_event2
+  integer :: x = 0, y = 0, z = 0
+  integer :: thread_count
+
+  !$omp parallel private (detach_event1, detach_event2)
+    !$omp taskgroup
+      !$omp single
+       thread_count = omp_get_num_threads ()
+      !$omp end single
+
+      !$omp task detach (detach_event1) untied
+       !$omp atomic update
+         x = x + 1
+      !$omp end task
+
+      !$omp task detach (detach_event2) untied
+       !$omp atomic update
+         y = y + 1
+       call omp_fulfill_event (detach_event1);
+      !$omp end task
+
+      !$omp task untied
+       !$omp atomic update
+         z = z + 1
+       call omp_fulfill_event (detach_event2);
+      !$omp end task
+    !$omp end taskgroup
+  !$omp end parallel
+
+  if (x /= thread_count) stop 1
+  if (y /= thread_count) stop 2
+  if (z /= thread_count) stop 3
+end program
-- 
2.8.1

Reply via email to