Hi,

On Fri, Dec 11, 2015 at 07:05:29PM +0100, Jakub Jelinek wrote:
> On Thu, Dec 10, 2015 at 06:52:23PM +0100, Martin Jambor wrote:
> > > > --- a/libgomp/task.c
> > > > +++ b/libgomp/task.c
> > > > @@ -581,6 +581,7 @@ GOMP_PLUGIN_target_task_completion (void *data)
> > > >        gomp_mutex_unlock (&team->task_lock);
> > > >      }
> > > >    ttask->state = GOMP_TARGET_TASK_FINISHED;
> > > > +  free (ttask->firstprivate_copies);
> > > >    gomp_target_task_completion (team, task);
> > > >    gomp_mutex_unlock (&team->task_lock);
> > > >  }
> > > 
> > > So, this function should have a special case for the SHARED_MEM case;
> > > handle it similarly to, say, how GOMP_taskgroup_end handles the
> > > finish_cancelled: case.  Just note that the target task is missing
> > > from certain queues at that point.
> > 
> > I'm afraid I need some help here.  I do not quite understand how the
> > finish_cancelled: case in GOMP_taskgroup_end is similar; it seems to be
> > doing much more than freeing one pointer.  What exactly is the issue
> > with the above?
> > 
> > Nevertheless, after reading through bits of task.c again, I wonder
> > whether any copying (for both the shared memory target and the host) in
> > gomp_target_task_fn is actually necessary, because it seems to also be
> > done in gomp_create_target_task.  Does that not apply somehow?
> 
> The target task is scheduled for the first action as a normal task, and
> the scheduling of it already removes it from some of the queues (each
> task is put into 1-3 queues), i.e. actions performed mostly by
> gomp_task_run_pre.  Then the team task lock is unlocked and the task is
> run.  Finally, for normal tasks, gomp_task_run_post_handle_depend,
> gomp_task_run_post_remove_parent, etc. are run.  Now, for async target
> tasks that have something running on some other device at that point,
> we don't do that, but instead make them GOMP_TASK_ASYNC_RUNNING and
> continue with other stuff, until gomp_target_task_completion is run.
> For non-shared-mem targets, that needs to re-add the task to the
> queues, so that it will be scheduled again.  But you don't need that
> for shared-mem target tasks; they can just free the firstprivate_copies
> and finalize the task.
> At the time gomp_target_task_completion is called, the task is pretty much
> in the same state as it is around the finish_cancelled:; label.
> So instead of what the gomp_target_task_completion function does,
> you would for SHARED_MEM do something like:
>           size_t new_tasks
>             = gomp_task_run_post_handle_depend (task, team);
>           gomp_task_run_post_remove_parent (task);
>           gomp_clear_parent (&task->children_queue);
>           gomp_task_run_post_remove_taskgroup (task);
>           team->task_count--;
>           do_wake = 0;
>           if (new_tasks > 1)
>             {
>               do_wake = team->nthreads - team->task_running_count
>                         - !task->in_tied_task;
>               if (do_wake > new_tasks)
>                 do_wake = new_tasks;
>             }
>           // Unlike other places, the following will be also run with the
>           // task_lock held, but I'm afraid there is nothing to do about it.
>           // See the comment in gomp_target_task_completion.
>           gomp_finish_task (task);
>           free (task);
>           if (do_wake)
>             gomp_team_barrier_wake (&team->barrier, do_wake);
> 

I tried the above, but the libgomp testcase target-33.c always got stuck
in the GOMP_taskgroup_end call, more specifically in
gomp_team_barrier_wait_end in config/linux/bar.c, where the first call
to gomp_barrier_handle_tasks left barrier->generation with
BAR_WAITING_FOR_TASK set, and then nothing ever happened, even as the
callbacks fired.
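
To describe the hang more concretely, below is a toy model of the
situation as I understand it.  This is not libgomp code: only the two
flag names are borrowed from config/linux/bar.h, and everything else
(the flag values, the locking, the real generation counter) is
deliberately simplified.

/* Toy model of the hang; NOT libgomp code.  Only the two flag names
   mirror config/linux/bar.h; everything else is simplified.  */
#include <stdio.h>

#define BAR_TASK_PENDING      1  /* tasks queued on the team barrier */
#define BAR_WAITING_FOR_TASK  4  /* barrier blocked until tasks drain */

static unsigned generation;      /* stands in for barrier->generation */
static int task_count = 1;       /* the one async target task */

/* Pre-fix behaviour: when nothing is runnable in the queues, the
   handler just returns and never touches the barrier state.  */
static void
barrier_handle_tasks (void)
{
  if (task_count > 0)
    return;  /* async task still in flight; the flag stays set */
}

int
main (void)
{
  generation |= BAR_WAITING_FOR_TASK;  /* last thread starts waiting */
  barrier_handle_tasks ();             /* picks up nothing, returns */
  task_count = 0;                      /* completion callback fires */
  /* Nothing sets BAR_TASK_PENDING or re-enters the handler, so the
     waiter sleeps on the barrier forever: */
  if (generation & BAR_WAITING_FOR_TASK)
    printf ("stuck: still waiting for tasks\n");
  return 0;
}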

After looking into the tasking mechanism for basically the whole day
yesterday, I *think* I fixed it by calling
gomp_team_barrier_set_task_pending from the callback, plus another hunk
in gomp_barrier_handle_tasks so that it clears that barrier flag even
if it has not picked up any tasks.  Please let me know whether you
think this makes sense.
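
In terms of the same toy model, the fix makes the completion callback
raise the task-pending flag and wake a thread, and it lets
gomp_barrier_handle_tasks signal the barrier done even when it picked
up no task.  Again, this is only my simplified sketch of the intended
handshake, not the real code:

/* The same toy model with the proposed handshake; still a sketch,
   not real libgomp code.  */
#include <stdio.h>

#define BAR_TASK_PENDING      1
#define BAR_WAITING_FOR_TASK  4

static unsigned generation = BAR_WAITING_FOR_TASK;
static int task_count = 1;

/* Completion callback: finalize the task, then mark task work pending
   and wake somebody, as gomp_team_barrier_set_task_pending plus
   gomp_team_barrier_wake do in the patch below.  */
static void
target_task_completion (void)
{
  task_count--;
  generation |= BAR_TASK_PENDING;
  /* ...the real code wakes a thread on the team barrier here...  */
}

/* Task handler: even with no task picked up, notice that the barrier
   is waiting and no tasks are left, and declare the barrier done.  */
static void
barrier_handle_tasks (void)
{
  generation &= ~BAR_TASK_PENDING;
  if (task_count == 0 && (generation & BAR_WAITING_FOR_TASK))
    {
      generation &= ~BAR_WAITING_FOR_TASK;  /* gomp_team_barrier_done */
      printf ("barrier released\n");
    }
}

int
main (void)
{
  target_task_completion ();  /* the async callback from the plugin */
  barrier_handle_tasks ();    /* a woken thread re-checks the state */
  return 0;
}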

If so, I'll include it in the HSA patch set I hope to put together
today.  Otherwise, I guess I'd prefer to remove the shared-memory path
and revert to the old behavior as a temporary measure until we find
out what was wrong.

Thanks and sorry that it took me so long to resolve this,

Martin


diff --git a/libgomp/task.c b/libgomp/task.c
index ab5df51..828c1fb 100644
--- a/libgomp/task.c
+++ b/libgomp/task.c
@@ -566,6 +566,14 @@ gomp_target_task_completion (struct gomp_team *team, struct gomp_task *task)
     gomp_team_barrier_wake (&team->barrier, 1);
 }
 
+static inline size_t
+gomp_task_run_post_handle_depend (struct gomp_task *child_task,
+                                 struct gomp_team *team);
+static inline void
+gomp_task_run_post_remove_parent (struct gomp_task *child_task);
+static inline void
+gomp_task_run_post_remove_taskgroup (struct gomp_task *child_task);
+
 /* Signal that a target task TTASK has completed the asynchronously
    running phase and should be requeued as a task to handle the
    variable unmapping.  */
@@ -584,8 +592,34 @@ GOMP_PLUGIN_target_task_completion (void *data)
       gomp_mutex_unlock (&team->task_lock);
     }
   ttask->state = GOMP_TARGET_TASK_FINISHED;
-  free (ttask->firstprivate_copies);
-  gomp_target_task_completion (team, task);
+
+  if (ttask->devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
+    {
+      free (ttask->firstprivate_copies);
+      size_t new_tasks
+       = gomp_task_run_post_handle_depend (task, team);
+      gomp_task_run_post_remove_parent (task);
+      gomp_clear_parent (&task->children_queue);
+      gomp_task_run_post_remove_taskgroup (task);
+      team->task_count--;
+      int do_wake = 0;
+      if (new_tasks)
+       {
+         do_wake = team->nthreads - team->task_running_count
+           - !task->in_tied_task;
+         if (do_wake > new_tasks)
+           do_wake = new_tasks;
+       }
+      /* Unlike other places, the following will be also run with the task_lock
+         held, but there is nothing to do about it.  See the comment in
+         gomp_target_task_completion.  */
+      gomp_finish_task (task);
+      free (task);
+      gomp_team_barrier_set_task_pending (&team->barrier);
+      gomp_team_barrier_wake (&team->barrier, do_wake ? do_wake : 1);
+    }
+  else
+    gomp_target_task_completion (team, task);
   gomp_mutex_unlock (&team->task_lock);
 }
 
@@ -1275,7 +1309,12 @@ gomp_barrier_handle_tasks (gomp_barrier_state_t state)
          thr->task = task;
        }
       else
-       return;
+       {
+         if (team->task_count == 0
+             && gomp_team_barrier_waiting_for_tasks (&team->barrier))
+           gomp_team_barrier_done (&team->barrier, state);
+         return;
+       }
       gomp_mutex_lock (&team->task_lock);
       if (child_task)
        {
