https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99555

--- Comment #7 from Tom de Vries <vries at gcc dot gnu.org> ---
Created attachment 50627
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50627&action=edit
debug patch

A bit more analysis.

I'm working with this example, with an actual task to be able to perform a
check afterwards:
...
#include <assert.h>

int i = 1;

int
main (void)
{

#pragma omp target map(tofrom:i)
#pragma omp parallel num_threads(2)
#pragma omp task
  {
    __atomic_add_fetch (&i, 1, __ATOMIC_SEQ_CST);
  }

  assert (i == 3);

  return 0;
}
...

And I've forced the plugin to launch with two omp-threads to limit the
dimensions to the minimium:
...
(cuda-gdb) info cuda kernels
  Kernel Parent Dev Grid Status   SMs Mask GridDim BlockDim Invocation 
*      0      -   0    1 Active 0x00000010 (1,1,1) (32,2,1) main$_omp_fn() 
...

Furthermore I've made specific instances for the bar.sync team barrier, to get
more meaningful backtraces.  So the lifetimes of the two omp-threads look like
this.

THREAD 0:
...
#0  0x0000000000b73aa8 in bar_sync_thread_0 ()
#1  0x0000000000b74a80 in bar_sync_n ()
#2  0x0000000000b72598 in bar_sync_1 ()
#3  0x0000000000b760b8 in gomp_team_barrier_wake ()
#4  0x0000000000b5bc38 in GOMP_task ()
#5  0x0000000000b36a58 in main$_omp_fn () # $1
#6  0x0000000000a7e618 in GOMP_parallel ()
#7  0x0000000000b377a0 in main$_omp_fn$0$impl ()
#8  0x0000000000b3c700 in gomp_nvptx_main ()
#9  0x0000000000b39420 in main$_omp_fn<<<(1,1,1),(32,2,1)>>> ()

#0  0x0000000000b380e8 in main$_omp_fn () # $2
#1  0x0000000000b95178 in gomp_barrier_handle_tasks ()
#2  0x0000000000b76e38 in gomp_team_barrier_wait_end ()
#3  0x0000000000b77dd8 in gomp_team_barrier_wait_final ()
#4  0x0000000000b2a1b8 in gomp_team_end ()
#5  0x0000000000b318d8 in GOMP_parallel_end ()
#6  0x0000000000a7e620 in GOMP_parallel ()
#7  0x0000000000b377a0 in main$_omp_fn$0$impl ()
#8  0x0000000000b3c700 in gomp_nvptx_main ()
#9  0x0000000000b39420 in main$_omp_fn<<<(1,1,1),(32,2,1)>>> ()

#0  0x0000000000b380e8 in main$_omp_fn () # $2
#1  0x0000000000b95178 in gomp_barrier_handle_tasks ()
#2  0x0000000000b76e38 in gomp_team_barrier_wait_end ()
#3  0x0000000000b77dd8 in gomp_team_barrier_wait_final ()
#4  0x0000000000b2a1b8 in gomp_team_end ()
#5  0x0000000000b318d8 in GOMP_parallel_end ()
#6  0x0000000000a7e620 in GOMP_parallel ()
#7  0x0000000000b377a0 in main$_omp_fn$0$impl ()
#8  0x0000000000b3c700 in gomp_nvptx_main ()
#9  0x0000000000b39420 in main$_omp_fn<<<(1,1,1),(32,2,1)>>> ()

#0  0x0000000000b73aa8 in bar_sync_thread_0 ()
#1  0x0000000000b74a80 in bar_sync_n ()
#2  0x0000000000b72598 in bar_sync_1 ()
#3  0x0000000000b760b8 in gomp_team_barrier_wake ()
#4  0x0000000000b94c98 in gomp_barrier_handle_tasks ()
#5  0x0000000000b76e38 in gomp_team_barrier_wait_end ()
#6  0x0000000000b77dd8 in gomp_team_barrier_wait_final ()
#7  0x0000000000b2a1b8 in gomp_team_end ()
#8  0x0000000000b318d8 in GOMP_parallel_end ()
#9  0x0000000000a7e620 in GOMP_parallel ()
#10 0x0000000000b377a0 in main$_omp_fn$0$impl ()
#11 0x0000000000b3c700 in gomp_nvptx_main ()
#12 0x0000000000b39420 in main$_omp_fn<<<(1,1,1),(32,2,1)>>> ()

#0  0x0000000000b73aa8 in bar_sync_thread_0 ()
#1  0x0000000000b74a80 in bar_sync_n ()
#2  0x0000000000b719b8 in bar_sync_3 ()
#3  0x0000000000b76f50 in gomp_team_barrier_wait_end ()
#4  0x0000000000b77dd8 in gomp_team_barrier_wait_final ()
#5  0x0000000000b2a1b8 in gomp_team_end ()
#6  0x0000000000b318d8 in GOMP_parallel_end ()
#7  0x0000000000a7e620 in GOMP_parallel ()
#8  0x0000000000b377a0 in main$_omp_fn$0$impl ()
#9  0x0000000000b3c700 in gomp_nvptx_main ()
#10 0x0000000000b39420 in main$_omp_fn<<<(1,1,1),(32,2,1)>>> ()

^C

#0  0x0000000000b73da8 in bar_sync_thread_0 ()
#1  0x0000000000b74a80 in bar_sync_n ()
#2  0x0000000000b719b8 in bar_sync_3 ()
#3  0x0000000000b76f50 in gomp_team_barrier_wait_end ()
#4  0x0000000000b77dd8 in gomp_team_barrier_wait_final ()
#5  0x0000000000b2a1b8 in gomp_team_end ()
#6  0x0000000000b318d8 in GOMP_parallel_end ()
#7  0x0000000000a7e620 in GOMP_parallel ()
#8  0x0000000000b377a0 in main$_omp_fn$0$impl ()
#9  0x0000000000b3c700 in gomp_nvptx_main ()
#10 0x0000000000b39420 in main$_omp_fn<<<(1,1,1),(32,2,1)>>> ()
...

THREAD 1:
...
#0  0x0000000000b70ae8 in bar_sync_thread_1 ()
#1  0x0000000000b74b80 in bar_sync_n ()
#2  0x0000000000b72598 in bar_sync_1 ()
#3  0x0000000000b760b8 in gomp_team_barrier_wake ()
#4  0x0000000000b5bc38 in GOMP_task ()
#5  0x0000000000b36a58 in main$_omp_fn () # $1
#6  0x0000000000b3cbb8 in gomp_nvptx_main ()
#7  0x0000000000b39420 in main$_omp_fn<<<(1,1,1),(32,2,1)>>> ()

#0  0x0000000000b70ae8 in bar_sync_thread_1 ()
#1  0x0000000000b74b80 in bar_sync_n ()
#2  0x0000000000b719b8 in bar_sync_3 ()
#3  0x0000000000b76f50 in gomp_team_barrier_wait_end ()
#4  0x0000000000b77dd8 in gomp_team_barrier_wait_final ()
#5  0x0000000000b3cd50 in gomp_nvptx_main ()
#6  0x0000000000b39420 in main$_omp_fn<<<(1,1,1),(32,2,1)>>> ()

^C

#0  0x0000000000b3ca30 in gomp_nvptx_main ()
#1  0x0000000000b39420 in main$_omp_fn<<<(1,1,1),(32,2,1)>>> ()
...


Weaving together this information, I get the following scenario:
- both threads execute GOMP_task and deposit a task and execute
  gomp_team_barrier_wake
- thread 1 proceeds to wait at the team barrier
- thread 0 proceeds to execute both tasks
- thread 0 then executes a gomp_team_barrier_wake from
  gomp_barrier_handle_tasks, which makes thread 1 exit the team barrier
- thread 0 then goes to wait at the team barrier, which results in a hang
  given that thread 1 already has exited.

Reply via email to