https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99555
--- Comment #7 from Tom de Vries <vries at gcc dot gnu.org> --- Created attachment 50627 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50627&action=edit debug patch A bit more analysis. I'm working with this example, with an actual task to be able to perform a check afterwards: ... #include <assert.h> int i = 1; int main (void) { #pragma omp target map(tofrom:i) #pragma omp parallel num_threads(2) #pragma omp task { __atomic_add_fetch (&i, 1, __ATOMIC_SEQ_CST); } assert (i == 3); return 0; } ... And I've forced the plugin to launch with two omp-threads to limit the dimensions to the minimium: ... (cuda-gdb) info cuda kernels Kernel Parent Dev Grid Status SMs Mask GridDim BlockDim Invocation * 0 - 0 1 Active 0x00000010 (1,1,1) (32,2,1) main$_omp_fn() ... Furthermore I've made specific instances for the bar.sync team barrier, to get more meaningful backtraces. So the lifetimes of the two omp-threads look like this. THREAD 0: ... #0 0x0000000000b73aa8 in bar_sync_thread_0 () #1 0x0000000000b74a80 in bar_sync_n () #2 0x0000000000b72598 in bar_sync_1 () #3 0x0000000000b760b8 in gomp_team_barrier_wake () #4 0x0000000000b5bc38 in GOMP_task () #5 0x0000000000b36a58 in main$_omp_fn () # $1 #6 0x0000000000a7e618 in GOMP_parallel () #7 0x0000000000b377a0 in main$_omp_fn$0$impl () #8 0x0000000000b3c700 in gomp_nvptx_main () #9 0x0000000000b39420 in main$_omp_fn<<<(1,1,1),(32,2,1)>>> () #0 0x0000000000b380e8 in main$_omp_fn () # $2 #1 0x0000000000b95178 in gomp_barrier_handle_tasks () #2 0x0000000000b76e38 in gomp_team_barrier_wait_end () #3 0x0000000000b77dd8 in gomp_team_barrier_wait_final () #4 0x0000000000b2a1b8 in gomp_team_end () #5 0x0000000000b318d8 in GOMP_parallel_end () #6 0x0000000000a7e620 in GOMP_parallel () #7 0x0000000000b377a0 in main$_omp_fn$0$impl () #8 0x0000000000b3c700 in gomp_nvptx_main () #9 0x0000000000b39420 in main$_omp_fn<<<(1,1,1),(32,2,1)>>> () #0 0x0000000000b380e8 in main$_omp_fn () # $2 #1 0x0000000000b95178 in gomp_barrier_handle_tasks () #2 0x0000000000b76e38 in gomp_team_barrier_wait_end () #3 0x0000000000b77dd8 in gomp_team_barrier_wait_final () #4 0x0000000000b2a1b8 in gomp_team_end () #5 0x0000000000b318d8 in GOMP_parallel_end () #6 0x0000000000a7e620 in GOMP_parallel () #7 0x0000000000b377a0 in main$_omp_fn$0$impl () #8 0x0000000000b3c700 in gomp_nvptx_main () #9 0x0000000000b39420 in main$_omp_fn<<<(1,1,1),(32,2,1)>>> () #0 0x0000000000b73aa8 in bar_sync_thread_0 () #1 0x0000000000b74a80 in bar_sync_n () #2 0x0000000000b72598 in bar_sync_1 () #3 0x0000000000b760b8 in gomp_team_barrier_wake () #4 0x0000000000b94c98 in gomp_barrier_handle_tasks () #5 0x0000000000b76e38 in gomp_team_barrier_wait_end () #6 0x0000000000b77dd8 in gomp_team_barrier_wait_final () #7 0x0000000000b2a1b8 in gomp_team_end () #8 0x0000000000b318d8 in GOMP_parallel_end () #9 0x0000000000a7e620 in GOMP_parallel () #10 0x0000000000b377a0 in main$_omp_fn$0$impl () #11 0x0000000000b3c700 in gomp_nvptx_main () #12 0x0000000000b39420 in main$_omp_fn<<<(1,1,1),(32,2,1)>>> () #0 0x0000000000b73aa8 in bar_sync_thread_0 () #1 0x0000000000b74a80 in bar_sync_n () #2 0x0000000000b719b8 in bar_sync_3 () #3 0x0000000000b76f50 in gomp_team_barrier_wait_end () #4 0x0000000000b77dd8 in gomp_team_barrier_wait_final () #5 0x0000000000b2a1b8 in gomp_team_end () #6 0x0000000000b318d8 in GOMP_parallel_end () #7 0x0000000000a7e620 in GOMP_parallel () #8 0x0000000000b377a0 in main$_omp_fn$0$impl () #9 0x0000000000b3c700 in gomp_nvptx_main () #10 0x0000000000b39420 in main$_omp_fn<<<(1,1,1),(32,2,1)>>> () ^C #0 0x0000000000b73da8 in bar_sync_thread_0 () #1 0x0000000000b74a80 in bar_sync_n () #2 0x0000000000b719b8 in bar_sync_3 () #3 0x0000000000b76f50 in gomp_team_barrier_wait_end () #4 0x0000000000b77dd8 in gomp_team_barrier_wait_final () #5 0x0000000000b2a1b8 in gomp_team_end () #6 0x0000000000b318d8 in GOMP_parallel_end () #7 0x0000000000a7e620 in GOMP_parallel () #8 0x0000000000b377a0 in main$_omp_fn$0$impl () #9 0x0000000000b3c700 in gomp_nvptx_main () #10 0x0000000000b39420 in main$_omp_fn<<<(1,1,1),(32,2,1)>>> () ... THREAD 1: ... #0 0x0000000000b70ae8 in bar_sync_thread_1 () #1 0x0000000000b74b80 in bar_sync_n () #2 0x0000000000b72598 in bar_sync_1 () #3 0x0000000000b760b8 in gomp_team_barrier_wake () #4 0x0000000000b5bc38 in GOMP_task () #5 0x0000000000b36a58 in main$_omp_fn () # $1 #6 0x0000000000b3cbb8 in gomp_nvptx_main () #7 0x0000000000b39420 in main$_omp_fn<<<(1,1,1),(32,2,1)>>> () #0 0x0000000000b70ae8 in bar_sync_thread_1 () #1 0x0000000000b74b80 in bar_sync_n () #2 0x0000000000b719b8 in bar_sync_3 () #3 0x0000000000b76f50 in gomp_team_barrier_wait_end () #4 0x0000000000b77dd8 in gomp_team_barrier_wait_final () #5 0x0000000000b3cd50 in gomp_nvptx_main () #6 0x0000000000b39420 in main$_omp_fn<<<(1,1,1),(32,2,1)>>> () ^C #0 0x0000000000b3ca30 in gomp_nvptx_main () #1 0x0000000000b39420 in main$_omp_fn<<<(1,1,1),(32,2,1)>>> () ... Weaving together this information, I get the following scenario: - both threads execute GOMP_task and deposit a task and execute gomp_team_barrier_wake - thread 1 proceeds to wait at the team barrier - thread 0 proceeds to execute both tasks - thread 0 then executes a gomp_team_barrier_wake from gomp_barrier_handle_tasks, which makes thread 1 exit the team barrier - thread 0 then goes to wait at the team barrier, which results in a hang given that thread 1 already has exited.