https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124620

            Bug ID: 124620
           Summary: [libgomp] Deadlock in omp_fulfill_event when called
                    from unshackled thread without dependent tasks
           Product: gcc
           Version: 13.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libgomp
          Assignee: unassigned at gcc dot gnu.org
          Reporter: cq at smail dot nju.edu.cn
                CC: jakub at gcc dot gnu.org
  Target Milestone: ---

Created attachment 64008
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=64008&action=edit
Proposed fix: set BAR_TASK_PENDING in unshackled-thread path

Calling `omp_fulfill_event` from an unshackled (non-OpenMP) thread for a
detached task that has no dependent tasks (`new_tasks == 0`) deadlocks: all
team threads sleep in the barrier and none of them calls
`gomp_team_barrier_done`.

Root cause: in `omp_fulfill_event` (task.c), the unshackled-thread path calls
`gomp_team_barrier_wake` but does not set BAR_TASK_PENDING on
`bar->generation`:

```c
if (!shackled_thread_p
    && !do_wake
    && team->task_detach_count == 0
    && gomp_team_barrier_waiting_for_tasks (&team->barrier))
  do_wake = 1;  // missing gomp_team_barrier_set_task_pending
```

The barrier wait loop (same in centralized/flat/POSIX) uses BAR_TASK_PENDING as
the sole gate for entering `gomp_barrier_handle_tasks`. `futex_wake` wakes a
thread, but `bar->generation` is unchanged, so the woken thread sees no change
and goes back to `futex_wait`. No thread ever enters
`gomp_barrier_handle_tasks` to call `gomp_team_barrier_done`, so the team
deadlocks.
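
To make the gating argument concrete, here is a minimal single-threaded model of
the logic described above. It is a sketch for illustration only, not libgomp
code: `struct model_barrier`, `MODEL_BAR_TASK_PENDING`, and the `model_*`
functions are hypothetical names, the flag value is arbitrary, and the real wait
loop involves futexes and several threads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag value; not copied from libgomp. */
#define MODEL_BAR_TASK_PENDING 1u

/* Hypothetical stand-in for the team barrier state. */
struct model_barrier {
  unsigned generation; /* stands in for bar->generation */
  bool done;           /* true once the modeled gomp_team_barrier_done ran */
};

/* Models what a waiter does after a wake-up: it handles tasks (and finishes
   the barrier) only if the pending flag is set; otherwise it sleeps again.
   Returns true if the barrier was completed. */
static bool model_woken_waiter(struct model_barrier *bar) {
  assert(bar != NULL);
  if (bar->generation & MODEL_BAR_TASK_PENDING) {
    bar->generation &= ~MODEL_BAR_TASK_PENDING;
    bar->done = true; /* modeled gomp_team_barrier_done */
    return true;
  }
  return false; /* modeled return to futex_wait */
}

/* Unpatched unshackled path: wake a waiter without touching generation. */
static bool model_fulfill_unpatched(struct model_barrier *bar) {
  return model_woken_waiter(bar);
}

/* Patched unshackled path: set the pending flag first, then wake. */
static bool model_fulfill_patched(struct model_barrier *bar) {
  assert(bar != NULL);
  bar->generation |= MODEL_BAR_TASK_PENDING;
  return model_woken_waiter(bar);
}
```

In this model, `model_fulfill_unpatched` leaves `done` false because the woken
waiter finds the flag clear, while `model_fulfill_patched` completes the
barrier.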

The `new_tasks > 0` path already does this correctly (added in commit ba886d0c,
May 2021). The symmetric `!shackled_thread_p` path (commit d656bfda, Feb 2021)
is missing the same call.

Affects all GCC versions since 11.

Reproducer (standard OpenMP 5.0 + pthread, no patched libgomp needed):

```c
#include <omp.h>
#include <pthread.h>
#include <stdio.h>
#include <stdatomic.h>
#include <unistd.h>

static omp_event_handle_t global_event;
static atomic_int event_ready = 0;

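/* Runs on a plain pthread, i.e. an unshackled (non-OpenMP) thread. */
static void *fulfill_thread(void *arg) {
  (void) arg;
  /* Spin until the detached task has published its event handle. */
  while (!atomic_load_explicit(&event_ready, memory_order_acquire))
    ;
  /* Give the team threads time to go to sleep in the barrier. */
  usleep(200000);
  omp_fulfill_event(global_event);
  return NULL;
}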

int main(void) {
  pthread_t thr;
  pthread_create(&thr, NULL, fulfill_thread, NULL);

  #pragma omp parallel num_threads(4)
  {
    #pragma omp single
    {
      omp_event_handle_t ev;
      #pragma omp task detach(ev)
      {
        global_event = ev;
        atomic_store_explicit(&event_ready, 1, memory_order_release);
      }
    }
  }

  pthread_join(thr, NULL);
  printf("OK\n");
  return 0;
}
```

Build and run:
```
gcc -fopenmp -O2 -o repro repro.c
timeout 5 ./repro    # exit 124 = deadlock
```

Result: 5 of 5 runs deadlock with unpatched libgomp; 5 of 5 runs pass with the
patch applied.

Fix: add a call to `gomp_team_barrier_set_task_pending` before `do_wake = 1` in
the unshackled-thread path. Patch attached.

gcc -v:
```
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-linux-gnu/13/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu
13.3.0-6ubuntu2~24.04.1' --with-bugurl=file:///usr/share/doc/gcc-13/README.Bugs
--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr
--with-gcc-major-version-only --program-suffix=-13
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/libexec --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-libstdcxx-backtrace
--enable-gnu-unique-object --disable-vtable-verify --enable-plugin
--enable-default-pie --with-system-zlib --enable-libphobos-checking=release
--with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch
--disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64
--with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic
--enable-offload-targets=nvptx-none=/build/gcc-13-EldibY/gcc-13-13.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-EldibY/gcc-13-13.3.0/debian/tmp-gcn/usr
--enable-offload-defaulted --without-cuda-driver --enable-checking=release
--build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
--with-build-config=bootstrap-lto-lean --enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.3.0 (Ubuntu 13.3.0-6ubuntu2~24.04.1)
```
