Hi Chung-Lin, Tom!

It's been a while:

On 2018-09-25T21:11:58+0800, Chung-Lin Tang <chunglin_t...@mentor.com> wrote:
> [...] NVPTX/CUDA-specific implementation
> of the new-style goacc_asyncqueues.

In an OpenACC 'async' setting, where the device kernel (expectedly)
crashes because of "an illegal memory access was encountered", I'm
running into a deadlock here:

> --- a/libgomp/plugin/plugin-nvptx.c
> +++ b/libgomp/plugin/plugin-nvptx.c

> +static void
> +cuda_callback_wrapper (CUstream stream, CUresult res, void *ptr)
> +{
> +  if (res != CUDA_SUCCESS)
> +    GOMP_PLUGIN_fatal ("%s error: %s", __FUNCTION__, cuda_error (res));
> +  struct nvptx_callback *cb = (struct nvptx_callback *) ptr;
> +  cb->fn (cb->ptr);
> +  free (ptr);
> +}
> +
> +void
> +GOMP_OFFLOAD_openacc_async_queue_callback (struct goacc_asyncqueue *aq,
> +                                        void (*callback_fn)(void *),
> +                                        void *userptr)
> +{
> +  struct nvptx_callback *b = GOMP_PLUGIN_malloc (sizeof (*b));
> +  b->fn = callback_fn;
> +  b->ptr = userptr;
> +  b->aq = aq;
> +  CUDA_CALL_ASSERT (cuStreamAddCallback, aq->cuda_stream,
> +                 cuda_callback_wrapper, (void *) b, 0);
> +}

In my case, 'cuda_callback_wrapper' (expectedly) gets invoked with
'res != CUDA_SUCCESS' ("an illegal memory access was encountered").
When we invoke 'GOMP_PLUGIN_fatal', this attempts to shut down the device
(..., which deadlocks); that's generally problematic: per
<https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#group__CUDA__STREAM_1g613d97a277d7640f4cb1c03bd51c2483>
"'cuStreamAddCallback' [...] Callbacks must not make any CUDA API calls".

Given that eventually we must reach a host/device synchronization point
(latest when the device is shut down at program termination), and the
non-'CUDA_SUCCESS' will be upheld until then, it does seem safe to
replace this 'GOMP_PLUGIN_fatal' with 'GOMP_PLUGIN_error' as per the
"nvptx: Avoid deadlock in 'cuStreamAddCallback' callback, error case"
attached.  OK to push?

(Might we even skip 'GOMP_PLUGIN_error' here, understanding that the
error will be caught and reported at the next host/device synchronization
point?  But I've not verified that.)


Grüße
 Thomas


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From b7ddcc0807967750e3c884326ed4c53c05cde81f Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <tho...@codesourcery.com>
Date: Thu, 12 Jan 2023 14:39:46 +0100
Subject: [PATCH] nvptx: Avoid deadlock in 'cuStreamAddCallback' callback,
 error case

When we invoke 'GOMP_PLUGIN_fatal', this attempts to shut down the device
(..., which may deadlock); that's generally problematic: per
<https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#group__CUDA__STREAM_1g613d97a277d7640f4cb1c03bd51c2483>
"'cuStreamAddCallback' [...] Callbacks must not make any CUDA API calls".

Given that eventually we must reach a host/device synchronization point
(latest when the device is shut down at program termination), and the
non-'CUDA_SUCCESS' will be upheld until then, it does seem safe to
replace this 'GOMP_PLUGIN_fatal' with 'GOMP_PLUGIN_error'.

	libgomp/
	* plugin/plugin-nvptx.c (cuda_callback_wrapper): Invoke
	'GOMP_PLUGIN_error' instead of 'GOMP_PLUGIN_fatal'.
---
 libgomp/plugin/plugin-nvptx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 395639537e83..cdb3d435bdc8 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1927,7 +1927,7 @@ static void
 cuda_callback_wrapper (CUstream stream, CUresult res, void *ptr)
 {
   if (res != CUDA_SUCCESS)
-    GOMP_PLUGIN_fatal ("%s error: %s", __FUNCTION__, cuda_error (res));
+    GOMP_PLUGIN_error ("%s error: %s", __FUNCTION__, cuda_error (res));
   struct nvptx_callback *cb = (struct nvptx_callback *) ptr;
   cb->fn (cb->ptr);
   free (ptr);
-- 
2.39.0

Reply via email to