Hi, On 15.12.22 20:42, Tobias Burnus wrote:
If the libgomp plugin doesn't request special 'host_to_dev_cpy'/'dev_to_host_cpy' for 'gomp_target_rev', then standard 'gomp_copy_host2dev'/'gomp_copy_dev2host' are used, which use 'gomp_device_copy', which expects the device to be locked. (As can be told by the unconditional 'gomp_mutex_unlock (&devicep->lock);' before 'gomp_fatal'.) However, in a number of the 'gomp_copy_host2dev'/'gomp_copy_dev2host' calls from 'gomp_target_rev', the device definitely is not locked; see
Actually, reading it + the source code again, I think it makes sense to return a boolean – similar to devicep->host2dev_func and devicep->dev2host_func — and possibly wrap it into some convenience function, similar to gomp_device_copy – at least a bare exit() without further diagnostic does not seem to userfriendly. BTW: In line with the other code, you could use CUDA_CALL instead of CUDA_CALL_ERET; the fomer already calls the latter with 'false' as first argument + is used elsewhere. Regarding the lock: It seems the problem is the copying of devaddrs/sizes/kinds; this does not need any lock as the stack variables are on the device and only used for this reverse offload. Thus, there is no need for a lock as there are no races. However, as the existing gomp_copy_dev2host removes the lock, we could simply keep this lock – and probably should move it down to just before the user-function call – removing all (non-error) locks and unlocks on the way. — I mean something like the attached patch. Finally, I think we need to find a solution for the issue Andrew tried to address. — The current code invokes CUDA_CALL_ASSERT – which calls GOMP_PLUGIN_fatal. Tobias ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
diff --git a/libgomp/target.c b/libgomp/target.c index e38cc3b6f1c..4b7233307cd 100644 --- a/libgomp/target.c +++ b/libgomp/target.c @@ -3319,5 +3319,6 @@ gomp_target_rev (uint64_t fn_ptr, uint64_t mapnum, uint64_t devaddrs_ptr, gomp_mutex_lock (&devicep->lock); n = gomp_map_lookup_rev (&devicep->mem_map_rev, &k); - gomp_mutex_unlock (&devicep->lock); + if (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) + gomp_mutex_unlock (&devicep->lock); if (n == NULL) @@ -3409,5 +3410,4 @@ gomp_target_rev (uint64_t fn_ptr, uint64_t mapnum, uint64_t devaddrs_ptr, cdata = gomp_alloca (sizeof (*cdata) * mapnum); memset (cdata, '\0', sizeof (*cdata) * mapnum); - gomp_mutex_lock (&devicep->lock); for (uint64_t i = 0; i < mapnum; i++) { @@ -3643,4 +3643,5 @@ gomp_target_rev (uint64_t fn_ptr, uint64_t mapnum, uint64_t devaddrs_ptr, uint64_t struct_cpy = 0; bool clean_struct = false; + gomp_mutex_lock (&devicep->lock); for (uint64_t i = 0; i < mapnum; i++) { @@ -3695,5 +3696,5 @@ gomp_target_rev (uint64_t fn_ptr, uint64_t mapnum, uint64_t devaddrs_ptr, gomp_aligned_free ((void *) (uintptr_t) devaddrs[i]); } - + gomp_mutex_unlock (&devicep->lock); free (devaddrs); free (sizes);