Andrew Stubbs wrote:
Subject: [PATCH v3] openmp, nvptx: ompx_gnu_managed_mem_alloc
This adds support for using CUDA Managed Memory with omp_alloc. AMD support
will be added in a future patch.
There is one new predefined allocator, "ompx_gnu_managed_mem_alloc", plus a
corresponding memory space, which can be used to allocate memory in the
"managed" space.
The nvptx plugin is modified to make the necessary CUDA calls, via two new
(optional) plugin interfaces.
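(For reference, I read that as two optional entry points of roughly the
following shape; the names are only my guess, inferred from the
managed_alloc_func hook in the target.c hunk further down, and may not
match the patch.)

/* Guessed shape only, not taken from the patch: optional plugin entry
   points mirroring GOMP_OFFLOAD_alloc / GOMP_OFFLOAD_free.  */
#include <stdbool.h>
#include <stddef.h>

extern void *GOMP_OFFLOAD_managed_alloc (int device_num, size_t size);
extern bool GOMP_OFFLOAD_managed_free (int device_num, void *ptr);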
* * *
+++ b/libgomp/env.c
@@ -1265,6 +1269,7 @@ parse_allocator (const char *env, const char *val, void *const params[])
C (omp_pteam_mem_alloc, false)
C (omp_thread_mem_alloc, false)
C (ompx_gnu_pinned_mem_alloc, false)
+ C (ompx_gnu_managed_mem_alloc, false)
C (omp_default_mem_space, true)
[…]
C (omp_const_mem_space, true)
Can you also add ompx_gnu_managed_mem_space?
* * *
+++ b/libgomp/libgomp.texi
@@ -6890,6 +6890,7 @@ GCC supports the following predefined allocators and predefined memory spaces:
@item omp_pteam_mem_alloc @tab omp_low_lat_mem_space (implementation defined)
@item omp_thread_mem_alloc @tab omp_low_lat_mem_space (implementation defined)
@item ompx_gnu_pinned_mem_alloc @tab omp_default_mem_space (GNU extension)
+@item ompx_gnu_managed_mem_alloc @tab ompx_gnu_managed_mem_space (GNU extension)
The documentation does not describe how ompx_gnu_managed_mem_space
acts, in particular when it is not supported.
Answer: On a Linux host, managed memory is used if supported by the
default device (at call time); if not, the fallback is invoked.
On other systems, the default memspace is unconditionally used.
I think some wording is needed.
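For instance, whether the following sketch prints the message or gets a
managed buffer depends on the default device at call time (the null
fallback trait is only there to make the unsupported case observable;
ompx_gnu_managed_mem_space is the new memspace from this patch):

/* Sketch: observe whether managed memory is actually used; assumes the
   patch under review, which provides ompx_gnu_managed_mem_space.  */
#include <omp.h>
#include <stdio.h>

int
main (void)
{
  omp_alloctrait_t traits[] = { { omp_atk_fallback, omp_atv_null_fb } };
  omp_allocator_handle_t a
    = omp_init_allocator (ompx_gnu_managed_mem_space, 1, traits);
  int *p = omp_alloc (64 * sizeof (int), a);
  if (p == NULL)
    printf ("managed memory not supported by the default device\n");
  else
    omp_free (p, a);
  omp_destroy_allocator (a);
  return 0;
}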
* * *
+@item @code{ompx_gnu_managed_mem_space} is a GNU extension that provides
+ managed memory accessible by both host and device; it is only available
+ on supported offload targets (see @ref{Offload-Target Specifics}).
I am stumbling over "available *on* supported offload targets",
which somehow sounds like the following is supported:
#pragma omp target
ptr = omp_alloc (sizeof(*ptr), ompx_gnu_managed_mem_alloc);
which isn't. (Or, actually, it is - as the memspace is ignored and
normal 'malloc' is used.)
Thus, I wonder whether it should be *with* instead of *on* for the
allocator. And I also wonder whether it should be "host and devices"
('-s', i.e. multiple devices).
The multi-GPU case is rather common, albeit usually only of a single
type. (Ignoring low-performance APUs + powerful GPU cards, which might
mix vendors.)
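To make the "host and devices" point concrete, the kind of code I have
in mind is roughly the following (sketch; it assumes the default device,
and any additional device touched, supports managed memory via the same
runtime):

/* Sketch: one managed buffer, written by the default device, optionally
   updated by a second device, and finally read back on the host.  */
#include <omp.h>
#include <stdlib.h>

#define N 1024

int
main (void)
{
  int *p = omp_alloc (N * sizeof (int), ompx_gnu_managed_mem_alloc);
  if (p == NULL)
    return 0;  /* nothing to check */

  #pragma omp target is_device_ptr(p)      /* default device writes */
  for (int i = 0; i < N; i++)
    p[i] = i;

  int inc = 0;
  if (omp_get_num_devices () > 1)          /* a second device updates */
    {
      inc = 1;
      #pragma omp target device(1) is_device_ptr(p)
      for (int i = 0; i < N; i++)
        p[i] += 1;
    }

  for (int i = 0; i < N; i++)              /* the host reads the same memory */
    if (p[i] != i + inc)
      abort ();

  omp_free (p, ompx_gnu_managed_mem_alloc);
  return 0;
}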
* * *
Plus: I am still missing more explicit wording here that the
default-device-var ICV is relevant.
* * *
@@ -1843,8 +1845,8 @@ GOMP_OFFLOAD_unload_image (int ord, unsigned version, const void *target_data)
return ret;
}
-void *
-GOMP_OFFLOAD_alloc (int ord, size_t size)
+static void *
+GOMP_OFFLOAD_alloc_1 (int ord, size_t size, bool managed)
{
I still dislike the GOMP_OFFLOAD_ prefix for functions that
are not library entry points but local static functions.
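I.e. something like the following; this is only a naming sketch, with
the body reduced to the managed/plain distinction (the real code goes
through the plugin's usual context handling and error reporting), and
"nvptx_alloc_1" is just a placeholder name:

/* Naming sketch: keep GOMP_OFFLOAD_alloc as the plugin entry point and
   give the static worker a name without the GOMP_OFFLOAD_ prefix.  */
#include <cuda.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static void *
nvptx_alloc_1 (int ord, size_t size, bool managed)
{
  (void) ord;  /* device/context selection omitted in this sketch */
  CUdeviceptr d;
  CUresult r = managed
               ? cuMemAllocManaged (&d, size, CU_MEM_ATTACH_GLOBAL)
               : cuMemAlloc (&d, size);
  return r == CUDA_SUCCESS ? (void *) (uintptr_t) d : NULL;
}

void *
GOMP_OFFLOAD_alloc (int ord, size_t size)
{
  return nvptx_alloc_1 (ord, size, false);
}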
* * *
+++ b/libgomp/target.c
...
+void *
+gomp_managed_alloc (size_t size)
+{
+ struct gomp_task_icv *icv = gomp_icv (false);
+ struct gomp_device_descr *devicep = resolve_device (icv->default_device_var,
+ false);
+ if (devicep == NULL)
+ return NULL;
+
+ void *ret = NULL;
+ gomp_mutex_lock (&devicep->lock);
+ if (devicep->managed_alloc_func)
+ ret = devicep->managed_alloc_func (devicep->target_id, size);
+ gomp_mutex_unlock (&devicep->lock);
+ return ret;
+}
+
+void
+gomp_managed_free (void *device_ptr)
+{
+ if (device_ptr == NULL)
+ return;
+
+ struct gomp_task_icv *icv = gomp_icv (false);
+ struct gomp_device_descr *devicep = resolve_device (icv->default_device_var,
+ false);
+ if (devicep == NULL)
+ gomp_fatal ("attempted to free managed memory at %p, but the default "
+ "device is set to the host device", device_ptr);
This still feels wrong :-/
There is still no clear note in the documentation that
this allocator uses the default device when doing the
allocation, and even less so that, when freeing, the
default device must be the same device (actually: the same
device runtime) as the one used for the allocation.
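To spell out the pitfall (sketch; assumes device 0 is an offload device
that supports managed memory):

/* The free resolves the default device again, so changing
   default-device-var between allocation and free changes which runtime
   (if any) is asked to free the memory; here it ends in the gomp_fatal
   quoted above.  */
#include <omp.h>

int
main (void)
{
  omp_set_default_device (0);   /* allocate via device 0's runtime */
  void *p = omp_alloc (1024, ompx_gnu_managed_mem_alloc);

  omp_set_default_device (omp_get_initial_device ());  /* host is now default */
  omp_free (p, ompx_gnu_managed_mem_alloc);   /* devicep == NULL -> gomp_fatal */
  return 0;
}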
* * *
I wonder whether it makes sense to state in libgomp.texi
something like
The first managed allocation will set the device-type
to be used for all following managed allocations.
And then add in libgomp/config/linux/allocator.c:
static int used_device = GOMP_ICV_DEFAULT_DEVICE;
or
static struct gomp_device_descr *used_device = NULL;
and pass it to target.c as
'gomp_managed_alloc (size, &used_device);' and
'gomp_managed_free (ptr, used_device);'
(Note: with some handling to avoid races; a rough sketch follows.)
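Roughly like this, written against the gomp_managed_alloc quoted above;
for brevity the sketch keeps the latch in target.c instead of passing it
in from allocator.c, and the variable name is made up:

/* Sketch: the first successful device lookup is latched and reused by
   all later managed allocations (gomp_managed_free would then use
   managed_used_device directly instead of resolving the default device
   again).  The compare-and-swap is the "handling to avoid races".  */

static struct gomp_device_descr *managed_used_device;  /* NULL until first use */

void *
gomp_managed_alloc (size_t size)
{
  struct gomp_device_descr *devicep
    = __atomic_load_n (&managed_used_device, __ATOMIC_ACQUIRE);
  if (devicep == NULL)
    {
      struct gomp_task_icv *icv = gomp_icv (false);
      devicep = resolve_device (icv->default_device_var, false);
      if (devicep == NULL)
        return NULL;
      struct gomp_device_descr *expected = NULL;
      if (!__atomic_compare_exchange_n (&managed_used_device, &expected,
                                        devicep, false, __ATOMIC_ACQ_REL,
                                        __ATOMIC_ACQUIRE))
        devicep = expected;  /* another thread latched a device first */
    }

  void *ret = NULL;
  gomp_mutex_lock (&devicep->lock);
  if (devicep->managed_alloc_func)
    ret = devicep->managed_alloc_func (devicep->target_id, size);
  gomp_mutex_unlock (&devicep->lock);
  return ret;
}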
Regarding omp_set_default_device, a common use seems to be to
partition work and then distribute it across multiple offload
devices, e.g.
https://github.com/tyatharva/MPAS-Model/blob/b5bdf5a124456c23d618b4c0797e54e189936ab6/src/core_atmosphere/mpas_atm_core_interface.F#L220-L222
but if one searches GitHub, one finds several similar codes.
Otherwise, it seems as if omp_set_default_device is most typically
only called once during startup.
* * *
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -722,3 +722,12 @@ int main() {
+# return 1 if OpenMP Device Managed Memory is supported
+
+proc check_effective_target_omp_managedmem { } {
+ if { [libgomp_check_effective_target_offload_target "nvptx"] } {
+ return 1
+ }
+ return 0
+}
I think this test is wrong. For instance, if I run my build on a system
with an AMD GPU, it will return true here, as my GCC supports compiling
for AMD and Nvidia GPUs, but there won't be a CUDA runtime or an Nvidia
GPU available. The same happens when executing on a system without GPUs
but with an offloading-enabled GCC.
I think that should be
check_effective_target_offload_device_nvptx
i.e. check whether the default device is an Nvidia GPU, not whether GCC
is compiling (also) for an Nvidia GPU.
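A runtime test would then be guarded along these lines (sketch; the dg
directives are the point, the body is just a minimal check):

/* { dg-do run } */
/* { dg-require-effective-target offload_device_nvptx } */

#include <omp.h>
#include <stdlib.h>

int
main (void)
{
  int *p = omp_alloc (sizeof (int), ompx_gnu_managed_mem_alloc);
  if (p == NULL)
    abort ();

  #pragma omp target is_device_ptr(p)
  *p = 42;

  if (*p != 42)
    abort ();
  omp_free (p, ompx_gnu_managed_mem_alloc);
  return 0;
}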
* * *
Thanks,
Tobias