Andrew Stubbs wrote:
On 10/11/2025 10:44, Tobias Burnus wrote:
There is still no clear note in the documentation that
this allocator uses the default device when doing the
allocation – and even less so that it must be the same
device (actually: device runtime) as for the allocation.

The documentation is there now.
Thanks.
and pass it to target.c as
'gomp_managed_alloc (size, &used_device);' and
'gomp_managed_free (ptr, used_device);'

(Note: With some handling to avoid races.)

I don't think I like the idea of hidden magic that a) might prevent clever solutions, and b) once we start doing it we can never stop. I prefer the clear documentation together with diagnosing the cases that we can.

Well, I am not sure whether one hidden logic is better than the other; we could also store the value device value in the descriptor.

But that solution is also fine if sufficiently documented.

* * *

--- a/libgomp/env.c
+++ b/libgomp/env.c
@@ -1265,11 +1271,13 @@ parse_allocator (const char *env, const char *val, void 
*const params[])
    C (omp_pteam_mem_alloc, false)
    C (omp_thread_mem_alloc, false)
    C (ompx_gnu_pinned_mem_alloc, false)
+  C (ompx_gnu_managed_mem_alloc, false)
    C (omp_default_mem_space, true)
    C (omp_large_cap_mem_space, true)
    C (omp_const_mem_space, true)
    C (omp_high_bw_mem_space, true)
    C (omp_low_lat_mem_space, true)
+  C (ompx_gnu_managed_mem_space, false)
  #undef C

Not quite. The second argument is:
#define C(v, m) \
...
      memspace = m;                                     \

Thus: please use ', true)' for the last line.

* * *

+@item @code{ompx_gnu_managed_mem_space} is a GNU extension that provides
...
+      already accessible on the device.  If managed memory is not supported by
+      the default device, as configured at the moment the allocator is called,
+      then the allocator will use the fall-back setting.

Actually, on non-Linux, the normal 'malloc' is used – and not the fallback.

Example:

omp target device(dev)
{
  const omp_alloctrait_t traits[]
     = { { omp_atk_alignment, 256},
         {omp_atk_fallback, omp_atv_abort_fb} };
  omp_allocator_handle_t myalloc
    = omp_init_allocator(ompx_gnu_managed_mem_space, 2, traits);
  void *ptr = omp_alloc(n, ompx_gnu_managed_mem_alloc);
}


For dev == omp_initial_device (on a Linux host):
* default device == an nvidia device
  → managed memory is used (honoring the alignment)
* default device != nvidia device (e.g. the host)
  → fall back is used → ABORT
  (with default_fb, 'malloc' would be used but
   without the 265-bit alignment).

For dev == any non-host device
* default mem space is used ('malloc'), honoring
  the alignment.

The current wording does not really make his clear:

+@item @code{ompx_gnu_managed_mem_space} is a GNU extension that provides
+      managed memory accessible by both host and device (as determined by the
+      @var{default-device-var} ICV); it is only available for supported offload
+      targets (see @ref{Offload-Target Specifics}).  This memory is accessible

Actually, the "and device" is also not quite right. For Nvidia devices, all
Nvidia devices can access that memory – it is not device specific, either.

* * *

How about, e.g. (note also 'devices' [-s]):

@item @code{ompx_gnu_managed_mem_space} is a GNU extension that provides
      managed memory accessible by both host and devices.  The memory space
      is available if the offload target associated with the
      @var{default-device-var} ICV supports managed memory (see
      @ref{Offload-Target Specifics}).  Otherwise, on Linux the
      fall-back setting of the allocator is used and on other systems
      the default memory space.

and continuing as in the patch:

+      This memory is accessible
+      by both the host and the device at the same address, so it need not be
+      mapped with @code{map} clauses.  Instead, use the @code{is_device_ptr}
+      clause or @code{has_device_addr} clause to indicate that the pointer is
+      already accessible on the device.  If managed memory is not supported by
+      the default device, as configured at the moment the allocator is called,
+      then the allocator will use the fall-back setting. If the default device
+      is configured differently when the memory is freed, via @code{omp_free}
+      or @code{omp_realloc}, the result may be undefined.
Albeit the "If managed ... fall-back setting." can be removed.

* * *

[AMD GPUs]

+@item Managed memory allocated with the OpenMP
+      @code{ompx_gnu_managed_mem_alloc} allocator or in the
+      @code{ompx_gnu_managed_mem_space} is not currently supported on AMD GPU
+      devices; attempting to use it in an allocator will trigger the fall-back
+      trait.

I think we need again "for" instead of "on" – as using
it on the device is fine, except that it will not be
managed but the default mem space. (Possibly adding 'on the host', but
it is not really needed.)

* * *

[Nvidia GPUs]

+@item Managed memory allocated with the @code{ompx_gnu_managed_mem_alloc}
Maybe change this to  'Managed memory allocated *on* *the* *host*'?
+      allocator or in the @code{ompx_gnu_managed_mem_space} (both GNU
+      extensions) allocate memory in the CUDA Managed Memory space using
+      @code{cuMemAllocManaged}.  This memory is accessible by both the host and
+      the device at the same address, so it need not be mapped with @code{map}
+      clauses.  Instead, use the @code{is_device_ptr} clause or
+      @code{has_device_addr} clause to indicate that the pointer is already
+      accessible on the device.  The CUDA runtime will automatically handle
+      data migration between host and device as needed.

+      If managed memory is not supported by the default device, as configured
+      at the moment the allocator is called, then the allocator will use the
+      fall-back setting.

I wonder whether this is needed – or the wording in the other section is enough? I am slightly inclined of removing it.

+ If the default device is configured differently when
+      the memory is freed, via @code{omp_free} or @code{omp_realloc}, the
+      result may be undefined.

While this one is redundant, I think it is sensible to keep it – to
avoid user surprises. It also hints at the default device in case
the user missed it in the other section.

* * *

All in all: LGTM with the env.c issue fixed and considering some
.texi wording changes.

Thanks for the patch!

Tobias

Reply via email to