Andrew Stubbs wrote:
This patch implements "managed" memory for AMD GCN GPUs in OpenMP.  It
builds on the support added to the NVPTX libgomp for CUDA Managed Memory, a
week or two ago.

Since we do not use HIP we cannot use hipManagedMalloc, so this patch attempts

Should be: hipMallocManaged (as you also write in .texi).

The HSA header file update uses the most recent files relicensed for us by AMD,
at the time of the first patch posting.  Those files have certainly moved on in
the upstream sources, but I did not ask to get those relicensed.

Thanks. I think eventually we should ask AMD for the permission to
update to an even newer version, but for now that's enough and better
than what we had before.

* * *

diff --git a/include/hsa.h b/include/hsa.h
old mode 100644
new mode 100755

As noted by Arsen: those files shouldn't be executable ...

* * *

The current code does:

<<<<<<<<<<<<<
/* Register the heap allocation as coarse grained, "Managed" memory.  */
struct hsa_amd_svm_attribute_pair_s attr = {
  HSA_AMD_SVM_ATTRIB_GLOBAL_FLAG,
  HSA_AMD_SVM_GLOBAL_FLAG_COARSE_GRAINED
};
... hsa_fns.hsa_amd_svm_attributes_set_fn (new_pages, size, &attr, 1);
...
hsa_fns.hsa_system_get_info_fn
  (HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT, &svm_accessible);
...
if (svm_accessible == false)
  {
    attr2.attribute = HSA_AMD_SVM_ATTRIB_AGENT_ACCESSIBLE;
    attr2.value = agent->id.handle;
    status = hsa_fns.hsa_amd_svm_attributes_set_fn (new_pages, size, &attr2,
>>>>>>>>>>>>>

Note 1:

* If USM is not available (SVM by default, might require HSA_XNACK = 1),
  the memory is only tagged as agent accessible for that specific device
  (→ the current default device) via SVM_ATTRIB_AGENT_ACCESSIBLE.

That's effectively different from CUDA, where all Nvidia devices
can access it - not only the default one.
I come back to this below.

* * *

Note 2:

Looking at HIP, I notice that they use:
  CL_MEM_SVM_FINE_GRAIN_BUFFER | CL_MEM_USE_HOST_PTR

And looking at the header file:

"HSA_AMD_SVM_GLOBAL_FLAG_FINE_GRAINED      
Updates to memory with this attribute conform to HSA memory
consistency model.

"HSA_AMD_SVM_GLOBAL_FLAG_COARSE_GRAINED    
Writes to memory with this attribute can be performed by
a single agent at a time."

I have to admit that I do not fully understand the implications,
and in particular what "HSA memory consistency model"
means in this context. I tried to get it from
https://hsafoundation.com/wp-content/uploads/2021/02/HSA-SysArch-1.01.pdf
(Chapter 3), but only with limited success.


Hmm.
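For comparison: if fine grained turned out to be the right choice, it
would only be a different value in the existing call - a sketch, reusing
the new_pages/hsa_fns names from the quoted code above:

/* Sketch: register the heap as fine grained instead of coarse grained.  */
struct hsa_amd_svm_attribute_pair_s attr = {
  HSA_AMD_SVM_ATTRIB_GLOBAL_FLAG,
  HSA_AMD_SVM_GLOBAL_FLAG_FINE_GRAINED   /* instead of ..._COARSE_GRAINED */
};
hsa_fns.hsa_amd_svm_attributes_set_fn (new_pages, size, &attr, 1);

So the code change would be trivial; the open question is only what the
semantic difference really is.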

* * *

--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi

[AMD GCN]

+@item Managed memory allocated on the host with the
+      @code{ompx_gnu_managed_mem_alloc} allocator or in the
+      @code{ompx_gnu_managed_mem_space} (both GNU extensions) allocate memory
+      equivalent to HIP Managed Memory, although @emph{not} actually allocated
+      using @code{hipMallocManaged}.  This memory is accessible by both the
+      host and the device at the same address, so it need not be mapped with
+      @code{map} clauses.  Instead, use the @code{is_device_ptr} clause or
+      @code{has_device_addr} clause to indicate that the pointer is already
+      accessible on the device.  The ROCm runtime will automatically handle
+      data migration between host and device as needed.  Not all AMD GPU
+      devices support this feature, and many that do require that
+      @code{-mxnack=on} is configured at compile time.  If managed memory is
+      not supported by the default device, as configured at the moment the
+      allocator is called, then the allocator will use the fall-back setting.
+      If the default device is configured differently when the memory is freed,
+      via @code{omp_free} or @code{omp_realloc}, the result may be undefined.

I wonder whether it should be stated that it might only be accessible
on the default device, unless USM is available?
(cf. also next comment)

* * *

+++ b/libgomp/plugin/plugin-gcn.c

...

+/* {{{ Managed Memory
+
+   This implements an allocator equivalent to CUDA "Managed" memory, in which
+   the pages automatically migrate between host and device memory, as needed.
+   These allocations are visible from both the host and devices without the
+   need for explicit mappings.  However, OpenMP does need "is_device_ptr" or
+   "has_device_addr" to function properly.
+
+   There isn't a high-level HSA/ROCr API to allocate managed memory, so we
+   use regular memory and register it with the driver by setting it to
+   "coarse-grained" mode, and setting the "accessible by default" attribute
+   on devices where that isn't set as standard.

That's not completely true – it is only set for the device associated
with the passed agent handle, which is the current default device.


The question is whether that's what we want - or more. I think there
are two systems around:

(A) Those having an AMD CPU with a GPU on the same die (APU) and a more
powerful AMD GPU on the PCI bus – in that case, a user might explicitly
use only one or the other - and without gfx*-generic they are likely
also incompatible (with generic, both might work).

(B) HPC systems where multiple identical AMD GPUs are installed; this is
rather common with supercomputers - and without HSA_XNACK=1 set, even
MI300A does not have SVM_ACCESSIBLE_BY_DEFAULT. On the other hand,
even without calling with SVM_ATTRIB_AGENT_ACCESSIBLE, at least on
MI300A it is expected to just work. However, I have no idea about
MI200 systems - I could imagine that there it is only accessible on the
GPU for which hsa_amd_svm_attributes_set has been called.

[Let's assume Frontier - when not partitioning the data via MPI but
inside the program, and depending on how data is partitioned, a user
might expect that it works on all GPUs and not only on one. While
there might be an allocation per device, before OpenMP 6 it is
difficult to allocate per device and GCC uses the default device.]


→ Thus, the question is whether we keep the current algorithm - of only
setting SVM_ATTRIB_AGENT_ACCESSIBLE for the default device (and update
the comment) - or set it for all AMD GPU devices (possibly accepting
status != HSA_STATUS_SUCCESS if it worked for the default device).

In any case, it seems as if either all AMD devices should get this
flag or the documentation/comment should be updated.
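
For illustration, a minimal sketch of the "all devices" variant - this
assumes plugin-gcn.c's hsa_context.agent_count / agents[i].id fields and
the hsa_fns dispatch table keep their current names, so treat it as a
shape rather than a drop-in patch:

/* Sketch only: tag NEW_PAGES as agent-accessible on every GCN device,
   not just the current default one.  The hsa_context field names are
   assumed from plugin-gcn.c and may need adjusting.  */
static bool
managed_heap_set_accessible_all (void *new_pages, size_t size)
{
  bool any_ok = false;
  for (int i = 0; i < hsa_context.agent_count; i++)
    {
      struct hsa_amd_svm_attribute_pair_s attr
	= { HSA_AMD_SVM_ATTRIB_AGENT_ACCESSIBLE,
	    hsa_context.agents[i].id.handle };
      hsa_status_t status
	= hsa_fns.hsa_amd_svm_attributes_set_fn (new_pages, size, &attr, 1);
      /* Tolerate failures on individual devices, as long as at least
	 one device accepted the attribute.  */
      if (status == HSA_STATUS_SUCCESS)
	any_ok = true;
    }
  return any_ok;
}

Whether tolerating per-device failures like this is acceptable is, of
course, exactly the question above.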

* * *

+  if (!new_pages)
+    {
+      GCN_DEBUG ("Could not allocate Unified Shared Memory heap.");
+      __atomic_store_n (&lock, 0, __ATOMIC_RELEASE);
+      return false;

I wonder whether it makes sense to also output the requested size.
And should the message say Unified Shared Memory or Managed Memory?
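Something like the following, assuming GCN_DEBUG takes printf-style
arguments as elsewhere in plugin-gcn.c and the requested size is still
available in a variable called size at that point:

GCN_DEBUG ("Could not allocate managed memory heap (%zu bytes).", size);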

* * *
+/* Allocate memory suitable for Unified Shared Memory.  */

Managed/device-accessible memory?

* * *

+++ b/libgomp/simple-allocator.c
+/* Ensure that the splay tree will link into the plugin. */
+#define gomp_fatal GOMP_PLUGIN_fatal

Why is this required? The file already uses GOMP_PLUGIN_fatal
unconditionally - and I also don't see any gomp_fatal in an
included file.

* * *

+++ b/libgomp/testsuite/lib/libgomp.exp
...

@@ -729,5 +729,38 @@ proc check_effective_target_omp_managedmem { } {
...

+    if { [libgomp_check_effective_target_offload_target "amdgcn"] } {
+       if [check_runtime_nocache usm_available_ {
...
+    }

I think this requires a comment of the kind: Managed memory is
always supported if USM is available; however, it might also be
supported if not, cf. plugin-gcn.c's managed_heap_create. Err
on the side of caution.

* * *

--- a/libgomp/testsuite/libgomp.c-c++-common/requires-4.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/requires-4.c
@@ -2,6 +2,7 @@
  /* { dg-additional-options "-flto" } */
  /* { dg-additional-options "-foffload-options=nvptx-none=-misa=sm_35" { 
target { offload_target_nvptx } } } */
  /* { dg-additional-sources requires-4-aux.c } */
+/* { dg-excess-errors "Unified Shared Memory is enabled, but XNACK is 
disabled" { target offload_target_amdgcn } } */

Can we add a comment above/below like: "/* Note: GCC explicitly
disables XNACK for gfx908 as the hardware support is limited.  */"

That avoids readers being puzzled by this message.

(likewise for requires-4a.c)

* * *

Thanks for the patch. Please consider the comments above.
Otherwise, it LGTM.

Tobias
