Hi Jakub,

On 05.05.2022 at 11:33, Jakub Jelinek wrote:
On Mon, Mar 14, 2022 at 04:42:14PM +0100, Marcel Vollweiler wrote:
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -226,6 +226,11 @@ OMP_5.1 {
     omp_get_teams_thread_limit_;
  } OMP_5.0.2;

+OMP_5.1.1 {
+  global:
+    omp_target_is_accessible;
+} OMP_5.1;
+

You've already added another OMP_5.1.1 symbol, so this hunk will need to be
adjusted.  Keep the names in there alphabetically sorted.

Adjusted.

--- a/libgomp/omp_lib.f90.in
+++ b/libgomp/omp_lib.f90.in
@@ -835,6 +835,16 @@
            end function omp_target_disassociate_ptr
          end interface

+        interface
+          function omp_target_is_accessible (ptr, size, device_num) bind(c)
+            use, intrinsic :: iso_c_binding, only : c_ptr, c_size_t, c_int
+            integer(c_int) :: omp_target_is_accessible

The function returning integer(c_int) rather than logical seems like
a screw-up in the standard, but too late to fix that :(.
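
True - and in C the prototype likewise returns int, so the result is simply
used as a truth value. A minimal usage sketch against the new function (my
own example, not part of the patch):

#include <omp.h>
#include <stdio.h>

int
main (void)
{
  int x[64];
  /* The host (initial) device is always considered accessible.  */
  printf ("host: %d\n",
          omp_target_is_accessible (x, sizeof x, omp_get_initial_device ()));
  /* A discrete device is only reported accessible if it shares memory
     with the host (GOMP_OFFLOAD_CAP_SHARED_MEM).  */
  if (omp_get_num_devices () > 0)
    printf ("device 0: %d\n", omp_target_is_accessible (x, sizeof x, 0));
  return 0;
}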

--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -3666,6 +3666,24 @@ omp_target_disassociate_ptr (const void *ptr, int device_num)
 }

  int
+omp_target_is_accessible (const void *ptr, size_t size, int device_num)
+{
+  if (device_num < 0 || device_num > gomp_get_num_devices ())
+    return false;
+
+  if (device_num == gomp_get_num_devices ())
+    return true;
+
+  struct gomp_device_descr *devicep = resolve_device (device_num);
+  if (devicep == NULL)
+    return false;
+
+  /* TODO: Unified shared memory must be handled when available.  */
+
+  return devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM;

I guess for now it is reasonable, but I wonder if even without
GOMP_OFFLOAD_CAP_SHARED_MEM one can't for CUDA or GCN allocate host
memory (not all, but just some subset) that will be accessible on the
device (I bet that means accessible through the same address on the host and
device, aka partial shared mem).

Currently, I am only aware of

(a) physically shared memory, which is used on some architectures where CPU and
GPU are close together (handled via GOMP_OFFLOAD_CAP_SHARED_MEM), and
(b) unified shared memory, which is more a logical memory sharing via managed
memory (using something like cudaMallocManaged; see the sketch below).
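
For reference, a rough CUDA-side sketch of both cases - the pinned,
device-mapped host memory that you describe, and the managed memory from (b).
Untested and with all error handling omitted; not part of this patch:

#include <cuda_runtime.h>
#include <string.h>

void
example (void)
{
  /* Pinned host memory the device can access directly; with unified
     addressing, host and device even use the same address ("partial
     shared memory").  */
  float *pinned;
  cudaHostAlloc ((void **) &pinned, 1024 * sizeof (float),
                 cudaHostAllocMapped);

  /* Managed memory: one pointer valid on host and device, migrated
     on demand by the driver.  */
  float *managed;
  cudaMallocManaged ((void **) &managed, 1024 * sizeof (float),
                     cudaMemAttachGlobal);
  memset (managed, 0, 1024 * sizeof (float));  /* Host access is fine.  */

  cudaFreeHost (pinned);
  cudaFree (managed);
}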

For (b) I will submit a follow-up patch very soon that depends on the submitted
but not yet approved/committed USM patches:
   https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591349.html


So, ok for trunk.

OT, tried to look at how libomptarget implements it and they don't, at least
on llvm-project trunk, but while looking at that, noticed that for a NULL
pointer they return false from omp_target_is_present while we return true.
It is unclear whether NULL has corresponding storage on the device (NULL
always corresponds to NULL on the device) or not.

That's indeed an interesting point. I am not sure whether returning "true" for
a NULL pointer is the desired behaviour of omp_target_is_present. For the host
that might be OK (for whatever reason), but for offload devices it implies that
NULL is actually mapped to some address on the device (as far as I understand
the definition):

"The omp_target_is_present routine tests whether a host pointer refers to
storage that is mapped to a given device."

I don't know if such a "NULL mapping" is valid/useful.
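
A minimal sketch that makes the divergence visible (assuming at least one
offload device; per the above, GCC's libgomp would print 1 and libomptarget 0):

#include <omp.h>
#include <stdio.h>

int
main (void)
{
  if (omp_get_num_devices () > 0)
    printf ("NULL present on device 0: %d\n",
            omp_target_is_present (NULL, 0));
  return 0;
}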

Marcel