Re: [PATCH v4 1/5] libgomp, openmp: Add ompx_pinned_mem_alloc

2024-06-06 Thread Tobias Burnus

Hi Andrew, hi Jakub, hello world,

Andrew Stubbs wrote:


Compared to the previous v3 posting of this patch, the enumeration of
the "ompx" allocators has been moved to start at "100"


100 is a bad value - as can be seen below.

As Jakub suggested at 
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640432.html
"given that LLVM uses 100-102 range, perhaps pick a different one, 200 or 150"

(I know that the first review email suggested 100.)
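
To make the clash concrete, here is a minimal sketch that records the allocator handle values quoted further down in this email (the standard handles 0-8, IBM 9, LLVM 100-102, AMD 120) and reports which implementation already uses a candidate value. The struct and function names are invented for illustration; nothing like this exists in libgomp.

```c
#include <stddef.h>

/* Hypothetical table of allocator handle values that are already taken,
   using only the numbers quoted in this email.  */
struct taken_range
{
  const char *owner;
  int first;
  int last;
};

static const struct taken_range taken[] = {
  { "OpenMP predefined allocators", 0, 8 },
  { "IBM ompx_pinned_mem_alloc", 9, 9 },
  { "LLVM llvm_omp_target_*_mem_alloc", 100, 102 },
  { "AMD ompx_pinned_mem_alloc", 120, 120 },
};

/* Return the owner of VALUE, or NULL if no listed implementation uses it.  */
static const char *
value_owner (int value)
{
  for (size_t i = 0; i < sizeof taken / sizeof taken[0]; i++)
    if (value >= taken[i].first && value <= taken[i].last)
      return taken[i].owner;
  return NULL;
}
```

With this table, value_owner (100) names the LLVM range, while value_owner (150) and value_owner (200) return NULL, which matches Jakub's suggestion.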


This creates a new predefined allocator as a shortcut for using pinned
memory with OpenMP.  The name uses the OpenMP extension space and is
intended to be consistent with other OpenMP implementations currently in
development.


Namely: ompx_pinned_mem_alloc

RFC: Should we use this name or, similar to LLVM, use a vendor prefix
instead (gnu_omp_ or gcc_omp_ instead of ompx_)?

IMHO it is fine to use ompx_ for pinned as the semantics are clear
and should be compatible with IBM and AMD.

For other additional memspaces / allocators, I am less sure, i.e.
on OG13 there are:
- ompx_unified_shared_mem_space, ompx_host_mem_space
- ompx_unified_shared_mem_alloc, ompx_host_mem_alloc

(BTW: In light of the TR13 naming, the USM one could be
..._devices_all_mem_{alloc,space}, just to start some bikeshedding,
or it could follow LLVM + Intel with '…target_{host,shared}…'.)

* * *

Looking at other compilers:

IBM's compiler, https://www.ibm.com/docs/en/SSXVZZ_16.1.1/pdf/compiler.pdf,
has:
- ompx_pinned_mem_alloc, tagged as an IBM extension but not documented
  any further

Checking omp.h, they define it as:
  ompx_pinned_mem_alloc = 9, /* Preview of host pinned memory support */
and additionally have:
  LOMP_MAX_MEM_ALLOC = 1024,

AMD's compiler based on clang has:
  /* Preview of pinned memory support */
  ompx_pinned_mem_alloc = 120,
in addition to the LLVM defines shown below.

Regarding LLVM:
- they don't offer 'pinned'
- they use the prefix 'llvm_omp' not 'ompx'

Namely:
typedef enum omp_allocator_handle_t
...
  llvm_omp_target_host_mem_alloc = 100,
  llvm_omp_target_shared_mem_alloc = 101,
  llvm_omp_target_device_mem_alloc = 102,
...
typedef enum omp_memspace_handle_t
...
  llvm_omp_target_host_mem_space = 100,
  llvm_omp_target_shared_mem_space = 101,
  llvm_omp_target_device_mem_space = 102,

Remark: I did not find any documentation - and while I
understand host and shared in principle, I wonder how
LLVM handles 'device_mem_space' when there is more than
one device.

BTW: OpenMP TR13 avoids this issue by adding two sets of
API routines. Namely:

First, for memspaces,
- omp_get_{device,devices}_memspace
- omp_get_{device,devices}_and_host_memspace
- omp_get_devices_all_memspace

and, secondly, for allocators:
- omp_get_{device,devices}_allocator
- omp_get_{device,devices}_and_host_allocator
- omp_get_devices_all_allocator

where omp_get_device_* takes a single device number and
omp_get_devices_* a list of device numbers while _and_host
automatically adds the initial device to the list.

* * *

Looking at Intel, they even use extensions without a prefix:

omp_target_{host,shared,device}_mem_{space,alloc}

and, contrary to LLVM, they document them including their semantics, cf.
https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/developer-guide-reference/2023-1/openmp-memory-spaces-and-allocators.html

* * *


The allocator is equivalent to using a custom allocator with the pinned
trait and the null fallback trait.
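
For readers unfamiliar with allocator traits, the equivalence can be sketched with the standard OpenMP 5.x API as follows. This is only an illustration under stated assumptions: the helper name pinned_roundtrip is made up, and the _OPENMP guard is there merely so the snippet also builds without -fopenmp.

```c
#include <stddef.h>
#include <string.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: allocate SIZE bytes of pinned memory through a
   custom allocator, touch the memory, and release it again.
   Returns 1 on success and 0 if pinned memory could not be obtained.  */
static int
pinned_roundtrip (size_t size)
{
#ifdef _OPENMP
  const omp_alloctrait_t traits[] = {
    { omp_atk_pinned, omp_atv_true },      /* request page-locked memory */
    { omp_atk_fallback, omp_atv_null_fb }  /* on failure, return NULL */
  };
  omp_allocator_handle_t pinned
    = omp_init_allocator (omp_default_mem_space, 2, traits);
  if (pinned == omp_null_allocator)
    return 0;  /* the runtime does not support this trait combination */

  void *p = omp_alloc (size, pinned);
  int ok = (p != NULL);
  if (ok)
    {
      memset (p, 0, size);
      omp_free (p, pinned);
    }
  omp_destroy_allocator (pinned);
  return ok;
#else
  (void) size;  /* built without OpenMP support: nothing to demonstrate */
  return 0;
#endif
}
```

With the patch applied, passing ompx_pinned_mem_alloc directly to omp_alloc is meant to behave like the custom allocator above without the omp_init_allocator and omp_destroy_allocator calls. A result of 0 does not necessarily indicate a bug; the locked-memory limit of the process may simply be too small.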


...


diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index cdedc7d80e9..18e3f525ec6 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -99,6 +99,8 @@ GOMP_is_alloc (void *ptr)


...


 #define ARRAY_SIZE(A) (sizeof (A) / sizeof ((A)[0]))
-_Static_assert (ARRAY_SIZE (predefined_alloc_mapping)
+_Static_assert (ARRAY_SIZE (predefined_omp_alloc_mapping)
                 == omp_max_predefined_alloc + 1,
-                "predefined_alloc_mapping must match omp_memspace_handle_t");
+                "predefined_omp_alloc_mapping must match omp_memspace_handle_t");
+#define ARRAY_SIZE(A) (sizeof (A) / sizeof ((A)[0]))


I am surprised that this compiles: Why do you re-#define this macro?
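
For what it is worth, C accepts a second #define of a macro when the parameter list and the replacement tokens are identical to the first, so the duplicate is presumably legal but redundant. A minimal sketch (demo_table and demo_table_size are made-up names):

```c
#include <stddef.h>

/* First definition.  */
#define ARRAY_SIZE(A) (sizeof (A) / sizeof ((A)[0]))
/* Identical redefinition: accepted because the parameter name and the
   replacement token sequence match the first definition exactly.  */
#define ARRAY_SIZE(A) (sizeof (A) / sizeof ((A)[0]))

static const int demo_table[] = { 1, 2, 3 };  /* hypothetical data */

/* Return the number of elements in demo_table.  */
static size_t
demo_table_size (void)
{
  return ARRAY_SIZE (demo_table);
}
```

A redefinition with a different replacement list would be a constraint violation and is diagnosed by the compiler, so dropping the second #define is the safer choice.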

* * *


--- a/libgomp/omp.h.in
+++ b/libgomp/omp.h.in
@@ -134,6 +134,7 @@ typedef enum omp_allocator_handle_t __GOMP_UINTPTR_T_ENUM
 omp_cgroup_mem_alloc = 6,
 omp_pteam_mem_alloc = 7,
 omp_thread_mem_alloc = 8,
+  ompx_pinned_mem_alloc = 100,


See remark regarding "100" at the top of this email.


--- a/libgomp/omp_lib.f90.in
+++ b/libgomp/omp_lib.f90.in
+integer (kind=omp_allocator_handle_kind), &
+ parameter :: ompx_pinned_mem_alloc = 100


Likewise.

* * *

Why didn't you also update omp_lib.h.in?

* * *

I think you really want to update the checking code inside GCC itself,

i.e. for Fortran:

    3 |   !$omp allocate(a) allocator(100)
      |                  2            1
Error: Predefined allocator required in ALLOCATOR clause at (1) as the list item 'a' at (2) has the 
[PATCH v4 1/5] libgomp, openmp: Add ompx_pinned_mem_alloc

2024-05-31 Thread Andrew Stubbs
Compared to the previous v3 posting of this patch, the enumeration of
the "ompx" allocators has been moved to start at "100".

-

This creates a new predefined allocator as a shortcut for using pinned
memory with OpenMP.  The name uses the OpenMP extension space and is
intended to be consistent with other OpenMP implementations currently in
development.

The allocator is equivalent to using a custom allocator with the pinned
trait and the null fallback trait.

libgomp/ChangeLog:

* allocator.c (ompx_min_predefined_alloc): New.
(ompx_max_predefined_alloc): New.
(predefined_alloc_mapping): Rename to ...
(predefined_omp_alloc_mapping): ... this.
(predefined_ompx_alloc_mapping): New.
(predefined_allocator_p): New.
(predefined_alloc_mapping): New (as a function).
(omp_aligned_alloc): Support ompx_pinned_mem_alloc. Use
predefined_allocator_p and predefined_alloc_mapping.
(omp_free): Likewise.
(omp_aligned_calloc): Likewise.
(omp_realloc): Likewise.
* libgomp.texi: Document ompx_pinned_mem_alloc.
* omp.h.in (omp_allocator_handle_t): Add ompx_pinned_mem_alloc.
* omp_lib.f90.in: Add ompx_pinned_mem_alloc.
* testsuite/libgomp.c/alloc-pinned-5.c: New test.
* testsuite/libgomp.c/alloc-pinned-6.c: New test.
* testsuite/libgomp.fortran/alloc-pinned-1.f90: New test.

Co-Authored-By: Thomas Schwinge 
---
 libgomp/allocator.c                           | 115 +-
 libgomp/libgomp.texi                          |   7 +-
 libgomp/omp.h.in                              |   1 +
 libgomp/omp_lib.f90.in                        |   2 +
 libgomp/testsuite/libgomp.c/alloc-pinned-5.c  | 103
 libgomp/testsuite/libgomp.c/alloc-pinned-6.c  | 101 +++
 .../libgomp.fortran/alloc-pinned-1.f90        |  16 +++
 7 files changed, 312 insertions(+), 33 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-5.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-6.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/alloc-pinned-1.f90

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index cdedc7d80e9..18e3f525ec6 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -99,6 +99,8 @@ GOMP_is_alloc (void *ptr)
 
 
 #define omp_max_predefined_alloc omp_thread_mem_alloc
+#define ompx_min_predefined_alloc ompx_pinned_mem_alloc
+#define ompx_max_predefined_alloc ompx_pinned_mem_alloc
 
 /* These macros may be overridden in config//allocator.c.
The defaults (no override) are to return NULL for pinned memory requests
@@ -131,7 +133,7 @@ GOMP_is_alloc (void *ptr)
The index to this table is the omp_allocator_handle_t enum value.
When the user calls omp_alloc with a predefined allocator this
table determines what memory they get.  */
-static const omp_memspace_handle_t predefined_alloc_mapping[] = {
+static const omp_memspace_handle_t predefined_omp_alloc_mapping[] = {
   omp_default_mem_space,   /* omp_null_allocator doesn't actually use this. */
   omp_default_mem_space,   /* omp_default_mem_alloc. */
   omp_large_cap_mem_space, /* omp_large_cap_mem_alloc. */
@@ -142,11 +144,41 @@ static const omp_memspace_handle_t predefined_alloc_mapping[] = {
   omp_low_lat_mem_space,   /* omp_pteam_mem_alloc (implementation defined). */
   omp_low_lat_mem_space,   /* omp_thread_mem_alloc (implementation defined). */
 };
+static const omp_memspace_handle_t predefined_ompx_alloc_mapping[] = {
+  omp_default_mem_space,   /* ompx_pinned_mem_alloc. */
+};
 
 #define ARRAY_SIZE(A) (sizeof (A) / sizeof ((A)[0]))
-_Static_assert (ARRAY_SIZE (predefined_alloc_mapping)
+_Static_assert (ARRAY_SIZE (predefined_omp_alloc_mapping)
                 == omp_max_predefined_alloc + 1,
-                "predefined_alloc_mapping must match omp_memspace_handle_t");
+                "predefined_omp_alloc_mapping must match omp_memspace_handle_t");
+#define ARRAY_SIZE(A) (sizeof (A) / sizeof ((A)[0]))
+_Static_assert (ARRAY_SIZE (predefined_ompx_alloc_mapping)
+                == ompx_max_predefined_alloc - ompx_min_predefined_alloc + 1,
+                "predefined_ompx_alloc_mapping must match"
+                " omp_memspace_handle_t");
+
+static inline bool
+predefined_allocator_p (omp_allocator_handle_t allocator)
+{
+  return allocator <= ompx_max_predefined_alloc;
+}
+
+static inline omp_memspace_handle_t
+predefined_alloc_mapping (omp_allocator_handle_t allocator)
+{
+  if (allocator <= omp_max_predefined_alloc)
+    return predefined_omp_alloc_mapping[allocator];
+  else if (allocator >= ompx_min_predefined_alloc
+           && allocator <= ompx_max_predefined_alloc)
+    {
+      int index = allocator - ompx_min_predefined_alloc;
+      return predefined_ompx_alloc_mapping[index];
+    }
+  else
+    /* This should never happen.  */
+    return omp_default_mem_space;
+}
 
 enum gomp_numa_memkind_kind
 {
@@ -556,7 +588,7 @@ retry: