Hi Andrew, hi Jakub, hello world,
Andrew Stubbs wrote:
Compared to the previous v3 posting of this patch, the enumeration of
the "ompx" allocators have been moved to start at "100"
100 is a bad value - as can be seen below.
As Jakub suggested at
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640432.html
"given that LLVM uses 100-102 range, perhaps pick a different one, 200 or 150"
(I know that the first review email suggested 100.)
This creates a new predefined allocator as a shortcut for using pinned
memory with OpenMP. The name uses the OpenMP extension space and is
intended to be consistent with other OpenMP implementations currently in
development.
Namely: ompx_pinned_mem_alloc
RFC: Should we use this name or - similar to LLVM - prefix this by
a vendor prefix instead (gnu_omp_ or gcc_omp_ instead of ompx_)?
IMHO it is fine to use ompx_ for pinned as the semantic is clear
and should be compatible with IBM and AMD.
For other additional memspaces / allocators, I am less sure, i.e.
on OG13 there are:
- ompx_unified_shared_mem_space, ompx_host_mem_space
- ompx_unified_shared_mem_alloc, ompx_host_mem_alloc
(BTW: In light of TR13 naming, the USM one could be
..._devices_all_mem_{alloc,space}, just to start some bikeshading
or following LLVM + Intel '…target_{host,shared}…'.)
* * *
Looking at other compilers:
IBM's compiler, https://www.ibm.com/docs/en/SSXVZZ_16.1.1/pdf/compiler.pdf ,
has:
- ompx_pinned_mem_alloc, tagged as IBM extension and otherwise without
documenting it further
Checking omp.h, they define it as:
ompx_pinned_mem_alloc = 9, /* Preview of host pinned memory support */
and additionally have:
LOMP_MAX_MEM_ALLOC = 1024,
AMD's compiler based on clang has:
/* Preview of pinned memory support */
ompx_pinned_mem_alloc = 120,
in addition to the LLVM defines shown below.
Regarding LLVM:
- they don't offer 'pinned'
- they use the prefix 'llvm_omp' not 'ompx'
Namely:
typedef enum omp_allocator_handle_t
...
llvm_omp_target_host_mem_alloc = 100,
llvm_omp_target_shared_mem_alloc = 101,
llvm_omp_target_device_mem_alloc = 102,
...
typedef enum omp_memspace_handle_t
...
llvm_omp_target_host_mem_space = 100,
llvm_omp_target_shared_mem_space = 101,
llvm_omp_target_device_mem_space = 102,
Remark: I did not find a documentation - and while I
understand in principle host and shared, I wonder how
LLVM handles 'device_mem_space' when there is more than
one device.
BTW: OpenMP TR13 avoids this issue by adding two sets of
API routines. Namely:
First, for memspaces,
- omp_get_{device,devices}_memspace
- omp_get_{device,devices}_and_host_memspace
- omp_get_devices_all_memspace
and, secondly, for allocators:
- omp_get_{device,devices}_allocator
- omp_get_{device,devices}_and_host_allocator
- omp_get_devices_all_allocator
where omp_get_device_* takes a single device number and
omp_get_devices_* a list of device numbers while _and_host
automatically adds the initial device to the list.
* * *
Looking at Intel, they even use extensions without prefix:
omp_target_{host,shared,device}_mem_{space,alloc}
and contrary to LLVM they document it with the semantic, cf.
https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/developer-guide-reference/2023-1/openmp-memory-spaces-and-allocators.html
* * *
The allocator is equivalent to using a custom allocator with the pinned
trait and the null fallback trait.
...
diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index cdedc7d80e9..18e3f525ec6 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -99,6 +99,8 @@ GOMP_is_alloc (void *ptr)
...
#define ARRAY_SIZE(A) (sizeof (A) / sizeof ((A)[0]))
-_Static_assert (ARRAY_SIZE (predefined_alloc_mapping)
+_Static_assert (ARRAY_SIZE (predefined_omp_alloc_mapping)
== omp_max_predefined_alloc + 1,
- "predefined_alloc_mapping must match omp_memspace_handle_t");
+ "predefined_omp_alloc_mapping must match
omp_memspace_handle_t");
+#define ARRAY_SIZE(A) (sizeof (A) / sizeof ((A)[0]))
I am surprised that this compiles: Why do you re-#define this macro?
* * *
--- a/libgomp/omp.h.in
+++ b/libgomp/omp.h.in
@@ -134,6 +134,7 @@ typedef enum omp_allocator_handle_t __GOMP_UINTPTR_T_ENUM
omp_cgroup_mem_alloc = 6,
omp_pteam_mem_alloc = 7,
omp_thread_mem_alloc = 8,
+ ompx_pinned_mem_alloc = 100,
See remark regarding "100" at the top of this email.
--- a/libgomp/omp_lib.f90.in
+++ b/libgomp/omp_lib.f90.in
+integer (kind=omp_allocator_handle_kind), &
+ parameter :: ompx_pinned_mem_alloc = 100
Likewise.
* * *
Why didn't you also update omp_lib.h.in?
* * *
I think you really want to update the checking code inside GCC itself,
i.e. for Fortran:
3 | !$omp allocate(a) allocator(100)
| 21
Error: Predefined allocator required in ALLOCATOR clause at (1) as the list
item 'a' at (2) has the