[PATCH] libgomp, OpenMP, nvptx: Low-latency memory allocator

2021-12-20 Thread Andrew Stubbs
This patch is submitted now for review, and so that I can commit a backport 
of it to the OG11 branch, but it isn't suitable for mainline until stage 1.


The patch implements support for omp_low_lat_mem_space and 
omp_low_lat_mem_alloc on NVPTX offload devices. The omp_pteam_mem_alloc, 
omp_cgroup_mem_alloc and omp_thread_mem_alloc allocators are also 
configured to use this space (this is to match the current or intended 
behaviour in other toolchains).


The memory is drawn from the ".shared" space that is accessible only 
from within the team in which it is allocated, and which effectively 
ceases to exist when the kernel exits.  By default, 8 KiB of space is 
reserved for each team at launch time. This can be adjusted, at runtime, 
via a new environment variable "GOMP_NVPTX_LOWLAT_POOL". Reserving a 
larger amount may limit the number of teams that can be run in parallel 
(due to hardware limitations). Conversely, reducing the allocation may 
increase the number of teams that can be run in parallel. (I have not 
yet attempted to tune the default too precisely.) The actual maximum 
size will vary according to the available hardware and the number of 
variables that the compiler has placed in .shared space.
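As an illustration of the environment-variable handling described above, here is a minimal host-side sketch. It assumes the value of GOMP_NVPTX_LOWLAT_POOL is a plain decimal byte count and that anything unparsable silently falls back to the 8 KiB default; the helper names parse_lowlat_pool and lowlat_pool_size_from_env are hypothetical and are not taken from the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Default quoted in the description: 8 KiB reserved per team.  */
#define LOWLAT_POOL_DEFAULT ((size_t) 8 * 1024)

/* Hypothetical helper: convert the text of GOMP_NVPTX_LOWLAT_POOL into a
   byte count.  Unset, empty, non-numeric or out-of-range input yields the
   default (an assumption of this sketch, not documented behaviour).  */
static size_t
parse_lowlat_pool (const char *text)
{
  if (text == NULL || text[0] < '0' || text[0] > '9')
    return LOWLAT_POOL_DEFAULT;

  char *end;
  errno = 0;
  unsigned long long val = strtoull (text, &end, 10);
  if (errno != 0 || *end != '\0' || val > SIZE_MAX)
    return LOWLAT_POOL_DEFAULT;
  return (size_t) val;
}

/* Read the variable from the process environment.  */
static size_t
lowlat_pool_size_from_env (void)
{
  return parse_lowlat_pool (getenv ("GOMP_NVPTX_LOWLAT_POOL"));
}
```

With this sketch, running a program as GOMP_NVPTX_LOWLAT_POOL=16384 ./a.out would request 16 KiB per team, subject to the hardware limits mentioned above.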


The allocator implementation is designed to add no more space overhead 
than omp_alloc already does (aside from rounding allocations up to a 
multiple of 8 bytes); thus the internal free and realloc must be told 
how big the original allocation was. The free algorithm maintains an 
in-order linked-list of free memory chunks. Memory is allocated on a 
first-fit basis.
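The scheme described above (sized free, address-ordered free list, first fit, 8-byte rounding) can be sketched in plain C as follows. This is a single-threaded host-side illustration only: the pool size, the names pool_init, pool_alloc and pool_free, and the chunk layout are all assumptions of the sketch, and the real device code additionally has to protect the list against concurrent access.

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_GRANULES 128u   /* illustrative pool: 128 * 8 = 1024 bytes */
#define NIL UINT32_MAX       /* end-of-list marker */

/* Bookkeeping lives only inside *free* chunks, so live allocations carry
   no header.  Both fields count 8-byte granules.  */
struct free_chunk { uint32_t next; uint32_t size; };

static struct free_chunk pool[POOL_GRANULES];
static uint32_t free_root = NIL;

static uint32_t granules (size_t bytes) { return (uint32_t) ((bytes + 7) / 8); }

static void
pool_init (void)
{
  pool[0].next = NIL;
  pool[0].size = POOL_GRANULES;
  free_root = 0;
}

/* First fit: take the first free chunk that is large enough.  */
static void *
pool_alloc (size_t size)
{
  if (size == 0 || size > POOL_GRANULES * 8u)
    return NULL;
  uint32_t need = granules (size);
  for (uint32_t *link = &free_root; *link != NIL; link = &pool[*link].next)
    {
      uint32_t off = *link;
      if (pool[off].size < need)
        continue;
      if (pool[off].size == need)
        *link = pool[off].next;            /* exact fit: unlink the chunk */
      else
        {
          /* Split: the tail of the chunk stays on the free list.  */
          pool[off + need].next = pool[off].next;
          pool[off + need].size = pool[off].size - need;
          *link = off + need;
        }
      return &pool[off];
    }
  return NULL;                             /* caller handles the fall-back */
}

/* The caller must pass the size it originally requested.  */
static void
pool_free (void *addr, size_t size)
{
  uint32_t off = (uint32_t) ((struct free_chunk *) addr - pool);
  uint32_t len = granules (size);
  uint32_t prev = NIL, cur = free_root;
  while (cur != NIL && cur < off)          /* keep the list in address order */
    {
      prev = cur;
      cur = pool[cur].next;
    }
  if (cur != NIL && off + len == cur)      /* merge with the next chunk */
    {
      len += pool[cur].size;
      cur = pool[cur].next;
    }
  if (prev != NIL && prev + pool[prev].size == off)
    {
      pool[prev].size += len;              /* merge with the previous chunk */
      pool[prev].next = cur;
    }
  else
    {
      pool[off].next = cur;
      pool[off].size = len;
      if (prev == NIL)
        free_root = off;
      else
        pool[prev].next = off;
    }
}
```

Because the size is supplied by the caller at free time, an allocated chunk needs no header of its own, which is the space saving the paragraph above refers to.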


If the allocation fails the NVPTX allocator returns NULL and omp_alloc 
handles the fall-back. Now that this is a thing that is likely to happen 
(low-latency memory is small) this patch also implements appropriate 
fall-back modes for the predefined allocators (fall-back for custom 
allocators already worked).
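The fall-back behaviour described above can be illustrated with a small host-side sketch. The enum names, the toy bump allocator and alloc_with_fallback are stand-ins invented for this example; only the policy (a failed low-latency request is retried in default memory, while a failed default-memory request just returns NULL) follows the description.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-ins for the OpenMP memory spaces and fall-back traits.  */
enum space { SPACE_DEFAULT, SPACE_LOWLAT };
enum fallback { FB_NULL, FB_DEFAULT_MEM };

#define LOWLAT_CAPACITY 64u   /* deliberately tiny, to force a fall-back */
static _Alignas (8) unsigned char lowlat_pool[LOWLAT_CAPACITY];
static size_t lowlat_used;

/* Toy per-space allocator: malloc for default memory, a bump pointer for
   the low-latency pool.  Returns NULL when the pool is exhausted.  */
static void *
space_alloc (enum space s, size_t size)
{
  if (s == SPACE_DEFAULT)
    return malloc (size);
  if (size > LOWLAT_CAPACITY)
    return NULL;
  size_t need = (size + 7) & ~(size_t) 7;
  if (need > LOWLAT_CAPACITY - lowlat_used)
    return NULL;
  void *p = lowlat_pool + lowlat_used;
  lowlat_used += need;
  return p;
}

/* Fall-back mode assumed for a predefined allocator in this sketch.  */
static enum fallback
predefined_fallback (enum space s)
{
  return s == SPACE_DEFAULT ? FB_NULL : FB_DEFAULT_MEM;
}

/* Try the requested space first; on failure apply the fall-back mode.
   *GOT records which space finally satisfied the request.  */
static void *
alloc_with_fallback (enum space s, size_t size, enum space *got)
{
  void *p = space_alloc (s, size);
  if (p == NULL && predefined_fallback (s) == FB_DEFAULT_MEM)
    {
      s = SPACE_DEFAULT;
      p = space_alloc (s, size);
    }
  if (p != NULL)
    *got = s;
  return p;
}
```

Recording which space satisfied the request matters because the matching free has to be routed to the same space.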


In order to support the %dynamic_smem_size PTX feature it is necessary 
to bump the minimum supported PTX version from 3.1 (~2013) to 4.1 (~2014).


OK for stage 1?

Andrew

libgomp, nvptx: low-latency memory allocator

This patch adds support for allocating low-latency ".shared" memory on
NVPTX GPU device, via the omp_low_lat_mem_space and omp_alloc.  The memory
can be allocated, reallocated, and freed using a basic but fast algorithm,
is thread safe and the size of the low-latency heap can be configured using
the GOMP_NVPTX_LOWLAT_POOL environment variable.

The use of the PTX dynamic_smem_size feature means that the minimum version
requirement is now bumped to 4.1 (still old at this point).

gcc/ChangeLog:

* config/nvptx/nvptx.c (nvptx_file_start): Bump minimum PTX version
to 4.1.

libgomp/ChangeLog:

* allocator.c (MEMSPACE_ALLOC): New macro.
(MEMSPACE_CALLOC): New macro.
(MEMSPACE_REALLOC): New macro.
(MEMSPACE_FREE): New macro.
(dynamic_smem_size): New constants.
(omp_alloc): Use MEMSPACE_ALLOC.
Implement fall-backs for predefined allocators.
(omp_free): Use MEMSPACE_FREE.
(omp_calloc): Use MEMSPACE_CALLOC.
Implement fall-backs for predefined allocators.
(omp_realloc): Use MEMSPACE_REALLOC.
Implement fall-backs for predefined allocators.
* config/nvptx/team.c (__nvptx_lowlat_heap_root): New variable.
(__nvptx_lowlat_pool): New asm variable.
(gomp_nvptx_main): Initialize the low-latency heap.
* plugin/plugin-nvptx.c (lowlat_pool_size): New variable.
(GOMP_OFFLOAD_init_device): Read the GOMP_NVPTX_LOWLAT_POOL envvar.
(GOMP_OFFLOAD_run): Apply lowlat_pool_size.
* config/nvptx/allocator.c: New file.
* testsuite/libgomp.c/allocators-1.c: New test.
* testsuite/libgomp.c/allocators-2.c: New test.
* testsuite/libgomp.c/allocators-3.c: New test.
* testsuite/libgomp.c/allocators-4.c: New test.
* testsuite/libgomp.c/allocators-5.c: New test.
* testsuite/libgomp.c/allocators-6.c: New test.

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index ff44d9fdbef..9bc26d7de0c 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -5409,7 +5409,7 @@ nvptx_file_start (void)
   else if (TARGET_PTX_6_3)
 fputs ("\t.version\t6.3\n", asm_out_file);
   else
-fputs ("\t.version\t3.1\n", asm_out_file);
+fputs ("\t.version\t4.1\n", asm_out_file);
   if (TARGET_SM80)
 fputs ("\t.target\tsm_80\n", asm_out_file);
   else if (TARGET_SM75)
diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index deebb6a79fa..b14f991c148 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -34,6 +34,38 @@
 
 #define omp_max_predefined_alloc omp_thread_mem_alloc
 
+/* These macros may be overridden in config//allocator.c.  */
+#ifndef MEMSPACE_ALLOC
+#define MEMSPACE_ALLOC(MEMSPACE, SIZE) \
+  ((void)MEMSPACE, malloc (SIZE))
+#endif
+#ifndef MEMSPACE_CALLOC
+#define MEMSPACE_CALLOC(MEMSPACE, SIZE) \
+  ((void)MEM

Re: [PATCH] libgomp, OpenMP, nvptx: Low-latency memory allocator

2022-01-05 Thread Tom de Vries via Gcc-patches

On 12/20/21 16:58, Andrew Stubbs wrote:
This patch is submitted now for review, and so that I can commit a backport 
of it to the OG11 branch, but it isn't suitable for mainline until stage 1.


The patch implements support for omp_low_lat_mem_space and 
omp_low_lat_mem_alloc on NVPTX offload devices. The omp_pteam_mem_alloc, 
omp_cgroup_mem_alloc and omp_thread_mem_alloc allocators are also 
configured to use this space (this is to match the current or intended 
behaviour in other toolchains).


The memory is drawn from the ".shared" space that is accessible only 
from within the team in which it is allocated, and which effectively 
ceases to exist when the kernel exits.  By default, 8 KiB of space is 
reserved for each team at launch time. This can be adjusted, at runtime, 
via a new environment variable "GOMP_NVPTX_LOWLAT_POOL". Reserving a 
larger amount may limit the number of teams that can be run in parallel 
(due to hardware limitations). Conversely, reducing the allocation may 
increase the number of teams that can be run in parallel. (I have not 
yet attempted to tune the default too precisely.) The actual maximum 
size will vary according to the available hardware and the number of 
variables that the compiler has placed in .shared space.


The allocator implementation is designed to add no more space overhead 
than omp_alloc already does (aside from rounding allocations up to a 
multiple of 8 bytes); thus the internal free and realloc must be told 
how big the original allocation was. The free algorithm maintains an 
in-order linked-list of free memory chunks. Memory is allocated on a 
first-fit basis.


If the allocation fails the NVPTX allocator returns NULL and omp_alloc 
handles the fall-back. Now that this is a thing that is likely to happen 
(low-latency memory is small) this patch also implements appropriate 
fall-back modes for the predefined allocators (fall-back for custom 
allocators already worked).


In order to support the %dynamic_smem_size PTX feature it is necessary 
to bump the minimum supported PTX version from 3.1 (~2013) to 4.1 (~2014).


I applied the patch (but used the libgomp/configure.tgt patch to force 
-mptx=4.1, rather than changing the default).


I ran the testsuite (using export GOMP_NVPTX_JIT=-O0 to work around 
known driver problems), and observed these extra FAILs:

...
FAIL: libgomp.c/../libgomp.c-c++-common/alloc-7.c execution test
FAIL: libgomp.c/../libgomp.c-c++-common/alloc-8.c execution test
FAIL: libgomp.c/allocators-1.c (test for excess errors)
FAIL: libgomp.c/allocators-2.c (test for excess errors)
FAIL: libgomp.c/allocators-3.c (test for excess errors)
FAIL: libgomp.c/allocators-4.c (test for excess errors)
FAIL: libgomp.c/allocators-5.c (test for excess errors)
FAIL: libgomp.c/allocators-6.c (test for excess errors)
FAIL: libgomp.c++/../libgomp.c-c++-common/alloc-7.c execution test
FAIL: libgomp.c++/../libgomp.c-c++-common/alloc-8.c execution test
FAIL: libgomp.fortran/alloc-10.f90   -O  execution test
FAIL: libgomp.fortran/alloc-9.f90   -O  execution test
...

The allocators-1.c test-case doesn't compile because:
...
FAIL: libgomp.c/allocators-1.c (test for excess errors)
Excess errors:
/home/vries/oacc/trunk/source-gcc/libgomp/testsuite/libgomp.c/allocators-1.c:7:22: 
sorry, unimplemented: '	' clause on 'requires' directive not supported yet


UNRESOLVED: libgomp.c/allocators-1.c compilation failed to produce 
executable

...

So, I suppose I need "[PATCH] OpenMP front-end: allow requires 
dynamic_allocators" as well, I'll try again with that applied.


The alloc-7.c execution test failure is a regression, AFAICT.  It fails 
here:

...
38    if ((((uintptr_t) p) % __alignof (int)) != 0 || p[0] || p[1] || p[2])
39      abort ();
...
because:
...
(gdb) p p[0]
$2 = 772014104
(gdb) p p[1]
$3 = 0
(gdb) p p[2]
$4 = 9
...

In other words, the pointer returned by omp_calloc does not point to 
zeroed out memory.
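To make the failure mode concrete: a pool that recycles memory hands back whatever the previous user left behind, so a calloc-style entry point has to clear the block itself. The sketch below shows that idea only; dirty_alloc and zeroed_alloc are hypothetical names and this is not the fix that was applied to the patch.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for a pool that returns previously used, non-zero memory.  */
static _Alignas (8) unsigned char dirty_block[32];

static void *
dirty_alloc (size_t size)
{
  if (size > sizeof dirty_block)
    return NULL;
  memset (dirty_block, 0xAA, sizeof dirty_block);   /* simulate stale data */
  return dirty_block;
}

/* calloc-style wrapper: allocate, then explicitly clear the block.  */
static void *
zeroed_alloc (size_t size)
{
  void *p = dirty_alloc (size);
  if (p != NULL)
    memset (p, 0, size);
  return p;
}
```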


Thanks,
- Tom


Re: [PATCH] libgomp, OpenMP, nvptx: Low-latency memory allocator

2022-01-05 Thread Tom de Vries via Gcc-patches

On 1/5/22 12:08, Tom de Vries wrote:

The allocators-1.c test-case doesn't compile because:
...
FAIL: libgomp.c/allocators-1.c (test for excess errors)
Excess errors:
/home/vries/oacc/trunk/source-gcc/libgomp/testsuite/libgomp.c/allocators-1.c:7:22: 
sorry, unimplemented: '    ' clause on 'requires' directive not 
supported yet


UNRESOLVED: libgomp.c/allocators-1.c compilation failed to produce 
executable

...

So, I suppose I need "[PATCH] OpenMP front-end: allow requires 
dynamic_allocators" as well, I'll try again with that applied.


After applying that, I get:
...
WARNING: program timed out.
FAIL: libgomp.c/allocators-2.c execution test
WARNING: program timed out.
FAIL: libgomp.c/allocators-3.c execution test
...

Thanks,
- Tom


Re: [PATCH] libgomp, OpenMP, nvptx: Low-latency memory allocator

2022-01-05 Thread Andrew Stubbs

On 05/01/2022 11:08, Tom de Vries wrote:
The alloc-7.c execution test failure is a regression, AFAICT.  It fails 
here:

...
38    if ((((uintptr_t) p) % __alignof (int)) != 0 || p[0] || p[1] || p[2])
39      abort ();
...
because:
...
(gdb) p p[0]
$2 = 772014104
(gdb) p p[1]
$3 = 0
(gdb) p p[2]
$4 = 9
...

In other words, the pointer returned by omp_calloc does not point to 
zeroed out memory.


The version that was applied to OG11 had this bug fixed, but I didn't 
get around to posting the update because Christmas got in the way and 
it's gone out of my mind, sorry.


The attached patch has the fix and also removes the hunk related to the 
PTX update.


Andrew
libgomp, nvptx: low-latency memory allocator

This patch adds support for allocating low-latency ".shared" memory on
NVPTX GPU device, via the omp_low_lat_mem_space and omp_alloc.  The memory
can be allocated, reallocated, and freed using a basic but fast algorithm,
is thread safe and the size of the low-latency heap can be configured using
the GOMP_NVPTX_LOWLAT_POOL environment variable.

The use of the PTX dynamic_smem_size feature means that the minimum version
requirement is now bumped to 4.1 (still old at this point).

libgomp/ChangeLog:

* allocator.c (MEMSPACE_ALLOC): New macro.
(MEMSPACE_CALLOC): New macro.
(MEMSPACE_REALLOC): New macro.
(MEMSPACE_FREE): New macro.
(dynamic_smem_size): New constants.
(omp_alloc): Use MEMSPACE_ALLOC.
Implement fall-backs for predefined allocators.
(omp_free): Use MEMSPACE_FREE.
(omp_calloc): Use MEMSPACE_CALLOC.
Implement fall-backs for predefined allocators.
(omp_realloc): Use MEMSPACE_REALLOC.
Implement fall-backs for predefined allocators.
* config/nvptx/team.c (__nvptx_lowlat_heap_root): New variable.
(__nvptx_lowlat_pool): New asm variable.
(gomp_nvptx_main): Initialize the low-latency heap.
* plugin/plugin-nvptx.c (lowlat_pool_size): New variable.
(GOMP_OFFLOAD_init_device): Read the GOMP_NVPTX_LOWLAT_POOL envvar.
(GOMP_OFFLOAD_run): Apply lowlat_pool_size.
* config/nvptx/allocator.c: New file.
* testsuite/libgomp.c/allocators-1.c: New test.
* testsuite/libgomp.c/allocators-2.c: New test.
* testsuite/libgomp.c/allocators-3.c: New test.
* testsuite/libgomp.c/allocators-4.c: New test.
* testsuite/libgomp.c/allocators-5.c: New test.
* testsuite/libgomp.c/allocators-6.c: New test.

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index 07a5645f4cc..b1f5fe0a5e2 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -34,6 +34,38 @@
 
 #define omp_max_predefined_alloc omp_thread_mem_alloc
 
+/* These macros may be overridden in config//allocator.c.  */
+#ifndef MEMSPACE_ALLOC
+#define MEMSPACE_ALLOC(MEMSPACE, SIZE) \
+  ((void)MEMSPACE, malloc (SIZE))
+#endif
+#ifndef MEMSPACE_CALLOC
+#define MEMSPACE_CALLOC(MEMSPACE, SIZE) \
+  ((void)MEMSPACE, calloc (1, SIZE))
+#endif
+#ifndef MEMSPACE_REALLOC
+#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE) \
+  ((void)MEMSPACE, (void)OLDSIZE, realloc (ADDR, SIZE))
+#endif
+#ifndef MEMSPACE_FREE
+#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) \
+  ((void)MEMSPACE, (void)SIZE, free (ADDR))
+#endif
+
+/* Map the predefined allocators to the correct memory space.
+   The index to this table is the omp_allocator_handle_t enum value.  */
+static const omp_memspace_handle_t predefined_alloc_mapping[] = {
+  omp_default_mem_space,   /* omp_null_allocator. */
+  omp_default_mem_space,   /* omp_default_mem_alloc. */
+  omp_large_cap_mem_space, /* omp_large_cap_mem_alloc. */
+  omp_default_mem_space,   /* omp_const_mem_alloc. */
+  omp_high_bw_mem_space,   /* omp_high_bw_mem_alloc. */
+  omp_low_lat_mem_space,   /* omp_low_lat_mem_alloc. */
+  omp_low_lat_mem_space,   /* omp_cgroup_mem_alloc. */
+  omp_low_lat_mem_space,   /* omp_pteam_mem_alloc. */
+  omp_low_lat_mem_space,   /* omp_thread_mem_alloc. */
+};
+
 struct omp_allocator_data
 {
   omp_memspace_handle_t memspace;
@@ -281,7 +313,7 @@ retry:
   allocator_data->used_pool_size = used_pool_size;
   gomp_mutex_unlock (&allocator_data->lock);
 #endif
-  ptr = malloc (new_size);
+  ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size);
   if (ptr == NULL)
{
 #ifdef HAVE_SYNC_BUILTINS
@@ -297,7 +329,10 @@ retry:
 }
   else
 {
-  ptr = malloc (new_size);
+  omp_memspace_handle_t memspace = (allocator_data
+   ? allocator_data->memspace
+   : predefined_alloc_mapping[allocator]);
+  ptr = MEMSPACE_ALLOC (memspace, new_size);
   if (ptr == NULL)
goto fail;
 }
@@ -315,32 +350,35 @@ retry:
   return ret;
 
 fail:
-  if (allocator_data)
+  int fallback = (allocator_data
+ ? allocator_data->fallback
+ : allocator == omp_default_mem_alloc
+

Re: [PATCH] libgomp, OpenMP, nvptx: Low-latency memory allocator

2022-01-05 Thread Andrew Stubbs

On 05/01/2022 13:04, Tom de Vries wrote:

On 1/5/22 12:08, Tom de Vries wrote:

The allocators-1.c test-case doesn't compile because:
...
FAIL: libgomp.c/allocators-1.c (test for excess errors)
Excess errors:
/home/vries/oacc/trunk/source-gcc/libgomp/testsuite/libgomp.c/allocators-1.c:7:22: 
sorry, unimplemented: '    ' clause on 'requires' directive not 
supported yet


UNRESOLVED: libgomp.c/allocators-1.c compilation failed to produce 
executable

...

So, I suppose I need "[PATCH] OpenMP front-end: allow requires 
dynamic_allocators" as well, I'll try again with that applied.


After applying that, I get:
...
WARNING: program timed out.
FAIL: libgomp.c/allocators-2.c execution test
WARNING: program timed out.
FAIL: libgomp.c/allocators-3.c execution test
...


It works for me.

Those tests are doing a large number of allocations repeatedly and in 
parallel to stress the atomics. They're also slightly longer running 
than the other tests.
  - allocators-2 calls omp_alloc 8080 times, over 16 kernel launches, 
some of which will fall back to PTX malloc.
  - allocators-3 calls omp_alloc and omp_free 8 million times each, 
over 8 kernel launches, and takes about a minute to run on my device 
(whether that falls back depends entirely on how the free calls interleave).


Either there is a flaw in the concurrency causing some kind of deadlock, 
or else your timeout is set too short for your device. I hope it's the 
latter. We may need to tweak this.


Andrew


Re: [PATCH] libgomp, OpenMP, nvptx: Low-latency memory allocator

2022-01-06 Thread Tom de Vries via Gcc-patches

On 1/5/22 15:36, Andrew Stubbs wrote:

On 05/01/2022 13:04, Tom de Vries wrote:

On 1/5/22 12:08, Tom de Vries wrote:

The allocators-1.c test-case doesn't compile because:
...
FAIL: libgomp.c/allocators-1.c (test for excess errors)
Excess errors:
/home/vries/oacc/trunk/source-gcc/libgomp/testsuite/libgomp.c/allocators-1.c:7:22: 
sorry, unimplemented: '    ' clause on 'requires' directive not 
supported yet


UNRESOLVED: libgomp.c/allocators-1.c compilation failed to produce 
executable

...

So, I suppose I need "[PATCH] OpenMP front-end: allow requires 
dynamic_allocators" as well, I'll try again with that applied.


After applying that, I get:
...
WARNING: program timed out.
FAIL: libgomp.c/allocators-2.c execution test
WARNING: program timed out.
FAIL: libgomp.c/allocators-3.c execution test
...


It works for me.

Those tests are doing a large number of allocations repeatedly and in 
parallel to stress the atomics. They're also slightly longer running 
than the other tests.
   - allocators-2 calls omp_alloc 8080 times, over 16 kernel launches, 
some of which will fall back to PTX malloc.


I've minimized the test-case by enabling a single call in main at a 
time.  All but the last 4 take about two seconds, the last 4 hang (and 
time out at 5min).


So, this already times out for me:
...
int
main ()
{
  test (1000, omp_low_lat_mem_alloc);
  return 0;
}
...

I tried playing around with the n, and roughly there's no hang below 
100, and a hang above 200, and in between there may or may not be a hang.


Again the same dynamic: if there's no hang, it just takes a few seconds.

   - allocators-3 calls omp_alloc and omp_free 8 million times each, 
over 8 kernel launches, and takes about a minute to run on my device 
(whether that falls back depends entirely on how the free calls 
interleave).


Either there is a flaw in the concurrency causing some kind of deadlock, 
or else your timeout is set too short for your device. I hope it's the 
latter. We may need to tweak this.


At first glance, the above behaviour doesn't look like a too short timeout.

[ FTR, I'm using a GT 1030 with production branch driver version 470.86 
(which is one version behind the latest 470.94) ]


Thanks,
- Tom


Re: [PATCH] libgomp, OpenMP, nvptx: Low-latency memory allocator

2022-01-06 Thread Tom de Vries via Gcc-patches

On 1/6/22 10:29, Tom de Vries wrote:

At first glance, the above behaviour doesn't look like a too short timeout.


Using patch below, this passes for me, I'm currently doing a full build 
and test to confirm.


Looks like it has to do with:
...
For sm_6x and earlier architectures, atom operations on .shared state 
space do not guarantee atomicity with respect to normal store 
instructions to the same address. It is the programmer's responsibility 
to guarantee correctness of programs that use shared memory
atomic instructions, e.g., by inserting barriers between normal stores 
and atomic operations to a common address, or by using atom.exch to 
store to locations accessed by other atomic operations.

...

My current understanding is that this is a backend problem, and needs to 
be fixed by defining atomic_store patterns which take care of this 
peculiarity.
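The workaround suggested by the quoted PTX text (use an exchange rather than a plain store on a word that other threads access atomically) can be sketched with the GCC __atomic builtins. In this sketch the root word doubles as a lock, with an all-ones value meaning "locked"; that encoding, and the names heap_root, root_acquire and root_release, are assumptions made for illustration.

```c
#include <stdint.h>

#define ROOT_LOCKED UINT32_MAX   /* assumed marker: another thread owns the list */

static uint32_t heap_root;       /* stand-in for the shared free-chain root */

/* Acquire: swap in the "locked" marker and spin until the previous value
   shows that nobody else held the lock.  Returns the current root.  */
static uint32_t
root_acquire (void)
{
  uint32_t seen;
  do
    seen = __atomic_exchange_n (&heap_root, ROOT_LOCKED, __ATOMIC_ACQUIRE);
  while (seen == ROOT_LOCKED);
  return seen;
}

/* Release: publish the new root with an exchange instead of a plain
   atomic store, so every access to the word is an atomic read-modify-write.
   The previous value is not needed.  */
static void
root_release (uint32_t new_root)
{
  (void) __atomic_exchange_n (&heap_root, new_root, __ATOMIC_RELEASE);
}
```

On targets where atomic stores already interact correctly with other atomics the exchange is merely redundant, which is why fixing this in the backend's atomic_store patterns is the more general solution.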


Thanks,
- Tom

diff --git a/libgomp/config/nvptx/allocator.c b/libgomp/config/nvptx/allocator.c
index 6bc2ea48043..4524122b3e7 100644
--- a/libgomp/config/nvptx/allocator.c
+++ b/libgomp/config/nvptx/allocator.c
@@ -122,7 +122,8 @@ nvptx_memspace_alloc (omp_memspace_handle_t memspace, size_t size)
         }
 
       /* Update the free chain root and release the lock.  */
-      __atomic_store_n (&__nvptx_lowlat_heap_root, root.raw, MEMMODEL_RELEASE);
+      __atomic_exchange_n (&__nvptx_lowlat_heap_root,
+                           root.raw, MEMMODEL_RELEASE);
       return result;
     }
   else
@@ -221,7 +222,8 @@ nvptx_memspace_free (omp_memspace_handle_t memspace, void *addr, size_t size)
         root.raw = newfreechunk.raw;
 
       /* Update the free chain root and release the lock.  */
-      __atomic_store_n (&__nvptx_lowlat_heap_root, root.raw, MEMMODEL_RELEASE);
+      __atomic_exchange_n (&__nvptx_lowlat_heap_root,
+                           root.raw, MEMMODEL_RELEASE);
     }
   else
     free (addr);
@@ -331,7 +333,8 @@ nvptx_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
       /* Else realloc in-place has failed and result remains NULL.  */
 
       /* Update the free chain root and release the lock.  */
-      __atomic_store_n (&__nvptx_lowlat_heap_root, root.raw, MEMMODEL_RELEASE);
+      __atomic_exchange_n (&__nvptx_lowlat_heap_root,
+                           root.raw, MEMMODEL_RELEASE);
 
       if (result == NULL)
         {


Re: [PATCH] libgomp, OpenMP, nvptx: Low-latency memory allocator

2022-01-07 Thread Andrew Stubbs

On 06/01/2022 17:53, Tom de Vries wrote:
My current understanding is that this is a backend problem, and needs to 
be fixed by defining atomic_store patterns which take care of this 
peculiarity.


You mentioned on IRC that I ought to initialize the free chain using 
atomics also, and that you are working on an atomic store implementation.


This patch fixes the initialization issue. It works with my device, I 
think. Please test it with your device when the backend issue is fixed.


Thanks very much!

Andrew

libgomp, nvptx: low-latency memory allocator

This patch adds support for allocating low-latency ".shared" memory on
NVPTX GPU device, via the omp_low_lat_mem_space and omp_alloc.  The memory
can be allocated, reallocated, and freed using a basic but fast algorithm,
is thread safe and the size of the low-latency heap can be configured using
the GOMP_NVPTX_LOWLAT_POOL environment variable.

The use of the PTX dynamic_smem_size feature means that the minimum version
requirement is now bumped to 4.1 (still old at this point).

libgomp/ChangeLog:

* allocator.c (MEMSPACE_ALLOC): New macro.
(MEMSPACE_CALLOC): New macro.
(MEMSPACE_REALLOC): New macro.
(MEMSPACE_FREE): New macro.
(dynamic_smem_size): New constants.
(omp_alloc): Use MEMSPACE_ALLOC.
Implement fall-backs for predefined allocators.
(omp_free): Use MEMSPACE_FREE.
(omp_calloc): Use MEMSPACE_CALLOC.
Implement fall-backs for predefined allocators.
(omp_realloc): Use MEMSPACE_REALLOC.
Implement fall-backs for predefined allocators.
* config/nvptx/team.c (__nvptx_lowlat_heap_root): New variable.
(__nvptx_lowlat_pool): New asm variable.
(gomp_nvptx_main): Initialize the low-latency heap.
* plugin/plugin-nvptx.c (lowlat_pool_size): New variable.
(GOMP_OFFLOAD_init_device): Read the GOMP_NVPTX_LOWLAT_POOL envvar.
(GOMP_OFFLOAD_run): Apply lowlat_pool_size.
* config/nvptx/allocator.c: New file.
* testsuite/libgomp.c/allocators-1.c: New test.
* testsuite/libgomp.c/allocators-2.c: New test.
* testsuite/libgomp.c/allocators-3.c: New test.
* testsuite/libgomp.c/allocators-4.c: New test.
* testsuite/libgomp.c/allocators-5.c: New test.
* testsuite/libgomp.c/allocators-6.c: New test.

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index 07a5645f4cc..b1f5fe0a5e2 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -34,6 +34,38 @@
 
 #define omp_max_predefined_alloc omp_thread_mem_alloc
 
+/* These macros may be overridden in config//allocator.c.  */
+#ifndef MEMSPACE_ALLOC
+#define MEMSPACE_ALLOC(MEMSPACE, SIZE) \
+  ((void)MEMSPACE, malloc (SIZE))
+#endif
+#ifndef MEMSPACE_CALLOC
+#define MEMSPACE_CALLOC(MEMSPACE, SIZE) \
+  ((void)MEMSPACE, calloc (1, SIZE))
+#endif
+#ifndef MEMSPACE_REALLOC
+#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE) \
+  ((void)MEMSPACE, (void)OLDSIZE, realloc (ADDR, SIZE))
+#endif
+#ifndef MEMSPACE_FREE
+#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) \
+  ((void)MEMSPACE, (void)SIZE, free (ADDR))
+#endif
+
+/* Map the predefined allocators to the correct memory space.
+   The index to this table is the omp_allocator_handle_t enum value.  */
+static const omp_memspace_handle_t predefined_alloc_mapping[] = {
+  omp_default_mem_space,   /* omp_null_allocator. */
+  omp_default_mem_space,   /* omp_default_mem_alloc. */
+  omp_large_cap_mem_space, /* omp_large_cap_mem_alloc. */
+  omp_default_mem_space,   /* omp_const_mem_alloc. */
+  omp_high_bw_mem_space,   /* omp_high_bw_mem_alloc. */
+  omp_low_lat_mem_space,   /* omp_low_lat_mem_alloc. */
+  omp_low_lat_mem_space,   /* omp_cgroup_mem_alloc. */
+  omp_low_lat_mem_space,   /* omp_pteam_mem_alloc. */
+  omp_low_lat_mem_space,   /* omp_thread_mem_alloc. */
+};
+
 struct omp_allocator_data
 {
   omp_memspace_handle_t memspace;
@@ -281,7 +313,7 @@ retry:
   allocator_data->used_pool_size = used_pool_size;
   gomp_mutex_unlock (&allocator_data->lock);
 #endif
-  ptr = malloc (new_size);
+  ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size);
   if (ptr == NULL)
{
 #ifdef HAVE_SYNC_BUILTINS
@@ -297,7 +329,10 @@ retry:
 }
   else
 {
-  ptr = malloc (new_size);
+  omp_memspace_handle_t memspace = (allocator_data
+   ? allocator_data->memspace
+   : predefined_alloc_mapping[allocator]);
+  ptr = MEMSPACE_ALLOC (memspace, new_size);
   if (ptr == NULL)
goto fail;
 }
@@ -315,32 +350,35 @@ retry:
   return ret;
 
 fail:
-  if (allocator_data)
+  int fallback = (allocator_data
+ ? allocator_data->fallback
+ : allocator == omp_default_mem_alloc
+ ? omp_atv_null_fb
+ : omp_atv_default_mem_fb);
+  switch (fallback)
 {
-  switch (allocator_data->fallback)
+case omp_atv_defa

Re: [PATCH] libgomp, OpenMP, nvptx: Low-latency memory allocator

2022-01-13 Thread Andrew Stubbs
Updated patch: this version fixes some missed cases of malloc in the 
realloc implementation. It also reworks the unused-variable workarounds 
so that they work better with my reworked pinned-memory patches, which 
I've not posted yet.


Andrew

libgomp, nvptx: low-latency memory allocator

This patch adds support for allocating low-latency ".shared" memory on
NVPTX GPU device, via the omp_low_lat_mem_space and omp_alloc.  The memory
can be allocated, reallocated, and freed using a basic but fast algorithm,
is thread safe and the size of the low-latency heap can be configured using
the GOMP_NVPTX_LOWLAT_POOL environment variable.

The use of the PTX dynamic_smem_size feature means that the minimum version
requirement is now bumped to 4.1 (still old at this point).

libgomp/ChangeLog:

* allocator.c (MEMSPACE_ALLOC): New macro.
(MEMSPACE_CALLOC): New macro.
(MEMSPACE_REALLOC): New macro.
(MEMSPACE_FREE): New macro.
(dynamic_smem_size): New constants.
(omp_alloc): Use MEMSPACE_ALLOC.
Implement fall-backs for predefined allocators.
(omp_free): Use MEMSPACE_FREE.
(omp_calloc): Use MEMSPACE_CALLOC.
Implement fall-backs for predefined allocators.
(omp_realloc): Use MEMSPACE_REALLOC and MEMSPACE_ALLOC.
Implement fall-backs for predefined allocators.
* config/nvptx/team.c (__nvptx_lowlat_heap_root): New variable.
(__nvptx_lowlat_pool): New asm variable.
(gomp_nvptx_main): Initialize the low-latency heap.
* plugin/plugin-nvptx.c (lowlat_pool_size): New variable.
(GOMP_OFFLOAD_init_device): Read the GOMP_NVPTX_LOWLAT_POOL envvar.
(GOMP_OFFLOAD_run): Apply lowlat_pool_size.
* config/nvptx/allocator.c: New file.
* testsuite/libgomp.c/allocators-1.c: New test.
* testsuite/libgomp.c/allocators-2.c: New test.
* testsuite/libgomp.c/allocators-3.c: New test.
* testsuite/libgomp.c/allocators-4.c: New test.
* testsuite/libgomp.c/allocators-5.c: New test.
* testsuite/libgomp.c/allocators-6.c: New test.

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index 07a5645f4cc..1cc7486fc4c 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -34,6 +34,34 @@
 
 #define omp_max_predefined_alloc omp_thread_mem_alloc
 
+/* These macros may be overridden in config//allocator.c.  */
+#ifndef MEMSPACE_ALLOC
+#define MEMSPACE_ALLOC(MEMSPACE, SIZE) malloc (SIZE)
+#endif
+#ifndef MEMSPACE_CALLOC
+#define MEMSPACE_CALLOC(MEMSPACE, SIZE) calloc (1, SIZE)
+#endif
+#ifndef MEMSPACE_REALLOC
+#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE) realloc (ADDR, SIZE)
+#endif
+#ifndef MEMSPACE_FREE
+#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) free (ADDR)
+#endif
+
+/* Map the predefined allocators to the correct memory space.
+   The index to this table is the omp_allocator_handle_t enum value.  */
+static const omp_memspace_handle_t predefined_alloc_mapping[] = {
+  omp_default_mem_space,   /* omp_null_allocator. */
+  omp_default_mem_space,   /* omp_default_mem_alloc. */
+  omp_large_cap_mem_space, /* omp_large_cap_mem_alloc. */
+  omp_default_mem_space,   /* omp_const_mem_alloc. */
+  omp_high_bw_mem_space,   /* omp_high_bw_mem_alloc. */
+  omp_low_lat_mem_space,   /* omp_low_lat_mem_alloc. */
+  omp_low_lat_mem_space,   /* omp_cgroup_mem_alloc. */
+  omp_low_lat_mem_space,   /* omp_pteam_mem_alloc. */
+  omp_low_lat_mem_space,   /* omp_thread_mem_alloc. */
+};
+
 struct omp_allocator_data
 {
   omp_memspace_handle_t memspace;
@@ -281,7 +309,7 @@ retry:
   allocator_data->used_pool_size = used_pool_size;
   gomp_mutex_unlock (&allocator_data->lock);
 #endif
-  ptr = malloc (new_size);
+  ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size);
   if (ptr == NULL)
{
 #ifdef HAVE_SYNC_BUILTINS
@@ -297,7 +325,11 @@ retry:
 }
   else
 {
-  ptr = malloc (new_size);
+  omp_memspace_handle_t memspace __attribute__((unused))
+   = (allocator_data
+  ? allocator_data->memspace
+  : predefined_alloc_mapping[allocator]);
+  ptr = MEMSPACE_ALLOC (memspace, new_size);
   if (ptr == NULL)
goto fail;
 }
@@ -315,32 +347,35 @@ retry:
   return ret;
 
 fail:
-  if (allocator_data)
+  int fallback = (allocator_data
+ ? allocator_data->fallback
+ : allocator == omp_default_mem_alloc
+ ? omp_atv_null_fb
+ : omp_atv_default_mem_fb);
+  switch (fallback)
 {
-  switch (allocator_data->fallback)
+case omp_atv_default_mem_fb:
+  if ((new_alignment > sizeof (void *) && new_alignment > alignment)
+ || (allocator_data
+ && allocator_data->pool_size < ~(uintptr_t) 0)
+ || !allocator_data)
{
-   case omp_atv_default_mem_fb:
- if ((new_alignment > sizeof (void *) && new_alignment > alignment)
- || (allocator_data
- && allocator_data->pool

[og12] In 'libgomp/allocator.c:omp_realloc', route 'free' through 'MEMSPACE_FREE' (was: [PATCH] libgomp, OpenMP, nvptx: Low-latency memory allocator)

2023-02-14 Thread Thomas Schwinge
Hi Andrew!

On 2022-01-13T11:13:51+, Andrew Stubbs  wrote:
> Updated patch: this version fixes some missed cases of malloc in the
> realloc implementation.

Right, and as it seems I've run into another issue: a stray 'free'.

> --- a/libgomp/allocator.c
> +++ b/libgomp/allocator.c

Re 'omp_realloc':

> @@ -660,9 +709,10 @@ retry:
>gomp_mutex_unlock (&allocator_data->lock);
>  #endif
>if (prev_size)
> - new_ptr = realloc (data->ptr, new_size);
> + new_ptr = MEMSPACE_REALLOC (allocator_data->memspace, data->ptr,
> + data->size, new_size);
>else
> - new_ptr = malloc (new_size);
> + new_ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size);
>if (new_ptr == NULL)
>   {
>  #ifdef HAVE_SYNC_BUILTINS
> @@ -690,7 +740,11 @@ retry:
>  && (free_allocator_data == NULL
>  || free_allocator_data->pool_size == ~(uintptr_t) 0))
>  {
> -  new_ptr = realloc (data->ptr, new_size);
> +  omp_memspace_handle_t memspace __attribute__((unused))
> + = (allocator_data
> +? allocator_data->memspace
> +: predefined_alloc_mapping[allocator]);
> +  new_ptr = MEMSPACE_REALLOC (memspace, data->ptr, data->size, new_size);
>if (new_ptr == NULL)
>   goto fail;
>ret = (char *) new_ptr + sizeof (struct omp_mem_header);
> @@ -701,7 +755,11 @@ retry:
>  }
>else
>  {
> -  new_ptr = malloc (new_size);
> +  omp_memspace_handle_t memspace __attribute__((unused))
> + = (allocator_data
> +? allocator_data->memspace
> +: predefined_alloc_mapping[allocator]);
> +  new_ptr = MEMSPACE_ALLOC (memspace, new_size);
>if (new_ptr == NULL)
>   goto fail;
>  }
> @@ -735,32 +793,35 @@ retry:
|free (data->ptr);
>return ret;

I run into a SIGSEGV if a non-'malloc'-based allocation is 'free'd here.

The attached
"In 'libgomp/allocator.c:omp_realloc', route 'free' through 'MEMSPACE_FREE'"
appears to resolve my issue, but not yet regression-tested.  Does that
look correct to you?
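
To illustrate the failure mode, here is a minimal sketch, assuming a toy setup with two memory spaces. All `toy_*` names are hypothetical and are not libgomp's `MEMSPACE_*` macros. The point is only that a pointer obtained from a non-'malloc' space must be released through the same space-aware dispatch, never through plain 'free':

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical memory spaces; illustrative names, not libgomp's.  */
enum toy_memspace { TOY_DEFAULT_SPACE, TOY_LOWLAT_SPACE };

/* A small static pool standing in for a non-malloc memory space.  */
static uint64_t toy_pool[32];      /* 256 bytes, 8-byte aligned.  */
static size_t toy_pool_used;       /* Bytes handed out so far.  */

/* Round a request up to a multiple of 8 bytes.  */
static size_t
toy_round_up (size_t size)
{
  return (size + 7) & ~(size_t) 7;
}

static void *
toy_memspace_alloc (enum toy_memspace space, size_t size)
{
  if (space != TOY_LOWLAT_SPACE)
    return malloc (size);
  if (size > sizeof toy_pool)
    return NULL;
  size = toy_round_up (size);
  if (size > sizeof toy_pool - toy_pool_used)
    return NULL;
  void *p = (unsigned char *) toy_pool + toy_pool_used;
  toy_pool_used += size;
  return p;
}

/* Counterpart of the allocation above.  Only pointers that came from the
   default space are ever handed to free(); pool pointers stay in the pool.  */
static void
toy_memspace_free (enum toy_memspace space, void *ptr, size_t size)
{
  if (space != TOY_LOWLAT_SPACE)
    {
      free (ptr);
      return;
    }
  size = toy_round_up (size);
  /* This toy pool only reclaims the most recent allocation.  */
  if ((unsigned char *) ptr + size
      == (unsigned char *) toy_pool + toy_pool_used)
    toy_pool_used -= size;
}
```

Calling 'free' directly on a pointer returned for `TOY_LOWLAT_SPACE` would pass an address inside a static array to the C library, which is undefined behaviour and can show up as a SIGSEGV like the one described above.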

Or, instead of invoking 'MEMSPACE_FREE', should we scrap the
'used_pool_size' bookkeeping here, and just invoke 'omp_free' instead?

--- libgomp/allocator.c
+++ libgomp/allocator.c
@@ -842,19 +842,7 @@ retry:
   if (old_size - old_alignment < size)
 size = old_size - old_alignment;
   memcpy (ret, ptr, size);
-  if (__builtin_expect (free_allocator_data
-   && free_allocator_data->pool_size < ~(uintptr_t) 0, 0))
-{
-#ifdef HAVE_SYNC_BUILTINS
-  __atomic_add_fetch (&free_allocator_data->used_pool_size, -data->size,
-                      MEMMODEL_RELAXED);
-#else
-  gomp_mutex_lock (&free_allocator_data->lock);
-  free_allocator_data->used_pool_size -= data->size;
-  gomp_mutex_unlock (&free_allocator_data->lock);
-#endif
-}
-  free (data->ptr);
+  ialias_call (omp_free) (ptr, free_allocator);
   return ret;

(I've not yet analyzed whether that's completely equivalent.)
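
For comparison, the idea behind this alternative can be sketched as follows, assuming a toy allocator with a bounded pool. The `toy_*` names and the struct layout are hypothetical, and unlike the real 'omp_realloc' this sketch uses one allocator for both the old and the new block. The release path does the pool accounting and the actual free in one function, so the "move" path does not need its own copy of the bookkeeping:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator with a bounded pool; not libgomp's struct.  */
struct toy_allocator
{
  uintptr_t pool_size;        /* ~(uintptr_t) 0 means "unlimited".  */
  uintptr_t used_pool_size;   /* Bytes currently charged to the pool.  */
};

static void *
toy_alloc (struct toy_allocator *a, size_t size)
{
  int limited = a->pool_size != ~(uintptr_t) 0;
  if (limited)
    {
      /* Charge the pool first; back out if the limit is exceeded.  */
      uintptr_t used = __atomic_add_fetch (&a->used_pool_size, size,
                                           __ATOMIC_RELAXED);
      if (used > a->pool_size)
        {
          __atomic_add_fetch (&a->used_pool_size, -(uintptr_t) size,
                              __ATOMIC_RELAXED);
          return NULL;
        }
    }
  void *p = malloc (size);
  if (p == NULL && limited)
    __atomic_add_fetch (&a->used_pool_size, -(uintptr_t) size,
                        __ATOMIC_RELAXED);
  return p;
}

/* Single release path: accounting and free live together.  */
static void
toy_free (struct toy_allocator *a, void *ptr, size_t size)
{
  if (a->pool_size != ~(uintptr_t) 0)
    /* Adding the negated size wraps modulo 2^N, i.e. it subtracts.  */
    __atomic_add_fetch (&a->used_pool_size, -(uintptr_t) size,
                        __ATOMIC_RELAXED);
  free (ptr);
}

/* Allocate, copy, then release the old block through toy_free.  */
static void *
toy_move (struct toy_allocator *a, void *old, size_t old_size,
          size_t new_size)
{
  void *ret = toy_alloc (a, new_size);
  if (ret == NULL)
    return NULL;
  memcpy (ret, old, old_size < new_size ? old_size : new_size);
  toy_free (a, old, old_size);
  return ret;
}
```

Whether the real 'omp_free' is a drop-in replacement for the open-coded bookkeeping depends on details this sketch ignores (for example the header stored in front of each allocation), so it should not be read as an answer to the equivalence question.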


Note that this likewise applies to the current upstream submission:

"libgomp, nvptx: low-latency memory allocator".


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
From d49d0b9dc4f96c496afb2d5caac4addb382fdf39 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 14 Feb 2023 13:35:03 +0100
Subject: [PATCH] In 'libgomp/allocator.c:omp_realloc', route 'free' through
 'MEMSPACE_FREE'

... to not run into a SIGSEGV if a non-'malloc'-based allocation is 'free'd
here.

Fix-up for og12 commit c5d1d7651297a273321154a5fe1b01eba9dcf604
"libgomp, nvptx: low-latency memory allocator".

	libgomp/
	* allocator.c (omp_realloc): Route 'free' through 'MEMSPACE_FREE'.
---
 libgomp/allocator.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index 05b323d458e2..ba9a4e17cc20 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -854,7 +854,17 @@ retry:
   gomp_mutex_unlock (&free_allocator_data->lock);
 #endif
 }
-  free (data->ptr);
+  {
+omp_memspace_handle_t was_memspace __attribute__((unused))
+  = (free_allocator_data
+	 ? free_allocator_data->memspace
+	 : predefined_alloc_mapping[free_allocator]);
+int was_pinned __attribute__((unused))
+  = (free_allocator_data
+	 ? free_allocator_data->pinned
+	 : free_allocator == ompx_pinned_mem_alloc);
+MEMSPACE_FREE (was_memspace, data->ptr, data->size, was_pinned);
+  }
   return ret;
 
 fail:
-- 
2.39.1



Re: [og12] In 'libgomp/allocator.c:omp_realloc', route 'free' through 'MEMSPACE_FREE' (was: [PATCH] libgomp, OpenMP, nvptx: Low-latency memory allocator)

2023-02-14 Thread Andrew Stubbs

On 14/02/2023 12:54, Thomas Schwinge wrote:

> Hi Andrew!
>
> On 2022-01-13T11:13:51+, Andrew Stubbs  wrote:
>> Updated patch: this version fixes some missed cases of malloc in the
>> realloc implementation.
>
> Right, and as it seems I've run into another issue: a stray 'free'.
>
>> --- a/libgomp/allocator.c
>> +++ b/libgomp/allocator.c
>
> Re 'omp_realloc':
>
>> @@ -660,9 +709,10 @@ retry:
>> gomp_mutex_unlock (&allocator_data->lock);
>>   #endif
>> if (prev_size)
>> - new_ptr = realloc (data->ptr, new_size);
>> + new_ptr = MEMSPACE_REALLOC (allocator_data->memspace, data->ptr,
>> + data->size, new_size);
>> else
>> - new_ptr = malloc (new_size);
>> + new_ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size);
>> if (new_ptr == NULL)
>>    {
>>   #ifdef HAVE_SYNC_BUILTINS
>> @@ -690,7 +740,11 @@ retry:
>>   && (free_allocator_data == NULL
>>   || free_allocator_data->pool_size == ~(uintptr_t) 0))
>>   {
>> -  new_ptr = realloc (data->ptr, new_size);
>> +  omp_memspace_handle_t memspace __attribute__((unused))
>> + = (allocator_data
>> +? allocator_data->memspace
>> +: predefined_alloc_mapping[allocator]);
>> +  new_ptr = MEMSPACE_REALLOC (memspace, data->ptr, data->size, new_size);
>> if (new_ptr == NULL)
>>    goto fail;
>> ret = (char *) new_ptr + sizeof (struct omp_mem_header);
>> @@ -701,7 +755,11 @@ retry:
>>   }
>> else
>>   {
>> -  new_ptr = malloc (new_size);
>> +  omp_memspace_handle_t memspace __attribute__((unused))
>> + = (allocator_data
>> +? allocator_data->memspace
>> +: predefined_alloc_mapping[allocator]);
>> +  new_ptr = MEMSPACE_ALLOC (memspace, new_size);
>> if (new_ptr == NULL)
>>    goto fail;
>>   }
>> @@ -735,32 +793,35 @@ retry:
> |free (data->ptr);
>> return ret;
>
> I run into a SIGSEGV if a non-'malloc'-based allocation is 'free'd here.
>
> The attached
> "In 'libgomp/allocator.c:omp_realloc', route 'free' through 'MEMSPACE_FREE'"
> appears to resolve my issue, but not yet regression-tested.  Does that
> look correct to you?

That looks correct. The only remaining use of "free" should be the one 
referring to the allocator object itself (i.e. the destructor).



> Or, instead of invoking 'MEMSPACE_FREE', should we scrap the
> 'used_pool_size' bookkeeping here, and just invoke 'omp_free' instead?
>
>  --- libgomp/allocator.c
>  +++ libgomp/allocator.c
>  @@ -842,19 +842,7 @@ retry:
> if (old_size - old_alignment < size)
>   size = old_size - old_alignment;
> memcpy (ret, ptr, size);
>  -  if (__builtin_expect (free_allocator_data
>  -   && free_allocator_data->pool_size < ~(uintptr_t) 0, 0))
>  -{
>  -#ifdef HAVE_SYNC_BUILTINS
>  -  __atomic_add_fetch (&free_allocator_data->used_pool_size, -data->size,
>  - MEMMODEL_RELAXED);
>  -#else
>  -  gomp_mutex_lock (&free_allocator_data->lock);
>  -  free_allocator_data->used_pool_size -= data->size;
>  -  gomp_mutex_unlock (&free_allocator_data->lock);
>  -#endif
>  -}
>  -  free (data->ptr);
>  +  ialias_call (omp_free) (ptr, free_allocator);
> return ret;
>
> (I've not yet analyzed whether that's completely equivalent.)


The used_pool_size code comes from upstream, so if you want to go beyond 
the mechanical substitution of "free" then you're adding a new patch 
(rather than tweaking an old one). I'll leave that for others to comment on.


Andrew


Re: [og12] In 'libgomp/allocator.c:omp_realloc', route 'free' through 'MEMSPACE_FREE' (was: [PATCH] libgomp, OpenMP, nvptx: Low-latency memory allocator)

2023-02-16 Thread Thomas Schwinge
Hi!

On 2023-02-14T15:11:14+, Andrew Stubbs  wrote:
> On 14/02/2023 12:54, Thomas Schwinge wrote:
>> On 2022-01-13T11:13:51+, Andrew Stubbs  wrote:
>>> Updated patch: this version fixes some missed cases of malloc in the
>>> realloc implementation.
>>
>> Right, and as it seems I've run into another issue: a stray 'free'.
>>
>>> --- a/libgomp/allocator.c
>>> +++ b/libgomp/allocator.c
>>
>> Re 'omp_realloc':
>>
>>> @@ -660,9 +709,10 @@ retry:
>>> gomp_mutex_unlock (&allocator_data->lock);
>>>   #endif
>>> if (prev_size)
>>> - new_ptr = realloc (data->ptr, new_size);
>>> + new_ptr = MEMSPACE_REALLOC (allocator_data->memspace, data->ptr,
>>> + data->size, new_size);
>>> else
>>> - new_ptr = malloc (new_size);
>>> + new_ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size);
>>> if (new_ptr == NULL)
>>>{
>>>   #ifdef HAVE_SYNC_BUILTINS
>>> @@ -690,7 +740,11 @@ retry:
>>>   && (free_allocator_data == NULL
>>>   || free_allocator_data->pool_size == ~(uintptr_t) 0))
>>>   {
>>> -  new_ptr = realloc (data->ptr, new_size);
>>> +  omp_memspace_handle_t memspace __attribute__((unused))
>>> + = (allocator_data
>>> +? allocator_data->memspace
>>> +: predefined_alloc_mapping[allocator]);
>>> +  new_ptr = MEMSPACE_REALLOC (memspace, data->ptr, data->size, new_size);
>>> if (new_ptr == NULL)
>>>goto fail;
>>> ret = (char *) new_ptr + sizeof (struct omp_mem_header);
>>> @@ -701,7 +755,11 @@ retry:
>>>   }
>>> else
>>>   {
>>> -  new_ptr = malloc (new_size);
>>> +  omp_memspace_handle_t memspace __attribute__((unused))
>>> + = (allocator_data
>>> +? allocator_data->memspace
>>> +: predefined_alloc_mapping[allocator]);
>>> +  new_ptr = MEMSPACE_ALLOC (memspace, new_size);
>>> if (new_ptr == NULL)
>>>goto fail;
>>>   }
>>> @@ -735,32 +793,35 @@ retry:
>> |free (data->ptr);
>>> return ret;
>>
>> I run into a SIGSEGV if a non-'malloc'-based allocation is 'free'd here.
>>
>> The attached
>> "In 'libgomp/allocator.c:omp_realloc', route 'free' through 'MEMSPACE_FREE'"
>> appears to resolve my issue, but not yet regression-tested.

No issues in testing.

>> Does that
>> look correct to you?
>
> That looks correct.

Thanks.  I've pushed to devel/omp/gcc-12 branch
commit 3a2c07395b0a565955a7b86f0eba866937e15989
"In 'libgomp/allocator.c:omp_realloc', route 'free' through 'MEMSPACE_FREE'",
see attached.

> The only remaining use of "free" should be the one
> referring to the allocator object itself (i.e. the destructor).

ACK.

>> Or, instead of invoking 'MEMSPACE_FREE', should we scrap the
>> 'used_pool_size' bookkeeping here, and just invoke 'omp_free' instead?
>>
>>  --- libgomp/allocator.c
>>  +++ libgomp/allocator.c
>>  @@ -842,19 +842,7 @@ retry:
>> if (old_size - old_alignment < size)
>>   size = old_size - old_alignment;
>> memcpy (ret, ptr, size);
>>  -  if (__builtin_expect (free_allocator_data
>>  -   && free_allocator_data->pool_size < ~(uintptr_t) 0, 0))
>>  -{
>>  -#ifdef HAVE_SYNC_BUILTINS
>>  -  __atomic_add_fetch (&free_allocator_data->used_pool_size, -data->size,
>>  - MEMMODEL_RELAXED);
>>  -#else
>>  -  gomp_mutex_lock (&free_allocator_data->lock);
>>  -  free_allocator_data->used_pool_size -= data->size;
>>  -  gomp_mutex_unlock (&free_allocator_data->lock);
>>  -#endif
>>  -}
>>  -  free (data->ptr);
>>  +  ialias_call (omp_free) (ptr, free_allocator);
>> return ret;
>>
>> (I've not yet analyzed whether that's completely equivalent.)
>
> The used_pool_size code comes from upstream, so if you want to go beyond
> the mechanical substitution of "free" then you're adding a new patch
> (rather than tweaking an old one). I'll leave that for others to comment on.

And I'll leave that for another day, and/or another person.  ;-)


Grüße
 Thomas


From 3a2c07395b0a565955a7b86f0eba866937e15989 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 14 Feb 2023 13:35:03 +0100
Subject: [PATCH] In 'libgomp/allocator.c:omp_realloc', route 'free' through
 'MEMSPACE_FREE'

... to not run into a SIGSEGV if a non-'malloc'-based allocation is 'free'd
here.

Fix-up for og12 commit c5d1d7651297a273321154a5fe1b01eba9dcf604
"libgomp, nvptx: low-latency memory allocator".

	libgomp/
	* allocator.c (omp_realloc): Route 'free' through 'MEMSPACE_FREE'.
---
 libgomp/ChangeLog.omp |  2 ++
 libgomp/allocator.c   | 12 +++++++++++-
 2 files changed, 13 insertions(+), 1