[PATCH, v2, OpenMP 5.0, libgomp] Structure element mapping for OpenMP 5.0

2020-12-04 Thread Chung-Lin Tang

Hi Jakub,
this is a new version of the structure element mapping patch for OpenMP 5.0
requirement changes.

This one uses the approach you outlined in your concept patch [1]: use more
special REFCOUNT_* values to mark structure element siblings, and link the
following siblings' splay_tree_keys back to the first key's refcount.
[1] https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557622.html

Implementation notes of the attached patch:

(1) This patch solves the 5.0 requirement of "not already incremented/decremented
because of the effect of a map clause on the construct" by pulling in
libgomp/hashtab.h and using htab_t as a pointer set. A "htab_t *refcount_set"
is added to the map/unmap routines to track the processing status of the
uintptr_t* addresses of the refcount fields in splay_tree_keys.

   * Currently this patch uses the same htab_create/htab_free routines as
     task.c. I toyed with creating a 'htab_alloca' macro (allocating a
     fixed-size htab) to speed things up further, but decided to play it safe
     for the current patch.

(2) Because pointers to refcounts are used as the basis, and structure element
siblings all share the same refcount, uniform increment/decrement without
repetition is also achieved naturally.

(3) Because whole structure element sibling sequences need to be removed from
the context together, it appears we need to mark the first and last key of such
a sequence. You'll see that the special REFCOUNT_* values have been expanded a
bit more than in your concept patch (at some point we should think about no
longer abusing them and adding a proper flags word).

(4) The new increment/decrement routines combine most of the new refcount_set
lookup code with the refcount adjustment. For the decrement routine, "copy" and
"removal" are now separate return values, since for structure element
sequences, even when signalling "removal" you may still need to finish the
"copy" work of the following target_var_descs.

(5) There are some reorganizing changes to oacc-parallel.c and oacc-mem.c, but
most of the code that matters is in target.c.

(6) New testcases have been added to reflect the cases discussed on the omp-lang
list.
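
Notes (1) and (4) can be illustrated with a small standalone sketch. This is
not the patch's code: it assumes a tiny fixed-size pointer set in place of
htab_t, and every name below (demo_ptr_set, demo_set_add, demo_increment,
demo_decrement, demo_unmap_action) is hypothetical. It only shows the idea of
keying "already adjusted on this construct" by the refcount's address, and of
reporting "copy" and "remove" separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-capacity pointer set (linear probing), standing in
   for the htab_t pointer set described in note (1).  */
#define DEMO_SET_SLOTS 64

struct demo_ptr_set
{
  uintptr_t *slots[DEMO_SET_SLOTS];   /* NULL marks an empty slot.  */
  size_t used;
};

/* Add PTR to SET.  Return true if PTR was not in the set before.  */
static bool
demo_set_add (struct demo_ptr_set *set, uintptr_t *ptr)
{
  size_t i = ((uintptr_t) ptr >> 3) % DEMO_SET_SLOTS;
  while (set->slots[i] != NULL)
    {
      if (set->slots[i] == ptr)
        return false;
      i = (i + 1) % DEMO_SET_SLOTS;
    }
  /* Keep one slot free so the probe loop above always terminates.  */
  assert (set->used + 1 < DEMO_SET_SLOTS);
  set->slots[i] = ptr;
  set->used++;
  return true;
}

/* Note (1): bump *REFCOUNT only the first time this construct sees it.  */
static void
demo_increment (struct demo_ptr_set *set, uintptr_t *refcount)
{
  if (demo_set_add (set, refcount))
    (*refcount)++;
}

/* Note (4): "copy" and "remove" are reported separately.  */
struct demo_unmap_action
{
  bool do_copy;
  bool do_remove;
};

static struct demo_unmap_action
demo_decrement (struct demo_ptr_set *set, uintptr_t *refcount, bool delete_p)
{
  uintptr_t before = *refcount;
  bool first_visit = demo_set_add (set, refcount);

  if (first_visit)
    {
      if (delete_p)
        *refcount = 0;
      else if (*refcount > 0)
        (*refcount)--;
    }

  bool is_zero = *refcount == 0;
  bool set_to_zero = is_zero && before > 0;

  struct demo_unmap_action act;
  /* Copy back whenever the mapping is going away, even on a repeat visit.  */
  act.do_copy = set_to_zero || (!first_visit && is_zero);
  /* Only the visit that dropped the count to zero requests removal.  */
  act.do_remove = first_visit && set_to_zero;
  return act;
}
```

In this sketch a second sibling that reaches an already-zeroed shared count
still gets do_copy set, but only the first visit reports do_remove, which is
the distinction note (4) describes.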

This patch has been tested for libgomp with no regressions on x86_64-linux with
nvptx offloading. Since I submitted the first "v1" patch long ago, is it okay
for this to be considered committable now, after approval?

Thanks,
Chung-Lin

2020-12-04  Chung-Lin Tang  

libgomp/
* hashtab.h (htab_clear): New function with initialization code
factored out from...
(htab_create): ...here, adjust to use htab_clear function.

* libgomp.h (REFCOUNT_SPECIAL): New symbol to denote range of
special refcount values, add comments.
(REFCOUNT_INFINITY): Adjust definition to use REFCOUNT_SPECIAL.
(REFCOUNT_LINK): Likewise.
(REFCOUNT_STRUCTELEM): New special refcount range for structure
element siblings.
(REFCOUNT_STRUCTELEM_P): Macro for testing for structure element
sibling maps.
(REFCOUNT_STRUCTELEM_FLAG_FIRST): Flag to indicate first sibling.
(REFCOUNT_STRUCTELEM_FLAG_LAST):  Flag to indicate last sibling.
(REFCOUNT_STRUCTELEM_FIRST_P): Macro to test _FIRST flag.
(REFCOUNT_STRUCTELEM_LAST_P): Macro to test _LAST flag.
(struct splay_tree_key_s): Add structelem_refcount and
structelem_refcount_ptr fields into a union with dynamic_refcount.
Add comments.
(gomp_map_vars): Delete declaration.
(gomp_map_vars_async): Likewise.
(gomp_unmap_vars): Likewise.
(gomp_unmap_vars_async): Likewise.
(goacc_map_vars): New declaration.
(goacc_unmap_vars): Likewise.

* oacc-mem.c (acc_map_data): Adjust to use goacc_map_vars.
(goacc_enter_datum): Likewise.
(goacc_enter_data_internal): Likewise.
* oacc-parallel.c (GOACC_parallel_keyed): Adjust to use goacc_map_vars
and goacc_unmap_vars.
(GOACC_data_start): Adjust to use goacc_map_vars.
(GOACC_data_end): Adjust to use goacc_unmap_vars.

* target.c (hash_entry_type): New typedef.
(htab_alloc): New function hook for hashtab.h.
(htab_free): Likewise.
(htab_hash): Likewise.
(htab_eq): Likewise.
(hashtab.h): Add file include.
(gomp_increment_refcount): New function.
(gomp_decrement_refcount): Likewise.
(gomp_map_vars_existing): Add refcount_set parameter, adjust to use
gomp_increment_refcount.
(gomp_map_fields_existing): Add refcount_set parameter, adjust calls
to gomp_map_vars_existing.

(gomp_map_vars_internal): Add refcount_set parameter, add local openmp_p
variable to guard OpenMP specific paths, adjust calls to
gomp_map_vars_existing, add structure element sibling splay_tree_key
sequence creation code, adjust Fortran map case to avoid increment
under OpenMP.
(gomp_map_vars): Adju

Re: [PATCH, v2, OpenMP 5.0, libgomp] Structure element mapping for OpenMP 5.0

2021-01-13 Thread Chung-Lin Tang

Ping x2.

Hi Jakub, would like this part of OpenMP 5.0 to be considered for GCC 11.

Thanks,
Chung-Lin

On 2020/12/14 6:32 PM, Chung-Lin Tang wrote:

Ping.


Re: [PATCH, v2, OpenMP 5.0, libgomp] Structure element mapping for OpenMP 5.0

2021-01-16 Thread Jakub Jelinek via Gcc-patches
On Fri, Dec 04, 2020 at 10:15:46PM +0800, Chung-Lin Tang wrote:
> this is a new version of the structure element mapping patch for OpenMP 5.0
> requirement changes.

Sorry for the delay.

> +    /* Unified reference count for structure element siblings, this is used
> +       when REFCOUNT_STRUCTELEM_FIRST_P(k->refcount) == true, the first sibling
> +       in a structure element sibling list item sequence.  */
> +    uintptr_t structelem_refcount;
> +
> +    /* When REFCOUNT_STRUCTELEM_P (k->refcount) == true, this field points

REFCOUNT_STRUCTELEM_P (k->refcount) is true even for
REFCOUNT_STRUCTELEM_FIRST_P(k->refcount), so shouldn't the description say
that structelem_refcount_ptr is only used if
REFCOUNT_STRUCTELEM_P (k->refcount) && !REFCOUNT_STRUCTELEM_FIRST_P (k->refcount)?

> +       into the (above) structelem_refcount field of the _FIRST splay_tree_key,
> +       the first key in the created sequence. All structure element siblings
> +       share a single refcount in this manner. Since these two fields won't be
> +       used at the same time, they are stashed in a union.  */
> +    uintptr_t *structelem_refcount_ptr;
> +  };
>    struct splay_tree_aux *aux;
>  };
>  
>  /* The comparison function.  */
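
To make the question above concrete, here is a rough sketch of how the flag
tests and the union could fit together. The DEMO_* encodings and the demo_key
layout are illustrative assumptions only (the patch's libgomp.h is
authoritative for the real values and fields); the point is just that the
pointer member is read only when the key is a structure element sibling and
not the first one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative encodings; assumed for this sketch, not taken from the patch.  */
#define DEMO_SPECIAL     (~(uintptr_t) 0x7)
#define DEMO_STRUCTELEM  (DEMO_SPECIAL | 4)
#define DEMO_FLAG_FIRST  ((uintptr_t) 1)
#define DEMO_FLAG_LAST   ((uintptr_t) 2)

#define DEMO_STRUCTELEM_P(V) (((V) & DEMO_STRUCTELEM) == DEMO_STRUCTELEM)
#define DEMO_FIRST_P(V) (DEMO_STRUCTELEM_P (V) && ((V) & DEMO_FLAG_FIRST) != 0)
#define DEMO_LAST_P(V)  (DEMO_STRUCTELEM_P (V) && ((V) & DEMO_FLAG_LAST) != 0)

/* Hypothetical, heavily simplified key.  */
struct demo_key
{
  uintptr_t refcount;   /* Ordinary count, or a special marker.  */
  union
  {
    uintptr_t structelem_refcount;       /* Valid when DEMO_FIRST_P.  */
    uintptr_t *structelem_refcount_ptr;  /* Valid for the other siblings.  */
  };
};

/* Mark KEYS[0..N-1] as one sibling sequence sharing a single refcount.  */
static void
demo_link_siblings (struct demo_key *keys, size_t n, uintptr_t initial)
{
  assert (n > 0);
  for (size_t i = 0; i < n; i++)
    {
      keys[i].refcount = DEMO_STRUCTELEM;
      if (i == 0)
        {
          keys[i].refcount |= DEMO_FLAG_FIRST;
          keys[i].structelem_refcount = initial;
        }
      else
        keys[i].structelem_refcount_ptr = &keys[0].structelem_refcount;
      if (i == n - 1)
        keys[i].refcount |= DEMO_FLAG_LAST;
    }
}

/* Return the address of the count that actually governs K.  */
static uintptr_t *
demo_refcount_addr (struct demo_key *k)
{
  if (DEMO_FIRST_P (k->refcount))
    return &k->structelem_refcount;
  if (DEMO_STRUCTELEM_P (k->refcount))
    return k->structelem_refcount_ptr;
  return &k->refcount;
}
```

Because every sibling resolves to the same address, any scheme that
deduplicates by refcount address treats the whole sequence as one unit.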

Anyway, most of the patch looks good, but I'd like to understand the
rationale for choosing a htab over what I've been trying to suggest, which
was essentially instead of incrementing or decrementing refcounts push them
into a vector for later incrementing/decrementing, then qsort the vector
(by the pointers to refcounts) and increment what the elements point to unless
the same address has been incremented/decremented already.

Jakub
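
For comparison, the vector-and-qsort alternative described above might look
roughly like the following. This is only a sketch under assumptions: the
function names are hypothetical, the "vector" is a plain array filled by the
caller, and only the increment direction is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Order two queued entries by the address of the refcount they point to.  */
static int
demo_cmp_refcount_ptr (const void *a, const void *b)
{
  uintptr_t pa = (uintptr_t) *(uintptr_t *const *) a;
  uintptr_t pb = (uintptr_t) *(uintptr_t *const *) b;
  return (pa > pb) - (pa < pb);
}

/* Sort the N queued refcount addresses, then increment each distinct
   address exactly once.  Returns the number of distinct refcounts
   adjusted.  With N <= 1 no sort is performed.  */
static size_t
demo_apply_increments (uintptr_t **pending, size_t n)
{
  assert (pending != NULL || n == 0);

  if (n > 1)
    qsort (pending, n, sizeof pending[0], demo_cmp_refcount_ptr);

  size_t distinct = 0;
  for (size_t i = 0; i < n; i++)
    if (i == 0 || pending[i] != pending[i - 1])
      {
        (*pending[i])++;
        distinct++;
      }
  return distinct;
}
```

After sorting, duplicates are adjacent, so a single comparison with the
previous element is enough to skip a refcount that was already adjusted.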



Re: [PATCH, v2, OpenMP 5.0, libgomp] Structure element mapping for OpenMP 5.0

2021-01-19 Thread Chung-Lin Tang




On 2021/1/16 5:45 PM, Jakub Jelinek wrote:

> > +    /* Unified reference count for structure element siblings, this is used
> > +       when REFCOUNT_STRUCTELEM_FIRST_P(k->refcount) == true, the first sibling
> > +       in a structure element sibling list item sequence.  */
> > +    uintptr_t structelem_refcount;
> > +
> > +    /* When REFCOUNT_STRUCTELEM_P (k->refcount) == true, this field points
>
> REFCOUNT_STRUCTELEM_P (k->refcount) is true even for
> REFCOUNT_STRUCTELEM_FIRST_P(k->refcount), so shouldn't the description say
> that structelem_refcount_ptr is only used if
> REFCOUNT_STRUCTELEM_P (k->refcount) && !REFCOUNT_STRUCTELEM_FIRST_P (k->refcount)?

Sure, I'll revise the comments a bit.

> > +       into the (above) structelem_refcount field of the _FIRST splay_tree_key,
> > +       the first key in the created sequence. All structure element siblings
> > +       share a single refcount in this manner. Since these two fields won't be
> > +       used at the same time, they are stashed in a union.  */
> > +    uintptr_t *structelem_refcount_ptr;
> > +  };
> >    struct splay_tree_aux *aux;
> >  };
> >
> >  /* The comparison function.  */
>
> Anyway, most of the patch looks good, but I'd like to understand the
> rationale for choosing a htab over what I've been trying to suggest, which
> was essentially instead of incrementing or decrementing refcounts push them
> into a vector for later incrementing/decrementing, then qsort the vector
> (by the pointers to refcounts) and increment what the elements point to unless
> the same address has been incremented/decremented already.
>
> Jakub


Essentially the requirement is to increment/decrement a refcount only once per
construct, so using a pointer-set (implemented by htab_t here) to track the
processing status seemed to be more intuitive in code, and probably faster than
sorting a vector I think (at least in most cases).

Chung-Lin


Re: [PATCH, v2, OpenMP 5.0, libgomp] Structure element mapping for OpenMP 5.0

2021-01-19 Thread Jakub Jelinek via Gcc-patches
On Tue, Jan 19, 2021 at 04:46:36PM +0800, Chung-Lin Tang wrote:
> > > +       into the (above) structelem_refcount field of the _FIRST splay_tree_key,
> > > +       the first key in the created sequence. All structure element siblings
> > > +       share a single refcount in this manner. Since these two fields won't be
> > > +       used at the same time, they are stashed in a union.  */
> > > +    uintptr_t *structelem_refcount_ptr;
> > > +  };
> > >    struct splay_tree_aux *aux;
> > >  };
> > >  /* The comparison function.  */
> > 
> > Anyway, most of the patch looks good, but I'd like to understand the
> > rationale for choosing a htab over what I've been trying to suggest, which
> > was essentially instead of incrementing or decrementing refcounts push them
> > into a vector for later incrementing/decrementing, then qsort the vector
> > (by the pointers to refcounts) and increment what the elements point to unless
> > the same address has been incremented/decremented already.
> > 
> > Jakub
> 
> Essentially the requirement is to increment/decrement a refcount only once per
> construct, so using a pointer-set (implemented by htab_t here) to track the
> processing status seemed to be more intuitive in code, and probably faster than
> sorting a vector I think (at least in most cases).

I agree that it is more intuitive, but I think it will actually be slower, and
performance is what we care about most here; the mapping is already too
slow.
The common case is only a few mappings and no repeated mappings (e.g. the
compiler ought to help there and just remove mappings that are provably
duplicate if possible).  E.g. with one mapping, no qsort is needed at all,
and in general the sort should be O(n log n).  The hash set needs a larger
memory allocation than the vector and needs to be cleared, plus it is a hash
table without chains, so there is some cost on collisions and whenever the
hash table needs to be expanded.  But I'll be happy to be proven wrong.

Jakub



Re: [PATCH, v2, OpenMP 5.0, libgomp] Structure element mapping for OpenMP 5.0

2020-12-14 Thread Chung-Lin Tang

Ping.

On 2020/12/4 10:15 PM, Chung-Lin Tang wrote:

[...]

 (gomp_map_vars_internal): Add refcount_set parameter, add local openmp_p
 variable to guard OpenMP specific paths, adjust calls to
 gomp_map_vars_existing, add structure element sibling splay_tree_key
 sequence creation code, adjust Fortran map case to avoid increment
 under OpenMP.
 (gomp_map_vars): Adjust to static, add refcount_set parameter, manage
 local refcount_set if caller passed in N

[og12] In 'libgomp/target.c:gomp_unmap_vars_internal', defer 'gomp_remove_var' (was: [PATCH, v2, OpenMP 5.0, libgomp] Structure element mapping for OpenMP 5.0)

2023-03-24 Thread Thomas Schwinge
Hi!

On 2020-12-04T22:15:46+0800, Chung-Lin Tang  wrote:
> this is a new version of the structure element mapping patch for OpenMP 5.0
> requirement changes.

>   (gomp_exit_data): [...]
>   adjust to queue splay-tree keys for removal
>   after main loop.

> --- a/libgomp/target.c
> +++ b/libgomp/target.c

> @@ -2485,14 +2714,17 @@ gomp_exit_data (struct gomp_device_descr *devicep, size_t mapnum,

> +  int nrmvars = 0;
> +  splay_tree_key remove_vars[mapnum];
> +
>    for (i = 0; i < mapnum; i++)
>      {

> -	  if (k->refcount == 0)
> -	    gomp_remove_var (devicep, k);
> +
> +	  /* Structure elements lists are removed altogether at once, which
> +	     may cause immediate deallocation of the target_mem_desc, causing
> +	     errors if we still have following element siblings to copy back.
> +	     While we're at it, it also seems more disciplined to simply
> +	     queue all removals together for processing below.
> +
> +	     Structured block unmapping (i.e. gomp_unmap_vars_internal) should
> +	     not have this problem, since they maintain an additional
> +	     tgt->refcount = 1 reference to the target_mem_desc to start with.
> +	   */
> +	  if (do_remove)
> +	    remove_vars[nrmvars++] = k;

>      }
>
> +  for (int i = 0; i < nrmvars; i++)
> +    gomp_remove_var (devicep, remove_vars[i]);
> +
>    gomp_mutex_unlock (&devicep->lock);
>  }

Upcoming work of mine actually now does require this change also for
'gomp_unmap_vars_internal', such that 'gomp_remove_var' be deferred until
after all 'gomp_copy_dev2host' calls have been handled.
I've pushed to devel/omp/gcc-12
commit 65037818987ffce7d6f466fa8bde13e9f59a3218
"In 'libgomp/target.c:gomp_unmap_vars_internal', defer 'gomp_remove_var'",
see attached.
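
The hazard that motivates deferring removal can be shown with a toy model.
Everything here is hypothetical (demo_block, demo_entry, demo_remove and
demo_unmap_all are not libgomp names); the sketch only assumes that removing
one sibling may release storage the other siblings still need to copy back
from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a target_mem_desc shared by several keys.  */
struct demo_block
{
  int *device_data;     /* "Device" storage shared by all entries.  */
};

/* Hypothetical stand-in for one mapped list item.  */
struct demo_entry
{
  struct demo_block *block;
  size_t index;         /* Slot in block->device_data.  */
  int *host;            /* Where to copy the value back to.  */
};

/* Removing one sibling tears down the whole shared block.  Calling this
   again for another sibling is harmless, since free (NULL) is a no-op.  */
static void
demo_remove (struct demo_entry *e)
{
  free (e->block->device_data);
  e->block->device_data = NULL;
}

/* Copy everything back first and queue the removals; only then process
   the queue.  Interleaving demo_remove with the copies would leave later
   siblings reading from freed storage.  */
static void
demo_unmap_all (struct demo_entry *entries, size_t n)
{
  size_t nrmvars = 0;
  struct demo_entry **remove_vars = malloc (n * sizeof *remove_vars);
  assert (remove_vars != NULL || n == 0);

  for (size_t i = 0; i < n; i++)
    {
      *entries[i].host = entries[i].block->device_data[entries[i].index];
      remove_vars[nrmvars++] = &entries[i];
    }

  for (size_t i = 0; i < nrmvars; i++)
    demo_remove (remove_vars[i]);

  free (remove_vars);
}
```

The two-loop shape mirrors the quoted gomp_exit_data change: the first loop
performs all copies and records what to remove, the second performs the
removals once nothing else needs the storage.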


Regards,
 Thomas


From 65037818987ffce7d6f466fa8bde13e9f59a3218 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 14 Mar 2023 19:42:12 +0100
Subject: [PATCH] In 'libgomp/target.c:gomp_unmap_vars_internal', defer
 'gomp_remove_var'

An upcoming change requires that 'gomp_remove_var' be deferred until after all
'gomp_copy_dev2host' calls have been handled.

Do this likewise to how commit 275c736e732d29934e4d22e8f030d5aae8c12a52
"libgomp: Structure element mapping for OpenMP 5.0" changed 'gomp_exit_data'.

	libgomp/
	* target.c (gomp_unmap_vars_internal): Queue splay-tree keys for
	removal after main loop.
---
 libgomp/ChangeLog.omp |  3 +++
 libgomp/target.c      | 34 +++++++++++++++++++---------------
 2 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp
index 85ebab14ba8..9360db66b03 100644
--- a/libgomp/ChangeLog.omp
+++ b/libgomp/ChangeLog.omp
@@ -1,5 +1,8 @@
 2023-03-24  Thomas Schwinge  
 
+	* target.c (gomp_unmap_vars_internal): Queue splay-tree keys for
+	removal after main loop.
+
 	PR other/76739
 	* oacc-parallel.c (GOACC_parallel_keyed): Given OpenACC 'async',
 	defer 'free' of non-contiguous array support data structures.
diff --git a/libgomp/target.c b/libgomp/target.c
index aaa597f6610..107c3567a30 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -2180,6 +2180,9 @@ gomp_unmap_vars_internal (struct target_mem_desc *tgt, bool do_copyfrom,
 			     false, NULL);
     }
 
+  size_t nrmvars = 0;
+  splay_tree_key remove_vars[tgt->list_count];
+
   for (i = 0; i < tgt->list_count; i++)
     {
       splay_tree_key k = tgt->list[i].key;
@@ -2201,16 +2204,21 @@ gomp_unmap_vars_internal (struct target_mem_desc *tgt, bool do_copyfrom,
 			    (void *) (k->tgt->tgt_start + k->tgt_offset
 				      + tgt->list[i].offset),
 			    tgt->list[i].length);
+      /* Queue all removals together for processing below.
+	 See also 'gomp_exit_data'.  */
       if (do_remove)
-	{
-	  struct target_mem_desc *k_tgt = k->tgt;
-	  bool is_tgt_unmapped = gomp_remove_var (devicep, k);
-	  /* It would be bad if TGT got unmapped while we're still iterating
-	     over its LIST_COUNT, and also expect to use it in the following
-	     code.  */
-	  assert (!is_tgt_unmapped
-		  || k_tgt != tgt);
-	}
+	remove_vars[nrmvars++] = k;
+    }
+
+  for (i = 0; i < nrmvars; i++)
+    {
+      splay_tree_key k = remove_vars[i];
+      struct target_mem_desc *k_tgt = k->tgt;
+      bool is_tgt_unmapped = gomp_remove_var (devicep, k);
+      /* It would be bad if TGT got unmapped while we're still iterating over
+	 its LIST_COUNT, and also expect to use it in the following code.  */
+      assert (!is_tgt_unmapped
+	      || k_tgt != tgt);
     }
 
   if (aq)
@@ -4157,7 +4165,7 @@ gomp_exit_data (struct gomp_device_descr *devicep, size_t mapnum,
 			   false, NULL);
   }
 
-  int nrmvars = 0;
+  size_t nrmvars = 0;
   splay_tre