On Tue, Dec 06, 2022 at 08:45:07AM +0100, Tobias Burnus wrote:
> 32bit vs. 64bit: libgomp itself is compiled with both -m32 and -m64; however,
> nvptx and gcn requires -m64 on the device side and assume that the device
> pointers are representable on the host (i.e. all are 64bit). The new code
> tries to be in principle compatible with uint32_t pointers and uses uint64_t
> to represent it consistently. – The code should be mostly fine, except that
> one called function requires an array of void* and size_t. Instead of handling
> that case, I added some code to permit optimizing away the function content
> without offloading - and a run-time assert if it should ever happen that this
> function gets called on a 32bit host from the target side.

I think we just shouldn't support libgomp plugins for 32-bit libgomp, only
host fallback.  If you want offloading, use 64-bit host...

> libgomp: Handle OpenMP's reverse offloads
> 
> This commit enabled reverse offload for nvptx such that gomp_target_rev
> actually gets called.  And it fills the latter function to do all of
> the following: finding the host function to the device func ptr and
> copying the arguments to the host, processing the mapping/firstprivate,
> calling the host function, copying back the data and freeing as needed.
> 
> The data handling is made easier by assuming that all host variables
> either existed before (and are in the mapping) or that those are
> devices variables not yet available on the host. Thus, the reverse
> mapping can do without refcounts etc. Note that the spec disallows
> inside a target region device-affecting constructs other than target
> plus ancestor device-modifier and it also limits the clauses permitted
> on this construct.
> 
> For the function addresses, an additional splay tree is used; for
> the lookup of mapped variables, the existing splay-tree is used.
> Unfortunately, its data structure requires a full walk of the tree;
> Additionally, the just mapped variables are recorded in a separate
> data structure an extra lookup. While the lookup is slow, assuming
> that only few variables get mapped in each reverse offload construct
> and that reverse offload is the exception and not performance critical,
> this seems to be acceptable.
> 
> libgomp/ChangeLog:
> 
>       * libgomp.h (struct target_mem_desc): Predeclare; move
>       below after 'reverse_splay_tree_node' and add rev_array
>       member.
>       (struct reverse_splay_tree_key_s, reverse_splay_compare): New.
>       (reverse_splay_tree_node, reverse_splay_tree,
>       reverse_splay_tree_key): New typedef.
>       (struct gomp_device_descr): Add mem_map_rev member.
>       * oacc-host.c (host_dispatch): NULL init .mem_map_rev.
>       * plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices): Claim
>       support for GOMP_REQUIRES_REVERSE_OFFLOAD.
>       * splay-tree.h (splay_tree_callback_stop): New typedef; like
>       splay_tree_callback but returning int not void.
>       (splay_tree_foreach_lazy): Define; like splay_tree_foreach but
>       taking splay_tree_callback_stop as argument.
>       * splay-tree.c (splay_tree_foreach_internal_lazy,
>       splay_tree_foreach_lazy): New; but early exit if callback returns
>       nonzero.
>       * target.c: Instatiate splay_tree_c with splay_tree_prefix 'reverse'.
>       (gomp_map_lookup_rev): New.
>       (gomp_load_image_to_device): Handle reverse-offload function
>       lookup table.
>       (gomp_unload_image_from_device): Free devicep->mem_map_rev.
>       (struct gomp_splay_tree_rev_lookup_data, gomp_splay_tree_rev_lookup,
>       gomp_map_rev_lookup, struct cpy_data, gomp_map_cdata_lookup_int,
>       gomp_map_cdata_lookup): New auxiliary structs and functions for
>       gomp_target_rev.
>       (gomp_target_rev): Implement reverse offloading and its mapping.
>       (gomp_target_init): Init current_device.mem_map_rev.root.
>       * testsuite/libgomp.fortran/reverse-offload-2.f90: New test.
>       * testsuite/libgomp.fortran/reverse-offload-3.f90: New test.
>       * testsuite/libgomp.fortran/reverse-offload-4.f90: New test.
>       * testsuite/libgomp.fortran/reverse-offload-5.f90: New test.
>       * testsuite/libgomp.fortran/reverse-offload-5a.f90: New test without
>       mapping of on-device allocated variables.

> +  /* Likeverse for the reverse lookup device->host for reverse offload. */

Likewise

> +  reverse_splay_tree_node rev_array;

Do we need reverse_splay_tree* stuff in libgomp.h?
As splay_tree_node is just a pointer, perhaps just
struct reverse_splay_tree_node_s;
early and
  struct reverse_splay_tree_node_s *rev_array;
in libgomp.h and include the extra splay-tree.h only in target.c?
Unless one needs it anywhere else...

Otherwise LGTM.

        Jakub

Reply via email to