On Tue, Dec 06, 2022 at 08:45:07AM +0100, Tobias Burnus wrote: > 32bit vs. 64bit: libgomp itself is compiled with both -m32 and -m64; however, > nvptx and gcn requires -m64 on the device side and assume that the device > pointers are representable on the host (i.e. all are 64bit). The new code > tries to be in principle compatible with uint32_t pointers and uses uint64_t > to represent it consistently. – The code should be mostly fine, except that > one called function requires an array of void* and size_t. Instead of handling > that case, I added some code to permit optimizing away the function content > without offloading - and a run-time assert if it should ever happen that this > function gets called on a 32bit host from the target side.
I think we just shouldn't support libgomp plugins for 32-bit libgomp, only host fallback. If you want offloading, use 64-bit host... > libgomp: Handle OpenMP's reverse offloads > > This commit enabled reverse offload for nvptx such that gomp_target_rev > actually gets called. And it fills the latter function to do all of > the following: finding the host function to the device func ptr and > copying the arguments to the host, processing the mapping/firstprivate, > calling the host function, copying back the data and freeing as needed. > > The data handling is made easier by assuming that all host variables > either existed before (and are in the mapping) or that those are > devices variables not yet available on the host. Thus, the reverse > mapping can do without refcounts etc. Note that the spec disallows > inside a target region device-affecting constructs other than target > plus ancestor device-modifier and it also limits the clauses permitted > on this construct. > > For the function addresses, an additional splay tree is used; for > the lookup of mapped variables, the existing splay-tree is used. > Unfortunately, its data structure requires a full walk of the tree; > Additionally, the just mapped variables are recorded in a separate > data structure an extra lookup. While the lookup is slow, assuming > that only few variables get mapped in each reverse offload construct > and that reverse offload is the exception and not performance critical, > this seems to be acceptable. > > libgomp/ChangeLog: > > * libgomp.h (struct target_mem_desc): Predeclare; move > below after 'reverse_splay_tree_node' and add rev_array > member. > (struct reverse_splay_tree_key_s, reverse_splay_compare): New. > (reverse_splay_tree_node, reverse_splay_tree, > reverse_splay_tree_key): New typedef. > (struct gomp_device_descr): Add mem_map_rev member. > * oacc-host.c (host_dispatch): NULL init .mem_map_rev. > * plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices): Claim > support for GOMP_REQUIRES_REVERSE_OFFLOAD. > * splay-tree.h (splay_tree_callback_stop): New typedef; like > splay_tree_callback but returning int not void. > (splay_tree_foreach_lazy): Define; like splay_tree_foreach but > taking splay_tree_callback_stop as argument. > * splay-tree.c (splay_tree_foreach_internal_lazy, > splay_tree_foreach_lazy): New; but early exit if callback returns > nonzero. > * target.c: Instatiate splay_tree_c with splay_tree_prefix 'reverse'. > (gomp_map_lookup_rev): New. > (gomp_load_image_to_device): Handle reverse-offload function > lookup table. > (gomp_unload_image_from_device): Free devicep->mem_map_rev. > (struct gomp_splay_tree_rev_lookup_data, gomp_splay_tree_rev_lookup, > gomp_map_rev_lookup, struct cpy_data, gomp_map_cdata_lookup_int, > gomp_map_cdata_lookup): New auxiliary structs and functions for > gomp_target_rev. > (gomp_target_rev): Implement reverse offloading and its mapping. > (gomp_target_init): Init current_device.mem_map_rev.root. > * testsuite/libgomp.fortran/reverse-offload-2.f90: New test. > * testsuite/libgomp.fortran/reverse-offload-3.f90: New test. > * testsuite/libgomp.fortran/reverse-offload-4.f90: New test. > * testsuite/libgomp.fortran/reverse-offload-5.f90: New test. > * testsuite/libgomp.fortran/reverse-offload-5a.f90: New test without > mapping of on-device allocated variables. > + /* Likeverse for the reverse lookup device->host for reverse offload. */ Likewise > + reverse_splay_tree_node rev_array; Do we need reverse_splay_tree* stuff in libgomp.h? As splay_tree_node is just a pointer, perhaps just struct reverse_splay_tree_node_s; early and struct reverse_splay_tree_node_s *rev_array; in libgomp.h and include the extra splay-tree.h only in target.c? Unless one needs it anywhere else... Otherwise LGTM. Jakub