Hi Andrew,
On 29.11.23 17:25, Andrew Stubbs wrote:
On 08/09/2023 10:04, Tobias Burnus wrote:
Regarding patch 2/3 and MEMSPACE_VALIDATE.
In general, I wonder how to handle memory spaces (and traits) that
aren't supported. Namely, when to return 0L and when to silently use
ignore the trait / use another memory space.
The current omp_init_allocator code only returns omp_null_allocator for
invalid value – or for pinned memory (as it is unsupported). [RFC: Shall
we keep doing so – or return omp_null_mem_alloc more often? →
https://gcc.gnu.org/PR111044 for this question, improving libmemkind
usage, and extending the allocator-related documentation.]
As we do it on the host, I think auto-fallback to omp_default_mem_space
is is also find for nvptx (and gcn), but not as done in 2/3 but slightly
different:
(a) In omp_init_allocator, there should be a check whether it is
supported, if not, we can fallback to using default memory space. (In
line with the current code host + 1/2+2/3 nvptx behaviour.)
Note: That's not the same as the current 2/3 patch. Currently, if
MEMSPACE_VALIDATE fails, a retry is attempted – but the outcome depends
on the value for 'fallback'. When changing the memory space during
omp_init_allocator, only failed 'malloc' will give abort with abort_fb.
(b) For nvptx_memspace_validate, I think an additional check should be
done based on the __PTX_ISA_VERSION* as it feels off if plugin first
claims support for it but later unconditionally uses malloc at runtime.
I have looked at moving the MEMSPACE_VALIDATE call into
omp_init_allocator so that we can't even create allocators that would
be invalid, but that changes the semantics of the fall-back traits.
Here's the example from testcase omp_alloc-traits.c:
omp_alloctrait_t traits_all[2]
= { { omp_atk_fallback, omp_atv_null_fb },
{ omp_atk_access, omp_atv_all } };
omp_allocator_handle_t lowlat_all
= omp_init_allocator (omp_low_lat_mem_space, 2, traits_all);
/* ... */
void *b = omp_alloc (1, lowlat_all);
With my patch as proposed, "lowlat_all" is a valid allocator, but
allocating low-latency memory fails in omp_alloc, so "b" ends up NULL
(the fall-back setting).
With the proposed change, "lowlat_all" becomes omp_null_allocator, and
"b" is non-NULL, pointing to default memory. This is probably
surprising to the user because they thought they specified
"low-latency or nothing".
Well, a proper code would do instead:
omp_allocator_handle_t lowlat_all
= omp_init_allocator (omp_low_lat_mem_space, 2, traits_all);
if (lowlat_all == omp_null_allocator)
{
omp_allocator_handle_t lowlat_all
= omp_init_allocator (omp_low_lat_mem_space, 2, traits_all);
/* At least, preserve the traits (okay, not very useful here). */
lowlat_all
= omp_init_allocator (omp_default_mem_space, 2, traits_all);
/* Giving up: */
if (lowlat_all == omp_null_allocator)
lowlat_all = omp_default_mem_alloc;
}
OpenMP explicitly states:
"if an allocator based on the requirements cannot be created then the special
omp_null_allocator handle is returned."
Thus, if the user is surprised it is their fault!
(c) We also need to handle omp_low_lat_mem_alloc. I think the spec
implies access:all but nvptx/gcn only support cgroup (+ pteams +
thread), potentially leading to wrong code.
If we're not allowed to default to "cgroup" then surely
omp_low_lat_mem_alloc is useless on all GPU devices (that I am aware
of) on all toolchains? There may be some use on some specialist NUMA
host devices, but that's it.
Granted, but a user-specified allocator would still work, wouldn't it?
Thus, the question is whether being a bit less safe but more useful or
being safer but slower/less useful makes more sense – in either case, an
alert user that has read the documentation would know what to do, but a
programmer might not be aware of such issues and a mere user does not
know what a programmer did.
I think I agree with Jakub that "without the user telling us that" (the
only access is within a team, "i.e. requesting cgroup access), we don't
know what it will be used for and so we need to assume worst".
BTW: Unless I missed something, LLVM does not yet support it:
KMP_WARNING(OmpNoAllocator, "omp_low_lat_mem_alloc");
Tobias
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht
München, HRB 106955