Re: [PATCH v2 1/3] libgomp, nvptx: low-latency memory allocator

Tobias Burnus Thu, 30 Nov 2023 03:59:25 -0800

Hi Andrew,

On 29.11.23 17:25, Andrew Stubbs wrote:

On 08/09/2023 10:04, Tobias Burnus wrote:

Regarding patch 2/3 and MEMSPACE_VALIDATE.

In general, I wonder how to handle memory spaces (and traits) that
aren't supported. Namely, when to return 0L and when to silently use
ignore the trait / use another memory space.

The current omp_init_allocator code only returns omp_null_allocator for
invalid value – or for pinned memory (as it is unsupported). [RFC: Shall
we keep doing so – or return omp_null_mem_alloc more often? →
https://gcc.gnu.org/PR111044 for this question, improving libmemkind
usage, and extending the allocator-related documentation.]

As we do it on the host, I think auto-fallback to omp_default_mem_space
is is also find for nvptx (and gcn), but not as done in 2/3 but slightly
different:

(a) In omp_init_allocator, there should be a check whether it is
supported, if not, we can fallback to using default memory space. (In
line with the current code host + 1/2+2/3 nvptx behaviour.)

Note: That's not the same as the current 2/3 patch. Currently, if
MEMSPACE_VALIDATE fails, a retry is attempted – but the outcome depends
on the value for 'fallback'. When changing the memory space during
omp_init_allocator, only failed 'malloc' will give abort with abort_fb.

(b) For nvptx_memspace_validate, I think an additional check should be
done based on the __PTX_ISA_VERSION* as it feels off if plugin first
claims support for it but later unconditionally uses malloc at runtime.


I have looked at moving the MEMSPACE_VALIDATE call into
omp_init_allocator so that we can't even create allocators that would
be invalid, but that changes the semantics of the fall-back traits.
Here's the example from testcase omp_alloc-traits.c:

  omp_alloctrait_t traits_all[2]
    = { { omp_atk_fallback, omp_atv_null_fb },
        { omp_atk_access, omp_atv_all } };
  omp_allocator_handle_t lowlat_all
    = omp_init_allocator (omp_low_lat_mem_space, 2, traits_all);

  /* ... */

  void *b = omp_alloc (1, lowlat_all);

With my patch as proposed, "lowlat_all" is a valid allocator, but
allocating low-latency memory fails in omp_alloc, so "b" ends up NULL
(the fall-back setting).

With the proposed change, "lowlat_all" becomes omp_null_allocator, and
"b" is non-NULL, pointing to default memory. This is probably
surprising to the user because they thought they specified
"low-latency or nothing".


Well, a proper code would do instead:

  omp_allocator_handle_t lowlat_all
    = omp_init_allocator (omp_low_lat_mem_space, 2, traits_all);

  if (lowlat_all == omp_null_allocator)
    {
      omp_allocator_handle_t lowlat_all
        = omp_init_allocator (omp_low_lat_mem_space, 2, traits_all);

      /* At least, preserve the traits (okay, not very useful here). */
      lowlat_all
         = omp_init_allocator (omp_default_mem_space, 2, traits_all);

     /* Giving up:  */
     if (lowlat_all == omp_null_allocator)
       lowlat_all = omp_default_mem_alloc;
    }

OpenMP explicitly states:
"if an allocator based on the requirements cannot be created then the special 
omp_null_allocator handle is returned."

Thus, if the user is surprised it is their fault!

(c) We also need to handle omp_low_lat_mem_alloc. I think the spec
implies access:all but nvptx/gcn only support cgroup (+ pteams +
thread), potentially leading to wrong code.

If we're not allowed to default to "cgroup" then surely
omp_low_lat_mem_alloc is useless on all GPU devices (that I am aware
of) on all toolchains? There may be some use on some specialist NUMA
host devices, but that's it.


Granted, but a user-specified allocator would still work, wouldn't it?

Thus, the question is whether being a bit less safe but more useful or
being safer but slower/less useful makes more sense – in either case, an
alert user that has read the documentation would know what to do, but a
programmer might not be aware of such issues and a mere user does not
know what a programmer did.

I think I agree with Jakub that "without the user telling us that" (the
only access is within a team, "i.e. requesting cgroup access), we don't
know what it will be used for and so we need to assume worst".

BTW: Unless I missed something, LLVM does not yet support it:
KMP_WARNING(OmpNoAllocator, "omp_low_lat_mem_alloc");

Tobias

-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955

Re: [PATCH v2 1/3] libgomp, nvptx: low-latency memory allocator

Reply via email to