Hi all, dear Andrew,

Andrew Stubbs:
As mentioned by Andrew, a future patch will enable the CUDA support
for this.

Not quite; the CUDA patch actually comes before this one in the series, and I plan to keep it that way.

OK — that was not really clear from your ping email. The CUDA patch was mostly approved, but it depended on other patches that are supposed to be dropped for now.

The CUDA patch (and a link to the patch series) is at https://patchwork.sourceware.org/project/gcc/patch/[email protected]/

5/6 is CUDA, 6/6 is the patch discussed in this thread – and those two will land first, skipping 3/6 and 4/6 for now. (And the others aren't relevant any more and not part of v5 of the patch series).

* * *

[On the comment: I wonder whether implementing the memcopy on the
plugin side (cudaMemcpy) might be faster in some cases than a
normal memcpy, at least for to-device migrated data (e.g. managed
memory). For pinned memory, handling it locally as memcpy should be
fine.
In any case, that's for later and I don't have a firm opinion on
this - and I have not searched for the 'else' code in the other
patch series.]

That sounds plausible, but out-of-scope for this patch.

That might become relevant for (a) managed-memory support as gnu extension (other patch set,¹ if we work on this) – and (b) OpenMP 6.0's multi-device memory spaces.

[1] https://patchwork.sourceware.org/project/gcc/list/?series=35669

* * *

I wondered whether we handle numa/memkind correctly. I think
we don't if we use a pool. I think we need the following there:

--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -1204,2 +1204,5 @@ retry:
        && free_allocator_data == allocator_data
+#if defined(LIBGOMP_USE_MEMKIND) || defined(LIBGOMP_USE_LIBNUMA)
+       && memkind == free_memkind
+#endif
        && new_alignment == sizeof (void *)

Answer: It is fine. free_allocator_data == allocator_data checks that the allocator is the same; if it is, then obviously also the memkind must be the same – and using realloc is fine. (If not, malloc + copy + free is required; that happens in the else branch.)

* * *

I thought this was checked at the higher level? It's also inactive for the pinned memory trait.

I think it is active of one does 'realloc(..., allocator1, allocator2)' and one of them is a pinned-memory allocator. But as that's separate from pinned memory and seems to be fine, there is no need to worry about this part.

* * *


Otherwise, I think the patch is okay.

Thanks. I plan to repost the two-patch series in a few days.

Thanks!

Tobias

Reply via email to