On 28/07/2025 14:36, Tobias Burnus wrote:
When initially adding MI300 support, the buffer invalidation
before atomics was messed up - it should have been buffer_wbl2
(wbl2 = write back L2). With this patch in place, most test
cases work on MI300A :-)

Without this change, there were several multi-teams issues.

MI300A testing shows: larger programs now work :-)

OK for mainline?

OK.

* * *

For libgomp testing, I see the following failures:

FAIL: libgomp.c/../libgomp.c-c++-common/declare-target-indirect-2.c execution test
FAIL: libgomp.c++/../libgomp.c-c++-common/declare-target-indirect-2.c execution test
FAIL: libgomp.fortran/declare-target-indirect-2.f90   -O…  execution tests

→ PR114445, I presume

FAIL: libgomp.c/interop-hsa.c execution test
FAIL: libgomp.c/omp_alloc-3.c execution test
FAIL: libgomp.c/target-52.c execution test
FAIL: libgomp.c/target-53.c execution test
FAIL: libgomp.c/target-54.c output pattern test
FAIL: libgomp.c/target-49.c output pattern test
FAIL: libgomp.c++/target-has-device-addr-2.C execution test
FAIL: libgomp.c++/target-has-device-addr-4.C execution test
FAIL: libgomp.c++/target-has-device-addr-5.C execution test
FAIL: libgomp.c++/target-has-device-addr-6.C execution test
FAIL: libgomp.c++/target-has-device-addr-8.C execution test
FAIL: libgomp.c++/target-has-device-addr-9.C execution test
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/deep-copy-10.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa  -O{0,2}  execution test
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vprop.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa  -O2  (test for excess errors)
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/static-variable-1.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa  -O{0,2}  execution test
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/reduction-cplx-dbl.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa  -O{0,2}  execution test
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/deep-copy-10.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa  -O{0,2}  execution test
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/reduction-cplx-dbl.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa  -O{0,2}  execution test
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/static-variable-1.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa  -O{0,2}  execution test
FAIL: libgomp.oacc-c++/pr96835-1.C -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa  -O2  (internal compiler error: verify_gimple failed)

→ To be checked - some are known issues, others seem to be
genuine issues.

* * *

Tobias

PS: I think we eventually have to revisit the atomics/scope topic.
In particular, we don't support system-global atomics properly.
(In OpenMP: 'memscope(all)'; the default is 'memscope(device)';
additionally, 'memscope(cgroup)' exists.)
Going over them and checking wouldn't hurt in general.

I believe we support all the "usual" atomic builtins, although the cache controls are necessarily conservative in those cases because the compiler cannot know what threads are involved in the synchronization. Adding team-aware OpenMP atomics sounds like a good plan.
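For illustration only, here is a minimal sketch of the kind of multi-teams use of a "usual" atomic builtin discussed above. It is not taken from the patch or the testsuite: the function name count_across_teams is hypothetical, and the relaxed memory order is just an assumption chosen for the example. No 'memscope' clause is used, so whatever scope the compiler picks by default applies. Built without -fopenmp the pragma is ignored and the loop runs serially on the host; with offloading enabled, every team increments the same mapped counter.

```c
/* Hypothetical example: count N loop iterations with a GCC atomic
   builtin while the iterations are distributed over several teams.  */
int
count_across_teams (int n)
{
  int counter = 0;

  /* With -fopenmp and an offload target, the loop runs on the device
     and all teams update the same mapped 'counter'.  Without -fopenmp
     the pragma is ignored and the loop runs serially on the host.  */
  #pragma omp target teams distribute parallel for map(tofrom: counter)
  for (int i = 0; i < n; i++)
    __atomic_fetch_add (&counter, 1, __ATOMIC_RELAXED);

  return counter;
}
```

If atomics across teams behave as intended, count_across_teams (n) should return n for any non-negative n; a smaller value would hint at lost updates between teams.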

Andrew
