On 28/07/2025 14:36, Tobias Burnus wrote:
When initially adding MI300 support, the buffer invalidation
before atomics was messed up - it should have been buffer_wbl2
(wbl2 = write back L2). With this patch in place, most test
cases work on MI300A :-)
Without this change, there were several multi-teams issues.
MI300A testing shows: larger programs now work :-)
OK for mainline?
OK.
* * *
For libgomp testing, I see the following failures:
FAIL: libgomp.c/../libgomp.c-c++-common/declare-target-indirect-2.c execution test
FAIL: libgomp.c++/../libgomp.c-c++-common/declare-target-indirect-2.c execution test
FAIL: libgomp.fortran/declare-target-indirect-2.f90 -O… execution tests
→ PR114445, I presume
FAIL: libgomp.c/interop-hsa.c execution test
FAIL: libgomp.c/omp_alloc-3.c execution test
FAIL: libgomp.c/target-52.c execution test
FAIL: libgomp.c/target-53.c execution test
FAIL: libgomp.c/target-54.c output pattern test
FAIL: libgomp.c/target-49.c output pattern test
FAIL: libgomp.c++/target-has-device-addr-2.C execution test
FAIL: libgomp.c++/target-has-device-addr-4.C execution test
FAIL: libgomp.c++/target-has-device-addr-5.C execution test
FAIL: libgomp.c++/target-has-device-addr-6.C execution test
FAIL: libgomp.c++/target-has-device-addr-8.C execution test
FAIL: libgomp.c++/target-has-device-addr-9.C execution test
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/deep-copy-10.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O{0,2} execution test
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vprop.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O2 (test for excess errors)
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/static-variable-1.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O{0,2} execution test
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/reduction-cplx-dbl.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O{0,2} execution test
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/deep-copy-10.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O{0,2} execution test
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/reduction-cplx-dbl.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O{0,2} execution test
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/static-variable-1.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O{0,2} execution test
FAIL: libgomp.oacc-c++/pr96835-1.C -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O2 (internal compiler error: verify_gimple failed)
→ To be checked: some are known issues, others seem to be genuine
issues.
* * *
Tobias
PS: I think we eventually have to revisit the atomics/scope topic.
In particular, we don't support system-global atomics properly.
(In OpenMP: 'memscope(all)'; the default is 'memscope(device)';
additionally, 'memscope(cgroup)' exists.)
Going over them and checking them shouldn't hurt in general.
I believe we support all the "usual" atomic builtins, although the cache
controls are necessarily conservative in those cases because the
compiler cannot know what threads are involved in the synchronization.
Adding team-aware OpenMP atomics sounds like a good plan.
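
To make the cross-team case concrete, here is a minimal sketch of an atomic
builtin used from several teams at once. It is not taken from the patch or
the testsuite; the function name count_across_teams and its parameters are
made up for illustration. The builtin carries no information about which
threads synchronize, which is why the compiler has to be conservative here.

```c
#include <assert.h>

/* Hypothetical example: every iteration bumps one shared counter with a
   relaxed atomic builtin.  The iterations are spread over several teams,
   so the update has to be visible beyond a single team.  If OpenMP is
   disabled the pragma is ignored and the loop runs serially; if no
   offload device is available the region falls back to the host.  The
   returned value is the same in all cases.  */
int
count_across_teams (int teams, int iters)
{
  assert (teams > 0 && iters >= 0);
  int counter = 0;

  #pragma omp target teams distribute parallel for num_teams(teams) \
          map(tofrom: counter)
  for (int i = 0; i < teams * iters; i++)
    __atomic_fetch_add (&counter, 1, __ATOMIC_RELAXED);

  return counter;
}
```

Calling count_across_teams (4, 8) should return 32 regardless of how the
iterations are distributed; a wrong cache-control sequence would typically
show up as lost updates once more than one team is involved.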
Andrew