https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100678
Bug ID: 100678
Summary: [OpenACC/nvptx] 'libgomp.oacc-c-c++-common/private-atomic-1.c' FAILs (differently) in certain configurations
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Keywords: openacc
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: tschwinge at gcc dot gnu.org
CC: jules at gcc dot gnu.org, vries at gcc dot gnu.org
Target Milestone: ---
Target: nvptx

For OpenACC/nvptx offloading, the testcase 'libgomp.oacc-c-c++-common/private-atomic-1.c' that I've just pushed as commit r12-908-g1467100fc72562a59f70cdd4e05f6c810d1fadcc "Add 'libgomp.oacc-c-c++-common/private-atomic-1.c' [PR83812]" is expected to fail with "operation not supported on global/shared address space" (see PR83812).

However, I have now found that on an x86_64 GNU/Linux system, Nvidia TITAN V GPU, CUDA Driver 455.23.05, it *doesn't* fail in that way: the device kernel execution completes normally -- but it returns a wrong reduction result instead: zero.

At this point, it is unclear (a) whether the PR83812 restriction is indeed supposed to be lifted for certain modern GPU hardware/SM levels/CUDA Driver releases, and (b) what then goes wrong instead, so that we don't compute the expected reduction result.

Assuming that (a) has been done in good faith, I can see how (b) might happen if the 'v' variable were in fact *not* thread-private (but instead device-global, I suppose): all threads would then atomically increment the same device-global variable concurrently, and the '(v == -222 + 121)' expression would never be true?
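
To make the hypothesis in (b) concrete, here is a minimal host-only sketch in plain C. It is not the actual testcase and uses no OpenACC or CUDA. Only the constants -222 and 121 come from the expression quoted above. Everything else is an assumption made for illustration: the lane count LANES, the helper name count_matches, and the strict lockstep ordering of the lanes, which real hardware does not guarantee. The sketch compares a 'v' that is private per lane with a single 'v' aliased by all lanes.

```c
#include <stdbool.h>

/* INIT and INNER mirror the '-222' and '121' in the report;
   LANES is a hypothetical number of concurrently running threads. */
enum { LANES = 32, INNER = 121, INIT = -222 };

/* Returns how many lanes observe 'v == INIT + INNER'.
   shared_v == false: every lane owns its own 'v' (intended semantics).
   shared_v == true:  all lanes alias one 'v' (the suspected behaviour).
   Lanes are modelled as running in lockstep, which is an assumption. */
int count_matches(bool shared_v)
{
  int v[LANES];

  /* Every lane initialises "its" v. */
  for (int lane = 0; lane < LANES; lane++)
    v[shared_v ? 0 : lane] = INIT;

  /* Every lane performs INNER atomic-style increments. */
  for (int j = 0; j < INNER; j++)
    for (int lane = 0; lane < LANES; lane++)
      ++v[shared_v ? 0 : lane];

  /* Every lane contributes its comparison result to the reduction. */
  int res = 0;
  for (int lane = 0; lane < LANES; lane++)
    res += (v[shared_v ? 0 : lane] == INIT + INNER);

  return res;
}
```

Under this model, count_matches(false) returns LANES, whereas count_matches(true) returns 0, because the single shared 'v' ends up at -222 + 121 * LANES. That would be consistent with the zero reduction result observed, but it does not show that this is what actually happens on the device.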