https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102856
            Bug ID: 102856
           Summary: [nvptx] Misaligned accesses with cheap vectorization enabled
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jules at gcc dot gnu.org
  Target Milestone: ---

Since revision 2b8453c401b699ed93c085d0413ab4b5030bcdb8 I am seeing several
OpenMP tests fail with misaligned access errors:

PASS -> FAIL: nvidia-1/libgomp.sum:libgomp.c++/../libgomp.c-c++-common/for-11.c execution test
PASS -> FAIL: nvidia-1/libgomp.sum:libgomp.c++/../libgomp.c-c++-common/for-12.c execution test
PASS -> FAIL: nvidia-1/libgomp.sum:libgomp.c++/../libgomp.c-c++-common/for-16.c execution test
PASS -> FAIL: nvidia-1/libgomp.sum:libgomp.c++/../libgomp.c-c++-common/for-3.c execution test
PASS -> FAIL: nvidia-1/libgomp.sum:libgomp.c++/../libgomp.c-c++-common/for-5.c execution test
PASS -> FAIL: nvidia-1/libgomp.sum:libgomp.c++/../libgomp.c-c++-common/for-6.c execution test
PASS -> FAIL: nvidia-1/libgomp.sum:libgomp.c++/../libgomp.c-c++-common/for-9.c execution test
PASS -> FAIL: nvidia-1/libgomp.sum:libgomp.c/../libgomp.c-c++-common/for-11.c execution test
PASS -> FAIL: nvidia-1/libgomp.sum:libgomp.c/../libgomp.c-c++-common/for-12.c execution test
PASS -> FAIL: nvidia-1/libgomp.sum:libgomp.c/../libgomp.c-c++-common/for-16.c execution test
PASS -> FAIL: nvidia-1/libgomp.sum:libgomp.c/../libgomp.c-c++-common/for-3.c execution test
PASS -> FAIL: nvidia-1/libgomp.sum:libgomp.c/../libgomp.c-c++-common/for-5.c execution test
PASS -> FAIL: nvidia-1/libgomp.sum:libgomp.c/../libgomp.c-c++-common/for-6.c execution test
PASS -> FAIL: nvidia-1/libgomp.sum:libgomp.c/../libgomp.c-c++-common/for-9.c execution test

These look like, e.g.:

$ ./for-11.exe

libgomp: cuCtxSynchronize error: misaligned address

libgomp: cuMemFree_v2 error: misaligned address

libgomp: device finalization failed

I suspect the reason is that an operation that is now being vectorized (e.g.
"st.v2.u64 [%frame], %r28;") requires higher alignment than the original scalar
accesses it replaces.

I haven't spotted an obvious culprit for the problem in the nvptx backend. This
is OpenMP, so it could be the soft stack handling -- or it could be something
else.
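
To make the suspected alignment mismatch concrete, here is a minimal sketch in
plain C. It assumes, as I understand the PTX rules, that a vector store must be
aligned to its full access size, so "st.v2.u64" needs 16 bytes where a scalar
"st.u64" needs only 8. The helper names below are hypothetical and are not part
of GCC, libgomp, or the nvptx backend; they only model the address check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if ADDR is a multiple of ALIGN.
   ALIGN must be a power of two.  */
static inline bool
is_aligned (uintptr_t addr, size_t align)
{
  return (addr & (align - 1)) == 0;
}

/* A scalar 64-bit store (st.u64) accesses 8 bytes, so 8-byte
   alignment is sufficient.  */
static inline bool
scalar_u64_store_ok (uintptr_t addr)
{
  return is_aligned (addr, sizeof (uint64_t));
}

/* A two-element 64-bit vector store (st.v2.u64) accesses 16 bytes at
   once.  Assumption: the address must be aligned to that full size.  */
static inline bool
v2_u64_store_ok (uintptr_t addr)
{
  return is_aligned (addr, 2 * sizeof (uint64_t));
}
```

Under that assumption, a frame address such as 0x1008 is fine for each of the
two original scalar stores but not for the single vector store that replaces
them, which would match the "misaligned address" errors above if %frame is only
8-byte aligned.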