https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98215
            Bug ID: 98215
           Summary: Coalescing memory in target region creates slower code
           Product: gcc
           Version: 10.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libgomp
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rene.jacobsen at deic dot dk
                CC: jakub at gcc dot gnu.org
  Target Milestone: ---

Created attachment 49714
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49714&action=edit
Code that produces the bug

Exact compiler version: g++ (Ubuntu 10.2.0-13ubuntu1) 10.2.0
System: Ubuntu 20.10
Command line:
g++ -fopenmp -fcf-protection=none -foffload=nvptx-none -fno-stack-protector -foffload=-misa=sm_35 -foffload=-lm gcc_coalescing_bug.cpp

The attached code shows two possible ways of running code on the GPU. The
coalesced function should be faster, due to coalesced memory access, but is
~4x slower.

When running it on our system we get the following output:

non_coalesced: Elapsed time: 0.13381
coalesced: Elapsed time: 0.48244
non_coalesced: Elapsed time: 0.133868
coalesced: Elapsed time: 0.481802
non_coalesced: Elapsed time: 0.133794
coalesced: Elapsed time: 0.481685
non_coalesced: Elapsed time: 0.133875
coalesced: Elapsed time: 0.481841
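
The attachment is not reproduced in this message. To illustrate the two access patterns being compared, here is a minimal sketch; it is an assumption about the general shape of such kernels, not the attached code. The function names match the labels in the output above, but the bodies, the parameters `nthreads` and `per_thread`, and the `+= 1.0` workload are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernels (NOT the attached code): each of `nthreads` logical
// threads updates `per_thread` elements of `data`.

// Thread t walks its own contiguous chunk. At a given inner step j,
// neighbouring threads touch addresses `per_thread` elements apart.
void non_coalesced(std::vector<double>& data, std::size_t nthreads,
                   std::size_t per_thread) {
    assert(data.size() >= nthreads * per_thread);
    double* p = data.data();
    const std::size_t n = data.size();
    (void)n;  // only referenced by the map clause when OpenMP is enabled
    #pragma omp target teams distribute parallel for map(tofrom: p[0:n])
    for (std::size_t t = 0; t < nthreads; ++t)
        for (std::size_t j = 0; j < per_thread; ++j)
            p[t * per_thread + j] += 1.0;
}

// At a given inner step j, neighbouring threads touch adjacent addresses,
// which is the pattern GPU hardware can merge into fewer memory transactions.
void coalesced(std::vector<double>& data, std::size_t nthreads,
               std::size_t per_thread) {
    assert(data.size() >= nthreads * per_thread);
    double* p = data.data();
    const std::size_t n = data.size();
    (void)n;  // only referenced by the map clause when OpenMP is enabled
    #pragma omp target teams distribute parallel for map(tofrom: p[0:n])
    for (std::size_t t = 0; t < nthreads; ++t)
        for (std::size_t j = 0; j < per_thread; ++j)
            p[j * nthreads + t] += 1.0;
}
```

Both functions touch every element exactly once, so they produce identical results and differ only in which thread touches which address. Built without -fopenmp the pragmas are ignored and the loops run serially on the host, which is enough to check correctness but not the timing difference.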