https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98215

            Bug ID: 98215
           Summary: Coalescing memory in target region creates slower code
           Product: gcc
           Version: 10.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libgomp
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rene.jacobsen at deic dot dk
                CC: jakub at gcc dot gnu.org
  Target Milestone: ---

Created attachment 49714
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49714&action=edit
Code that produces the bug

Exact compiler version: g++ (Ubuntu 10.2.0-13ubuntu1) 10.2.0
System: Ubuntu 20.10
Command line: g++ -fopenmp -fcf-protection=none -foffload=nvptx-none
-fno-stack-protector -foffload=-misa=sm_35 -foffload=-lm gcc_coalescing_bug.cpp

The attached code shows two possible ways of running code on the GPU. The
coalesced function should be faster because of its coalesced memory access,
but it is ~3.6x slower (elapsed time of about 0.48 versus 0.13).

When running it on our system we get the following output:
non_coalesced: 
  Elapsed time: 0.13381

coalesced: 
  Elapsed time: 0.48244

non_coalesced: 
  Elapsed time: 0.133868

coalesced: 
  Elapsed time: 0.481802

non_coalesced: 
  Elapsed time: 0.133794

coalesced: 
  Elapsed time: 0.481685

non_coalesced: 
  Elapsed time: 0.133875

coalesced: 
  Elapsed time: 0.481841
