https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110813

            Bug ID: 110813
           Summary: [OpenMP] omp_target_memcpy_rect (+ strided 'target
                    update'): Improve GCN performance and contiguous
                    subranges
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: missed-optimization, openmp
          Severity: normal
          Priority: P3
         Component: libgomp
          Assignee: unassigned at gcc dot gnu.org
          Reporter: burnus at gcc dot gnu.org
                CC: jakub at gcc dot gnu.org, jules at gcc dot gnu.org
  Target Milestone: ---

omp_target_memcpy_rect_worker is used by omp_target_memcpy_rect and
omp_target_memcpy_rect_async.

It is also used for strided memory in 'target update', either on OG13 or with
the patch https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623502.html
applied. See OG13:
https://github.com/gcc-mirror/gcc/blob/devel/omp/gcc-13/libgomp/target.c#L5689-L5843
(links to omp_target_memcpy_rect_worker; the line numbers might be off if the
file has changed since I linked it.)


ISSUES:

* The current algorithm always loops until dim == 1,
  even if the referenced memory is (partially) contiguous.

That is the case for _rect if, in each inner dimension, the volume equals both
the source and the destination dimension (with zero offsets), such as:
 volume=[V1,N2,N3], ..., dst_dimensions=[D1,N2,N3], ... src_dimensions=[S1,N2,N3]
Here the inner two dimensions are contiguous; only the outermost is not.
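A simplified, host-only model may help illustrate the point. This sketch is not libgomp's code: copy_rect, collapse_contiguous and n_memcpy_calls are hypothetical names, arrays are assumed row-major (C order, index 0 outermost), and plain memcpy stands in for the device copy. copy_rect recurses down to one dimension as described above; collapse_contiguous folds every innermost dimension that is copied in full into its outer neighbor before the copy starts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host-only model of a rectangular copy; NOT libgomp's code.
   Row-major (C order): index 0 is the outermost dimension.  */

static size_t n_memcpy_calls; /* memcpy invocations, for illustration only */

/* Recurse over the outermost dimension until ndims == 1, then copy one
   contiguous run: the "loop until dim == 1" pattern.  */
static void
copy_rect (char *dst, const char *src, size_t elem, int ndims,
           const size_t *volume, const size_t *dst_off, const size_t *src_off,
           const size_t *dst_dims, const size_t *src_dims)
{
  assert (ndims >= 1);
  if (ndims == 1)
    {
      memcpy (dst + dst_off[0] * elem, src + src_off[0] * elem,
              volume[0] * elem);
      n_memcpy_calls++;
      return;
    }
  /* Bytes spanned by one index step in dimension 0.  */
  size_t dst_slice = elem, src_slice = elem;
  for (int i = 1; i < ndims; i++)
    {
      dst_slice *= dst_dims[i];
      src_slice *= src_dims[i];
    }
  for (size_t j = 0; j < volume[0]; j++)
    copy_rect (dst + (dst_off[0] + j) * dst_slice,
               src + (src_off[0] + j) * src_slice, elem, ndims - 1,
               volume + 1, dst_off + 1, src_off + 1,
               dst_dims + 1, src_dims + 1);
}

/* Fold each innermost dimension that is copied in full (volume equals
   both array extents and both offsets are 0) into the next outer one.
   Updates the arrays in place; returns the reduced dimension count.  */
static int
collapse_contiguous (int ndims, size_t *volume, size_t *dst_off,
                     size_t *src_off, size_t *dst_dims, size_t *src_dims)
{
  assert (ndims >= 1);
  while (ndims > 1)
    {
      int k = ndims - 1; /* current innermost dimension */
      if (volume[k] != dst_dims[k] || volume[k] != src_dims[k]
          || dst_off[k] != 0 || src_off[k] != 0)
        break;
      volume[k - 1] *= volume[k];
      dst_off[k - 1] *= dst_dims[k];
      src_off[k - 1] *= src_dims[k];
      dst_dims[k - 1] *= dst_dims[k];
      src_dims[k - 1] *= src_dims[k];
      ndims--;
    }
  return ndims;
}
```

For the shape in the report (inner extents equal in volume, source and destination; zero inner offsets), collapse_contiguous reduces the request to a single dimension, so one contiguous copy of V1*N2*N3 elements replaces V1*N2 copies of N3 elements each. If only the innermost dimension is full, the request shrinks by one dimension instead.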

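Likewise for '!$omp target update to(cont_array(:,:,::2))': Fortran arrays are
column-major, so the first two dimensions are contiguous and only the last one
is strided.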


* For nvptx, a patch exists (see the cross ref below) that handles _rect
copies with dim=2 and dim=3 more efficiently by calling the CUDA functions
cuMemcpy2D/cuMemcpy3D; for GCN, no such feature exists yet.
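What a 2-D fast path has to express can be sketched as a mapping from the _rect arguments onto the pitched parameters that cuMemcpy2D-style interfaces take (x offset in bytes, row index, pitch, width in bytes, height). The struct pitched_copy_2d and the functions rect_to_pitched and pitched_copy_host are hypothetical, loosely modeled on the field names of CUDA's CUDA_MEMCPY2D; they are not a libgomp, CUDA or ROCm API, and the copy runs on the host, so this only illustrates the parameter mapping, not a GCN implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pitched 2-D copy descriptor (row-major, byte-based x).  */
struct pitched_copy_2d
{
  size_t src_x_bytes, src_y, src_pitch;
  size_t dst_x_bytes, dst_y, dst_pitch;
  size_t width_bytes, height;
};

/* Map a 2-D rect request (index 0 = rows, index 1 = columns) onto the
   descriptor: pitch is the full row length of each array in bytes.  */
static struct pitched_copy_2d
rect_to_pitched (size_t elem, const size_t volume[2],
                 const size_t dst_off[2], const size_t src_off[2],
                 const size_t dst_dims[2], const size_t src_dims[2])
{
  struct pitched_copy_2d p;
  p.src_x_bytes = src_off[1] * elem;
  p.src_y = src_off[0];
  p.src_pitch = src_dims[1] * elem;
  p.dst_x_bytes = dst_off[1] * elem;
  p.dst_y = dst_off[0];
  p.dst_pitch = dst_dims[1] * elem;
  p.width_bytes = volume[1] * elem;
  p.height = volume[0];
  return p;
}

/* Host reference: one memcpy per row.  A device plugin would instead
   submit the whole descriptor as a single request.  */
static void
pitched_copy_host (char *dst, const char *src, const struct pitched_copy_2d *p)
{
  assert (p->src_x_bytes + p->width_bytes <= p->src_pitch);
  assert (p->dst_x_bytes + p->width_bytes <= p->dst_pitch);
  for (size_t row = 0; row < p->height; row++)
    memcpy (dst + (p->dst_y + row) * p->dst_pitch + p->dst_x_bytes,
            src + (p->src_y + row) * p->src_pitch + p->src_x_bytes,
            p->width_bytes);
}
```

The point of such a descriptor is that the whole 2-D copy becomes one request, instead of one request per row as with the recursive worker.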

EXPECTED:
* Improve performance if the memory is partially contiguous
* Improve performance on GCN


Cross ref:
- "[patch] OpenMP: Call cuMemcpy2D/cuMemcpy3D for nvptx for
omp_target_memcpy_rect"
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625465.html
(as mentioned in that patch, also related:
- PR101581 - [OpenMP] omp_target_memcpy – support inter-device memcpy)
