https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110082

            Bug ID: 110082
           Summary: Coverage analysis vs. offloading compilation
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: openacc, openmp, wrong-code
          Severity: normal
          Priority: P3
         Component: gcov-profile
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tschwinge at gcc dot gnu.org
                CC: hubicka at gcc dot gnu.org, jakub at gcc dot gnu.org,
                    marxin at gcc dot gnu.org, rguenth at gcc dot gnu.org
  Target Milestone: ---
            Target: amdgcn-amdhsa, nvptx-none

(Via a customer report) we've determined that offloading compilation fails in
combination with '-fprofile-arcs' (as implied by '--coverage', coverage
analysis):

    <built-in>: error: variable ‘__gcov0.main._omp_fn.0’ has been referenced in offloaded code but hasn’t been marked to be included in the offloaded code
    lto1: fatal error: errors during merging of translation units
    compilation terminated.
    nvptx mkoffload: fatal error: build-gcc/gcc/x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
    [...]

From a quick look: during early host compilation, 'pass_ipa_tree_profile' calls
'gcc/tree-profile.cc:gimple_gen_edge_profiler', which adds statements such as
'__atomic_fetch_add_8 (&__gcov0.main._omp_fn.0[0], 1, 0);', as visible in
'*.069i.profile'.  (Different statements are emitted if the target cannot
"utilize atomic update operations", or if 'PROFILE_UPDATE_SINGLE' is used.)
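
For illustration, the two flavors of counter update mentioned above could look
roughly like this when written out as C source.  This is only a sketch:
'edge_counters', 'count_edge_atomic', 'count_edge_single', and 'sum_to' are
hypothetical names, and the real statements are generated by the compiler on
GIMPLE, not written by hand.

```c
#include <stdint.h>

/* Hypothetical stand-in for a compiler-generated counter array such as
   '__gcov0.main._omp_fn.0'.  */
static int64_t edge_counters[2];

/* Atomic form, in the spirit of '__atomic_fetch_add_8 (&..., 1, 0)';
   memory order 0 corresponds to __ATOMIC_RELAXED.  */
static void
count_edge_atomic (int edge)
{
  __atomic_fetch_add (&edge_counters[edge], 1, __ATOMIC_RELAXED);
}

/* Plain load/add/store form, roughly what a non-atomic update does.  */
static void
count_edge_single (int edge)
{
  edge_counters[edge] = edge_counters[edge] + 1;
}

/* Hypothetical instrumented function: counter 0 tracks the loop-body
   edge, counter 1 the loop-exit edge.  */
int64_t
sum_to (int n)
{
  int64_t sum = 0;
  for (int i = 0; i < n; i++)
    {
      count_edge_atomic (0);
      sum += i;
    }
  count_edge_single (1);
  return sum;
}
```

Either way, each update refers to the counter array by address, so the array
has to exist wherever the instrumented code is compiled.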

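That's part of the IPA passes, so it happens before the offloading compilation
split-off.  The instrumentation statements therefore also end up in the
offloaded code, where they reference a counter variable that has not been
marked for offloading (as per the error, evidently).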

As we've got no mechanism implemented currently to move any device-side
coverage data from the device execution back to the host, and integrate it with
the host-side coverage data, we propose to not do coverage analysis for
offloading compilation.

My idea is to abstract the "increment the edge execution count" operations into
some new GIMPLE/IFN code (?), and then later, once the offloading code has been
split off, lower it to the current form (host-side), or to a no-op
(device-side).  I'd appreciate a quick review of whether that approach makes
sense.
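
To make the idea a bit more concrete, here is a toy model of such a late
lowering step in plain C.  It is not GCC code: the statement kinds, 'struct
stmt', and 'lower_edge_counts' are all hypothetical, and a real implementation
would operate on GIMPLE (for example via an internal function) instead of on
an array of tagged structs.

```c
#include <stddef.h>

/* Toy statement kinds (hypothetical, not GCC's GIMPLE codes).  */
enum stmt_kind
{
  STMT_OTHER,       /* Anything that is not instrumentation.  */
  STMT_EDGE_COUNT,  /* Abstract "increment the edge execution count".  */
  STMT_ATOMIC_ADD,  /* Lowered host-side form.  */
  STMT_NOP          /* Lowered device-side form.  */
};

enum compile_side { SIDE_HOST, SIDE_DEVICE };

struct stmt
{
  enum stmt_kind kind;
  int counter_index;  /* Which edge counter the statement refers to.  */
};

/* Lower all abstract edge-count statements after the offloading split:
   to a counter update on the host, to a no-op on the device.
   Returns the number of statements lowered.  */
size_t
lower_edge_counts (struct stmt *stmts, size_t n, enum compile_side side)
{
  size_t lowered = 0;
  for (size_t i = 0; i < n; i++)
    if (stmts[i].kind == STMT_EDGE_COUNT)
      {
        stmts[i].kind = (side == SIDE_HOST) ? STMT_ATOMIC_ADD : STMT_NOP;
        lowered++;
      }
  return lowered;
}
```

In this model, the counter variable is only ever referenced after lowering on
the host side, so the device-side statement stream never mentions it.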

---

Over the past decade, we've already seen and dealt with a few such issues, but
more certainly exist for other scenarios where GCC "early" (before the
offloading code split-off) does code transformations that device compilation
may choke on, and that thus have to be handled explicitly.  A full review of
GCC "early" transformations doesn't seem feasible, so we shall continue
addressing these incrementally, as encountered.

For another example (that I shall soon be working on), see
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101544#c9>.
