https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119833
Bug ID: 119833
Summary: Clarify which semantics offloading compilation does (not) inherit from using the LTO infrastructure
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords: openacc, openmp
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: tschwinge at gcc dot gnu.org
CC: burnus at gcc dot gnu.org, jakub at gcc dot gnu.org,
rguenth at gcc dot gnu.org, tschwinge at gcc dot gnu.org
Target Milestone: ---
+++ This bug was initially created as a clone of Bug #117010 +++
(In reply to myself from bug 117010, comment #4)
> Jakub, Richi, C++/offloading question. For the small test case posted here,
> for 'V<0>::V()' I see in the '-O0' x86_64 host code:
>
> .section .text._ZN1VILi0EEC2Ev,"axG",@progbits,_ZN1VILi0EEC5Ev,comdat
> .align 2
> .weak _ZN1VILi0EEC2Ev
> .type _ZN1VILi0EEC2Ev, @function
> _ZN1VILi0EEC2Ev:
> [...]
> .size _ZN1VILi0EEC2Ev, .-_ZN1VILi0EEC2Ev
> .weak _ZN1VILi0EEC1Ev
> .set _ZN1VILi0EEC1Ev,_ZN1VILi0EEC2Ev
>
> That is, weak definitions of '_ZN1VILi0EEC2Ev' and its alias
> '_ZN1VILi0EEC1Ev' (which gets called from 'foo').
>
> Likewise, I see weak definitions when compiling such code for the GCN target:
>
> .section .text._ZN1VILi0EEC2Ev,"axG",@progbits,_ZN1VILi0EEC5Ev,comdat
> .align 4
> .weak _ZN1VILi0EEC2Ev
> .type _ZN1VILi0EEC2Ev,@function
> _ZN1VILi0EEC2Ev:
> [...]
> .size _ZN1VILi0EEC2Ev, .-_ZN1VILi0EEC2Ev
> .weak _ZN1VILi0EEC1Ev
> .set _ZN1VILi0EEC1Ev,_ZN1VILi0EEC2Ev
>
> ..., so that appears consistent.
>
> For the nvptx target (with '-malias'), I see:
>
> .weak .func _ZN1VILi0EEC1Ev (.param.u64 %in_ar0)
> {
> [...]
> }
>
> That is, it directly emits the (used) '_ZN1VILi0EEC1Ev' constructor, instead
> of emitting '_ZN1VILi0EEC2Ev' and then aliasing the former to the latter.
(See bug 117010, comment #9, regarding this x86_64/GCN vs. nvptx target
difference.)
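To make the symbol names above concrete, here is a hypothetical reduction; it is not necessarily the test case attached to bug 117010. Per the Itanium C++ ABI, '_ZN1VILi0EEC2Ev' is the base-object constructor of 'V<0>' and '_ZN1VILi0EEC1Ev' the complete-object constructor, which GCC emits as an alias of the former on x86_64 and GCN, both in the COMDAT group '_ZN1VILi0EEC5Ev'. The member 'x', its value, and the 'omp declare target' marking are assumptions for illustration; the report does not say whether the original used OpenMP or OpenACC.

```cpp
// HYPOTHETICAL reduction: not necessarily the test case attached to bug 117010.
// V<0>::V() is an implicitly instantiated inline constructor, so GCC emits it
// as a weak definition in a COMDAT section: _ZN1VILi0EEC2Ev (base-object
// constructor) and _ZN1VILi0EEC1Ev (complete-object constructor), the latter
// possibly as an alias of the former, grouped under _ZN1VILi0EEC5Ev.
#pragma omp declare target
template <int N>
struct V
{
  int x;  // hypothetical member, only so that foo has something to return
  V () : x (N + 42) {}  // user-provided; at -O0, foo really calls it
};

int
foo ()  // mangles to _Z3foov; constructs a complete V<0> via _ZN1VILi0EEC1Ev
{
  V<0> v;
  return v.x;
}
#pragma omp end declare target
```

Without '-fopenmp', the pragmas are ignored and this is plain host code; with '-fopenmp' and an offload target configured, 'foo' and the constructor are also compiled for the device, which is where the report observes the '.weak' marking being lost.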
> Now, the observation/question: compiling this code for offloading (as
> originally reported), I see for GCN offloading:
>
> .text
> [...]
> .type _ZN1VILi0EEC2Ev,@function
> _ZN1VILi0EEC2Ev:
> [...]
> .size _ZN1VILi0EEC2Ev, .-_ZN1VILi0EEC2Ev
> .set _ZN1VILi0EEC1Ev,_ZN1VILi0EEC2Ev
>
> That is, '_ZN1VILi0EEC2Ev' and its alias '_ZN1VILi0EEC1Ev' are now strong
> instead of weak definitions.
Similarly for nvptx offloading:
.func _ZN1VILi0EEC2Ev (.param.u64 %in_ar0)
{
[...]
..., which is now likewise non-'.weak'.
> Is this expected, or unexpected, and
> potentially problematic?
(In reply to myself from bug 117010, comment #6)
> [Looking for an explanation for] why "weak" and "comdat" get lost in the GCN
> offloading path. GCN (ELF) does support all these things (to the best of my
> knowledge). (Let's ignore nvptx for the moment.) I'll thus analyze offload
> stream-out, stream-in, etc.
(In reply to myself from bug 117010, comment #7)
> First observation: per my understanding, the same happens with LTO. Compile
> this code, still at '-O0' and with '-foffload=disable', but now with '-flto',
> and look at the x86_64 '[...].ltrans0.ltrans.s' file:
>
> .text
> [...]
> .type _ZN1VILi0EEC2Ev, @function
> _ZN1VILi0EEC2Ev:
> [...]
> .size _ZN1VILi0EEC2Ev, .-_ZN1VILi0EEC2Ev
> .set _ZN1VILi0EEC1Ev,_ZN1VILi0EEC2Ev
>
> Could this be due to whole-program optimization, enabled by LTO? (But
> '-O0'?)
(In reply to myself from bug 117010, comment #8)
> Well, indeed. Offloading code generation uses the LTO machinery, including
> the 'lto1' front end, and thus has 'gcc/common.opt:in_lto_p' set to 'true':
>
> ; True if this is the lto front end. This is used to disable gimple
> ; generation and lowering passes that are normally run on the output
> ; of a front end. These passes must be bypassed for lto since they
> ; have already been done before the gimple was written.
> Variable
> bool in_lto_p = false
>
> The "weak"/"comdat" transformations are described at a high level in
> 'gcc/doc/lto.texi':
>
> The whole program mode assumptions are slightly more complex in
> C++, where inline functions in headers are put into @emph{COMDAT}
> sections. COMDAT function and variables can be defined by
> multiple object files and their bodies are unified at link-time
> and dynamic link-time. COMDAT functions are changed to local only
> when their address is not taken and thus un-sharing them with a
> library is not harmful. [...]
>
> If I force-disable 'pass_ipa_whole_program_visibility':
>
> --- gcc/ipa-visibility.cc
> +++ gcc/ipa-visibility.cc
> @@ -993,4 +993,7 @@ public:
> unsigned int execute (function *) final override
> {
> +#ifdef ACCEL_COMPILER
> + return 0;
> +#endif
> return whole_program_function_and_variable_visibility ();
> }
>
> ..., then we get the expected 'diff' for GCN offloading compilation's
> '[...].xamdgcn-amdhsa.mkoffload.082i.whole-program' (and similar for nvptx
> offloading compilation's '[...].xnvptx-none.mkoffload.082i.whole-program'):
>
> -Marking local functions: __ct_comp /2 __ct_base /1
> [...]
> @@ -49,22 +40,24 @@
> _ZN1VILi0EEC1Ev/2 (__ct_comp )
> Type: function definition analyzed alias
> - Visibility: semantic_interposition prevailing_def_ironly
> + Visibility: externally_visible semantic_interposition public weak comdat comdat_group:_ZN1VILi0EEC5Ev one_only
> + Same comdat group as: _ZN1VILi0EEC2Ev/1
> References: _ZN1VILi0EEC2Ev/1 (alias)
> Referring:
> Read from file: pr117010-1_.o
> - Availability: local
> + Availability: available
> Unit id: 1
> - Function flags: local
> + Function flags:
> Called by: _Z3foov/3
> Calls:
> _ZN1VILi0EEC2Ev/1 (__ct_base )
> Type: function definition analyzed
> - Visibility: semantic_interposition no_reorder prevailing_def_ironly
> + Visibility: externally_visible semantic_interposition no_reorder public weak comdat comdat_group:_ZN1VILi0EEC5Ev one_only
> + Same comdat group as: _ZN1VILi0EEC1Ev/2
> References:
> Referring: _ZN1VILi0EEC1Ev/2 (alias)
> Read from file: pr117010-1_.o
> - Availability: local
> + Availability: available
> Unit id: 1
> - Function flags: local
> + Function flags:
> Called by:
> Calls:
>
> ..., and we get the expected 'diff' for the GCN offloading code,
> '[...].xamdgcn-amdhsa.mkoffload.2.s' (and similar for the nvptx offloading
> code, 'pr117010-1_.xnvptx-none.mkoffload.s'):
>
> + .section .text._ZN1VILi0EEC2Ev,"axG",@progbits,_ZN1VILi0EEC5Ev,comdat
> .align 2
> + .weak _ZN1VILi0EEC2Ev
> .type _ZN1VILi0EEC2Ev,@function
> [...]
> .size _ZN1VILi0EEC2Ev, .-_ZN1VILi0EEC2Ev
> + .weak _ZN1VILi0EEC1Ev
> .set _ZN1VILi0EEC1Ev,_ZN1VILi0EEC2Ev
>
> Now, so much for the mechanics. As for what this means semantically:
> whether 'in_lto_p' should or shouldn't actually be set for offloading
> compilation is something I/we have to think about more. It hinges on
> whether all the transformations/optimizations guarded by 'in_lto_p' are
> generally applicable to offloading compilation or not.
That shall be the topic of this new PR here.
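To make the COMDAT sentence from the 'gcc/doc/lto.texi' passage quoted above concrete, here is a deliberately simplified C++ model of that one rule. The type 'Symbol' and the function 'comdat_may_become_local' are hypothetical; GCC's actual decision in 'gcc/ipa-visibility.cc' considers many more symbol properties. The model only shows why the rule, as worded, permits turning the weak COMDAT constructor into a local definition, which matches the whole-program dumps above.

```cpp
// HYPOTHETICAL toy model, NOT GCC's implementation: it encodes only the
// COMDAT sentence quoted from gcc/doc/lto.texi. The real decision in
// gcc/ipa-visibility.cc considers many more symbol properties.
struct Symbol
{
  const char *name;    // assembler name, for illustration only
  bool comdat;         // defined in a COMDAT section (e.g. C++ inline function)
  bool address_taken;  // address escapes, so symbol identity may matter
};

// In whole-program mode, a COMDAT symbol may be turned local (dropping
// "weak"/"comdat") only if its address is not taken, because then un-sharing
// it with a library is harmless. Returning false for a non-COMDAT symbol just
// means that this particular rule does not apply to it.
constexpr bool
comdat_may_become_local (const Symbol &s)
{
  return s.comdat && !s.address_taken;
}

// The base-object constructor from the dumps above: it is COMDAT, and C++
// source cannot take a constructor's address, so the rule permits localizing
// it, consistent with "Function flags: local" in the whole-program dump.
constexpr Symbol ct_base {"_ZN1VILi0EEC2Ev", true, false};
static_assert (comdat_may_become_local (ct_base), "rule allows localizing");
```

The open question of this PR is precisely whether such whole-program assumptions, reasonable for a host link, also hold for the offloading code, which is linked and loaded separately.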