https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125382
Bug ID: 125382
Summary: Missed optimization: IPA-CP aggregate replacements stranded when clone is fully inlined
Product: gcc
Version: 17.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: ipa
Assignee: unassigned at gcc dot gnu.org
Reporter: ptomsich at gcc dot gnu.org
Target Milestone: ---
Created attachment 64498
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=64498&action=edit
pr-stranded-aggvals.cc (reproducer)
Reduced from SPEC CPU 2026 marian_r; reproduces target-independently
on trunk (also tested with 17.0.0 20260423).
For the testcase below, IPA-CP creates loop4.constprop with aggregate
replacements
    Aggregate replacements: 0[16]=_M_manager(by_ref), 0[24]=_M_invoke(by_ref)
(plus 0[0]=mul/add per caller), and correctly tags the indirect call in its
body as
    indirect aggregate callsite, calling param 0, offset 192, by reference
but the replacements are never applied to the GIMPLE.
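The byte offsets 16 and 24 in the replacement list and the offset 192 in the
call-site note appear to describe the same object, since 192 bits is byte 24.
As a rough illustration, here is a simplified model of the std::function
object, assuming an LP64 target and a layout of a 16-byte functor buffer
followed by two pointers. FunctionModel and its field names are hypothetical
stand-ins for this sketch, not libstdc++ definitions.

```cpp
#include <cstddef>

// Hypothetical stand-in for a std::function object on an LP64 target.
// Field names are illustrative only; they are not the library's own.
struct FunctionModel {
    alignas(void*) unsigned char functor[2 * sizeof(void*)]; // stored callable
    void (*manager)();  // slot the dump fills with _M_manager
    void (*invoker)();  // slot the dump fills with _M_invoke
};

// Byte offsets, as printed in "Aggregate replacements: 0[16]=..., 0[24]=...".
constexpr std::size_t manager_byte_offset() {
    return offsetof(FunctionModel, manager);
}
constexpr std::size_t invoker_byte_offset() {
    return offsetof(FunctionModel, invoker);
}

// The "indirect aggregate callsite" line reports its offset in bits.
constexpr std::size_t invoker_bit_offset() {
    return invoker_byte_offset() * 8;
}
```

Under these assumptions the indirect call loads its target from the same slot
that the 0[24] replacement describes, so applying the replacement would turn
the load into a known function address.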
The aarch64 asm contains 6 blr instructions and 2 surviving calls to
__throw_bad_function_call -- both should have been eliminated by the inliner
once the indirect call resolves to _M_invoke.
Suspected cause: scalar replacements (clone_info->tree_map) survive cloning +
inlining because tree_function_versioning applies them on every versioning
copy. Aggregate replacements (ipcp_transformation->m_agg_values) are only
applied by ipcp_transform_function, which the pass manager schedules via
ipa_transforms_to_apply and which only runs when the clone is compiled as a
separate function. A fully inlined clone is never compiled separately;
instead, the inliner pulls its body via get_untransformed_body()
(ipa-inline-transform.cc:420, 790), and the aggvals are silently dropped.
cgraph.cc:get_body asserts !clone_of (around line 4667) with a comment
that calls this design intentional, but the resulting "fully-inlined
clones lose their aggvals" case is a missed optimization in the common
std::function-by-const-ref pattern.
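To make the suspected asymmetry concrete, here is a toy model of the two
paths. It is only a sketch of the mechanism as described above: every type
and function in it (Body, ToyClone, make_clone, compile_standalone,
inline_into_caller) is hypothetical, and none of it is GCC source code.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy model only. Every name here is hypothetical and merely mirrors the
// roles named in the report.
using Body = std::vector<std::string>;
using Replacements = std::vector<std::pair<std::string, std::string>>;

// Replace every occurrence of each `from` with `to` in every statement.
// Assumes no `from` string is empty.
Body substitute(Body body, const Replacements& repl) {
    for (std::string& stmt : body)
        for (const auto& [from, to] : repl)
            for (std::size_t pos = stmt.find(from); pos != std::string::npos;
                 pos = stmt.find(from, pos + to.size()))
                stmt.replace(pos, from.size(), to);
    return body;
}

struct ToyClone {
    Body body;                 // scalar replacements already applied
    Replacements pending_agg;  // aggregate replacements, not yet applied
};

// Versioning step: scalar replacements are baked into the copied body,
// while aggregate replacements are only recorded for a later transform.
ToyClone make_clone(const Body& original, const Replacements& scalar,
                    const Replacements& agg) {
    return ToyClone{substitute(original, scalar), agg};
}

// Path 1: the clone is compiled as its own function, so the deferred
// transform runs and the aggregate replacements reach the body.
Body compile_standalone(const ToyClone& c) {
    return substitute(c.body, c.pending_agg);
}

// Path 2: the clone is fully inlined. The inliner copies the untransformed
// body, so the pending aggregate replacements never take effect.
Body inline_into_caller(const ToyClone& c) {
    return c.body;
}
```

With a body of {"t = load f[24]", "call t(x, N)"}, a scalar replacement of N
by 4 and an aggregate replacement of "load f[24]" by "_M_invoke", the
standalone path ends up with a known callee, while the inlined path keeps the
load and therefore the indirect call.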
Reproducer (see also the attachment):
#include <functional>

struct vec4 { float v[4]; };

struct OpsFloat {
  static inline float mul(const float& x, const float& y) { return x * y; }
  static inline float add(const float& x, const float& y) { return x + y; }
};

static inline vec4
loop4(const std::function<float(const float&, const float&)>& f,
      const vec4& x, const vec4& y)
{
  vec4 out;
  for (int i = 0; i < 4; ++i)
    out.v[i] = f(x.v[i], y.v[i]);
  return out;
}

__attribute__((noinline))
vec4 vec_mul(const vec4& x, const vec4& y)
{
  return loop4(OpsFloat::mul, x, y);
}

__attribute__((noinline))
vec4 vec_add(const vec4& x, const vec4& y)
{
  return loop4(OpsFloat::add, x, y);
}
Build: g++ -O3 -S -c repro.cc
Expected: no indirect call instructions in vec_mul / vec_add; no calls
to __throw_bad_function_call (provably dead -- _M_invoker != NULL).
Actual: 6 indirect calls and 2 __throw_bad_function_call calls survive.
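For comparison, the expected result should behave like the hand-specialized
functions below, where the callee is known and no std::function remains. This
is only a sketch of the intended semantics: vec_mul_expected,
vec_add_expected and loop4_reference are hypothetical names introduced here
and are not part of the attached reproducer.

```cpp
#include <functional>

struct vec4 { float v[4]; };

// Hand-specialized form of vec_mul: the call through std::function is
// replaced by the multiplication itself, so no indirect call is needed.
vec4 vec_mul_expected(const vec4& x, const vec4& y) {
    vec4 out;
    for (int i = 0; i < 4; ++i)
        out.v[i] = x.v[i] * y.v[i];
    return out;
}

// Hand-specialized form of vec_add, built the same way.
vec4 vec_add_expected(const vec4& x, const vec4& y) {
    vec4 out;
    for (int i = 0; i < 4; ++i)
        out.v[i] = x.v[i] + y.v[i];
    return out;
}

// Reference with the same shape as loop4 in the reproducer, kept here so
// the two forms can be compared on the same inputs.
vec4 loop4_reference(const std::function<float(const float&, const float&)>& f,
                     const vec4& x, const vec4& y) {
    vec4 out;
    for (int i = 0; i < 4; ++i)
        out.v[i] = f(x.v[i], y.v[i]);
    return out;
}
```

Comparing the assembly of these functions with that of vec_mul / vec_add from
the reproducer is one way to see how much of the std::function machinery
survives.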