https://github.com/xroche created
https://github.com/llvm/llvm-project/pull/199351
**Heads-up before you read:** this patch was developed with substantial Claude
(Anthropic) assistance. Claude did the literature search across the bug tracker
and Discourse, the IR/asm analysis, and most of the writing. I curated, ran the
cross-arch validation, and stand behind the technical content. The LLVM AI Tool
Policy `Assisted-by:` trailer is at the end.
## What this fixes
When a C function passes a struct by value to a `musttail` callee, Clang's
frontend implements "by value" with a local `alloca + memcpy + pass-pointer`
pattern (the byval-temp). The alloca lives in the caller's frame. The tail call
deallocates the frame before the callee dereferences the pointer. The callee
reads freed stack memory.
This reproduces on RV64, AArch64, ARM, LoongArch64, and SystemZ at every
optimization level, and partially on x86_64 (the X86 backend's
`ByValTemporaries` machinery hides simple cases but not all; see #190429).
Minimal C reproducer (RV64/AArch64 fail at runtime; with a stack-clobber probe
to make the dangling memory observable, every arch fails):
```c
typedef struct { unsigned long long parts[4]; } S;
static const S g_in = {{0x11ULL, 0x22ULL, 0x33ULL, 0x44ULL}};
__attribute__((noinline)) static void clobber_stack(void) {
  volatile unsigned long long buf[8];
  for (int i = 0; i < 8; i++) buf[i] = 0xDEADULL;
  __asm__ volatile("" : : "r"(&buf) : "memory");
}
__attribute__((noinline)) static int callee(S a) {
  clobber_stack();
  return a.parts[0] == g_in.parts[0] ? 0 : 1;
}
__attribute__((noinline)) static int caller(S a) {
  __attribute__((musttail)) return callee(a);
}
int main(void) { return caller(g_in); }
```
## The fix
For `musttail` calls in `CGCall.cpp`'s `ABIArgInfo::Indirect` case, when the
call argument's source LValue resolves to a forwarded incoming `Indirect`
parameter of the current function with a matching ABI shape, the patch forwards
the incoming `llvm::Argument` directly instead of creating a byval-temp. Any
other source falls through to the existing byval-temp path.
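As a rough illustration of the two lowerings, here is a plain-C model. It is a sketch, not what Clang emits: the names `Payload`, `callee_lowered`, `caller_with_temp`, and `caller_forwarding` are hypothetical, and ordinary calls stand in for `musttail` so the code stays well-defined.

```c
#include <string.h>

/* Hypothetical aggregate, large enough to be passed indirectly. */
typedef struct { unsigned long long parts[4]; } Payload;

/* Model of the callee after lowering: "by value" arrives as a pointer. */
static unsigned long long callee_lowered(const Payload *a) {
    return a->parts[0];
}

/* Model of the byval-temp path: copy into a caller-local slot and pass its
 * address. This is fine for an ordinary call because the caller's frame
 * outlives the callee. Under musttail the frame is released first, so the
 * callee would read through a dangling pointer. */
static unsigned long long caller_with_temp(const Payload *incoming) {
    Payload byval_temp;                               /* the alloca       */
    memcpy(&byval_temp, incoming, sizeof byval_temp); /* the memcpy       */
    return callee_lowered(&byval_temp);               /* the pass-pointer */
}

/* Model of the forwarding path: hand the incoming pointer through unchanged.
 * It points into the frame of the caller's caller, which stays live across
 * a tail call. */
static unsigned long long caller_forwarding(const Payload *incoming) {
    return callee_lowered(incoming);
}
```

Both models return the same value for an ordinary call; the difference only becomes observable once the caller's frame is torn down before the callee runs.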
Three safety guards in the helper:
- **ABI-attribute match (Verifier V7).** Refuse to forward when the incoming
parameter and the call slot disagree on `byval`. The Verifier rejects musttail
across an ABI-attribute mismatch; falling through is the safe behavior.
- **`noalias` deduplication.** If the user writes `musttail callee(a, a)` with
`a` an incoming `noalias` Indirect parameter, do not forward the same Argument
to both slots; pre-fix gave two distinct allocas and we must not regress
aliasing.
- **AddrSpaceCast peek-through.** `EmitParmDecl` wraps incoming Indirect
parameters in `AddrSpaceCastInst` on NVPTX, AMDGPU, and SPIR. Peek through one
cast. Do NOT unwrap loads; a load through a local alloca means the source is a
local and the fix must not engage.
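The order of the three guards can be modeled in plain C. This is only a sketch of the decision logic: `SourceInfo`, `ForwardedSet`, `set_insert`, and `can_forward` are hypothetical stand-ins, and the real helper in the patch works on `llvm::Argument` and `ABIArgInfo` rather than on flags.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what the helper learns about one call argument. */
typedef struct {
    int  id;               /* identity of the incoming parameter           */
    bool is_incoming_arg;  /* source is an Argument of the current function
                              (after peeking through one addrspacecast)    */
    bool incoming_byval;   /* incoming parameter carries byval             */
    bool incoming_noalias; /* incoming parameter carries noalias           */
} SourceInfo;

/* Hypothetical fixed-size set of parameters already forwarded in this call. */
typedef struct {
    int    ids[16];
    size_t count;
} ForwardedSet;

/* Returns true if id was newly inserted. A full set reports false, which
 * makes the caller fall back to the byval-temp path (the conservative side). */
static bool set_insert(ForwardedSet *set, int id) {
    for (size_t i = 0; i < set->count; i++)
        if (set->ids[i] == id)
            return false;
    if (set->count == sizeof set->ids / sizeof set->ids[0])
        return false;
    set->ids[set->count++] = id;
    return true;
}

/* true: forward the incoming pointer; false: keep the byval-temp. */
static bool can_forward(const SourceInfo *src, bool slot_byval,
                        ForwardedSet *forwarded) {
    if (!src->is_incoming_arg)
        return false; /* local or computed source */
    if (src->incoming_byval != slot_byval)
        return false; /* ABI-attribute mismatch */
    if (src->incoming_noalias && !set_insert(forwarded, src->id))
        return false; /* noalias parameter already used by a sibling slot */
    return true;
}
```

In this model, as in the patch, only `noalias` parameters are recorded in the set, so a parameter without `noalias` may be forwarded to more than one slot.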
## Why this differs from the SRet fix
SRet (return slot) is write-only from the callee, so forwarding has no aliasing
surface. The SRet musttail fix (`a96c14eeb8fc`, "Always forward sret parameters
to musttail calls", Kiran 2024-08-19) is the direct precedent. Indirect
arguments are read by the callee, which is why the `noalias` deduplication
matters.
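A small plain-C model of the aliasing surface, under the assumption that the lowered callee is free to write to its own "by value" copies. The names `Val`, `callee2_lowered`, `call_with_distinct_copies`, and `call_with_shared_pointer` are hypothetical and exist only for this illustration.

```c
/* Hypothetical aggregate passed indirectly. */
typedef struct { unsigned long long parts[4]; } Val;

/* Lowered two-argument callee: it writes to its first copy, then reads the
 * second. With two independent copies the write is invisible to the read. */
static unsigned long long callee2_lowered(Val *x, const Val *y) {
    x->parts[0] = 0x99ULL;
    return y->parts[0];
}

/* Pre-fix shape for callee(a, a): each slot gets its own temporary. */
static unsigned long long call_with_distinct_copies(const Val *a) {
    Val x = *a;
    Val y = *a;
    return callee2_lowered(&x, &y);
}

/* What forwarding the same pointer into both slots would look like:
 * the write through x is now visible through y. */
static unsigned long long call_with_shared_pointer(Val *a) {
    return callee2_lowered(a, a);
}
```

With `parts[0]` initialized to `0x11`, the distinct-copies call returns `0x11` while the shared-pointer call returns `0x99`, which is the behavior change the deduplication guard is meant to rule out.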
## Scope: C only in this PR
In C, the call argument for `caller(S a) { musttail callee(a); }` lowers
through `EmitCallArg`'s `LValueToRValue` path at `CGCall.cpp:5133`
(`addUncopiedAggregate`). The LValue address resolves directly to the
`llvm::Argument`, and this fix engages.
In C++, the AST for the same call wraps the argument in a `CXXConstructExpr`
(the implicit copy constructor) even for trivially-copyable types. That path
goes through `EmitAnyExprToTemp` which materializes an `agg.tmp` alloca before
`EmitCall` runs. By the time we reach `ABIArgInfo::Indirect`, the source is the
local `%agg.tmp`, not the `llvm::Argument`. The fix correctly falls through (it
is not an Argument of CurFn) but does not eliminate the dangling-pointer bug in
the C++ case.
The C++ extension is a separate change (it requires plumbing `IsMustTail`
through `EmitCallArg` / `EmitAnyExprToTemp`, or special-casing the
trivial-copy-constructor pattern). I plan to follow up.
## Tests
`clang/test/CodeGen/musttail-indirect-arg.c`, modeled on `musttail-sret.cpp`:
- 4 positive cases (plain forward, two-arg forward, swapped args, mixed direct
+ indirect): assert no byval-temp and forwarded `Argument`.
- 2 negative cases (local source, computed local copy): assert byval-temp
remains.
- 1 modify-and-forward case (caller writes through `%a` then tails): assert no
temp (existing forwarded path).
- 1 regression check on non-musttail tail calls.
- 4 triples: riscv64, aarch64, loongarch64, s390x. (x86_64 has its own
machinery; ARM is covered by runtime sweep.)
## Runtime cross-arch validation (local)
The minimal C reproducer above, swept across `-O0`/`-O1`/`-O2`/`-O3` with this
clang:
- x86_64 (native): pass
- aarch64 (qemu-aarch64): pass
- riscv64 (qemu-riscv64): pass
- arm (qemu-arm): pass
- loongarch64 (qemu-loongarch64): pass
- s390x (qemu-s390x): pass
Pre-existing musttail support gaps on PPC and MIPS produce build-time errors as
before; not regressions.
## Prior art
This bug class is acknowledged in long-open issues but the frontend extension
here was not previously attempted. The closest prior work split out the SRet
half of an ARM-omnibus PR.
- Open issues directly related: #46402 (X86 byval, since 2020), #56908
(riscv64), #72555 (subtle musttail), #116568 (WebKit struct overlap), #157814
(RISC-V vector_size, Sep 2025), #190429 (x86_64 stack address, Apr 2026),
#56435 (DSE removes byval memcpy).
- Closed/stalled PRs in this area: #102896 (Kiran, Aug 2024) and #109943
(Stannard, Sep 2024). Both ARM-omnibus PRs. The Clang frontend portion of
#102896 was extracted by reviewer request and became #104795 (the SRet fix);
the rest closed. The argument-side extension did not come up in either review
thread.
- Discourse threads: [Aug 2022 X86 byval
miscompile](https://discourse.llvm.org/t/musttail-generating-incorrect-output-for-function-with-pass-by-value-arguments/64778),
[Apr 2026 tailcc +
byval](https://discourse.llvm.org/t/interaction-between-tailcc-and-byval-parameters/90510).
- Backend complement: PR #185094 (merged May 2026) fixes the IR value-type
Indirect path on RISC-V; the LoongArch port `0be65bac6907` followed. Neither
covers the Clang frontend byval-temp pattern.
## What this PR does NOT do
- C++ extension (covered above; planned follow-up).
- Backend changes (the fix is purely in the frontend; backends see simpler IR
and unchanged behavior on non-musttail calls).
- The OrigArgIndex normalization (separate sandbox follow-up).
Fixes #56908. Helps #116568, #157814, #46402, #190429. Backend complement of
#185094.
Assisted-by: Claude (Anthropic), per the [LLVM AI Tool
Policy](https://github.com/llvm/llvm-project/blob/main/llvm/docs/DeveloperPolicy.rst#ai-generated-contributions).
From cf30821b4da56f53854ceb81903dd276ffbf68fe Mon Sep 17 00:00:00 2001
From: Xavier Roche <[email protected]>
Date: Sat, 23 May 2026 15:37:12 +0200
Subject: [PATCH] [Clang] Forward incoming Indirect parameters across musttail
calls
When a C function passes a struct by value to a musttail callee,
Clang's frontend implements "by value" with a local
alloca + memcpy + pass-pointer pattern (the byval-temp). The alloca
lives in the caller's frame, which the tail call deallocates before
the callee dereferences the pointer. The callee reads freed stack
memory. Reproduces on RV64, AArch64, ARM, LoongArch64, and SystemZ at
every optimization level.
This is the argument-side analog of the SRet forwarding fix in
a96c14eeb8fc ("[Clang] Always forward sret parameters to musttail
calls", Kiran 2024-08-19). For musttail calls in the
ABIArgInfo::Indirect case, when the call argument's source LValue
resolves to a forwarded incoming Indirect parameter of the current
function with a matching ABI shape, forward the incoming llvm::Argument
directly instead of creating a byval-temp. Falls through to the
existing byval-temp path for any other source.
Three safety guards in the helper:
- ABI-attribute match (Verifier V7). Refuse to forward when the
incoming parameter and the call slot disagree on byval.
- noalias deduplication. If the user writes
`musttail callee(a, a)` with `a` a noalias Indirect parameter,
do not forward the same Argument to both slots; pre-fix gave two
distinct allocas and aliasing must not regress.
- AddrSpaceCast peek-through. EmitParmDecl wraps incoming Indirect
parameters in addrspacecast on NVPTX/AMDGPU/SPIR. Peek through one
cast; do NOT unwrap loads (a load through a local alloca means the
source is a local and the fix must not engage).
Scope: this PR fixes the C source case. C++ source for the same
construct routes through CXXConstructExpr + EmitAnyExprToTemp which
materializes an agg.tmp before EmitCall runs. The fix correctly falls
through in that case (the source is the local alloca, not the
Argument) but does not eliminate the dangle. A follow-up PR will plumb
IsMustTail through EmitCallArg to cover the C++ case.
Test: clang/test/CodeGen/musttail-indirect-arg.c covers plain forward,
two-arg forward, swapped args, mixed direct+indirect, modify-then-
forward, and negative cases (local source, computed copy, non-
musttail). Runs on riscv64, aarch64, loongarch64, s390x.
Fixes #56908. Helps #116568 #157814 #46402 #190429 #56435 #72555.
Complement of the backend fix in #185094 (RISC-V) and 0be65bac6907
(LoongArch).
Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
---
clang/lib/CodeGen/CGCall.cpp | 86 +++++++++++++++++++++
clang/test/CodeGen/musttail-indirect-arg.c | 90 ++++++++++++++++++++++
2 files changed, 176 insertions(+)
create mode 100644 clang/test/CodeGen/musttail-indirect-arg.c
diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index 40cc275d40273..89aa89bfb26a4 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -5448,6 +5448,69 @@ static unsigned getMaxVectorWidth(const llvm::Type *Ty) {
return MaxVectorWidth;
}
+/// For a musttail call argument lowered as ABIArgInfo::Indirect, returns the
+/// incoming llvm::Argument of the current function when the call argument's
+/// source is a forwarded incoming Indirect parameter with a matching ABI
+/// shape. Returns nullptr to fall through to the normal byval-temp path.
+///
+/// Forwarding is safe under musttail's prototype-match invariant: the
+/// incoming pointer points into the caller's caller's frame and stays valid
+/// across the tail call, whereas a local alloca would dangle. This mirrors
+/// the SRet forwarding in the return path (see commit a96c14eeb8fc,
+/// "Always forward sret parameters to musttail calls").
+///
+/// Guards:
+/// - The source LValue must be the IR-level Argument of CurFn (peek through
+///   one AddrSpaceCastInst for non-default alloca address spaces; do NOT
+///   unwrap loads, since a load through a local alloca means the source
+///   IS a local).
+/// - The incoming parameter must be passed indirectly with byval-ness
+///   matching the call slot (Verifier V7).
+/// - The Argument must not already have been forwarded by a sibling call
+///   argument in this same call (noalias deduplication).
+static llvm::Argument *getForwardableIncomingMustTailArg(
+    CodeGenFunction &CGF, const CallArg &CallArgument,
+    const ABIArgInfo &CallSlotInfo,
+    llvm::SmallPtrSetImpl<llvm::Argument *> &AlreadyForwarded) {
+  // The call argument can be either an LValue (DeclRefExpr to a parameter)
+  // or an RValue aggregate (typical for struct args lowered by CGCall). Both
+  // expose the underlying address; we just need the IR-level pointer.
+  Address SrcAddr = Address::invalid();
+  if (CallArgument.hasLValue())
+    SrcAddr = CallArgument.getKnownLValue().getAddress();
+  else if (CallArgument.getKnownRValue().isAggregate())
+    SrcAddr = CallArgument.getKnownRValue().getAggregateAddress();
+  else
+    return nullptr;
+  llvm::Value *SrcPtr = SrcAddr.emitRawPointer(CGF);
+
+  // Peek through one AddrSpaceCastInst. EmitParmDecl wraps incoming Indirect
+  // parameters in addrspacecast on targets whose alloca address space differs
+  // from the parameter's pointer address space (NVPTX / AMDGPU / SPIR).
+  if (auto *ASC = llvm::dyn_cast<llvm::AddrSpaceCastInst>(SrcPtr))
+    SrcPtr = ASC->getOperand(0);
+
+  auto *IncomingArg = llvm::dyn_cast<llvm::Argument>(SrcPtr);
+  if (!IncomingArg || IncomingArg->getParent() != CGF.CurFn)
+    return nullptr;
+
+  // byval-ness must match between the incoming parameter and the call slot.
+  // The Verifier rejects musttail across an ABI-attribute mismatch (V7), so
+  // producing IR with a mismatch is a verification failure. Falling through
+  // to byval-temp is the safe behavior.
+  if (IncomingArg->hasByValAttr() != CallSlotInfo.getIndirectByVal())
+    return nullptr;
+
+  // noalias deduplication: a noalias incoming parameter must not be
+  // forwarded to two slots in the same call. Pre-fix, each slot got its
+  // own byval-temp; we must not regress that aliasing guarantee.
+  if (IncomingArg->hasNoAliasAttr() &&
+      !AlreadyForwarded.insert(IncomingArg).second)
+    return nullptr;
+
+  return IncomingArg;
+}
+
RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
const CGCallee &Callee,
ReturnValueSlot ReturnValue,
@@ -5571,6 +5634,12 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
// markers that need to be ended right after the call.
SmallVector<CallLifetimeEnd, 2> CallLifetimeEndAfterCall;
+ // For musttail calls forwarding Indirect parameters: tracks incoming
+ // Arguments already forwarded to a slot in this call, so a noalias
+ // incoming Argument is not forwarded to two slots (see
+ // getForwardableIncomingMustTailArg).
+ llvm::SmallPtrSet<llvm::Argument *, 4> ForwardedMustTailArgs;
+
// Translate all of the arguments as necessary to match the IR lowering.
assert(CallInfo.arg_size() == CallArgs.size() &&
"Mismatch between function signature & arguments.");
@@ -5643,6 +5712,23 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
case ABIArgInfo::Indirect:
case ABIArgInfo::IndirectAliased: {
assert(NumIRArgs == 1);
+
+      // For musttail calls, forward an incoming Indirect parameter directly
+      // instead of creating a byval-temp. A local alloca would be deallocated
+      // by the tail call before the callee dereferences the pointer. The
+      // incoming pointer points into the caller's caller's frame, which
+      // remains valid. Mirrors the SRet forwarding above (a96c14eeb8fc).
+      if (IsMustTail) {
+        if (llvm::Argument *FwdArg = getForwardableIncomingMustTailArg(
+                *this, *I, ArgInfo, ForwardedMustTailArgs)) {
+          llvm::Value *Val = FwdArg;
+          if (ArgHasMaybeUndefAttr)
+            Val = Builder.CreateFreeze(Val);
+          IRCallArgs[FirstIRArg] = Val;
+          break;
+        }
+      }
+
if (I->isAggregate()) {
// We want to avoid creating an unnecessary temporary+copy here;
// however, we need one in three cases:
diff --git a/clang/test/CodeGen/musttail-indirect-arg.c b/clang/test/CodeGen/musttail-indirect-arg.c
new file mode 100644
index 0000000000000..70ef193493e27
--- /dev/null
+++ b/clang/test/CodeGen/musttail-indirect-arg.c
@@ -0,0 +1,90 @@
+// Test that Clang forwards incoming Indirect parameters across musttail calls
+// instead of creating a byval-temp alloca that would dangle after the tail call
+// deallocates the caller's frame.
+//
+// Companion to musttail-sret.cpp (commit a96c14eeb8fc): same idea, applied to
+// incoming arguments rather than the sret return slot.
+
+// RUN: %clang_cc1 -triple=riscv64-linux-gnu %s -emit-llvm -O1 -o - | FileCheck %s --check-prefix=COMMON
+// RUN: %clang_cc1 -triple=aarch64-linux-gnu %s -emit-llvm -O1 -o - | FileCheck %s --check-prefix=COMMON
+// RUN: %clang_cc1 -triple=loongarch64-linux-gnu %s -emit-llvm -O1 -o - | FileCheck %s --check-prefix=COMMON
+// RUN: %clang_cc1 -triple=s390x-linux-gnu %s -emit-llvm -O1 -o - | FileCheck %s --check-prefix=COMMON
+
+// A struct large enough to land on the indirect-arg path on RV64 (>2*XLEN=16
+// bytes), AArch64 (>16 bytes), LoongArch64, SystemZ.
+struct Big {
+ unsigned long long a, b, c, d;
+};
+
+// Plain forward: caller(B) musttails callee(B). The fix should emit no
+// byval-temp alloca; the call should forward the incoming parameter %a.
+struct Big C1(struct Big a);
+struct Big P1(struct Big a) {
+ __attribute__((musttail)) return C1(a);
+}
+// COMMON-LABEL: define {{.*}} @P1(
+// COMMON-NOT: alloca %struct.Big
+// COMMON: musttail call {{.*}} @C1({{.*}} %a
+
+// Two indirect args, same forwarding: each forwards its own incoming param.
+struct Big C2(struct Big a, struct Big b);
+struct Big P2(struct Big a, struct Big b) {
+ __attribute__((musttail)) return C2(a, b);
+}
+// COMMON-LABEL: define {{.*}} @P2(
+// COMMON-NOT: alloca %struct.Big
+// COMMON: musttail call {{.*}} @C2({{.*}} %a, {{.*}} %b
+
+// Swapped args: caller(a, b) musttails callee(b, a). Each forwarded slot
+// must resolve to the correct incoming Argument, not by position.
+struct Big C3(struct Big x, struct Big y);
+struct Big P3(struct Big a, struct Big b) {
+ __attribute__((musttail)) return C3(b, a);
+}
+// COMMON-LABEL: define {{.*}} @P3(
+// COMMON-NOT: alloca %struct.Big
+// COMMON: musttail call {{.*}} @C3({{.*}} %b, {{.*}} %a
+
+// Mixed direct + indirect: only the indirect arg is affected by the fix.
+struct Big C4(int n, struct Big a);
+struct Big P4(int n, struct Big a) {
+ __attribute__((musttail)) return C4(n, a);
+}
+// COMMON-LABEL: define {{.*}} @P4(
+// COMMON-NOT: alloca %struct.Big
+// COMMON: musttail call {{.*}} @C4({{.*}} %n, {{.*}} %a
+
+// Negative: local source. Caller takes Big a, but musttails with a LOCAL
+// Big initialized in caller's frame. The byval-temp must remain because the
+// source lives in caller's frame and would dangle if forwarded. The fix
+// must NOT engage in this case.
+struct Big C5(struct Big a);
+struct Big P5(struct Big a) {
+ struct Big local = {1, 2, 3, 4};
+ __attribute__((musttail)) return C5(local);
+}
+// COMMON-LABEL: define {{.*}} @P5(
+// COMMON: alloca
+// COMMON: musttail call {{.*}} @C5(
+
+// Modify-and-forward: the caller modifies the parameter, then musttails.
+// The IR will use %a directly (Clang lowers writes through the incoming
+// pointer for Indirect params), so the fix does engage on the formal param.
+// A fresh alloca is not created either way (existing behavior).
+struct Big C6(struct Big a);
+struct Big P6(struct Big a) {
+ a.a += 1;
+ __attribute__((musttail)) return C6(a);
+}
+// COMMON-LABEL: define {{.*}} @P6(
+// COMMON-NOT: alloca %struct.Big
+// COMMON: musttail call {{.*}} @C6({{.*}} %a
+
+// Non-musttail tail call: the fix must NOT engage. Existing path emits
+// the byval-temp as before.
+struct Big C7(struct Big a);
+struct Big P7(struct Big a) {
+ return C7(a);
+}
+// COMMON-LABEL: define {{.*}} @P7(
+// COMMON-NOT: musttail