https://github.com/xroche updated https://github.com/llvm/llvm-project/pull/199351
>From cf30821b4da56f53854ceb81903dd276ffbf68fe Mon Sep 17 00:00:00 2001 From: Xavier Roche <[email protected]> Date: Sat, 23 May 2026 15:37:12 +0200 Subject: [PATCH 1/2] [Clang] Forward incoming Indirect parameters across musttail calls When a C function passes a struct by value to a musttail callee, Clang's frontend implements "by value" with a local alloca + memcpy + pass-pointer pattern (the byval-temp). The alloca lives in the caller's frame, which the tail call deallocates before the callee dereferences the pointer. The callee reads freed stack memory. Reproduces on RV64, AArch64, ARM, LoongArch64, and SystemZ at every optimization level. This is the argument-side analog of the SRet forwarding fix in a96c14eeb8fc ("[Clang] Always forward sret parameters to musttail calls", Kiran 2024-08-19). For musttail calls in the ABIArgInfo::Indirect case, when the call argument's source LValue resolves to a forwarded incoming Indirect parameter of the current function with a matching ABI shape, forward the incoming llvm::Argument directly instead of creating a byval-temp. Falls through to the existing byval-temp path for any other source. Three safety guards in the helper: - ABI-attribute match (Verifier V7). Refuse to forward when the incoming parameter and the call slot disagree on byval. - noalias deduplication. If the user writes `musttail callee(a, a)` with `a` a noalias Indirect parameter, do not forward the same Argument to both slots; pre-fix gave two distinct allocas and aliasing must not regress. - AddrSpaceCast peek-through. EmitParmDecl wraps incoming Indirect parameters in addrspacecast on NVPTX/AMDGPU/SPIR. Peek through one cast; do NOT unwrap loads (a load through a local alloca means the source is a local and the fix must not engage). Scope: this PR fixes the C source case. C++ source for the same construct routes through CXXConstructExpr + EmitAnyExprToTemp which materializes an agg.tmp before EmitCall runs. The fix correctly falls through in that case (the source is the local alloca, not the Argument) but does not eliminate the dangle. A follow-up PR will plumb IsMustTail through EmitCallArg to cover the C++ case. Test: clang/test/CodeGen/musttail-indirect-arg.c covers plain forward, two-arg forward, swapped args, mixed direct+indirect, modify-then- forward, and negative cases (local source, computed copy, non- musttail). Runs on riscv64, aarch64, loongarch64, s390x. Fixes #56908. Helps #116568 #157814 #46402 #190429 #56435 #72555. Complement of the backend fix in #185094 (RISC-V) and 0be65bac6907 (LoongArch). Assisted-by: Claude (Anthropic) Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> --- clang/lib/CodeGen/CGCall.cpp | 86 +++++++++++++++++++++ clang/test/CodeGen/musttail-indirect-arg.c | 90 ++++++++++++++++++++++ 2 files changed, 176 insertions(+) create mode 100644 clang/test/CodeGen/musttail-indirect-arg.c diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp index 40cc275d40273..89aa89bfb26a4 100644 --- a/clang/lib/CodeGen/CGCall.cpp +++ b/clang/lib/CodeGen/CGCall.cpp @@ -5448,6 +5448,69 @@ static unsigned getMaxVectorWidth(const llvm::Type *Ty) { return MaxVectorWidth; } +/// For a musttail call argument lowered as ABIArgInfo::Indirect, returns the +/// incoming llvm::Argument of the current function when the call argument's +/// source is a forwarded incoming Indirect parameter with a matching ABI +/// shape. Returns nullptr to fall through to the normal byval-temp path. +/// +/// Forwarding is safe under musttail's prototype-match invariant: the +/// incoming pointer points into the caller's caller's frame and stays valid +/// across the tail call, whereas a local alloca would dangle. This mirrors +/// the SRet forwarding in the return path (see commit a96c14eeb8fc, +/// "Always forward sret parameters to musttail calls"). +/// +/// Guards: +/// - The source LValue must be the IR-level Argument of CurFn (peek through +/// one AddrSpaceCastInst for non-default alloca address spaces; do NOT +/// unwrap loads, since a load through a local alloca means the source +/// IS a local). +/// - The incoming parameter must be passed indirectly with byval-ness +/// matching the call slot (Verifier V7). +/// - The Argument must not already have been forwarded by a sibling call +/// argument in this same call (noalias deduplication). +static llvm::Argument *getForwardableIncomingMustTailArg( + CodeGenFunction &CGF, const CallArg &CallArgument, + const ABIArgInfo &CallSlotInfo, + llvm::SmallPtrSetImpl<llvm::Argument *> &AlreadyForwarded) { + // The call argument can be either an LValue (DeclRefExpr to a parameter) + // or an RValue aggregate (typical for struct args lowered by CGCall). Both + // expose the underlying address; we just need the IR-level pointer. + Address SrcAddr = Address::invalid(); + if (CallArgument.hasLValue()) + SrcAddr = CallArgument.getKnownLValue().getAddress(); + else if (CallArgument.getKnownRValue().isAggregate()) + SrcAddr = CallArgument.getKnownRValue().getAggregateAddress(); + else + return nullptr; + llvm::Value *SrcPtr = SrcAddr.emitRawPointer(CGF); + + // Peek through one AddrSpaceCastInst. EmitParmDecl wraps incoming Indirect + // parameters in addrspacecast on targets whose alloca address space differs + // from the parameter's pointer address space (NVPTX / AMDGPU / SPIR). + if (auto *ASC = llvm::dyn_cast<llvm::AddrSpaceCastInst>(SrcPtr)) + SrcPtr = ASC->getOperand(0); + + auto *IncomingArg = llvm::dyn_cast<llvm::Argument>(SrcPtr); + if (!IncomingArg || IncomingArg->getParent() != CGF.CurFn) + return nullptr; + + // byval-ness must match between the incoming parameter and the call slot. + // The Verifier rejects musttail across an ABI-attribute mismatch (V7), so + // producing IR with a mismatch is a verification failure. Falling through + // to byval-temp is the safe behavior. + if (IncomingArg->hasByValAttr() != CallSlotInfo.getIndirectByVal()) + return nullptr; + + // noalias deduplication: a noalias incoming parameter must not be + // forwarded to two slots in the same call. Pre-fix, each slot got its + // own byval-temp; we must not regress that aliasing guarantee. + if (IncomingArg->hasNoAliasAttr() && + !AlreadyForwarded.insert(IncomingArg).second) + return nullptr; + + return IncomingArg; +} + RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo, const CGCallee &Callee, ReturnValueSlot ReturnValue, @@ -5571,6 +5634,12 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo, // markers that need to be ended right after the call. SmallVector<CallLifetimeEnd, 2> CallLifetimeEndAfterCall; + // For musttail calls forwarding Indirect parameters: tracks incoming + // Arguments already forwarded to a slot in this call, so a noalias + // incoming Argument is not forwarded to two slots (see + // getForwardableIncomingMustTailArg). + llvm::SmallPtrSet<llvm::Argument *, 4> ForwardedMustTailArgs; + // Translate all of the arguments as necessary to match the IR lowering. assert(CallInfo.arg_size() == CallArgs.size() && "Mismatch between function signature & arguments."); @@ -5643,6 +5712,23 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo, case ABIArgInfo::Indirect: case ABIArgInfo::IndirectAliased: { assert(NumIRArgs == 1); + + // For musttail calls, forward an incoming Indirect parameter directly + // instead of creating a byval-temp. A local alloca would be deallocated + // by the tail call before the callee dereferences the pointer. The + // incoming pointer points into the caller's caller's frame, which + // remains valid. Mirrors the SRet forwarding above (a96c14eeb8fc). + if (IsMustTail) { + if (llvm::Argument *FwdArg = getForwardableIncomingMustTailArg( + *this, *I, ArgInfo, ForwardedMustTailArgs)) { + llvm::Value *Val = FwdArg; + if (ArgHasMaybeUndefAttr) + Val = Builder.CreateFreeze(Val); + IRCallArgs[FirstIRArg] = Val; + break; + } + } + if (I->isAggregate()) { // We want to avoid creating an unnecessary temporary+copy here; // however, we need one in three cases: diff --git a/clang/test/CodeGen/musttail-indirect-arg.c b/clang/test/CodeGen/musttail-indirect-arg.c new file mode 100644 index 0000000000000..70ef193493e27 --- /dev/null +++ b/clang/test/CodeGen/musttail-indirect-arg.c @@ -0,0 +1,90 @@ +// Test that Clang forwards incoming Indirect parameters across musttail calls +// instead of creating a byval-temp alloca that would dangle after the tail call +// deallocates the caller's frame. +// +// Companion to musttail-sret.cpp (commit a96c14eeb8fc): same idea, applied to +// incoming arguments rather than the sret return slot. + +// RUN: %clang_cc1 -triple=riscv64-linux-gnu %s -emit-llvm -O1 -o - | FileCheck %s --check-prefix=COMMON +// RUN: %clang_cc1 -triple=aarch64-linux-gnu %s -emit-llvm -O1 -o - | FileCheck %s --check-prefix=COMMON +// RUN: %clang_cc1 -triple=loongarch64-linux-gnu %s -emit-llvm -O1 -o - | FileCheck %s --check-prefix=COMMON +// RUN: %clang_cc1 -triple=s390x-linux-gnu %s -emit-llvm -O1 -o - | FileCheck %s --check-prefix=COMMON + +// A struct large enough to land on the indirect-arg path on RV64 (>2*XLEN=16 +// bytes), AArch64 (>16 bytes), LoongArch64, SystemZ. +struct Big { + unsigned long long a, b, c, d; +}; + +// Plain forward: caller(B) musttails callee(B). The fix should emit no +// byval-temp alloca; the call should forward the incoming parameter %a. +struct Big C1(struct Big a); +struct Big P1(struct Big a) { + __attribute__((musttail)) return C1(a); +} +// COMMON-LABEL: define {{.*}} @P1( +// COMMON-NOT: alloca %struct.Big +// COMMON: musttail call {{.*}} @C1({{.*}} %a + +// Two indirect args, same forwarding: each forwards its own incoming param. +struct Big C2(struct Big a, struct Big b); +struct Big P2(struct Big a, struct Big b) { + __attribute__((musttail)) return C2(a, b); +} +// COMMON-LABEL: define {{.*}} @P2( +// COMMON-NOT: alloca %struct.Big +// COMMON: musttail call {{.*}} @C2({{.*}} %a, {{.*}} %b + +// Swapped args: caller(a, b) musttails callee(b, a). Each forwarded slot +// must resolve to the correct incoming Argument, not by position. +struct Big C3(struct Big x, struct Big y); +struct Big P3(struct Big a, struct Big b) { + __attribute__((musttail)) return C3(b, a); +} +// COMMON-LABEL: define {{.*}} @P3( +// COMMON-NOT: alloca %struct.Big +// COMMON: musttail call {{.*}} @C3({{.*}} %b, {{.*}} %a + +// Mixed direct + indirect: only the indirect arg is affected by the fix. +struct Big C4(int n, struct Big a); +struct Big P4(int n, struct Big a) { + __attribute__((musttail)) return C4(n, a); +} +// COMMON-LABEL: define {{.*}} @P4( +// COMMON-NOT: alloca %struct.Big +// COMMON: musttail call {{.*}} @C4({{.*}} %n, {{.*}} %a + +// Negative: local source. Caller takes Big a, but musttails with a LOCAL +// Big initialized in caller's frame. The byval-temp must remain because the +// source lives in caller's frame and would dangle if forwarded. The fix +// must NOT engage in this case. +struct Big C5(struct Big a); +struct Big P5(struct Big a) { + struct Big local = {1, 2, 3, 4}; + __attribute__((musttail)) return C5(local); +} +// COMMON-LABEL: define {{.*}} @P5( +// COMMON: alloca +// COMMON: musttail call {{.*}} @C5( + +// Negative: computed value (caller modifies the parameter then musttails). +// The IR will use %a directly (Clang lowers writes through the incoming +// pointer for Indirect params) so the fix does engage on the formal param, +// but a fresh alloca is not created either way -- existing behavior. +struct Big C6(struct Big a); +struct Big P6(struct Big a) { + a.a += 1; + __attribute__((musttail)) return C6(a); +} +// COMMON-LABEL: define {{.*}} @P6( +// COMMON-NOT: alloca %struct.Big +// COMMON: musttail call {{.*}} @C6({{.*}} %a + +// Non-musttail tail call: the fix must NOT engage. Existing path emits +// the byval-temp as before. +struct Big C7(struct Big a); +struct Big P7(struct Big a) { + return C7(a); +} +// COMMON-LABEL: define {{.*}} @P7( +// COMMON-NOT: musttail >From a1c3fd3e175cd6c26c3868460c03b931382099f9 Mon Sep 17 00:00:00 2001 From: Xavier Roche <[email protected]> Date: Sat, 23 May 2026 16:18:46 +0200 Subject: [PATCH 2/2] [Clang][NFC] Trim comments on musttail Indirect forwarding helper Reduce comment volume on getForwardableIncomingMustTailArg and the call site to match the SRet precedent (a96c14eeb8fc). Same code, shorter comments. Assisted-by: Claude (Anthropic) Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> --- clang/lib/CodeGen/CGCall.cpp | 56 +++++++++--------------------------- 1 file changed, 14 insertions(+), 42 deletions(-) diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp index 89aa89bfb26a4..b130df61c18b8 100644 --- a/clang/lib/CodeGen/CGCall.cpp +++ b/clang/lib/CodeGen/CGCall.cpp @@ -5448,33 +5448,14 @@ static unsigned getMaxVectorWidth(const llvm::Type *Ty) { return MaxVectorWidth; } -/// For a musttail call argument lowered as ABIArgInfo::Indirect, returns the -/// incoming llvm::Argument of the current function when the call argument's -/// source is a forwarded incoming Indirect parameter with a matching ABI -/// shape. Returns nullptr to fall through to the normal byval-temp path. -/// -/// Forwarding is safe under musttail's prototype-match invariant: the -/// incoming pointer points into the caller's caller's frame and stays valid -/// across the tail call, whereas a local alloca would dangle. This mirrors -/// the SRet forwarding in the return path (see commit a96c14eeb8fc, -/// "Always forward sret parameters to musttail calls"). -/// -/// Guards: -/// - The source LValue must be the IR-level Argument of CurFn (peek through -/// one AddrSpaceCastInst for non-default alloca address spaces; do NOT -/// unwrap loads, since a load through a local alloca means the source -/// IS a local). -/// - The incoming parameter must be passed indirectly with byval-ness -/// matching the call slot (Verifier V7). -/// - The Argument must not already have been forwarded by a sibling call -/// argument in this same call (noalias deduplication). +/// Returns the incoming llvm::Argument of the current function if this +/// musttail call argument forwards an incoming Indirect parameter with a +/// matching ABI shape; nullptr to fall through to the byval-temp path. +/// Argument-side analog of a96c14eeb8fc (SRet musttail forwarding). static llvm::Argument *getForwardableIncomingMustTailArg( CodeGenFunction &CGF, const CallArg &CallArgument, const ABIArgInfo &CallSlotInfo, llvm::SmallPtrSetImpl<llvm::Argument *> &AlreadyForwarded) { - // The call argument can be either an LValue (DeclRefExpr to a parameter) - // or an RValue aggregate (typical for struct args lowered by CGCall). Both - // expose the underlying address; we just need the IR-level pointer. Address SrcAddr = Address::invalid(); if (CallArgument.hasLValue()) SrcAddr = CallArgument.getKnownLValue().getAddress(); @@ -5484,9 +5465,9 @@ static llvm::Argument *getForwardableIncomingMustTailArg( return nullptr; llvm::Value *SrcPtr = SrcAddr.emitRawPointer(CGF); - // Peek through one AddrSpaceCastInst. EmitParmDecl wraps incoming Indirect - // parameters in addrspacecast on targets whose alloca address space differs - // from the parameter's pointer address space (NVPTX / AMDGPU / SPIR). + // Peek through one AddrSpaceCastInst (NVPTX / AMDGPU / SPIR wrap incoming + // Indirect params via EmitParmDecl). Do not unwrap loads: a load through + // a local alloca means the source is a local. if (auto *ASC = llvm::dyn_cast<llvm::AddrSpaceCastInst>(SrcPtr)) SrcPtr = ASC->getOperand(0); @@ -5494,16 +5475,11 @@ static llvm::Argument *getForwardableIncomingMustTailArg( if (!IncomingArg || IncomingArg->getParent() != CGF.CurFn) return nullptr; - // byval-ness must match between the incoming parameter and the call slot. - // The Verifier rejects musttail across an ABI-attribute mismatch (V7), so - // producing IR with a mismatch is a verification failure. Falling through - // to byval-temp is the safe behavior. + // Verifier V7 requires matching ABI attributes across musttail. if (IncomingArg->hasByValAttr() != CallSlotInfo.getIndirectByVal()) return nullptr; - // noalias deduplication: a noalias incoming parameter must not be - // forwarded to two slots in the same call. Pre-fix, each slot got its - // own byval-temp; we must not regress that aliasing guarantee. + // Do not forward the same noalias Argument to two slots in one call. if (IncomingArg->hasNoAliasAttr() && !AlreadyForwarded.insert(IncomingArg).second) return nullptr; @@ -5634,10 +5610,8 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo, // markers that need to be ended right after the call. SmallVector<CallLifetimeEnd, 2> CallLifetimeEndAfterCall; - // For musttail calls forwarding Indirect parameters: tracks incoming - // Arguments already forwarded to a slot in this call, so a noalias - // incoming Argument is not forwarded to two slots (see - // getForwardableIncomingMustTailArg). + // Tracks incoming Arguments already forwarded by a musttail Indirect arg, + // for noalias deduplication in getForwardableIncomingMustTailArg. llvm::SmallPtrSet<llvm::Argument *, 4> ForwardedMustTailArgs; // Translate all of the arguments as necessary to match the IR lowering. @@ -5713,11 +5687,9 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo, case ABIArgInfo::IndirectAliased: { assert(NumIRArgs == 1); - // For musttail calls, forward an incoming Indirect parameter directly - // instead of creating a byval-temp. A local alloca would be deallocated - // by the tail call before the callee dereferences the pointer. The - // incoming pointer points into the caller's caller's frame, which - // remains valid. Mirrors the SRet forwarding above (a96c14eeb8fc). + // For musttail, forward an incoming Indirect parameter directly. A + // local alloca would dangle after the tail call. Mirrors the SRet + // forwarding above (a96c14eeb8fc). if (IsMustTail) { if (llvm::Argument *FwdArg = getForwardableIncomingMustTailArg( *this, *I, ArgInfo, ForwardedMustTailArgs)) { _______________________________________________ cfe-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
