[clang] [CIR] Coerce Direct args and returns in CallConvLowering (PR #195879)

via cfe-commits Tue, 02 Jun 2026 16:49:35 -0700

https://github.com/adams381 updated 
https://github.com/llvm/llvm-project/pull/195879


From 50f0da8d6c03257a9646702c804a39307288d11b Mon Sep 17 00:00:00 2001
From: Adam Smith <[email protected]>
Date: Tue, 5 May 2026 09:26:10 -0700
Subject: [PATCH 1/3] [CIR] Add Direct coerce-in-registers +
 cir.reinterpret_cast op

Fourth PR in the split of #192119/#192124. Implements the
Direct-with-coercion path in CallConvLowering and addresses
andykaylor's five inline review comments from the original PR.

The new cir.reinterpret_cast op is for same-bit-width in-register
reinterpretation (vector<2 x float> <-> complex<float>).
emitCoercion uses it when neither side is a record, exactly one
side is a vector, and both have identical bit width, instead of
going through memory.  For everything else (a record on either
side, no vector-vs-non-vector difference, or differing bit widths)
the helper still does alloca/store/ptr-cast/load.
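
The selection rule described above can be sketched in plain C++.  This is
only an illustration under assumed names: ToyType, Shape, Strategy and
chooseCoercion are hypothetical stand-ins, not the cir:: classes or the
helper in this patch, and type identity is reduced to shape plus bit width.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for CIR types; not the real cir:: classes.
enum class Shape { Scalar, Vector, Complex, Record };

struct ToyType {
  Shape shape;
  uint64_t bits; // total size in bits, as a DataLayout query would report
  // Simplification: the toy model treats equal shape and width as "same type".
  bool operator==(const ToyType &other) const {
    return shape == other.shape && bits == other.bits;
  }
};

enum class Strategy { None, Reinterpret, ViaMemory };

// Mirrors the three cases in the commit message: identical types need no
// coercion, vector-vs-non-vector pairs of equal width are reinterpreted in
// registers, and everything else takes the memory roundtrip.
Strategy chooseCoercion(const ToyType &src, const ToyType &dst) {
  if (src == dst)
    return Strategy::None;
  bool anyRecord = src.shape == Shape::Record || dst.shape == Shape::Record;
  bool vectorMismatch =
      (src.shape == Shape::Vector) != (dst.shape == Shape::Vector);
  if (!anyRecord && vectorMismatch && src.bits == dst.bits)
    return Strategy::Reinterpret;
  return Strategy::ViaMemory;
}
```

In this model a 64-bit vector paired with a 64-bit complex selects
Reinterpret, while any pair involving a record selects ViaMemory.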

Andy's comments, in order:
- Temporary alloca alignment is now max(srcAlign, dstAlign) from
  DataLayout instead of hardcoded.
- The alloca lives in the entry block via InsertionGuard so it
  composes with HoistAllocas regardless of pipeline order.
- isVolatile kept as UnitAttr-absence with an inline comment.
- vector<->complex now uses cir.reinterpret_cast.
- Memory path has three new .cir tests covering it.
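
As a rough analogy for the memory path and its max(srcAlign, dstAlign)
alignment, the following standalone C++ sketch stores a value into an
aligned byte slot and reads it back as another type.  Pair, slotAlignment
and coerceViaMemory are hypothetical names used for illustration; the pass
itself emits CIR alloca/store/cast/load ops rather than memcpy calls.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical record type, standing in for !rec_Pair from the tests.
struct Pair {
  int32_t first;
  int32_t second;
};

// Alignment of the temporary slot: large enough for both the store of the
// source type and the load of the destination type.
template <typename Src, typename Dst>
constexpr std::size_t slotAlignment() {
  return std::max(alignof(Src), alignof(Dst));
}

template <typename Dst, typename Src>
Dst coerceViaMemory(const Src &src) {
  static_assert(std::is_trivially_copyable<Src>::value &&
                    std::is_trivially_copyable<Dst>::value,
                "sketch only handles trivially copyable types");
  static_assert(sizeof(Dst) <= sizeof(Src),
                "sketch assumes the load does not read past the slot");
  // The slot has the size of the source type, like the alloca in the patch.
  alignas(slotAlignment<Src, Dst>()) unsigned char slot[sizeof(Src)];
  std::memcpy(slot, &src, sizeof(Src)); // "store" the source value
  Dst dst;
  std::memcpy(&dst, slot, sizeof(Dst)); // "load" it back as the destination
  return dst;
}
```

Coercing a Pair to int64_t and back with coerceViaMemory recovers the
original fields, which is the property the record <-> integer tests rely on.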

In-body coercion (insertArgCoercion / insertReturnCoercion) folds
into the per-function rewriteFunctionDefinition method introduced
by the review fixup on the prior Direct/Ignore PR.  It runs before
the Ignore-arg drop, inside the same per-function inner-loop
window the pass driver already uses, so the window-of-invalidity
invariant is unchanged: F's signature and F's body are coerced
together, and F's callers are updated in the same inner-loop
iteration.  The Ignore-arg drop reuses the existing poison-stub
idiom; the alloca/load fallback from earlier drafts is unnecessary
once the drop happens after coercion in the same per-function
window.

LowerToLLVM gets a stub for the new op: bitcast for same-shape
converted types, error-with-message for aggregates.  We don't
produce aggregates from CallConvLowering today, so the error
path is only reachable from hand-written IR; a follow-up patch can
add an extract/insert lowering if needed.
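
The stub's three outcomes can be modelled as a small decision function.
This is a sketch with assumed names: LlvmKind, ToyLlvmType, Lowering and
lowerReinterpretCast are hypothetical, and real LLVM dialect types are
reduced to a kind plus a bit width.

```cpp
#include <cassert>

// Hypothetical model of converted LLVM types; not the mlir::LLVM classes.
enum class LlvmKind { Scalar, Vector, Struct, Array };

struct ToyLlvmType {
  LlvmKind kind;
  unsigned bits;
};

enum class Lowering { ForwardSource, Bitcast, Error };

inline bool isAggregate(const ToyLlvmType &type) {
  return type.kind == LlvmKind::Struct || type.kind == LlvmKind::Array;
}

// Identical converted types forward the source value, aggregates on either
// side are rejected, and the remaining scalar/vector pairs become a bitcast.
Lowering lowerReinterpretCast(const ToyLlvmType &src, const ToyLlvmType &dst) {
  if (src.kind == dst.kind && src.bits == dst.bits)
    return Lowering::ForwardSource;
  if (isAggregate(src) || isAggregate(dst))
    return Lowering::Error;
  return Lowering::Bitcast;
}
```

The identical-type check comes first, so in this model two aggregates that
convert to the same type are forwarded rather than rejected.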

Co-authored-by: Cursor <[email protected]>
---
 clang/include/clang/CIR/Dialect/IR/CIROps.td  |  48 ++++
 clang/lib/CIR/Dialect/IR/CIRDialect.cpp       |  22 ++
 .../Transforms/CallConvLoweringPass.cpp       |   2 +-
 .../TargetLowering/CIRABIRewriteContext.cpp   | 228 ++++++++++++++++--
 .../TargetLowering/CIRABIRewriteContext.h     |  15 +-
 .../CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp |  39 +++
 clang/test/CIR/IR/reinterpret-cast.cir        |  28 +++
 .../abi-lowering/coerce-int-to-record.cir     |  59 +++++
 .../abi-lowering/coerce-record-to-int.cir     |  50 ++++
 .../coerce-record-to-record-via-memory.cir    |  34 +++
 .../coerce-vector-to-complex-reinterpret.cir  |  42 ++++
 11 files changed, 537 insertions(+), 30 deletions(-)
 create mode 100644 clang/test/CIR/IR/reinterpret-cast.cir
 create mode 100644 clang/test/CIR/Transforms/abi-lowering/coerce-int-to-record.cir
 create mode 100644 clang/test/CIR/Transforms/abi-lowering/coerce-record-to-int.cir
 create mode 100644 clang/test/CIR/Transforms/abi-lowering/coerce-record-to-record-via-memory.cir
 create mode 100644 clang/test/CIR/Transforms/abi-lowering/coerce-vector-to-complex-reinterpret.cir

diff --git a/clang/include/clang/CIR/Dialect/IR/CIROps.td b/clang/include/clang/CIR/Dialect/IR/CIROps.td
index 0353da7715191..89391da873a73 100644
--- a/clang/include/clang/CIR/Dialect/IR/CIROps.td
+++ b/clang/include/clang/CIR/Dialect/IR/CIROps.td
@@ -288,6 +288,54 @@ def CIR_CastOp : CIR_Op<"cast", [
 
 }
 
+//===----------------------------------------------------------------------===//
+// ReinterpretCastOp
+//===----------------------------------------------------------------------===//
+
+def CIR_ReinterpretCastOp : CIR_Op<"reinterpret_cast", [Pure]> {
+  let summary = "Reinterpret a value as a different same-bit-width type";
+  let description = [{
+    The `cir.reinterpret_cast` operation reinterprets the bits of its source
+    value as a different type, with no IR-level cost.  It is used by the
+    calling-convention lowering pass to coerce between same-bit-width types
+    that have an LLVM-IR-level shape mismatch but identical in-register
+    representation -- for example, between `!cir.vector<2 x !cir.float>` and
+    `!cir.complex<!cir.float>`, both of which lower to the same LLVM IR
+    representation but have distinct CIR types.
+
+    Unlike `cir.cast bitcast`, which is overloaded for pointer-to-pointer
+    bitcasts and several other use cases, `cir.reinterpret_cast` is reserved
+    for in-register value reinterpretation only.  The result type must
+    differ from the source type; otherwise the op is meaningless and the
+    folder removes it.
+
+    **Invariant** (not currently enforced by the verifier): the source and
+    destination types must have the same bit width per the module's
+    DataLayout, and they must use the same in-register lane order on the
+    target.  Producers (e.g. CallConvLowering's coerce-in-registers path)
+    are responsible for ensuring this; a follow-up patch will move the
+    bit-width check into the verifier once the design question of
+    DataLayout-aware op verifiers is resolved.
+
+    Example:
+
+    ```
+    %c = cir.reinterpret_cast %v
+       : !cir.vector<2 x !cir.float> -> !cir.complex<!cir.float>
+    ```
+  }];
+
+  let arguments = (ins CIR_AnyType:$src);
+  let results = (outs CIR_AnyType:$result);
+
+  let assemblyFormat = [{
+    $src `:` type($src) `->` type($result) attr-dict
+  }];
+
+  let hasVerifier = 1;
+  let hasFolder = 1;
+}
+
 //===----------------------------------------------------------------------===//
 // DynamicCastOp
 //===----------------------------------------------------------------------===//
diff --git a/clang/lib/CIR/Dialect/IR/CIRDialect.cpp b/clang/lib/CIR/Dialect/IR/CIRDialect.cpp
index cf07fc4f0833a..fe390720bd047 100644
--- a/clang/lib/CIR/Dialect/IR/CIRDialect.cpp
+++ b/clang/lib/CIR/Dialect/IR/CIRDialect.cpp
@@ -915,6 +915,28 @@ static Value tryFoldCastChain(cir::CastOp op) {
   return {};
 }
 
+//===----------------------------------------------------------------------===//
+// ReinterpretCastOp
+//===----------------------------------------------------------------------===//
+
+LogicalResult cir::ReinterpretCastOp::verify() {
+  // The op is meaningless for identical types -- the folder is the right
+  // way to remove it -- but we accept it at the verifier level so that
+  // peephole code (e.g. pattern rewriters that round-trip values) doesn't
+  // need a type-equality guard.  Producers should still avoid emitting
+  // it for matching types.
+  //
+  // The same-bit-width invariant is documented on the op but not yet
+  // checked here; see the op description for the rationale.
+  return success();
+}
+
+OpFoldResult cir::ReinterpretCastOp::fold(FoldAdaptor adaptor) {
+  if (getSrc().getType() == getType())
+    return getSrc();
+  return {};
+}
+
 OpFoldResult cir::CastOp::fold(FoldAdaptor adaptor) {
   if (mlir::isa_and_present<cir::PoisonAttr>(adaptor.getSrc())) {
     // Propagate poison value
diff --git a/clang/lib/CIR/Dialect/Transforms/CallConvLoweringPass.cpp b/clang/lib/CIR/Dialect/Transforms/CallConvLoweringPass.cpp
index 838125037afd5..c00947593517e 100644
--- a/clang/lib/CIR/Dialect/Transforms/CallConvLoweringPass.cpp
+++ b/clang/lib/CIR/Dialect/Transforms/CallConvLoweringPass.cpp
@@ -137,7 +137,7 @@ void CallConvLoweringPass::runOnOperation() {
   }
 
   DataLayout dl(moduleOp);
-  CIRABIRewriteContext rewriteCtx(moduleOp);
+  CIRABIRewriteContext rewriteCtx(moduleOp, dl);
   SymbolTable symbolTable(moduleOp);
 
   // Classify every cir.func up front.  No IR mutation happens here, so
diff --git a/clang/lib/CIR/Dialect/Transforms/TargetLowering/CIRABIRewriteContext.cpp b/clang/lib/CIR/Dialect/Transforms/TargetLowering/CIRABIRewriteContext.cpp
index 2c29c83b999ba..29ac1d371cd64 100644
--- a/clang/lib/CIR/Dialect/Transforms/TargetLowering/CIRABIRewriteContext.cpp
+++ b/clang/lib/CIR/Dialect/Transforms/TargetLowering/CIRABIRewriteContext.cpp
@@ -54,12 +54,11 @@ LogicalResult buildNewArgTypes(ArrayRef<Type> oldArgTypes,
     Type origTy = oldArgTypes[idx];
     switch (ac.kind) {
     case ArgKind::Direct:
-      if (ac.coercedType) {
-        emitError() << "Direct with coerced type at arg " << idx
-                    << " not yet implemented in CallConvLowering";
-        return failure();
-      }
-      newArgTypes.push_back(origTy);
+      // Direct with a coerced type means the wire signature uses the
+      // coerced type; the body still expects origTy and we'll insert a
+      // reinterpret/coercion at the entry block.  Direct without a
+      // coerced type is a true pass-through.
+      newArgTypes.push_back(ac.coercedType ? ac.coercedType : origTy);
       break;
     case ArgKind::Ignore:
       break;
@@ -93,12 +92,9 @@ Type computeNewReturnType(Type origRetTy, const ArgClassification &retInfo,
                           function_ref<InFlightDiagnostic()> emitError) {
   switch (retInfo.kind) {
   case ArgKind::Direct:
-    if (retInfo.coercedType) {
-      emitError() << "Direct return with coerced type not yet implemented "
-                  << "in CallConvLowering";
-      return nullptr;
-    }
-    return origRetTy;
+    // Direct return with a coerced type uses the coerced type on the wire;
+    // the rewriter inserts a coercion before each cir.return.
+    return retInfo.coercedType ? retInfo.coercedType : origRetTy;
   case ArgKind::Ignore:
     return cir::VoidType::get(ctx);
   case ArgKind::Expand:
@@ -176,6 +172,157 @@ ArrayAttr updateResAttrs(MLIRContext *ctx, ArrayAttr existingResAttrs,
   return ArrayAttr::get(ctx, {DictionaryAttr::get(ctx, attrs)});
 }
 
+/// Coerce \p src to type \p dstTy at the current builder insertion point.
+///
+/// Three strategies, in order of preference:
+///   - If src and dst are the same type, return src unchanged and leave
+///     \p createdOps empty.
+///   - If neither is a record and they are same-bit-width values that differ
+///     only in vector-vs-non-vector shape (e.g. !cir.vector<2 x !cir.float> ↔
+///     !cir.complex<!cir.float>), use cir.reinterpret_cast which is free at
+///     the IR level.
+///   - Otherwise go through memory: allocate a slot of the source type
+///     (using max(srcAlign, dstAlign) for the alloca alignment), store
+///     the source, bitcast the pointer to the destination type, load the
+///     destination type back.
+///
+/// The temporary alloca is placed at the start of the enclosing function's
+/// entry block so that it composes correctly with the HoistAllocas pass
+/// regardless of pipeline ordering.
+///
+/// Any operations the helper creates are appended to \p createdOps so the
+/// caller can pass them to replaceAllUsesExcept and avoid clobbering the
+/// store's value operand when later rewiring the source value.
+Value emitCoercion(OpBuilder &rewriter, Location loc, Type dstTy, Value src,
+                   FunctionOpInterface funcOp, const DataLayout &dl,
+                   SmallPtrSetImpl<Operation *> &createdOps) {
+  Type srcTy = src.getType();
+  if (srcTy == dstTy)
+    return src;
+
+  // Reinterpret path: same total bit width, neither side is a record, and
+  // the shapes differ only in vector-vs-non-vector.  Going through memory
+  // is wasteful for these — they have the same in-register representation.
+  bool isAggregate = isa<cir::RecordType>(srcTy) || isa<cir::RecordType>(dstTy);
+  bool vectorMismatch =
+      isa<cir::VectorType>(srcTy) != isa<cir::VectorType>(dstTy);
+  if (!isAggregate && vectorMismatch &&
+      dl.getTypeSizeInBits(srcTy) == dl.getTypeSizeInBits(dstTy)) {
+    auto reinterpret =
+        cir::ReinterpretCastOp::create(rewriter, loc, dstTy, src);
+    createdOps.insert(reinterpret);
+    return reinterpret;
+  }
+
+  // Memory path: alloca + store + ptr-cast + load.  The alloca goes in the
+  // entry block (Andy's review comment #3 on the original PR), with
+  // alignment = max(srcAlign, dstAlign) to satisfy both the store and the
+  // load (review comment #1).
+  uint64_t srcAlign = dl.getTypeABIAlignment(srcTy);
+  uint64_t dstAlign = dl.getTypeABIAlignment(dstTy);
+  uint64_t allocaAlign = std::max(srcAlign, dstAlign);
+
+  auto srcPtrTy = cir::PointerType::get(srcTy);
+  auto dstPtrTy = cir::PointerType::get(dstTy);
+
+  cir::AllocaOp alloca;
+  {
+    OpBuilder::InsertionGuard guard(rewriter);
+    Block &entry = funcOp->getRegion(0).front();
+    rewriter.setInsertionPointToStart(&entry);
+    alloca = cir::AllocaOp::create(rewriter, loc, srcPtrTy, srcTy,
+                                   rewriter.getStringAttr("coerce"),
+                                   rewriter.getI64IntegerAttr(allocaAlign));
+  }
+  createdOps.insert(alloca);
+
+  auto store = cir::StoreOp::create(rewriter, loc, src, alloca,
+                                    /*isVolatile=*/UnitAttr(),
+                                    /*alignment=*/IntegerAttr(),
+                                    /*sync_scope=*/cir::SyncScopeKindAttr(),
+                                    /*mem_order=*/cir::MemOrderAttr());
+  createdOps.insert(store);
+
+  auto ptrCast = cir::CastOp::create(rewriter, loc, dstPtrTy,
+                                     cir::CastKind::bitcast, alloca);
+  createdOps.insert(ptrCast);
+
+  auto load = cir::LoadOp::create(rewriter, loc, dstTy, ptrCast,
+                                  /*isDeref=*/UnitAttr(),
+                                  /*isVolatile=*/UnitAttr(),
+                                  /*alignment=*/IntegerAttr(),
+                                  /*sync_scope=*/cir::SyncScopeKindAttr(),
+                                  /*mem_order=*/cir::MemOrderAttr());
+  createdOps.insert(load);
+  return load;
+}
+
+/// Convenience overload for callers that don't need the createdOps set
+/// (e.g. call-site coercion where we don't replaceAllUsesExcept).
+Value emitCoercion(OpBuilder &rewriter, Location loc, Type dstTy, Value src,
+                   FunctionOpInterface funcOp, const DataLayout &dl) {
+  SmallPtrSet<Operation *, 4> ignored;
+  return emitCoercion(rewriter, loc, dstTy, src, funcOp, dl, ignored);
+}
+
+/// Insert coercion before each cir.return so the returned value matches the
+/// new (coerced) return type.
+void insertReturnCoercion(FunctionOpInterface funcOp, Type origRetTy,
+                          Type coercedRetTy, OpBuilder &rewriter,
+                          const DataLayout &dl) {
+  SmallVector<cir::ReturnOp> returns;
+  funcOp.walk([&](cir::ReturnOp r) { returns.push_back(r); });
+  for (cir::ReturnOp r : returns) {
+    if (r.getInput().empty())
+      continue;
+    Value origVal = r.getInput()[0];
+    if (origVal.getType() == coercedRetTy)
+      continue;
+    rewriter.setInsertionPoint(r);
+    Value coerced =
+        emitCoercion(rewriter, r.getLoc(), coercedRetTy, origVal, funcOp, dl);
+    r->setOperand(0, coerced);
+  }
+}
+
+/// For each Direct arg with a coerced type, change the block argument's type
+/// to the coerced type and insert a coercion at function entry that maps it
+/// back to the original type for body uses.
+void insertArgCoercion(FunctionOpInterface funcOp,
+                       const FunctionClassification &fc, OpBuilder &rewriter,
+                       const DataLayout &dl) {
+  Region &body = funcOp->getRegion(0);
+  if (body.empty())
+    return;
+  Block &entry = body.front();
+
+  for (auto [idx, ac] : llvm::enumerate(fc.argInfos)) {
+    if (ac.kind != ArgKind::Direct || !ac.coercedType)
+      continue;
+    if (idx >= entry.getNumArguments())
+      continue;
+
+    BlockArgument blockArg = entry.getArgument(idx);
+    Type oldArgTy = blockArg.getType();
+    Type newArgTy = ac.coercedType;
+    if (oldArgTy == newArgTy)
+      continue;
+
+    blockArg.setType(newArgTy);
+
+    rewriter.setInsertionPointToStart(&entry);
+    SmallPtrSet<Operation *, 4> coercionOps;
+    Value adapted = emitCoercion(rewriter, funcOp.getLoc(), oldArgTy, blockArg,
+                                 funcOp, dl, coercionOps);
+
+    // Replace blockArg uses with the adapted value, except inside the helper
+    // ops we just created.  This is critical: the StoreOp's value operand is
+    // blockArg, and if we naively replaceAllUses it gets swapped to adapted
+    // (now of the original type != the alloca's pointee type).
+    blockArg.replaceAllUsesExcept(adapted, coercionOps);
+  }
+}
+
 } // namespace
 
 LogicalResult CIRABIRewriteContext::rewriteFunctionDefinition(
@@ -217,6 +364,23 @@ LogicalResult CIRABIRewriteContext::rewriteFunctionDefinition(
   if (funcOp.isDefinition()) {
     Region &body = funcOp->getRegion(0);
     if (!body.empty()) {
+      // In-body coercion for Direct-with-coerce args: change block-arg
+      // types to the coerced types and insert a coercion (a
+      // cir.reinterpret_cast or a memory roundtrip) at the top of the
+      // entry block that converts each coerced value back to its original
+      // type, then route existing body uses through the coerced value.
+      // Done before the Ignore-drop below so the entry block argument
+      // indices used here still refer to the original positions.
+      insertArgCoercion(funcOp, fc, builder, dl);
+
+      // Direct return with coerced type: insert a coercion at every
+      // cir.return so the returned value matches the (coerced) return
+      // type in the new function signature set below.
+      if (fc.returnInfo.kind == ArgKind::Direct && fc.returnInfo.coercedType &&
+          !oldResultTypes.empty() && fc.returnInfo.coercedType != origRetTy)
+        insertReturnCoercion(funcOp, origRetTy, fc.returnInfo.coercedType,
+                             builder, dl);
+
       Block &entry = body.front();
 
       // For each Ignored argument: drop the block argument and, if the
@@ -302,23 +466,19 @@ LogicalResult CIRABIRewriteContext::rewriteCallSite(
            << "indirect call not yet implemented in CallConvLowering";
 
   MLIRContext *ctx = callOp->getContext();
+  auto enclosingFunc = call->getParentOfType<FunctionOpInterface>();
 
   for (auto [idx, ac] : llvm::enumerate(fc.argInfos)) {
     switch (ac.kind) {
     case ArgKind::Direct:
-      if (ac.coercedType)
-        return call.emitOpError()
-               << "Direct with coerced type at call-site arg " << idx
-               << " not yet implemented in CallConvLowering";
-      break;
     case ArgKind::Ignore:
       break;
     case ArgKind::Expand:
       return call.emitOpError() << "Expand at call-site arg " << idx
                                 << " not yet implemented in CallConvLowering";
     case ArgKind::Extend:
-      // Extend at the call site is just an attribute change (llvm.signext /
-      // llvm.zeroext on the call's arg_attrs); no IR-level cast.
+      // Direct (with or without coercion), Ignore, and Extend are all
+      // handled below.  Extend is attribute-only at the IR level.
       break;
     case ArgKind::Indirect:
       return call.emitOpError() << "Indirect at call-site arg " << idx
@@ -326,6 +486,8 @@ LogicalResult CIRABIRewriteContext::rewriteCallSite(
     }
   }
 
+  builder.setInsertionPoint(call);
+
   SmallVector<Value> newArgs;
   ValueRange argOperands = call.getArgOperands();
   newArgs.reserve(argOperands.size());
@@ -337,7 +499,12 @@ LogicalResult CIRABIRewriteContext::rewriteCallSite(
   for (auto [idx, ac] : llvm::enumerate(fc.argInfos)) {
     if (ac.kind == ArgKind::Ignore)
       continue;
-    newArgs.push_back(argOperands[idx]);
+    Value arg = argOperands[idx];
+    if (ac.kind == ArgKind::Direct && ac.coercedType &&
+        arg.getType() != ac.coercedType)
+      arg = emitCoercion(builder, call.getLoc(), ac.coercedType, arg,
+                         enclosingFunc, dl);
+    newArgs.push_back(arg);
   }
 
   bool hasResult = call.getNumResults() > 0;
@@ -346,10 +513,11 @@ LogicalResult CIRABIRewriteContext::rewriteCallSite(
   Type callRetTy = origRetTy;
   if (fc.returnInfo.kind == ArgKind::Ignore && hasResult)
     callRetTy = cir::VoidType::get(ctx);
-  if (fc.returnInfo.kind == ArgKind::Direct && fc.returnInfo.coercedType)
-    return call.emitOpError() << "Direct return with coerced type at "
-                              << "call-site not yet implemented in "
-                              << "CallConvLowering";
+  bool returnNeedsCoercion =
+      hasResult && fc.returnInfo.kind == ArgKind::Direct &&
+      fc.returnInfo.coercedType && fc.returnInfo.coercedType != origRetTy;
+  if (returnNeedsCoercion)
+    callRetTy = fc.returnInfo.coercedType;
 
   builder.setInsertionPoint(call);
   auto newCall = cir::CallOp::create(builder, call.getLoc(),
@@ -358,6 +526,15 @@ LogicalResult CIRABIRewriteContext::rewriteCallSite(
     if (!newCall->hasAttr(attr.getName()))
       newCall->setAttr(attr.getName(), attr.getValue());
 
+  // Direct return with coercion: the new call returns the coerced type;
+  // emit a coercion back to the original type for the call's existing uses.
+  if (returnNeedsCoercion) {
+    builder.setInsertionPointAfter(newCall);
+    Value coercedBack = emitCoercion(builder, call.getLoc(), origRetTy,
+                                     newCall.getResult(), enclosingFunc, dl);
+    call.getResult().replaceAllUsesWith(coercedBack);
+  }
+
   // Layer llvm.signext / llvm.zeroext onto the new call's arg_attrs and
   // res_attrs for Extend args/return.  Ignore args also require a rebuild
   // because their slots are dropped from the output array.
@@ -384,7 +561,8 @@ LogicalResult CIRABIRewriteContext::rewriteCallSite(
       Value poison = createIgnoredValue(builder, call.getLoc(), origRetTy);
       call.getResult().replaceAllUsesWith(poison);
     }
-  } else if (hasResult) {
+  } else if (hasResult && !returnNeedsCoercion) {
+    // returnNeedsCoercion already wired up the coerced result above.
     call.getResult().replaceAllUsesWith(newCall.getResult());
   }
 
diff --git a/clang/lib/CIR/Dialect/Transforms/TargetLowering/CIRABIRewriteContext.h b/clang/lib/CIR/Dialect/Transforms/TargetLowering/CIRABIRewriteContext.h
index 7a0c0b8a2f22c..038e81026784c 100644
--- a/clang/lib/CIR/Dialect/Transforms/TargetLowering/CIRABIRewriteContext.h
+++ b/clang/lib/CIR/Dialect/Transforms/TargetLowering/CIRABIRewriteContext.h
@@ -11,9 +11,10 @@
 // rewrites a cir.func signature, the function body, and call sites to match
 // the ABI-lowered shape.
 //
-// This file currently handles only Direct (pass-through) and Ignore.  Other
-// ArgKind handlers (Extend, Direct-with-coercion, Indirect, Expand) are
-// added by subsequent PRs in the calling-convention-lowering split series.
+// This file currently handles Direct (pass-through and coerce-in-registers),
+// Extend, and Ignore.  The remaining ArgKind handlers (Indirect, Expand)
+// are added by subsequent PRs in the calling-convention-lowering split
+// series.
 //
 //===----------------------------------------------------------------------===//
 
@@ -22,6 +23,7 @@
 
 #include "mlir/ABI/ABIRewriteContext.h"
 #include "mlir/IR/BuiltinOps.h"
+#include "mlir/Interfaces/DataLayoutInterfaces.h"
 #include "clang/CIR/Dialect/IR/CIRDialect.h"
 
 namespace cir {
@@ -31,9 +33,13 @@ namespace cir {
 /// The driver pass (CallConvLoweringPass) computes a FunctionClassification
 /// for each cir.func / cir.call and dispatches to this class to perform the
 /// actual IR rewriting using cir dialect operations.
+///
+/// Holds a reference to the module's DataLayout for coercion alignment
+/// queries.  The DataLayout must outlive the rewrite context.
 class CIRABIRewriteContext : public mlir::abi::ABIRewriteContext {
 public:
-  explicit CIRABIRewriteContext(mlir::ModuleOp module) : module(module) {}
+  CIRABIRewriteContext(mlir::ModuleOp module, const mlir::DataLayout &dl)
+      : module(module), dl(dl) {}
 
   mlir::LogicalResult
   rewriteFunctionDefinition(mlir::FunctionOpInterface funcOp,
@@ -49,6 +55,7 @@ class CIRABIRewriteContext : public mlir::abi::ABIRewriteContext {
 
 private:
   mlir::ModuleOp module;
+  const mlir::DataLayout &dl;
 };
 
 } // namespace cir
diff --git a/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp b/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
index 8c7e1406d6567..616e7347fff06 100644
--- a/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
+++ b/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
@@ -1667,6 +1667,45 @@ mlir::LogicalResult CIRToLLVMReturnOpLowering::matchAndRewrite(
   return mlir::LogicalResult::success();
 }
 
+mlir::LogicalResult CIRToLLVMReinterpretCastOpLowering::matchAndRewrite(
+    cir::ReinterpretCastOp op, OpAdaptor adaptor,
+    mlir::ConversionPatternRewriter &rewriter) const {
+  // After type conversion, source and destination LLVM types may be:
+  //   (a) Identical: trivially replace uses with the source value (the
+  //       op was a CIR-level type rename only; LLVM sees no change).
+  //   (b) Same scalar / vector category, same bit width: emit
+  //       LLVM::BitcastOp.
+  //   (c) Aggregate vs scalar / aggregate vs vector: LLVM::BitcastOp
+  //       does not allow aggregate types.  We currently emit an error
+  //       directing the producer to go through memory.  A future patch
+  //       will add an extract/insert lowering for the aggregate case so
+  //       the LLVM IR avoids the memory roundtrip too.
+  mlir::Type llvmDstTy = getTypeConverter()->convertType(op.getType());
+  mlir::Value llvmSrc = adaptor.getSrc();
+  mlir::Type llvmSrcTy = llvmSrc.getType();
+
+  if (llvmSrcTy == llvmDstTy) {
+    rewriter.replaceOp(op, llvmSrc);
+    return mlir::success();
+  }
+
+  bool srcIsAggregate =
+      mlir::isa<mlir::LLVM::LLVMStructType, mlir::LLVM::LLVMArrayType>(
+          llvmSrcTy);
+  bool dstIsAggregate =
+      mlir::isa<mlir::LLVM::LLVMStructType, mlir::LLVM::LLVMArrayType>(
+          llvmDstTy);
+  if (srcIsAggregate || dstIsAggregate)
+    return op.emitOpError()
+           << "lowering cir.reinterpret_cast to LLVM with aggregate type "
+           << "not yet implemented; producer should fall back to memory "
+           << "coercion until a follow-up patch adds extract/insert "
+           << "lowering";
+
+  rewriter.replaceOpWithNewOp<mlir::LLVM::BitcastOp>(op, llvmDstTy, llvmSrc);
+  return mlir::success();
+}
+
 mlir::LogicalResult CIRToLLVMRotateOpLowering::matchAndRewrite(
     cir::RotateOp op, OpAdaptor adaptor,
     mlir::ConversionPatternRewriter &rewriter) const {
diff --git a/clang/test/CIR/IR/reinterpret-cast.cir b/clang/test/CIR/IR/reinterpret-cast.cir
new file mode 100644
index 0000000000000..94742e15cda42
--- /dev/null
+++ b/clang/test/CIR/IR/reinterpret-cast.cir
@@ -0,0 +1,28 @@
+// RUN: cir-opt %s --verify-roundtrip | FileCheck %s
+
+!s32i = !cir.int<s, 32>
+
+module {
+  // Vector ↔ complex same-bit-width reinterpret (the canonical use case
+  // from cir-call-conv-lowering's coerce-in-registers path).
+  cir.func @vec_to_complex(%v : !cir.vector<2 x !cir.float>)
+      -> !cir.complex<!cir.float> {
+    %c = cir.reinterpret_cast %v
+       : !cir.vector<2 x !cir.float> -> !cir.complex<!cir.float>
+    cir.return %c : !cir.complex<!cir.float>
+  }
+
+  // Reverse direction.
+  cir.func @complex_to_vec(%c : !cir.complex<!cir.float>)
+      -> !cir.vector<2 x !cir.float> {
+    %v = cir.reinterpret_cast %c
+       : !cir.complex<!cir.float> -> !cir.vector<2 x !cir.float>
+    cir.return %v : !cir.vector<2 x !cir.float>
+  }
+}
+
+// CHECK:      cir.func{{.*}} @vec_to_complex
+// CHECK:        cir.reinterpret_cast %{{.*}} : !cir.vector<2 x !cir.float> -> !cir.complex<!cir.float>
+
+// CHECK:      cir.func{{.*}} @complex_to_vec
+// CHECK:        cir.reinterpret_cast %{{.*}} : !cir.complex<!cir.float> -> !cir.vector<2 x !cir.float>
diff --git a/clang/test/CIR/Transforms/abi-lowering/coerce-int-to-record.cir b/clang/test/CIR/Transforms/abi-lowering/coerce-int-to-record.cir
new file mode 100644
index 0000000000000..f90427bf68b4c
--- /dev/null
+++ b/clang/test/CIR/Transforms/abi-lowering/coerce-int-to-record.cir
@@ -0,0 +1,59 @@
+// Direct return with coerced type going from a small record to a same-bit-
+// width integer.  Mirror of coerce-record-to-int.cir but exercising the
+// return-side coercion code path: every cir.return gets the original
+// record value coerced to the integer type before being returned.
+// RUN: cir-opt %s -cir-call-conv-lowering="classification-attr=test_classify" \
+// RUN:   | FileCheck %s
+
+!s32i = !cir.int<s, 32>
+!s64i = !cir.int<s, 64>
+!rec_Pair = !cir.record<struct "Pair" {!s32i, !s32i}>
+
+#coerce_pair_return_to_i64 = {
+  return = { kind = "direct", coerced_type = !s64i },
+  args   = [ ]
+}
+
+#all_direct_no_args = {
+  return = { kind = "direct" },
+  args   = [ ]
+}
+
+module attributes {
+  dlti.dl_spec = #dlti.dl_spec<
+    #dlti.dl_entry<i32, dense<32>: vector<2xi64>>,
+    #dlti.dl_entry<i64, dense<64>: vector<2xi64>>>
+} {
+
+  cir.func @returns_pair() -> !rec_Pair
+      attributes { test_classify = #coerce_pair_return_to_i64 } {
+    %0 = cir.const #cir.zero : !rec_Pair
+    cir.return %0 : !rec_Pair
+  }
+
+  // Signature changes to !s64i return; the cir.return's record operand
+  // gets coerced via memory roundtrip before being returned.  The alloca
+  // is hoisted to the entry-block start (Andy's review comment #3 from the
+  // original PR) so it sits ahead of the const that produces the value.
+  // CHECK:      cir.func{{.*}} @returns_pair() -> !s64i
+  // CHECK:        %[[SLOT:.*]] = cir.alloca !rec_Pair, !cir.ptr<!rec_Pair>, ["coerce"]
+  // CHECK:        %[[VAL:.*]] = cir.const #cir.zero : !rec_Pair
+  // CHECK:        cir.store %[[VAL]], %[[SLOT]] : !rec_Pair, !cir.ptr<!rec_Pair>
+  // CHECK:        %[[CAST:.*]] = cir.cast bitcast %[[SLOT]] : !cir.ptr<!rec_Pair> -> !cir.ptr<!s64i>
+  // CHECK:        %[[COERCED:.*]] = cir.load %[[CAST]] : !cir.ptr<!s64i>, !s64i
+  // CHECK:        cir.return %[[COERCED]] : !s64i
+
+  cir.func @caller() -> !rec_Pair
+      attributes { test_classify = #coerce_pair_return_to_i64 } {
+    %0 = cir.call @returns_pair() : () -> !rec_Pair
+    cir.return %0 : !rec_Pair
+  }
+
+  // At the call site the lowered call returns !s64i; the rewriter coerces
+  // it back to !rec_Pair for downstream uses (the caller's own return
+  // also needs the coerce-back-then-coerce-forward chain since caller's
+  // return is also Direct-with-coerce).
+  // CHECK:      cir.func{{.*}} @caller() -> !s64i
+  // CHECK:        %{{.*}} = cir.call @returns_pair() : () -> !s64i
+
+}
diff --git a/clang/test/CIR/Transforms/abi-lowering/coerce-record-to-int.cir b/clang/test/CIR/Transforms/abi-lowering/coerce-record-to-int.cir
new file mode 100644
index 0000000000000..f31f09181710e
--- /dev/null
+++ b/clang/test/CIR/Transforms/abi-lowering/coerce-record-to-int.cir
@@ -0,0 +1,50 @@
+// Direct with coerced type going from a small record to a same-bit-width
+// integer.  The shapes don't match (record vs scalar) so the rewriter
+// emits a memory roundtrip: alloca in the entry block + store + ptr-cast +
+// load.
+// RUN: cir-opt %s -cir-call-conv-lowering="classification-attr=test_classify" \
+// RUN:   | FileCheck %s
+
+!s32i = !cir.int<s, 32>
+!s64i = !cir.int<s, 64>
+!rec_Pair = !cir.record<struct "Pair" {!s32i, !s32i}>
+
+#coerce_pair_to_i64 = {
+  return = { kind = "direct" },
+  args   = [ { kind = "direct", coerced_type = !s64i } ]
+}
+
+module attributes {
+  dlti.dl_spec = #dlti.dl_spec<
+    #dlti.dl_entry<i32, dense<32>: vector<2xi64>>,
+    #dlti.dl_entry<i64, dense<64>: vector<2xi64>>>
+} {
+
+  cir.func @takes_pair(%arg0: !rec_Pair)
+      attributes { test_classify = #coerce_pair_to_i64 } {
+    cir.return
+  }
+
+  // Signature changes to !s64i; entry block grows an alloca + store + cast
+  // + load chain that recovers the original !rec_Pair value.  The alloca
+  // lands at the very start of the entry block so this composes correctly
+  // with cir-hoist-allocas regardless of pipeline ordering.
+  // CHECK:      cir.func{{.*}} @takes_pair(%[[ARG:.*]]: !s64i)
+  // CHECK:        %[[SLOT:.*]] = cir.alloca !s64i, !cir.ptr<!s64i>, ["coerce"]
+  // CHECK:        cir.store %[[ARG]], %[[SLOT]] : !s64i, !cir.ptr<!s64i>
+  // CHECK:        %[[CAST:.*]] = cir.cast bitcast %[[SLOT]] : !cir.ptr<!s64i> -> !cir.ptr<!rec_Pair>
+  // CHECK:        %{{.*}} = cir.load %[[CAST]] : !cir.ptr<!rec_Pair>, !rec_Pair
+
+  cir.func @caller(%arg0: !rec_Pair)
+      attributes { test_classify = #coerce_pair_to_i64 } {
+    cir.call @takes_pair(%arg0) : (!rec_Pair) -> ()
+    cir.return
+  }
+
+  // At the call site, the original !rec_Pair gets coerced to !s64i via the
+  // same memory roundtrip before being passed.  Caller's own arg coercion
+  // chain runs first (it shares the pattern), then the call.
+  // CHECK:      cir.func{{.*}} @caller(%[[ARG:.*]]: !s64i)
+  // CHECK:        cir.call @takes_pair(%{{.*}}) : (!s64i) -> ()
+
+}
diff --git a/clang/test/CIR/Transforms/abi-lowering/coerce-record-to-record-via-memory.cir b/clang/test/CIR/Transforms/abi-lowering/coerce-record-to-record-via-memory.cir
new file mode 100644
index 0000000000000..1669bf1232d28
--- /dev/null
+++ b/clang/test/CIR/Transforms/abi-lowering/coerce-record-to-record-via-memory.cir
@@ -0,0 +1,34 @@
+// Direct with a coerced type that's a different record (record-to-record):
+// neither side is a vector and at least one is a record, so the rewriter
+// uses the memory-roundtrip path even though both types are aggregates.
+// RUN: cir-opt %s -cir-call-conv-lowering="classification-attr=test_classify" \
+// RUN:   | FileCheck %s
+
+!s32i = !cir.int<s, 32>
+!s64i = !cir.int<s, 64>
+!rec_Pair  = !cir.record<struct "Pair"  {!s32i, !s32i}>
+!rec_Single = !cir.record<struct "Single" {!s64i}>
+
+#coerce_pair_to_single = {
+  return = { kind = "direct" },
+  args   = [ { kind = "direct", coerced_type = !rec_Single } ]
+}
+
+module attributes {
+  dlti.dl_spec = #dlti.dl_spec<
+    #dlti.dl_entry<i32, dense<32>: vector<2xi64>>,
+    #dlti.dl_entry<i64, dense<64>: vector<2xi64>>>
+} {
+
+  cir.func @takes_pair(%arg0: !rec_Pair)
+      attributes { test_classify = #coerce_pair_to_single } {
+    cir.return
+  }
+
+  // CHECK: cir.func{{.*}} @takes_pair(%[[ARG:.*]]: !rec_Single)
+  // CHECK:   %[[SLOT:.*]] = cir.alloca !rec_Single, !cir.ptr<!rec_Single>, ["coerce"]
+  // CHECK:   cir.store %[[ARG]], %[[SLOT]] : !rec_Single, !cir.ptr<!rec_Single>
+  // CHECK:   %[[CAST:.*]] = cir.cast bitcast %[[SLOT]] : !cir.ptr<!rec_Single> -> !cir.ptr<!rec_Pair>
+  // CHECK:   %{{.*}} = cir.load %[[CAST]] : !cir.ptr<!rec_Pair>, !rec_Pair
+
+}
diff --git a/clang/test/CIR/Transforms/abi-lowering/coerce-vector-to-complex-reinterpret.cir b/clang/test/CIR/Transforms/abi-lowering/coerce-vector-to-complex-reinterpret.cir
new file mode 100644
index 0000000000000..ceb1f9e364466
--- /dev/null
+++ b/clang/test/CIR/Transforms/abi-lowering/coerce-vector-to-complex-reinterpret.cir
@@ -0,0 +1,42 @@
+// Direct with coerced type that differs from the original only in
+// vector-vs-non-vector shape (same total bit width, neither side a record):
+// the rewriter emits cir.reinterpret_cast instead of going through memory.
+// RUN: cir-opt %s -cir-call-conv-lowering="classification-attr=test_classify" \
+// RUN:   | FileCheck %s
+
+#coerce_complex_to_vec2 = {
+  return = { kind = "direct" },
+  args   = [ { kind = "direct",
+               coerced_type = !cir.vector<2 x !cir.float> } ]
+}
+
+module attributes {
+  dlti.dl_spec = #dlti.dl_spec<
+    #dlti.dl_entry<f32, dense<32>: vector<2xi64>>>
+} {
+
+  cir.func @takes_complex(%arg0: !cir.complex<!cir.float>)
+      attributes { test_classify = #coerce_complex_to_vec2 } {
+    cir.return
+  }
+
+  // The signature changes to the coerced (vector) type; the body still
+  // expects the complex, so a reinterpret_cast lands at function entry to
+  // adapt the new block argument back to the original type.
+  // CHECK: cir.func{{.*}} @takes_complex(%[[ARG:.*]]: !cir.vector<2 x !cir.float>)
+  // CHECK:   %{{.*}} = cir.reinterpret_cast %[[ARG]] : !cir.vector<2 x !cir.float> -> !cir.complex<!cir.float>
+
+  cir.func @caller(%arg0: !cir.complex<!cir.float>)
+      attributes { test_classify = #coerce_complex_to_vec2 } {
+    cir.call @takes_complex(%arg0) : (!cir.complex<!cir.float>) -> ()
+    cir.return
+  }
+
+  // At the call site the rewriter coerces the original (complex) value to
+  // the vector type before passing it through.
+  // CHECK: cir.func{{.*}} @caller(%[[ARG:.*]]: !cir.vector<2 x !cir.float>)
+  // CHECK:   %[[COMPLEX:.*]] = cir.reinterpret_cast %[[ARG]] : !cir.vector<2 x !cir.float> -> !cir.complex<!cir.float>
+  // CHECK:   %[[COERCED:.*]] = cir.reinterpret_cast %[[COMPLEX]] : !cir.complex<!cir.float> -> !cir.vector<2 x !cir.float>
+  // CHECK:   cir.call @takes_complex(%[[COERCED]]) : (!cir.vector<2 x !cir.float>) -> ()
+
+}

>From 43edb395d1d07109bfd4080ed78ebc5443454c0f Mon Sep 17 00:00:00 2001
From: Adam Smith <[email protected]>
Date: Mon, 1 Jun 2026 11:07:19 -0700
Subject: [PATCH 2/3] [CIR] Coerce Direct args and returns through memory

CallConvLowering needs to bridge the gap between a function's CIR
signature and the ABI-coerced wire types for Direct-classified
arguments and returns: the wire signature uses the coerced type while
the function body and call sites still operate on the original type.

An earlier approach introduced a cir.reinterpret_cast op for the
same-bit-width cases, on the premise that types like
!cir.vector<2 x !cir.float> and !cir.complex<!cir.float> share an
in-register representation.  They do not: complex lowers to an LLVM
struct { float, float } and the vector to <2 x float>, so the op's
own LLVM lowering errored on the aggregate case and the documented
use was unreachable.  Classic CodeGen coerces these through memory
(CGCall.cpp CreateCoercedLoad/CreateCoercedStore), as does the CIR
incubator.

Remove the op and coerce everything the same way: allocate a slot of
the source type at the start of the entry block, store the value,
bitcast the pointer to the destination type, and load it back.  The
entry-block placement composes with HoistAllocas regardless of pass
ordering, and the alloca takes max(srcAlign, dstAlign) to satisfy both
accesses.  Scalars, vectors, and records all share this path, so the
previously special-cased same-size scalar and vector/complex pairs need
no distinct handling.
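
For readers less familiar with the idiom, the store/bitcast/load sequence
described above behaves roughly like the following C++ sketch.  This is an
illustration only, not code from the patch: `Pair`, `Wire`,
`coerceThroughMemory`, and `roundTripsPair` are hypothetical names standing in
for !rec_Pair, !s64i, and the pass's coercion helper, and the sketch assumes
same-size, trivially copyable types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-ins for !rec_Pair and its coerced wire type !s64i.
struct Pair {
  std::int32_t first;
  std::int32_t second;
};
using Wire = std::int64_t;

// Illustrative helper (not part of the patch): coerce src to Dst through a
// temporary slot, mirroring alloca + store + ptr-cast + load.
template <typename Dst, typename Src>
Dst coerceThroughMemory(const Src &src) {
  static_assert(sizeof(Dst) == sizeof(Src),
                "sketch covers same-size coercion only");
  static_assert(std::is_trivially_copyable<Src>::value &&
                    std::is_trivially_copyable<Dst>::value,
                "sketch covers trivially copyable types only");
  // The slot takes max(srcAlign, dstAlign) so both accesses are satisfied.
  constexpr std::size_t slotAlign = std::max(alignof(Src), alignof(Dst));
  alignas(slotAlign) unsigned char slot[sizeof(Src)];
  std::memcpy(slot, &src, sizeof(Src)); // store the source value
  Dst dst;
  std::memcpy(&dst, slot, sizeof(Dst)); // load it back as the destination type
  return dst;
}

// Caller-side coercion to the wire type, then callee-side recovery.
bool roundTripsPair(Pair p) {
  Wire wire = coerceThroughMemory<Wire>(p);
  Pair back = coerceThroughMemory<Pair>(wire);
  return back.first == p.first && back.second == p.second;
}
```

The round trip in `roundTripsPair` corresponds to a call site coercing a
record to the wire type and the callee's entry block recovering the original
type from the block argument.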

cir.store gains a builder taking just the value and address so the
coercion code does not have to spell out the four optional attributes;
cir.load already infers its result type from the address pointee.
---
 clang/include/clang/CIR/Dialect/IR/CIROps.td  | 58 +++-------------
 clang/lib/CIR/Dialect/IR/CIRDialect.cpp       | 22 ------
 .../TargetLowering/CIRABIRewriteContext.cpp   | 69 ++++++-------------
 .../CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp | 39 -----------
 clang/test/CIR/IR/reinterpret-cast.cir        | 28 --------
 .../abi-lowering/coerce-int-to-record.cir     | 34 +++++----
 .../abi-lowering/coerce-record-to-int.cir     | 42 ++++++-----
 .../coerce-record-to-record-via-memory.cir    | 51 ++++++++++----
 .../coerce-vector-to-complex-reinterpret.cir  | 42 -----------
 .../abi-lowering/coerce-vector-to-complex.cir | 42 +++++++++++
 10 files changed, 150 insertions(+), 277 deletions(-)
 delete mode 100644 clang/test/CIR/IR/reinterpret-cast.cir
 delete mode 100644 clang/test/CIR/Transforms/abi-lowering/coerce-vector-to-complex-reinterpret.cir
 create mode 100644 clang/test/CIR/Transforms/abi-lowering/coerce-vector-to-complex.cir

diff --git a/clang/include/clang/CIR/Dialect/IR/CIROps.td b/clang/include/clang/CIR/Dialect/IR/CIROps.td
index 89391da873a73..c4d08d5337031 100644
--- a/clang/include/clang/CIR/Dialect/IR/CIROps.td
+++ b/clang/include/clang/CIR/Dialect/IR/CIROps.td
@@ -288,54 +288,6 @@ def CIR_CastOp : CIR_Op<"cast", [
 
 }
 
-//===----------------------------------------------------------------------===//
-// ReinterpretCastOp
-//===----------------------------------------------------------------------===//
-
-def CIR_ReinterpretCastOp : CIR_Op<"reinterpret_cast", [Pure]> {
-  let summary = "Reinterpret a value as a different same-bit-width type";
-  let description = [{
-    The `cir.reinterpret_cast` operation reinterprets the bits of its source
-    value as a different type, with no IR-level cost.  It is used by the
-    calling-convention lowering pass to coerce between same-bit-width types
-    that have an LLVM-IR-level shape mismatch but identical in-register
-    representation -- for example, between `!cir.vector<2 x !cir.float>` and
-    `!cir.complex<!cir.float>`, both of which lower to the same LLVM IR
-    representation but have distinct CIR types.
-
-    Unlike `cir.cast bitcast`, which is overloaded for pointer-to-pointer
-    bitcasts and several other use cases, `cir.reinterpret_cast` is reserved
-    for in-register value reinterpretation only.  The result type must
-    differ from the source type; otherwise the op is meaningless and the
-    folder removes it.
-
-    **Invariant** (not currently enforced by the verifier): the source and
-    destination types must have the same bit width per the module's
-    DataLayout, and they must use the same in-register lane order on the
-    target.  Producers (e.g. CallConvLowering's coerce-in-registers path)
-    are responsible for ensuring this; a follow-up patch will move the
-    bit-width check into the verifier once the design question of
-    DataLayout-aware op verifiers is resolved.
-
-    Example:
-
-    ```
-    %c = cir.reinterpret_cast %v
-       : !cir.vector<2 x !cir.float> -> !cir.complex<!cir.float>
-    ```
-  }];
-
-  let arguments = (ins CIR_AnyType:$src);
-  let results = (outs CIR_AnyType:$result);
-
-  let assemblyFormat = [{
-    $src `:` type($src) `->` type($result) attr-dict
-  }];
-
-  let hasVerifier = 1;
-  let hasFolder = 1;
-}
-
 //===----------------------------------------------------------------------===//
 // DynamicCastOp
 //===----------------------------------------------------------------------===//
@@ -868,6 +820,16 @@ def CIR_StoreOp : CIR_Op<"store", [
     $value `,` $addr attr-dict `:` type($value) `,` qualified(type($addr))
   }];
 
+  let builders = [
+    // Non-volatile, non-atomic store with default alignment.
+    OpBuilder<(ins "mlir::Value":$value, "mlir::Value":$addr), [{
+      build($_builder, $_state, value, addr, /*is_volatile=*/mlir::UnitAttr(),
+            /*alignment=*/mlir::IntegerAttr(),
+            /*sync_scope=*/cir::SyncScopeKindAttr(),
+            /*mem_order=*/cir::MemOrderAttr());
+    }]>
+  ];
+
   // FIXME: add verifier.
 }
 
diff --git a/clang/lib/CIR/Dialect/IR/CIRDialect.cpp b/clang/lib/CIR/Dialect/IR/CIRDialect.cpp
index fe390720bd047..cf07fc4f0833a 100644
--- a/clang/lib/CIR/Dialect/IR/CIRDialect.cpp
+++ b/clang/lib/CIR/Dialect/IR/CIRDialect.cpp
@@ -915,28 +915,6 @@ static Value tryFoldCastChain(cir::CastOp op) {
   return {};
 }
 
-//===----------------------------------------------------------------------===//
-// ReinterpretCastOp
-//===----------------------------------------------------------------------===//
-
-LogicalResult cir::ReinterpretCastOp::verify() {
-  // The op is meaningless for identical types -- the folder is the right
-  // way to remove it -- but we accept it at the verifier level so that
-  // peephole code (e.g. pattern rewriters that round-trip values) doesn't
-  // need a type-equality guard.  Producers should still avoid emitting
-  // it for matching types.
-  //
-  // The same-bit-width invariant is documented on the op but not yet
-  // checked here; see the op description for the rationale.
-  return success();
-}
-
-OpFoldResult cir::ReinterpretCastOp::fold(FoldAdaptor adaptor) {
-  if (getSrc().getType() == getType())
-    return getSrc();
-  return {};
-}
-
 OpFoldResult cir::CastOp::fold(FoldAdaptor adaptor) {
   if (mlir::isa_and_present<cir::PoisonAttr>(adaptor.getSrc())) {
     // Propagate poison value
diff --git a/clang/lib/CIR/Dialect/Transforms/TargetLowering/CIRABIRewriteContext.cpp b/clang/lib/CIR/Dialect/Transforms/TargetLowering/CIRABIRewriteContext.cpp
index 29ac1d371cd64..517994dea9d4a 100644
--- a/clang/lib/CIR/Dialect/Transforms/TargetLowering/CIRABIRewriteContext.cpp
+++ b/clang/lib/CIR/Dialect/Transforms/TargetLowering/CIRABIRewriteContext.cpp
@@ -56,8 +56,8 @@ LogicalResult buildNewArgTypes(ArrayRef<Type> oldArgTypes,
     case ArgKind::Direct:
       // Direct with a coerced type means the wire signature uses the
       // coerced type; the body still expects origTy and we'll insert a
-      // reinterpret/coercion at the entry block.  Direct without a
-      // coerced type is a true pass-through.
+      // coercion at the entry block.  Direct without a coerced type is a
+      // true pass-through.
       newArgTypes.push_back(ac.coercedType ? ac.coercedType : origTy);
       break;
     case ArgKind::Ignore:
@@ -174,17 +174,13 @@ ArrayAttr updateResAttrs(MLIRContext *ctx, ArrayAttr existingResAttrs,
 
 /// Coerce \p src to type \p dstTy at the current builder insertion point.
 ///
-/// Three strategies, in order of preference:
-///   - If src and dst are the same type, return src unchanged and leave
-///     \p createdOps empty.
-///   - If both are non-aggregate same-bit-width values that just differ in
-///     vector-vs-scalar shape (e.g. !cir.vector<2 x !cir.float> ↔
-///     !cir.complex<!cir.float>), use cir.reinterpret_cast which is free at
-///     the IR level.
-///   - Otherwise go through memory: allocate a slot of the source type
-///     (using max(srcAlign, dstAlign) for the alloca alignment), store
-///     the source, bitcast the pointer to the destination type, load the
-///     destination type back.
+/// If src and dst are the same type, returns src unchanged and leaves
+/// \p createdOps empty.  Otherwise coerces through memory: allocate a slot
+/// of the source type (using max(srcAlign, dstAlign) for the alloca
+/// alignment), store the source, bitcast the pointer to the destination
+/// type, and load the destination type back.  This mirrors classic
+/// CodeGen's coerce-through-memory behavior for ABI argument and return
+/// coercion and lowers uniformly for scalar, vector, and record types.
 ///
 /// The temporary alloca is placed at the start of the enclosing function's
 /// entry block so that it composes correctly with the HoistAllocas pass
@@ -200,24 +196,10 @@ Value emitCoercion(OpBuilder &rewriter, Location loc, Type dstTy, Value src,
   if (srcTy == dstTy)
     return src;
 
-  // Reinterpret path: same total bit width, neither side is a record, and
-  // the shapes differ only in vector-vs-non-vector.  Going through memory
-  // is wasteful for these — they have the same in-register representation.
-  bool isAggregate = isa<cir::RecordType>(srcTy) || isa<cir::RecordType>(dstTy);
-  bool vectorMismatch =
-      isa<cir::VectorType>(srcTy) != isa<cir::VectorType>(dstTy);
-  if (!isAggregate && vectorMismatch &&
-      dl.getTypeSizeInBits(srcTy) == dl.getTypeSizeInBits(dstTy)) {
-    auto reinterpret =
-        cir::ReinterpretCastOp::create(rewriter, loc, dstTy, src);
-    createdOps.insert(reinterpret);
-    return reinterpret;
-  }
-
-  // Memory path: alloca + store + ptr-cast + load.  The alloca goes in the
-  // entry block (Andy's review comment #3 on the original PR), with
-  // alignment = max(srcAlign, dstAlign) to satisfy both the store and the
-  // load (review comment #1).
+  // Coerce through memory: alloca + store + ptr-cast + load.  The alloca
+  // goes at the start of the entry block so it composes with the
+  // HoistAllocas pass, with alignment = max(srcAlign, dstAlign) to satisfy
+  // both the store and the load.
   uint64_t srcAlign = dl.getTypeABIAlignment(srcTy);
   uint64_t dstAlign = dl.getTypeABIAlignment(dstTy);
   uint64_t allocaAlign = std::max(srcAlign, dstAlign);
@@ -236,23 +218,14 @@ Value emitCoercion(OpBuilder &rewriter, Location loc, Type dstTy, Value src,
   }
   createdOps.insert(alloca);
 
-  auto store = cir::StoreOp::create(rewriter, loc, src, alloca,
-                                    /*isVolatile=*/UnitAttr(),
-                                    /*alignment=*/IntegerAttr(),
-                                    /*sync_scope=*/cir::SyncScopeKindAttr(),
-                                    /*mem_order=*/cir::MemOrderAttr());
+  auto store = cir::StoreOp::create(rewriter, loc, src, alloca);
   createdOps.insert(store);
 
   auto ptrCast = cir::CastOp::create(rewriter, loc, dstPtrTy,
                                      cir::CastKind::bitcast, alloca);
   createdOps.insert(ptrCast);
 
-  auto load = cir::LoadOp::create(rewriter, loc, dstTy, ptrCast,
-                                  /*isDeref=*/UnitAttr(),
-                                  /*isVolatile=*/UnitAttr(),
-                                  /*alignment=*/IntegerAttr(),
-                                  /*sync_scope=*/cir::SyncScopeKindAttr(),
-                                  /*mem_order=*/cir::MemOrderAttr());
+  auto load = cir::LoadOp::create(rewriter, loc, ptrCast.getResult());
   createdOps.insert(load);
   return load;
 }
@@ -365,12 +338,12 @@ LogicalResult CIRABIRewriteContext::rewriteFunctionDefinition(
     Region &body = funcOp->getRegion(0);
     if (!body.empty()) {
       // In-body coercion for Direct-with-coerce / Extend args: change
-      // block-arg types to the coerced types and insert a
-      // cir.reinterpret_cast at the top of the entry block that converts
-      // each coerced value back to its original type, then route existing
-      // body uses (including in-body cir.call operands) through the cast.
-      // Done before the Ignore-drop below so the entry block argument
-      // indices used here still refer to the original positions.
+      // block-arg types to the coerced types and insert a memory roundtrip
+      // at the top of the entry block that converts each coerced value back
+      // to its original type, then route existing body uses (including
+      // in-body cir.call operands) through the recovered value.  Done before
+      // the Ignore-drop below so the entry block argument indices used here
+      // still refer to the original positions.
       insertArgCoercion(funcOp, fc, builder, dl);
 
       // Direct return with coerced type: insert a coercion at every
diff --git a/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp b/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
index 616e7347fff06..8c7e1406d6567 100644
--- a/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
+++ b/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
@@ -1667,45 +1667,6 @@ mlir::LogicalResult CIRToLLVMReturnOpLowering::matchAndRewrite(
   return mlir::LogicalResult::success();
 }
 
-mlir::LogicalResult CIRToLLVMReinterpretCastOpLowering::matchAndRewrite(
-    cir::ReinterpretCastOp op, OpAdaptor adaptor,
-    mlir::ConversionPatternRewriter &rewriter) const {
-  // After type conversion, source and destination LLVM types may be:
-  //   (a) Identical: trivially replace uses with the source value (the
-  //       op was a CIR-level type rename only; LLVM sees no change).
-  //   (b) Same scalar / vector category, same bit width: emit
-  //       LLVM::BitcastOp.
-  //   (c) Aggregate vs scalar / aggregate vs vector: LLVM::BitcastOp
-  //       does not allow aggregate types.  We currently emit an error
-  //       directing the producer to go through memory.  A future patch
-  //       will add an extract/insert lowering for the aggregate case so
-  //       the LLVM IR avoids the memory roundtrip too.
-  mlir::Type llvmDstTy = getTypeConverter()->convertType(op.getType());
-  mlir::Value llvmSrc = adaptor.getSrc();
-  mlir::Type llvmSrcTy = llvmSrc.getType();
-
-  if (llvmSrcTy == llvmDstTy) {
-    rewriter.replaceOp(op, llvmSrc);
-    return mlir::success();
-  }
-
-  bool srcIsAggregate =
-      mlir::isa<mlir::LLVM::LLVMStructType, mlir::LLVM::LLVMArrayType>(
-          llvmSrcTy);
-  bool dstIsAggregate =
-      mlir::isa<mlir::LLVM::LLVMStructType, mlir::LLVM::LLVMArrayType>(
-          llvmDstTy);
-  if (srcIsAggregate || dstIsAggregate)
-    return op.emitOpError()
-           << "lowering cir.reinterpret_cast to LLVM with aggregate type "
-           << "not yet implemented; producer should fall back to memory "
-           << "coercion until a follow-up patch adds extract/insert "
-           << "lowering";
-
-  rewriter.replaceOpWithNewOp<mlir::LLVM::BitcastOp>(op, llvmDstTy, llvmSrc);
-  return mlir::success();
-}
-
 mlir::LogicalResult CIRToLLVMRotateOpLowering::matchAndRewrite(
     cir::RotateOp op, OpAdaptor adaptor,
     mlir::ConversionPatternRewriter &rewriter) const {
diff --git a/clang/test/CIR/IR/reinterpret-cast.cir b/clang/test/CIR/IR/reinterpret-cast.cir
deleted file mode 100644
index 94742e15cda42..0000000000000
--- a/clang/test/CIR/IR/reinterpret-cast.cir
+++ /dev/null
@@ -1,28 +0,0 @@
-// RUN: cir-opt %s --verify-roundtrip | FileCheck %s
-
-!s32i = !cir.int<s, 32>
-
-module {
-  // Vector ↔ complex same-bit-width reinterpret (the canonical use case
-  // from cir-call-conv-lowering's coerce-in-registers path).
-  cir.func @vec_to_complex(%v : !cir.vector<2 x !cir.float>)
-      -> !cir.complex<!cir.float> {
-    %c = cir.reinterpret_cast %v
-       : !cir.vector<2 x !cir.float> -> !cir.complex<!cir.float>
-    cir.return %c : !cir.complex<!cir.float>
-  }
-
-  // Reverse direction.
-  cir.func @complex_to_vec(%c : !cir.complex<!cir.float>)
-      -> !cir.vector<2 x !cir.float> {
-    %v = cir.reinterpret_cast %c
-       : !cir.complex<!cir.float> -> !cir.vector<2 x !cir.float>
-    cir.return %v : !cir.vector<2 x !cir.float>
-  }
-}
-
-// CHECK:      cir.func{{.*}} @vec_to_complex
-// CHECK:        cir.reinterpret_cast %{{.*}} : !cir.vector<2 x !cir.float> -> !cir.complex<!cir.float>
-
-// CHECK:      cir.func{{.*}} @complex_to_vec
-// CHECK:        cir.reinterpret_cast %{{.*}} : !cir.complex<!cir.float> -> !cir.vector<2 x !cir.float>
diff --git a/clang/test/CIR/Transforms/abi-lowering/coerce-int-to-record.cir b/clang/test/CIR/Transforms/abi-lowering/coerce-int-to-record.cir
index f90427bf68b4c..f090f96e867e8 100644
--- a/clang/test/CIR/Transforms/abi-lowering/coerce-int-to-record.cir
+++ b/clang/test/CIR/Transforms/abi-lowering/coerce-int-to-record.cir
@@ -1,7 +1,3 @@
-// Direct return with coerced type going from a small record to a same-bit-
-// width integer.  Mirror of coerce-record-to-int.cir but exercising the
-// return-side coercion code path: every cir.return gets the original
-// record value coerced to the integer type before being returned.
 // RUN: cir-opt %s -cir-call-conv-lowering="classification-attr=test_classify" \
 // RUN:   | FileCheck %s
 
@@ -14,7 +10,7 @@
   args   = [ ]
 }
 
-#all_direct_no_args = {
+#caller_ret_i32 = {
   return = { kind = "direct" },
   args   = [ ]
 }
@@ -31,10 +27,7 @@ module attributes {
     cir.return %0 : !rec_Pair
   }
 
-  // Signature changes to !s64i return; the cir.return's record operand
-  // gets coerced via memory roundtrip before being returned.  The alloca
-  // is hoisted to the entry-block start (Andy's review comment #3 from the
-  // original PR) so it sits ahead of the const that produces the value.
+  // Record return operand coerced to !s64i through an entry-block memory slot.
   // CHECK:      cir.func{{.*}} @returns_pair() -> !s64i
   // CHECK:        %[[SLOT:.*]] = cir.alloca !rec_Pair, !cir.ptr<!rec_Pair>, ["coerce"]
   // CHECK:        %[[VAL:.*]] = cir.const #cir.zero : !rec_Pair
@@ -43,17 +36,22 @@ module attributes {
   // CHECK:        %[[COERCED:.*]] = cir.load %[[CAST]] : !cir.ptr<!s64i>, !s64i
   // CHECK:        cir.return %[[COERCED]] : !s64i
 
-  cir.func @caller() -> !rec_Pair
-      attributes { test_classify = #coerce_pair_return_to_i64 } {
+  cir.func @caller() -> !s32i
+      attributes { test_classify = #caller_ret_i32 } {
     %0 = cir.call @returns_pair() : () -> !rec_Pair
-    cir.return %0 : !rec_Pair
+    %1 = cir.alloca !rec_Pair, !cir.ptr<!rec_Pair>, ["r"] {alignment = 4 : i64}
+    cir.store %0, %1 : !rec_Pair, !cir.ptr<!rec_Pair>
+    %2 = cir.get_member %1[0] {name = "first"} : !cir.ptr<!rec_Pair> -> !cir.ptr<!s32i>
+    %3 = cir.load %2 : !cir.ptr<!s32i>, !s32i
+    cir.return %3 : !s32i
   }
 
-  // At the call site the lowered call returns !s64i; the rewriter coerces
-  // it back to !rec_Pair for downstream uses (the caller's own return
-  // also needs the coerce-back-then-coerce-forward chain since caller's
-  // return is also Direct-with-coerce).
-  // CHECK:      cir.func{{.*}} @caller() -> !s64i
-  // CHECK:        %{{.*}} = cir.call @returns_pair() : () -> !s64i
+  // The !s64i call result is coerced back to !rec_Pair through memory.
+  // CHECK:      cir.func{{.*}} @caller() -> !s32i
+  // CHECK:        %[[RET:.*]] = cir.call @returns_pair() : () -> !s64i
+  // CHECK:        cir.store %[[RET]], %{{.*}} : !s64i, !cir.ptr<!s64i>
+  // CHECK:        %[[RCAST:.*]] = cir.cast bitcast %{{.*}} : !cir.ptr<!s64i> -> !cir.ptr<!rec_Pair>
+  // CHECK:        %[[REC:.*]] = cir.load %[[RCAST]] : !cir.ptr<!rec_Pair>, !rec_Pair
+  // CHECK:        %[[MEMBER:.*]] = cir.get_member %{{.*}}[0] {name = "first"} : !cir.ptr<!rec_Pair> -> !cir.ptr<!s32i>
 
 }
diff --git a/clang/test/CIR/Transforms/abi-lowering/coerce-record-to-int.cir b/clang/test/CIR/Transforms/abi-lowering/coerce-record-to-int.cir
index f31f09181710e..19d7054bc6134 100644
--- a/clang/test/CIR/Transforms/abi-lowering/coerce-record-to-int.cir
+++ b/clang/test/CIR/Transforms/abi-lowering/coerce-record-to-int.cir
@@ -1,7 +1,3 @@
-// Direct with coerced type going from a small record to a same-bit-width
-// integer.  The shapes don't match (record vs scalar) so the rewriter
-// emits a memory roundtrip: alloca in the entry block + store + ptr-cast +
-// load.
 // RUN: cir-opt %s -cir-call-conv-lowering="classification-attr=test_classify" \
 // RUN:   | FileCheck %s
 
@@ -14,6 +10,11 @@
   args   = [ { kind = "direct", coerced_type = !s64i } ]
 }
 
+#caller_no_args = {
+  return = { kind = "direct" },
+  args   = [ ]
+}
+
 module attributes {
   dlti.dl_spec = #dlti.dl_spec<
     #dlti.dl_entry<i32, dense<32>: vector<2xi64>>,
@@ -22,29 +23,34 @@ module attributes {
 
   cir.func @takes_pair(%arg0: !rec_Pair)
       attributes { test_classify = #coerce_pair_to_i64 } {
+    %0 = cir.alloca !rec_Pair, !cir.ptr<!rec_Pair>, ["p"] {alignment = 4 : i64}
+    cir.store %arg0, %0 : !rec_Pair, !cir.ptr<!rec_Pair>
     cir.return
   }
 
-  // Signature changes to !s64i; entry block grows an alloca + store + cast
-  // + load chain that recovers the original !rec_Pair value.  The alloca
-  // lands at the very start of the entry block so this composes correctly
-  // with cir-hoist-allocas regardless of pipeline ordering.
+  // Entry-block memory roundtrip recovers !rec_Pair from the !s64i argument.
   // CHECK:      cir.func{{.*}} @takes_pair(%[[ARG:.*]]: !s64i)
   // CHECK:        %[[SLOT:.*]] = cir.alloca !s64i, !cir.ptr<!s64i>, ["coerce"]
   // CHECK:        cir.store %[[ARG]], %[[SLOT]] : !s64i, !cir.ptr<!s64i>
   // CHECK:        %[[CAST:.*]] = cir.cast bitcast %[[SLOT]] : !cir.ptr<!s64i> -> !cir.ptr<!rec_Pair>
-  // CHECK:        %{{.*}} = cir.load %[[CAST]] : !cir.ptr<!rec_Pair>, !rec_Pair
-
-  cir.func @caller(%arg0: !rec_Pair)
-      attributes { test_classify = #coerce_pair_to_i64 } {
-    cir.call @takes_pair(%arg0) : (!rec_Pair) -> ()
+  // CHECK:        %[[REC:.*]] = cir.load %[[CAST]] : !cir.ptr<!rec_Pair>, !rec_Pair
+  // CHECK:        %[[P:.*]] = cir.alloca !rec_Pair, !cir.ptr<!rec_Pair>, ["p"]
+  // CHECK:        cir.store %[[REC]], %[[P]] : !rec_Pair, !cir.ptr<!rec_Pair>
+
+  cir.func @caller()
+      attributes { test_classify = #caller_no_args } {
+    %0 = cir.alloca !rec_Pair, !cir.ptr<!rec_Pair>, ["p"] {alignment = 4 : i64}
+    %1 = cir.load %0 : !cir.ptr<!rec_Pair>, !rec_Pair
+    cir.call @takes_pair(%1) : (!rec_Pair) -> ()
     cir.return
   }
 
-  // At the call site, the original !rec_Pair gets coerced to !s64i via the
-  // same memory roundtrip before being passed.  Caller's own arg coercion
-  // chain runs first (it shares the pattern), then the call.
-  // CHECK:      cir.func{{.*}} @caller(%[[ARG:.*]]: !s64i)
-  // CHECK:        cir.call @takes_pair(%{{.*}}) : (!s64i) -> ()
+  // Call site coerces the !rec_Pair operand to !s64i through memory.
+  // CHECK:      cir.func{{.*}} @caller()
+  // CHECK:        %[[VAL:.*]] = cir.load %{{.*}} : !cir.ptr<!rec_Pair>, !rec_Pair
+  // CHECK:        cir.store %[[VAL]], %{{.*}} : !rec_Pair, !cir.ptr<!rec_Pair>
+  // CHECK:        %[[ARGCAST:.*]] = cir.cast bitcast %{{.*}} : !cir.ptr<!rec_Pair> -> !cir.ptr<!s64i>
+  // CHECK:        %[[ARGINT:.*]] = cir.load %[[ARGCAST]] : !cir.ptr<!s64i>, !s64i
+  // CHECK:        cir.call @takes_pair(%[[ARGINT]]) : (!s64i) -> ()
 
 }
diff --git a/clang/test/CIR/Transforms/abi-lowering/coerce-record-to-record-via-memory.cir b/clang/test/CIR/Transforms/abi-lowering/coerce-record-to-record-via-memory.cir
index 1669bf1232d28..7cd190a679ce8 100644
--- a/clang/test/CIR/Transforms/abi-lowering/coerce-record-to-record-via-memory.cir
+++ b/clang/test/CIR/Transforms/abi-lowering/coerce-record-to-record-via-memory.cir
@@ -1,17 +1,19 @@
-// Direct with a coerced type that's a different record (record-to-record):
-// neither side is a vector and at least one is a record, so the rewriter
-// uses the memory-roundtrip path even though both types are aggregates.
 // RUN: cir-opt %s -cir-call-conv-lowering="classification-attr=test_classify" \
 // RUN:   | FileCheck %s
 
 !s32i = !cir.int<s, 32>
 !s64i = !cir.int<s, 64>
-!rec_Pair  = !cir.record<struct "Pair"  {!s32i, !s32i}>
-!rec_Single = !cir.record<struct "Single" {!s64i}>
+!rec_Vec4   = !cir.record<struct "Vec4"   {!s32i, !s32i, !s32i, !s32i}>
+!rec_TwoI64 = !cir.record<struct "TwoI64" {!s64i, !s64i}>
 
-#coerce_pair_to_single = {
+#coerce_vec4_to_twoi64 = {
   return = { kind = "direct" },
-  args   = [ { kind = "direct", coerced_type = !rec_Single } ]
+  args   = [ { kind = "direct", coerced_type = !rec_TwoI64 } ]
+}
+
+#caller_no_args = {
+  return = { kind = "direct" },
+  args   = [ ]
 }
 
 module attributes {
@@ -20,15 +22,36 @@ module attributes {
     #dlti.dl_entry<i64, dense<64>: vector<2xi64>>>
 } {
 
-  cir.func @takes_pair(%arg0: !rec_Pair)
-      attributes { test_classify = #coerce_pair_to_single } {
+  cir.func @takes_vec(%arg0: !rec_Vec4)
+      attributes { test_classify = #coerce_vec4_to_twoi64 } {
+    %0 = cir.alloca !rec_Vec4, !cir.ptr<!rec_Vec4>, ["v"] {alignment = 4 : i64}
+    cir.store %arg0, %0 : !rec_Vec4, !cir.ptr<!rec_Vec4>
+    cir.return
+  }
+
+  // 16-byte struct coerced to a two-eightbyte record through memory.
+  // CHECK:      cir.func{{.*}} @takes_vec(%[[ARG:.*]]: !rec_TwoI64)
+  // CHECK:        %[[SLOT:.*]] = cir.alloca !rec_TwoI64, !cir.ptr<!rec_TwoI64>, ["coerce"]
+  // CHECK:        cir.store %[[ARG]], %[[SLOT]] : !rec_TwoI64, !cir.ptr<!rec_TwoI64>
+  // CHECK:        %[[CAST:.*]] = cir.cast bitcast %[[SLOT]] : !cir.ptr<!rec_TwoI64> -> !cir.ptr<!rec_Vec4>
+  // CHECK:        %[[REC:.*]] = cir.load %[[CAST]] : !cir.ptr<!rec_Vec4>, !rec_Vec4
+  // CHECK:        %[[V:.*]] = cir.alloca !rec_Vec4, !cir.ptr<!rec_Vec4>, ["v"]
+  // CHECK:        cir.store %[[REC]], %[[V]] : !rec_Vec4, !cir.ptr<!rec_Vec4>
+
+  cir.func @caller()
+      attributes { test_classify = #caller_no_args } {
+    %0 = cir.alloca !rec_Vec4, !cir.ptr<!rec_Vec4>, ["v"] {alignment = 4 : i64}
+    %1 = cir.load %0 : !cir.ptr<!rec_Vec4>, !rec_Vec4
+    cir.call @takes_vec(%1) : (!rec_Vec4) -> ()
     cir.return
   }
 
-  // CHECK: cir.func{{.*}} @takes_pair(%[[ARG:.*]]: !rec_Single)
-  // CHECK:   %[[SLOT:.*]] = cir.alloca !rec_Single, !cir.ptr<!rec_Single>, ["coerce"]
-  // CHECK:   cir.store %[[ARG]], %[[SLOT]] : !rec_Single, !cir.ptr<!rec_Single>
-  // CHECK:   %[[CAST:.*]] = cir.cast bitcast %[[SLOT]] : !cir.ptr<!rec_Single> -> !cir.ptr<!rec_Pair>
-  // CHECK:   %{{.*}} = cir.load %[[CAST]] : !cir.ptr<!rec_Pair>, !rec_Pair
+  // Call site coerces the !rec_Vec4 operand to !rec_TwoI64 through memory.
+  // CHECK:      cir.func{{.*}} @caller()
+  // CHECK:        %[[VAL:.*]] = cir.load %{{.*}} : !cir.ptr<!rec_Vec4>, !rec_Vec4
+  // CHECK:        cir.store %[[VAL]], %{{.*}} : !rec_Vec4, !cir.ptr<!rec_Vec4>
+  // CHECK:        %[[ARGCAST:.*]] = cir.cast bitcast %{{.*}} : !cir.ptr<!rec_Vec4> -> !cir.ptr<!rec_TwoI64>
+  // CHECK:        %[[ARGREC:.*]] = cir.load %[[ARGCAST]] : !cir.ptr<!rec_TwoI64>, !rec_TwoI64
+  // CHECK:        cir.call @takes_vec(%[[ARGREC]]) : (!rec_TwoI64) -> ()
 
 }
diff --git a/clang/test/CIR/Transforms/abi-lowering/coerce-vector-to-complex-reinterpret.cir b/clang/test/CIR/Transforms/abi-lowering/coerce-vector-to-complex-reinterpret.cir
deleted file mode 100644
index ceb1f9e364466..0000000000000
--- a/clang/test/CIR/Transforms/abi-lowering/coerce-vector-to-complex-reinterpret.cir
+++ /dev/null
@@ -1,42 +0,0 @@
-// Direct with coerced type that differs from the original only in
-// vector-vs-non-vector shape (same total bit width, neither side a record):
-// the rewriter emits cir.reinterpret_cast instead of going through memory.
-// RUN: cir-opt %s -cir-call-conv-lowering="classification-attr=test_classify" \
-// RUN:   | FileCheck %s
-
-#coerce_complex_to_vec2 = {
-  return = { kind = "direct" },
-  args   = [ { kind = "direct",
-               coerced_type = !cir.vector<2 x !cir.float> } ]
-}
-
-module attributes {
-  dlti.dl_spec = #dlti.dl_spec<
-    #dlti.dl_entry<f32, dense<32>: vector<2xi64>>>
-} {
-
-  cir.func @takes_complex(%arg0: !cir.complex<!cir.float>)
-      attributes { test_classify = #coerce_complex_to_vec2 } {
-    cir.return
-  }
-
-  // The signature changes to the coerced (vector) type; the body still
-  // expects the complex, so a reinterpret_cast lands at function entry to
-  // adapt the new block argument back to the original type.
-  // CHECK: cir.func{{.*}} @takes_complex(%[[ARG:.*]]: !cir.vector<2 x !cir.float>)
-  // CHECK:   %{{.*}} = cir.reinterpret_cast %[[ARG]] : !cir.vector<2 x !cir.float> -> !cir.complex<!cir.float>
-
-  cir.func @caller(%arg0: !cir.complex<!cir.float>)
-      attributes { test_classify = #coerce_complex_to_vec2 } {
-    cir.call @takes_complex(%arg0) : (!cir.complex<!cir.float>) -> ()
-    cir.return
-  }
-
-  // At the call site the rewriter coerces the original (complex) value to
-  // the vector type before passing it through.
-  // CHECK: cir.func{{.*}} @caller(%[[ARG:.*]]: !cir.vector<2 x !cir.float>)
-  // CHECK:   %[[COMPLEX:.*]] = cir.reinterpret_cast %[[ARG]] : !cir.vector<2 x !cir.float> -> !cir.complex<!cir.float>
-  // CHECK:   %[[COERCED:.*]] = cir.reinterpret_cast %[[COMPLEX]] : !cir.complex<!cir.float> -> !cir.vector<2 x !cir.float>
-  // CHECK:   cir.call @takes_complex(%[[COERCED]]) : (!cir.vector<2 x !cir.float>) -> ()
-
-}
diff --git a/clang/test/CIR/Transforms/abi-lowering/coerce-vector-to-complex.cir b/clang/test/CIR/Transforms/abi-lowering/coerce-vector-to-complex.cir
new file mode 100644
index 0000000000000..f1ccceb8541c8
--- /dev/null
+++ b/clang/test/CIR/Transforms/abi-lowering/coerce-vector-to-complex.cir
@@ -0,0 +1,42 @@
+// RUN: cir-opt %s -cir-call-conv-lowering="classification-attr=test_classify" \
+// RUN:   | FileCheck %s
+
+#coerce_complex_to_vec2 = {
+  return = { kind = "direct" },
+  args   = [ { kind = "direct",
+               coerced_type = !cir.vector<2 x !cir.float> } ]
+}
+
+module attributes {
+  dlti.dl_spec = #dlti.dl_spec<
+    #dlti.dl_entry<f32, dense<32>: vector<2xi64>>>
+} {
+
+  cir.func @takes_complex(%arg0: !cir.complex<!cir.float>)
+      attributes { test_classify = #coerce_complex_to_vec2 } {
+    cir.return
+  }
+
+  // Entry-block memory roundtrip recovers the complex from the vector arg.
+  // CHECK: cir.func{{.*}} @takes_complex(%[[ARG:.*]]: !cir.vector<2 x !cir.float>)
+  // CHECK:   %[[SLOT:.*]] = cir.alloca !cir.vector<2 x !cir.float>, !cir.ptr<!cir.vector<2 x !cir.float>>, ["coerce"]
+  // CHECK:   cir.store %[[ARG]], %[[SLOT]] : !cir.vector<2 x !cir.float>, !cir.ptr<!cir.vector<2 x !cir.float>>
+  // CHECK:   %[[CAST:.*]] = cir.cast bitcast %[[SLOT]] : !cir.ptr<!cir.vector<2 x !cir.float>> -> !cir.ptr<!cir.complex<!cir.float>>
+  // CHECK:   %{{.*}} = cir.load %[[CAST]] : !cir.ptr<!cir.complex<!cir.float>>, !cir.complex<!cir.float>
+
+  cir.func @caller(%arg0: !cir.complex<!cir.float>)
+      attributes { test_classify = #coerce_complex_to_vec2 } {
+    cir.call @takes_complex(%arg0) : (!cir.complex<!cir.float>) -> ()
+    cir.return
+  }
+
+  // Call site coerces the complex value to the vector type through memory.
+  // CHECK: cir.func{{.*}} @caller(%[[ARG:.*]]: !cir.vector<2 x !cir.float>)
+  // CHECK:   %[[COMPLEX:.*]] = cir.load %{{.*}} : !cir.ptr<!cir.complex<!cir.float>>, !cir.complex<!cir.float>
+  // CHECK:   %[[CSLOT:.*]] = cir.alloca !cir.complex<!cir.float>, !cir.ptr<!cir.complex<!cir.float>>, ["coerce"]
+  // CHECK:   cir.store %[[COMPLEX]], %[[CSLOT]] : !cir.complex<!cir.float>, !cir.ptr<!cir.complex<!cir.float>>
+  // CHECK:   %[[CCAST:.*]] = cir.cast bitcast %[[CSLOT]] : !cir.ptr<!cir.complex<!cir.float>> -> !cir.ptr<!cir.vector<2 x !cir.float>>
+  // CHECK:   %[[COERCED:.*]] = cir.load %[[CCAST]] : !cir.ptr<!cir.vector<2 x !cir.float>>, !cir.vector<2 x !cir.float>
+  // CHECK:   cir.call @takes_complex(%[[COERCED]]) : (!cir.vector<2 x !cir.float>) -> ()
+
+}

>From e8f0c328fe3730a6d9db495c16a120baf5a95d74 Mon Sep 17 00:00:00 2001
From: Adam Smith <[email protected]>
Date: Tue, 2 Jun 2026 14:37:24 -0700
Subject: [PATCH 3/3] [CIR] Size the ABI coercion slot to the larger of the two
 types

emitCoercion allocated the temporary slot as the source type and loaded
the destination type back out of it.  When the coerced ABI type is
larger than the source -- e.g. a 12-byte aggregate returned as
{i64, i64} -- the load reads past the allocation.

Size the slot to the larger of the two types and access it through a
source-typed view for the store and a destination-typed view for the
load, so neither access runs past it in either direction.  The
same-size case is unchanged.

Also replace the dead srcTy == dstTy early return with an assert (every
caller already skips the no-op case) and drop the classic-CodeGen
reference from the doc comment.
---
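Note (below the cut line, so not part of the commit message): the
slot-sizing rule described above can be illustrated outside of CIR with a
small standalone C++ sketch.  The names coerceThroughMemory, Vec3, and
TwoI64 are hypothetical and only mirror the shapes used in the new test;
none of them exist in this patch.  The sketch also zero-fills the slot so
the result is deterministic, which is an assumption of the example rather
than something emitCoercion does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-ins for the record types in the test.
struct Vec3 { std::int32_t x, y, z; };   // 12 bytes, 4-byte aligned
struct TwoI64 { std::int64_t lo, hi; };  // 16 bytes, 8-byte aligned

// Coerce src to Dst through a temporary slot that is large enough and
// aligned enough for both the store (as Src) and the load (as Dst).
template <typename Dst, typename Src>
Dst coerceThroughMemory(const Src &src) {
  static_assert(std::is_trivially_copyable<Src>::value &&
                    std::is_trivially_copyable<Dst>::value,
                "byte-wise coercion needs trivially copyable types");
  // Slot size and alignment are the larger of the two types.
  constexpr std::size_t slotSize = std::max(sizeof(Src), sizeof(Dst));
  constexpr std::size_t slotAlign = std::max(alignof(Src), alignof(Dst));
  // Zero-filled here only to keep the example deterministic (assumption).
  alignas(slotAlign) unsigned char slot[slotSize] = {};
  std::memcpy(slot, &src, sizeof(Src));  // source-typed view: store
  Dst dst;
  std::memcpy(&dst, slot, sizeof(Dst));  // destination-typed view: load
  return dst;
}
```

With a source-sized slot, coercing a 12-byte Vec3 to a 16-byte TwoI64
would read 4 bytes past the slot.  With the larger slot, both directions
stay in bounds, and a Vec3 -> TwoI64 -> Vec3 round trip preserves the
three fields.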
 .../TargetLowering/CIRABIRewriteContext.cpp   | 55 +++++++++-------
 .../abi-lowering/coerce-int-to-record.cir     |  2 +-
 .../coerce-record-return-larger.cir           | 63 +++++++++++++++++++
 .../abi-lowering/coerce-record-to-int.cir     |  2 +-
 .../coerce-record-to-record-via-memory.cir    |  4 +-
 5 files changed, 101 insertions(+), 25 deletions(-)
 create mode 100644 clang/test/CIR/Transforms/abi-lowering/coerce-record-return-larger.cir

diff --git a/clang/lib/CIR/Dialect/Transforms/TargetLowering/CIRABIRewriteContext.cpp b/clang/lib/CIR/Dialect/Transforms/TargetLowering/CIRABIRewriteContext.cpp
index 517994dea9d4a..a559da51d4958 100644
--- a/clang/lib/CIR/Dialect/Transforms/TargetLowering/CIRABIRewriteContext.cpp
+++ b/clang/lib/CIR/Dialect/Transforms/TargetLowering/CIRABIRewriteContext.cpp
@@ -172,15 +172,18 @@ ArrayAttr updateResAttrs(MLIRContext *ctx, ArrayAttr existingResAttrs,
   return ArrayAttr::get(ctx, {DictionaryAttr::get(ctx, attrs)});
 }
 
-/// Coerce \p src to type \p dstTy at the current builder insertion point.
+/// Coerce \p src to type \p dstTy at the current builder insertion point by
+/// going through memory: allocate a slot, store the source, then load the
+/// destination type back out.  Lowers uniformly for scalar, vector, and
+/// record types.
 ///
-/// If src and dst are the same type, returns src unchanged and leaves
-/// \p createdOps empty.  Otherwise coerces through memory: allocate a slot
-/// of the source type (using max(srcAlign, dstAlign) for the alloca
-/// alignment), store the source, bitcast the pointer to the destination
-/// type, and load the destination type back.  This mirrors classic
-/// CodeGen's coerce-through-memory behavior for ABI argument and return
-/// coercion and lowers uniformly for scalar, vector, and record types.
+/// The slot is sized to the larger of the two types so that neither the
+/// store nor the load ever runs past it: the coerced ABI type can be larger
+/// than the original (e.g. a 12-byte aggregate returned as `{i64, i64}`), so
+/// loading the destination out of a source-sized slot would over-read.
+/// Alignment is max(srcAlign, dstAlign) to satisfy both accesses.  The slot
+/// is accessed through a source-typed view for the store and a
+/// destination-typed view for the load.
 ///
 /// The temporary alloca is placed at the start of the enclosing function's
 /// entry block so that it composes correctly with the HoistAllocas pass
@@ -193,17 +196,15 @@ Value emitCoercion(OpBuilder &rewriter, Location loc, Type dstTy, Value src,
                    FunctionOpInterface funcOp, const DataLayout &dl,
                    SmallPtrSetImpl<Operation *> &createdOps) {
   Type srcTy = src.getType();
-  if (srcTy == dstTy)
-    return src;
+  assert(srcTy != dstTy &&
+         "emitCoercion callers must pre-check that the types differ");
 
-  // Coerce through memory: alloca + store + ptr-cast + load.  The alloca
-  // goes at the start of the entry block so it composes with the
-  // HoistAllocas pass, with alignment = max(srcAlign, dstAlign) to satisfy
-  // both the store and the load.
   uint64_t srcAlign = dl.getTypeABIAlignment(srcTy);
   uint64_t dstAlign = dl.getTypeABIAlignment(dstTy);
   uint64_t allocaAlign = std::max(srcAlign, dstAlign);
+  Type slotTy = dl.getTypeSize(srcTy) >= dl.getTypeSize(dstTy) ? srcTy : dstTy;
 
+  auto slotPtrTy = cir::PointerType::get(slotTy);
   auto srcPtrTy = cir::PointerType::get(srcTy);
   auto dstPtrTy = cir::PointerType::get(dstTy);
 
@@ -212,20 +213,32 @@ Value emitCoercion(OpBuilder &rewriter, Location loc, Type dstTy, Value src,
     OpBuilder::InsertionGuard guard(rewriter);
     Block &entry = funcOp->getRegion(0).front();
     rewriter.setInsertionPointToStart(&entry);
-    alloca = cir::AllocaOp::create(rewriter, loc, srcPtrTy, srcTy,
+    alloca = cir::AllocaOp::create(rewriter, loc, slotPtrTy, slotTy,
                                    rewriter.getStringAttr("coerce"),
                                    rewriter.getI64IntegerAttr(allocaAlign));
   }
   createdOps.insert(alloca);
 
-  auto store = cir::StoreOp::create(rewriter, loc, src, alloca);
+  // Store through a source-typed view of the slot.
+  Value srcSlot = alloca;
+  if (slotTy != srcTy) {
+    auto srcCast = cir::CastOp::create(rewriter, loc, srcPtrTy,
+                                       cir::CastKind::bitcast, alloca);
+    createdOps.insert(srcCast);
+    srcSlot = srcCast;
+  }
+  auto store = cir::StoreOp::create(rewriter, loc, src, srcSlot);
   createdOps.insert(store);
 
-  auto ptrCast = cir::CastOp::create(rewriter, loc, dstPtrTy,
-                                     cir::CastKind::bitcast, alloca);
-  createdOps.insert(ptrCast);
-
-  auto load = cir::LoadOp::create(rewriter, loc, ptrCast.getResult());
+  // Load through a destination-typed view of the slot.
+  Value dstSlot = alloca;
+  if (slotTy != dstTy) {
+    auto dstCast = cir::CastOp::create(rewriter, loc, dstPtrTy,
+                                       cir::CastKind::bitcast, alloca);
+    createdOps.insert(dstCast);
+    dstSlot = dstCast;
+  }
+  auto load = cir::LoadOp::create(rewriter, loc, dstSlot);
   createdOps.insert(load);
   return load;
 }
diff --git a/clang/test/CIR/Transforms/abi-lowering/coerce-int-to-record.cir b/clang/test/CIR/Transforms/abi-lowering/coerce-int-to-record.cir
index f090f96e867e8..94e6beb89c4e4 100644
--- a/clang/test/CIR/Transforms/abi-lowering/coerce-int-to-record.cir
+++ b/clang/test/CIR/Transforms/abi-lowering/coerce-int-to-record.cir
@@ -3,7 +3,7 @@
 
 !s32i = !cir.int<s, 32>
 !s64i = !cir.int<s, 64>
-!rec_Pair = !cir.record<struct "Pair" {!s32i, !s32i}>
+!rec_Pair = !cir.struct<"Pair" {!s32i, !s32i}>
 
 #coerce_pair_return_to_i64 = {
   return = { kind = "direct", coerced_type = !s64i },
diff --git a/clang/test/CIR/Transforms/abi-lowering/coerce-record-return-larger.cir b/clang/test/CIR/Transforms/abi-lowering/coerce-record-return-larger.cir
new file mode 100644
index 0000000000000..e5a3986c0c4e1
--- /dev/null
+++ b/clang/test/CIR/Transforms/abi-lowering/coerce-record-return-larger.cir
@@ -0,0 +1,63 @@
+// RUN: cir-opt %s -cir-call-conv-lowering="classification-attr=test_classify" \
+// RUN:   | FileCheck %s
+
+!s32i = !cir.int<s, 32>
+!s64i = !cir.int<s, 64>
+!rec_Vec3   = !cir.struct<"Vec3"   {!s32i, !s32i, !s32i}>
+!rec_TwoI64 = !cir.struct<"TwoI64" {!s64i, !s64i}>
+
+#coerce_vec3_return_to_twoi64 = {
+  return = { kind = "direct", coerced_type = !rec_TwoI64 },
+  args   = [ ]
+}
+
+#caller_ret_i32 = {
+  return = { kind = "direct" },
+  args   = [ ]
+}
+
+module attributes {
+  dlti.dl_spec = #dlti.dl_spec<
+    #dlti.dl_entry<i32, dense<32>: vector<2xi64>>,
+    #dlti.dl_entry<i64, dense<64>: vector<2xi64>>>
+} {
+
+  cir.func @returns_vec3() -> !rec_Vec3
+      attributes { test_classify = #coerce_vec3_return_to_twoi64 } {
+    %0 = cir.const #cir.zero : !rec_Vec3
+    cir.return %0 : !rec_Vec3
+  }
+
+  // The coerced return type (16-byte !rec_TwoI64) is larger than the source
+  // (12-byte !rec_Vec3), so the coercion slot is sized to the larger type
+  // and the store goes through a source-typed view of it; loading !rec_TwoI64
+  // out of a !rec_Vec3-sized slot would read past the allocation.
+  // CHECK:      cir.func{{.*}} @returns_vec3() -> !rec_TwoI64
+  // CHECK:        %[[SLOT:.*]] = cir.alloca !rec_TwoI64, !cir.ptr<!rec_TwoI64>, ["coerce"]
+  // CHECK:        %[[VAL:.*]] = cir.const #cir.zero : !rec_Vec3
+  // CHECK:        %[[CAST:.*]] = cir.cast bitcast %[[SLOT]] : !cir.ptr<!rec_TwoI64> -> !cir.ptr<!rec_Vec3>
+  // CHECK:        cir.store %[[VAL]], %[[CAST]] : !rec_Vec3, !cir.ptr<!rec_Vec3>
+  // CHECK:        %[[COERCED:.*]] = cir.load %[[SLOT]] : !cir.ptr<!rec_TwoI64>, !rec_TwoI64
+  // CHECK:        cir.return %[[COERCED]] : !rec_TwoI64
+
+  cir.func @caller() -> !s32i
+      attributes { test_classify = #caller_ret_i32 } {
+    %0 = cir.call @returns_vec3() : () -> !rec_Vec3
+    %1 = cir.alloca !rec_Vec3, !cir.ptr<!rec_Vec3>, ["r"] {alignment = 4 : i64}
+    cir.store %0, %1 : !rec_Vec3, !cir.ptr<!rec_Vec3>
+    %2 = cir.get_member %1[0] {name = "first"} : !cir.ptr<!rec_Vec3> -> !cir.ptr<!s32i>
+    %3 = cir.load %2 : !cir.ptr<!s32i>, !s32i
+    cir.return %3 : !s32i
+  }
+
+  // The larger !rec_TwoI64 call result is coerced back to !rec_Vec3; here the
+  // source is the larger type so the slot is source-sized and the load reads
+  // the smaller destination back out.
+  // CHECK:      cir.func{{.*}} @caller() -> !s32i
+  // CHECK:        %[[RET:.*]] = cir.call @returns_vec3() : () -> !rec_TwoI64
+  // CHECK:        cir.store %[[RET]], %{{.*}} : !rec_TwoI64, !cir.ptr<!rec_TwoI64>
+  // CHECK:        %[[RCAST:.*]] = cir.cast bitcast %{{.*}} : !cir.ptr<!rec_TwoI64> -> !cir.ptr<!rec_Vec3>
+  // CHECK:        %[[REC:.*]] = cir.load %[[RCAST]] : !cir.ptr<!rec_Vec3>, !rec_Vec3
+  // CHECK:        %[[MEMBER:.*]] = cir.get_member %{{.*}}[0] {name = "first"} : !cir.ptr<!rec_Vec3> -> !cir.ptr<!s32i>
+
+}
diff --git a/clang/test/CIR/Transforms/abi-lowering/coerce-record-to-int.cir b/clang/test/CIR/Transforms/abi-lowering/coerce-record-to-int.cir
index 19d7054bc6134..fd9cacde30247 100644
--- a/clang/test/CIR/Transforms/abi-lowering/coerce-record-to-int.cir
+++ b/clang/test/CIR/Transforms/abi-lowering/coerce-record-to-int.cir
@@ -3,7 +3,7 @@
 
 !s32i = !cir.int<s, 32>
 !s64i = !cir.int<s, 64>
-!rec_Pair = !cir.record<struct "Pair" {!s32i, !s32i}>
+!rec_Pair = !cir.struct<"Pair" {!s32i, !s32i}>
 
 #coerce_pair_to_i64 = {
   return = { kind = "direct" },
diff --git a/clang/test/CIR/Transforms/abi-lowering/coerce-record-to-record-via-memory.cir b/clang/test/CIR/Transforms/abi-lowering/coerce-record-to-record-via-memory.cir
index 7cd190a679ce8..62ea2378623fd 100644
--- a/clang/test/CIR/Transforms/abi-lowering/coerce-record-to-record-via-memory.cir
+++ b/clang/test/CIR/Transforms/abi-lowering/coerce-record-to-record-via-memory.cir
@@ -3,8 +3,8 @@
 
 !s32i = !cir.int<s, 32>
 !s64i = !cir.int<s, 64>
-!rec_Vec4   = !cir.record<struct "Vec4"   {!s32i, !s32i, !s32i, !s32i}>
-!rec_TwoI64 = !cir.record<struct "TwoI64" {!s64i, !s64i}>
+!rec_Vec4   = !cir.struct<"Vec4"   {!s32i, !s32i, !s32i, !s32i}>
+!rec_TwoI64 = !cir.struct<"TwoI64" {!s64i, !s64i}>
 
 #coerce_vec4_to_twoi64 = {
   return = { kind = "direct" },

_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
